Details
Description
Hi, everyone!
I've run into a very strange problem.
The intent of this SQL is to count the number of distinct values of the specified columns in each table after joining the tables, as follows:
SELECT cast(COUNT(DISTINCT tps.prod_siginst_id) AS STRING) AS siginst_cnt,
       cast(COUNT(DISTINCT qpl.list_id) AS STRING) AS list_cnt,
       cast(COUNT(DISTINCT if(tb.brand_source = 1, tps.prod_siginst_id, NULL)) AS STRING) AS domestic_siginst_cnt,
       cast(COUNT(DISTINCT if(tb.brand_source = 2, tps.prod_siginst_id, NULL)) AS STRING) AS import_siginst_cnt,
       cast(COUNT(DISTINCT if(qpl.list_name NOT LIKE '%un_normal%', tps.prod_siginst_id, NULL)) AS STRING) AS standard_cnt,
       cast(COUNT(DISTINCT if(qpl.list_name LIKE '%un_normal%', tps.prod_siginst_id, NULL)) AS STRING) AS nostandard_cnt
FROM tableA tbi
LEFT JOIN tableB tps ON tbi.prod_inst_id = tps.prod_inst_id
LEFT JOIN tableC qpl ON tbi.prod_type_id = qpl.list_id
LEFT JOIN tableD tb ON tps.brand_id = tb.brand_id
WHERE tbi.prod_status = 1
  AND tbi.prod_sell_status = 1
  AND tb.recommend_flag = 1;
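For reference, the conditional counts above rely on the fact that COUNT(DISTINCT ...) ignores NULLs, so count(DISTINCT if(cond, col, NULL)) counts distinct values of col only over rows where cond holds. A minimal sketch of those semantics, using Python's stdlib sqlite3 with standard SQL CASE WHEN in place of Spark's if() and made-up toy data (table and column names here are hypothetical):

```python
# Sketch of the conditional distinct-count pattern used in the query above.
# COUNT(DISTINCT ...) skips NULLs, so wrapping the column in a conditional
# that yields NULL for non-matching rows counts distinct values per condition.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE prod (prod_siginst_id INTEGER, brand_source INTEGER)")
cur.executemany(
    "INSERT INTO prod VALUES (?, ?)",
    [(1, 1), (1, 1), (2, 1), (3, 2), (3, 2)],  # duplicates on purpose
)
row = cur.execute(
    """
    SELECT COUNT(DISTINCT prod_siginst_id) AS siginst_cnt,
           COUNT(DISTINCT CASE WHEN brand_source = 1 THEN prod_siginst_id END) AS domestic_cnt,
           COUNT(DISTINCT CASE WHEN brand_source = 2 THEN prod_siginst_id END) AS import_cnt
    FROM prod
    """
).fetchone()
print(row)  # (3, 2, 1): 3 distinct ids overall, 2 with brand_source=1, 1 with brand_source=2
conn.close()
```

Whatever the executor memory is, a correct engine must return the same fixed counts for a fixed input, which is why the varying results looked like a bug to me.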
The strange phenomenon is this: if I add memory for the executor, the count results on the tableC fields (list_id, list_name) change as well. Only once the executor's memory is big enough does the result stop changing.
TableC is a dimension table, and its amount of data is fixed.
In my opinion, this job should fail rather than output an incorrect count result when the executor has insufficient memory.
Could you please help me check whether this is a bug in Spark itself or a mistake in my SQL?
Here is the log of this job.