Details
Description
Hi, everyone!
I've run into a very strange problem.
The intent of this SQL is to count the number of distinct values of the specified columns in each table after joining the tables, as follows:
SELECT cast(COUNT(DISTINCT tps.prod_siginst_id) AS STRING) AS siginst_cnt,
       cast(COUNT(DISTINCT qpl.list_id) AS STRING) AS list_cnt,
       cast(COUNT(DISTINCT if(tb.brand_source = 1, tps.prod_siginst_id, NULL)) AS STRING) AS domestic_siginst_cnt,
       cast(COUNT(DISTINCT if(tb.brand_source = 2, tps.prod_siginst_id, NULL)) AS STRING) AS import_siginst_cnt,
       cast(COUNT(DISTINCT if(qpl.list_name NOT LIKE '%un_normal%', tps.prod_siginst_id, NULL)) AS STRING) AS standard_cnt,
       cast(COUNT(DISTINCT if(qpl.list_name LIKE '%un_normal%', tps.prod_siginst_id, NULL)) AS STRING) AS nostandard_cnt
FROM tableA tbi
LEFT JOIN tableB tps ON tbi.prod_inst_id = tps.prod_inst_id
LEFT JOIN tableC qpl ON tbi.prod_type_id = qpl.list_id
LEFT JOIN tableD tb ON tps.brand_id = tb.brand_id
WHERE tbi.prod_status = 1
  AND tbi.prod_sell_status = 1
  AND tb.recommend_flag = 1;
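For reference, the conditional counts above rely on the fact that COUNT(DISTINCT ...) ignores NULLs, so count(DISTINCT if(cond, col, NULL)) counts distinct values of col only over rows where cond holds. A minimal sketch of those semantics, using Python's stdlib sqlite3 with standard SQL CASE WHEN in place of Spark's if() and made-up toy data (table and column names here are hypothetical):

```python
# Sketch of the conditional distinct-count pattern used in the query above.
# COUNT(DISTINCT ...) skips NULLs, so wrapping the column in a conditional
# that yields NULL for non-matching rows counts distinct values per condition.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE prod (prod_siginst_id INTEGER, brand_source INTEGER)")
cur.executemany(
    "INSERT INTO prod VALUES (?, ?)",
    [(1, 1), (1, 1), (2, 1), (3, 2), (3, 2)],  # duplicates on purpose
)
row = cur.execute(
    """
    SELECT COUNT(DISTINCT prod_siginst_id) AS siginst_cnt,
           COUNT(DISTINCT CASE WHEN brand_source = 1 THEN prod_siginst_id END) AS domestic_cnt,
           COUNT(DISTINCT CASE WHEN brand_source = 2 THEN prod_siginst_id END) AS import_cnt
    FROM prod
    """
).fetchone()
print(row)  # (3, 2, 1): 3 distinct ids overall, 2 with brand_source=1, 1 with brand_source=2
conn.close()
```

Whatever the executor memory is, a correct engine must return the same fixed counts for a fixed input, which is why the varying results looked like a bug to me.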
The strange phenomenon is this: if I add memory for the executor, the count results on the tableC fields (list_id, list_name) change as well. Only once the executor's memory is big enough does the result stop changing.
TableC is a dimension table, and its amount of data is fixed.
In my opinion, this job should fail rather than output an incorrect count result when the executor has insufficient memory.
Could you please help me check whether this is a bug in Spark itself or a mistake in my SQL?
Here is the log of this job.