Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.3.3
Fix Version/s: None
Description
It seems that "hive.fetch.task.conversion.threshold" is not respected in Hive 2.3.3. Hive always uses a Fetch Task, regardless of input size, as long as the query meets the conditions for either the "more" or the "minimal" setting of "hive.fetch.task.conversion".
Apologies if this has been reported already, but I could not find any issues which mention this specifically.
To reproduce, set "hive.fetch.task.conversion.threshold=1". As I understand it, this should practically always trigger an MR/Tez job, but Hive runs a fetch task instead.
Tested on various tables ranging from dozens of GB to dozens of TB in size, with hundreds and thousands of partitions, in ORC and Parquet formats. Example table size from statistics:
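The behaviour expected from the threshold can be sketched as a small decision function. This is a minimal model of the reporter's expectation, not Hive's planner code, and it assumes (per the Hive configuration documentation) that the threshold is measured in bytes of input and that a negative value disables the size check. The function `expected_plan` and its `query_eligible` flag are hypothetical names used only for illustration.

```python
# Hypothetical model of the behaviour this report expects from
# hive.fetch.task.conversion.threshold. It is NOT Hive's implementation.


def expected_plan(conversion: str, threshold_bytes: int, input_bytes: int,
                  query_eligible: bool = True) -> str:
    """Return "fetch" or "mr/tez" under the expected threshold semantics.

    conversion: value of hive.fetch.task.conversion ("none", "minimal", "more").
    threshold_bytes: value of hive.fetch.task.conversion.threshold.
    input_bytes: summed length of the files the query would read.
    query_eligible: hypothetical flag, True if the query shape qualifies
        for fetch conversion under the chosen mode.
    """
    if conversion == "none" or not query_eligible:
        return "mr/tez"
    # A negative threshold means conversion applies with no size limit.
    if threshold_bytes < 0:
        return "fetch"
    # Otherwise a fetch task should only be used if the input fits the threshold.
    return "fetch" if input_bytes <= threshold_bytes else "mr/tez"
```

Under this model, `expected_plan("more", 1, 10**12)` yields `"mr/tez"`, which is the outcome the reproduction expects but does not observe.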
Table Parameters:
  EXTERNAL         TRUE
  numFiles         234258
  numPartitions    171898
  numRows          1719836838331
  rawDataSize      515766839727247
  totalSize        189367471403333
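For scale, the example table's statistics can be converted to binary terabytes and compared with the threshold of 1 used in the reproduction. This assumes, as Hive table statistics conventionally report, that totalSize and rawDataSize are given in bytes; the variable names are illustrative.

```python
# Figures copied from the table statistics above (assumed to be bytes).
total_size = 189_367_471_403_333     # totalSize
raw_data_size = 515_766_839_727_247  # rawDataSize
threshold = 1                        # hive.fetch.task.conversion.threshold

TIB = 2 ** 40  # bytes per tebibyte

print(f"totalSize   = {total_size / TIB:.1f} TiB")
print(f"rawDataSize = {raw_data_size / TIB:.1f} TiB")
# Any input larger than the threshold should rule out a fetch task.
print("exceeds threshold:", total_size > threshold)
```

The on-disk size works out to roughly 172 TiB, so the input exceeds a 1-byte threshold by many orders of magnitude.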
Please let me know if any additional information is required.