Hive / HIVE-25588

Hive 2.3.3 Fetch Task threshold not respected

Details

    Description

      It seems that "hive.fetch.task.conversion.threshold" is not respected in Hive 2.3.3: a Fetch Task is always used, regardless of input size, as long as the query meets the conditions for the "more" or "minimal" setting of "hive.fetch.task.conversion".

      Apologies if this has been reported already, but I could not find any issues which mention this specifically.

      To reproduce, set "hive.fetch.task.conversion.threshold=1". As I understand it, this should trigger an MR/Tez job for essentially any input, but a fetch task is used instead.
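
      A minimal sketch of the reproduction steps described here. The table name "my_db.big_table" is a placeholder and not one of the tables actually tested, and the "more" setting and the LIMIT query are only one assumed way of meeting the conversion conditions. If the threshold were honored, I would expect the EXPLAIN output to contain a Map Reduce or Tez stage rather than only a Fetch Operator stage.

```sql
-- Allow fetch-task conversion for simple queries.
SET hive.fetch.task.conversion=more;

-- Threshold of 1 byte: practically any input should exceed it.
SET hive.fetch.task.conversion.threshold=1;

-- Placeholder table name; substitute any large ORC or Parquet table.
EXPLAIN SELECT * FROM my_db.big_table LIMIT 10;
```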

      Tested on various tables ranging from dozens of GB to dozens of TB in size, with hundreds and thousands of partitions, in ORC and Parquet format. Example table size from statistics:

      Table Parameters: NULL NULL
        EXTERNAL       TRUE
        numFiles       234258
        numPartitions  171898
        numRows        1719836838331
        rawDataSize    515766839727247
        totalSize      189367471403333

      Please let me know if any additional information is required.

          People

            Assignee: Unassigned
            Reporter: Nedzad Campara (campi01)