  1. Hive
  2. HIVE-22893

Enhance data size estimation for fields computed by UDFs

Details

    Description

      Right now, if column statistics are available for a column, we use them to estimate the column's properties (such as its data size). However, if a UDF is executed on a column, the resulting column is treated as unknown and default values are assumed.

      An improvement would be to provide wide (conservative) estimates for frequently used UDFs.

      For example, consider substr(c,1,1): no matter what the input is, the output is a string of at most 1 character.
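
The substr case can be sketched as a small bounding function. This is only an illustration of the idea, not the code from the attached patches: `ColStats`, `estimate_substr_max_len` and the fallback `DEFAULT_STRING_LEN` are hypothetical names and values chosen for the example, and the sketch assumes 1-based start positions as in substr(c,1,1).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColStats:
    """Hypothetical, simplified column statistics (not Hive's real class)."""
    max_len: Optional[int] = None  # longest value in characters, if known


# Assumed fallback used when nothing is known about the column.
DEFAULT_STRING_LEN = 100


def estimate_substr_max_len(stats: Optional[ColStats],
                            start: int,
                            length: Optional[int] = None) -> int:
    """Return an upper bound on the character length of substr(c, start, length)."""
    bounds = []
    if length is not None:
        # The result can never be longer than the requested length.
        bounds.append(max(length, 0))
    if stats is not None and stats.max_len is not None:
        if start > 0:
            # A positive start skips the first (start - 1) characters.
            bounds.append(max(stats.max_len - (start - 1), 0))
        else:
            # Zero or negative start: stay conservative and keep the input bound.
            bounds.append(stats.max_len)
    # With no usable information, fall back to the default assumption.
    return min(bounds) if bounds else DEFAULT_STRING_LEN


# substr(c,1,1) is bounded by 1 even when the input column has no statistics.
print(estimate_substr_max_len(None, 1, 1))
```

The bound is deliberately wide: it never claims more than what the arguments and the known input statistics guarantee, so an overestimate is possible but an underestimate is not.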

      Attachments

        1. HIVE-22893.14.patch
          409 kB
          Zoltan Haindrich
        2. HIVE-22893.13.patch
          409 kB
          Zoltan Haindrich
        3. HIVE-22893.12.patch
          408 kB
          Zoltan Haindrich
        4. HIVE-22893.11.patch
          427 kB
          Zoltan Haindrich
        5. HIVE-22893.10.patch
          421 kB
          Zoltan Haindrich
        6. HIVE-22893.09.patch
          421 kB
          Zoltan Haindrich
        7. HIVE-22893.08.patch
          421 kB
          Zoltan Haindrich
        8. HIVE-22893.07.patch
          287 kB
          Zoltan Haindrich
        9. HIVE-22893.06.patch
          292 kB
          Zoltan Haindrich
        10. HIVE-22893.05.patch
          569 kB
          Zoltan Haindrich
        11. HIVE-22893.04.patch
          276 kB
          Zoltan Haindrich
        12. HIVE-22893.03.patch
          549 kB
          Zoltan Haindrich
        13. HIVE-22893.02.patch
          549 kB
          Zoltan Haindrich
        14. HIVE-22893.01.patch
          37 kB
          Zoltan Haindrich

            People

              kgyrtkirk Zoltan Haindrich

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 2h