Hive / HIVE-7402

add `approx_distinct` & composable nDV UDAFs


Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate

    Description

      Build composable approximate distinct-count UDAFs into Hive.

      This is useful for approximate queries, particularly for merging partial nDV (number of distinct values) results whenever a partition is added.

      hive> select approx_distinct(ss_item_sk), approx_distinct(ss_quantity)  from tpcds_orc_10000.store_sales;
      
      OK
      403760  100
      Time taken: 238.258 seconds, Fetched: 1 row(s)
      

      Prototype Hive UDAFs/UDFs are available at https://github.com/t3rmin4t0r/hive-hll-udf/

      The prototype uses prasanth_j's fast HLL++ implementation for the heavy lifting.
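
      To illustrate what "composable" means here, below is a minimal sketch of a mergeable distinct-count sketch. It assumes plain HyperLogLog (not the HLL++ variant mentioned above), and the `HllSketch` class and its method names are hypothetical; they are not the API of the prototype or of Hive. The point is only that per-partition sketches can be combined with a register-wise max, so adding a partition should not require rescanning the older ones.

```python
import hashlib
import math


class HllSketch:
    """Plain HyperLogLog (hypothetical illustration, not HLL++)."""

    def __init__(self, p=12):
        self.p = p                      # number of index bits
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, value):
        # Derive a 64-bit hash from the value's string form.
        digest = hashlib.sha1(str(value).encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                  # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)     # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`
        # (64 - p + 1 when `rest` is zero).
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        # Composability: the sketch of a union is the register-wise max.
        assert self.p == other.p, "sketches must use the same precision"
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return self

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)     # bias constant for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


# Two overlapping "partitions" and a single pass over all the data.
part_a, part_b, full = HllSketch(), HllSketch(), HllSketch()
for i in range(0, 60000):
    part_a.add(i)
for i in range(40000, 100000):
    part_b.add(i)
for i in range(100000):
    full.add(i)

# Merging the partial sketches gives the same registers as the full scan.
merged = HllSketch().merge(part_a).merge(part_b)
print(merged.registers == full.registers, round(merged.estimate()))
```

      Because the merge is just a max per register, it is associative and commutative, which is the property a UDAF needs in order to combine partial aggregates in any order.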


            People

              Assignee: Unassigned
              Reporter: Gopal Vijayaraghavan (gopalv)
