Details
Type: New Feature
Status: Resolved
Priority: Major
Resolution: Duplicate
Description
Build composable approximate-distinct UDAFs into Hive.
This is useful for approximate queries, particularly for collapsing per-partition partial nDV (number of distinct values) results whenever a partition is added.
hive> select approx_distinct(ss_item_sk), approx_distinct(ss_quantity) from tpcds_orc_10000.store_sales;
OK
403760 100
Time taken: 238.258 seconds, Fetched: 1 row(s)
Prototype Hive UDAF/UDFs are at https://github.com/t3rmin4t0r/hive-hll-udf/
The prototype uses prasanth_j's fast HLL++ implementation for the heavy lifting.
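To illustrate what "composable" means here, the following is a minimal sketch of a mergeable distinct-count sketch. It is not the prototype's code and not Hive's implementation: it uses plain HyperLogLog rather than HLL++, and the class name `HllSketch`, its methods, the SHA-1 based hash, and the precision `p=12` are all assumptions made for the example. The point it shows is that per-partition sketches can be combined by taking the register-wise maximum, which gives the same registers as building one sketch over all rows.

```python
import hashlib
import math


class HllSketch:
    """Minimal plain-HyperLogLog sketch (hypothetical, for illustration only)."""

    def __init__(self, p=12):
        self.p = p                      # number of bits used to pick a register
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(value):
        # Assumption: any well-mixed 64-bit hash works; SHA-1 is used for simplicity.
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, value):
        h = self._hash64(value)
        width = 64 - self.p
        idx = h >> width                    # top p bits select the register
        rest = h & ((1 << width) - 1)       # remaining bits
        # 1-based position of the leftmost 1-bit in the remaining bits
        rank = width - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        # Composability: the merged sketch is the register-wise maximum.
        if self.p != other.p:
            raise ValueError("sketches must use the same precision")
        merged = HllSketch(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)   # bias constant, valid for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting
            return self.m * math.log(self.m / zeros)
        return raw


# Two "partitions" with overlapping keys, each summarised independently.
part_a, part_b, full = HllSketch(), HllSketch(), HllSketch()
for key in range(0, 12000):
    part_a.add(key)
    full.add(key)
for key in range(8000, 20000):
    part_b.add(key)
    full.add(key)

# Adding a partition only requires merging its sketch, not rescanning old data.
merged = part_a.merge(part_b)
print(round(merged.estimate()), round(full.estimate()))
```

Because the merge is a register-wise maximum, it is associative, commutative, and idempotent, so partial results can be combined in any order, which is the property that makes this kind of sketch a natural fit for a UDAF's partial-aggregation step.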
Issue Links
- is superseded by: HIVE-20490 UDAF: Add an 'approx_distinct' to Hive (Closed)
- relates to: HIVE-8397 Approximated cardinality with HyperLogLog UDAF (Patch Available)