Hive / HIVE-21074

Hive bucketed table query pruning does not work for IS NOT NULL condition

Details

    Description

      The current version of bucket pruning skips all predicates when, while evaluating an AND logical operator, it detects that one of the predicates is a compound expression (e.g. NOT(IS_NULL)).

      This logic is faulty: as long as one of the AND operands is an equality on a bucketed column (col = literal), the literal value of that column should be considered in the bucket pruning optimization, regardless of the other operands. For example:

      SELECT * FROM tbl WHERE bucketed_col = 1 AND (some_compound_expr)

      Here the value '1' should be considered for pruning in the query plan. This limitation shows up in a simpler case: a table that I am trying to optimize with bucketing gets no benefit when IS NOT NULL is used. Since IS NOT NULL is parsed into NOT(IS_NULL) (a compound expression), the pruning phase is skipped completely, causing unnecessary tasks to be spawned. For instance:

      SELECT * FROM tbl WHERE bucketed_col = 1 AND some_other_col IS NOT NULL

      This query does not trigger the bucket pruning logic and performs a full table scan.
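
      To make the difference concrete, here is a minimal Python sketch of the two behaviors. It is not Hive's code: the Eq and Compound classes and both function names are hypothetical, and it models only the conjuncts of a single AND. The point it illustrates is that under AND every conjunct can only narrow the result, so ignoring a conjunct the pruner cannot analyze still leaves a safe set of literals to prune on.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set, Union


@dataclass(frozen=True)
class Eq:
    """Simple predicate of the form: column = literal."""
    column: str
    literal: int


@dataclass(frozen=True)
class Compound:
    """Any predicate the pruner cannot analyze, e.g. NOT(IS_NULL(col))."""
    text: str


Predicate = Union[Eq, Compound]


def literals_per_conjunct(conjuncts: Sequence[Predicate],
                          bucketed_col: str) -> Optional[Set[int]]:
    """Behavior the report asks for: skip unanalyzable conjuncts, keep the rest.

    Returns the literals the bucketed column may take, or None when no
    conjunct constrains it (None means "no pruning, scan every bucket").
    """
    found: Optional[Set[int]] = None
    for pred in conjuncts:
        if isinstance(pred, Eq) and pred.column == bucketed_col:
            # AND of several equalities on the same column intersects them.
            found = {pred.literal} if found is None else found & {pred.literal}
    return found


def literals_all_or_nothing(conjuncts: Sequence[Predicate],
                            bucketed_col: str) -> Optional[Set[int]]:
    """Behavior described in the report: one compound conjunct disables pruning."""
    if any(isinstance(pred, Compound) for pred in conjuncts):
        return None
    return literals_per_conjunct(conjuncts, bucketed_col)


# WHERE bucketed_col = 1 AND some_other_col IS NOT NULL
where = (Eq("bucketed_col", 1), Compound("NOT(IS_NULL(some_other_col))"))
print(literals_all_or_nothing(where, "bucketed_col"))  # None
print(literals_per_conjunct(where, "bucketed_col"))    # {1}
```

      In this model the first function gives up on the whole WHERE clause, while the second still returns the literal 1, which is what a pruner would need in order to scan only the matching bucket.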

      Attachments

        1. HIVE-21074.patch
          6 kB
          Thai Bui

            People

              szita Ádám Szita
              thaibui Thai Bui
              Votes: 0
              Watchers: 2

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 20m