Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-25734

Wrongly-typed constant in case expression leads to incorrect empty result

    XMLWordPrintableJSON

Details

    Description

       

      The type of constants in case expressions should be inferred, if possible, by the "surrounding" input reference columns, if any.

      Consider the following table and query: 

      create external table test_case (row_seq smallint, row_desc string) stored as parquet;
      insert into test_case values (1, 'a');
      insert into test_case values (2, 'aa');
      insert into test_case values (6, 'aaaaaa');
      
      with base_t as (select row_seq, row_desc,
        case row_seq
          when 1 then '34'
          when 6 then '35'
          when 2 then '36'
        end as zb from test_case where row_seq in (1,2,6))
      select row_seq, row_desc, zb from base_t where zb <> '34';

      The aforementioned query fails by returning an empty results, while "1 a 34" is expected.

       

      To understand the root cause, let's consider the debug input and output of some related CBO rules which are triggered during the evaluation of the query: 

       

      --$0 is the column 'row_seq'
      1. HiveReduceExpressionsRule
      Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), <>(CASE(=($0, 1:INTEGER), '34':VARCHAR, =($0, 6:INTEGER), '35':VARCHAR, =($0, 2:INTEGER), '36':VARCHAR, null:VARCHAR), '34':CHAR(2)))
      Output: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), OR(=($0, 6:INTEGER), =($0, 2:INTEGER)), IS NOT TRUE(=($0, 1:INTEGER)))
      2. HivePointLookupOptimizerRule.RexTransformIntoInClause
      Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), OR(=($0, 6:INTEGER), =($0, 2:INTEGER)), IS NOT TRUE(=($0, 1:INTEGER)))
      Output: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), IN($0, 6:INTEGER, 2:INTEGER), IS NOT TRUE(=($0, 1:INTEGER)))
      3. HivePointLookupOptimizerRule.RexMergeInClause
      Input: AND(IN($0, 1:SMALLINT, 2:SMALLINT, 6:SMALLINT), IN($0, 6:INTEGER, 2:INTEGER), IS NOT TRUE(=($0, 1:INTEGER)))
      Output: false

      In the first part, we can see that the constants are correctly typed as "SMALLINT" in the first part of the "AND" operand, while they are typed as "INTEGER" for the "CASE" expression, despite the input reference "$0" being available for inferring a more precise type.

      This type difference makes "HivePointLookupOptimizerRule.RexMergeInClause" missing the commonality between the two "IN" expressions, whose intersection is considered empty, hence the empty result.

      Providing a more refined type inference for "case" expressions should fix the issue.

      Attachments

        Issue Links

          Activity

            People

              asolimando Alessandro Solimando
              asolimando Alessandro Solimando
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 1h
                  1h