Hive / HIVE-27291

Constant reduction in CBO does not work for UNIX_TIMESTAMP

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 4.0.0-alpha-2
    • Fix Version/s: None
    • Component/s: CBO
    • Labels: None

    Description

      The UNIX_TIMESTAMP function always returns the same output for the same input for the duration of a query. In Hive terminology, such a function is a runtimeConstant.

      Such functions can be computed statically (reduced) at compile time. This already happens for the vast majority of them, the most relevant example being CURRENT_TIMESTAMP().

      However, constant reduction does not work for UNIX_TIMESTAMP in CBO:

      EXPLAIN CBO SELECT unix_timestamp();
      
      HiveProject(_o__c0=[UNIX_TIMESTAMP()])
        HiveTableScan(table=[[_dummy_database, _dummy_table]], table:alias=[_dummy_table])
      
      EXPLAIN CBO SELECT unix_timestamp('2009-03-20', 'yyyy-MM-dd');
      
      CBO PLAN:
      HiveProject(_o__c0=[UNIX_TIMESTAMP(_UTF-16LE'2009-03-20':VARCHAR(2147483647) CHARACTER SET "UTF-16LE", _UTF-16LE'yyyy-MM-dd':VARCHAR(2147483647) CHARACTER SET "UTF-16LE")])
        HiveTableScan(table=[[_dummy_database, _dummy_table]], table:alias=[_dummy_table])
      

      Observe that constant reduction works fine in the physical plan.

      EXPLAIN SELECT unix_timestamp();
      
      STAGE DEPENDENCIES:
        Stage-0 is a root stage
      
      STAGE PLANS:
        Stage: Stage-0
          Fetch Operator
            limit: -1
            Processor Tree:
              TableScan
                alias: _dummy_table
                Row Limit Per Split: 1
                Select Operator
                  expressions: 1682411039L (type: bigint)
                  outputColumnNames: _col0
                  ListSink
      

      Generally, we want to perform as much constant reduction as possible at the CBO level because it can affect expression pushdown in various storage handlers (HIVE-21388) as well as predicate simplification/elimination.

      Currently we fail to reduce UNIX_TIMESTAMP at the CBO level because the respective operator is marked as a dynamicFunction, and the reduction rules in Calcite explicitly skip reduction in this case.

      As of Calcite 1.28.0 (CALCITE-2736), the reduction of dynamic functions is configurable, so we may be able to exploit this feature. Alternatively, we will have to treat UNIX_TIMESTAMP in a similar fashion to CURRENT_TIMESTAMP and possibly rely on HiveSqlFunction.
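
      To make the skipped-versus-configurable behaviour concrete, here is a minimal, self-contained sketch of a constant folder that leaves dynamic calls alone unless a flag allows folding them. It is a toy model, not Calcite or Hive code: the class ConstantReductionSketch, the Expr/Literal/Call types, the treatDynamicCallsAsConstant parameter and the queryStartSeconds snapshot are all hypothetical names introduced for illustration only.

      ```java
      import java.util.ArrayList;
      import java.util.Arrays;
      import java.util.List;

      public class ConstantReductionSketch {

          /** Toy expression tree: a node is either a literal or a function call. */
          interface Expr {}

          static final class Literal implements Expr {
              final long value;
              Literal(long value) { this.value = value; }
          }

          static final class Call implements Expr {
              final String name;
              final boolean dynamic; // stands in for the "dynamic function" marker (assumption)
              final List<Expr> operands;
              Call(String name, boolean dynamic, Expr... operands) {
                  this.name = name;
                  this.dynamic = dynamic;
                  this.operands = Arrays.asList(operands);
              }
          }

          /**
           * Folds calls whose operands are all literals. Dynamic calls are folded only
           * when treatDynamicCallsAsConstant is true; queryStartSeconds is the single
           * clock snapshot that would be shared by the whole query.
           */
          static Expr reduce(Expr e, boolean treatDynamicCallsAsConstant, long queryStartSeconds) {
              if (!(e instanceof Call)) {
                  return e; // literals are already reduced
              }
              Call call = (Call) e;
              List<Expr> reduced = new ArrayList<>();
              boolean allLiterals = true;
              for (Expr operand : call.operands) {
                  Expr r = reduce(operand, treatDynamicCallsAsConstant, queryStartSeconds);
                  reduced.add(r);
                  allLiterals &= r instanceof Literal;
              }
              Call rebuilt = new Call(call.name, call.dynamic, reduced.toArray(new Expr[0]));
              // Skip folding if an operand is not constant, or if the call is dynamic
              // and folding of dynamic calls has not been enabled.
              if (!allLiterals || (call.dynamic && !treatDynamicCallsAsConstant)) {
                  return rebuilt;
              }
              if (call.name.equals("UNIX_TIMESTAMP") && reduced.isEmpty()) {
                  return new Literal(queryStartSeconds);
              }
              if (call.name.equals("PLUS") && reduced.size() == 2) {
                  long sum = ((Literal) reduced.get(0)).value + ((Literal) reduced.get(1)).value;
                  return new Literal(sum);
              }
              return rebuilt; // unknown function: leave as is
          }

          public static void main(String[] args) {
              // unix_timestamp() + 60, with UNIX_TIMESTAMP marked as dynamic
              Expr expr = new Call("PLUS", false, new Call("UNIX_TIMESTAMP", true), new Literal(60));
              Expr skipped = reduce(expr, false, 1682411039L);
              Expr folded = reduce(expr, true, 1682411039L);
              System.out.println(skipped instanceof Call);      // true: the call survives, as in the CBO plan
              System.out.println(((Literal) folded).value);     // 1682411099
          }
      }
      ```

      With the flag off, the dynamic call blocks folding of the enclosing expression as well, which is why such expressions cannot take part in predicate simplification at the CBO level. With the flag on, the whole expression collapses to one literal, matching what the physical plan already does.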

            People

              Assignee: Stamatis Zampetakis (zabetak)
              Reporter: Stamatis Zampetakis (zabetak)