Hive / HIVE-26638

Replace in-house CBO reduce expressions rules with Calcite's built-in classes


Details

    Description

      The goal of this ticket is to remove the Hive-specific code in HiveReduceExpressionsRule and rely exclusively on the corresponding Calcite classes (i.e., ReduceExpressionsRule), reducing maintenance overhead and making the code easier to evolve.

      Currently the only difference between in-house (HiveReduceExpressionsRule) and built-in (ReduceExpressionsRule) reduce expressions rules lies in the way we treat the Filter operator (i.e., FilterReduceExpressionsRule).

      Comparing the in-house code with the corresponding part of Calcite 1.25.0 reveals four Hive-specific differences.

      Match nullability when reducing expressions
      When we reduce filters we always set matchNullability (last parameter) to false.

      if (reduceExpressions(filter, expList, predicates, true, false)) {
      

      This means that the original and the reduced expression can differ slightly in type with respect to nullability: the original is nullable and the reduced one is not. When the value is true, the type is preserved by adding a "nullability" CAST, that is, a cast to a type that differs from the operand's type only in whether it is nullable.

      Hardcoding matchNullability to false was done as part of the upgrade to Calcite 1.15.0 (HIVE-18068), where the behavior of the rule became configurable (CALCITE-2041).
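
      The effect of the flag can be illustrated with a toy model. The sketch below does not use Calcite; NullabilitySketch, Type, Expr, and reduce are hypothetical names introduced only for illustration, and the cast is represented as plain text. It assumes an original condition such as b OR TRUE (nullable BOOLEAN) that reduces to the literal TRUE (BOOLEAN NOT NULL).

```java
/** Toy model (not Calcite): a type is a name plus a nullability flag. */
final class NullabilitySketch {

    static final class Type {
        final String name;
        final boolean nullable;

        Type(String name, boolean nullable) {
            this.name = name;
            this.nullable = nullable;
        }

        @Override
        public String toString() {
            return nullable ? name : name + " NOT NULL";
        }
    }

    static final class Expr {
        final String text;
        final Type type;

        Expr(String text, Type type) {
            this.text = text;
            this.type = type;
        }
    }

    /**
     * Returns the expression that replaces {@code original}.
     * With matchNullability = true, a cast restores the original type when
     * the two types differ only in nullability; with false, the reduced
     * expression is returned unchanged and its type may be stricter.
     */
    static Expr reduce(Expr original, Expr reduced, boolean matchNullability) {
        boolean onlyNullabilityDiffers =
            original.type.name.equals(reduced.type.name)
                && original.type.nullable != reduced.type.nullable;
        if (matchNullability && onlyNullabilityDiffers) {
            String cast = "CAST(" + reduced.text + " AS " + original.type + ")";
            return new Expr(cast, original.type);
        }
        return reduced;
    }
}
```

      With matchNullability set to false, as Hive does for filters, reduce hands back the reduced expression untouched, so no cast appears in the result.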

      Remove nullability cast explicitly
      When the expression is reduced, we try to remove the nullability cast, if there is one.

      if (RexUtil.isNullabilityCast(filter.getCluster().getTypeFactory(), newConditionExp)) {
      	newConditionExp = ((RexCall) newConditionExp).getOperands().get(0);
      }
      

      The code was added as part of the upgrade to Calcite 1.10.0 (HIVE-13316). However, the code is redundant as of HIVE-18068: with matchNullability set to false, the reduction no longer generates nullability casts.
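
      Why the stripping step becomes a no-op can be seen in a small stand-alone sketch. It is a simplification and not the Calcite implementation: CastStripSketch, Node, and both helper methods are hypothetical, and a "nullability cast" is assumed here to be a CAST whose result type has the same name as its operand's type.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy expression tree (not Calcite) used to show nullability-cast stripping. */
final class CastStripSketch {

    static final class Node {
        final String op;
        final String typeName;
        final boolean nullable;
        final List<Node> operands;

        Node(String op, String typeName, boolean nullable, Node... operands) {
            this.op = op;
            this.typeName = typeName;
            this.nullable = nullable;
            this.operands = Collections.unmodifiableList(Arrays.asList(operands));
        }
    }

    /** True when the node is a CAST to the operand's own type, ignoring nullability. */
    static boolean isNullabilityCast(Node node) {
        if (!node.op.equals("CAST") || node.operands.size() != 1) {
            return false;
        }
        return node.operands.get(0).typeName.equals(node.typeName);
    }

    /** Unwraps a nullability cast; any other node is returned unchanged. */
    static Node stripNullabilityCast(Node node) {
        return isNullabilityCast(node) ? node.operands.get(0) : node;
    }
}
```

      If the reduction never produces such a CAST in the first place, stripNullabilityCast always takes the second branch and returns its argument as is.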

      Avoid creating filters with condition of type NULL

      if(newConditionExp.getType().getSqlTypeName() == SqlTypeName.NULL) {
      	newConditionExp = call.builder().cast(newConditionExp, SqlTypeName.BOOLEAN);
      }
      

      Hive casts such expressions to BOOLEAN to avoid the odd (and possibly problematic) situation of a filter condition with NULL type.

      In Calcite, there is specific code that detects whether the new condition is the NULL literal (with NULL type) and, if so, turns the relation into an empty one.

      } else if (newConditionExp instanceof RexLiteral
          || RexUtil.isNullLiteral(newConditionExp, true)) {
        call.transformTo(createEmptyRelOrEquivalent(call, filter));
      

      Because of this, the Hive-specific code is redundant when the Calcite rule is used.
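
      Replacing a filter whose condition is the NULL literal with an empty relation is consistent with SQL semantics, where a filter keeps a row only if the condition evaluates to TRUE. The following sketch models this with plain Java collections; NullConditionSketch and filter are hypothetical names, and a null Boolean stands in for SQL's NULL (unknown).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Toy filter (not Calcite) with three-valued conditions: TRUE, FALSE, or null for unknown. */
final class NullConditionSketch {

    /** Keeps a row only when the condition is TRUE; FALSE and null both drop it. */
    static <T> List<T> filter(List<T> rows, Function<T, Boolean> condition) {
        List<T> kept = new ArrayList<>();
        for (T row : rows) {
            if (Boolean.TRUE.equals(condition.apply(row))) {
                kept.add(row);
            }
        }
        return kept;
    }
}
```

      A condition that is null for every row keeps nothing, so the filter's output is the same as that of an empty relation.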

      Bail out when input to reduceNotNullableFilter is not a RexCall

      if (!(rexCall.getOperands().get(0) instanceof RexCall)) {
            // If child is not a RexCall instance, we can bail out
            return;
      }
      

      The code was added as part of the upgrade to Calcite 1.10.0 (HIVE-13316), but it does not add any functional value.
      The instanceof check is redundant because the code in reduceNotNullableFilter is a no-op unless the expression is one of IS_NULL, IS_UNKNOWN, or IS_NOT_NULL, all of which are RexCall instances.
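
      The redundancy argument can be checked mechanically on a toy model. In the sketch below, BailOutSketch, Kind, and both methods are hypothetical and do not mirror the actual method bodies; the isCall flag marks which kinds would be calls, and the returned value says whether an IS [NOT] NULL style predicate on a non-nullable operand is always true.

```java
import java.util.Optional;

/** Toy dispatch (not Calcite) showing that an extra "is it a call?" guard changes nothing. */
final class BailOutSketch {

    enum Kind {
        IS_NULL(true), IS_UNKNOWN(true), IS_NOT_NULL(true),
        EQUALS(true), INPUT_REF(false), LITERAL(false);

        final boolean isCall;

        Kind(boolean isCall) {
            this.isCall = isCall;
        }
    }

    /** Handles only the three null-test kinds; every other kind is a no-op (empty result). */
    static Optional<Boolean> alwaysTrue(Kind kind) {
        switch (kind) {
            case IS_NULL:
            case IS_UNKNOWN:
                return Optional.of(false);
            case IS_NOT_NULL:
                return Optional.of(true);
            default:
                return Optional.empty();
        }
    }

    /** Same logic, preceded by the bail-out for kinds that are not calls. */
    static Optional<Boolean> alwaysTrueWithGuard(Kind kind) {
        if (!kind.isCall) {
            return Optional.empty();
        }
        return alwaysTrue(kind);
    }
}
```

      Because every kind handled by the switch is a call, the guarded and unguarded variants return the same result for every kind.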

      Summary

      All of the Hive-specific changes described above can be safely replaced by appropriate uses of the Calcite APIs without affecting the behavior of CBO.

            People

              zabetak Stamatis Zampetakis

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 20m