  Calcite / CALCITE-2166

Cumulative cost of RelSubset.best RelNode is increased after calling RelSubset.propagateCostImprovements() for input RelNodes


Details

    • Type: Bug
    • Status: Reopened
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 1.15.0
    • Fix Version/s: None
    • Component/s: core

    Description

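      After RelSubset.propagateCostImprovements() is called, the cumulative cost of the RelSubset.best RelNode may increase, because the non-cumulative cost of that RelNode grows when the best RelNode of one of its inputs changes. The cost stored in RelSubset.bestCost then no longer matches the actual cost of RelSubset.best.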
      To observe this issue, add this code:

                // Recompute the cost of the current best RelNode and compare it
                // with the cost cached in the subset; the two should be equal.
                if (subset.best != null) {
                  RelOptCost bestCost = getCost(subset.best, RelMetadataQuery.instance());
                  if (!subset.bestCost.equals(bestCost)) {
                    throw new AssertionError(
                      "relSubset [" + subset.getDescription()
                        + "] has wrong best cost "
                        + subset.bestCost + ". Correct cost is " + bestCost);
                  }
                }
      

      into the VolcanoPlanner.validate() method (line 907).
      The following unit tests fail with this check:

      Failed tests: 
        MaterializationTest.testJoinMaterializationUKFK9:1823->checkMaterialize:198->checkMaterialize:205->checkThatMaterialize:233 relSubset [rel#226287:Subset#8.ENUMERABLE.[]] has wrong best cost {221.5 rows, 128.25 cpu, 0.0 io}. Correct cost is {233.0 rows, 178.0 cpu, 0.0 io}
        ScannableTableTest.testPFPushDownProjectFilterAggregateNested:279 relSubset [rel#12950:Subset#5.ENUMERABLE.[]] has wrong best cost {63.8 rows, 62.308 cpu, 0.0 io}. Correct cost is {70.4 rows, 60.404 cpu, 0.0 io}
        ScannableTableTest.testPFTableRefusesFilterCooperative:221 relSubset [rel#13382:Subset#2.ENUMERABLE.[]] has wrong best cost {81.0 rows, 181.01 cpu, 0.0 io}. Correct cost is {150.5 rows, 250.505 cpu, 0.0 io}
        ScannableTableTest.testProjectableFilterableCooperative:148 relSubset [rel#13611:Subset#2.ENUMERABLE.[]] has wrong best cost {81.0 rows, 181.01 cpu, 0.0 io}. Correct cost is {150.5 rows, 250.505 cpu, 0.0 io}
        ScannableTableTest.testProjectableFilterableNonCooperative:165 relSubset [rel#13754:Subset#2.ENUMERABLE.[]] has wrong best cost {81.0 rows, 181.01 cpu, 0.0 io}. Correct cost is {150.5 rows, 250.505 cpu, 0.0 io}
        FrameworksTest.testUpdate:336->executeQuery:367 relSubset [rel#22533:Subset#2.ENUMERABLE.any] has wrong best cost {19.5 rows, 37.75 cpu, 0.0 io}. Correct cost is {22.575 rows, 52.58 cpu, 0.0 io}
      

      For the test MaterializationTest.testJoinMaterializationUKFK9, the initial best plan was:

      EnumerableProject(empid0=[$5], empid00=[$5], deptno0=[$7]): rowcount = 15.0, cumulative cost = {15.0 rows, 45.0 cpu, 0.0 io}, id = 3989
        EnumerableJoin(subset=[rel#3988:Subset#34.ENUMERABLE.[]], condition=[=($1, $7)], joinType=[inner]): rowcount = 15.0, cumulative cost = {116.0 rows, 0.0 cpu, 0.0 io}, id = 4797
          EnumerableFilter(subset=[rel#4274:Subset#47.ENUMERABLE.[0]], condition=[=(CAST($2):VARCHAR CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary", 'Bill')]): rowcount = 1.0, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io}, id = 16522
            EnumerableTableScan(subset=[rel#158:Subset#11.ENUMERABLE.[0]], table=[[hr, m0]]): rowcount = 1.0, cumulative cost = {0.0 rows, 1.0 cpu, 0.0 io}, id = 79
          EnumerableTableScan(subset=[rel#115:Subset#5.ENUMERABLE.[]], table=[[hr, depts]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 62
      

      Its cumulative cost is {221.5 rows, 123.75 cpu, 0.0 io}

      After some rules were applied, the best plan became:

      EnumerableProject(empid0=[$3], empid00=[$3], deptno0=[$0]): rowcount = 2.25, cumulative cost = {2.25 rows, 6.75 cpu, 0.0 io}, id = 4012
        EnumerableFilter(subset=[rel#4007:Subset#41.ENUMERABLE.[]], condition=[=(CAST($2):VARCHAR CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary", 'Bill')]): rowcount = 2.25, cumulative cost = {2.25 rows, 15.0 cpu, 0.0 io}, id = 4811
          EnumerableProject(subset=[rel#4203:Subset#61.ENUMERABLE.[]], deptno=[$7], deptno0=[$1], name0=[$2], empid0=[$5]): rowcount = 15.0, cumulative cost = {15.0 rows, 60.0 cpu, 0.0 io}, id = 4206
            EnumerableJoin(subset=[rel#4204:Subset#52.ENUMERABLE.[]], condition=[=($1, $7)], joinType=[inner]): rowcount = 15.0, cumulative cost = {116.0 rows, 0.0 cpu, 0.0 io}, id = 4795
              EnumerableTableScan(subset=[rel#158:Subset#11.ENUMERABLE.[0]], table=[[hr, m0]]): rowcount = 1.0, cumulative cost = {0.0 rows, 1.0 cpu, 0.0 io}, id = 79
              EnumerableTableScan(subset=[rel#115:Subset#5.ENUMERABLE.[]], table=[[hr, depts]]): rowcount = 100.0, cumulative cost = {100.0 rows, 101.0 cpu, 0.0 io}, id = 62
      

      Its cumulative cost is {233.0 rows, 148.0 cpu, 0.0 io}.
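
      The effect described above can be reproduced in a toy model that does not use any Calcite classes. The sketch below is only an illustration under stated assumptions: the Subset and Node classes, the offer() method, and all cost numbers are hypothetical, the self cost of a node is assumed to depend on the row count of its input's current best alternative, and a cached best cost is assumed to be overwritten only when the recomputed cost is lower. Whether the planner behaves exactly this way is not confirmed by this report.

```java
/** Toy model (not Calcite code) of a cached best cost that goes stale. */
public class StaleBestCostDemo {

  /** Hypothetical stand-in for a subset: caches its cheapest alternative. */
  static class Subset {
    Node best;
    double bestCost = Double.POSITIVE_INFINITY;

    /** Recomputes the candidate's cost; the cache changes only on improvement. */
    void offer(Node candidate) {
      double cost = candidate.cumulativeCost();
      if (cost < bestCost) {
        best = candidate;
        bestCost = cost;
      }
    }
  }

  /** Hypothetical stand-in for a relational node with at most one input. */
  static class Node {
    final double rowCount;
    final double fixedCost;
    final Subset input; // null for a leaf

    Node(double rowCount, double fixedCost, Subset input) {
      this.rowCount = rowCount;
      this.fixedCost = fixedCost;
      this.input = input;
    }

    /** Assumption: self cost grows with the rows produced by the input's best. */
    double selfCost() {
      return input == null ? fixedCost : fixedCost + input.best.rowCount;
    }

    /** Self cost plus the cached best cost of the input subset. */
    double cumulativeCost() {
      return input == null ? selfCost() : selfCost() + input.bestCost;
    }
  }

  /** Returns {cached best cost of the parent subset, recomputed cost of its best}. */
  static double[] run() {
    Subset inputSubset = new Subset();
    Subset parentSubset = new Subset();

    // First input alternative: few rows, but expensive to produce.
    inputSubset.offer(new Node(1.0, 10.0, null));
    Node parent = new Node(1.0, 5.0, inputSubset);
    parentSubset.offer(parent);

    // A cheaper input alternative appears, but it produces more rows.
    inputSubset.offer(new Node(20.0, 2.0, null));
    // "Propagate" the improvement: the parent's recomputed cost is now higher,
    // so the improvement-only check leaves the old cached value in place.
    parentSubset.offer(parent);

    return new double[] {parentSubset.bestCost, parent.cumulativeCost()};
  }

  public static void main(String[] args) {
    double[] r = run();
    System.out.println("cached best cost = " + r[0] + ", recomputed cost = " + r[1]);
  }
}
```

      In this model the cached value for the parent subset ends up lower than the cost obtained by recomputing it, which is the same kind of mismatch that the check added to VolcanoPlanner.validate() reports.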

          People

            Assignee: Unassigned
            Reporter: Vova Vysotskyi (volodymyr)


              Time Tracking

              Original Estimate: Not Specified
              Remaining Estimate: 0h
              Time Spent: 2.5h
