  Hive / HIVE-15539

Optimize complex multi-insert queries in Calcite


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.2.0
    • Fix Version/s: 2.2.0
    • Component/s: Parser
    • Labels: None

    Description

      Currently, multi-insert queries are not optimized by Calcite. Proper integration with Calcite would require a spool operator whose output is reused by every INSERT statement; however, a spool operator has not yet been added to Calcite (CALCITE-481).
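      A spool evaluates a shared input once and lets several consumers read the buffered result. The Java sketch below models that idea for a multi-insert query with plain collections: the FROM source is evaluated a single time, and every INSERT branch applies its own filter and projection to the buffered rows. All names (`MultiInsertSpool`, `Branch`, `run`) are hypothetical and are not Hive or Calcite APIs; rows are modeled as `int[]` for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical illustration of spool semantics for a multi-insert query;
// not Hive's or Calcite's operator API. Rows are modeled as int[].
final class MultiInsertSpool {
    // One INSERT branch: its WHERE filter and a single-column SELECT projection.
    static final class Branch {
        final Predicate<int[]> where;
        final Function<int[], Integer> select;

        Branch(Predicate<int[]> where, Function<int[], Integer> select) {
            this.where = where;
            this.select = select;
        }
    }

    // Evaluates the shared FROM source once, buffers its rows (the "spool"),
    // and lets every branch read the buffered rows instead of recomputing the source.
    static List<List<Integer>> run(Supplier<List<int[]>> source, List<Branch> branches) {
        List<int[]> spooled = source.get(); // the only evaluation of the source
        List<List<Integer>> outputs = new ArrayList<>();
        for (Branch branch : branches) {
            List<Integer> out = new ArrayList<>();
            for (int[] row : spooled) {
                if (branch.where.test(row)) {
                    out.add(branch.select.apply(row));
                }
            }
            outputs.add(out);
        }
        return outputs;
    }
}
```

      For example, with source rows (1, 10), (2, 20), (3, 30), a branch "odd key, select value" yields [10, 30] and a branch "key >= 2, select key" yields [2, 3], while the source is evaluated only once.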

      In the meantime, since the complex logic of a multi-insert query resides in its FROM clause, we can optimize the FROM clause with Calcite and connect the optimized plan to the rest of the original query.

      Initially, we will distinguish three cases:

      • The FROM clause is trivial (e.g., a single table reference) or is not supported: it is not optimized with Calcite.
      • The FROM clause is a subquery: optimize the subquery with Calcite.
      • The FROM clause is a join: rewrite the join into a subquery, optimize that subquery with Calcite, and change the column references in the INSERT statements so that they refer to the subquery columns.
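      The three cases can be sketched as a small dispatch, plus the column-reference rewrite that the join case needs. This is a minimal Java sketch over a toy FROM-clause model; `FromClauseRewrite`, `Plan`, and the `a.key -> a_key` aliasing scheme are hypothetical illustrations, not Hive's actual planner code.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical, simplified model of a multi-insert FROM clause (not Hive's AST classes).
final class FromClauseRewrite {
    interface From {}

    static final class TableRef implements From {
        final String name;
        TableRef(String name) { this.name = name; }
    }

    static final class SubQuery implements From {
        final String alias;
        SubQuery(String alias) { this.alias = alias; }
    }

    static final class Join implements From {
        final From left, right;
        Join(From left, From right) { this.left = left; this.right = right; }
    }

    enum Plan { SKIP_CALCITE, OPTIMIZE_SUBQUERY, REWRITE_JOIN_THEN_OPTIMIZE }

    // Case 1: table reference or anything unsupported; case 2: subquery; case 3: join.
    static Plan classify(From from) {
        if (from instanceof SubQuery) return Plan.OPTIMIZE_SUBQUERY;
        if (from instanceof Join) return Plan.REWRITE_JOIN_THEN_OPTIMIZE;
        return Plan.SKIP_CALCITE;
    }

    // Join case: give each qualified column used by the INSERT branches a unique
    // subquery column name (naive scheme "a.key" -> "a_key"; name clashes are not handled).
    static Map<String, String> columnAliases(List<String> qualifiedRefs) {
        Map<String, String> aliases = new LinkedHashMap<>();
        for (String ref : qualifiedRefs) {
            aliases.putIfAbsent(ref, ref.replace('.', '_'));
        }
        return aliases;
    }

    // Wraps the join in a subquery that exposes exactly the aliased columns.
    static String subquerySql(String joinSql, Map<String, String> aliases, String sqAlias) {
        StringJoiner select = new StringJoiner(", ");
        aliases.forEach((ref, alias) -> select.add(ref + " AS " + alias));
        return "(SELECT " + select + " FROM " + joinSql + ") " + sqAlias;
    }

    // Rewrites a column reference in an INSERT branch to point at the subquery column.
    static String rewriteRef(String ref, Map<String, String> aliases, String sqAlias) {
        String alias = aliases.get(ref);
        if (alias == null) throw new IllegalArgumentException("unknown column: " + ref);
        return sqAlias + "." + alias;
    }
}
```

      For instance, the references `a.key` and `b.value` over `a JOIN b ON a.key = b.key` produce the subquery `(SELECT a.key AS a_key, b.value AS b_value FROM a JOIN b ON a.key = b.key) sq`, and an INSERT branch's `b.value` becomes `sq.b_value`. Keeping the original table alias in the new column name keeps `a.key` and `b.key` distinct after the join is wrapped.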

      This should also benefit the processing of MERGE statements, since Hive treats MERGE statements as multi-insert queries.

    Attachments

      1. HIVE-15539.05.patch (390 kB, jcamachorodriguez)
      2. HIVE-15539.04.patch (336 kB, jcamachorodriguez)
      3. HIVE-15539.03.patch (54 kB, jcamachorodriguez)


    People

      Assignee: Jesús Camacho Rodríguez (jcamacho)
      Reporter: Jesús Camacho Rodríguez (jcamacho)
      Votes: 0
      Watchers: 0

              Dates

                Created:
                Updated:
                Resolved: