Uploaded image for project: 'Hive'
  1. Hive
  2. HIVE-12391

SkewJoinOptimizer might not kick in if columns are renamed after TableScanOperator

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 1.3.0, 2.0.0
    • 1.3.0, 2.0.0
    • Logical Optimizer
    • None

    Description

      SkewJoinOptimizer will not kick in if the columns are just renamed after the TS e.g. by the creation of a derived table.

      To reproduce, consider the following example:

      set hive.optimize.skewjoin.compiletime = true;
      
      CREATE TABLE T1(key STRING, val STRING)
      SKEWED BY (key) ON ((2)) STORED AS TEXTFILE;
      
      CREATE TABLE T2(key STRING, val STRING)
      SKEWED BY (key) ON ((3)) STORED AS TEXTFILE;
      

      For this query, SkewJoinOptimizer kicks in:

      SELECT a.*, b.*
      FROM T1 a JOIN T2 b
      ON a.key = b.key
      

      For this one, it does not:

      SELECT a.*, b.*
      FROM 
        (SELECT key as k, val as v FROM T1) a
        JOIN
        (SELECT key as k, val as v FROM T2) b
      ON a.k = b.k;
      

      The reason is that SkewJoinOptimizer does not backtrack the origin of the column. Instead it just uses its name to know if it is produced by a certain TS.

      Attachments

        1. HIVE-12391.patch
          16 kB
          jcamachorodriguez

        Issue Links

          Activity

            People

              jcamacho Jesús Camacho Rodríguez
              jcamacho Jesús Camacho Rodríguez
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: