IMPALA-7785

GROUP BY clause not analyzed prior to rewrite step


Details

    • Type: Bug
    • Status: Reopened
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: Impala 3.0
    • Fix Version/s: None
    • Component/s: Frontend
    • Labels: None

    Description

      The FE fails to analyze the GROUP BY clause before invoking the rewrite rules, so the rules perform no rewrites on that clause.

      For the SELECT list, the analyzer processes each expression and marks it as analyzed.

      The rewrite rules, however, tend to skip unanalyzed nodes. (And, according to IMPALA-7754, nodes often are not re-analyzed after a rewrite.)

      Consider this simple query:

      SELECT case when string_col is not null then string_col else 'foo' end                                    
      FROM functional.alltypestiny                         
      GROUP BY case when string_col is not null then string_col else 'foo' end                                     
      

      This query works. Now use the new feature in IMPALA-7655 with a query that will be rewritten to the form above:

      SELECT coalesce(string_col, 'foo')                                    
      FROM functional.alltypes                                                  
      GROUP BY coalesce(string_col, 'foo')                                         
      

      The above is rewritten using the new conditional function rewrite rules. Result:

      org.apache.impala.common.AnalysisException:
        select list expression not produced by aggregation output
        (missing from GROUP BY clause?):
        CASE WHEN string_col IS NOT NULL THEN string_col ELSE 'foo' END
      

      The reason is the check used in multiple rewrite rules:

        public Expr apply(Expr expr, Analyzer analyzer) throws AnalysisException {              
          if (!expr.isAnalyzed()) return expr;                                                  
      

      Step through the code. The coalesce() expression in the SELECT clause is analyzed; the one in the GROUP BY clause is not. This creates a problem because SQL semantics require the identical expression in both clauses for them to match. (It also means no other rewrite rules, at least not those with this check, are invoked on the GROUP BY expression, leading to an unintended code path.)
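
      The mismatch can be reproduced in isolation with a toy model. The sketch below is not Impala code: ToyExpr, apply(), and matches() are hypothetical stand-ins, and the string comparison only approximates the real SELECT/GROUP BY matching. It assumes nothing beyond the isAnalyzed() guard shown above.

```java
// Hypothetical toy model; none of these classes are Impala's real Expr or rule classes.
class AnalyzedGuardDemo {
  // Minimal stand-in for an expression: its SQL text plus an "analyzed" flag.
  static final class ToyExpr {
    final String sql;
    final boolean analyzed;

    ToyExpr(String sql, boolean analyzed) {
      this.sql = sql;
      this.analyzed = analyzed;
    }

    boolean isAnalyzed() { return analyzed; }
  }

  // Toy rewrite rule using the same guard as the snippet above:
  // an unanalyzed expression is returned unchanged.
  static ToyExpr apply(ToyExpr expr) {
    if (!expr.isAnalyzed()) return expr;
    if (expr.sql.equals("coalesce(string_col, 'foo')")) {
      return new ToyExpr(
          "CASE WHEN string_col IS NOT NULL THEN string_col ELSE 'foo' END", true);
    }
    return expr;
  }

  // Stand-in for the check that a SELECT expression appears in the GROUP BY list.
  static boolean matches(ToyExpr selectExpr, ToyExpr groupByExpr) {
    return selectExpr.sql.equals(groupByExpr.sql);
  }

  public static void main(String[] args) {
    // The SELECT expression was analyzed; the GROUP BY expression was not.
    ToyExpr select = apply(new ToyExpr("coalesce(string_col, 'foo')", true));
    ToyExpr groupBy = apply(new ToyExpr("coalesce(string_col, 'foo')", false));

    System.out.println("SELECT:   " + select.sql);
    System.out.println("GROUP BY: " + groupBy.sql);
    System.out.println("match:    " + matches(select, groupBy));
  }
}
```

      In this model the SELECT side becomes the CASE expression while the GROUP BY side stays as coalesce(), so the two no longer match, which mirrors the AnalysisException reported above.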

      This query makes it a bit clearer:

      SELECT 1 + 2
      FROM functional.alltypestiny
      GROUP BY 1 + 2
      

      This works. But, if we use test code to inspect the "rewritten" GROUP BY, we find that it is still "1 + 2" while the SELECT expression has been rewritten to "3".
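
      The same inspection can be mimicked with a toy constant folder. This is only a sketch under stated assumptions: rewrite() is a hypothetical helper that folds "<int> + <int>" and honors an "analyzed" flag the way the guarded rules do. It is not the actual constant-folding rule or the test code referred to above.

```java
// Hypothetical toy constant folder; not Impala's actual constant-folding rule.
class ConstantFoldDemo {
  // Folds "<int> + <int>" into its sum, but only when the expression
  // is marked as analyzed, mirroring the guard used by the rewrite rules.
  static String rewrite(String sql, boolean analyzed) {
    if (!analyzed) return sql;
    String[] parts = sql.split("\\+");
    if (parts.length != 2) return sql;
    try {
      int left = Integer.parseInt(parts[0].trim());
      int right = Integer.parseInt(parts[1].trim());
      return Integer.toString(left + right);
    } catch (NumberFormatException e) {
      return sql; // not two integer literals, leave unchanged
    }
  }

  public static void main(String[] args) {
    // SELECT expression: analyzed, so it is folded.
    System.out.println("SELECT:   " + rewrite("1 + 2", true));
    // GROUP BY expression: not analyzed, so it is left alone.
    System.out.println("GROUP BY: " + rewrite("1 + 2", false));
  }
}
```

      Here the SELECT side prints 3 and the GROUP BY side prints 1 + 2, the same divergence seen when inspecting the real query.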

      Seems that, when working with rewrites, we must be very careful because, as the code currently is written, we rewrite some clauses but not others. Then, we have to know when it is safe to have the SELECT clause differ from the GROUP BY clause. (Looks like it is OK for constants to differ, but not for functions...)

      VERY confusing, would be better to just fix the darn thing.

            People

              Assignee: Unassigned
              Reporter: Paul Rogers