SPARK-47381

Spark SQL: select * from t where (false) parsed as subquery/column alias

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 3.5.0
    • Fix Version/s: None
    • Component/s: Spark Core
    • Labels: None
    • Environment:
      macOS Sonoma
      Spark 3.5.0
      Java 11

      (but this also happens in GitHub Actions, which runs on Ubuntu)

      The Spark SQL queries are automatically and dynamically generated/compiled from another language by RumbleDB.

    Description

      Given the view (input4d47e3f1d26b42eabea312bd9d99ab43):

      +----------------+
      |               o|
      +----------------+
      |[2B 01 0F 00 00]|
      +----------------+

      the following Spark SQL query

      select * from input4d47e3f1d26b42eabea312bd9d99ab43 where (FALSE)

       

      outputs:

      +----------------+
      |           FALSE|
      +----------------+
      |[2B 01 0F 00 00]|
      +----------------+

      instead of an empty DataFrame.

       

      A workaround is this query:

      select o from input4d47e3f1d26b42eabea312bd9d99ab43 where true and (FALSE)

       

      which correctly outputs:

      +---+
      |  o|
      +---+
      +---+
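
For a query generator, the workaround above could be applied mechanically whenever a where clause is emitted. The following is a minimal sketch, not code from Spark or RumbleDB: the helper name `guarded_select` and its parameters are hypothetical, and it only builds the query string. The report confirms the `true and (...)` form only for the projection to column o, so its behavior with other projections is an assumption.

```python
def guarded_select(table: str, predicate: str, columns: str = "*") -> str:
    """Build a SELECT whose predicate is prefixed with `true and`.

    Hypothetical helper (not part of Spark or RumbleDB). It applies the
    workaround from this report, so that the text after the table name
    cannot be read as a table alias followed by a column alias list.
    """
    return f"select {columns} from {table} where true and ({predicate})"


# Reproduces the workaround query quoted in this report.
query = guarded_select("input4d47e3f1d26b42eabea312bd9d99ab43", "FALSE", columns="o")
print(query)
```

The prefix is added unconditionally here, which keeps the generator simple at the cost of a redundant `true and` in queries that would have parsed correctly anyway.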

      It seems that this comes from an ambiguity in the parser: "where (FALSE)" is parsed as a subquery alias named "where" followed by a column alias list "(FALSE)", rather than as a where clause. This can be seen with the following query, which projects to column o:

      select o from input4d47e3f1d26b42eabea312bd9d99ab43 where (FALSE)

       

      which fails with this error, including the query plan:

       

      [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `o` cannot be resolved. Did you mean one of the following? [`FALSE`].; line 1 pos 7;
      'Project ['o]
      +- SubqueryAlias where
         +- Project o#1 AS FALSE#21
            +- SubqueryAlias input4d47e3f1d26b42eabea312bd9d99ab43
               +- View (`input4d47e3f1d26b42eabea312bd9d99ab43`, o#1)
                  +- LogicalRDD o#1, false
      

       

       

      From the Spark SQL grammar perspective, both interpretations are valid, because the grammar is ambiguous here. However, the precedence rule currently implemented (parsing as a subquery alias rather than as a where clause) can break the execution of automatically generated Spark SQL queries whose where clause is a simple boolean in parentheses.
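
Until the grammar precedence changes, a generator could check whether a where clause has the shape that triggers the misparse. The sketch below is an assumption-laden illustration: the report only demonstrates the problem for "(FALSE)". Extending the check to any parenthesized, comma-separated list of bare identifiers is an inference from the query plan above, in which "(FALSE)" was treated as a column alias list. The name `looks_like_alias_list` is hypothetical.

```python
import re

# Assumed risky shape: "(" identifier ("," identifier)* ")" with nothing else.
# Only the single-identifier case "(FALSE)" is confirmed by this report.
_BARE_IDENT_LIST = re.compile(r"\(\s*[A-Za-z_]\w*(\s*,\s*[A-Za-z_]\w*)*\s*\)")


def looks_like_alias_list(predicate: str) -> bool:
    """Return True if the predicate text could be read as a column alias list."""
    return _BARE_IDENT_LIST.fullmatch(predicate.strip()) is not None


for candidate in ["(FALSE)", "(o = 1)", "true and (FALSE)"]:
    print(candidate, looks_like_alias_list(candidate))
```

A generator could rewrite only the predicates flagged by this check, for example by prepending `true and`, and leave all other where clauses untouched.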

          People

            Assignee: Unassigned
            Reporter: Ghislain Fourny