[SPARK-47381] Spark SQL: select * from t where (false) parsed as subquery/column alias - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version/s: 3.5.0
Fix Version/s: None
Component/s: Spark Core
Labels: None
Environment:

macOS Sonoma
Spark 3.5.0
Java 11

(but this also happens in GitHub actions, which is Ubuntu)

The Spark SQL queries are automatically and dynamically generated/compiled from another language by RumbleDB.

Language:
- sql

Description

Given the view (input4d47e3f1d26b42eabea312bd9d99ab43):

+----------------+
|               o|
+----------------+
|[2B 01 0F 00 00]|
+----------------+

the following Spark SQL query

select * from input4d47e3f1d26b42eabea312bd9d99ab43 where (FALSE)

outputs:

+----------------+
|           FALSE|
+----------------+
|[2B 01 0F 00 00]|
+----------------+

instead of an empty DataFrame.

A workaround is this query:

select o from input4d47e3f1d26b42eabea312bd9d99ab43 where true and (FALSE)

which correctly outputs:

+---+
|  o|
+---+
+---+
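
For automatically generated queries, the workaround above can be applied mechanically. The following is a minimal Python sketch, not RumbleDB's actual code: the helper names guarded_where and build_select are hypothetical, and the code only builds a query string in the same shape as the workaround query.

```python
def guarded_where(predicate: str) -> str:
    """Build a WHERE clause that starts with `true and`, as in the workaround.

    Hypothetical helper: the leading `true and` is there so that the text
    following the view name is not just `where (<identifier>)`.
    """
    return f"where true and ({predicate})"


def build_select(view: str, predicate: str, columns: str = "*") -> str:
    """Assemble a SELECT statement with the guarded WHERE clause (hypothetical)."""
    return f"select {columns} from {view} {guarded_where(predicate)}"


if __name__ == "__main__":
    print(build_select("input4d47e3f1d26b42eabea312bd9d99ab43", "FALSE", "o"))
```

Called with the view name, the predicate FALSE and the column o, build_select returns exactly the workaround query shown above.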

It seems that this comes from an ambiguity in the parser: "where (FALSE)" is parsed as a subquery alias named "where" with a column alias "FALSE", rather than as a where clause. This can be seen with the following query (projecting to column o):

select o from input4d47e3f1d26b42eabea312bd9d99ab43 where (FALSE)

which outputs this error, including the query plan:

[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `o` cannot be resolved. Did you mean one of the following? [`FALSE`].; line 1 pos 7;
'Project ['o]
+- SubqueryAlias where
   +- Project o#1 AS FALSE#21
      +- SubqueryAlias input4d47e3f1d26b42eabea312bd9d99ab43
         +- View (`input4d47e3f1d26b42eabea312bd9d99ab43`, o#1)
            +- LogicalRDD o#1, false

From the Spark SQL grammar perspective, the query can be interpreted both ways, since the grammar is ambiguous. However, the currently implemented precedence rule (parsing as a subquery alias rather than as a where clause) can break the execution of automatically generated Spark SQL queries when the where clause is a simple boolean in parentheses.
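
A query generator could also flag the predicates that are at risk before emitting them. The sketch below is a heuristic under stated assumptions and is not derived from the Spark grammar itself: it assumes that the problematic shape is a where clause consisting solely of a parenthesized, comma-separated list of bare identifiers (such as (FALSE)), which is what a column alias list looks like. The function name is hypothetical.

```python
import re

# Assumption: a column alias list looks like "(name)" or "(name1, name2)".
# This pattern is a heuristic and not taken from the Spark SQL grammar.
_ALIAS_LIST = re.compile(
    r"^\(\s*[A-Za-z_][A-Za-z0-9_]*(\s*,\s*[A-Za-z_][A-Za-z0-9_]*)*\s*\)$"
)


def may_be_read_as_column_aliases(predicate: str) -> bool:
    """Return True if the predicate text has the shape of a column alias list."""
    return _ALIAS_LIST.match(predicate.strip()) is not None


if __name__ == "__main__":
    for text in ["(FALSE)", "(o = 1)", "true and (FALSE)"]:
        print(text, may_be_read_as_column_aliases(text))
```

Under this heuristic, (FALSE) is flagged, while (o = 1) and true and (FALSE) are not, which matches the behaviour reported above for the failing query and the workaround.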

People

Assignee: Unassigned

Reporter: Ghislain Fourny

Votes: 0

Watchers: 1

Dates

Created: 13/Mar/24 13:27

Updated: 13/Mar/24 13:40