Details
Type: Bug
Status: Open
Priority: Minor
Resolution: Unresolved
Affects Version: 3.5.0
macOS Sonoma
Spark 3.5.0
Java 11 (but this also happens in GitHub Actions, which runs on Ubuntu)
The Spark SQL queries are automatically and dynamically generated/compiled from another language by RumbleDB.
Description
Given the view (input4d47e3f1d26b42eabea312bd9d99ab43):
+----------------+
|               o|
+----------------+
|[2B 01 0F 00 00]|
+----------------+
the following Spark SQL query
select * from input4d47e3f1d26b42eabea312bd9d99ab43 where (FALSE)
outputs:
+----------------+
|           FALSE|
+----------------+
|[2B 01 0F 00 00]|
+----------------+
instead of an empty DataFrame.
A workaround is this query:
select o from input4d47e3f1d26b42eabea312bd9d99ab43 where true and (FALSE)
which correctly outputs:
+---+
|  o|
+---+
+---+
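For a query generator, the workaround above could be applied systematically. The following is a minimal sketch, not RumbleDB's or Spark's actual code: `build_filtered_query` is a hypothetical helper name, and it assumes the generator already has the view name and the predicate as strings. It always prefixes the predicate with `true and`, so the where clause never consists solely of a parenthesized identifier.

```python
def build_filtered_query(view_name: str, predicate: str, columns: str = "*") -> str:
    """Build a SELECT whose where clause cannot be read as a table alias.

    Hypothetical helper for illustration only; it is not part of Spark
    or RumbleDB. The inputs are not validated or escaped here.
    """
    # "true and (...)" mirrors the workaround from this report: the token
    # after "where" is no longer an opening parenthesis, so "where" is not
    # taken as an alias followed by a column alias list.
    return f"select {columns} from {view_name} where true and ({predicate})"


query = build_filtered_query(
    "input4d47e3f1d26b42eabea312bd9d99ab43", "FALSE", columns="o"
)
print(query)
```

The resulting string can then be passed to `spark.sql(...)` as usual. The extra `true and` does not change the meaning of the predicate.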
It seems that this comes from an ambiguity in the parser: "where (false)" is parsed as a subquery alias named "where" with a column alias FALSE, rather than as a where clause. This can be seen with the following query, which projects to column o:
select o from input4d47e3f1d26b42eabea312bd9d99ab43 where (FALSE)
This outputs the following error, which includes the query plan:
[UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name `o` cannot be resolved. Did you mean one of the following? [`FALSE`].; line 1 pos 7;
'Project ['o]
+- SubqueryAlias where
   +- Project o#1 AS FALSE#21
      +- SubqueryAlias input4d47e3f1d26b42eabea312bd9d99ab43
         +- View (`input4d47e3f1d26b42eabea312bd9d99ab43`, o#1)
            +- LogicalRDD o#1, false
From the perspective of the Spark SQL grammar, the query can be interpreted both ways, so this is a genuine ambiguity in the grammar. However, the currently implemented precedence rule (parsing as a subquery alias rather than a where clause) can break the execution of automatically generated Spark SQL queries whenever the where clause is a simple boolean in parentheses.