SPARK-43760

Incorrect attribute nullability after RewriteCorrelatedScalarSubquery leads to incorrect query results


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.4.0
    • Fix Version/s: 3.4.1, 3.5.0
    • Component/s: SQL

    Description

      The following query:

       

      select * from (
       select t1.id c1, (
        select t2.id c from range (1, 2) t2
        where t1.id = t2.id  ) c2
       from range (1, 3) t1 ) t
      where t.c2 is not null
      -- !query schema
      struct<c1:bigint,c2:bigint>
      -- !query output
      1	1
      2	NULL
       

       
      should return 1 row, because the IsNotNull predicate is supposed to remove the second row. However, nullability is propagated incorrectly after subquery decorrelation: the output of the subquery is wrongly declared non-nullable, so the predicate is constant-folded to true and never filters anything.
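
The failure mode can be sketched with a small toy model in plain Python. This is not Spark's optimizer code: `Attribute`, `left_outer_join`, `simplify_is_not_null`, and `run_query` are hypothetical names introduced only for illustration. The sketch assumes the correlated scalar subquery is decorrelated into a left outer join, and that an IsNotNull check on an attribute declared non-nullable is folded to a constant true.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Attribute:
    """Toy stand-in for a plan attribute: a name plus its declared nullability."""
    name: str
    nullable: bool


def left_outer_join(left_ids, right_ids) -> List[Tuple[int, Optional[int]]]:
    """Toy decorrelated subquery: t1 LEFT OUTER JOIN t2 ON t1.id = t2.id.

    Unmatched left rows get None on the right side, so the right-side
    column is nullable in reality, whatever the schema claims.
    """
    right = set(right_ids)
    return [(i, i if i in right else None) for i in left_ids]


def simplify_is_not_null(attr: Attribute) -> Callable[[Optional[int]], bool]:
    """Toy optimizer rule: fold IsNotNull(attr) to True when attr is declared non-nullable."""
    if not attr.nullable:
        return lambda value: True  # predicate folded away; no row is ever filtered
    return lambda value: value is not None


def run_query(c2: Attribute) -> List[Tuple[int, Optional[int]]]:
    # range(1, 3) yields ids 1 and 2; range(1, 2) yields id 1, as in the report.
    rows = left_outer_join(range(1, 3), range(1, 2))
    keep = simplify_is_not_null(c2)
    return [row for row in rows if keep(row[1])]


# Declared non-nullable (the bug): the NULL row survives the filter.
buggy = run_query(Attribute("c2", nullable=False))
# Declared nullable (expected): the NULL row is removed.
fixed = run_query(Attribute("c2", nullable=True))

print(buggy)  # [(1, 1), (2, None)]
print(fixed)  # [(1, 1)]
```

In this model the data are identical in both runs; only the declared nullability of `c2` differs, and that alone decides whether the filter does any work.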

          People

            Assignee: Andrey Gubichev (gubichev)
            Reporter: Andrey Gubichev (gubichev)