Spark / SPARK-46743

COUNT bug in scalar subquery when using a TEMPORARY VIEW instead of a table

Details

    Description

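      With temporary views, the correlated scalar subquery below hits the COUNT bug: it returns null instead of 0 for every outer row. The same query against tables returns 0 as expected.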

      With a table:

      scala> spark.sql("""CREATE TABLE outer_table USING parquet AS SELECT * FROM VALUES
           |     (1, 1),
           |     (2, 1),
           |     (3, 3),
           |     (6, 6),
           |     (7, 7),
           |     (9, 9) AS inner_table(a, b)""")
      
      val res6: org.apache.spark.sql.DataFrame = []
      
      scala> spark.sql("CREATE TABLE null_table USING parquet AS SELECT CAST(null AS int) AS a, CAST(null as int) AS b ;")
      
      val res7: org.apache.spark.sql.DataFrame = []
      
      scala> spark.sql("""SELECT ( SELECT COUNT(null_table.a) AS aggAlias FROM null_table WHERE null_table.a = outer_table.a) FROM outer_table""").collect()
      
      val res8: Array[org.apache.spark.sql.Row] = Array([0], [0], [0], [0], [0], [0]) 
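
      The result of 0 follows from standard SQL semantics: an equality comparison with NULL is never true, so no inner row matches, and COUNT over zero rows is 0. A minimal plain-Scala sketch of that reasoning (no Spark involved; the object, the Rec case class, and the helper names are hypothetical, and None stands in for SQL NULL):

```scala
// Hypothetical in-memory model of the two tables in this report.
// None stands for SQL NULL. This is an illustration, not Spark code.
object ExpectedCountSemantics {
  final case class Rec(a: Option[Int], b: Option[Int])

  // Same six rows as outer_table.
  val outerRows: Seq[Rec] =
    Seq((1, 1), (2, 1), (3, 3), (6, 6), (7, 7), (9, 9))
      .map { case (a, b) => Rec(Some(a), Some(b)) }

  // One row whose columns are both NULL, like null_table.
  val nullRows: Seq[Rec] = Seq(Rec(None, None))

  // In a WHERE clause, a comparison involving NULL is not true.
  def sqlEq(x: Option[Int], y: Option[Int]): Boolean =
    (for (l <- x; r <- y) yield l == r).getOrElse(false)

  // COUNT(inner.a) for one outer row: count the non-NULL values of `a`
  // among the inner rows that satisfy the correlation predicate.
  def scalarCount(outerRow: Rec): Long =
    nullRows.filter(r => sqlEq(r.a, outerRow.a)).count(_.a.isDefined).toLong

  // One count per outer row; every count is 0 because nothing matches.
  val result: Seq[Long] = outerRows.map(scalarCount)

  def main(args: Array[String]): Unit = println(result)
}
```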

      With a view:

       

      spark.sql("CREATE TEMPORARY VIEW outer_view(a, b) AS VALUES (1, 1), (2, 1),(3, 3), (6, 6), (7, 7), (9, 9);")
      
      spark.sql("CREATE TEMPORARY VIEW null_view(a, b) AS SELECT CAST(null AS int), CAST(null as int);")
      
      spark.sql("""SELECT ( SELECT COUNT(null_view.a) AS aggAlias FROM null_view WHERE null_view.a = outer_view.a) FROM outer_view""").collect()
      
      val res2: Array[org.apache.spark.sql.Row] = Array([null], [null], [null], [null], [null], [null])
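
      For context, the name "COUNT bug" usually refers to a decorrelation pitfall: rewriting a correlated COUNT subquery as "aggregate the inner side, then left outer join" yields NULL for outer rows with no match, where the subquery itself would yield 0. The plain-Scala sketch below models that general pattern only. It is an assumption that the view case fails this way, since the report does not identify the cause, and all names in the sketch are hypothetical.

```scala
// Illustration of the classic COUNT bug shape, not Spark's optimizer code.
// None stands for SQL NULL.
object CountBugSketch {
  val outerKeys: Seq[Int] = Seq(1, 2, 3, 6, 7, 9) // outer_view.a
  val innerKeys: Seq[Option[Int]] = Seq(None)     // null_view.a

  // Step 1 of the naive rewrite: pre-aggregate the inner side per join key.
  // NULL keys can never satisfy the equality, so they are dropped here.
  val countsByKey: Map[Int, Long] =
    innerKeys.flatten.groupBy(identity).map { case (k, vs) => k -> vs.size.toLong }

  // Step 2: left outer join from the outer rows to the aggregated counts.
  // An outer key with no match gets None, which is the buggy NULL result.
  val naive: Seq[Option[Long]] = outerKeys.map(countsByKey.get)

  // A correct rewrite has to turn the missing match back into COUNT's
  // value over an empty input, which is 0.
  val repaired: Seq[Long] = naive.map(_.getOrElse(0L))

  def main(args: Array[String]): Unit = {
    println(naive)
    println(repaired)
  }
}
```

      In this model, naive corresponds to the null results reported for the view, and repaired corresponds to the results reported for the table.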

       

       

            People

              andyylam Andy Lam