Spark / SPARK-45657

Caching SQL UNION of different column data types does not work inside Dataset.union

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.3.2, 3.4.0, 3.4.1
    • Fix Version/s: 3.5.0
    • Component/s: SQL
    • Labels: None

    Description

       

      Cache a SQL UNION whose two sides have different column data types:

      scala> spark.sql("select 1 id union select 's2' id").cache()  

      Dataset.union on the same query does not use the cache; the optimized plan contains no `InMemoryRelation`:

      scala> spark.sql("select 1 id union select 's2' id").union(spark.sql("select 's3'")).queryExecution.optimizedPlan
      res15: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
      Union false, false
      :- Aggregate [id#109], [id#109]
      :  +- Union false, false
      :     :- Project [1 AS id#109]
      :     :  +- OneRowRelation
      :     +- Project [s2 AS id#108]
      :        +- OneRowRelation
      +- Project [s3 AS s3#111]
         +- OneRowRelation 

      A SQL UNION over the same cached SQL UNION does use the cache. Note the `InMemoryRelation` in the optimized plan:

      scala> spark.sql("(select 1 id union select 's2' id) union select 's3'").queryExecution.optimizedPlan
      res16: org.apache.spark.sql.catalyst.plans.logical.LogicalPlan =
      Aggregate [id#117], [id#117]
      +- Union false, false
         :- InMemoryRelation [id#117], StorageLevel(disk, memory, deserialized, 1 replicas)
         :     +- *(4) HashAggregate(keys=[id#100], functions=[], output=[id#100])
         :        +- Exchange hashpartitioning(id#100, 500), ENSURE_REQUIREMENTS, [plan_id=241]
         :           +- *(3) HashAggregate(keys=[id#100], functions=[], output=[id#100])
         :              +- Union
         :                 :- *(1) Project [1 AS id#100]
         :                 :  +- *(1) Scan OneRowRelation[]
         :                 +- *(2) Project [s2 AS id#99]
         :                    +- *(2) Scan OneRowRelation[]
         +- Project [s3 AS s3#116]
            +- OneRowRelation 
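
      The comparison above can also be checked programmatically instead of by reading the plan output. The following is a minimal sketch, assuming a local Spark build with `spark-sql` on the classpath; `UnionCacheCheck` and `usesCache` are hypothetical names introduced here for illustration, not part of Spark. It walks the optimized plan looking for an `InMemoryRelation` node, which is how the plans above show that the cache was picked up.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.execution.columnar.InMemoryRelation

object UnionCacheCheck {
  // Hypothetical helper: true when the optimized plan contains a cached relation.
  def usesCache(ds: Dataset[_]): Boolean =
    ds.queryExecution.optimizedPlan
      .collectFirst { case r: InMemoryRelation => r }
      .isDefined

  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session; in spark-shell the existing `spark` would be used instead.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("SPARK-45657-check")
      .getOrCreate()
    try {
      // Cache the SQL UNION whose sides have different column types.
      spark.sql("select 1 id union select 's2' id").cache()

      // Same query combined through Dataset.union versus through SQL UNION.
      val viaDataset = spark.sql("select 1 id union select 's2' id")
        .union(spark.sql("select 's3'"))
      val viaSql = spark.sql("(select 1 id union select 's2' id) union select 's3'")

      println(s"Dataset.union uses cache: ${usesCache(viaDataset)}")
      println(s"SQL UNION uses cache:     ${usesCache(viaSql)}")
    } finally {
      spark.stop()
    }
  }
}
```

      On an affected version, the plans in this report suggest the first check would come back false and the second true; on a version containing the fix, both would be expected to report true.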

       

          People

            Assignee: Unassigned
            Reporter: John Zhuge (jzhuge)