Uploaded image for project: 'Spark'
  1. Spark
  2. SPARK-27282

Spark incorrect results when using UNION with GROUP BY clause

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Incomplete
    • 2.3.2
    • None
    • Spark Shell, Spark Submit, SQL
    • None
    • I'm using :

      IntelliJ  IDEA ==> 2018.1.4

      spark-sql and spark-core ==> 2.3.2.3.1.0.0-78 (for HDP 3.1)

      scala ==> 2.11.8

    Description

      When using UNION clause after a GROUP BY clause in spark, the results obtained are wrong.

      The following example explicit this issue:

      CREATE TABLE test_un (
      col1 varchar(255),
      col2 varchar(255),
      col3 varchar(255),
      col4 varchar(255)
      );
      
      INSERT INTO test_un (col1, col2, col3, col4)
      VALUES (1,1,2,4),
      (1,1,2,4),
      (1,1,3,5),
      (2,2,2,null);
      

      I used the following code :

      val x = Toolkit.HiveToolkit.getDataFromHive("test","test_un")
      
      val  y = x
         .filter(col("col4")isNotNull)
        .groupBy("col1", "col2","col3")
        .agg(count(col("col3")).alias("cnt"))
        .withColumn("col_name", lit("col3"))
        .select(col("col1"), col("col2"), col("col_name"),col("col3").alias("col_value"), col("cnt"))
      
      val z = x
        .filter(col("col4")isNotNull)
        .groupBy("col1", "col2","col4")
        .agg(count(col("col4")).alias("cnt"))
        .withColumn("col_name", lit("col4"))
        .select(col("col1"), col("col2"), col("col_name"),col("col4").alias("col_value"), col("cnt"))
      
      y.union(z).show()
      

       And i obtained the following results:

      col1 col2 col_name col_value cnt
      1 1 col3 5 1
      1 1 col3 4 2
      1 1 col4 5 1
      1 1 col4 4 2

      Expected results:

      col1 col2 col_name col_value cnt
      1 1 col3 3 1
      1 1 col3 2 2
      1 1 col4 4 2
      1 1 col4 5 1

      But when i remove the last row of the table, i obtain the correct results.

      (2,2,2,null)

      Attachments

        Activity

          People

            Unassigned Unassigned
            sofia Sofia
            Votes:
            1 Vote for this issue
            Watchers:
            8 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: