  1. Spark
  2. SPARK-27764 Feature Parity between PostgreSQL and Spark
  3. SPARK-29708

Different answers in aggregates of duplicate grouping sets

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4, 3.0.0
    • Fix Version/s: 2.4.5, 3.0.0
    • Component/s: SQL

    Description

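      The query below, which lists the same grouping set twice, returns different results in PostgreSQL and Spark: PostgreSQL returns each group once per grouping set (four rows), while Spark returns each group only once (two rows).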

      postgres=# create table gstest4(id integer, v integer, unhashable_col bit(4), unsortable_col xid);
      
      postgres=# insert into gstest4
      postgres-# values (1,1,b'0000','1'), (2,2,b'0001','1'),
      postgres-#        (3,4,b'0010','2'), (4,8,b'0011','2'),
      postgres-#        (5,16,b'0000','2'), (6,32,b'0001','2'),
      postgres-#        (7,64,b'0010','1'), (8,128,b'0011','1');
      INSERT 0 8
      
      postgres=# select unsortable_col, count(*)
      postgres-#   from gstest4 group by grouping sets ((unsortable_col),(unsortable_col))
      postgres-#   order by text(unsortable_col);
       unsortable_col | count 
      ----------------+-------
                    1 |     8
                    1 |     8
                    2 |     8
                    2 |     8
      (4 rows)
      
      scala> sql("""create table gstest4(id integer, v integer, unhashable_col /* bit(4) */ byte, unsortable_col /* xid */ integer) using parquet""")
      
      scala> sql("""
           | insert into gstest4
           | values (1,1,tinyint('0'),1), (2,2,tinyint('1'),1),
           |        (3,4,tinyint('2'),2), (4,8,tinyint('3'),2),
           |        (5,16,tinyint('0'),2), (6,32,tinyint('1'),2),
           |        (7,64,tinyint('2'),1), (8,128,tinyint('3'),1)
           | """)
      res21: org.apache.spark.sql.DataFrame = []
      
      scala> sql("""
           | select unsortable_col, count(*)
           |   from gstest4 group by grouping sets ((unsortable_col),(unsortable_col))
           |   order by string(unsortable_col)
           | """).show
      +--------------+--------+
      |unsortable_col|count(1)|
      +--------------+--------+
      |             1|       8|
      |             2|       8|
      +--------------+--------+
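
The difference can be illustrated without Spark. The following is a minimal sketch in plain Scala, assuming that GROUP BY GROUPING SETS behaves like a UNION ALL of one GROUP BY per listed grouping set, which is how PostgreSQL treats it. The object and method names (`DuplicateGroupingSets`, `groupCount`, `groupingSets`) are hypothetical and are not part of Spark or PostgreSQL; the sketch does not reproduce Spark's planner. It only shows that keeping the duplicate grouping set yields four result rows, while deduplicating the sets first yields two. For the eight rows listed in the insert it computes a count of 4 per key; the transcripts above show 8, so the tables in those sessions presumably held more rows than the single insert shown.

```scala
object DuplicateGroupingSets {
  // Rows of gstest4 from the insert above; unhashable_col is omitted
  // because the query does not reference it.
  final case class Row(id: Int, v: Int, unsortableCol: Int)

  val gstest4: Seq[Row] = Seq(
    Row(1, 1, 1), Row(2, 2, 1), Row(3, 4, 2), Row(4, 8, 2),
    Row(5, 16, 2), Row(6, 32, 2), Row(7, 64, 1), Row(8, 128, 1)
  )

  // Hypothetical lookup from column name to value extractor.
  private val column: Map[String, Row => Int] = Map(
    "id" -> ((r: Row) => r.id),
    "v" -> ((r: Row) => r.v),
    "unsortable_col" -> ((r: Row) => r.unsortableCol)
  )

  // count(*) grouped by the columns of a single grouping set.
  def groupCount(set: Seq[String]): Seq[(Seq[Int], Int)] =
    gstest4
      .groupBy(row => set.map(name => column(name)(row)))
      .map { case (key, members) => (key, members.size) }
      .toSeq

  // Assumed semantics: one GROUP BY per listed set, results concatenated
  // (UNION ALL), then ordered by the textual form of the key.
  def groupingSets(sets: Seq[Seq[String]]): Seq[(Seq[Int], Int)] =
    sets.flatMap(groupCount).sortBy { case (key, _) => key.mkString(",") }

  def main(args: Array[String]): Unit = {
    val sets = Seq(Seq("unsortable_col"), Seq("unsortable_col"))
    // Duplicate set kept: every group appears twice (PostgreSQL-style result).
    groupingSets(sets).foreach(println)
    // Duplicate set removed first: every group appears once (the Spark result reported here).
    groupingSets(sets.distinct).foreach(println)
  }
}
```

Treating the grouping sets as a list rather than a set is the only difference between the two calls in `main`, which matches the row counts of the two transcripts.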
      

            People

              Assignee: maropu Takeshi Yamamuro
              Reporter: maropu Takeshi Yamamuro