SPARK-45592

AQE and InMemoryTableScanExec correctness bug

    Description

      The following query should return 1000000:

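      import org.apache.spark.sql.functions.{col, min}
      import org.apache.spark.storage.StorageLevel
      // In spark-shell, `spark` and spark.implicits._ are already in scope.

      // 1,000,000 rows in 5 partitions; every src value is unique.
      val df = spark.range(0, 1000000, 1, 5).map(l => (l, l))
      val ee = df.select($"_1".as("src"), $"_2".as("dst"))
        .persist(StorageLevel.MEMORY_AND_DISK)

      // Materialize the first cache.
      ee.count()
      // One row per src, also cached.
      val minNbrs1 = ee
        .groupBy("src").agg(min(col("dst")).as("min_number"))
        .persist(StorageLevel.MEMORY_AND_DISK)
      // Each row of ee matches exactly one row of minNbrs1.
      val join = ee.join(minNbrs1, "src")
      join.count()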

      However, on Spark 3.5.0 a correctness bug causes it to return `104800` or some other smaller value.
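
The expected count follows from the data: every `src` value is unique, so the aggregation yields one row per `src` and the inner join matches each row of `ee` exactly once. A minimal sketch of that reasoning with plain Scala collections (no Spark involved), assuming a scaled-down size `n`; the names `n` and `joined` are illustrative and not part of the original report:

```scala
// Model of the query with plain collections; n is scaled down for illustration.
val n = 1000
// (src, dst) pairs, mirroring spark.range(...).map(l => (l, l)).
val ee = (0 until n).map(l => (l.toLong, l.toLong))
// groupBy("src").agg(min("dst")): one entry per distinct src.
val minNbrs1 = ee.groupBy(_._1).map { case (src, rows) => (src, rows.map(_._2).min) }
// Inner join on src: keep a row only when src has a match in minNbrs1.
val joined = ee.flatMap { case (src, dst) => minNbrs1.get(src).map(m => (src, dst, m)) }
// Unique keys on both sides, so the join preserves the row count.
assert(joined.size == n)
```

With `n = 1000000` the same argument gives the 1000000 rows the Spark query is expected to return.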

            People

              eejbyfeldt Emil Ejbyfeldt