  1. Spark
  2. SPARK-37375 Umbrella: Storage Partitioned Join (SPJ)
  3. SPARK-45652

SPJ: Handle empty input partitions after dynamic filtering

Details

    Description

      When the number of input partitions becomes 0 after dynamic filtering in BatchScanExec, SPJ currently fails with the following error:

      java.util.NoSuchElementException: None.get
      	at scala.None$.get(Option.scala:529)
      	at scala.None$.get(Option.scala:527)
      	at org.apache.spark.sql.execution.datasources.v2.BatchScanExec.filteredPartitions$lzycompute(BatchScanExec.scala:108)
      	at org.apache.spark.sql.execution.datasources.v2.BatchScanExec.filteredPartitions(BatchScanExec.scala:65)
      	at org.apache.spark.sql.execution.datasources.v2.BatchScanExec.inputRDD$lzycompute(BatchScanExec.scala:136)
      	at org.apache.spark.sql.execution.datasources.v2.BatchScanExec.inputRDD(BatchScanExec.scala:135)
      

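      This happens because groupPartitions returns None when there are no input partitions, and, as the stack trace shows, BatchScanExec.filteredPartitions then calls get on that None.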
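
      The failure mode can be reproduced in isolation with plain Scala. The sketch below is not the Spark code and not the actual fix: Part, groupByKey, unsafeGrouped and safeGrouped are hypothetical names, and groupByKey only assumes the behaviour described above (returning None for empty input). It shows why calling get on the result throws, and one possible way to treat "no partitions" as an empty grouping instead.

```scala
// Simplified, Spark-free illustration of the reported failure mode.
object EmptyPartitionsSketch {
  // Hypothetical stand-in for an input partition, identified by its partition key.
  final case class Part(key: Int)

  // Hypothetical analogue of groupPartitions: groups partitions by key and
  // returns None when there is nothing to group.
  def groupByKey(parts: Seq[Part]): Option[Seq[(Int, Seq[Part])]] =
    if (parts.isEmpty) None
    else Some(parts.groupBy(_.key).toSeq.sortBy(_._1))

  // Unsafe: unwraps with .get, so an empty input throws
  // java.util.NoSuchElementException: None.get
  def unsafeGrouped(parts: Seq[Part]): Seq[(Int, Seq[Part])] =
    groupByKey(parts).get

  // Safe: falls back to an empty grouping when there are no partitions.
  def safeGrouped(parts: Seq[Part]): Seq[(Int, Seq[Part])] =
    groupByKey(parts).getOrElse(Seq.empty)

  def main(args: Array[String]): Unit = {
    // Assume dynamic filtering pruned every input partition.
    val afterFiltering = Seq.empty[Part]
    println(scala.util.Try(unsafeGrouped(afterFiltering)).isFailure)
    println(safeGrouped(afterFiltering).isEmpty)
  }
}
```

      Whether an empty grouping is the right fallback inside BatchScanExec depends on how the rest of the SPJ planning consumes the grouped partitions; the sketch only demonstrates handling the None case explicitly rather than unwrapping it.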

          People

            csun Chao Sun