SPARK-40034

PathOutputCommitters to work with dynamic partition overwrite

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.5.0
    • Fix Version/s: 3.5.0
    • Component/s: Spark Core, SQL
    • Labels: None

    Description

      Sibling of MAPREDUCE-7403: allow PathOutputCommitter implementations to declare that they support the semantics required by Spark dynamic partitioning:

      • rename works as expected
      • the working directory is on the same filesystem as the final directory

      They will do this by implementing StreamCapabilities and adding a new probe, "mapreduce.job.committer.dynamic.partitioning". The Spark-side changes are to:

      • postpone rejection of dynamic partition overwrite until the output committer is created
      • allow it if the committer implements StreamCapabilities and returns true for {{hasCapability("mapreduce.job.committer.dynamic.partitioning")}}
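
      The probe described above can be sketched roughly as follows. This is only an illustration, not the actual Spark or Hadoop code: the nested StreamCapabilities interface is a local stand-in for org.apache.hadoop.fs.StreamCapabilities so that the example compiles without Hadoop on the classpath, and the class and method names (DynamicPartitionProbe, CapableCommitter, PlainCommitter, supportsDynamicPartitioning, checkDynamicPartitionOverwrite) are hypothetical. Only the capability string is taken from the description.

```java
final class DynamicPartitionProbe {

    /** Capability name quoted in the issue description. */
    static final String CAPABILITY_DYNAMIC_PARTITIONING =
        "mapreduce.job.committer.dynamic.partitioning";

    /** Local stand-in for org.apache.hadoop.fs.StreamCapabilities (assumption: same method shape). */
    interface StreamCapabilities {
        boolean hasCapability(String capability);
    }

    /** Hypothetical committer that declares support for dynamic partitioning. */
    static class CapableCommitter implements StreamCapabilities {
        @Override
        public boolean hasCapability(String capability) {
            return CAPABILITY_DYNAMIC_PARTITIONING.equals(capability);
        }
    }

    /** Hypothetical committer that makes no declaration at all. */
    static class PlainCommitter { }

    /** True only if the committer implements the interface and answers the probe with true. */
    static boolean supportsDynamicPartitioning(Object committer) {
        return committer instanceof StreamCapabilities
            && ((StreamCapabilities) committer)
                   .hasCapability(CAPABILITY_DYNAMIC_PARTITIONING);
    }

    /**
     * Deferred check: called once the committer instance exists, rather than
     * rejecting dynamic partition overwrite up front.
     */
    static void checkDynamicPartitionOverwrite(Object committer,
                                               boolean dynamicPartitionOverwrite) {
        if (dynamicPartitionOverwrite && !supportsDynamicPartitioning(committer)) {
            throw new UnsupportedOperationException(
                "Committer " + committer.getClass().getName()
                    + " does not declare " + CAPABILITY_DYNAMIC_PARTITIONING);
        }
    }

    public static void main(String[] args) {
        // Accepted: the committer declares the capability.
        checkDynamicPartitionOverwrite(new CapableCommitter(), true);
        // Accepted: dynamic partition overwrite was not requested.
        checkDynamicPartitionOverwrite(new PlainCommitter(), false);
        try {
            // Rejected: requested, but the committer makes no declaration.
            checkDynamicPartitionOverwrite(new PlainCommitter(), true);
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

      The point of the sketch is the ordering: the decision is made against the committer instance, so a committer that does not declare the capability is rejected only when dynamic partition overwrite is actually requested.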

      This is not going to be supported by the S3A committers, which do not meet the requirements. The manifest committer of MAPREDUCE-7341 running against ABFS and GCS does work.

            People

              stevel@apache.org Steve Loughran