  1. Spark
  2. SPARK-33739

Jobs committed through the S3A Magic committer don't report the bytes written


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 3.0.1
    • Fix Version/s: 3.2.0
    • Component/s: SQL
    • Labels: None

    Description

      Spark's write statistics tracking doesn't correctly assess the size of files uploaded through the S3A magic committer, because it only calls getFileStatus on the zero-byte marker objects, not on the final files, which have yet to materialize. Since those files don't exist until the job is committed, their size can't simply be probed.

      HADOOP-17414 will attach the final length as a custom header to the marker object, and implement getXAttr in the S3A FS to probe for it.

      BasicWriteStatsTracker can probe for this custom XAttr when the size of the generated file is 0 bytes; if the attribute is found and its value is parseable, that value is used as the declared length of the output.
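
      The fallback described above can be sketched roughly as follows. This is a minimal illustration in plain Java, not the actual BasicWriteStatsTracker change: XAttrReader is a hypothetical stand-in for the single FileSystem.getXAttr(path, name) call the probe needs, MagicLengthProbe and resolveLength are made-up names, and the attribute name is passed in as a parameter because this issue does not state it. The value is assumed to be the length written as a decimal string.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for the one filesystem call the probe needs. */
interface XAttrReader {
    /** Returns the raw attribute value, or null if the attribute is absent. */
    byte[] getXAttr(String path, String name) throws IOException;
}

final class MagicLengthProbe {
    private MagicLengthProbe() {}

    /**
     * Returns the length to report for a written file.
     * A non-zero length from getFileStatus is trusted as-is; only a
     * zero-byte file triggers the XAttr probe.
     */
    static long resolveLength(long reportedLength, String path,
                              String attrName, XAttrReader fs) {
        if (reportedLength != 0) {
            return reportedLength;
        }
        try {
            byte[] raw = fs.getXAttr(path, attrName);
            if (raw == null || raw.length == 0) {
                return 0; // attribute not present: genuinely empty file
            }
            // Assumption: the length is stored as a decimal string.
            long declared = Long.parseLong(
                new String(raw, StandardCharsets.UTF_8).trim());
            return declared >= 0 ? declared : 0; // ignore nonsensical values
        } catch (IOException | NumberFormatException
                 | UnsupportedOperationException e) {
            // Probe failed, value unparseable, or the filesystem does not
            // support XAttrs: fall back to the length already reported.
            return 0;
        }
    }

    public static void main(String[] args) {
        String attr = "example.declared-length"; // hypothetical attribute name
        XAttrReader marker = (p, n) ->
            n.equals(attr) ? "1024".getBytes(StandardCharsets.UTF_8) : null;
        // Zero-byte marker with a declared length: the header value wins.
        System.out.println(resolveLength(0, "out/part-0000", attr, marker));
        // Ordinary file: the reported length is used and no probe is made.
        System.out.println(resolveLength(512, "out/part-0001", attr, marker));
    }
}
```

      Treating every failure as "no declared length" keeps the statistics best-effort: a filesystem without XAttr support, or a marker without the header, behaves exactly as it did before the probe was added.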


            People

              Assignee: stevel@apache.org Steve Loughran
              Reporter: stevel@apache.org Steve Loughran
