Spark / SPARK-17547

Temporary shuffle data files may be leaked following exception in write


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.5.3, 1.6.0, 2.0.0
    • Fix Version/s: 1.6.3, 2.0.1, 2.1.0
    • Component/s: Shuffle, Spark Core
    • Labels: None

    Description

      SPARK-8029 modified shuffle writers to first stage their data to a temporary file in the same directory as the final destination file and then to atomically rename the file at the end of the write job. However, this change introduced the potential for the temporary output file to be leaked if an exception occurs during the write because the shuffle writers' existing error cleanup code doesn't handle this new temp file.
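
A minimal Java sketch of the staging pattern described above, written against plain java.nio.file rather than Spark's actual shuffle writer classes (LeakyStagedWriter, writeStaged, and PartitionWriter are hypothetical names). It illustrates why the temp file can leak: an exception thrown while writing skips the rename, and nothing deletes the staged file.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch of the "stage, then atomically rename" write pattern
// WITHOUT temp-file cleanup. Not Spark's API; names are illustrative only.
final class LeakyStagedWriter {

    /** Hypothetical callback standing in for the code that writes shuffle data. */
    interface PartitionWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    static void writeStaged(Path finalFile, PartitionWriter writer) throws IOException {
        // Stage output in the SAME directory as the destination so the rename
        // stays on one filesystem and can be performed atomically.
        Path tmp = Files.createTempFile(
                finalFile.getParent(), finalFile.getFileName().toString(), ".tmp");
        try (OutputStream out = Files.newOutputStream(tmp)) {
            writer.writeTo(out); // if this throws, control leaves the method here...
        }
        // ...so this rename never runs, and `tmp` is left behind on disk.
        Files.move(tmp, finalFile, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

Because createTempFile picks a fresh name each time, every failed call in this sketch leaves a separate stray file in the output directory.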

      This is easy to fix: we just need to add a finally block to guarantee that the temporary file is either moved or deleted before exiting the shuffle write method.
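
A minimal sketch of the finally-based cleanup, written against plain java.nio.file with hypothetical names (SafeStagedWriter, writeStaged, PartitionWriter) rather than Spark's real shuffle writer classes. The write and the rename sit inside a try block, and the finally block deletes the temp file if it still exists.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch: a finally block guarantees the staged temp file is either
// renamed into place or deleted before the method exits. Not Spark's API.
final class SafeStagedWriter {

    /** Hypothetical callback standing in for the code that writes shuffle data. */
    interface PartitionWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    static void writeStaged(Path finalFile, PartitionWriter writer) throws IOException {
        // Stage in the destination's directory so the rename can be atomic.
        Path tmp = Files.createTempFile(
                finalFile.getParent(), finalFile.getFileName().toString(), ".tmp");
        try {
            try (OutputStream out = Files.newOutputStream(tmp)) {
                writer.writeTo(out);
            }
            // On success the rename consumes the temp file.
            Files.move(tmp, finalFile, StandardCopyOption.ATOMIC_MOVE);
        } finally {
            // Runs on every exit path. After a successful move `tmp` no longer
            // exists, so this is a no-op; otherwise the partial file is removed.
            try {
                Files.deleteIfExists(tmp);
            } catch (IOException cleanupError) {
                // Don't let a cleanup failure mask the original exception.
                // A real writer would log this (assumption: logging omitted here).
                System.err.println("Failed to delete temp file " + tmp + ": " + cleanupError);
            }
        }
    }
}
```

Deleting in finally rather than in a catch block also covers unchecked exceptions, and since a successful rename removes the temp path, the cleanup is harmless on the normal path.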


People

    • Assignee: Josh Rosen (joshrosen)
    • Reporter: Josh Rosen (joshrosen)
    • Votes: 0
    • Watchers: 3