SPARK-28607 (project: Spark; sub-task of SPARK-25299 Use remote storage for persisting shuffle data)

Don't hold a reference to two partitionLengths arrays

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.0.0
    • Fix Version/s: 3.0.0
    • Component/s: Shuffle, Spark Core
    • Labels: None

    Description

      SPARK-28209 introduced the new shuffle writer API and used it in BypassMergeSortShuffleWriter. However, the API's design forces both the plugin implementation and the higher-level writer to track the partition lengths, so two partitionLengths arrays are held in memory at once. Only the plugin implementation should track the partition lengths; it should pass them back up to the writer as the return value of commitAllPartitions.
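
The proposed shape can be sketched in a few lines of Java. This is a minimal illustration, not the actual Spark code: MapOutputWriterSketch, InMemoryMapOutputWriter, writePartition, and writeAll are hypothetical names invented for this example, and only the idea that commitAllPartitions returns the partition lengths comes from the description above.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class PartitionLengthsSketch {

    /** Hypothetical, simplified stand-in for the plugin-side map output writer. */
    interface MapOutputWriterSketch {
        void writePartition(int partitionId, byte[] data);

        /** Commits all partitions and returns the length in bytes of each one. */
        long[] commitAllPartitions();
    }

    /** Toy in-memory plugin implementation: the only place lengths are tracked. */
    static final class InMemoryMapOutputWriter implements MapOutputWriterSketch {
        private final long[] partitionLengths;
        private final ByteArrayOutputStream out = new ByteArrayOutputStream();

        InMemoryMapOutputWriter(int numPartitions) {
            this.partitionLengths = new long[numPartitions];
        }

        @Override
        public void writePartition(int partitionId, byte[] data) {
            out.write(data, 0, data.length);
            partitionLengths[partitionId] += data.length;
        }

        @Override
        public long[] commitAllPartitions() {
            // Hand the single lengths array back to the caller.
            return partitionLengths;
        }
    }

    /** Higher-level writer: allocates no lengths array of its own. */
    static long[] writeAll(byte[][] partitions) {
        MapOutputWriterSketch writer = new InMemoryMapOutputWriter(partitions.length);
        for (int i = 0; i < partitions.length; i++) {
            writer.writePartition(i, partitions[i]);
        }
        // The lengths come back from the plugin as the commit's return value.
        return writer.commitAllPartitions();
    }

    public static void main(String[] args) {
        long[] lengths = writeAll(new byte[][] {new byte[3], new byte[0], new byte[5]});
        System.out.println(Arrays.toString(lengths)); // prints [3, 0, 5]
    }
}
```

In this sketch the higher-level writer never allocates its own array, so only one partitionLengths array exists per map task.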

            People

              Assignee: Matt Cheah (mcheah)
              Reporter: Matt Cheah (mcheah)
              Votes: 0
              Watchers: 2

              Dates

                Created:
                Updated:
                Resolved: