SPARK-36772

FinalizeShuffleMerge fails with an exception due to attempt id not matching

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.2.0
    • Fix Version/s: 3.2.0
    • Component/s: Shuffle
    • Labels: None

    Description

      When the driver asks the external shuffle services (ESS) to finalize a merge, it also sends its application attempt id so that each ESS can validate that the request comes from the correct attempt.
      This attempt id is read from the TransportConf passed in when the ExternalBlockStoreClient is created, and that TransportConf is backed by a cloned copy of the SparkConf passed to it.

      The application attempt id is set during SparkContext initialization.
      But this happens after the driver's SparkEnv has already been created, so the cloned conf held by the client never sees the value.

      Hence the attempt id that ExternalBlockStoreClient uses always ends up being -1, which does not match the attempt id held by the ESS (which is based on spark.app.attempt.id). As a result, merge finalization always fails ("java.lang.IllegalArgumentException: The attempt id -1 in this FinalizeShuffleMerge message does not match with the current attempt id 1 stored in shuffle service for application ...")
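
      The ordering problem described above can be reproduced without Spark. The following is a minimal sketch, not Spark's actual code: SimpleConf, ClientSketch, attemptIdSeenByClient and validateFinalize are hypothetical stand-ins for SparkConf, ExternalBlockStoreClient and the ESS-side check. It only assumes what the description states: the client reads the attempt id from a cloned conf, the default is -1, and the ESS rejects a mismatching id.

```java
import java.util.HashMap;
import java.util.Map;

final class AttemptIdOrderingSketch {
    static final String ATTEMPT_ID_KEY = "spark.app.attempt.id";

    // Hypothetical stand-in for SparkConf: a mutable key/value store that can be copied.
    static final class SimpleConf {
        private final Map<String, String> settings = new HashMap<>();

        SimpleConf set(String key, String value) {
            settings.put(key, value);
            return this;
        }

        int getInt(String key, int defaultValue) {
            String value = settings.get(key);
            return value == null ? defaultValue : Integer.parseInt(value);
        }

        // Later changes to the original are not visible in the copy.
        SimpleConf copy() {
            SimpleConf cloned = new SimpleConf();
            cloned.settings.putAll(settings);
            return cloned;
        }
    }

    // Hypothetical stand-in for the shuffle client: it keeps its own snapshot of the conf.
    static final class ClientSketch {
        private final SimpleConf snapshot;

        ClientSketch(SimpleConf conf) {
            this.snapshot = conf.copy();
        }

        int attemptId() {
            return snapshot.getInt(ATTEMPT_ID_KEY, -1);
        }
    }

    // Returns the attempt id the client would send, depending on initialization order.
    static int attemptIdSeenByClient(boolean setIdBeforeClientCreation) {
        SimpleConf conf = new SimpleConf();
        if (setIdBeforeClientCreation) {
            conf.set(ATTEMPT_ID_KEY, "1");
        }
        ClientSketch client = new ClientSketch(conf); // snapshot is taken here
        if (!setIdBeforeClientCreation) {
            conf.set(ATTEMPT_ID_KEY, "1"); // too late: the snapshot does not change
        }
        return client.attemptId();
    }

    // Hypothetical ESS-side check: reject a finalize request from a different attempt.
    static void validateFinalize(int messageAttemptId, int storedAttemptId) {
        if (messageAttemptId != storedAttemptId) {
            throw new IllegalArgumentException(String.format(
                "The attempt id %d in this FinalizeShuffleMerge message does not match "
                    + "with the current attempt id %d stored in shuffle service",
                messageAttemptId, storedAttemptId));
        }
    }

    public static void main(String[] args) {
        int late = attemptIdSeenByClient(false);
        int early = attemptIdSeenByClient(true);
        System.out.println("id set after client creation:  " + late);
        System.out.println("id set before client creation: " + early);
        try {
            validateFinalize(late, 1);
        } catch (IllegalArgumentException e) {
            System.out.println("finalize rejected: " + e.getMessage());
        }
    }
}
```

      In this sketch the mismatch disappears as soon as the id is in the conf before the client takes its snapshot. Whether the actual fix reorders initialization or passes the attempt id to the client some other way is not stated in this description.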

          People

            Assignee: zhouyejoe Ye Zhou
            Reporter: mridulm80 Mridul Muralidharan