  1. Spark
  2. SPARK-47568

Fix race condition between maintenance thread and task thread for RocksDB snapshot


Details

    Description

      There are currently race conditions between the maintenance thread and the task thread that can result in corrupted checkpoint state.

      1. The maintenance thread currently relies on the class variable latestSnapshot to find the latest checkpoint and upload it to DFS. The task thread can modify this variable at commit time if a new snapshot is created, so the checkpoint can change while the maintenance thread is using it.
      2. The task thread does not reset latestSnapshot at load time. If an older version is loaded, a snapshot from a newer version can still be considered valid and be uploaded to DFS. This results in VersionIdMismatch errors.

      This issue proposes to fix both problems by guarding modifications to the latestSnapshot variable and by setting latestSnapshot properly at load time.
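
The two proposed changes can be illustrated with a minimal sketch. It is written in Java for brevity and is not the actual Spark code or the patch for this issue. The class names Snapshot and SnapshotHolder, the methods load, commit and takeSnapshotForUpload, and the choice of a single lock are all assumptions made for illustration. The sketch also assumes that every commit creates a snapshot, which need not hold in practice.

```java
import java.util.Optional;

/** Hypothetical stand-in for a local snapshot; only its version matters here. */
final class Snapshot {
    final long version;

    Snapshot(long version) {
        this.version = version;
    }
}

/** Hypothetical holder showing guarded access to latestSnapshot. */
final class SnapshotHolder {
    private final Object lock = new Object();
    private Snapshot latestSnapshot = null; // guarded by lock
    private long loadedVersion = -1L;       // guarded by lock

    /** Task thread: loading a version discards any snapshot left over from an earlier run. */
    void load(long version) {
        synchronized (lock) {
            latestSnapshot = null;
            loadedVersion = version;
        }
    }

    /** Task thread: a commit produces the next version and publishes its snapshot. */
    long commit() {
        synchronized (lock) {
            loadedVersion += 1;
            latestSnapshot = new Snapshot(loadedVersion);
            return loadedVersion;
        }
    }

    /** Maintenance thread: take ownership of the pending snapshot and leave none behind. */
    Optional<Snapshot> takeSnapshotForUpload() {
        synchronized (lock) {
            Snapshot taken = latestSnapshot;
            latestSnapshot = null;
            return Optional.ofNullable(taken);
        }
    }
}
```

In this sketch the maintenance thread works on a private reference returned by takeSnapshotForUpload, so a later commit on the task thread cannot change the snapshot being uploaded. Because load clears the field, a snapshot created for version 6 is not uploaded after version 2 has been reloaded.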

          People

            bhuwan.sahni Bhuwan Sahni
