Spark / SPARK-41813

UnsafeHashedRelation read method needs to confirm the correctness of the data


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.4.8, 3.3.1
    • Fix Version/s: None
    • Component/s: SQL

    Description

      Recently, we encountered the thread-safety issue described in SPARK-31511 (https://issues.apache.org/jira/browse/SPARK-31511) in production. The version we use, 2.4.3, does not include that fix, which resulted in incorrect data. I consider this a serious bug: the data broadcast by the driver was inconsistent with the data on the executor. The executor should verify the data when it reads it: the numKeys and numValues values read from the header should match the number of keys and values actually read. Adding this check would prevent incorrect data from being used in computation.
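      The check proposed above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not Spark's code: the real UnsafeHashedRelation is written in Scala and stores its entries in a BytesToBytesMap, and the byte layout used here (two 8-byte counts, then length-prefixed key/value records) is a hypothetical format only loosely modeled on it. The names write_relation, read_relation and CorruptRelationError are likewise hypothetical.

```python
import io
import struct
from collections import defaultdict


class CorruptRelationError(Exception):
    """Raised when the decoded data does not match the header counts."""


# Hypothetical layout (not Spark's actual wire format): a header of
# numKeys and numValues as big-endian 8-byte longs, then one record per
# value: key length, value length (4-byte ints), key bytes, value bytes.
def write_relation(rows):
    num_keys = len({key for key, _ in rows})
    buf = io.BytesIO()
    buf.write(struct.pack(">qq", num_keys, len(rows)))
    for key, value in rows:
        buf.write(struct.pack(">ii", len(key), len(value)))
        buf.write(key)
        buf.write(value)
    return buf.getvalue()


def _read_exact(stream, n):
    # Fail loudly on a truncated (or nonsensical) length instead of
    # silently returning fewer bytes.
    data = stream.read(n)
    if len(data) != n:
        raise CorruptRelationError(f"expected {n} bytes, got {len(data)}")
    return data


def read_relation(data):
    """Rebuild key -> [values] and verify it against the header counts."""
    stream = io.BytesIO(data)
    num_keys, num_values = struct.unpack(">qq", _read_exact(stream, 16))
    relation = defaultdict(list)
    read_values = 0
    # Read records until the payload is exhausted instead of trusting
    # num_values, so both header fields can be checked against real data.
    while stream.tell() < len(data):
        key_len, value_len = struct.unpack(">ii", _read_exact(stream, 8))
        key = _read_exact(stream, key_len)
        relation[key].append(_read_exact(stream, value_len))
        read_values += 1
    # The proposed check: counts actually read must match the header.
    if len(relation) != num_keys or read_values != num_values:
        raise CorruptRelationError(
            f"header says numKeys={num_keys}, numValues={num_values}; "
            f"read {len(relation)} keys and {read_values} values"
        )
    return dict(relation)
```

      Reading until the payload is exhausted, rather than looping exactly numValues times, is a choice made for this sketch: it lets a truncated record, extra trailing records, or a header that disagrees with the payload all surface as an error instead of silently producing a wrong relation.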



          People

            Assignee: Unassigned
            Reporter: Heping Wang (peacewong)
            Votes: 0
            Watchers: 1
