Beam: BEAM-14187

NullPointerException or IllegalStateException at IsmReaderImpl in Dataflow

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P2
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.39.0
    • Component/s: runner-dataflow
    • Labels: None

    Description

      Problem

      Dataflow Java batch jobs with large side inputs intermittently throw NullPointerException or IllegalStateException from IsmReaderImpl.

      (All error logs from the Dataflow job are here.)

      Hypothesis

      The initializeForKeyedRead method is not synchronized. Multiple threads can enter it at the same time, each initializing the index for the same shard and updating indexPerShard without synchronization. The overKeyComponents method also reads indexPerShard without synchronization. Because indexPerShard is a plain HashMap, which is not thread-safe, these concurrent unsynchronized accesses can leave the map in an inconsistent state, which would explain the NullPointerException and IllegalStateException above.
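A minimal, hypothetical Java sketch of the pattern described in the hypothesis. The class and method names (UnsafeShardIndexCache, getOrLoadIndex, lookup, loadIndex) are illustrative stand-ins for the roles played by initializeForKeyedRead, overKeyComponents and indexPerShard; this is not the actual IsmReaderImpl code, and the String "index" is a placeholder for whatever per-shard index structure the real reader builds.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only: not the real IsmReaderImpl code.
final class UnsafeShardIndexCache {
  // Plain HashMap shared between threads with no locking (the suspected bug pattern).
  private final Map<Integer, String> indexPerShard = new HashMap<>();
  // Counts index loads so that duplicate work would be visible.
  private final AtomicInteger loads = new AtomicInteger();

  // Plays the role of initializeForKeyedRead: an unsynchronized check-then-act.
  // Two threads can both see "absent", both load the index, and both call
  // put() on the HashMap at the same time, which can corrupt the map.
  String getOrLoadIndex(int shard) {
    String index = indexPerShard.get(shard);
    if (index == null) {
      index = loadIndex(shard);
      indexPerShard.put(shard, index);
    }
    return index;
  }

  // Plays the role of overKeyComponents: an unsynchronized read that can run
  // concurrently with a put() and observe the map in an inconsistent state.
  String lookup(int shard) {
    return indexPerShard.get(shard);
  }

  int loadCount() {
    return loads.get();
  }

  // Hypothetical stand-in for reading a shard's index from the side input file.
  private String loadIndex(int shard) {
    loads.incrementAndGet();
    return "index-for-shard-" + shard;
  }
}
```

Called from a single thread, this class behaves correctly; problems can only appear when several worker threads call getOrLoadIndex and lookup concurrently, which is consistent with the failures being intermittent.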

      Suggestion

      I think changing the type of indexPerShard to a thread-safe map (e.g. ConcurrentHashMap) would fix this issue.
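A minimal sketch of the suggested fix, using hypothetical names (SafeShardIndexCache, getOrLoadIndex, lookup, loadIndex) rather than Beam's actual code. It assumes indexPerShard is keyed by shard number and replaces the HashMap with a ConcurrentHashMap, using computeIfAbsent so that the "is it loaded?" check and the insert happen as one atomic step per key.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the suggested fix: not Beam's actual implementation.
final class SafeShardIndexCache {
  // Thread-safe map: concurrent reads and writes cannot corrupt its internal state.
  private final ConcurrentHashMap<Integer, String> indexPerShard = new ConcurrentHashMap<>();
  // Counts index loads so the test can check that each shard is loaded once.
  private final AtomicInteger loads = new AtomicInteger();

  // ConcurrentHashMap.computeIfAbsent runs atomically per key: the loader is
  // applied at most once per shard, and other threads asking for the same
  // shard receive that single result instead of loading it again.
  String getOrLoadIndex(int shard) {
    return indexPerShard.computeIfAbsent(shard, this::loadIndex);
  }

  // Reads need no extra locking on a ConcurrentHashMap.
  String lookup(int shard) {
    return indexPerShard.get(shard);
  }

  int loadCount() {
    return loads.get();
  }

  // Hypothetical stand-in for reading a shard's index from the side input file.
  private String loadIndex(int shard) {
    loads.incrementAndGet();
    return "index-for-shard-" + shard;
  }
}
```

Swapping in ConcurrentHashMap alone makes individual get and put calls safe, but a separate get-then-put could still load the same shard twice; computeIfAbsent avoids that. The ConcurrentHashMap Javadoc recommends keeping the mapping function short, because other updates to the map may block while it runs. If loading a shard's index is slow, an alternative is to load outside the map and publish the result with putIfAbsent, accepting an occasional duplicate load.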


People

    • Assignee: Minbo Bae (baeminbo)
    • Reporter: Minbo Bae (baeminbo)
    • Votes: 0
    • Watchers: 2

            Dates

              Created:
              Updated:
              Resolved:

              Time Tracking

                Estimated:
                Original Estimate - Not Specified
                Not Specified
                Remaining:
                Remaining Estimate - 0h
                0h
                Logged:
                Time Spent - 3h 10m
                3h 10m

                Slack

                  Issue deployment