  1. Apache Storm
  2. STORM-3620

Data corruption can happen when components are multi-threaded because of non thread-safe serializer

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.0.0, 2.1.0
    • Fix Version/s: 2.2.0
    • Component/s: None
    • Labels: None

    Description

      OutputCollector is not thread-safe in 2.x.

      This can cause data corruption if multiple threads in the same executor call OutputCollector to emit data at the same time:

      1. Every executor has an instance of ExecutorTransfer
      https://github.com/apache/storm/blob/00f48d60e75b28e11a887baba02dc77876b2bb3d/storm-client/src/jvm/org/apache/storm/executor/Executor.java#L146

      2. Every ExecutorTransfer has its own serializer

      https://github.com/apache/storm/blob/00f48d60e75b28e11a887baba02dc77876b2bb3d/storm-client/src/jvm/org/apache/storm/executor/ExecutorTransfer.java#L44

      3. Every executor has its own outputCollector

      https://github.com/apache/storm/blob/00f48d60e75b28e11a887baba02dc77876b2bb3d/storm-client/src/jvm/org/apache/storm/executor/bolt/BoltExecutor.java#L146-L147

      4. When outputCollector is called to emit to remote workers, it uses ExecutorTransfer to transfer data

      https://github.com/apache/storm/blob/00f48d60e75b28e11a887baba02dc77876b2bb3d/storm-client/src/jvm/org/apache/storm/executor/ExecutorTransfer.java#L66

      5. The transfer path then tries to serialize the data

      https://github.com/apache/storm/blob/00f48d60e75b28e11a887baba02dc77876b2bb3d/storm-client/src/jvm/org/apache/storm/daemon/worker/WorkerTransfer.java#L116

      6. But the serializer is not thread-safe

      https://github.com/apache/storm/blob/00f48d60e75b28e11a887baba02dc77876b2bb3d/storm-client/src/jvm/org/apache/storm/serialization/KryoTupleSerializer.java#L33-L43
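
      The chain above can be illustrated with a minimal, Storm-free sketch. It assumes the hazard comes from a serializer that reuses mutable state across calls; ReusableBufferSerializer, countCorrupted, and the lock object are hypothetical names invented for this illustration and are not Storm or Kryo APIs. The sketch shares one such serializer between several threads and compares unguarded calls with calls serialized through a lock.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class SharedSerializerSketch {

    /** Hypothetical stand-in for a serializer that reuses one output buffer per instance. */
    static final class ReusableBufferSerializer {
        private final byte[] buffer = new byte[64];
        private int position;

        byte[] serialize(String value) {
            position = 0; // shared mutable state is reset on every call
            for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
                buffer[position++] = b; // concurrent callers interleave their writes here
            }
            return Arrays.copyOf(buffer, position);
        }
    }

    /** Runs `threads` workers through `emit` and counts payloads that do not round-trip. */
    static int countCorrupted(Function<String, byte[]> emit, int threads, int perThread) {
        AtomicInteger corrupted = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final String word = "thread-" + t;
            Thread worker = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    try {
                        String roundTrip = new String(emit.apply(word), StandardCharsets.UTF_8);
                        if (!roundTrip.equals(word)) {
                            corrupted.incrementAndGet();
                        }
                    } catch (RuntimeException e) {
                        // A race can also push the index out of bounds; count it as corruption.
                        corrupted.incrementAndGet();
                    }
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return corrupted.get();
    }

    public static void main(String[] args) {
        ReusableBufferSerializer shared = new ReusableBufferSerializer();
        Object lock = new Object();

        // Unguarded: all threads hit the same serializer at once; the count varies per run.
        int unguarded = countCorrupted(shared::serialize, 4, 100_000);

        // Guarded: only one thread at a time may use the shared serializer.
        int guarded = countCorrupted(word -> {
            synchronized (lock) {
                return shared.serialize(word);
            }
        }, 4, 100_000);

        System.out.println("unguarded corrupted payloads: " + unguarded);
        System.out.println("guarded corrupted payloads: " + guarded);
    }
}
```

      Guarding every call with one lock is only one possible remedy, and it costs throughput under contention. Giving each thread its own serializer instance is another option. The sketch does not claim which approach the actual fix took.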


      However, the documentation at http://storm.apache.org/releases/2.1.0/Concepts.html states that OutputCollector is thread-safe:

      "Its perfectly fine to launch new threads in bolts that do processing asynchronously. OutputCollector is thread-safe and can be called at any time."

      We should either fix OutputCollector to make it thread-safe, or update the documentation so that it does not mislead users.

            People

              Assignee: Ethan Li (ethanli)
              Reporter: Ethan Li (ethanli)
              Votes: 0
              Watchers: 2

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 10m