Uploaded image for project: 'Thrift'
  1. Thrift
  2. THRIFT-488

ThreadManager crashing bugs

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 0.1
    • None
    • C++ - Library
    • None
    • Mac OS X 10.5.6, Xcode

    • Patch Available

    Description

      The ThreadManager::Impl and the ThreadManager::Worker classes work together to execute client threads. There are race conditions between the two classes that can cause Bus Error exceptions in certain cases. In general, these occur when a ThreadManager instance is destructed, so might not be seen in long running programs. They happen frequently enough, though, that looped repetitions of ThreadManagerTests::blockTest() (part of the concurrency_test program) fail quite often.

      These errors are generally not seen with the current version of ThreadManagerTests::blockTest() due to errors in the test itself that cause failures at a far higher frequency. In order to see them, you need to apply the patches that are attached to THRIFT-487 (https://issues.apache.org/jira/browse/THRIFT-487).

      Test procedure:
      1) Apply the patch from THRIFT-487 for the Tests.cpp file.
      2) Run make in lib/cpp in order to rebuild concurrency_test
      3) Run concurrency_test with the command line argument "thread-manager" and observe that the test hangs in no time.
      4) Apply the patch from THRIFT-487 for the ThreadManagerTests.h file.
      5) Run make in lib/cpp
      6) Run concurrency_test as before. Observe that now it runs for longer (generally) and usually fails with an assert in Monitor.cpp. This failure is because of one of the bugs in ThreadManager.
      7) Apply the attached patch file for ThreadManager.cpp
      8) Run make in lib/cpp
      9) Run concurrency_test as before. It should just run, and run, and run.

      Note that there is a possible path through the original ThreadManager::Worker::run() method where active never becomes true. In practice, exercising this code path is difficuly. The way that I exercised it was to edit line 322 in the patched version of ThreadManager.cpp. I changed the for statement to read:
      for (size_t ix = 0; ix < value + 1; ix++)
      so that the ThreadManager always created more workers than were needed. That extra worker caused quite a bit of trouble until I moved his handling up to the top of the run() method. I don't understand how this situation could occur in real life, but this code appears to handle it correctly.

      Attachments

        1. ThreadManager.cpp.patch
          5 kB
          Rush Manbert
        2. ThreadManagerAssertPatch.txt
          0.9 kB
          Rush Manbert
        3. ThreadManagerWeakPtr.txt
          9 kB
          Rush Manbert

        Issue Links

          Activity

            People

              Unassigned Unassigned
              rush Rush Manbert
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: