  BEAM-8651 (sub-task of BEAM-1251 Python 3 Support, project Beam)

Python 3 portable pipelines sometimes fail with errors in StockUnpickler.find_class()

    Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.17.0
    • Component/s: sdk-py-core
    • Labels:
      None

      Description

      Several Beam users have reported an intermittent error raised during unpickling, in StockUnpickler.find_class(). A similar error occurs consistently when a user's pipeline calls super() in its main module and is run with --save_main_session; see BEAM-6158.

      In this case the error happens only sometimes, and super() calls don't play a role.

      So far I've seen reports of the error on Python 3.5, 3.6, and 3.7.1, on the Flink and Dataflow runners. On the Dataflow runner I have so far seen it only in streaming pipelines, which use the portable SDK worker.

      Typical stack trace:

        File "python3.5/site-packages/apache_beam/runners/worker/bundle_processor.py", line 1148, in _create_pardo_operation
          dofn_data = pickler.loads(serialized_fn)
        File "python3.5/site-packages/apache_beam/internal/pickler.py", line 265, in loads
          return dill.loads(s)
        File "python3.5/site-packages/dill/_dill.py", line 317, in loads
          return load(file, ignore)
        File "python3.5/site-packages/dill/_dill.py", line 305, in load
          obj = pik.load()
        File "python3.5/site-packages/dill/_dill.py", line 474, in find_class
          return StockUnpickler.find_class(self, module, name)
      AttributeError: Can't get attribute 'ClassName' on <module 'ModuleName' from 'python3.5/site-packages/filename.py'>
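
To make the error message above concrete, here is a minimal sketch of the state that produces it: the unpickler finds the module in sys.modules, but the class is not (yet) defined on it. The module name `fake_user_module` and the helper `simulate_partial_import` are hypothetical and exist only for this illustration. The sketch sets up that state deterministically with the standard pickle module; it does not reproduce the race itself and does not involve dill or Beam.

```python
import pickle
import sys
import types

# Hypothetical module name, used only for this illustration.
MODULE_NAME = "fake_user_module"


def simulate_partial_import():
    """Return the message of the AttributeError raised by the unpickler, if any."""
    # Build a complete module that defines ClassName and pickle an instance.
    complete = types.ModuleType(MODULE_NAME)
    exec("class ClassName(object):\n    pass\n", complete.__dict__)
    sys.modules[MODULE_NAME] = complete
    payload = pickle.dumps(complete.ClassName())

    # Replace it with an empty module of the same name. This stands in for a
    # module that another thread has registered but not finished executing.
    sys.modules[MODULE_NAME] = types.ModuleType(MODULE_NAME)
    try:
        pickle.loads(payload)
    except AttributeError as exc:
        # find_class found the module but not the class on it.
        return str(exc)
    finally:
        sys.modules.pop(MODULE_NAME, None)
    return None


if __name__ == "__main__":
    print(simulate_partial_import())
```

The pickle payload stores only the module and class names, so the lookup at load time depends entirely on what the module object in sys.modules contains at that moment.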
      

      According to Guenther from [1]:

      This looks exactly like a race condition that we've encountered on Python
      3.7.1: There's a bug in some older 3.7.x releases that breaks the
      thread-safety of the unpickler, as concurrent unpickle threads can access a
      module before it has been fully imported. See
      https://bugs.python.org/issue34572 for more information.

      The traceback shows a Python 3.6 venv so this could be a different issue
      (the unpickle bug was introduced in version 3.7). If it's the same bug then
      upgrading to Python 3.7.3 or higher should fix that issue. One potential
      workaround is to ensure that all of the modules get imported during the
      initialization of the sdk_worker, as this bug only affects imports done by
      the unpickler.
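
The workaround suggested in the quoted message could look roughly like the sketch below: import every module the pickled functions refer to, from a single thread, before any unpickling starts. The helper name `preimport_modules` and the example module list are assumptions made for illustration; where such a call would be placed in the SDK worker's startup is not specified in this issue.

```python
import importlib
import sys


def preimport_modules(module_names):
    """Import the given modules eagerly and return the names that were imported.

    Hypothetical helper: intended to run once, single-threaded, before any
    thread starts unpickling, so the unpickler only sees finished modules.
    """
    imported = []
    for name in module_names:
        importlib.import_module(name)
        imported.append(name)
    return imported


if __name__ == "__main__":
    # Placeholder list; a real pipeline would list its own modules here.
    preimport_modules(["json", "collections.abc"])
    print("json" in sys.modules)
```

Once a module is fully imported, later lookups by the unpickler find the finished module in sys.modules and no import happens during unpickling.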

      Opening this for visibility. Current open questions are:

      1. Find a minimal example to reproduce this issue.
      2. Figure out whether users are still affected by this issue on Python 3.7.3.
      3. Communicate a workaround to affected users on Python 3.5 and 3.6.
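
For the first open question, one possible starting point is a stress harness along the following lines. It is a sketch under stated assumptions, not a confirmed reproduction of this issue and not the attached script: it uses the standard pickle module rather than dill, and the module name `beam8651_slow_module`, its contents, and the helper `run_concurrent_unpickle` are hypothetical. The idea is to make a module slow to import, forget it, and then unpickle the same payload from several threads so that they race on the import.

```python
import importlib
import os
import pickle
import sys
import tempfile
import threading

# Hypothetical module name and source, for illustration only.
MODULE_NAME = "beam8651_slow_module"
MODULE_SOURCE = (
    "import time\n"
    "time.sleep(0.2)  # widen the window in which the module is half-imported\n"
    "class ClassName(object):\n"
    "    pass\n"
)


def run_concurrent_unpickle(num_threads=8):
    """Unpickle one payload from several threads and return the errors seen."""
    errors = []
    with tempfile.TemporaryDirectory() as tmp_dir:
        with open(os.path.join(tmp_dir, MODULE_NAME + ".py"), "w") as f:
            f.write(MODULE_SOURCE)
        sys.path.insert(0, tmp_dir)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module(MODULE_NAME)
            payload = pickle.dumps(module.ClassName())
            # Forget the module so the unpickler has to import it again.
            del sys.modules[MODULE_NAME]

            def worker():
                try:
                    pickle.loads(payload)
                except Exception as exc:  # record any failure from unpickling
                    errors.append(exc)

            threads = [threading.Thread(target=worker) for _ in range(num_threads)]
            for thread in threads:
                thread.start()
            for thread in threads:
                thread.join()
        finally:
            sys.path.remove(tmp_dir)
            sys.modules.pop(MODULE_NAME, None)
    return errors


if __name__ == "__main__":
    print(sys.version_info[:3], run_concurrent_unpickle())
```

Going by the quoted description, the returned list would be expected to stay empty on interpreters where the unpickler waits for the import to finish, and might contain AttributeError instances on the affected 3.7.x releases. Running it on 3.5 and 3.6 would help show whether those reports have the same cause.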

      [1]
      https://lists.apache.org/thread.html/5581ddfcf6d2ae10d25b834b8a61ebee265ffbcf650c6ec8d1e69408@%3Cdev.beam.apache.org%3E

        Attachments

        1. beam8651.py
          0.4 kB
          Guenther Starnberger


              People

              • Assignee: Valentyn Tymofieiev (tvalentyn)
              • Reporter: Valentyn Tymofieiev (tvalentyn)
              • Votes: 0
              • Watchers: 5

                  Time Tracking

                  • Original Estimate: Not Specified
                  • Remaining Estimate: 0h
                  • Time Spent: 6h 10m