Beam / BEAM-1251 (Python 3 Support) / BEAM-7540

deadlock using save_main_session and logging caused by threading.RLock pickling


    Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: sdk-py-core
    • Labels:
      None
    • Environment:
      Python 3.5
      Linux
      apache-beam 2.12.0 & 2.13.0
      dill 0.2.9

      Description

      If you set save_main_session = True and have a logging.Logger instance in your __main__ module, the process will hang and never exit when you call a logger method after Pipeline.run has been called.
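Because the hang produces no error output, it can help to capture a traceback automatically instead of waiting to press Ctrl-C. The sketch below uses the standard library's faulthandler.dump_traceback_later to arm a one-shot watchdog around a single logging call. The helper name log_with_watchdog and the 30-second default are illustrative assumptions, not part of Beam or the logging module.

```python
import faulthandler
import sys


def log_with_watchdog(logger, msg, timeout=30.0, file=None):
    """Log `msg` at INFO level; if the call is still blocked after `timeout`
    seconds, dump every thread's traceback to `file` and exit the process.

    Hypothetical diagnostic helper, not part of Beam or the logging module.
    """
    if file is None:
        file = sys.stderr
    # Arm a one-shot watchdog. faulthandler runs it on its own native thread,
    # so it still fires while this thread is blocked waiting on a lock.
    faulthandler.dump_traceback_later(timeout, exit=True, file=file)
    try:
        logger.info(msg)
    finally:
        # The call returned (or raised) before the deadline, so disarm it.
        faulthandler.cancel_dump_traceback_later()
```

In the reproduction pipeline, calling log_with_watchdog(_log, "Pipeline done") in place of the final _log.info call should, if the handler lock is stuck, write the tracebacks of all threads (including the blocked self.lock.acquire() frame) and exit with status 1 rather than hang indefinitely.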

      A Python 3 pipeline that reproduces the error (the code is also available at https://gist.github.com/joar/f021db55eca4fa9e9fd7dfd67cc011b9):

      import logging
      
      import apache_beam as beam
      from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions
      
      _log = logging.getLogger(__name__)
      
      
      def main(argv=None):
          logging.basicConfig(level=logging.INFO)
      
          pipeline_options = PipelineOptions(argv)
      
          setup_options = pipeline_options.view_as(SetupOptions)  # type: SetupOptions
          setup_options.save_main_session = True
      
          _log.info("Running pipeline")
      
          with beam.Pipeline(runner="DirectRunner", options=pipeline_options) as p:
              p | beam.Create(["hello", "world"]) | beam.Map(lambda x: print(x))
      
          print("""
          Call to _log.info will now deadlock, since the logging handler's
          threading.RLock() has been passed through dill.
          
          When you press Ctrl-C, the traceback should confirm that the process is 
          stuck at:
          
            File "/usr/lib/python3.5/logging/__init__.py", line 810, in acquire
              self.lock.acquire()
          """)
          _log.info("Pipeline done")
          print("Launching nukes")
      
      
      if __name__ == '__main__':
          main()
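The hang described in the printed message comes from the way logging.Handler serializes output: Handler.handle() acquires handler.lock (an RLock) before calling emit(). If that lock is left in an acquired state that nothing will ever release, every later log call blocks inside acquire(), which matches the traceback quoted above. The stdlib-only sketch below simulates that state by simply holding the lock in the main thread; it does not reproduce dill's unpickling path. The helper handler_lock_is_free is hypothetical: it probes the lock from a separate thread with a timeout, so the check itself cannot hang.

```python
import logging
import threading


def handler_lock_is_free(handler, timeout=0.5):
    """Return True if another thread could acquire `handler.lock` within
    `timeout` seconds, i.e. logging through `handler` would not block."""
    result = []

    def probe():
        # Use the lock directly so we can pass a timeout instead of
        # blocking forever the way Handler.handle() would.
        acquired = handler.lock.acquire(timeout=timeout)
        if acquired:
            handler.lock.release()
        result.append(acquired)

    t = threading.Thread(target=probe)
    t.start()
    t.join()
    return result[0]


handler = logging.StreamHandler()
print(handler_lock_is_free(handler))  # True: nobody holds the lock

# Simulate the broken state: the lock is held by a thread other than the one
# trying to log. Here the main thread holds it; the probe runs elsewhere.
handler.lock.acquire()
print(handler_lock_is_free(handler))  # False: a log call would block here
handler.lock.release()
```

Pointing the same probe at a handler reachable from the affected logger (for example through its parent's handlers) is one possible way to check whether it has ended up in this state, without risking another hang.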
      

      I have also opened an issue with dill: https://github.com/uqfoundation/dill/issues/321

      This issue does not occur on Python 2.

      To be clear, a workaround is to not set save_main_session = True.


    People

    • Assignee: Valentyn Tymofieiev (tvalentyn)
    • Reporter: Joar Wandborg (joar)
    • Votes: 0
    • Watchers: 2
