Beam / BEAM-14514

Beam Python SDK ignores the pickle_library option in pipeline.run()

Details

    • Type: Bug
    • Status: Open
    • Priority: P2
    • Resolution: Unresolved
    • Affects Version/s: 2.38.0
    • Fix Version/s: None
    • Component/s: sdk-py-core
    • Labels: None

    Description

      Context:

      In the Python SDK, the pipeline option --pickle_library selects the library used to pickle variables so they can be sent from the launching machine to the workers (when save_main_session is True).

      Issue:

      The pickle_library option is ignored in pipeline.run(), which falls back to dill (the default library).

      https://github.com/apache/beam/blob/master/sdks/python/apache_beam/pipeline.py#L570
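
      To make the difference between the reported and the expected behavior concrete, here is a toy model of the dispatch. It is not Beam's code: every name in it (SESSION_DUMPERS, dump_session_reported, dump_session_expected) is hypothetical, and the "dumpers" are stand-ins that only report which library would have been used.

```python
from typing import Callable, Dict, Optional


# Hypothetical stand-ins for real session dumpers. Each one only reports
# which library would have handled the dump; nothing is pickled.
def _dump_with_dill(path: str) -> str:
    return f"dill -> {path}"


def _dump_with_cloudpickle(path: str) -> str:
    return f"cloudpickle -> {path}"


SESSION_DUMPERS: Dict[str, Callable[[str], str]] = {
    "dill": _dump_with_dill,
    "cloudpickle": _dump_with_cloudpickle,
}
DEFAULT_LIBRARY = "dill"


def dump_session_reported(path: str, pickle_library: Optional[str] = None) -> str:
    """Models the reported behavior: the option is accepted but never read."""
    return SESSION_DUMPERS[DEFAULT_LIBRARY](path)


def dump_session_expected(path: str, pickle_library: Optional[str] = None) -> str:
    """Models the expected behavior: use the requested library, else the default."""
    name = pickle_library or DEFAULT_LIBRARY
    try:
        dumper = SESSION_DUMPERS[name]
    except KeyError:
        raise ValueError(f"Unknown pickle library: {name!r}") from None
    return dumper(path)


print(dump_session_reported("main_session.pickle", "cloudpickle"))
print(dump_session_expected("main_session.pickle", "cloudpickle"))
```

      In this model the first call still goes through the dill stand-in even though cloudpickle was requested, which mirrors the symptom described in this issue; the second call is what a user passing --pickle_library cloudpickle would expect.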

      Reproduce:

      Add --pickle_library cloudpickle to the pipeline options and observe that dill is still used for the main session dump, even though cloudpickle was requested.

      I found this because dill throws an exception for my use case, but cloudpickle does not.
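
      The reproduction steps above could be scripted roughly as follows. This is a minimal sketch, assuming apache_beam is installed and that the DirectRunner is sufficient to trigger the session dump; the helper names reproduction_flags and run_reproduction are hypothetical, and the trivial pipeline is only a placeholder for a real one.

```python
from typing import List


def reproduction_flags() -> List[str]:
    """Flags from the report, plus an assumed local runner."""
    return [
        "--runner=DirectRunner",  # assumption: any runner goes through pipeline.run()
        "--save_main_session",
        "--pickle_library", "cloudpickle",
    ]


def run_reproduction() -> None:
    # Imported lazily so that this module can be loaded without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions

    options = PipelineOptions(reproduction_flags())
    setup = options.view_as(SetupOptions)
    # Confirms the option was parsed; it says nothing about which library
    # the session dump actually uses.
    print("pickle_library:", setup.pickle_library)
    print("save_main_session:", setup.save_main_session)

    # Leaving the `with` block calls pipeline.run(), where, per this report,
    # the main session is dumped with dill regardless of the option.
    with beam.Pipeline(options=options) as pipeline:
        _ = pipeline | beam.Create([1, 2, 3]) | beam.Map(lambda x: x + 1)
```

      Calling run_reproduction() with a main session that dill cannot serialize should surface the dill exception described above; the placeholder pipeline alone is not expected to fail.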

          People

            Ryan Thompson
            dctelus