Details
Type: Bug
Status: Open
Priority: P2
Resolution: Unresolved
2.38.0
None
None
Description
Context:
In the Python SDK, the pipeline option --pickle_library selects the library used to pickle variables so they can be sent from the executing machine to the workers (when save_main_session is True).
Issue:
The pickle_library option is ignored in the pipeline.run() function, which falls back to dill (the default library).
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/pipeline.py#L570
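To make the reported behavior concrete, here is a minimal toy model, not Beam's actual code. It uses only the standard library, and the names `parse_options`, `session_pickler_reported`, and `session_pickler_expected` are hypothetical, introduced purely for illustration. It contrasts a session dump that never consults the option with one that honors it.

```python
import argparse

# The default library named in this report.
DEFAULT_LIBRARY = "dill"


def parse_options(argv):
    """Parse the two flags discussed in this report (hypothetical helper)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--pickle_library", default=DEFAULT_LIBRARY)
    parser.add_argument("--save_main_session", action="store_true")
    options, _unknown = parser.parse_known_args(argv)
    return options


def session_pickler_reported(options):
    """Model of the reported behavior: the option is never consulted."""
    return DEFAULT_LIBRARY if options.save_main_session else None


def session_pickler_expected(options):
    """Model of the expected behavior: the requested library is used."""
    return options.pickle_library if options.save_main_session else None


options = parse_options(
    ["--pickle_library", "cloudpickle", "--save_main_session"])
print(session_pickler_reported(options))  # dill
print(session_pickler_expected(options))  # cloudpickle
```

In this sketch the option is parsed correctly in both cases. The only difference is whether the session-dump step reads it, which is the gap this issue describes.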
Reproduce:
Add --pickle_library cloudpickle to the pipeline options and observe that dill is still used for the session dump, even though cloudpickle was requested.
I found this because dill raises an exception for my use case, but cloudpickle doesn't.