  Beam / BEAM-1251

Python 3 Support

    Details

    • Type: Improvement
    • Status: In Progress
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: sdk-py-core
    • Labels: None

      Description

      I have been trying to use Google Datalab with Python 3. As far as I can see, several of the packages that Google Datalab depends on do not support Python 3 yet. This is one of them.

      https://github.com/GoogleCloudPlatform/DataflowPythonSDK/issues/6

        Attachments

          Issue Links

          No. Summary | Type | Status | Assignee | Time Spent (where logged)

          1. Support Python native types in Beam typehints | Sub-task | Open | Udi Meiri | Time Spent: 0.5h
          2. Make the coders package compatible with Python 3 | Sub-task | Resolved | Luke Zhu
          3. Enable tests to run in Python 3 | Sub-task | Closed | Luke Zhu | Time Spent: 4.5h
          4. Finish io futurize stage 2: fix the missing pylint3 check in tox.ini | Sub-task | Resolved | Matthias Feys | Time Spent: 2.5h
          5. Create a tox environment that uses Py3 interpreter for pre/post commit test suites, once codebase supports Py3. | Sub-task | Resolved | Matthias Feys | Time Spent: 2h
          6. Add an SDK harness container with Python 3 interpreter for portable pipelines. | Sub-task | Resolved | Valentyn Tymofieiev | Time Spent: 40m
          7. Exercise Python 3 SDK harness container in ValidatesContainer Jenkins test suite. | Sub-task | Resolved | Mark Liu | Time Spent: 4h 10m
          8. Finish Python 3 porting for coders module | Sub-task | Resolved | Robbe | Time Spent: 4h
          9. Finish Python 3 porting for examples module | Sub-task | Resolved | Robbe | Time Spent: 2.5h
          10. Finish Python 3 porting for internal module | Sub-task | Resolved | Robbe | Time Spent: 0.5h
          11. Finish Python 3 porting for io module | Sub-task | Closed | Juta Staes | Time Spent: 22h 10m
          12. Finish Python 3 porting for metrics module | Sub-task | Resolved | Robbe
          13. Finish Python 3 porting for options module | Sub-task | Resolved | Manu Zhang | Time Spent: 3h 40m
          14. Finish Python 3 porting for portability module | Sub-task | Resolved | Robbe
          15. Finish Python 3 porting for runners module | Sub-task | Resolved | Robbe | Time Spent: 6h 20m
          16. Finish Python 3 porting for testing module | Sub-task | Resolved | Robbe | Time Spent: 5h 50m
          17. Finish Python 3 porting for transforms module | Sub-task | Resolved | Robbe | Time Spent: 3h
          18. Finish Python 3 porting for typehints module | Sub-task | Closed | Robbe | Time Spent: 1.5h
          19. Finish Python 3 porting for utils module | Sub-task | Resolved | Robbe | Time Spent: 1h
          20. Finish Python 3 porting for unpackaged files | Sub-task | Resolved | Robbe | Time Spent: 4h 10m
          21. Add tox suites to exercise unit tests using Python3 interpreter with cython, and with gcp dependencies. | Sub-task | Resolved | Robbe | Time Spent: 2h 50m
          22. Several tests fail on Python 3 with TypeError: 'cmp' is an invalid keyword argument for this function | Sub-task | Closed | Unassigned | Time Spent: 3.5h
          23. Several tests fail on Python 3 with Failed assert: [<some number>] == [nan] | Sub-task | Resolved | Robbe
          24. Side inputs don't work on Python 3 | Sub-task | Closed | Robert Bradshaw | Time Spent: 1.5h
          25. Several tests fail on Python 3 with: unsupported operand type(s) for +: 'int' and 'EmptySideInput' | Sub-task | Resolved | Unassigned
          26. Some tests use assertItemsEqual method, not available in Python 3 | Sub-task | Resolved | Matthias Feys | Time Spent: 0.5h
          27. Several tests fail on Python 3 with TypeError: unorderable types: str() < int() | Sub-task | Resolved | Robbe | Time Spent: 6h 20m
          28. Several tests fail on Python 3 with: Runtime type violation detected | Sub-task | Closed | Unassigned
          29. Several IO tests hang indefinitely during execution on Python 3. | Sub-task | Resolved | Robbe | Time Spent: 50m
          30. Avro IO does not work with avro-python3 package out-of-the-box on Python 3, several tests fail with AttributeError (module 'avro.schema' has no attribute 'parse') | Sub-task | Resolved | Simon | Time Spent: 1h
          31. Several IO tests fail in Python 3 with RuntimeError('dictionary changed size during iteration',)} | Sub-task | Resolved | Ruoyun Huang | Time Spent: 6h
          32. Redesign test_split_at_fraction_exhaustive tests for Python 3 | Sub-task | In Progress | Unassigned | Time Spent: 4.5h
          33. Several VcfIO tests fail in Python 3 with TypeError: cannot use a string pattern on a bytes-like object | Sub-task | Open | Simon
          34. Several typehints tests fail on Python 3 with ValueError: no signature found for builtin <method 'upper' of 'str' objects> | Sub-task | Resolved | Robbe
          35. Add tox suites for various Python 3 versions (3.5, 3.6, 3.7) | Sub-task | Closed | Robbe | Time Spent: 5h 20m
          36. Default coder breaks with large ints on Python 3 | Sub-task | Resolved | Robert Bradshaw | Time Spent: 1h 40m
          37. Disable compare parameter in Top.Of() combiner when executing in Python 3. | Sub-task | Resolved | Robert Bradshaw | Time Spent: 1h
          38. Util test on annotations fails | Sub-task | Resolved | Ruoyun Huang | Time Spent: 5h 10m
          39. Using methods in map is broken on Python 3 | Sub-task | Resolved | Unassigned
          40. Validates runner tests fail with: Cannot convert bytes value to JSON value | Sub-task | Closed | Mark Liu
          41. wordcount_fnapi_it failed on TestDataflowRunner because of JSON string decoding error | Sub-task | Resolved | Mark Liu | Time Spent: 4h 50m
          42. Support DoFns with Keyword-only arguments in Python 3. | Sub-task | In Progress | yoshiki obata | Time Spent: 8h
          43. TFRecordio not Py3 compatible | Sub-task | Closed | Robbe | Time Spent: 1h 50m
          44. Enable WordCount example on DataflowRunner on Python 3 | Sub-task | Resolved | Mark Liu | Time Spent: 4h 40m
          45. Gradle setupVirtualenv supports Python 3 | Sub-task | Resolved | Mark Liu | Time Spent: 3h 50m
          46. Revert dill pip install from github commit | Sub-task | Resolved | Valentyn Tymofieiev
          47. Gcsio batch delete broken in Python 3 | Sub-task | Resolved | Mark Liu | Time Spent: 2h
          48. Using --save_main_session fails on Python 3 when main module has superclass constructor calls. | Sub-task | Open | Valentyn Tymofieiev | Time Spent: 3.5h
          49. Opcounters sampling test fails for some random seeds on Python3 | Sub-task | Resolved | Robbe
          50. TypeError in DataflowRunner: dict_values does not support indexing | Sub-task | Resolved | Mark Liu | Time Spent: 40m
          51. Dill fails to pickle avro.RecordSchema classes on Python 3. | Sub-task | Open | Valentyn Tymofieiev | Time Spent: 7.5h
          52. Parallel tox (unit) tests run on Jenkins | Sub-task | Closed | Mark Liu | Time Spent: 12h 10m
          53. BigQuery IO does not work in Python 3 | Sub-task | Resolved | Valentyn Tymofieiev | Time Spent: 1h
          54. TypeHints Py3 Error: TrivialInferenceTest.testTupleListComprehension fails on Python 3 | Sub-task | Resolved | Udi Meiri
          55. GCS IO tests are very flaky under Python 3.5 | Sub-task | Resolved | Juta Staes | Time Spent: 3h 50m
          56. Dataflow Python runner should use a Python-3 compatible container when starting a Python 3 pipeline. | Sub-task | Resolved | Valentyn Tymofieiev | Time Spent: 0.5h
          57. Add integration test on DirectRunner in Python 3 | Sub-task | Resolved | Mark Liu | Time Spent: 1h
          58. Beam Python SDK release qualification should verify supported Python 3 versions. | Sub-task | Closed | Valentyn Tymofieiev | Time Spent: 4h 20m
          59. Stager should stage Python 3 wheels for Beam SDK once they are released. | Sub-task | Closed | Valentyn Tymofieiev | Time Spent: 20m
          60. Release Python 3 wheels with first Beam SDK release that supports Python 3. | Sub-task | Closed | Robert Bradshaw
          61. Add PostCommit suite for integration tests on DataflowRunner | Sub-task | Resolved | Mark Liu | Time Spent: 26h 20m
          62. Exercise Dataflow runner integration tests in a postcommit suite for Python 3.5 and 3.6 | Sub-task | Resolved | Juta Staes | Time Spent: 3h 10m
          63. Dataflow ValidatesRunner test suite should also exercise ValidatesRunner tests under Python 3. | Sub-task | Closed | Frederik Bode | Time Spent: 14h 40m
          64. Exercise direct runner integration tests in a postcommit suite for Python 3.5 and 3.6. | Sub-task | Resolved | Juta Staes
          65. SDK source tarball is different when created on Python 2 and Python 3 | Sub-task | Resolved | Valentyn Tymofieiev
          66. Typehinting depends on typing changes in Python 3.5.3 | Sub-task | Resolved | Robbe | Time Spent: 1.5h
          67. Bigquery Tornadoes IT is broken in Python3 PostCommit test suite. | Sub-task | Resolved | Pablo Estrada | Time Spent: 6h 50m
          68. Block size difference in avro library on Python3 causes some AvroIO tests to fail. | Sub-task | Closed | Valentyn Tymofieiev | Time Spent: 1h 20m
          69. BigQuery IO does not support bytes in Python 3 | Sub-task | Closed | Juta Staes | Time Spent: 20h 50m
          70. Add Streaming wordcount test to Dataflow ValidatesContainer test suite | Sub-task | Open | Unassigned
          71. python 3 test_hourly_team_score_it fails with bigquery job id already exists | Sub-task | Closed | Unassigned
          72. test_multimap_side_input in fn_api.runner_test fails on Python 3.6 | Sub-task | Closed | Robbe
          73. Add Python3 performance benchmarks | Sub-task | Resolved | Mark Liu | Time Spent: 16h 10m
          74. Configurable Python interpreter version in Gradle | Sub-task | Closed | Mark Liu | Time Spent: 2h
          75. Design Py3-compatible typehints annotation support in Beam 3. | Sub-task | Open | Udi Meiri | Time Spent: 11h 50m
          76. Enable use_fastavro experiment on Dataflow Runner for all Py3 jobs. | Sub-task | Resolved | Frederik Bode
          77. Add DirectRunnerIT test suite to Python3 Postcommit suite. | Sub-task | Closed | Juta Staes
          78. TypeError caused by using str variable as header argument in apache_beam.io.textio.WriteToText | Sub-task | Resolved | yoshiki obata | Time Spent: 2h
          79. Rename ToStringCoder into ToBytesCoder | Sub-task | Open | Francesco Perera
          80. Dataflow runner should set use_fastavro experiment on Python 3. | Sub-task | Resolved | Valentyn Tymofieiev | Time Spent: 2h 40m
          81. Add ValidatesRunner test suite for Flink on Python 3. | Sub-task | Open | Valentyn Tymofieiev | Time Spent: 2.5h
          82. Enable Python3 tests for Spark | Sub-task | Open | Kyle Weaver
          83. Support Py3 Dataclasses | Sub-task | Closed | yoshiki obata | Time Spent: 1h
          84. Revise BQ integration tests to clearly communicate that BQ IO expects base64-encoded bytes. | Sub-task | Resolved | Juta Staes
          85. apache_beam.io.avroio_test.TestAvro.test_dynamic_work_rebalancing_exhaustive is very slow | Sub-task | Resolved | Valentyn Tymofieiev | Time Spent: 1h 40m
          86. Clean up Python 2 codepaths once Beam no longer supports Python 2. | Sub-task | Open | Unassigned
          87. FastAvroTest has slow test_dynamic_exhaustive on Python 2 and 3. | Sub-task | Closed | Unassigned
          88. Create a Wordcount-on-Flink Python 3 test suite. | Sub-task | Resolved | Valentyn Tymofieiev | Time Spent: 5h 10m
          89. Document Python 3 support in Beam starting from 2.14.0 | Sub-task | Closed | Rose Nguyen | Time Spent: 2h 10m
          90. Add Python 3.6, 3.7 as supported qualifiers to setup.py. | Sub-task | Resolved | Valentyn Tymofieiev | Time Spent: 1.5h
          91. Improve Avro IO integration test coverage on Python 3. | Sub-task | Open | Valentyn Tymofieiev | Time Spent: 5h 10m
          92. Add smoke integration tests to Precommit test suites on Python 3 | Sub-task | Closed | Valentyn Tymofieiev | Time Spent: 1h
          93. Add SDK harness containers for Py 3.6, Py 3.7 | Sub-task | Open | Hannah Jiang
          94. deadlock using save_main_session and logging | Sub-task | Open | Valentyn Tymofieiev
          95. Add integration tests for HDFS | Sub-task | Resolved | Frederik Bode | Time Spent: 5h 10m
          96. Add ITs to check IO behavior with bytes and unicode strings | Sub-task | Resolved | Juta Staes | Time Spent: 2h 50m
          97. Accept Py3 wheels in SDK harness container. | Sub-task | Open | Unassigned
          98. Unify test suite configuration structure across Py2 and Py 3 suites | Sub-task | Closed | Mark Liu | Time Spent: 3h
          99. Python 3 test parallelization causes test flakines due to ModuleNotFoundError. | Sub-task | Resolved | Mark Liu
          100. Implement support of PEP 484 annotations for user functions in transforms such as ParDo, Combine in Py3. | Sub-task | Open | Udi Meiri
          101. Migrate to "typing" module typing types in Beam typehints (on Py2 and Py3). | Sub-task | Open | Udi Meiri
          102. Allow retries of PostCommit test suites per Python version | Sub-task | Resolved | Valentyn Tymofieiev | Time Spent: 2h 40m
          103. Use a Python3-compatible profiler in apache_beam.utils.profiler | Sub-task | Open | Unassigned
          104. Add key type conversion in from and to client entity in Datastore v1new IO. | Sub-task | Closed | Udi Meiri | Time Spent: 1h 40m
          105. Add Python 2 deprecation warnings starting from 2.17.0 release. | Sub-task | Open | Unassigned
          106. Generate Python SDK docs using Python 3 | Sub-task | Open | Unassigned
          107. UserScore example fails on Python 3.5 as of 2.13.0 and 2.14.0 with Dataflow runner | Sub-task | Closed | Unassigned
          108. Run pylint in Python 3 | Sub-task | Open | Unassigned
          109. Add a Python 3 test scenario for MongoDB IO | Sub-task | Open | Yichi Zhang

            Activity

              People

              • Assignee: Valentyn Tymofieiev (tvalentyn)
              • Reporter: Eyad Sibai (eyad.alsibai@gmail.com)
              • Votes: 39
              • Watchers: 63

                Dates

                • Created:
                  Updated:

                  Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 343h 20m