  Beam / BEAM-13905

Apache Beam Python: Dataframe Transforms break when the option runtime_type_check is enabled.

Details

    • Type: Bug
    • Status: Open
    • Priority: P2
    • Resolution: Unresolved
    • Affects Version/s: 2.35.0, 2.36.0, 2.37.0, 2.38.0
    • Fix Version/s: None
    • Component/s: dsl-dataframe
    • Labels: None
    • Environment: OS: Linux
      Python 3.8.12

    Description

      We have discovered a potential bug: when you execute a pipeline that contains
      a DataframeTransform with the "runtime_type_check" option set to True, Apache
      Beam's type checking raises a cryptic error.

      Simple example to reproduce the bug:
          

      from apache_beam import Pipeline, Create, Row
      from apache_beam.dataframe.transforms import DataframeTransform
      from apache_beam.options.pipeline_options import PipelineOptions

      # Enabling runtime type checking is what triggers the error.
      pipeline = Pipeline(options=PipelineOptions(runtime_type_check=True))
      # An identity DataframeTransform over a single schema'd Row is enough.
      pipeline | Create([Row(val1=1)]) | DataframeTransform(lambda df: df)
      pipeline.run()

      This raises an apache_beam.typehints.decorators.TypeCheckError:

      File ".....lib/python3.8/site-packages/apache_beam/typehints/typehints.py", line 416, in check_constraint
          raise SimpleTypeHintError
      apache_beam.typehints.decorators.TypeCheckError: According to type-hint expected output should be of type <class 'apache_beam.typehints.schemas.BeamSchema_118086df_671f_4643_a929_ba65de48e7e8'>. Instead, received 'BeamSchema_118086df_671f_4643_a929_ba65de48e7e8(val1=1)', an instance of type <class 'apache_beam.typehints.schemas.BeamSchema_118086df_671f_4643_a929_ba65de48e7e8'>. [while running 'DataframeTransform/Unbatch 'placeholder_DataFrame_140623617251840'/ParDo(_UnbatchNoIndex)'] 
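      The message is confusing because the expected and received types print with the same name. One possible reading, which is an assumption and not something confirmed in this report, is that two distinct class objects sharing one generated name are being compared. The sketch below uses only the standard library to show how that situation produces an identical repr but a failing isinstance check. The helper make_schema_type and the name BeamSchema_example are hypothetical stand-ins, not Beam APIs.

```python
from typing import NamedTuple


def make_schema_type(type_name: str):
    """Build a fresh NamedTuple class; each call returns a distinct class object."""
    return NamedTuple(type_name, [("val1", int)])


# Hypothetical stand-ins for a generated BeamSchema_<uuid> class, created twice
# under the same name. Whether Beam actually does this is an assumption.
expected_type = make_schema_type("BeamSchema_example")
received_type = make_schema_type("BeamSchema_example")

row = received_type(val1=1)

# The two classes print identically...
same_repr = repr(expected_type) == repr(received_type)
# ...but they are different objects, so an isinstance-based check fails.
same_class = expected_type is received_type
passes_check = isinstance(row, expected_type)

print(same_repr, same_class, passes_check)
```

      If this reading applies, it would explain why the error text looks self-contradictory, but the actual cause inside the Unbatch step would still need to be confirmed against the Beam source.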

       

          People

            bhulette Brian Hulette
            benwah Benoit Clennett-Sirois
