Beam / BEAM-6068

Wordcount example fails to read the Shakespeare text file from GCS

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P2
    • Resolution: Not A Problem
    • Affects Version/s: None
    • Fix Version/s: 2.9.0
    • Component/s: sdk-py-core
    • Labels: None

    Description

      Symptom: 

      In a repo synced to head, the following command fails:

      python -m apache_beam.examples.wordcount \
        --input gs://dataflow-samples/shakespeare/kinglear.txt \
        --output gs://$USER-test/tmp \
        --runner DataflowRunner \
        --project google.com:clouddfe \
        --temp_location gs://$USER-test/temp-it \
        --experiment beam_fn_api \
        --sdk_location dist/apache-beam-2.9.0.dev0.tar.gz

       

      The error message is:

      File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
        "__main__", fname, loader, pkg_name)
      File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
        exec code in run_globals
      File "/usr/local/google/home/ruoyun/projects/beam2/sdks/python/apache_beam/examples/wordcount.py", line 136, in <module>
        run()
      File "/usr/local/google/home/ruoyun/projects/beam2/sdks/python/apache_beam/examples/wordcount.py", line 90, in run
        lines = p | 'read' >> ReadFromText(known_args.input)
      File "apache_beam/io/textio.py", line 524, in __init__
        skip_header_lines=skip_header_lines)
      File "apache_beam/io/textio.py", line 119, in __init__
        validate=validate)
      File "apache_beam/io/filebasedsource.py", line 121, in __init__
        self._validate()
      File "apache_beam/options/value_provider.py", line 137, in _f
        return fnc(self, *args, **kwargs)
      File "apache_beam/io/filebasedsource.py", line 178, in _validate
        match_result = FileSystems.match([pattern], limits=[1])[0]
      File "apache_beam/io/filesystems.py", line 187, in match
        return filesystem.match(patterns, limits)
      File "apache_beam/io/filesystem.py", line 705, in match
        raise BeamIOError("Match operation failed", exceptions)
      apache_beam.io.filesystem.BeamIOError: Match operation failed with exceptions {'gs://dataflow-samples/shakespeare/kinglear.txt': TypeError("__init__() got an unexpected keyword argument 'response_encoding'",)}
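
The wrapped TypeError says some constructor was called with a `response_encoding` keyword it does not accept. This usually means the calling code and the installed version of the class it instantiates disagree about the signature. The sketch below shows one way to check whether a given callable accepts a keyword. `accepts_keyword`, `OldClient`, and `NewClient` are hypothetical names for illustration, not Beam or GCS client APIs, and the trace does not show which class is actually involved.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` can be called with keyword argument `name`."""
    params = inspect.signature(func).parameters
    param = params.get(name)
    if param is not None and param.kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY):
        return True
    # A **kwargs catch-all accepts any keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD
               for p in params.values())


# Hypothetical stand-ins for an older and a newer version of a client class.
class OldClient(object):
    def __init__(self, url, credentials=None):
        self.url = url


class NewClient(object):
    def __init__(self, url, credentials=None, response_encoding=None):
        self.url = url


print(accepts_keyword(OldClient, "response_encoding"))  # False
print(accepts_keyword(NewClient, "response_encoding"))  # True
```

Running the same check against the real class in the failing environment would show whether the installed library predates the `response_encoding` parameter. Note that `inspect.signature` requires Python 3, while the trace above comes from Python 2.7.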

       

       

      However, a similar command runs successfully after reverting to the 2.8 release and rebuilding everything. This command succeeds:

      python -m apache_beam.examples.wordcount \
        --input=gs://dataflow-samples/shakespeare/kinglear.txt \
        --output=gs://test$USER/portable/ \
        --runner DataflowRunner \
        --project $GCP_PROJECT \
        --staging_location gs://test$USER/staging_wc \
        --temp_location gs://test$USER/tmp \
        --sdk_location=./dist/apache-beam-2.8.0.dev0.tar.gz
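
Since one build works and the other does not, it can help to compare the package versions installed in the two environments. A minimal sketch, assuming Python 3.8+ for `importlib.metadata`; `installed_versions` is a hypothetical helper, and the choice of package names to inspect is an assumption, not something stated in this report.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Assumed packages of interest; adjust to whatever the failing call imports.
for name, version in installed_versions(
        ["apache-beam", "google-apitools"]).items():
    print(name, version)
```

Running this in both the working and the failing environment and comparing the output would show whether a dependency differs between them.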

       

       

       

       

          People

            Assignee: Mark Liu (markflyhigh)
            Reporter: Ruoyun Huang (ruoyun)