Uploaded image for project: 'Beam'
  1. Beam
  2. BEAM-5026

Portable flink wordcount fails sometimes due to non-existent source path in FileBasedSink._check_state_for_finalize_write

Details

    Description

      Running portable flink wordcount locally:

      In one terminal:

      ./gradlew :beam-runners-flink_2.11-job-server:runShadow

      In another:

      python -m apache_beam.examples.wordcount --harness_docker_image <image> --input /etc/profile --output /tmp/py-wordcount-direct --experiments=beam_fn_api --runner=PortableRunner --job_endpoint=localhost:8099 --sdk_location=container

      Typically, the first time I run this for a given job-server instance, I see a failure like this (full output):

      File "apache_beam/runners/common.py", line 661, in apache_beam.runners.common._OutputProcessor.process_outputs
      def process_outputs(self, windowed_input_element, results):
      File "apache_beam/runners/common.py", line 676, in apache_beam.runners.common._OutputProcessor.process_outputs
      for result in results:
      File "/usr/local/lib/python2.7/site-packages/apache_beam/io/iobase.py", line 1074, in <genexpr>
      return (window.TimestampedValue(v, window.MAX_TIMESTAMP) for v in outputs)
      File "/usr/local/lib/python2.7/site-packages/apache_beam/io/filebasedsink.py", line 271, in finalize_write
      self._check_state_for_finalize_write(writer_results, num_shards))
      File "/usr/local/lib/python2.7/site-packages/apache_beam/io/filebasedsink.py", line 249, in _check_state_for_finalize_write
      src, dst))
      BeamIOError: src and dst files do not exist. src: /tmp/beam-temp-py-wordcount-direct-6a0d8862908c11e88de8025000000001/5cfa9f22-9246-41fb-adef-ca04d5a5fe50.py-wordcount-direct, dst: /tmp/py-wordcount-direct-00000-of-00001 with exceptions None [while running 'write/Write/WriteImpl/FinalizeWrite'] with exceptions None
      

      This is after a fix to a slightly earlier failure in FileBasedSink documented on BEAM-4742 which I've been working on in #5903.

      It typically occurs only on the first run of wordcount against a given job-server instance.

      I'm curious whether others see this, whether it's some race condition in the FileBasedSink, LocalFileSystem, my macbook's disk, or somewhere else, or whether some temporary directory is getting created on the first run (for each job-server) that explains why subsequent wordcount runs succeed, etc.

      Attachments

        Issue Links

          Activity

            People

              angoenka Ankur Goenka
              rdub Ryan Williams
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 20m
                  20m