  1. Beam
  2. BEAM-9743

TFRecordCodec does not attempt to fully read/write

Details

    Description

      The same issue has been reported before and those issues were marked resolved, but parts of the problem still remain.

      https://issues.apache.org/jira/browse/BEAM-5412?jql=text%20~%20%22tfrecord%22

       

      Issue #1: TFRecordCodec tries only once to read the header/footer. This is likely to fail near the end of the channel buffer, where a single read may return fewer bytes than requested.

      Issue #2 (minor): TFRecordCodec currently does not check how many bytes it actually writes.

       

      This seems to happen only with Zstd compression (or any other picky input stream that declines to read fully). ZstdInputStream appears to be very reluctant to hand out data.
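To illustrate Issue #1, here is a minimal sketch of a read loop that keeps calling `read` until the buffer is full or the stream ends. It is not the TFRecordIO code: `ReadFullySketch`, `readFully`, and `PickyChannel` are hypothetical names, and `PickyChannel` is only a stand-in for a stream that hands out a few bytes per call (it is not ZstdInputStream). The 12-byte buffer assumes the usual TFRecord header size (8-byte length plus 4-byte CRC).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

public class ReadFullySketch {

  /**
   * Reads until dst is full or the channel reports end of stream.
   * Returns the number of bytes read; this is smaller than the space
   * requested only if end of stream was reached first.
   * Assumes a blocking channel (a non-blocking one could return 0 repeatedly).
   */
  static int readFully(ReadableByteChannel in, ByteBuffer dst) throws IOException {
    int total = 0;
    while (dst.hasRemaining()) {
      int n = in.read(dst);
      if (n < 0) {
        break; // end of stream
      }
      total += n;
    }
    return total;
  }

  /** Hypothetical "picky" channel: delivers at most 3 bytes per read call. */
  static final class PickyChannel implements ReadableByteChannel {
    private final ByteBuffer src;
    private boolean open = true;

    PickyChannel(byte[] data) {
      this.src = ByteBuffer.wrap(data);
    }

    @Override
    public int read(ByteBuffer dst) {
      if (!src.hasRemaining()) {
        return -1;
      }
      int n = Math.min(3, Math.min(src.remaining(), dst.remaining()));
      for (int i = 0; i < n; i++) {
        dst.put(src.get());
      }
      return n;
    }

    @Override
    public boolean isOpen() {
      return open;
    }

    @Override
    public void close() {
      open = false;
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] data = new byte[12];

    // A single read call gets only part of the header from the picky channel.
    ByteBuffer once = ByteBuffer.allocate(12);
    System.out.println("single read: " + new PickyChannel(data).read(once));

    // Looping fills the whole buffer.
    ByteBuffer header = ByteBuffer.allocate(12);
    System.out.println("readFully:   " + readFully(new PickyChannel(data), header));
  }
}
```

A caller that needs an exact number of bytes would compare the return value with the buffer capacity and treat a short count as a truncated record.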

      The affected code is at:

      https://github.com/apache/beam/blob/c7911043510a266078a3dc8faef7a1dbe1f598c5/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L672

      https://github.com/apache/beam/blob/c7911043510a266078a3dc8faef7a1dbe1f598c5/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L699

       

      The write path is less of a problem within Beam itself, since all (or most) of the WritableByteChannels in beam-java-sdk-core are backed by some OutputStream, but it still does not follow the WritableByteChannel specification:

      https://github.com/apache/beam/blob/c7911043510a266078a3dc8faef7a1dbe1f598c5/sdks/java/core/src/main/java/org/apache/beam/sdk/io/TFRecordIO.java#L720-L727

       

      The ReadableByteChannel/WritableByteChannel Javadoc specifies that channels are not required to read/write fully, and that a given call may transfer fewer bytes than requested, or none at all.
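For Issue #2, a write loop that respects this contract might look like the sketch below. Again, this is an illustration under assumptions, not the TFRecordIO implementation: `WriteFullySketch`, `writeFully`, and `PickySink` are hypothetical names, and the header/data/footer sizes in `main` are only there to show three consecutive writes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

public class WriteFullySketch {

  /**
   * Keeps calling write until src has no bytes left.
   * Assumes a blocking channel (a non-blocking one could accept 0 bytes repeatedly).
   */
  static void writeFully(WritableByteChannel out, ByteBuffer src) throws IOException {
    while (src.hasRemaining()) {
      out.write(src);
    }
  }

  /** Hypothetical sink that accepts at most 5 bytes per write call. */
  static final class PickySink implements WritableByteChannel {
    final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    private boolean open = true;

    @Override
    public int write(ByteBuffer src) {
      int n = Math.min(5, src.remaining());
      for (int i = 0; i < n; i++) {
        bytes.write(src.get());
      }
      return n;
    }

    @Override
    public boolean isOpen() {
      return open;
    }

    @Override
    public void close() {
      open = false;
    }
  }

  public static void main(String[] args) throws IOException {
    PickySink sink = new PickySink();
    ByteBuffer header = ByteBuffer.allocate(12); // illustrative header size
    ByteBuffer data = ByteBuffer.wrap(new byte[10]); // illustrative payload
    ByteBuffer footer = ByteBuffer.allocate(4); // illustrative footer size

    writeFully(sink, header);
    writeFully(sink, data);
    writeFully(sink, footer);

    System.out.println("bytes written: " + sink.bytes.size());
  }
}
```

With a single `out.write(src)` per buffer, the same sink would silently drop everything past the first 5 bytes of each part, which is the failure mode the loop guards against.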

            People

              lukemin89 Kyoungha Min

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 3h 10m