Beam / BEAM-2229

GcsFileSystem attempts to create invalid Metadata

Details

    • Type: Bug
    • Status: Resolved
    • Priority: P4
    • Resolution: Fixed
    • Affects Version/s: 2.0.0
    • Fix Version/s: 2.0.0
    • Component/s: io-java-gcp
    • Labels: None

    Description

      This is the first issue I've raised on Apache's JIRA; if I have made any mistakes in compiling this ticket then I apologise and would welcome any feedback.

      When matching a path spec, GcsFileSystem.toMetadata() will sometimes attempt to build an instance of org.apache.beam.sdk.io.fs.MatchResult.Metadata without first setting sizeBytes[1]. Whenever that happens, the AutoValue-generated builder for MatchResult.Metadata throws an error, because sizeBytes is a required field[2].
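      To make the failure mode concrete, here is a minimal sketch of a builder that checks for required properties at build time. It is a hand-written stand-in, not the generated Beam code: the class names are hypothetical, and resourceId is simplified to a String.

```java
// Hand-written stand-in for an AutoValue-style builder; NOT the generated Beam code.
final class MetadataBuilderSketch {
  /** Simplified value type; the real Metadata uses a ResourceId, not a String. */
  static final class Metadata {
    final String resourceId;
    final long sizeBytes;

    Metadata(String resourceId, long sizeBytes) {
      this.resourceId = resourceId;
      this.sizeBytes = sizeBytes;
    }
  }

  static final class Builder {
    private String resourceId; // null means "not set"
    private Long sizeBytes;    // boxed so that "not set" can be detected

    Builder setResourceId(String resourceId) {
      this.resourceId = resourceId;
      return this;
    }

    Builder setSizeBytes(long sizeBytes) {
      this.sizeBytes = sizeBytes;
      return this;
    }

    Metadata build() {
      // Collect every required property that was never set.
      StringBuilder missing = new StringBuilder();
      if (resourceId == null) missing.append(" resourceId");
      if (sizeBytes == null) missing.append(" sizeBytes");
      if (missing.length() > 0) {
        throw new IllegalStateException("Missing required properties:" + missing);
      }
      return new Metadata(resourceId, sizeBytes);
    }
  }

  public static void main(String[] args) {
    try {
      // sizeBytes is never set, as happens when no size is available.
      new Builder().setResourceId("gs://bucket/dir/").build();
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage()); // prints "Missing required properties: sizeBytes"
    }
  }
}
```

      Because the check runs inside build(), any code path that skips setSizeBytes() fails regardless of the other fields.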

      I propose that GcsFileSystem set sizeBytes to 0 when there is no size returned by GCS, which will presumably happen when the path spec refers either to a directory, or to a non-existent file. GcsFileSystem.toMetadata() could be updated as follows:

      Before

          if (size != null) {
            ret.setSizeBytes(size.longValue());
          }
      

      After

          if (size != null) {
            ret.setSizeBytes(size.longValue());
          } else {
            ret.setSizeBytes(0);
          }
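
      The fallback can also be isolated and checked on its own. This is a rough sketch only: it assumes the size arrives as a nullable BigInteger (consistent with the size.longValue() call above), and sizeBytesOrZero is a hypothetical helper name, not part of GcsFileSystem.

```java
import java.math.BigInteger;

final class SizeFallbackSketch {
  /** Returns the reported size, or 0 when no size was reported (null). */
  static long sizeBytesOrZero(BigInteger size) {
    return size != null ? size.longValue() : 0L;
  }

  public static void main(String[] args) {
    System.out.println(sizeBytesOrZero(BigInteger.valueOf(1024))); // prints 1024
    System.out.println(sizeBytesOrZero(null));                      // prints 0
  }
}
```

      With a helper of this shape, the builder call becomes unconditional (ret.setSizeBytes(sizeBytesOrZero(size))), so the required field is always set.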
      

      [1] https://github.com/apache/beam/blob/5bfd3e049c0ca0744165b0243a645e8e427032d5/sdks/java/extensions/google-cloud-platform-core/src/main/java/org/apache/beam/sdk/extensions/gcp/storage/GcsFileSystem.java#L240-L242
      [2] https://gist.github.com/joshdifabio/fe543b97e02e7ddac8edb73be38deb06#file-autovalue_matchresult_metadata-java-L102-L110

          People

            Assignee: Dan Halperin (dhalperi)
            Reporter: Josh Di Fabio (joshdifabio)

              Time Tracking

                Original Estimate: 0.5h
                Remaining Estimate: 0.5h
                Time Spent: Not Specified