FLINK-33694

GCS filesystem does not respect gs.storage.root.url config option


Details

      This fix resolves the issue where the `gs.storage.root.url` setting in the Hadoop configuration was ignored by the GCS Sink. Warning: if you already use this property to configure the GCS Source, verify that your tests and pipelines are not adversely affected now that the GCS Sink also honors it.

    Description

      The GCS FileSystem's RecoverableWriter implementation uses the GCS SDK directly rather than going through Hadoop. While support has been added to configure credentials correctly based on the standard Hadoop implementation configuration, no other options are passed through to the underlying client.

      Because this only affects the RecoverableWriter-related code paths, the FileSystem can behave in surprisingly different ways depending on whether it is used as a source or a sink: a FileSource with a gs:// URI may work fine while a FileSink with a gs:// URI may not work at all.

      We use fake-gcs-server in testing, and so we override the Hadoop GCS FileSystem config option gs.storage.root.url. However, because this option is not considered when creating the GCS client for the RecoverableWriter codepath, in a FileSink the GCS FileSystem attempts to write to the real GCS service rather than fake-gcs-server. At the same time, a FileSource works as expected, reading from fake-gcs-server.

      The fix should be fairly straightforward: read the gs.storage.root.url config option from the Hadoop FileSystem config in GSFileSystemOptions and, if set, pass it to storageOptionsBuilder in GSFileSystemFactory.
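
      A minimal sketch of the proposed logic, written without Flink, Hadoop, or GCS SDK dependencies so it can run on its own. The class `StorageRootUrlSketch`, the stand-in `ClientOptionsBuilder`, and the default endpoint are assumptions made for illustration; they are not the actual GSFileSystemOptions or GSFileSystemFactory code. In the real GCS SDK the equivalent override would presumably go through the options builder's `setHost`, but that mapping is an assumption here as well.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Sketch only: models the proposed fix with plain Java types. */
class StorageRootUrlSketch {

    /** Option name taken from the issue text. */
    static final String STORAGE_ROOT_URL_KEY = "gs.storage.root.url";

    /** Hypothetical stand-in for the GCS client options builder. */
    static final class ClientOptionsBuilder {
        // Assumed default endpoint; the issue itself does not state it.
        private String host = "https://storage.googleapis.com";

        ClientOptionsBuilder setHost(String host) {
            this.host = host;
            return this;
        }

        /** Returns the endpoint the client would talk to. */
        String build() {
            return host;
        }
    }

    /** Reads the override from a Hadoop-style key/value config; empty if unset or blank. */
    static Optional<String> storageRootUrl(Map<String, String> hadoopConfig) {
        return Optional.ofNullable(hadoopConfig.get(STORAGE_ROOT_URL_KEY))
                .map(String::trim)
                .filter(s -> !s.isEmpty());
    }

    /** Resolves the writer-side endpoint, applying the override only when it is set. */
    static String resolveEndpoint(Map<String, String> hadoopConfig) {
        ClientOptionsBuilder builder = new ClientOptionsBuilder();
        storageRootUrl(hadoopConfig).ifPresent(builder::setHost);
        return builder.build();
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // No override configured: the assumed default endpoint is used.
        System.out.println(resolveEndpoint(conf));

        // Example value only: an address where a local fake-gcs-server might listen.
        conf.put(STORAGE_ROOT_URL_KEY, "http://localhost:4443");
        System.out.println(resolveEndpoint(conf));
    }
}
```

      With the override applied on the writer side too, a FileSink and a FileSource configured with the same gs.storage.root.url would resolve to the same endpoint.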

      The only workaround for this is to build a custom flink-gs-fs-hadoop JAR with a patch and use it as a plugin.

          People

            plucas Patrick Lucas