Hadoop Common / HADOOP-13618

IllegalArgumentException when accessing Swift object with name containing space character

Details

    • Bug
    • Status: Patch Available
    • Major
    • Resolution: Unresolved
    • 2.6.0
    • 3.0.0-alpha4
    • fs/swift
    • None
    • Linux EL6

    • Patch

    Description

      We are using Spark and hadoop-openstack-2.6.0.jar (compile('org.apache.hadoop:hadoop-openstack:2.6.0')) to access Oracle Storage Service, which is Swift-based:

      DataFrame df = hiveCtx.read().format("com.databricks.spark.csv").option(...).load(objectName);

      When accessing a Swift URL like "swift://Linda.oracleswift/non-matching records.csv" where the object name "non-matching records.csv" contains a space character, the following exception is thrown:

      2016-08-23 15:56:03 DEBUG SwiftNativeFileSystem:126 - SwiftFileSystem initialized
      java.lang.IllegalArgumentException: Illegal character in path at index 13: /non-matching records.csv
      at java.net.URI.create(URI.java:859)
      at org.apache.hadoop.fs.swift.util.SwiftObjectPath.<init>(SwiftObjectPath.java:59)
      at org.apache.hadoop.fs.swift.util.SwiftObjectPath.fromPath(SwiftObjectPath.java:183)
      at org.apache.hadoop.fs.swift.util.SwiftObjectPath.fromPath(SwiftObjectPath.java:145)
      at org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystemStore.toObjectPath(SwiftNativeFileSystemStore.java:434)
      at org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystemStore.getObjectMetadata(SwiftNativeFileSystemStore.java:211)
      at org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystemStore.getObjectMetadata(SwiftNativeFileSystemStore.java:181)
      at org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem.getFileStatus(SwiftNativeFileSystem.java:173)
      at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:64)
      at org.apache.hadoop.fs.Globber.doGlob(Globber.java:272)
      at org.apache.hadoop.fs.Globber.glob(Globber.java:151)
      at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1653)
      at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:259)
      ...

      The exception appears to be caused by the space character. However, the debug messages logged just before the error show:

      2016-08-23 15:56:03 DEBUG SwiftNativeFileSystem:122 - Initializing SwiftNativeFileSystem against URI swift://Linda.oracleswift/non-matching%20records.csv and working dir swift://Linda.oracleswift/user/syang
      2016-08-23 15:56:03 DEBUG RestClientBindings:141 - Filesystem swift://Linda.oracleswift/non-matching%20records.csv is using configuration keys fs.swift.service.oracleswift
      ...

      The space character has already been encoded as "%20", so the Swift URL that enters SwiftNativeFileSystem appears to be properly encoded.
      Because of this error, any Swift object whose name contains a space character (and possibly a slash '/' character as well?) cannot be accessed.
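
The behaviour of java.net.URI alone is enough to reproduce the exception message above. The sketch below uses only the JDK, not Hadoop; the class and method names are hypothetical, and whether SwiftObjectPath actually receives a decoded path this way is an assumption rather than something the logs confirm. It shows that an encoded URI yields a decoded path from getPath(), that URI.create() rejects that decoded path, and that the multi-argument URI constructor quotes the space instead.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class; it does not use or reproduce any Hadoop code.
final class SwiftPathSpaceDemo {

    // getPath() returns the DECODED path, so "%20" becomes a raw space again.
    static String decodedPathOf(String encodedUri) {
        return URI.create(encodedUri).getPath();
    }

    // URI.create() expects an already-escaped string; a raw space makes it
    // throw IllegalArgumentException, as in the stack trace above.
    static boolean parsesWithCreate(String path) {
        try {
            URI.create(path);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // The multi-argument constructor accepts a decoded path and quotes
    // illegal characters (such as the space) itself.
    static URI fromDecodedPath(String decodedPath) {
        try {
            return new URI(null, null, decodedPath, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String decoded = decodedPathOf(
            "swift://Linda.oracleswift/non-matching%20records.csv");
        System.out.println("decoded path:      " + decoded);
        System.out.println("URI.create works:  " + parsesWithCreate(decoded));
        System.out.println("quoted by new URI: "
            + fromDecodedPath(decoded).getRawPath());
    }
}
```

If the cause is as assumed, building the URI from the decoded path with the multi-argument constructor would avoid the exception. The attached patch may take a different approach.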

      As an additional data point, if we first encode the object name ("non-matching records.csv" => "non-matching%20records.csv") before passing it to the OpenStack Swift API, a different error is raised. This time the path separator '/' after the container name 'Linda' is somehow encoded by SwiftNativeFileSystemStore:

      2016-08-23 10:56:41 DEBUG SwiftRestClient:1731 - Status code = 400
      2016-08-23 10:56:41 DEBUG SwiftRestClient:1445 - Method HEAD on https://storage.oraclecorp.com/v1/Storage-dfisher/Linda%2Fnon-matching%20records.csv failed, status code: 400, status line: HTTP/1.1 400 Bad Request
      BadRequest: Bad request against https://storage.oraclecorp.com/v1/Storage-dfisher/Linda%2Fnon-matching%20records.csv HEAD https://storage.oraclecorp.com/v1/Storage-dfisher/Linda%2Fnon-matching%20records.csv => 400
      at org.apache.hadoop.fs.swift.http.SwiftRestClient.buildException(SwiftRestClient.java:1456)
      at org.apache.hadoop.fs.swift.http.SwiftRestClient.perform(SwiftRestClient.java:1403)
      at org.apache.hadoop.fs.swift.http.SwiftRestClient.headRequest(SwiftRestClient.java:1016)
      at org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystemStore.stat(SwiftNativeFileSystemStore.java:257)
      at org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystemStore.getObjectMetadata(SwiftNativeFileSystemStore.java:212)
      at org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystemStore.getObjectMetadata(SwiftNativeFileSystemStore.java:181)
      at org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem.getFileStatus(SwiftNativeFileSystem.java:173)

      So the access always fails, whether or not the Swift object name is URL-encoded.
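
The "Linda%2F" in the failing request suggests that the container name and object name were escaped as a single token. The following JDK-only sketch illustrates that failure mode and one way to avoid it. The class and method names are hypothetical, and it is not the code in SwiftRestClient or SwiftNativeFileSystemStore (plain URLEncoder would also turn the space into "+", which the log above does not show). The endpoint string is copied from the log.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

// Hypothetical demo class; it only illustrates the escaping difference.
final class SwiftObjectUrlDemo {

    // Form-encoding the whole "container/object" string escapes the
    // separator too, producing a "%2F" like the one in the 400 response.
    static String encodeAsSingleToken(String containerAndObject) {
        try {
            return URLEncoder.encode(containerAndObject, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Path-aware quoting: the multi-argument URI constructor leaves '/'
    // intact and percent-encodes characters that are illegal in a path.
    // Both names must be passed in DECODED form, because '%' is always
    // quoted by this constructor.
    static String objectUrl(String endpoint, String container, String objectName) {
        try {
            URI path = new URI(null, null, "/" + container + "/" + objectName, null);
            return endpoint + path.getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeAsSingleToken("Linda/non-matching records.csv"));
        System.out.println(objectUrl(
            "https://storage.oraclecorp.com/v1/Storage-dfisher",
            "Linda", "non-matching records.csv"));
    }
}
```

With path-aware quoting the request would target .../Linda/non-matching%20records.csv, keeping the separator between container and object.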

      Attachments

        1. avro_test.zip
          6.12 MB
          Steve Yang
        2. HADOOP-13618.patch
          2 kB
          Yulei Li

          People

            charlse Yulei Li
            syang Steve Yang
            Votes: 0
            Watchers: 2