Spark / SPARK-25738

LOAD DATA INPATH doesn't work if hdfs conf includes port

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 2.4.0
    • Fix Version/s: 2.4.0
    • Component/s: SQL
    • Labels: None

    Description

      LOAD DATA INPATH throws java.net.URISyntaxException: Malformed IPv6 address at index 8 if your hdfs conf includes a port for the namenode.

      This happens because the code passes a value derived from the HDFS conf "fs.defaultFS" as the host argument of the URI constructor. Note that the variable is called authority, but the 4-arg URI constructor actually expects a host: https://docs.oracle.com/javase/7/docs/api/java/net/URI.html#URI(java.lang.String,%20java.lang.String,%20java.lang.String,%20java.lang.String)

      val defaultFSConf = sparkSession.sessionState.newHadoopConf().get("fs.defaultFS")
      ...
      val newUri = new URI(scheme, authority, pathUri.getPath, pathUri.getFragment)
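      The difference between the two java.net.URI constructors can be reproduced outside Spark. The following is a minimal sketch, not the Spark code itself: the object name and the hard-coded scheme, authority, and path are assumptions chosen to match the example in this report. It shows that the 4-arg constructor rejects a "host:port" string, while the 5-arg constructor, whose second argument is an authority, accepts it. Whether the actual fix uses that constructor is not stated here.

```scala
import java.net.URI
import scala.util.Try

// Hypothetical stand-alone demo; names and values are illustrative only.
object UriAuthorityDemo {
  val scheme = "hdfs"
  val authority = "fizz.buzz.com:8020" // host plus port, as fs.defaultFS might supply
  val path = "/some/path/on/hdfs"

  // 4-arg constructor: URI(scheme, host, path, fragment).
  // A host containing ':' is treated as an IPv6 literal, so parsing fails.
  def viaHostConstructor: Try[URI] =
    Try(new URI(scheme, authority, path, null))

  // 5-arg constructor: URI(scheme, authority, path, query, fragment).
  // The second argument is a full authority, so "host:port" is accepted.
  def viaAuthorityConstructor: Try[URI] =
    Try(new URI(scheme, authority, path, null, null))

  def main(args: Array[String]): Unit = {
    println(viaHostConstructor)      // a Failure wrapping URISyntaxException
    println(viaAuthorityConstructor) // a Success wrapping the parsed URI
  }
}
```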
      

      https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala#L386

      This was introduced by SPARK-23425.

      Workaround: specify a fully qualified path, e.g., instead of

      LOAD DATA INPATH '/some/path/on/hdfs'
      

      use

      LOAD DATA INPATH 'hdfs://fizz.buzz.com:8020/some/path/on/hdfs'
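      When the statement is built programmatically, the workaround can be applied by qualifying the path before it is placed in the SQL text. The helper below is a hypothetical sketch (QualifyPath and qualify are not Spark APIs); it assumes the default filesystem URI is passed in as a string and that unqualified paths are absolute.

```scala
import java.net.URI

// Hypothetical helper, not part of Spark: qualifies a path with the scheme
// and authority of a default filesystem URI before it is used in LOAD DATA.
object QualifyPath {
  def qualify(defaultFs: String, path: String): String = {
    val pathUri = new URI(path)
    if (pathUri.getScheme != null) {
      path // already fully qualified, leave untouched
    } else {
      val fsUri = new URI(defaultFs)
      // The 5-arg constructor takes an authority, so host:port is preserved.
      // Assumes an absolute path; a relative one would be rejected here.
      new URI(fsUri.getScheme, fsUri.getAuthority, pathUri.getPath, null, null).toString
    }
  }

  def main(args: Array[String]): Unit = {
    val qualified = qualify("hdfs://fizz.buzz.com:8020", "/some/path/on/hdfs")
    println(s"LOAD DATA INPATH '$qualified'")
  }
}
```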
      


          People

            Assignee: Imran Rashid (irashid)
            Reporter: Imran Rashid (irashid)