  1. HBase
  2. HBASE-23122

FSHDFSUtils#isSameHdfs doesn't handle Azure WASB filesystems correctly.


Details

    Description

      FSHDFSUtils#isSameHdfs retrieves the canonical service name from Hadoop to determine whether the source and destination are on the same filesystem. The method getCanonicalServiceName() returns the IP address of the filesystem, which can be identical for two different filesystems that are actually backed by separate storage accounts. As a result, isSameHdfs incorrectly returns true even when the filesystems are different.
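
      To make the failure mode concrete, here is a minimal sketch that mimics a comparison based on a resolved service name. It does not call Hadoop: the RESOLVED map stands in for DNS resolution, and the account names, the IP address, and the serviceName helper are all hypothetical.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Toy model of a comparison based on a resolved "ip:port" service name.
// Every host and address below is made up for illustration.
class ServiceNameCollision {
    // Stand-in for DNS: two storage accounts behind one (hypothetical) IP.
    static final Map<String, String> RESOLVED = new HashMap<>();
    static {
        RESOLVED.put("account1.blob.core.windows.net", "203.0.113.10");
        RESOLVED.put("account2.blob.core.windows.net", "203.0.113.10");
    }

    // Builds an "ip:port" string, roughly what an IP-based check would compare.
    static String serviceName(URI fsUri) {
        String ip = RESOLVED.get(fsUri.getHost());
        int port = fsUri.getPort() == -1 ? 0 : fsUri.getPort();
        return ip + ":" + port;
    }

    static boolean sameByServiceName(URI a, URI b) {
        return serviceName(a).equals(serviceName(b));
    }

    public static void main(String[] args) {
        URI src = URI.create("wasb://data@account1.blob.core.windows.net/hfiles");
        URI dst = URI.create("wasb://data@account2.blob.core.windows.net/hbase");
        // Different storage accounts, yet the resolved names collide.
        System.out.println(sameByServiceName(src, dst));
    }
}
```

      In this model the account name is lost once the host is resolved, so nothing in the compared string distinguishes the two storage accounts.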

      According to the Hadoop API declaration, it seems this API should not be used to check whether the source and target are on the same filesystem. The token cache is the only user of the canonical service name, and uses it to look up this FileSystem's service tokens.
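
      One possible alternative, shown here only as a sketch and not as the fix adopted by HBase, is to compare the scheme and authority of the two filesystem URIs instead of a resolved address. The FsIdentity class, the sameFileSystem method, and the example URIs are hypothetical, and the comparison is deliberately strict: it makes no attempt to recognize aliases or default ports.

```java
import java.net.URI;

// Sketch: treat two filesystems as the same only when the scheme and the
// authority (container@account host, or namenode host:port) both match.
class FsIdentity {
    static boolean sameFileSystem(URI a, URI b) {
        String schemeA = a.getScheme();
        String schemeB = b.getScheme();
        if (schemeA == null || schemeB == null
                || !schemeA.equalsIgnoreCase(schemeB)) {
            return false;
        }
        String authA = a.getAuthority();
        String authB = b.getAuthority();
        if (authA == null || authB == null) {
            // e.g. file:/// on both sides has no authority at all
            return authA == null && authB == null;
        }
        return authA.equalsIgnoreCase(authB);
    }

    public static void main(String[] args) {
        URI src = URI.create("wasb://data@account1.blob.core.windows.net/hfiles");
        URI dst = URI.create("wasb://data@account2.blob.core.windows.net/hbase");
        // The account name is part of the authority, so these differ.
        System.out.println(sameFileSystem(src, dst));
    }
}
```

      Because the storage account name is part of the URI authority, two accounts that share an IP address are still told apart under this assumption.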

      This error was found while doing a bulk load in HBase from one filesystem to another. Because getCanonicalServiceName() returned the same address for both storage accounts, the two filesystems were identified as the same filesystem. When the HBase bulk load command runs, it tries to find the file on the default filesystem and therefore fails with a FileNotFoundException.
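
      The consequence for bulk load can be illustrated with a toy model. It is not HBase code: each "filesystem" is just a set of paths, and the BulkLoadToy class, its methods, and the copy-before-load branch are assumptions made for illustration only.

```java
import java.io.FileNotFoundException;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model: filesystems are named sets of paths; names are made up.
class BulkLoadToy {
    final Map<String, Set<String>> filesystems = new HashMap<>();
    final String defaultFs;

    BulkLoadToy(String defaultFs) {
        this.defaultFs = defaultFs;
    }

    void put(String fs, String path) {
        filesystems.computeIfAbsent(fs, k -> new HashSet<>()).add(path);
    }

    boolean exists(String fs, String path) {
        return filesystems.getOrDefault(fs, Collections.<String>emptySet()).contains(path);
    }

    // reportedSame is the answer of the "same filesystem" check.
    void load(String srcFs, String path, boolean reportedSame)
            throws FileNotFoundException {
        if (!reportedSame) {
            // Assumed behaviour: copy the file to the default filesystem first.
            if (!exists(srcFs, path)) {
                throw new FileNotFoundException(srcFs + ":" + path);
            }
            put(defaultFs, path);
        }
        // The file is then looked up on the default filesystem.
        if (!exists(defaultFs, path)) {
            throw new FileNotFoundException(defaultFs + ":" + path);
        }
    }
}
```

      With the file present only on the source account, a call such as load("account1", "/hfiles/f1", true) throws FileNotFoundException in this model, because the wrong "same filesystem" answer sends the lookup straight to the default filesystem.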

       

       


          People

            Assignee: Unassigned
            Reporter: suman kumari (Suman268)
