SPARK-23338: Spark unable to run on HDP-deployed Azure Blob File System (WASB)

Details

    • Priority: Important

    Description

      Hello,

      We are unable to run Spark on an HDP cluster whose storage is the Azure Blob file system (WASB). Spark fails to start with errors around HiveSessionState and HiveExternalCatalog, together with various Azure Blob storage exceptions.
      Any suggestion to address this would be much appreciated. Alternatively, please confirm whether the exercise is futile because Spark cannot run on Blob storage in this setup.

      Thanks in advance.

       

      Detailed Description:

       

      We are unable to use Spark/Spark2 after changing the file system storage from HDFS to WASB. We are on the HDP 2.6 platform running Hadoop 2.7.3; all other services are working fine.

      I have set the following configuration properties:

      HDFS:

      core-site.xml:

      fs.defaultFS = wasb://CONTAINER@STORAGE_ACCOUNT_NAME.blob.core.windows.net
      fs.AbstractFileSystem.wasb.impl = org.apache.hadoop.fs.azure.Wasb
      fs.AbstractFileSystem.wasbs.impl = org.apache.hadoop.fs.azure.Wasbs
      fs.azure.selfthrottling.read.factor = 1.0
      fs.azure.selfthrottling.write.factor = 1.0
      fs.azure.account.key.STORAGE_ACCOUNT_NAME.blob.core.windows.net = KEY
      spark.hadoop.fs.azure.account.key.STORAGE_ACCOUNT_NAME.blob.core.windows.net = KEY
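
      As a quick sanity check that these core-site properties alone can reach the container, independent of Hive and Spark SQL, something like the following Hadoop FileSystem probe can be run on the same classpath. This is only a sketch: CONTAINER, STORAGE_ACCOUNT_NAME and KEY are placeholders, and it assumes the hadoop-azure and azure-storage JARs are present.

      import org.apache.hadoop.conf.Configuration
      import org.apache.hadoop.fs.{FileSystem, Path}

      // Probe WASB with the same properties as in core-site.xml above.
      // CONTAINER, STORAGE_ACCOUNT_NAME and KEY are placeholders, not real values.
      val conf = new Configuration()
      conf.set("fs.defaultFS", "wasb://CONTAINER@STORAGE_ACCOUNT_NAME.blob.core.windows.net")
      conf.set("fs.azure.account.key.STORAGE_ACCOUNT_NAME.blob.core.windows.net", "KEY")

      val fs = FileSystem.get(conf)
      // Listing the root exercises the same blob enumeration that fails in the
      // stack trace below (retrieveMetadata -> LazySegmentedIterator.hasNext).
      fs.listStatus(new Path("/")).foreach(status => println(status.getPath))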

      SPARK2:

      spark.eventLog.dir = wasb://CONTAINER@STORAGE_ACCOUNT_NAME.blob.core.windows.net/spark2-history/
      spark.history.fs.logDirectory = wasb://CONTAINER@STORAGE_ACCOUNT_NAME.blob.core.windows.net/spark2-history/
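
      For reference, here is how the event-log setting would look if applied programmatically when building a session; a minimal sketch, with the WASB URI again a placeholder (spark.history.fs.logDirectory is read by the history server process, so it is not set on the session):

      import org.apache.spark.sql.SparkSession

      // Point Spark event logging at the WASB history directory configured above.
      val spark = SparkSession.builder()
        .appName("wasb-event-log-check")
        .config("spark.eventLog.enabled", "true")
        .config("spark.eventLog.dir", "wasb://CONTAINER@STORAGE_ACCOUNT_NAME.blob.core.windows.net/spark2-history/")
        .getOrCreate()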

      Despite multiple attempts and alternative configurations, the spark-shell command consistently yields the following:

      $ spark-shell
      Setting default log level to "WARN".
      To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
      java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':
      at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:983)
      at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
      at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)
      at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
      at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
      at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
      at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
      at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
      at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
      at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
      at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:878)
      at org.apache.spark.repl.Main$.createSparkSession(Main.scala:96)
      ... 47 elided
      Caused by: java.lang.reflect.InvocationTargetException: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
      at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
      at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
      at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:980)
      ... 58 more
      Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
      at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:176)
      at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:86)
      at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
      at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
      at scala.Option.getOrElse(Option.scala:121)
      at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:101)
      at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:100)
      at org.apache.spark.sql.internal.SessionState.<init>(SessionState.scala:157)
      at org.apache.spark.sql.hive.HiveSessionState.<init>(HiveSessionState.scala:32)
      ... 63 more
      Caused by: java.lang.reflect.InvocationTargetException: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: org.apache.hadoop.fs.azure.AzureException: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details.
      at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
      at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
      at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:173)
      ... 71 more
      Caused by: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: org.apache.hadoop.fs.azure.AzureException: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details.
      at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
      at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
      at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
      at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
      at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
      at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:358)
      at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:262)
      at org.apache.spark.sql.hive.HiveExternalCatalog.<init>(HiveExternalCatalog.scala:65)
      ... 76 more
      Caused by: java.lang.RuntimeException: org.apache.hadoop.fs.azure.AzureException: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details.
      at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
      at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:188)
      ... 84 more
      Caused by: org.apache.hadoop.fs.azure.AzureException: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details.
      at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.retrieveMetadata(AzureNativeFileSystemStore.java:2027)
      at org.apache.hadoop.fs.azure.NativeAzureFileSystem.getFileStatus(NativeAzureFileSystem.java:2081)
      at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1447)
      at org.apache.hadoop.fs.azure.NativeAzureFileSystem.conditionalRedoFolderRename(NativeAzureFileSystem.java:2137)
      at org.apache.hadoop.fs.azure.NativeAzureFileSystem.getFileStatus(NativeAzureFileSystem.java:2104)
      at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1447)
      at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:596)
      at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
      at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
      ... 85 more
      Caused by: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details.
      at com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:113)
      at org.apache.hadoop.fs.azure.StorageInterfaceImpl$WrappingIterator.hasNext(StorageInterfaceImpl.java:130)
      at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.retrieveMetadata(AzureNativeFileSystemStore.java:2006)
      ... 93 more
      Caused by: com.microsoft.azure.storage.StorageException: The server encountered an unknown failure: OK
      at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:101)
      at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:199)
      at com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109)
      ... 95 more
      Caused by: java.lang.ClassCastException: org.apache.xerces.parsers.XIncludeAwareParserConfiguration cannot be cast to org.apache.xerces.xni.parser.XMLParserConfiguration
      at org.apache.xerces.parsers.SAXParser.<init>(Unknown Source)
      at org.apache.xerces.parsers.SAXParser.<init>(Unknown Source)
      at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.<init>(Unknown Source)
      at org.apache.xerces.jaxp.SAXParserImpl.<init>(Unknown Source)
      at org.apache.xerces.jaxp.SAXParserFactoryImpl.newSAXParser(Unknown Source)
      at com.microsoft.azure.storage.core.Utility.getSAXParser(Utility.java:668)
      at com.microsoft.azure.storage.blob.BlobListHandler.getBlobList(BlobListHandler.java:72)
      at com.microsoft.azure.storage.blob.CloudBlobContainer$6.postProcessResponse(CloudBlobContainer.java:1284)
      at com.microsoft.azure.storage.blob.CloudBlobContainer$6.postProcessResponse(CloudBlobContainer.java:1248)
      at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:146)
      ... 96 more
      <console>:14: error: not found: value spark
      import spark.implicits._
      ^
      <console>:14: error: not found: value spark
      import spark.sql
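
      The innermost cause above is a ClassCastException inside Xerces, which typically means two incompatible Xerces builds are on the classpath: one supplying org.apache.xerces.parsers.SAXParser and another supplying the org.apache.xerces.xni interfaces. Below is a small sketch to locate where each conflicting class is loaded from, runnable in spark-shell or any JVM sharing its classpath; the class names are taken from the stack trace, while the helper itself is illustrative.

      // Print the location each class was loaded from, to spot mixed Xerces versions.
      def whereIs(className: String): Unit = {
        val cls = Class.forName(className)
        val location = Option(cls.getProtectionDomain.getCodeSource).map(_.getLocation)
        println(s"$className -> ${location.getOrElse("bootstrap classpath")}")
      }

      whereIs("org.apache.xerces.parsers.SAXParser")
      whereIs("org.apache.xerces.xni.parser.XMLParserConfiguration")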

      Any help in resolving the above would be immensely appreciated. We may have missed an important piece of HDFS or Spark configuration, leaving Spark unable to locate certain JARs and therefore incompatible with Blob storage.

      Kindly assist!

      PS: I have made sure the required azure-storage and hadoop-azure JARs are available in both the Spark and Hadoop lib folders. I have also tried specifying them explicitly when starting spark-shell, to no effect.
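
      One way to confirm the driver really sees those JARs from inside spark-shell is to resolve a class from each and print its code source; a sketch (the class names come from hadoop-azure and the Azure storage SDK seen in the stack trace):

      import org.apache.hadoop.fs.azure.NativeAzureFileSystem
      import com.microsoft.azure.storage.blob.CloudBlobContainer

      // A ClassNotFoundException here would mean the JAR is not on the driver
      // classpath after all; otherwise the owning jar location is printed.
      for (cls <- Seq(classOf[NativeAzureFileSystem], classOf[CloudBlobContainer]))
        println(s"${cls.getName} -> ${cls.getProtectionDomain.getCodeSource.getLocation}")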

       

          People

            Assignee: Unassigned
            Reporter: Subham Subhankar
            Votes: 0
            Watchers: 2
