
[SPARK-47881] HDFS path for hive.metastore.jars.path does not work


Details

    • Type: Question
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.4.2
    • Fix Version/s: None
    • Component/s: SQL
    • Labels: None

    Description

      I am trying to use Hive Metastore version 3.1.3 with Spark version 3.4.2, but I am encountering an error when specifying the path to the metastore JARs on HDFS.

      Following the official documentation, I specified the path using an HDFS URI:

      spark.sql.hive.metastore.version     3.1.3
      spark.sql.hive.metastore.jars        path
      spark.sql.hive.metastore.jars.path   hdfs://namespace/spark/hive3_lib/* 

      However, when I tested it, I encountered an error from HiveClientImpl.scala stating that the URI scheme is not "file":

      Caused by: java.lang.ExceptionInInitializerError: java.lang.IllegalArgumentException: URI scheme is not "file"
        at org.apache.spark.sql.hive.client.HiveClientImpl$.newHiveConf(HiveClientImpl.scala:1296)
        at org.apache.spark.sql.hive.client.HiveClientImpl.newState(HiveClientImpl.scala:174)
        at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:139)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
        at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:315)
        at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:517)
        at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:377)
        at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:70)
        at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:69)
        at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$databaseExists$1(HiveExternalCatalog.scala:223)
        at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:101)
        ... 143 more 
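
      For reference, this message looks like the one thrown by java.io.File's URI constructor, which only accepts "file"-scheme URIs, so it seems the hdfs:// URI is being converted directly into a java.io.File somewhere while the Hive client is being set up. A minimal Scala snippet (pasteable into the Scala REPL; just an illustration, not Spark's actual code path, and the jar name is made up) reproduces the same message:

      import java.io.File
      import java.net.URI

      // Accepted: java.io.File only understands "file"-scheme URIs.
      new File(new URI("file:///opt/spark/hive3_lib/hive-exec-3.1.3.jar"))

      // Throws java.lang.IllegalArgumentException: URI scheme is not "file",
      // which is the same message as in the stack trace above.
      new File(new URI("hdfs://namespace/spark/hive3_lib/hive-exec-3.1.3.jar"))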

      To resolve this, I changed spark.sql.hive.metastore.jars.path to a local file path instead of an HDFS path, and it worked fine. I think I followed the instructions correctly, but are there any specific configurations or settings required to use HDFS paths?
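
      For example, a configuration like the following works for me (the local directory is just where I copied the Hive 3.1.3 jars; the exact path is illustrative):

      spark.sql.hive.metastore.version     3.1.3
      spark.sql.hive.metastore.jars        path
      spark.sql.hive.metastore.jars.path   file:///opt/spark/hive3_lib/*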

    People

      Assignee: Unassigned
      Reporter: Jungho Choi (jungho.choi)
      Votes: 0
      Watchers: 1
