  Spark / SPARK-44339

spark3-shell fails with org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table <hive_table_name>. Permission denied: user [AD user] does not have [SELECT] privilege on [<database>/<hive table>] when reading a Hive view


Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: Spark Shell, Spark Submit
    • Labels: None
    • Environment: CDP 7.1.7 with Ranger, kerberized, and hadoop impersonation enabled

    Description

      Problem statement 

      A Hive view is created using beeline to restrict users from accessing the original Hive table, since the data contains sensitive information.

      For illustration purposes, let's consider a sensitive table emp_db.employee with columns id, name, and salary, created through beeline by user 'userA':

       

      create external table emp_db.employee (id int, name string, salary double) location '<hdfs_path>';

       

      A view is created using beeline by the same user 'userA':

       

      create view empview_db.emp_v as select id, name from emp_db.employee;

       

      From the Ranger UI, we define a policy under Hadoop SQL Policies that allows 'userB' to access database empview_db and table emp_v with SELECT permission.

       

      Steps to replicate 

      1. ssh to an edge node where beeline is available, as userB
      2. Try executing the following queries:
        1. select * from empview_db.emp_v;
        2. desc formatted empview_db.emp_v;
        Both queries work fine without any issues.
      3. Now try spark3-shell as userB:
        # spark3-shell --deploy-mode client
        Setting default log level to "WARN".
        To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
        23/07/08 01:24:09 WARN HiveConf: HiveConf of name hive.masking.algo does not exist
        Spark context Web UI available at http://xxxxxxx:4040
        Spark context available as 'sc' (master = yarn, app id = application_xxx_xxx).
        Spark session available as 'spark'.
        Welcome to
              ____              __
             / __/__  ___ _____/ /__
            _\ \/ _ \/ _ `/ __/  '_/
           /___/ .__/\_,_/_/ /_/\_\   version 3.3.0.3.3.7180.0-274
              /_/
                 
        Using Scala version 2.12.15 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181)
        Type in expressions to have them evaluated.
        Type :help for more information.

        scala> spark.table("empview_db.emp_v").schema
        23/07/08 01:24:30 WARN HiveClientImpl: Detected HiveConf hive.execution.engine is 'tez' and will be reset to 'mr' to disable useless hive logic
        Hive Session ID = b1e3c813-aea9-40da-9012-949e82d4205e
        org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table employee. Permission denied: user [userB] does not have [SELECT] privilege on [emp_db/employee]
          at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:110)
          at org.apache.spark.sql.hive.HiveExternalCatalog.tableExists(HiveExternalCatalog.scala:877)
          at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.tableExists(ExternalCatalogWithListener.scala:146)
          at org.apache.spark.sql.catalyst.catalog.SessionCatalog.tableExists(SessionCatalog.scala:488)
          at org.apache.spark.sql.catalyst.catalog.SessionCatalog.requireTableExists(SessionCatalog.scala:224)
          at org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableRawMetadata(SessionCatalog.scala:514)
          at org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableMetadata(SessionCatalog.scala:500)
          at org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog.loadTable(V2SessionCatalog.scala:66)
          at org.apache.spark.sql.connector.catalog.CatalogV2Util$.loadTable(CatalogV2Util.scala:311)
          at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.$anonfun$lookupRelation$3(Analyzer.scala:1206)
          at scala.Option.orElse(Option.scala:447)
          at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.$anonfun$lookupRelation$1(Analyzer.scala:1205)
          at scala.Option.orElse(Option.scala:447)
          at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupRelation(Analyzer.scala:1197)
          at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$13.applyOrElse(Analyzer.scala:1068)
          at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$13.applyOrElse(Analyzer.scala:1032)
          at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$3(AnalysisHelper.scala:138)
          at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
          at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$1(AnalysisHelper.scala:138)
          at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323)
          at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning(AnalysisHelper.scala:134)
          at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning$(AnalysisHelper.scala:130)
          at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUpWithPruning(LogicalPlan.scala:30)
          at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$2(AnalysisHelper.scala:135)
          at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1228)
          at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1227)
          at org.apache.spark.sql.catalyst.plans.logical.OrderPreservingUnaryNode.mapChildren(LogicalPlan.scala:208)
          at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$1(AnalysisHelper.scala:135)
          at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323)
          at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning(AnalysisHelper.scala:134)
          at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning$(AnalysisHelper.scala:130)
          at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUpWithPruning(LogicalPlan.scala:30)
          at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$2(AnalysisHelper.scala:135)
          at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1228)
          at org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1227)
          at org.apache.spark.sql.catalyst.plans.logical.OrderPreservingUnaryNode.mapChildren(LogicalPlan.scala:208)
          at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$1(AnalysisHelper.scala:135)
          at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323)
          at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning(AnalysisHelper.scala:134)
          at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning$(AnalysisHelper.scala:130)
          at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUpWithPruning(LogicalPlan.scala:30)
          at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:1032)
          at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:991)
          at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
          at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
          at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
          at scala.collection.immutable.List.foldLeft(List.scala:91)
          at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
          at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
          at scala.collection.immutable.List.foreach(List.scala:431)
          at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
          at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:227)
          at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.$anonfun$resolveViews$2(Analyzer.scala:1012)
          at org.apache.spark.sql.internal.SQLConf$.withExistingConf(SQLConf.scala:158)
          at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.$anonfun$resolveViews$1(Analyzer.scala:1012)
          at org.apache.spark.sql.catalyst.analysis.AnalysisContext$.withAnalysisContext(Analyzer.scala:166)
          at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$resolveViews(Analyzer.scala:1004)
          at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$resolveViews(Analyzer.scala:1020)
          at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$13.$anonfun$applyOrElse$47(Analyzer.scala:1068)
          at scala.Option.map(Option.scala:230)
          at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$13.applyOrElse(Analyzer.scala:1068)
          at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$13.applyOrElse(Analyzer.scala:1032)
          at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$3(AnalysisHelper.scala:138)
          at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
          at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$1(AnalysisHelper.scala:138)
          at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323)
          at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning(AnalysisHelper.scala:134)
          at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning$(AnalysisHelper.scala:130)
          at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUpWithPruning(LogicalPlan.scala:30)
          at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:1032)
          at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.apply(Analyzer.scala:991)
          at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$2(RuleExecutor.scala:211)
          at scala.collection.LinearSeqOptimized.foldLeft(LinearSeqOptimized.scala:126)
          at scala.collection.LinearSeqOptimized.foldLeft$(LinearSeqOptimized.scala:122)
          at scala.collection.immutable.List.foldLeft(List.scala:91)
          at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1(RuleExecutor.scala:208)
          at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$execute$1$adapted(RuleExecutor.scala:200)
          at scala.collection.immutable.List.foreach(List.scala:431)
          at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:200)
          at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:227)
          at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$execute$1(Analyzer.scala:223)
          at org.apache.spark.sql.catalyst.analysis.AnalysisContext$.withNewAnalysisContext(Analyzer.scala:172)
          at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:223)
          at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:187)
          at org.apache.spark.sql.catalyst.rules.RuleExecutor.$anonfun$executeAndTrack$1(RuleExecutor.scala:179)
          at org.apache.spark.sql.catalyst.QueryPlanningTracker$.withTracker(QueryPlanningTracker.scala:88)
          at org.apache.spark.sql.catalyst.rules.RuleExecutor.executeAndTrack(RuleExecutor.scala:179)
          at org.apache.spark.sql.catalyst.analysis.Analyzer.$anonfun$executeAndCheck$1(Analyzer.scala:208)
          at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.markInAnalyzer(AnalysisHelper.scala:330)
          at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:207)
          at org.apache.spark.sql.execution.QueryExecution.$anonfun$analyzed$1(QueryExecution.scala:76)
          at org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
          at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$2(QueryExecution.scala:186)
          at org.apache.spark.sql.execution.QueryExecution$.withInternalError(QueryExecution.scala:511)
          at org.apache.spark.sql.execution.QueryExecution.$anonfun$executePhase$1(QueryExecution.scala:186)
          at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
          at org.apache.spark.sql.execution.QueryExecution.executePhase(QueryExecution.scala:185)
          at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:76)
          at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:74)
          at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:66)
          at org.apache.spark.sql.Dataset$.$anonfun$ofRows$1(Dataset.scala:91)
          at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
          at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:89)
          at org.apache.spark.sql.DataFrameReader.table(DataFrameReader.scala:607)
          at org.apache.spark.sql.SparkSession.table(SparkSession.scala:600)
          ... 47 elided
        Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to fetch table employee. Permission denied: user [userB] does not have [SELECT] privilege on [emp_db/employee]
          at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1462)
          at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1411)
          at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1391)
          at org.apache.spark.sql.hive.client.Shim_v0_12.getTable(HiveShim.scala:639)
          at org.apache.spark.sql.hive.client.HiveClientImpl.getRawTableOption(HiveClientImpl.scala:429)
          at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$tableExists$1(HiveClientImpl.scala:444)
          at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
          at org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:321)
          at org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:248)
          at org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:247)
          at org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:301)
          at org.apache.spark.sql.hive.client.HiveClientImpl.tableExists(HiveClientImpl.scala:444)
          at org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$tableExists$1(HiveExternalCatalog.scala:877)
          at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
          at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:101)
          ... 151 more
        Caused by: org.apache.hadoop.hive.metastore.api.MetaException: Permission denied: user [userB] does not have [SELECT] privilege on [emp_db/employee]
          at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_req_result$get_table_req_resultStandardScheme.read(ThriftHiveMetastore.java)
          at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_req_result$get_table_req_resultStandardScheme.read(ThriftHiveMetastore.java)
          at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$get_table_req_result.read(ThriftHiveMetastore.java)
          at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:88)
          at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_get_table_req(ThriftHiveMetastore.java:2378)
          at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_table_req(ThriftHiveMetastore.java:2365)
          at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getTable(HiveMetaStoreClient.java:2047)
          at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.getTable(SessionHiveMetaStoreClient.java:206)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:498)
          at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:213)
          at com.sun.proxy.$Proxy48.getTable(Unknown Source)
          at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
          at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
          at java.lang.reflect.Method.invoke(Method.java:498)
          at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:3514)
          at com.sun.proxy.$Proxy48.getTable(Unknown Source)
          at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:1453)
          ... 165 more
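
      From the stack trace, the failure happens while Spark resolves the view: SessionCatalog.requireTableExists is called for the underlying table, so the Ranger check on the Hive Metastore side (the MetaException from get_table_req above) fires against emp_db.employee rather than against the view. A hypothetical probe that exercises the same call path directly (same spark3-shell session and names as above; this is our reading of the trace, not a fix):

        import org.apache.spark.sql.catalyst.TableIdentifier

        // Spark expands the view definition client-side and then asks the
        // metastore for the base table; with Ranger authorization enforced in
        // the metastore, this lookup is what raises "Permission denied ...
        // [emp_db/employee]", matching the trace above.
        spark.sessionState.catalog.tableExists(TableIdentifier("employee", Some("emp_db")))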
        
        

      Expected behavior: we want Spark to behave just like beeline, where SELECT * FROM <view-name> and DESC FORMATTED <view-name> on the view work without any errors.
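
      For illustration, a minimal sketch of what we expect to succeed for userB in spark3-shell (same names as the example above):

        // Should be authorized against the view's Ranger policy, as in beeline,
        // rather than requiring SELECT on the underlying emp_db.employee:
        spark.table("empview_db.emp_v").printSchema()   // expected columns: id, name
        spark.sql("DESCRIBE FORMATTED empview_db.emp_v").show(truncate = false)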

      The CDP 7.1.7 documentation at https://docs.cloudera.com/cdp-private-cloud-base/7.1.7/developing-spark-applications/topics/spark-interaction-with-hive-views.html describes 'Interacting with Hive Views'. However, that explanation does not match the behavior we see from spark3-shell for Hive views.
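
      A possible workaround, assuming the cluster ships the Hive Warehouse Connector (standard in CDP), is to read through HiveServer2 so that Ranger authorizes the view itself; a rough sketch, with API names taken from Cloudera's HWC documentation (untested here, so treat it as an assumption rather than a confirmed fix):

        import com.hortonworks.hwc.HiveWarehouseSession

        // The query is executed by HiveServer2, so Ranger evaluates SELECT on
        // empview_db.emp_v directly instead of on emp_db.employee:
        val hive = HiveWarehouseSession.session(spark).build()
        hive.sql("select id, name from empview_db.emp_v").show()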

      Looking forward to feedback and inputs that may unblock my use case. Please let me know if you need any further information.

       

    People

      Assignee: Unassigned
      Reporter: Amar Gurung (amarrocks85)