IMPALA-11871

INSERT statement does not respect Ranger policies for HDFS


Details

    • Bug
    • Status: Open
    • Major
    • Resolution: Unresolved
    • Frontend

    Description

      In a cluster with Ranger authorization enabled (and with the legacy catalog mode), even if the Ranger policy cm_hdfs -> all-path grants RWX to the user impala, inserting into a table whose HDFS POSIX permissions exclude impala fails with:

      "AnalysisException: Unable to INSERT into target table (default.t1) because Impala does not have WRITE access to HDFS location: hdfs://nightly-71x-vx-2.nightly-71x-vx.root.hwx.site:8020/warehouse/tablespace/external/hive/t1"

       

      [root@nightly-71x-vx-3 ~]# hdfs dfs -getfacl /warehouse/tablespace/external/hive/t1
      
      # file: /warehouse/tablespace/external/hive/t1
      # owner: hive
      # group: supergroup
      user::rwx
      user:impala:rwx #effective:r-x
      group::rwx #effective:r-x
      mask::r-x
      other::---
      default:user::rwx
      default:user:impala:rwx
      default:group::rwx
      default:mask::rwx
      default:other::--- 
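
The `#effective:r-x` annotations above follow from the ACL mask: for named users, named groups and the owning group, a permission bit is effective only if the mask also grants it, while the owner and `other` entries are not filtered. A minimal sketch of that rule, with hypothetical helper names (`apply_mask`, `effective_acl`) that are not part of HDFS or Impala:

```python
def apply_mask(perms: str, mask: str) -> str:
    """Keep a permission bit only where the mask also grants it."""
    return "".join(p if m != "-" else "-" for p, m in zip(perms, mask))


def effective_acl(entries: dict, mask: str) -> dict:
    # The mask filters named users, named groups and the owning group
    # ("group::"), but not the owner ("user::") or "other::".
    unmasked = {"user::", "other::"}
    return {
        key: (perms if key in unmasked else apply_mask(perms, mask))
        for key, perms in entries.items()
    }


# Access ACL entries from the getfacl output above (mask::r-x).
acl = {
    "user::": "rwx",
    "user:impala:": "rwx",
    "group::": "rwx",
    "other::": "---",
}
effective = effective_acl(acl, "r-x")
print(effective["user:impala:"])  # r-x
```

So on POSIX ACLs alone, impala has no effective WRITE permission on the table directory, which is what the analysis-time check reports.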

      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

      ANALYSIS

      Stack trace from a version of Cloudera's distribution of Impala (impalad version 3.4.0-SNAPSHOT RELEASE (build db20b59a093c17ea4699117155d58fe874f7d68f)):

      at org.apache.impala.catalog.FeFsTable$Utils.checkWriteAccess(FeFsTable.java:585)
      at org.apache.impala.analysis.InsertStmt.analyzeWriteAccess(InsertStmt.java:545)
      at org.apache.impala.analysis.InsertStmt.analyze(InsertStmt.java:391)
      at org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:463)
      at org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:426)
      at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1570)
      at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1536)
      at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1506)
      at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:155)

      The exception is raised at analysis time, before any write is attempted. To see whether the write itself would be allowed, I wrote directly into that directory, and it succeeded:

      [root@nightly-71x-vx-3 ~]# hdfs dfs -touchz /warehouse/tablespace/external/hive/t1/test
      [root@nightly-71x-vx-3 ~]# hdfs dfs -ls /warehouse/tablespace/external/hive/t1/
      Found 8 items
      -rw-rw----+ 3 hive supergroup 417 2023-01-27 17:37 /warehouse/tablespace/external/hive/t1/000000_0
      -rw-rw----+ 3 hive supergroup 417 2023-01-27 17:44 /warehouse/tablespace/external/hive/t1/000000_0_copy_1
      -rw-rw----+ 3 hive supergroup 417 2023-01-27 17:49 /warehouse/tablespace/external/hive/t1/000000_0_copy_2
      -rw-rw----+ 3 hive supergroup 417 2023-01-27 17:53 /warehouse/tablespace/external/hive/t1/000000_0_copy_3
      -rw-rw----+ 3 impala hive 355 2023-01-27 17:17 /warehouse/tablespace/external/hive/t1/4c4477c12c51ad96-3126b52d00000000_2029811630_data.0.parq
      -rw-rw----+ 3 impala hive 355 2023-01-27 17:39 /warehouse/tablespace/external/hive/t1/9945b25bb37d1ff2-473c147800000000_574471191_data.0.parq
      drwxrwx---+ - impala hive 0 2023-01-27 17:39 /warehouse/tablespace/external/hive/t1/_impala_insert_staging
      -rw-rw----+ 3 impala supergroup 0 2023-01-27 18:01 /warehouse/tablespace/external/hive/t1/test

      Reviewing the code [1], I traced the TAccessLevel value to catalogd. If I add the user impala to the group supergroup on the catalogd host, the query gets past this access check.

      Additionally, the query does not fail during analysis when catalog v2 is enabled, because the method getFirstLocationWithoutWriteAccess() is not implemented there yet and always returns null [2].

      [1] https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L494-L504

      [2] https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java#L295-L298
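
To make the difference between the two catalog modes concrete, here is a simplified Python model of the analysis-time check. It is only a sketch of the behavior described above, not Impala's actual code: the class names and the `locations` dictionary are hypothetical, and only the method name mirrors getFirstLocationWithoutWriteAccess().

```python
from typing import Dict, Optional


class AnalysisException(Exception):
    pass


class LegacyCatalogTable:
    """Models the legacy catalog: access levels come precomputed from catalogd."""

    def __init__(self, name: str, locations: Dict[str, bool]):
        self.name = name
        self.locations = locations  # path -> whether catalogd saw WRITE access

    def get_first_location_without_write_access(self) -> Optional[str]:
        for path, writable in self.locations.items():
            if not writable:
                return path
        return None


class LocalCatalogTable(LegacyCatalogTable):
    """Models catalog v2, where the check is not implemented and returns null."""

    def get_first_location_without_write_access(self) -> Optional[str]:
        return None


def check_write_access(table: LegacyCatalogTable) -> None:
    # Fails the statement during analysis, before any HDFS write is attempted.
    location = table.get_first_location_without_write_access()
    if location is not None:
        raise AnalysisException(
            f"Unable to INSERT into target table ({table.name}) because Impala "
            f"does not have WRITE access to HDFS location: {location}"
        )


locations = {"/warehouse/tablespace/external/hive/t1": False}
check_write_access(LocalCatalogTable("default.t1", locations))  # passes silently
try:
    check_write_access(LegacyCatalogTable("default.t1", locations))
except AnalysisException as e:
    print(e)
```

In this model the same table metadata fails analysis under the legacy catalog and passes under catalog v2, regardless of what HDFS would actually allow at write time.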

      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

      Ideally, when Ranger authorization is in place, we should either:
      1) Not check the HDFS access level during analysis, or
      2) Incorporate Ranger policies for HDFS into the access check during analysis
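
The two options can be sketched as a single decision function. This is a hypothetical model only: the function name and parameters are made up for illustration, and the rule that a Ranger allow is sufficient even when POSIX permissions deny access is an assumption based on the successful write shown earlier (it ignores Ranger deny policies).

```python
def analysis_time_write_check(
    posix_allows: bool,
    ranger_enabled: bool,
    ranger_allows: bool = False,
    skip_when_ranger: bool = False,
) -> bool:
    """Return True if analysis should let the INSERT proceed."""
    if not ranger_enabled:
        # Current behavior: rely on the POSIX-derived access level only.
        return posix_allows
    if skip_when_ranger:
        # Option 1: do not check during analysis; HDFS decides at write time.
        return True
    # Option 2: a Ranger grant is enough even if POSIX permissions deny
    # (assumption; deny policies are not modeled here).
    return posix_allows or ranger_allows


# The case from this report: POSIX denies WRITE, Ranger grants RWX.
print(analysis_time_write_check(False, True, ranger_allows=True))  # True
```

Under either option the INSERT in this report would pass analysis, and the final decision would rest with HDFS when the write happens.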


            People

              fangyurao Fang-Yu Rao
              fangyurao Fang-Yu Rao