  1. Hadoop HDFS
  2. HDFS-17114

HDFS Directory Level Access

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • None
    • hdfs
    • None

    Description

      Problem: Currently, checking and setting ACLs at the file level is time-consuming and API-intensive for large HDFS clusters with billions of files, particularly for use cases where permissions and ACLs should be uniform across all nested files within a directory. For example, Hive table files and directories should have the same permissions and ACLs.

      Solutions like default ACLs don't work if:

      1. A user moves or renames directories with nested files. A moved directory and its files don't inherit the default ACLs of the new location.
      2. A user wants to change access to all files under some path prefixes. The user then needs to update permissions and ACLs for all files in the directory, which takes hours or even days if there are millions of files under the directory.

       

      Proposed solution: 

      Use the ancestor directory's POSIX permissions and ACLs to check access to files. When a user tries to access the file "/a/b/c.txt", the new model will use the ACLs and permissions of the closest ancestor directory "/a/b" to check access to the file "c.txt". If the user doesn't have access to the directory, there are 2 options:

      1. Fall back to the default HDFS POSIX permission and ACL check at the file level. The user then has access to the file when: [the user has access to the ancestor directory] OR [the user has access to the file].
      2. Throw AccessControlException.

      The feature can be enabled only for some prefixes or for all files in the HDFS cluster via configuration.

      Idea of solution in diagram:

      Alternative solutions:

      1. Use a federated authorization model for HDFS path prefixes. Implementation: Apache Ranger and Apache Sentry provide an AuthZ plugin to check access to files. The check is implemented by matching the file path to a managed resource with a path prefix. All files under the prefix path will use the resource policy managed by the framework. The plugin will default to HDFS permissions and ACLs if there is no matching prefix.
        1. Cons: 
          1. Requires setting up an external service to manage policies.
          2. Adding an external dependency will impact HDFS NN availability.
      2. Similar to the Sentry and Ranger solution, but use native HDFS directory permissions and ACLs instead of federated policies. The problem is finding which directory's permissions/ACLs to check for the requested file. There are 2 solutions:
        1. Maintain a list of prefixes, as the Ranger plugin does, and use the permissions and ACLs of the prefix directory to check access to all nested files and directories.
        2. Use flags on directories (HDFS-15638). For example, set a flag with HDFS Extended Attributes. When a user tries to access a file, HDFS will traverse the ancestors and check whether any directory has the flag. If a directory with the flag exists, use that directory's permissions; otherwise default to the file's permissions.
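
The proposed model's decision logic (closest ancestor directory, the two deny options, per-prefix enablement), sketched as an added example; this is not code from the issue or from Hadoop. It assumes the requested permission bits are checked against the directory itself and uses a simplified ACL model; `Inode`, `allows`, `check_file_access` and the sample namespace are hypothetical.

```python
from dataclasses import dataclass, field
from posixpath import dirname

READ = 4  # POSIX "r" bit

@dataclass
class Inode:
    owner: str
    group: str
    mode: int                                # POSIX bits, e.g. 0o750
    acl: dict = field(default_factory=dict)  # named-user ACL entries: user -> rwx bits

def allows(inode, user, groups, want):
    """Simplified POSIX + named-user ACL check (no ACL mask, no named groups)."""
    if user == inode.owner:
        return (inode.mode >> 6) & want == want
    if user in inode.acl:
        return inode.acl[user] & want == want
    if inode.group in groups:
        return (inode.mode >> 3) & want == want
    return inode.mode & want == want

def check_file_access(ns, path, user, groups, want, enabled_prefixes, fallback):
    """Decide access to a file from its closest ancestor directory."""
    enabled = any(path.startswith(p.rstrip("/") + "/") for p in enabled_prefixes)
    if not enabled:
        return allows(ns[path], user, groups, want)   # classic file-level check
    if allows(ns[dirname(path)], user, groups, want):
        return True                                   # directory grants access
    if fallback:
        return allows(ns[path], user, groups, want)   # option 1: directory OR file
    return False                                      # option 2: AccessControlException

# Constructed sample namespace (paths, users and modes are made up).
NS = {
    "/warehouse/t1": Inode("hive", "analysts", 0o750),
    "/warehouse/t1/part-0": Inode("etl", "etl", 0o600),
    "/secure": Inode("root", "root", 0o700),
    "/secure/open.txt": Inode("root", "root", 0o644),
}

if __name__ == "__main__":
    # alice (group analysts) cannot read part-0 by its own 0600 bits,
    # but the 0750 directory grants access once /warehouse is enabled.
    print(check_file_access(NS, "/warehouse/t1/part-0", "alice", {"analysts"}, READ, [], False))
    print(check_file_access(NS, "/warehouse/t1/part-0", "alice", {"analysts"}, READ, ["/warehouse"], False))
    # carol is denied by /secure (0700); only the OR fallback lets the 0644 file through.
    print(check_file_access(NS, "/secure/open.txt", "carol", set(), READ, ["/secure"], True))
    print(check_file_access(NS, "/secure/open.txt", "carol", set(), READ, ["/secure"], False))
```

Under an enabled prefix, a file's own restrictive bits (the 0600 file here) no longer limit who can read it, so directory grants need the same review that file grants used to get. With option 1, a restrictive directory does not hide a file whose own bits are permissive; only option 2 makes the directory the sole gate.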

          People

            Unassigned Unassigned
            arturmys Artur Myseliuk