  1. Hadoop HDFS
  2. HDFS-15829

Use xattr to support HDFS TTL on Observer namenode


Details

    • Type: Improvement
    • Status: Patch Available
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: dfsclient, namenode
    • Labels: None

    Description

      Overview

       
      HDFS TTL is implemented on top of the xattr mechanism provided by HDFS. When a user sets a TTL on a file or directory, HDFS creates an xattr named "ttl" on that file or directory and stores the user-supplied value in it. A service called TtlService runs on the HDFS Standby or Observer NameNode (Observer is recommended). It periodically scans the in-memory inode map, reads the "ttl" xattr from each INode, and determines whether the TTL has expired. If it has, the service obtains the full file path from the INode and adds it to a list of expired files. After the scan, it creates a DFSClient and deletes the expired files in a batch. Another option is to trigger a YARN job that deletes them in parallel.
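      The scan step described above can be sketched roughly as follows. This is a standalone model, not the code from the patch: the class TtlScanSketch, the Node type, and its fields are hypothetical stand-ins for the NameNode's inode map. It assumes, based on the usage examples in this issue, that a TTL without SINCELASTWRITE is an absolute expiry time in minutes since the epoch, and that a TTL with SINCELASTWRITE is relative to the last write. Inheritance from ancestor directories is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of one TtlService scan pass; not the code from the patch. */
final class TtlScanSketch {
    static final int SINCELASTWRITE = 0x1;

    /** Stand-in for the INode fields the scan needs (all names are assumptions). */
    static final class Node {
        final String path;
        final Long ttlMinutes;        // null when no "ttl" xattr is set
        final int property;           // TTL type flags
        final long lastWriteMinutes;  // last write time, in minutes since the epoch

        Node(String path, Long ttlMinutes, int property, long lastWriteMinutes) {
            this.path = path;
            this.ttlMinutes = ttlMinutes;
            this.property = property;
            this.lastWriteMinutes = lastWriteMinutes;
        }
    }

    /** Returns true when the node carries its own TTL and that TTL has passed. */
    static boolean isExpired(Node node, long nowMinutes) {
        if (node.ttlMinutes == null) {
            return false; // inheritance from ancestors is omitted in this sketch
        }
        boolean sinceLastWrite = (node.property & SINCELASTWRITE) != 0;
        long expiry = sinceLastWrite
                ? node.lastWriteMinutes + node.ttlMinutes  // relative to last write
                : node.ttlMinutes;                         // assumed absolute expiry
        return nowMinutes >= expiry;
    }

    /** Collects the paths of all expired nodes so they can be deleted in a batch. */
    static List<String> collectExpired(Iterable<Node> inodes, long nowMinutes) {
        List<String> expired = new ArrayList<>();
        for (Node node : inodes) {
            if (isExpired(node, nowMinutes)) {
                expired.add(node.path);
            }
        }
        return expired;
    }
}
```

      Collecting paths first and deleting afterwards keeps the scan read-only, which matches the idea of running it on a Standby or Observer NameNode and issuing the deletes through a separate client.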

      Protocol

      Add two xattrs:
      "user.ttl": the TTL value in minutes, which determines when the file or directory expires.
      "user.ttlproperty": the TTL type flags, including:

      • SINCELASTWRITE = 0x1       # calculate the TTL from the last write
      • KEEPEMPTYDIR = 0x2         # whether to keep the directory when it is empty
      • KEEPEMPTYSUBDIR = 0x4      # whether to keep empty subdirectories
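      Because the three type values are distinct powers of two, they can presumably be combined in a single "user.ttlproperty" value and tested with a bit mask. The sketch below illustrates that idea only. The class TtlXAttrSketch and its methods are hypothetical, and storing the value as a decimal string is an assumption; the patch may encode the xattr bytes differently.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for the two TTL xattr values; not the code from the patch. */
final class TtlXAttrSketch {
    // Flag values as listed in this issue.
    static final int SINCELASTWRITE = 0x1;
    static final int KEEPEMPTYDIR = 0x2;
    static final int KEEPEMPTYSUBDIR = 0x4;

    /** Assumption: the xattr value is stored as a decimal string in UTF-8. */
    static byte[] encode(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    /** Parses a value written by encode(). */
    static int decode(byte[] raw) {
        return Integer.parseInt(new String(raw, StandardCharsets.UTF_8).trim());
    }

    /** Returns true when the given flag bit is set in the property value. */
    static boolean has(int property, int flag) {
        return (property & flag) != 0;
    }
}
```

      For instance, SINCELASTWRITE | KEEPEMPTYDIR yields the value 3, and has(3, KEEPEMPTYSUBDIR) is false.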

       
      Nested TTL
      A TTL can be set on any directory or file along a path. When several levels have a TTL, the setting on the lower-level subdirectory or file takes effect for that subdirectory or file. If a directory or file has no TTL of its own, it inherits the setting of the nearest ancestor directory that has one. The following example illustrates this. Suppose there is a directory tree like this:
       

      /A/B/E  
      /A/C  
      /A/D 

       
      That is, directory A contains B, C and D, and directory B contains file E. Suppose the user sets the TTL of A to 2 days, the TTL of B to 3 days, and the TTL of E to 1 day, and sets no TTL on C and D. File E is then cleared after 1 day. After 2 days, C and D are cleared, using the settings inherited from directory A. Note that directory A itself is not cleared at this point, because it is not empty. After 3 days, B is cleared because its own setting expires. Once B is cleared, A is cleared as well, because A’s setting has already expired and A has become an empty directory.
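      The inheritance rule in this example amounts to a nearest-ancestor lookup, which can be sketched as below. The class NestedTtlSketch and its methods are hypothetical and work on plain path strings rather than on INodes; the sketch only resolves which TTL applies to a path and does not model the empty-directory rule or the root directory.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalInt;

/** Hypothetical model of nested TTL resolution; not the code from the patch. */
final class NestedTtlSketch {
    // Explicit TTL settings, keyed by absolute path (days, as in the example).
    private final Map<String, Integer> explicitTtlDays = new HashMap<>();

    void set(String path, int days) {
        explicitTtlDays.put(path, days);
    }

    /** Returns the path's own TTL, else the TTL of its nearest ancestor that has one. */
    OptionalInt effectiveTtlDays(String path) {
        String current = path;
        while (current != null && !current.isEmpty()) {
            Integer ttl = explicitTtlDays.get(current);
            if (ttl != null) {
                return OptionalInt.of(ttl);
            }
            // Step up to the parent directory; stop once the top level is reached.
            int slash = current.lastIndexOf('/');
            current = slash > 0 ? current.substring(0, slash) : null;
        }
        return OptionalInt.empty();
    }
}
```

      With /A set to 2, /A/B set to 3 and /A/B/E set to 1, the lookup returns 1 for /A/B/E, 3 for /A/B, and 2 for /A/C and /A/D, which matches the order of deletions described above.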

      Usage

      For the first version, an API is provided to set the TTL; a command-line interface will be added later.
       

      /**
       * Set a TTL on a file.
       * @param fs the file system.
       * @param path the target file on which to set the TTL.
       * @param value the TTL value.
       * @param property the type of TTL.
       * @throws IOException
       */
      public static void setTTl(FileSystem fs, Path path, int value, int property) throws IOException
        

      Example

       

      // The file expires 60 minutes from now; the value is an absolute time in minutes since the epoch.
      TtlInfo.setTTl(fs, file, (int) (System.currentTimeMillis() / 1000 / 60 + 60), 0);

      // The file expires 60 minutes after its last write.
      TtlInfo.setTTl(fs, file, 60, TtlInfo.SINCELASTWRITE);

       

      Attachments

        1. HDFS-15829.001.patch
          29 kB
          Yang Yun
        2. HDFS-15829.patch
          29 kB
          Yang Yun

          People

            Assignee: Yang Yun (hadoop_yangyun)
            Reporter: Yang Yun (hadoop_yangyun)