  1. Hadoop Common
  2. HADOOP-19138

CSE-KMS S3A: Support for InstructionFile to store ECEK meta info

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: command, tools
    • Labels: None

    Description

      Task: Support for InstructionFile to store ECEK meta info 

      Current implementation/Context:  

      hadoop-aws supports CSE-KMS. During CSE, the key encryption info (the encrypted CEK and related metadata) needs to be stored somewhere. The AWS SDK supports two ways:

      1. S3 object metadata: the current integration in hadoop-aws supports only this approach.
        1. But S3 object metadata is limited to 2 KB.
        2. Also, metadata cannot be updated independently. Changing only the metadata still requires a complete object read/write operation.
      2. Instruction file: a small file containing the meta info, stored in the same bucket at the same location as the object. This approach needs one extra S3 read/write round trip, but it could be useful if the business needs frequent metadata changes.

      Use case: implementing KMS RE-ENCRYPT, where only the CEK (DEK) needs to be encrypted with new key material. Here the instruction file approach could be useful.

      Plus there could be many other use cases based on different business needs.

      My analysis: I tried to enable this by setting CryptoStorageMode.InstructionFile in CryptoConfigurationV2 while building the client with AmazonS3EncryptionClientV2Builder.

      Note: ObjectMetadata is the default value.

      Result: The write operation worked, but the read failed due to a missing instruction file.

      RCA: On debugging, I found the following:

      On a put request, say for myfile.txt:

      • First, S3AFileSystem writes the file to S3 as myfile.txt_COPYING_
      • Second, it writes the corresponding instruction file as myfile.txt_COPYING_.instruction
      • Third, it calls rename.
        • Rename here means copying the file bytes to myfile.txt and
        • deleting myfile.txt_COPYING_
      • Here the problem occurs:
        • After deleting any file, the AmazonS3EncryptionClientV2 class looks for the corresponding instruction file and, if found, deletes that one too. As a result, it deletes myfile.txt_COPYING_.instruction as well, and no instruction file is left for myfile.txt.
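
      The sequence above can be reproduced with a toy in-memory model. This is only a sketch of the behaviour described in this ticket, not the SDK or S3AFileSystem code: ToyEncryptedStore, its methods, and RenameLosesInstructionFile are hypothetical names, and the cascading delete is modelled on the description of AmazonS3EncryptionClientV2.deleteObject() given here.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for a bucket behind an encryption client. */
class ToyEncryptedStore {
    static final String SUFFIX = ".instruction";
    final Map<String, String> objects = new TreeMap<>();

    /** Writes the object and, as in instruction-file mode, a sidecar file. */
    void put(String key, String body) {
        objects.put(key, body);
        objects.put(key + SUFFIX, "wrapped-CEK-for-" + key);
    }

    /** Copies only the object bytes; the sidecar is not copied. */
    void copy(String src, String dst) {
        objects.put(dst, objects.get(src));
    }

    /** Models the cascading delete: the sidecar is removed with the object. */
    void delete(String key) {
        objects.remove(key);
        objects.remove(key + SUFFIX);
    }

    /** A read needs both the object and its instruction file. */
    boolean canDecrypt(String key) {
        return objects.containsKey(key) && objects.containsKey(key + SUFFIX);
    }
}

class RenameLosesInstructionFile {
    static ToyEncryptedStore run() {
        ToyEncryptedStore store = new ToyEncryptedStore();
        store.put("myfile.txt_COPYING_", "ciphertext");      // steps 1 and 2
        store.copy("myfile.txt_COPYING_", "myfile.txt");     // rename: copy bytes
        store.delete("myfile.txt_COPYING_");                 // rename: delete source
        return store;
    }

    public static void main(String[] args) {
        ToyEncryptedStore store = run();
        System.out.println("keys left: " + store.objects.keySet());
        System.out.println("myfile.txt readable: " + store.canDecrypt("myfile.txt"));
    }
}
```

      In this model only myfile.txt survives the rename, so the later read has no instruction file to load the key material from.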

      Related code:

      com.amazonaws.services.s3.AmazonS3EncryptionClientV2.deleteObject() // part of aws sdk bundle

      Possible solution: S3AFileSystem (part of hadoop-aws) needs to be updated to rename the instruction file first, then the original file. This way the instruction file is already at its new location before the delete of the original file runs, so losing it can be avoided.
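
      A minimal sketch of that ordering, again on a toy in-memory bucket rather than the real S3AFileSystem rename path. ToyBucket, InstructionFirstRename, and their methods are hypothetical names introduced for illustration; the cascading delete is modelled on the behaviour described in this ticket.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory bucket whose delete also removes the sidecar. */
class ToyBucket {
    static final String SUFFIX = ".instruction";
    final Map<String, String> objects = new HashMap<>();

    void put(String key, String body) {
        objects.put(key, body);
    }

    void copy(String src, String dst) {
        if (!objects.containsKey(src)) {
            throw new IllegalStateException("missing source: " + src);
        }
        objects.put(dst, objects.get(src));
    }

    /** Cascading delete, as described for the encryption client. */
    void delete(String key) {
        objects.remove(key);
        objects.remove(key + SUFFIX);
    }
}

class InstructionFirstRename {
    /** Plain copy-then-delete rename of a single key. */
    static void renameOne(ToyBucket bucket, String src, String dst) {
        bucket.copy(src, dst);
        bucket.delete(src);
    }

    /** Proposed order: move the instruction file first, then the object. */
    static void rename(ToyBucket bucket, String src, String dst) {
        String srcInstruction = src + ToyBucket.SUFFIX;
        if (bucket.objects.containsKey(srcInstruction)) {
            renameOne(bucket, srcInstruction, dst + ToyBucket.SUFFIX);
        }
        // The source sidecar is already gone, so the cascade finds nothing.
        renameOne(bucket, src, dst);
    }

    public static void main(String[] args) {
        ToyBucket bucket = new ToyBucket();
        bucket.put("myfile.txt_COPYING_", "ciphertext");
        bucket.put("myfile.txt_COPYING_.instruction", "wrapped-CEK");
        rename(bucket, "myfile.txt_COPYING_", "myfile.txt");
        System.out.println(bucket.objects.containsKey("myfile.txt.instruction"));
    }
}
```

      The sketch ignores failure handling: if the second rename fails, an instruction file without its object is left at the destination, which a real implementation would need to clean up.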

      It also requires config changes so that the storage mode (ObjectMetadata or InstructionFile) can be set as a config parameter.

      Let's discuss whether there is a better solution that can be incorporated.

      Once we agree on a common solution, I can work on the implementation.
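
      One way such a parameter could be resolved is sketched below. The property name fs.s3a.encryption.cse.storage.mode, the CseStorageModeConfig class, and its StorageMode enum are assumptions made up for illustration; no such option exists in hadoop-aws as of this ticket. java.util.Properties stands in for the Hadoop Configuration class to keep the example self-contained, and the default follows the note above that ObjectMetadata is the default value.

```java
import java.util.Locale;
import java.util.Properties;

class CseStorageModeConfig {
    /** Local stand-in for the SDK's storage modes. */
    enum StorageMode { OBJECT_METADATA, INSTRUCTION_FILE }

    /** Hypothetical property name; not an existing hadoop-aws option. */
    static final String KEY = "fs.s3a.encryption.cse.storage.mode";

    /** Resolves the mode, defaulting to object metadata when unset. */
    static StorageMode resolve(Properties conf) {
        String raw = conf.getProperty(KEY, "ObjectMetadata")
                .trim()
                .toLowerCase(Locale.ROOT);
        switch (raw) {
            case "objectmetadata":
                return StorageMode.OBJECT_METADATA;
            case "instructionfile":
                return StorageMode.INSTRUCTION_FILE;
            default:
                // Fail fast rather than silently falling back to the default.
                throw new IllegalArgumentException(
                        "Unsupported value for " + KEY + ": " + raw);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(KEY, "InstructionFile");
        System.out.println(resolve(conf));
    }
}
```

      Rejecting unknown values instead of falling back keeps a typo in the setting from silently writing key material to a different place than the operator intended.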

       

          People

            Assignee: Unassigned
            Reporter: Vikas Kumar (vikkumar)