HADOOP-14236 (Hadoop Common; sub-task of HADOOP-13345 S3Guard: Improved Consistency for S3A)

S3Guard: S3AFileSystem::rename() should move non-listed sub-directory entries in metadata store

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: HADOOP-13345
    • Component/s: fs/s3
    • Labels: None

    Description

      After running integration test ITestS3AFileSystemContract, I found the following items are not cleaned up in DynamoDB:

      parent=/mliu-s3guard/user/mliu/s3afilesystemcontract/testRenameDirectoryAsExisting/dir, child=subdir
      parent=/mliu-s3guard/user/mliu/s3afilesystemcontract/testRenameDirectoryAsExistingNew/newdir/subdir, child=file2
      

      At first I thought it was similar to HADOOP-14226 or HADOOP-14227, and that we just needed to be more careful when cleaning up test data.

      Then I found it is a bug in the code that integrates S3Guard with S3AFileSystem: on rename, we miss the sub-directory entries that should be put (under dest) and deleted (under src). The reason is that S3A deletes fake directory objects once they are no longer needed, e.g. when the directory is non-empty. So when we list the objects to rename, the object summaries contain only file objects. This has two consequences after a rename:

      1. Stale entries for the src path remain in the metadata store. These leftovers confuse get(Path), which should return null for them.
      2. The whole subtree under the dest path is not persisted to the metadata store. This breaks the DynamoDBMetadataStore invariant: if a path exists, all its ancestors also exist in the table.
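
The gap described above can be illustrated with a minimal sketch. It is not the code from the attached patches: the class RenameAncestors, its methods, and the /bucket/... paths are hypothetical, and paths are modelled as plain strings rather than Hadoop Path objects. Given only the file paths returned by a listing, it derives the intermediate sub-directory entries that a rename would also need to delete under src and put under dest. The src and dest roots themselves are assumed to be handled by the caller.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of S3AFileSystem.
final class RenameAncestors {

  /**
   * Returns the listed files plus every intermediate directory that lies
   * strictly below root. The root itself is not included.
   */
  static Set<String> withAncestors(String root, List<String> listedFiles) {
    Set<String> result = new LinkedHashSet<>();
    for (String file : listedFiles) {
      String p = file;
      // Walk upward from the file until the rename root is reached.
      while (p.length() > root.length() && p.startsWith(root + "/")) {
        result.add(p);
        p = p.substring(0, p.lastIndexOf('/'));
      }
    }
    return result;
  }

  /** Maps a path under srcRoot to the corresponding path under dstRoot. */
  static String toDest(String srcRoot, String dstRoot, String srcPath) {
    return dstRoot + srcPath.substring(srcRoot.length());
  }

  public static void main(String[] args) {
    String src = "/bucket/dir";
    String dst = "/bucket/newdir";
    // The listing returns files only; "/bucket/dir/subdir" has no object.
    List<String> listed =
        Arrays.asList("/bucket/dir/file1", "/bucket/dir/subdir/file2");
    for (String p : withAncestors(src, listed)) {
      System.out.println("delete " + p + " ; put " + toDest(src, dst, p));
    }
  }
}
```

In this sketch the entry for /bucket/dir/subdir is produced even though the listing never returned it, which is the kind of entry the description says was being missed on both the src and dest sides.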

      UPDATE: the modified test case ITestS3AFileSystemContract::testRenameDirectoryAsExistingDirectory() fails without this patch and passes with it. If the test case makes sense, the proposal follows.
      Existing tests do not catch this, though. If this is a real bug, let’s address it here.

      Attachments

        1. HADOOP-14236-HADOOP-13345.000.patch
          5 kB
          Mingliang Liu
        2. HADOOP-14236-HADOOP-13345.001.patch
          6 kB
          Mingliang Liu
        3. HADOOP-14236-HADOOP-13345.002.patch
          6 kB
          Mingliang Liu
        4. HADOOP-14236-HADOOP-13345.003.patch
          9 kB
          Mingliang Liu
        5. HADOOP-14236-HADOOP-13345.004.patch
          9 kB
          Mingliang Liu

            People

              Assignee: liuml07 Mingliang Liu
              Reporter: liuml07 Mingliang Liu
              Votes: 0
              Watchers: 3
