  1. Hadoop Common
  2. HADOOP-18477 Über-jira: S3A Hadoop 3.3.9 features
  3. HADOOP-18842

Support Overwrite Directory On Commit For S3A Committers

Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.4.0
    • Fix Version/s: None
    • Component/s: fs/s3

    Description

      The goal is to add a new commit mechanism in which the destination directory is cleared before the files are committed.

      Use Case

      In dynamic-partition insert overwrite queries, the destination directories that need to be overwritten are not known before execution, which makes it difficult to clear them in advance.

      One approach is for the underlying engine or client to clear all destination directories before calling the commitJob operation. The problem with this approach is that if a failure occurs while committing the files, all of the previous data may already have been deleted, which makes recovery difficult or time-consuming.

      Solution

      The commit operation runs in one of two modes, INSERT or OVERWRITE. During commitJob, the committer maps each destination directory to the commits that need to be added to it. If the mode is OVERWRITE, the committer deletes the directory recursively and then commits each of the files into it. In the worst case, a failure leaves at most as many destination directories deleted as there are threads (if the work is done in a multi-threaded way), rather than the whole of the previous data, as would happen if the deletion were done on the engine side.
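
      The flow described above could look roughly like the following sketch. It is not the S3A committer code: the class OverwriteCommitSketch, the PendingFile type, and the in-memory map standing in for the object store are all hypothetical, chosen only to illustrate grouping commits by destination directory and deleting a directory just before its own files are committed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Illustrative sketch only; all names here are hypothetical, not Hadoop APIs. */
public class OverwriteCommitSketch {

    public enum Mode { INSERT, OVERWRITE }

    /** Hypothetical stand-in for one pending commit: a file bound for a directory. */
    public static final class PendingFile {
        final String destDir;
        final String fileName;

        public PendingFile(String destDir, String fileName) {
            this.destDir = destDir;
            this.fileName = fileName;
        }
    }

    /** Map each destination directory to the pending files that belong in it. */
    static Map<String, List<PendingFile>> groupByDirectory(List<PendingFile> pending) {
        Map<String, List<PendingFile>> byDir = new TreeMap<>();
        for (PendingFile p : pending) {
            byDir.computeIfAbsent(p.destDir, d -> new ArrayList<>()).add(p);
        }
        return byDir;
    }

    /**
     * Commit the pending files into the store (directory -> file names).
     * In OVERWRITE mode each destination directory is deleted recursively
     * immediately before its own files are committed, so directories that
     * the job does not write to are never touched.
     */
    public static void commitJob(Map<String, Set<String>> store,
                                 List<PendingFile> pending,
                                 Mode mode) {
        for (Map.Entry<String, List<PendingFile>> e : groupByDirectory(pending).entrySet()) {
            String dir = e.getKey();
            if (mode == Mode.OVERWRITE) {
                // Recursive delete: drop the directory and everything beneath it.
                store.keySet().removeIf(k -> k.equals(dir) || k.startsWith(dir + "/"));
            }
            Set<String> files = store.computeIfAbsent(dir, d -> new TreeSet<>());
            for (PendingFile p : e.getValue()) {
                files.add(p.fileName);
            }
        }
    }
}
```

      Because deletion happens per directory inside the commit loop, a failure part-way through affects only the directories being processed at that moment. A multi-threaded variant would hand each directory's entry to a worker, which is where the bound of one in-flight directory per thread comes from.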

            People

              srahman Syed Shameerur Rahman