  1. Hadoop Common
  2. HADOOP-18477 Über-jira: S3A Hadoop 3.3.9 features
  3. HADOOP-18651

Add "versions" tool to s3a command line entry point


Details

    • Type: Sub-task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.3.9
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels: None

    Description

      Having just implemented some version command support in the cloudstore JAR, I can see the benefit of implementing it in the hadoop-aws module.

      https://github.com/steveloughran/cloudstore/blob/trunk/src/main/site/versioned-objects.md

      https://github.com/steveloughran/cloudstore/blob/trunk/src/main/extra/org/apache/hadoop/fs/s3a/extra/

      The cloudstore code:

      • uses v1 sdk by asking the s3a fs for it; this will break with the move to v2 sdk
      • doesn't have any tests
      • doesn't have any review or maintenance plan
      • bypasses audit log/referrer header creation

      We could just say "use the AWS CLI", but there are some benefits in using the s3a connector code:

      • support for s3a:// urls
      • can use the s3a auth/signing chain (knox, etc)
      • plus proxy, region settings etc.
      • could integrate with other bits of the stack (e.g. a Spark RDD to get at all versions of objects)
      • would be really useful to have a tool to purge all directory delete markers down a path, to speed up listing on versioned buckets.
      • gets bundled everywhere
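As a rough illustration of the delete-marker purge idea in the list above, the selection step might look like the sketch below. Everything here is hypothetical: `DirMarkerPurgeSketch`, `VersionEntry`, and `selectDirectoryDeleteMarkers` are illustrative names, not hadoop-aws or cloudstore APIs, and the sketch assumes that a "directory delete marker" is a delete marker on a key ending in "/". A real tool would obtain the entries from an S3 version listing and then issue versioned deletes; this only shows which entries would be picked.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: choose which version entries a purge tool would remove. */
final class DirMarkerPurgeSketch {

  /** Hypothetical, minimal view of one entry from a version listing. */
  static final class VersionEntry {
    final String key;
    final String versionId;
    final boolean deleteMarker;

    VersionEntry(String key, String versionId, boolean deleteMarker) {
      this.key = key;
      this.versionId = versionId;
      this.deleteMarker = deleteMarker;
    }
  }

  /**
   * Return the delete markers that sit on directory-style keys (ending in "/")
   * under the given path prefix. Assumption: these are the markers that slow
   * down listings and are safe candidates for purging.
   */
  static List<VersionEntry> selectDirectoryDeleteMarkers(
      List<VersionEntry> entries, String pathPrefix) {
    // Normalize the prefix so "logs" does not also match "logs-old/".
    String prefix = pathPrefix.isEmpty() || pathPrefix.endsWith("/")
        ? pathPrefix
        : pathPrefix + "/";
    List<VersionEntry> selected = new ArrayList<>();
    for (VersionEntry e : entries) {
      if (e.deleteMarker && e.key.endsWith("/") && e.key.startsWith(prefix)) {
        selected.add(e);
      }
    }
    return selected;
  }
}
```

Delete markers on ordinary files are deliberately left alone in this sketch, since removing them would make the deleted files visible again.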

      For use by downstream code we would want to have a public/evolving API to access operations, e.g.

      1. taking an S3AFileStatus for rename/purge/restore operations
      2. listing all versions of objects under a path within a given time range and mapping to RemoteIterator.
      3. HADOOP-16387. S3A openFile() options to allow etag/version to be set
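To make item 2 more concrete, here is a minimal sketch of what a time-ranged version listing could look like when exposed as a remote iterator. It is not a proposed API: `VersionListingSketch`, `ObjectVersion`, `listVersions`, `fromList`, and `collect` are hypothetical names, and the nested `RemoteIterator` interface is a local stand-in that mirrors the shape of `org.apache.hadoop.fs.RemoteIterator` (both methods may throw `IOException`) so that the example compiles without Hadoop on the classpath. The source iterator would in practice be backed by paged S3 version listings.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical sketch of a version-listing API; not part of hadoop-aws. */
final class VersionListingSketch {

  /** Local stand-in mirroring org.apache.hadoop.fs.RemoteIterator. */
  interface RemoteIterator<T> {
    boolean hasNext() throws IOException;
    T next() throws IOException;
  }

  /** Hypothetical summary of a single object version. */
  static final class ObjectVersion {
    final String key;
    final String versionId;
    final Instant lastModified;
    final boolean deleteMarker;

    ObjectVersion(String key, String versionId, Instant lastModified,
        boolean deleteMarker) {
      this.key = key;
      this.versionId = versionId;
      this.lastModified = lastModified;
      this.deleteMarker = deleteMarker;
    }
  }

  /**
   * Lazily filter a source listing down to versions under a key prefix whose
   * modification time falls in the half-open range [from, to).
   */
  static RemoteIterator<ObjectVersion> listVersions(
      RemoteIterator<ObjectVersion> source, String prefix,
      Instant from, Instant to) {
    return new RemoteIterator<ObjectVersion>() {
      private ObjectVersion pending; // next matching entry, if already found

      @Override
      public boolean hasNext() throws IOException {
        while (pending == null && source.hasNext()) {
          ObjectVersion v = source.next();
          if (v.key.startsWith(prefix)
              && !v.lastModified.isBefore(from)
              && v.lastModified.isBefore(to)) {
            pending = v;
          }
        }
        return pending != null;
      }

      @Override
      public ObjectVersion next() throws IOException {
        if (!hasNext()) {
          throw new NoSuchElementException();
        }
        ObjectVersion v = pending;
        pending = null;
        return v;
      }
    };
  }

  /** Adapt an in-memory list, standing in for a paged S3 listing. */
  static <T> RemoteIterator<T> fromList(List<T> items) {
    Iterator<T> it = items.iterator();
    return new RemoteIterator<T>() {
      @Override public boolean hasNext() { return it.hasNext(); }
      @Override public T next() { return it.next(); }
    };
  }

  /** Drain an iterator into a list, rethrowing IOException unchecked. */
  static <T> List<T> collect(RemoteIterator<T> it) {
    List<T> out = new ArrayList<>();
    try {
      while (it.hasNext()) {
        out.add(it.next());
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out;
  }
}
```

Returning an iterator rather than a list keeps the filtering lazy, so a caller scanning a large versioned bucket only pays for the pages it actually consumes.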

      The core code is straightforward (an estimated two days to write, excluding tests); the public API and tests are more work.

      Note: we should also move the entry point to "s3a", with "s3guard" retained for compatibility.

          People

            Assignee: Unassigned
            Reporter: Steve Loughran (stevel@apache.org)