HADOOP-16185 (Hadoop Common; sub-task of HADOOP-16829, Über-jira: S3A Hadoop 3.3.1 features)

S3Guard: Optimize performance of handling OOB operations in non-authoritative mode

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Minor
    • Resolution: Won't Fix
    • Affects Version/s: 3.3.0
    • Fix Version/s: None
    • Component/s: fs/s3
    • Labels: None

    Description

      HADOOP-15999 changes S3Guard's non-authoritative mode so that every fs.getFileStatus call also checks S3, because the MetadataStore is not treated as the single source of truth. This has a negative performance impact.

       

      In other words, HADOOP-15999 reinstates the HEAD on every read, which makes non-authoritative S3Guard a bit slower. We could address that by moving the check into the input stream itself: the first GET which returns data would also act as the metadata check. The read context would then need a "metastoreProcessHeader" callback to invoke on the first GET.

      The good news is that because it's reading a file, it's only one HTTP HEAD request: there is no need for either of the other two directory probes unless the file isn't there.
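
      The proposal of letting the first GET double as the metadata check could look roughly like the sketch below. It is only an illustration under stated assumptions, not S3A code: FakeObjectStore, LazyCheckStream and ObjectHeaders are hypothetical stand-ins, and only the callback name "metastoreProcessHeader" comes from the description above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.function.Consumer;

/** Sketch: let the first GET double as the metadata check instead of a separate HEAD. */
class FirstGetMetadataCheck {

    /** Metadata that an object store returns on both HEAD and GET responses. */
    static final class ObjectHeaders {
        final String eTag;
        final long contentLength;

        ObjectHeaders(String eTag, long contentLength) {
            this.eTag = eTag;
            this.contentLength = contentLength;
        }
    }

    /** Hypothetical in-memory stand-in for S3 holding one object; counts requests. */
    static final class FakeObjectStore {
        private final byte[] data;
        private final String eTag;
        int headRequests;
        int getRequests;

        FakeObjectStore(byte[] data, String eTag) {
            this.data = data;
            this.eTag = eTag;
        }

        /** The pre-open probe that the proposal would avoid on the read path. */
        ObjectHeaders head() {
            headRequests++;
            return new ObjectHeaders(eTag, data.length);
        }

        /** Ranged GET: hands the response headers to the callback, then returns the bytes. */
        byte[] get(long start, int length, Consumer<ObjectHeaders> onHeaders) {
            getRequests++;
            onHeaders.accept(new ObjectHeaders(eTag, data.length));
            int from = (int) Math.min(start, data.length);
            int to = (int) Math.min(start + length, data.length);
            return Arrays.copyOfRange(data, from, to);
        }
    }

    /** Input stream sketch that runs the metadata callback on the first GET only. */
    static final class LazyCheckStream {
        private final FakeObjectStore store;
        private final Consumer<ObjectHeaders> metastoreProcessHeader;
        private boolean headersProcessed;
        private long pos;

        LazyCheckStream(FakeObjectStore store, Consumer<ObjectHeaders> metastoreProcessHeader) {
            this.store = store;
            this.metastoreProcessHeader = metastoreProcessHeader;
        }

        /** Each read issues one GET in this simplified sketch. */
        byte[] read(int length) {
            byte[] chunk = store.get(pos, length, headers -> {
                if (!headersProcessed) {
                    headersProcessed = true;
                    metastoreProcessHeader.accept(headers);
                }
            });
            pos += chunk.length;
            return chunk;
        }
    }

    /** Reads the object in two chunks and returns the eTag seen by the callback. */
    static String demo(FakeObjectStore store) {
        String[] seen = new String[1];
        LazyCheckStream in = new LazyCheckStream(store, h -> seen[0] = h.eTag);
        in.read(4);
        in.read(4);
        return seen[0];
    }

    public static void main(String[] args) {
        FakeObjectStore store =
            new FakeObjectStore("12345678".getBytes(StandardCharsets.US_ASCII), "etag-1");
        String eTag = demo(store);
        System.out.println("eTag=" + eTag
            + " HEADs=" + store.headRequests + " GETs=" + store.getRequests);
    }
}
```

      In this sketch the read path never calls head(): the callback receives the object's headers from the first GET and would be the place to compare them against the MetadataStore entry. How a mismatch or a missing file would be handled is left open here, as it is in the description.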

            People

              Assignee: Unassigned
              Reporter: Gabor Bota (gabor.bota)
              Votes: 0
              Watchers: 2
