Project: Hadoop Common
Parent issue: HADOOP-11694 Über-jira: S3a phase II: robustness, scale and performance
Issue: HADOOP-13203

S3A: Support fadvise "random" mode for high performance readPositioned() reads


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.8.0
    • Fix Version/s: 2.8.0, 3.0.0-alpha1
    • Component/s: fs/s3
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      S3A has added support for configurable input policies. Similar to fadvise, this configuration provides applications with a way to specify their expected access pattern (sequential or random) while reading a file. S3A then performs optimizations tailored to that access pattern. See site documentation of the fs.s3a.experimental.input.fadvise configuration property for more details. Please be advised that this feature is experimental and subject to backward-incompatible changes in future releases.
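As a rough illustration of how an application's declared access pattern might be interpreted, the sketch below maps a value of fs.s3a.experimental.input.fadvise onto a policy enum. The class and enum names (InputPolicySketch, InputPolicy) are hypothetical, and the NORMAL fallback for unset or unrecognised values is an assumption of this sketch rather than documented S3A behaviour; the site documentation remains the authority on accepted values.

```java
import java.util.Locale;

/**
 * Hypothetical sketch: maps the value of fs.s3a.experimental.input.fadvise
 * to an input policy. These names are illustrative, not S3A's own classes.
 */
public class InputPolicySketch {

    /** Expected access pattern declared by the application. */
    enum InputPolicy { NORMAL, SEQUENTIAL, RANDOM }

    static final String FADVISE_KEY = "fs.s3a.experimental.input.fadvise";

    /**
     * Parses a configuration value case-insensitively.
     * Assumption: unset or unknown values fall back to NORMAL.
     */
    static InputPolicy parse(String value) {
        if (value == null) {
            return InputPolicy.NORMAL;
        }
        switch (value.trim().toLowerCase(Locale.ENGLISH)) {
            case "sequential":
                return InputPolicy.SEQUENTIAL;
            case "random":
                return InputPolicy.RANDOM;
            default:
                return InputPolicy.NORMAL;
        }
    }

    public static void main(String[] args) {
        System.out.println(FADVISE_KEY + " = Random -> " + parse("Random"));
        System.out.println(FADVISE_KEY + " unset -> " + parse(null));
    }
}
```

In a real job the property would be set through Hadoop's Configuration, e.g. conf.set("fs.s3a.experimental.input.fadvise", "random"), and most likely needs to be in place before the S3A filesystem instance is created. Because the feature is experimental, both the key and its values may change.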

    Description

      Currently, the file's "contentLength" is passed as the "requestedStreamLen" when invoking S3AInputStream::reopen(). As part of lazySeek(), the stream sometimes has to be closed and reopened. In many cases the stream is closed with abort(), which leaves the underlying HTTP connection unusable, and in some jobs this incurs significant connection establishment cost. Setting the correct value for the stream length would avoid these connection aborts.

      I will post the patch once the AWS tests pass on my machine.
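The idea in the description can be sketched as follows: when reopening at a position, request only the byte range the reader is likely to consume, so that a later seek finds few unread bytes and can drain and close the stream (keeping the HTTP connection reusable) rather than abort it. This is a hypothetical model only: requestEnd, shouldAbort, the 64 KB readahead and the drain threshold are illustrative assumptions, not the code in the attached patches.

```java
/**
 * Hypothetical sketch: bound the requested range so that a later seek can
 * drain and close the stream instead of aborting the HTTP connection.
 * Names and thresholds are illustrative assumptions.
 */
public class RangeSketch {

    /**
     * End (exclusive) of the byte range to request when reopening at targetPos.
     * Non-random reads ask for the rest of the file (requestedStreamLen =
     * contentLength); random reads ask only for the larger of the caller's
     * length and the readahead, capped at the file length.
     */
    static long requestEnd(boolean random, long targetPos, long length,
                           long readahead, long contentLength) {
        if (!random) {
            return contentLength;
        }
        return Math.min(contentLength, targetPos + Math.max(length, readahead));
    }

    /**
     * True if the stream should be aborted when seeking away from pos:
     * many unread bytes make draining too expensive, so abort; few unread
     * bytes can be drained so the connection can be reused.
     */
    static boolean shouldAbort(long pos, long requestedEnd, long drainThreshold) {
        long remaining = requestedEnd - pos;
        return remaining > drainThreshold;
    }

    public static void main(String[] args) {
        long contentLength = 100L * 1024 * 1024; // 100 MB object
        long readahead = 64 * 1024;              // assumed 64 KB readahead
        long threshold = 64 * 1024;              // assumed drain threshold

        // Read 4 KB at offset 1 MB under each policy.
        long seqEnd = requestEnd(false, 1 << 20, 4096, readahead, contentLength);
        long rndEnd = requestEnd(true, 1 << 20, 4096, readahead, contentLength);

        // After those 4 KB are consumed, a seek elsewhere must release the stream.
        long pos = (1 << 20) + 4096;
        System.out.println(shouldAbort(pos, seqEnd, threshold)); // true: ~99 MB unread
        System.out.println(shouldAbort(pos, rndEnd, threshold)); // false: 60 KB unread
    }
}
```

With the whole object requested, roughly 99 MB remain unread after the small read, so the connection is aborted; with the bounded range only 60 KB remain and draining is the cheaper choice. The cost is that a truly sequential reader would issue more GET requests under a bounded range, which is presumably why the behaviour is exposed as a selectable policy.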

      Attachments

        1. HADOOP-13203-branch-2-001.patch
          3 kB
          Rajesh Balamohan
        2. HADOOP-13203-branch-2-002.patch
          6 kB
          Rajesh Balamohan
        3. HADOOP-13203-branch-2-003.patch
          6 kB
          Rajesh Balamohan
        4. HADOOP-13203-branch-2-004.patch
          6 kB
          Rajesh Balamohan
        5. stream_stats.tar.gz
          716 kB
          Rajesh Balamohan
        6. HADOOP-13203-branch-2-005.patch
          28 kB
          Steve Loughran
        7. HADOOP-13203-branch-2-006.patch
          43 kB
          Steve Loughran
        8. HADOOP-13203-branch-2-007.patch
          42 kB
          Steve Loughran
        9. HADOOP-13203-branch-2-008.patch
          53 kB
          Steve Loughran
        10. HADOOP-13203-branch-2-009.patch
          57 kB
          Steve Loughran
        11. HADOOP-13203-branch-2-010.patch
          57 kB
          Steve Loughran


          People

            Assignee: Rajesh Balamohan
            Reporter: Rajesh Balamohan
            Votes: 0
            Watchers: 9

            Dates

              Created:
              Updated:
              Resolved:
