[HADOOP-13203] S3A: Support fadvise "random" mode for high performance readPositioned() reads - ASF JIRA

Log work

Agile Board

Rank to Top

Rank to Bottom

Attach files

Attach Screenshot

Bulk Copy Attachments

Bulk Move Attachments

Voters

Watch issue

Watchers

Convert to Issue

Move

Link

Clone

Labels

Update Comment Author

Replace String in Comment

Update Comment Visibility

Delete Comments

XML

Word

Printable

JSON

Details

Type: Sub-task
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: 2.8.0
Fix Version/s: 2.8.0, 3.0.0-alpha1
Component/s: fs/s3
Labels:
None

Target Version/s:

2.8.0
Hadoop Flags:

Reviewed
Release Note:

Hide
S3A has added support for configurable input policies. Similar to fadvise, this configuration provides applications with a way to specify their expected access pattern (sequential or random) while reading a file. S3A then performs optimizations tailored to that access pattern. See site documentation of the fs.s3a.experimental.input.fadvise configuration property for more details. Please be advised that this feature is experimental and subject to backward-incompatible changes in future releases.

Show
S3A has added support for configurable input policies. Similar to fadvise, this configuration provides applications with a way to specify their expected access pattern (sequential or random) while reading a file. S3A then performs optimizations tailored to that access pattern. See site documentation of the fs.s3a.experimental.input.fadvise configuration property for more details. Please be advised that this feature is experimental and subject to backward-incompatible changes in future releases.

Description

Currently file's "contentLength" is set as the "requestedStreamLen", when invoking S3AInputStream::reopen(). As a part of lazySeek(), sometimes the stream had to be closed and reopened. But lots of times the stream was closed with abort() causing the internal http connection to be unusable. This incurs lots of connection establishment cost in some jobs. It would be good to set the correct value for the stream length to avoid connection aborts.

I will post the patch once aws tests passes in my machine.