[NIFI-7886] FetchAzureBlobStorage, FetchS3Object, and FetchGCSObject processors should be able to fetch ranges - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Minor
Resolution: Fixed
Affects Version/s: 1.12.0, 1.13.0
Fix Version/s: 1.14.0
Component/s: Extensions
Labels:
- azureblob
- gcs
- s3

Description

Azure Blob Storage, AWS S3, and Google Cloud Storage all support retrieving byte ranges of stored objects. Current versions of NiFi processors for these services do not support fetching by byte range.
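
For illustration, all three services accept a standard HTTP Range header on object reads. The sketch below shows how a start offset and a length could map to that header value. The function name `range_header` and its parameters are hypothetical and are not taken from the NiFi processors.

```python
from typing import Optional


def range_header(start: int, length: Optional[int] = None) -> str:
    """Build an HTTP Range header value (hypothetical helper).

    Byte positions are zero-based and the end position is inclusive.
    When no length is given, the range runs to the end of the object.
    """
    if start < 0:
        raise ValueError("start must be non-negative")
    if length is None:
        return f"bytes={start}-"
    if length <= 0:
        raise ValueError("length must be positive")
    # The end offset is inclusive, so subtract one from start + length.
    return f"bytes={start}-{start + length - 1}"
```

For example, `range_header(0, 500)` describes the first 500 bytes of an object, and `range_header(1000)` describes everything from byte 1000 onward.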

Fetching by range would enable several enhancements:

Parallelized downloads
- Faster speeds when a single connection's throughput, limited by its window size and round-trip latency, is lower than the available bandwidth
- Load distribution over a cluster
Cost savings
- If only part of a large file is needed, just that part can be downloaded, avoiding bandwidth costs for unneeded bytes
- A failed download only needs to retry the failed segment, rather than the full file
Download extremely large files
- Files larger than the available content repository can be downloaded one segment at a time, moving each segment to a system with more capacity before downloading the next

Some of these enhancements would require an upstream processor to generate multiple flow files, each covering a different part of the overall range. Something like this:
ListS3 -> ExecuteGroovyScript (to split into multiple flow files with different range attributes) -> FetchS3Object.
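
The splitting step in that flow could work roughly as follows. This is a minimal sketch in Python rather than Groovy, assuming the object size is known from the listing step. The function names and the attribute names `range.start` and `range.length` are hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple


def split_ranges(total_size: int, segment_size: int) -> List[Tuple[int, int]]:
    """Split an object of total_size bytes into (start, length) pairs."""
    if total_size < 0:
        raise ValueError("total_size must be non-negative")
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    # The last segment is shorter when total_size is not a multiple
    # of segment_size.
    return [
        (start, min(segment_size, total_size - start))
        for start in range(0, total_size, segment_size)
    ]


def range_attributes(total_size: int, segment_size: int) -> List[Dict[str, str]]:
    """Build one attribute map per segment, one per outgoing flow file.

    The attribute names are assumptions, not names defined by NiFi.
    """
    return [
        {"range.start": str(start), "range.length": str(length)}
        for start, length in split_ranges(total_size, segment_size)
    ]
```

Each attribute map would be attached to its own flow file, so that the downstream fetch processor retrieves only that segment. Because the segments are independent, a failed segment can be retried without refetching the others.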

Issue Links

causes
- NIFI-8506 Azure Integration tests fail (Resolved)

is duplicated by
- NIFI-7934 FetchS3Object processor should support range requesting (Resolved)

links to
- GitHub Pull Request #4576

People

Assignee: Paul Kelly
Reporter: Paul Kelly
Votes: 0
Watchers: 2

Dates

Created: 06/Oct/20 10:05
Updated: 30/Apr/21 17:15
Resolved: 27/Apr/21 19:21
