[BEAM-12875] File systems are not registered when ArtifactRetrievalService is created by Spark runner - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: P2
Resolution: Fixed
Affects Version/s: 2.32.0
Fix Version/s: 2.35.0
Component/s: runner-spark
Labels: None

Description

I am new to this codebase, so apologies if I have misunderstood anything. From what I can tell, when SparkExecutableStageFunction is called, an ArtifactRetrievalService is created (if the job bundle factory's environment cache is cold) for the worker harness to call.

The issue is that FileSystems.setDefaultPipelineOptions is not called before this, so no filesystems are registered. If cloud storage such as S3 is used to stage artifacts, the ArtifactRetrievalService cannot retrieve them and throws an exception:
java.lang.IllegalArgumentException: No filesystem found for scheme s3

This doesn't affect other runners such as the Flink runner, because the Flink runner calls FileSystems.setDefaultPipelineOptions in its executable stage function.
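
The failure mode described above can be illustrated with a minimal, self-contained Java sketch. It does not use Beam itself: SchemeRegistryDemo, registerDefaults, reset, and resolve are hypothetical names that model a scheme-to-filesystem registry which stays empty until an explicit registration call is made. That call stands in for FileSystems.setDefaultPipelineOptions as described in this report. The bucket path is made up, and starting from a completely empty registry is a simplification.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Simplified stand-in for a scheme-to-filesystem registry. Hypothetical; not Beam code. */
final class SchemeRegistryDemo {
  // Empty until registerDefaults() runs, modelling the unregistered state described in the report.
  private static final Map<String, String> REGISTRY = new ConcurrentHashMap<>();

  /** Models the registration step (the role FileSystems.setDefaultPipelineOptions plays in the report). */
  static void registerDefaults() {
    REGISTRY.put("file", "LocalFileSystem");
    REGISTRY.put("s3", "S3FileSystem");
  }

  /** Clears the registry so the unregistered state can be reproduced. */
  static void reset() {
    REGISTRY.clear();
  }

  /** Returns the name of the filesystem registered for the path's scheme, or throws if none is. */
  static String resolve(String path) {
    int idx = path.indexOf("://");
    // Paths without an explicit scheme are treated as local files in this model.
    String scheme = idx < 0 ? "file" : path.substring(0, idx);
    String fileSystem = REGISTRY.get(scheme);
    if (fileSystem == null) {
      throw new IllegalArgumentException("No filesystem found for scheme " + scheme);
    }
    return fileSystem;
  }

  public static void main(String[] args) {
    String stagedArtifact = "s3://example-bucket/staged/artifact.jar"; // made-up path

    try {
      // Lookup before registration: the situation the report describes for the Spark runner.
      resolve(stagedArtifact);
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage()); // prints "No filesystem found for scheme s3"
    }

    // Lookup after registration: the ordering the report attributes to the Flink runner.
    registerDefaults();
    System.out.println(resolve(stagedArtifact)); // prints "S3FileSystem"
  }
}
```

Under this model the fix is purely a matter of ordering: the registration call has to run before the first attempt to resolve a staged artifact path.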

Issue Links

links to GitHub Pull Request #15502

People

Assignee: Unassigned
Reporter: Rogan Morrow
Votes: 0
Watchers: 1

Dates

Created: 11/Sep/21 10:37
Updated: 11/Oct/21 21:35
Resolved: 11/Oct/21 21:35

Time Tracking

Estimated: Not Specified
Remaining:
Logged: 1h 20m