[SQOOP-2861] Sqoop2: Scheduler Pool Support - ASF JIRA

Details

Type: New Feature
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.0.0
Fix Version/s: None
Component/s: sqoop2-framework
Labels: None

Description

Provide a mechanism to limit cluster-wide Sqoop access to a particular FROM resource. The use case is to configure a YARN scheduler pool that limits the vcores and memory available to jobs accessing a sensitive resource. A subset of Sqoop2 jobs could be configured to run in this pool, while other Sqoop2 jobs would fall back to the default pool configured for the Sqoop2 server.

The throttling extractor mechanics are useful for preventing a single job from saturating the resource, but they cannot limit aggregate access to the resource across jobs. This ticket aims to enable the use of scheduler pools for scenarios in which multiple Sqoop2 jobs access the same resource.

Possible implementation strategies:

Enable clients to pass through job-specific MapReduce configuration, such as key=value pairs in the CLI. A Sqoop2 client would specify the scheduler pool by passing mapreduce.job.queuename from the CLI.
Expose scheduler semantics to the client. An execution engine can then decide whether to honor the scheduler request. For example, the MapReduce execution engine could interpret a pool property and set it as the mapreduce.job.queuename value of the Hadoop configuration.
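
To make the two strategies above concrete, here is a minimal sketch in plain Java. It models the Hadoop configuration as a simple map so it runs without Hadoop or Sqoop2 on the classpath. The class SchedulerPoolSketch, its methods, and the queue name sensitive_db are all hypothetical and are not part of the Sqoop2 API; only the property name mapreduce.job.queuename comes from the description. Treat it as one possible shape for the behavior, not a proposed implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these class or method names exist in Sqoop2.
class SchedulerPoolSketch {

    // Hadoop MapReduce property that names the scheduler queue for a job.
    static final String QUEUE_KEY = "mapreduce.job.queuename";

    // Strategy 1, step 1: parse key=value pairs as a CLI might receive them.
    static Map<String, String> parseOverrides(String... pairs) {
        Map<String, String> overrides = new LinkedHashMap<>();
        for (String pair : pairs) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value, got: " + pair);
            }
            overrides.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return overrides;
    }

    // Strategy 1, step 2: job-specific pairs take precedence over server defaults.
    static Map<String, String> applyPassThrough(Map<String, String> serverDefaults,
                                                Map<String, String> overrides) {
        Map<String, String> conf = new LinkedHashMap<>(serverDefaults);
        conf.putAll(overrides);
        return conf;
    }

    // Strategy 2: the client only names a pool; the (assumed) MapReduce
    // execution engine translates it into the queue property. A missing or
    // blank pool leaves the server default untouched.
    static Map<String, String> applyPoolRequest(Map<String, String> serverDefaults,
                                                String pool) {
        Map<String, String> conf = new LinkedHashMap<>(serverDefaults);
        if (pool != null && !pool.trim().isEmpty()) {
            conf.put(QUEUE_KEY, pool.trim());
        }
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> serverDefaults = new LinkedHashMap<>();
        serverDefaults.put(QUEUE_KEY, "default");

        // Strategy 1: the client passes the queue name through verbatim.
        Map<String, String> passThrough = applyPassThrough(
                serverDefaults, parseOverrides(QUEUE_KEY + "=sensitive_db"));
        System.out.println("pass-through: " + passThrough.get(QUEUE_KEY));

        // Strategy 2: the client asks for a pool; the engine maps it to a queue.
        Map<String, String> viaPool = applyPoolRequest(serverDefaults, "sensitive_db");
        System.out.println("pool request: " + viaPool.get(QUEUE_KEY));

        // A job that requests no pool falls back to the server default.
        Map<String, String> fallback = applyPoolRequest(serverDefaults, null);
        System.out.println("no request:   " + fallback.get(QUEUE_KEY));
    }
}
```

The practical difference between the two is where knowledge of MapReduce lives. With pass-through, the client must know the engine-specific property name. With a pool property, the client states intent and each execution engine decides how, or whether, to honor it.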

People

Assignee: Unassigned

Reporter: Scott Kuehn

Votes: 0

Watchers: 1

Dates

Created: 02/Mar/16 02:12

Updated: 02/Mar/16 02:12