Sqoop (Retired) / SQOOP-2861

Sqoop2: Scheduler Pool Support

Details

    • Type: New Feature
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.0.0
    • Fix Version/s: None
    • Component/s: sqoop2-framework
    • Labels: None

    Description

      Provide a mechanism to limit cluster-wide Sqoop access to a particular FROM resource. The use case is to configure a YARN scheduler pool that limits the vcores and memory available to jobs accessing a sensitive resource. A subset of Sqoop2 jobs could be configured to run in this pool, while all other Sqoop2 jobs fall back to the default pool configured for the Sqoop2 server.

      The extractor throttling mechanics are useful for preventing a single job from saturating the resource, but they cannot limit aggregate access to that resource across jobs. This ticket aims to enable the use of scheduler pools in scenarios where multiple Sqoop2 jobs access the same resource.
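
      A toy model may help illustrate the distinction. It is not YARN's scheduler, and it assumes (hypothetically) that each running extractor occupies one container with one vcore and holds one connection to the FROM resource. A per-job extractor limit bounds each job individually, but the total still grows with the number of concurrent jobs; a pool-level vcore cap bounds the total no matter how many jobs share the pool. The class and method names are illustrative only.

```java
import java.util.List;

/**
 * Toy model (not YARN's scheduler): per-job throttling versus a pool cap.
 * Hypothetical assumption: one running extractor = one container = 1 vcore,
 * and each running extractor holds one connection to the shared FROM resource.
 */
public class AggregateLimit {

    /** Concurrent extractors when each job is throttled but nothing caps the total. */
    static int perJobThrottleOnly(List<Integer> requestedExtractors, int maxExtractorsPerJob) {
        int total = 0;
        for (int requested : requestedExtractors) {
            // Each job individually respects its own extractor limit.
            total += Math.min(requested, maxExtractorsPerJob);
        }
        return total;
    }

    /** Same jobs, but all of them run in one pool whose vcore cap bounds the total. */
    static int withPoolCap(List<Integer> requestedExtractors, int maxExtractorsPerJob, int poolMaxVcores) {
        return Math.min(perJobThrottleOnly(requestedExtractors, maxExtractorsPerJob), poolMaxVcores);
    }

    public static void main(String[] args) {
        // Four concurrent jobs reading the same source, each asking for 10 extractors.
        List<Integer> jobs = List.of(10, 10, 10, 10);
        System.out.println(perJobThrottleOnly(jobs, 4)); // 4 jobs x 4 extractors = 16
        System.out.println(withPoolCap(jobs, 4, 6));     // the pool caps the total at 6
    }
}
```

      With four concurrent jobs each throttled to four extractors, the resource sees 16 concurrent connections under this model; placing all four jobs in a pool capped at 6 vcores limits it to 6.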

      Possible implementation strategies:

      1. Enable clients to pass through job-specific MapReduce configuration, such as key=value pairs on the CLI. A Sqoop2 client would then specify the scheduler pool by passing a mapreduce.job.queuename value from the CLI.
      2. Expose scheduler semantics to the client, and let the execution engine decide whether to honor the scheduler request. For example, the MapReduce execution engine could interpret a pool property and set it as the mapreduce.job.queuename value in the Hadoop configuration.
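
      The two strategies can be sketched as follows, under stated assumptions. Only mapreduce.job.queuename is a real Hadoop property; the class, the method names, the server-side allow-list used in the second strategy, and the queue names sensitive_db and sqoop_default are hypothetical. The job configuration is modeled as a plain Map rather than org.apache.hadoop.conf.Configuration so the example runs without Hadoop on the classpath.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Sketch of both strategies. Hypothetical names throughout, except the real
 * Hadoop property mapreduce.job.queuename. A Map stands in for the Hadoop
 * job configuration.
 */
public class SchedulerStrategies {
    static final String QUEUE_KEY = "mapreduce.job.queuename";

    /** Strategy 1: copy client-supplied key=value pairs straight into the job config. */
    static Map<String, String> passThrough(Map<String, String> conf, List<String> cliPairs) {
        Map<String, String> result = new HashMap<>(conf);
        for (String pair : cliPairs) {
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("expected key=value, got: " + pair);
            }
            // Client values override whatever the server had configured.
            result.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return result;
    }

    /**
     * Strategy 2: the client only names a pool; the (hypothetical) MapReduce
     * engine honors it only if it is on a server-side allow-list, otherwise it
     * falls back to the server default, then maps it onto mapreduce.job.queuename.
     */
    static Map<String, String> applyPoolRequest(Map<String, String> conf, String requestedPool,
                                                Set<String> allowedPools, String defaultPool) {
        Map<String, String> result = new HashMap<>(conf);
        String pool = (requestedPool != null && allowedPools.contains(requestedPool))
                ? requestedPool
                : defaultPool;
        result.put(QUEUE_KEY, pool);
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> base = Map.of(QUEUE_KEY, "sqoop_default");
        System.out.println(passThrough(base, List.of("mapreduce.job.queuename=sensitive_db")).get(QUEUE_KEY));
        System.out.println(applyPoolRequest(base, "sensitive_db", Set.of("sensitive_db"), "sqoop_default").get(QUEUE_KEY));
        System.out.println(applyPoolRequest(base, "unknown", Set.of("sensitive_db"), "sqoop_default").get(QUEUE_KEY));
    }
}
```

      The trade-off shows up in the signatures: passThrough lets a client set any property, which is flexible but hands every setting to the client, whereas applyPoolRequest keeps the client-facing contract engine-neutral and leaves the final decision, including the fallback to the server default, with the execution engine.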

          People

            Assignee: Unassigned
            Reporter: Scott Kuehn (skuehn)
            Votes: 0
            Watchers: 1