Hadoop Map/Reduce / MAPREDUCE-7173

Add ability to shuffle intermediate map task output to a distributed filesystem


    Details

    • Type: New Feature
    • Status: Patch Available
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.9.2
    • Fix Version/s: None
    • Component/s: mrv2
    • Labels: None
    • Target Version/s:

      Description

      If nodes are lost during the course of a mapreduce job, the map tasks that ran on those nodes need to be re-run. Writing intermediate map task output to a distributed file system eliminates this problem in environments in which nodes are frequently lost, for example, in clusters that make heavy use of Google's Preemptible VMs or AWS's Spot Instances.

      Example Usage:

      Job-scoped properties:

      1. Don't re-run an already-finished map task when we realize the node it ran on is now unusable:

      mapreduce.map.rerun-if-node-unusable=false (see MAPREDUCE-7168)

      2. On the map side, use a new implementation of MapOutputFile that provides paths relative to the staging dir for the job (which is cleaned up when the job is done):

      mapreduce.task.general.output.class=org.apache.hadoop.mapred.HCFSOutputFiles

      3. On the reduce side, use a new implementation of ShuffleConsumerPlugin that fetches map task output directly from a distributed filesystem:

      mapreduce.job.reduce.shuffle.consumer.plugin.class=org.apache.hadoop.mapreduce.task.reduce.HCFSShuffle

      4. (Optional) Set the buffer size, in bytes, for the output stream used when writing map task output:

      mapreduce.map.shuffle.output.buffer.size=8192
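      The four job-scoped settings above could be collected into a single Hadoop configuration file (for example, passed to a job with `-conf`). This is a sketch under the assumption that the property names and the `HCFSOutputFiles`/`HCFSShuffle` classes are exactly as proposed in the attached patch:

```xml
<!-- Job-scoped configuration for HCFS-based shuffle, as proposed in this patch. -->
<configuration>
  <!-- Don't re-run a finished map task when its node becomes unusable (MAPREDUCE-7168). -->
  <property>
    <name>mapreduce.map.rerun-if-node-unusable</name>
    <value>false</value>
  </property>
  <!-- Map side: write intermediate output under the job's staging dir on the distributed FS. -->
  <property>
    <name>mapreduce.task.general.output.class</name>
    <value>org.apache.hadoop.mapred.HCFSOutputFiles</value>
  </property>
  <!-- Reduce side: fetch map output directly from the distributed filesystem. -->
  <property>
    <name>mapreduce.job.reduce.shuffle.consumer.plugin.class</name>
    <value>org.apache.hadoop.mapreduce.task.reduce.HCFSShuffle</value>
  </property>
  <!-- Optional: buffer size (bytes) for the map output stream. -->
  <property>
    <name>mapreduce.map.shuffle.output.buffer.size</name>
    <value>8192</value>
  </property>
</configuration>
```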

      Cluster-scoped properties (see YARN-9106):

      1. When gracefully decommissioning a node, wait only for the containers on that node to finish, not for the applications associated with those containers (there is no need to wait on the applications, since the node is not serving shuffle data):

      yarn.resourcemanager.decommissioning-nodes-watcher.wait-for-applications=false

      2. When gracefully decommissioning a node, do not wait for application masters running on the node to finish, so that the node can be decommissioned as soon as possible (failover to an app master on another node that isn't being decommissioned is quick):

      yarn.resourcemanager.decommissioning-nodes-watcher.wait-for-app-masters=false
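      The two cluster-scoped settings above would go in yarn-site.xml on the ResourceManager. A sketch, assuming the property names introduced by YARN-9106 as described here:

```xml
<!-- Cluster-scoped decommissioning settings (see YARN-9106), in yarn-site.xml. -->
<configuration>
  <!-- Wait only for containers, not their applications, before decommissioning. -->
  <property>
    <name>yarn.resourcemanager.decommissioning-nodes-watcher.wait-for-applications</name>
    <value>false</value>
  </property>
  <!-- Don't wait for app masters on the node; rely on AM failover to another node. -->
  <property>
    <name>yarn.resourcemanager.decommissioning-nodes-watcher.wait-for-app-masters</name>
    <value>false</value>
  </property>
</configuration>
```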

        Attachments

        1. MAPREDUCE-7173.patch
          33 kB
          Mikayla Konst


            People

            • Assignee: Mikayla Konst (mkonst)
            • Reporter: Mikayla Konst (mkonst)
            • Votes: 2
            • Watchers: 4
