Hadoop Map/Reduce, MAPREDUCE-7168

Add option to not kill already-done map tasks when node becomes unusable


Details

    • Type: New Feature
    • Status: Patch Available
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 2.9.2
    • Fix Version/s: None
    • Component/s: mrv2
    • Labels: None
    • Environment: Google Compute Engine (Dataproc), Java 8
    • Flags: Patch

    Description

      When a node becomes unusable while reduce tasks are still running, all completed map tasks that ran on that node are killed so that they can be re-run on a different node. This is done because the node can no longer serve shuffle data, so the reducers cannot fetch the output of those map tasks.

      If map tasks do not write their shuffle data locally, killing already-done map tasks makes the job lose map progress unnecessarily. This change adds a property, mapreduce.map.rerun-if-node-unusable, that can be set to false to prevent already-done map tasks from being killed, so that map progress is preserved when shuffle data is not written locally.
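
      The decision described above can be sketched in plain Java. This is a minimal illustration, not the code from the attached patch: the class and method names are hypothetical, java.util.Properties stands in for the job configuration, and the default of true is an assumption inferred from the description (the existing behaviour is to re-run).

```java
import java.util.Properties;

// Hypothetical sketch of the kill/re-run decision; not the patch itself.
final class MapRerunPolicySketch {
    // Property name as given in the issue description.
    static final String RERUN_IF_NODE_UNUSABLE =
        "mapreduce.map.rerun-if-node-unusable";

    private MapRerunPolicySketch() {}

    /**
     * Returns true if a completed map task on the given node should be
     * killed and re-run elsewhere.
     * Assumption: the property defaults to true when it is not set.
     */
    static boolean shouldRerunCompletedMap(Properties conf,
                                           boolean nodeUnusable,
                                           boolean reducesStillRunning) {
        boolean rerunEnabled = Boolean.parseBoolean(
            conf.getProperty(RERUN_IF_NODE_UNUSABLE, "true"));
        // Re-run only when the node is gone, reducers may still need the
        // map output, and the user has not opted out via the property.
        return nodeUnusable && reducesStillRunning && rerunEnabled;
    }
}
```

      With the property unset, the sketch re-runs completed maps on an unusable node while reducers are running; setting the property to false keeps them, which is only safe when shuffle data is stored somewhere other than the lost node.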

          People

            Assignee: Mikayla Konst (mkonst)
            Reporter: Mikayla Konst (mkonst)
