Uploaded image for project: 'Hadoop Map/Reduce'
  1. Hadoop Map/Reduce
  2. MAPREDUCE-7282

MR v2 commit algorithm should be deprecated and not the default

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Won't Fix
    • 3.3.0, 3.2.1, 3.1.3, 3.3.1
    • None
    • mrv2

    Description

      The v2 MR commit algorithm moves files from the task attempt dir into the dest dir on task commit -one by one

      It is therefore not atomic

      1. if a task commit fails partway through and another task attempt commits -unless exactly the same filenames are used, output of the first attempt may be included in the final result
      2. if a worker partitions partway through task commit, and then continues after another attempt has committed, it may partially overwrite the output -even when the filenames are the same

      Both MR and spark assume that task commits are atomic. Either they need to consider that this is not the case, we add a way to probe for a committer supporting atomic task commit, and the engines both add handling for task commit failures (probably fail job)

      Better: we remove this as the default, maybe also warn when it is being used

      Attachments

        Issue Links

          Activity

            People

              Unassigned Unassigned
              stevel@apache.org Steve Loughran
              Votes:
              0 Vote for this issue
              Watchers:
              22 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated:
                  Original Estimate - Not Specified
                  Not Specified
                  Remaining:
                  Remaining Estimate - 0h
                  0h
                  Logged:
                  Time Spent - 0.5h
                  0.5h