Hadoop Map/Reduce / MAPREDUCE-2269

Rumen TopologyBuilder ignores hostname info in ReduceAttemptFinishedEvent

Details

    • Type: Bug
    • Status: Open
    • Priority: Minor
    • Resolution: Unresolved
    • Affects Version/s: 0.22.0
    • Fix Version/s: None
    • Component/s: tools/rumen
    • Labels: None

    Description

      Rumen's TopologyBuilder component attempts to build up a view of a complete cluster over time by processing many jobs' history files (per discussion with Dick King). It appears to be designed to take a greedy approach to this, pulling hostnames and rack info out of any JobHistory events that have them.

      In particular, it pulls split locations out of TaskStartedEvent and hostnames out of TaskAttemptUnsuccessfulCompletionEvent (used for all task types) and TaskAttemptFinishedEvent (used only for setup and cleanup task attempts). It omits hostnames in TaskAttemptStartedEvents produced by map attempts (perhaps intentional given the split info from TaskStartedEvents?) and in ReduceAttemptFinishedEvents (apparently unintentional). The latter resulted in an empty topology and an ArrayIndexOutOfBoundsException in a reduce-only unit test (TestTaskPerformanceSplitTranscription modified for an upcoming feature).

      I'm not sure if this is intended behavior or a bug; feel free to close if the former. It seemed like TaskAttemptFinishedEvent might have been mistakenly believed to cover REDUCE_ATTEMPT_FINISHED. (If so, the fix to TopologyBuilder.java is trivial.)
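
      To make the suspected gap concrete, here is a minimal sketch of the dispatch pattern described above. It is not the actual TopologyBuilder source: the nested classes are simplified, hypothetical stand-ins that only borrow the real event names, the handleReduceFinished flag exists purely to show both behaviors side by side, and the hostname "/rack1/node7" is made up.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical, simplified model of the event dispatch described in this issue.
// None of these types are the real org.apache.hadoop classes.
class TopologySketch {
    interface HistoryEvent {}

    // Stand-in for the event used by setup and cleanup task attempts.
    static final class TaskAttemptFinishedEvent implements HistoryEvent {
        final String hostname;
        TaskAttemptFinishedEvent(String hostname) { this.hostname = hostname; }
    }

    // Stand-in for the event emitted when a reduce attempt finishes.
    static final class ReduceAttemptFinishedEvent implements HistoryEvent {
        final String hostname;
        ReduceAttemptFinishedEvent(String hostname) { this.hostname = hostname; }
    }

    // Hosts collected so far, in first-seen order.
    private final Set<String> hosts = new LinkedHashSet<>();

    // Assumption for illustration only: toggles the extra branch.
    private final boolean handleReduceFinished;

    TopologySketch(boolean handleReduceFinished) {
        this.handleReduceFinished = handleReduceFinished;
    }

    void process(HistoryEvent event) {
        if (event instanceof TaskAttemptFinishedEvent) {
            hosts.add(((TaskAttemptFinishedEvent) event).hostname);
        } else if (handleReduceFinished
                && event instanceof ReduceAttemptFinishedEvent) {
            // The branch that appears to be missing in the reported behavior.
            hosts.add(((ReduceAttemptFinishedEvent) event).hostname);
        }
        // Any other event type is silently ignored.
    }

    Set<String> hosts() {
        return hosts;
    }

    public static void main(String[] args) {
        HistoryEvent reduceDone = new ReduceAttemptFinishedEvent("/rack1/node7");

        TopologySketch reported = new TopologySketch(false);
        reported.process(reduceDone);
        System.out.println("without reduce branch: " + reported.hosts());

        TopologySketch proposed = new TopologySketch(true);
        proposed.process(reduceDone);
        System.out.println("with reduce branch: " + proposed.hosts());
    }
}
```

      In this model a reduce-only job leaves the host set empty unless the extra branch is present, which matches the empty topology seen in the unit test. Whether the real fix is a single additional branch like this depends on how TopologyBuilder.java is actually structured.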

      People

        Assignee: Unassigned
        Reporter: Greg Roelofs (roelofs)