  1. Apache Tez
  2. TEZ-4140

Tez DAG Recovery: Discrepancy In Scheduling Vertices During Vertex Recovery

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.8.2, 0.9.0, 0.8.4, 0.9.1, 0.9.2
    • Fix Version/s: 0.10.0, 0.9.3
    • Component/s: None
    • Labels: None

    Description

      Issue:

      During vertex recovery, the initialization stage of a vertex is skipped if the following events are seen in the recovery data:

      1) VertexInputInitializerEvent
      2) VertexReconfigureDoneEvent

      The initialization stage is skipped by replacing the vertex's VertexManagerPlugin (e.g. ShuffleVertexManager, CustomVertexManager) with NoOpVertexManager. There are a couple of issues with this replacement:

      1) A VertexManagerPlugin's work is complete only after the tasks of its vertex have been launched. Using NoOpVertexManager without checking whether the tasks of that vertex were launched in the previous run can therefore lead to a discrepancy in deciding when, and how many, tasks should be launched in that vertex during recovery.

      2) Maintaining vertex dependency:
      Suppose there are two vertices v1 and v2, where v2 depends on v1 (v1 ---> v2). If for some reason v1 could not skip its initialization stage but v2 could, v2 may get scheduled before v1, because v2 now uses NoOpVertexManager.

      The second problem is the one I ran into. A DAG is attached for reference (DAG.png):

      In the DAG, Reducer 7 depends on Reducer 6. During Tez recovery, Reducer 6's initialization stage was not skipped, whereas Reducer 7's initialization stage was skipped and NoOpVertexManager was used in place of ShuffleVertexManager. As a result, all of Reducer 7's tasks were launched without waiting for Reducer 6 to complete. Reducer 6 was initially planned to launch 14 tasks, so the tasks launched in Reducer 7 waited for 14 shuffle inputs. Auto-reduce parallelism later adjusted the number of tasks in Reducer 6 to 1, but Reducer 7's tasks were not aware of the change and kept waiting for 14 shuffle inputs when only 1 would ever arrive, so the query got stuck. This can also lead to a deadlock when the number of containers is limited and Reducer 7 ends up using all of them.
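
      The hang can be illustrated with a small, self-contained model. This is only a sketch under simplifying assumptions: DownstreamTask and its methods are hypothetical names and are not Tez classes. It assumes a downstream task fixes its expected input count at launch time and never learns about a later change in upstream parallelism.

```java
// Hypothetical, simplified model for illustration; this is not Tez code.
class StaleParallelismSketch {

    /** A downstream task that fixes its expected input count when it is launched. */
    static final class DownstreamTask {
        final int expectedInputs;
        int receivedInputs = 0;

        DownstreamTask(int expectedInputs) {
            this.expectedInputs = expectedInputs;
        }

        void onInputReceived() {
            receivedInputs++;
        }

        /** The task can proceed only once every expected input has arrived. */
        boolean canStartProcessing() {
            return receivedInputs >= expectedInputs;
        }
    }

    public static void main(String[] args) {
        // Reducer 7's task is launched while Reducer 6 is still planned with 14 tasks.
        int plannedUpstreamTasks = 14;
        DownstreamTask reducer7Task = new DownstreamTask(plannedUpstreamTasks);

        // Reducer 6 is later reconfigured down to a single task.
        int actualUpstreamTasks = 1;
        for (int i = 0; i < actualUpstreamTasks; i++) {
            reducer7Task.onInputReceived();
        }

        // No further inputs will ever arrive, so the task waits forever.
        System.out.println("received " + reducer7Task.receivedInputs
                + " of " + reducer7Task.expectedInputs
                + ", can start: " + reducer7Task.canStartProcessing());
    }
}
```

      In this model the task ends up with 1 of 14 expected inputs and can never start, which mirrors the stuck query described above.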

      Proposed Solution:
      In addition to the existing check for VertexInputInitializerEvent and VertexReconfigureDoneEvent, introduce two more conditions:

      1) Check whether tasks were launched in the vertex in the previous run before replacing its VertexManagerPlugin with NoOpVertexManager.

      2) All parent vertices must have skipped the initialization stage before a child vertex is allowed to skip it. This is required to maintain vertex dependency.
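
      The combined check could look roughly like the sketch below. It is not taken from the attached patches: RecoveredVertex, its fields, and canSkipInit are hypothetical names, and the existing event-based condition is collapsed into a single boolean. The point is only to show that the decision for a child vertex depends on the decision for every parent.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model for illustration; these are not Tez classes.
class RecoverySkipSketch {

    /** What the recovery data is assumed to say about one vertex. */
    static final class RecoveredVertex {
        final String name;
        // Existing condition: the required recovery events were seen for this vertex.
        final boolean initEventsRecovered;
        // Proposed condition 1: tasks of this vertex were launched in the previous run.
        final boolean tasksLaunchedInPreviousRun;
        final List<RecoveredVertex> parents = new ArrayList<>();

        RecoveredVertex(String name, boolean initEventsRecovered,
                        boolean tasksLaunchedInPreviousRun) {
            this.name = name;
            this.initEventsRecovered = initEventsRecovered;
            this.tasksLaunchedInPreviousRun = tasksLaunchedInPreviousRun;
        }
    }

    /**
     * A vertex may skip initialization only if its own conditions hold and
     * every parent may skip as well (proposed condition 2). Results are
     * memoized by vertex name; the DAG is acyclic, so the recursion ends.
     */
    static boolean canSkipInit(RecoveredVertex v, Map<String, Boolean> memo) {
        Boolean cached = memo.get(v.name);
        if (cached != null) {
            return cached;
        }
        boolean ok = v.initEventsRecovered && v.tasksLaunchedInPreviousRun;
        for (RecoveredVertex parent : v.parents) {
            ok = ok && canSkipInit(parent, memo);
        }
        memo.put(v.name, ok);
        return ok;
    }

    public static void main(String[] args) {
        // Reducer 6 could not skip initialization; Reducer 7 depends on it.
        RecoveredVertex reducer6 = new RecoveredVertex("Reducer 6", false, false);
        RecoveredVertex reducer7 = new RecoveredVertex("Reducer 7", true, true);
        reducer7.parents.add(reducer6);

        Map<String, Boolean> memo = new HashMap<>();
        System.out.println("Reducer 6 can skip init: " + canSkipInit(reducer6, memo));
        System.out.println("Reducer 7 can skip init: " + canSkipInit(reducer7, memo));
    }
}
```

      With this check, Reducer 7 would keep its original vertex manager because its parent Reducer 6 did not skip initialization, even though Reducer 7's own recovery data looks complete.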

      Attachments

        1. DAG.png
          334 kB
          Syed Shameerur Rahman
        2. TEZ-4140.01.patch
          20 kB
          Syed Shameerur Rahman
        3. TEZ-4140.02.patch
          20 kB
          László Bodor

            People

              Assignee: Syed Shameerur Rahman (srahman)
              Reporter: Syed Shameerur Rahman (srahman)
              Votes: 1
              Watchers: 3

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 0.5h