Hive / HIVE-21785

Add task queue/runtime stats per LLAP daemon to output


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 3.1.1
    • Fix Version/s: 4.0.0-alpha-1
    • Component/s: llap
    • Labels: None

    Description

      There are several scenarios where we want to investigate whether a particular LLAP daemon is performing faster or slower than the others in the cluster. In these scenarios, it is particularly important to determine whether tasks spent significant time waiting for an available executor (queued) or in the execution itself. A skew in the assignment of tasks to daemons is also of interest.

      This patch adds these statistics to the TezCounters, and therefore to the job output, on a per-LLAP-daemon basis. Here is an example:

      INFO : LlapTaskRuntimeAgg by daemon:
      INFO :    Count-host-1.example.com: 41
      INFO :    Count-host-2.example.com: 39
      INFO :    Count-host-3.example.com: 45
      INFO :    QueueTime-host-1.example.com: 51437776
      INFO :    QueueTime-host-2.example.com: 35758306
      INFO :    QueueTime-host-3.example.com: 47168327
      INFO :    RunTime-host-1.example.com: 165151539295
      INFO :    RunTime-host-2.example.com: 141729193528
      INFO :    RunTime-host-3.example.com: 166876988771
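The per-daemon values are easier to compare once they are normalized by the task count. The Java sketch below parses summary lines in the format shown above and computes the average queue time and run time per task for each daemon. It assumes that the QueueTime and RunTime values are totals over all tasks on a daemon (the "Agg" in LlapTaskRuntimeAgg suggests this, but the description does not say so explicitly) and that only the Count-, QueueTime-, and RunTime- prefixes occur. The time unit is not stated here, so the averages are in whatever unit the counters use. The class LlapSummaryAverages is hypothetical and not part of Hive.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LlapSummaryAverages {
    // Matches summary lines such as "INFO :    QueueTime-host-1.example.com: 51437776".
    private static final Pattern LINE =
            Pattern.compile("^INFO :\\s+(Count|QueueTime|RunTime)-(\\S+):\\s+(\\d+)$");

    /** Parses summary lines into metric name -> (host -> value); other lines are skipped. */
    public static Map<String, Map<String, Long>> parse(List<String> lines) {
        Map<String, Map<String, Long>> byMetric = new TreeMap<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line.trim());
            if (m.matches()) {
                byMetric.computeIfAbsent(m.group(1), k -> new TreeMap<>())
                        .put(m.group(2), Long.parseLong(m.group(3)));
            }
        }
        return byMetric;
    }

    /** Returns host -> {average queue time per task, average run time per task}. */
    public static Map<String, double[]> perTaskAverages(Map<String, Map<String, Long>> byMetric) {
        Map<String, Long> queue = byMetric.getOrDefault("QueueTime", Map.of());
        Map<String, Long> run = byMetric.getOrDefault("RunTime", Map.of());
        Map<String, double[]> result = new TreeMap<>();
        for (Map.Entry<String, Long> e : byMetric.getOrDefault("Count", Map.of()).entrySet()) {
            long count = e.getValue();
            if (count == 0) {
                continue; // skip idle daemons instead of dividing by zero
            }
            String host = e.getKey();
            result.put(host, new double[] {
                    queue.getOrDefault(host, 0L) / (double) count,
                    run.getOrDefault(host, 0L) / (double) count});
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "INFO : LlapTaskRuntimeAgg by daemon:",
                "INFO :    Count-host-1.example.com: 41",
                "INFO :    Count-host-2.example.com: 39",
                "INFO :    Count-host-3.example.com: 45",
                "INFO :    QueueTime-host-1.example.com: 51437776",
                "INFO :    QueueTime-host-2.example.com: 35758306",
                "INFO :    QueueTime-host-3.example.com: 47168327",
                "INFO :    RunTime-host-1.example.com: 165151539295",
                "INFO :    RunTime-host-2.example.com: 141729193528",
                "INFO :    RunTime-host-3.example.com: 166876988771");
        perTaskAverages(parse(lines)).forEach((host, avg) ->
                System.out.printf("%s: avg queue %.0f, avg run %.0f%n", host, avg[0], avg[1]));
    }
}
```

Applied to the example above, this ranks host-1.example.com highest for both average queue time and average run time per task, even though host-3.example.com ran the most tasks.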

      The "Count-" values are simple task counts for the host name (LLAP daemon) appended to the counter name.

      The "QueueTime-" values show how long tasks waited in the TaskExecutorService's queue before actually being executed.

      The "RunTime-" values cover the time from execution start to finish, where finish can be either a successful execution or a killed/failed execution.
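To make the three definitions concrete, here is a Java sketch of how such per-daemon counters could be accumulated from per-task timestamps: queue time is the gap between a task entering the executor queue and starting, and run time is the gap between start and finish, whatever the outcome. The TaskTiming and DaemonTimeAggregator classes and their fields are hypothetical illustrations, not the classes or logic of the actual patch; the sketch also assumes that all timestamps for a task come from the same clock.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DaemonTimeAggregator {
    /** Hypothetical per-task timestamps, all taken from the same clock. */
    public static final class TaskTiming {
        final String host;     // LLAP daemon that ran the task
        final long enqueuedAt; // task accepted into the executor queue
        final long startedAt;  // an executor picked the task up
        final long finishedAt; // success, kill, or failure

        public TaskTiming(String host, long enqueuedAt, long startedAt, long finishedAt) {
            this.host = host;
            this.enqueuedAt = enqueuedAt;
            this.startedAt = startedAt;
            this.finishedAt = finishedAt;
        }
    }

    /** Sums task count, queue time, and run time per daemon into "<Metric>-<host>" counters. */
    public static Map<String, Long> aggregate(List<TaskTiming> tasks) {
        Map<String, Long> counters = new TreeMap<>();
        for (TaskTiming t : tasks) {
            counters.merge("Count-" + t.host, 1L, Long::sum);
            counters.merge("QueueTime-" + t.host, t.startedAt - t.enqueuedAt, Long::sum);
            counters.merge("RunTime-" + t.host, t.finishedAt - t.startedAt, Long::sum);
        }
        return counters;
    }

    public static void main(String[] args) {
        List<TaskTiming> tasks = List.of(
                new TaskTiming("host-1.example.com", 0, 5, 105),
                new TaskTiming("host-1.example.com", 10, 12, 62),
                new TaskTiming("host-2.example.com", 0, 1, 201));
        aggregate(tasks).forEach((name, value) -> System.out.println(name + ": " + value));
    }
}
```

The sorted TreeMap groups the counter names by metric and then by host, similar to the example output; whether the real summary is ordered that way for the same reason is not stated in the description.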

      For the new counters to appear in the output, both the preexisting hive.tez.exec.print.summary and the new hive.llap.task.time.print.summary must be set to true.

       
      <property>
        <name>hive.tez.exec.print.summary</name>
        <value>true</value>
      </property>
      <property>
        <name>hive.llap.task.time.print.summary</name>
        <value>true</value>
      </property>
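Because the counters appear only when both properties are true, a quick pre-flight check of a configuration file can save a rerun. The Java sketch below uses only the JDK's DOM parser to read property name/value pairs from a Hadoop-style configuration XML (such as hive-site.xml, whose property elements sit under a configuration root element) and reports whether both flags are set to true. The class and method names are hypothetical, and the check looks at a single file only; it ignores any value that might override it elsewhere, for example in the session.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SummaryFlagCheck {
    // Both flags named in the description must be true for the per-daemon summary.
    static final String[] REQUIRED = {
            "hive.tez.exec.print.summary", "hive.llap.task.time.print.summary"};

    /** Reads property name/value pairs from a Hadoop-style configuration XML string. */
    public static Map<String, String> readProperties(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> props = new HashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element p = (Element) nodes.item(i);
            NodeList names = p.getElementsByTagName("name");
            NodeList values = p.getElementsByTagName("value");
            if (names.getLength() > 0 && values.getLength() > 0) {
                props.put(names.item(0).getTextContent().trim(),
                          values.item(0).getTextContent().trim());
            }
        }
        return props;
    }

    /** True only if every required flag is present and equals "true", ignoring case. */
    public static boolean llapTimeSummaryEnabled(Map<String, String> props) {
        for (String key : REQUIRED) {
            if (!"true".equalsIgnoreCase(props.get(key))) {
                return false; // missing or not true
            }
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<configuration>"
                + "<property><name>hive.tez.exec.print.summary</name><value>true</value></property>"
                + "<property><name>hive.llap.task.time.print.summary</name><value>true</value></property>"
                + "</configuration>";
        System.out.println(llapTimeSummaryEnabled(readProperties(xml)));
    }
}
```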

      Attachments

        1. HIVE-21785.2.patch (10 kB, Oliver Draese)
        2. HIVE-21785.1.patch (10 kB, Oliver Draese)
        3. HIVE-21785.patch (10 kB, Oliver Draese)


          People

            Assignee: Oliver Draese (odraese)
            Reporter: Oliver Draese (odraese)
            Votes: 0
            Watchers: 3
