Hive / HIVE-26179

In Tez container reuse mode, asyncInitOperations are not cleared


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.2.1
    • None
    • Hive, Tez
    • Environment: engine: Tez (Note: tez.am.container.reuse.enabled is true)

       

    Description

      In our cluster, we found the following error:

      Vertex failed, vertexName=Map 1, vertexId=vertex_1650608671415_321290_1_11, diagnostics=[Task failed, taskId=task_1650608671415_321290_1_11_000422, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1650608671415_321290_1_11_000422_0:java.lang.RuntimeException: java.lang.RuntimeException: Hive Runtime Error while closing operators
          at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
          at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:135)
          at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
          at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
          at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
          at java.security.AccessController.doPrivileged(Native Method)
          at javax.security.auth.Subject.doAs(Subject.java:422)
          at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
          at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
          at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
          at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
          at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
          at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
          at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
          at java.lang.Thread.run(Thread.java:748)
      Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators
          at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:349)
          at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:161)
          ... 16 more
      Caused by: java.lang.NullPointerException
          at org.apache.hadoop.hive.ql.exec.MapJoinOperator.closeOp(MapJoinOperator.java:488)
          at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:684)
          at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:698)
          at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:338)
          ... 17 more
      

      When Tez container reuse is enabled and a query uses MapJoinOperator, an NPE is thrown if two task attempts of the same task run in the same container.

      While debugging, I found that the second task attempt uses the first attempt's asyncInitOperations. Because asyncInitOperations is not cleared when the operator is closed, the second task attempt may pick up the first attempt's mapJoinTables, whose HybridHashTableContainer.HashPartition has already been closed, and this causes the NPE.

      We must clear asyncInitOperations when the operator is closed.
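      The failure mode described above can be illustrated with a simplified, hypothetical model. This is not Hive's code: SketchOperator, Table, and secondAttemptSeesClosedTable are invented names. The model assumes, as the report describes, that the same operator instance is reused by a later task attempt in the same container, and it assumes that initialization consumes the first pending async result. Clearing the pending operations in close() is the fix proposed in this issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

/**
 * Hypothetical, simplified model of the reported failure: an operator instance
 * is reused across task attempts in one container, and its pending async init
 * operations survive close() unless they are explicitly cleared.
 */
public class AsyncInitReuseSketch {

    /** Hypothetical stand-in for a hash table that must not be used after close. */
    static final class Table {
        final int attempt;
        boolean closed;
        Table(int attempt) { this.attempt = attempt; }
    }

    /** Hypothetical operator; NOT Hive's Operator/MapJoinOperator. */
    static final class SketchOperator {
        final List<Future<Table>> asyncInitOperations = new ArrayList<>();
        final boolean clearOnClose;
        Table table;

        SketchOperator(boolean clearOnClose) { this.clearOnClose = clearOnClose; }

        /** Queues async loading of this attempt's table, then completes init. */
        void initialize(int attempt) throws InterruptedException, ExecutionException {
            asyncInitOperations.add(CompletableFuture.completedFuture(new Table(attempt)));
            // Assumption (not verified against Hive's source): init consumes the first pending result.
            table = asyncInitOperations.get(0).get();
        }

        void close() {
            table.closed = true;
            if (clearOnClose) {
                // Proposed fix: drop pending operations that belong to the finished attempt.
                asyncInitOperations.clear();
            }
        }
    }

    /** Runs two attempts on one operator instance; true if attempt 2 got a closed table. */
    public static boolean secondAttemptSeesClosedTable(boolean clearOnClose) throws Exception {
        SketchOperator op = new SketchOperator(clearOnClose);
        op.initialize(1);
        op.close();
        op.initialize(2); // same instance reused, as with container reuse
        return op.table.closed;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("without clear: " + secondAttemptSeesClosedTable(false));
        System.out.println("with clear:    " + secondAttemptSeesClosedTable(true));
    }
}
```

      Without clearing, the second attempt receives the first attempt's already closed table, which corresponds to the NPE in MapJoinOperator.closeOp; with clearing, it initializes from its own fresh table.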


    People

      Assignee: Chenyu Zheng (zhengchenyu)
      Reporter: Chenyu Zheng (zhengchenyu)
      Votes: 0
      Watchers: 3

    Time Tracking

      Estimated: Not Specified
      Remaining: 0h
      Logged: 40m