Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
1.2.1
-
None
-
engine: Tez (Note: tez.am.container.reuse.enabled is true)
Description
In our cluster, we hit the following error:
Vertex failed, vertexName=Map 1, vertexId=vertex_1650608671415_321290_1_11, diagnostics=[Task failed, taskId=task_1650608671415_321290_1_11_000422, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1650608671415_321290_1_11_000422_0:java.lang.RuntimeException: java.lang.RuntimeException: Hive Runtime Error while closing operators
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:135)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
    at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
    at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:125)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:57)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:78)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: Hive Runtime Error while closing operators
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:349)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:161)
    ... 16 more
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.hive.ql.exec.MapJoinOperator.closeOp(MapJoinOperator.java:488)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:684)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:698)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.close(MapRecordProcessor.java:338)
    ... 17 more
When Tez container reuse is enabled and the query uses MapJoinOperator, an NPE is thrown if different task attempts of the same task execute in the same container.
While debugging, I found that the second task attempt uses the first task attempt's asyncInitOperations. asyncInitOperations is not cleared when the operator is closed, so the second task attempt may use the first task attempt's mapJoinTables, whose HybridHashTableContainer.HashPartition has already been closed, which causes the NPE.
We must clear asyncInitOperations when the operator is closed.
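To illustrate the failure mode described above, here is a minimal, self-contained toy model. It is not Hive code: ToyOperator, ToyPartition, and the clearOnClose flag are hypothetical names invented for this sketch, and the NPE is raised while processing rather than in closeOp as in the real trace. It only shows how state left over from a first attempt can be picked up by a second attempt in a reused container, and how clearing that state on close avoids the problem.

```java
import java.util.ArrayList;
import java.util.List;

public class AsyncInitSketch {

    /** Hypothetical stand-in for a hash-table partition that is unusable once closed. */
    static final class ToyPartition {
        private int[] rows = new int[] {1, 2, 3};

        void close() {
            rows = null; // releasing the backing data
        }

        int size() {
            return rows.length; // throws NullPointerException after close()
        }
    }

    /** Hypothetical stand-in for an operator instance that survives across task attempts. */
    static final class ToyOperator {
        private final List<ToyPartition> asyncInitOperations = new ArrayList<>();
        private final boolean clearOnClose;

        ToyOperator(boolean clearOnClose) {
            this.clearOnClose = clearOnClose;
        }

        void initialize() {
            // Only schedules new work when nothing is pending, so stale
            // entries from a previous attempt are silently reused.
            if (asyncInitOperations.isEmpty()) {
                asyncInitOperations.add(new ToyPartition());
            }
        }

        int process() {
            return asyncInitOperations.get(0).size();
        }

        void close() {
            for (ToyPartition p : asyncInitOperations) {
                p.close();
            }
            if (clearOnClose) {
                asyncInitOperations.clear(); // the proposed fix, in miniature
            }
        }
    }

    /** Runs two attempts on the same operator; returns true if the second one hits an NPE. */
    static boolean secondAttemptFails(boolean clearOnClose) {
        ToyOperator op = new ToyOperator(clearOnClose);
        op.initialize();
        op.process();
        op.close(); // end of attempt 0

        op.initialize(); // attempt 1 in the reused container
        try {
            op.process();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("without clearing, second attempt fails: " + secondAttemptFails(false));
        System.out.println("with clearing, second attempt fails: " + secondAttemptFails(true));
    }
}
```

In this sketch the second attempt fails only when close() leaves the list populated; clearing it forces the next initialize() to build fresh state.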