Details
-
Bug
-
Status: Patch Available
-
Major
-
Resolution: Unresolved
-
2.8.4
-
None
Description
We use node label and async schedule in our cluster, and found our rm throw many IllegalArgumentException which stack is blow:
2018-09-11 11:59:23,585 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: AysncSchedulerThread encounter runtime exception! java.lang.IllegalArgumentException: Comparison method violates its general contract! at java.util.TimSort.mergeHi(TimSort.java:899) at java.util.TimSort.mergeAt(TimSort.java:516) at java.util.TimSort.mergeForceCollapse(TimSort.java:457) at java.util.TimSort.sort(TimSort.java:254) at java.util.Arrays.sort(Arrays.java:1512) at java.util.ArrayList.sort(ArrayList.java:1454) at java.util.Collections.sort(Collections.java:175) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.sortAndGetChildrenAllocationIterator(ParentQueue.java:685) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainersToChildQueues(ParentQueue.java:698) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue.assignContainers(ParentQueue.java:521) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateOrReserveNewContainers(CapacityScheduler.java:1720) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainerOnSingleNode(CapacityScheduler.java:1715) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1806) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.allocateContainersToNode(CapacityScheduler.java:1446) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.schedule(CapacityScheduler.java:487) at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler$AsyncScheduleThread.run(CapacityScheduler.java:516)
the reason is partitionToLookAt in PartitionedQueueComparator is a string and each Async Schedule thread will set it. It cause the compare function in PartitionedQueueComparator incorrect and finally casues the IllegalArgumentException.