Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 2.6.0
Fix Version/s: None
Description
During SLS startup, after startRM() returns in SLSRunner.start(), BaseContainerTokenSecretManager may not yet have generated its own internal key, or the key may not yet be visible to other threads. NM registration then fails with the following exception, and eventually the whole SLS process crashes.
Exception in thread "main" java.lang.NullPointerException
	at org.apache.hadoop.yarn.server.security.BaseContainerTokenSecretManager.getCurrentKey(BaseContainerTokenSecretManager.java:81)
	at org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService.registerNodeManager(ResourceTrackerService.java:300)
	at org.apache.hadoop.yarn.sls.nodemanager.NMSimulator.init(NMSimulator.java:105)
	at org.apache.hadoop.yarn.sls.SLSRunner.startNM(SLSRunner.java:202)
	at org.apache.hadoop.yarn.sls.SLSRunner.start(SLSRunner.java:143)
	at org.apache.hadoop.yarn.sls.SLSRunner.main(SLSRunner.java:528)
17/05/11 10:21:06 INFO resourcemanager.ResourceManager: Recovery started
17/05/11 10:21:06 INFO recovery.ZKRMStateStore: Watcher event type: None with state:SyncConnected for path:null for Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore in state org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
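
The race described above can be illustrated with a minimal, self-contained sketch. It is not the Hadoop code and not a proposed patch: KeyReadinessSketch, KeyHolder, rollKey, currentKeyLength, and awaitKey are hypothetical names invented for illustration. It only shows the general pattern of a key that is null until another thread sets it, and one possible way a caller could wait for the key before using it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names exist in Hadoop.
class KeyReadinessSketch {

    /** Simplified holder whose key stays null until rollKey() is called. */
    static final class KeyHolder {
        // volatile so a write from one thread is visible to readers in another
        private volatile String currentKey;
        private final CountDownLatch ready = new CountDownLatch(1);

        /** Called by the (simulated) RM thread once the key has been generated. */
        void rollKey(String key) {
            currentKey = key;
            ready.countDown();
        }

        /** Unguarded access: throws NullPointerException if no key has been set. */
        int currentKeyLength() {
            return currentKey.length();
        }

        /** Waits up to the timeout for a key; returns false on timeout or interrupt. */
        boolean awaitKey(long timeout, TimeUnit unit) {
            try {
                return ready.await(timeout, unit);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        KeyHolder holder = new KeyHolder();

        // Registering before the key exists reproduces the same kind of failure.
        try {
            holder.currentKeyLength();
        } catch (NullPointerException e) {
            System.out.println("unguarded access failed: key not yet generated");
        }

        // Simulated RM thread that generates the key some time after startup.
        Thread rm = new Thread(() -> holder.rollKey("master-key-1"));
        rm.start();

        // Guarded access: wait for the key before the simulated NM registers.
        if (holder.awaitKey(5, TimeUnit.SECONDS)) {
            System.out.println("key length after waiting: " + holder.currentKeyLength());
        }
        rm.join();
    }
}
```

Whether the real problem is a missing key or a visibility issue between threads is left open in the description; the sketch covers both by publishing the key through a volatile field and signalling readiness with a latch.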