Hadoop Common / HADOOP-15116

NPE in ResourceManager when ZooKeeper goes down temporarily (HA mode)


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.0.0-beta1
    • Fix Version/s: None
    • Component/s: ha
    • Labels: None

    Description

      In an HA-enabled cluster (3.0), the RM fails to start with an NPE thrown from ActiveStandbyElector. ZooKeeper was down at the time, so the ZooKeeper client had been retrying the connection for a while:

      2017-12-13 18:21:22,460 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server 127.0.0.1/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
      2017-12-13 18:21:22,544 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService failed in state INITED; cause: java.lang.NullPointerException
      java.lang.NullPointerException
              at org.apache.hadoop.ha.ActiveStandbyElector$3.run(ActiveStandbyElector.java:1039)
              at org.apache.hadoop.ha.ActiveStandbyElector$3.run(ActiveStandbyElector.java:1036)
              at org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1101)
              at org.apache.hadoop.ha.ActiveStandbyElector.zkDoWithRetries(ActiveStandbyElector.java:1093)
              at org.apache.hadoop.ha.ActiveStandbyElector.createWithRetries(ActiveStandbyElector.java:1036)
              at org.apache.hadoop.ha.ActiveStandbyElector.ensureParentZNode(ActiveStandbyElector.java:347)
              at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.serviceInit(ActiveStandbyElectorBasedElectorService.java:110)
              at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
              at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:108)
              at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceInit(ResourceManager.java:326)
              at org.apache.hadoop.service.AbstractService.init(AbstractService.java:164)
              at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1420)
      2017-12-13 18:21:22,545 INFO org.apache.hadoop.ha.ActiveStandbyElector: Yielding from election
      2017-12-13 18:21:22,545 INFO org.apache.hadoop.service.AbstractService: Service ResourceManager failed in state INITED; cause: java.lang.NullPointerException
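
      The trace shows the NPE surfacing inside the action run by createWithRetries, but it does not show which reference was null. One plausible reading, stated here as an assumption and not as a confirmed root cause, is that the ZooKeeper client handle was never set because the connection could not be established, and the create call then dereferenced it. The sketch below reproduces that pattern in isolation and shows a guard that reports the condition as an IOException. ZkHandle, ElectorNullGuardSketch, createUnguarded, createGuarded and the path "/example-parent" are hypothetical names made up for illustration; none of them is Hadoop or ZooKeeper API.

```java
import java.io.IOException;

/** Hypothetical stand-in for a ZooKeeper client handle; not the real API. */
interface ZkHandle {
    String create(String path) throws IOException;
}

/** Illustrative only: models a client field that may still be null. */
class ElectorNullGuardSketch {
    // Assumption: this stays null while ZooKeeper is unreachable.
    private final ZkHandle zkClient;

    ElectorNullGuardSketch(ZkHandle zkClient) {
        this.zkClient = zkClient;
    }

    /** Unguarded: dereferences the handle directly, so a null handle becomes an NPE. */
    String createUnguarded(String path) throws IOException {
        return zkClient.create(path);
    }

    /** Guarded: turns the missing connection into an IOException the caller can handle. */
    String createGuarded(String path) throws IOException {
        if (zkClient == null) {
            throw new IOException("ZooKeeper client is not connected; cannot create " + path);
        }
        return zkClient.create(path);
    }

    public static void main(String[] args) {
        // Simulate a connection that was never established.
        ElectorNullGuardSketch disconnected = new ElectorNullGuardSketch(null);
        try {
            disconnected.createUnguarded("/example-parent");
        } catch (NullPointerException | IOException e) {
            System.out.println("unguarded: " + e.getClass().getSimpleName());
        }
        try {
            disconnected.createGuarded("/example-parent");
        } catch (IOException e) {
            System.out.println("guarded: " + e.getMessage());
        }
    }
}
```

      Whether a guard like this, a retry, or a fail-fast startup is the right behaviour for the RM is a design question this sketch does not settle; it only separates "no connection" from an unexplained NullPointerException.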
      

          People

            Assignee: Unassigned
            Reporter: Sunil G (sunilg)
            Votes: 0
            Watchers: 1
