  1. HBase
  2. HBASE-19204 branch-1.2 times out and is taking 6-7 hours to complete
  3. HBASE-19205

Backport HBASE-18441 ZookeeperWatcher#interruptedException should throw exception

Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.0, 1.3.2, 1.2.7
    • Component/s: Zookeeper
    • Labels: None

    Description

      In a timed-out branch-1.2 test run, I see a bunch of this:

      709 2017-11-07 12:27:16,152 DEBUG [Time-limited test] zookeeper.ZooKeeperWatcher(760): hconnection-0xb1ad4ca0x0, quorum=localhost:50828, baseZNode=/hbase Received InterruptedException, doing nothing here
      710 java.lang.InterruptedException

      This is us swallowing interrupts coming out of ZooKeeper.
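
To illustrate why swallowing the interrupt matters, here is a minimal, self-contained Java sketch. It is not the HBase code: `InterruptHandlingSketch` and its method names are made up for illustration, and `Thread.sleep` stands in for the blocking ZooKeeper call. It contrasts a handler that logs and does nothing with one that restores the interrupt status and rethrows.

```java
public final class InterruptHandlingSketch {

    /** Hypothetical stand-in for a blocking ZooKeeper call. */
    static void blockingCall() throws InterruptedException {
        // Thread.sleep throws InterruptedException and clears the
        // interrupt status if the thread is already interrupted.
        Thread.sleep(1);
    }

    /** Handler that rethrows after restoring the interrupt status. */
    static void callAndPropagate() throws InterruptedException {
        try {
            blockingCall();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the status visible
            throw e;                            // let the caller decide
        }
    }

    /** Returns whether the interrupt is still visible after a swallowing handler. */
    static boolean interruptSurvivesSwallow() {
        Thread.currentThread().interrupt();
        try {
            blockingCall();
        } catch (InterruptedException e) {
            // "doing nothing here": the status was cleared and is now lost
        }
        return Thread.interrupted(); // reads and clears the status
    }

    /** Returns whether the interrupt is still visible after a propagating handler. */
    static boolean interruptSurvivesPropagate() {
        Thread.currentThread().interrupt();
        try {
            callAndPropagate();
        } catch (InterruptedException e) {
            // the caller sees the exception and can stop its work
        }
        return Thread.interrupted(); // reads and clears the status
    }

    public static void main(String[] args) {
        System.out.println("swallowed:  interrupt still visible = " + interruptSurvivesSwallow());
        System.out.println("propagated: interrupt still visible = " + interruptSurvivesPropagate());
    }
}
```

Running `main` reports `false` for the swallowing variant and `true` for the propagating one, which is the difference a time-limited test depends on to stop its worker thread.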

      Suppressing the interrupt is messing us up. We start to see this:

          5018 2017-11-07 12:30:18,276 WARN  [Time-limited test] zookeeper.ZKUtil(378): master:367040x0, quorum=localhost:56068, baseZNode=/hbase Unable to set watcher on znode /hbase/backup-masters/ve0528.halxg.cloudera.com,36704,1510086438842
          5019 java.lang.InterruptedException
          5020   at java.lang.Object.wait(Native Method)
          5021   at java.lang.Object.wait(Object.java:502)
          5022   at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1342)
          5023   at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:1040)
          5024   at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.exists(RecoverableZooKeeper.java:220)
          5025   at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:365)
          5026   at org.apache.hadoop.hbase.zookeeper.ZKUtil.createEphemeralNodeAndWatch(ZKUtil.java:978)
          5027   at org.apache.hadoop.hbase.zookeeper.ZKUtil.createEphemeralNodeAndWatch(ZKUtil.java:980)
          5028   at org.apache.hadoop.hbase.zookeeper.ZKUtil.createEphemeralNodeAndWatch(ZKUtil.java:980)
          5029   at org.apache.hadoop.hbase.zookeeper.ZKUtil.createEphemeralNodeAndWatch(ZKUtil.java:980)
          5030   at org.apache.hadoop.hbase.zookeeper.ZKUtil.createEphemeralNodeAndWatch(ZKUtil.java:980)
          5031   at org.apache.hadoop.hbase.zookeeper.ZKUtil.createEphemeralNodeAndWatch(ZKUtil.java:980)
          5032   at org.apache.hadoop.hbase.zookeeper.ZKUtil.createEphemeralNodeAndWatch(ZKUtil.java:980)
          5033   at org.apache.hadoop.hbase.zookeeper.ZKUtil.createEphemeralNodeAndWatch(ZKUtil.java:980)
          5034   at org.apache.hadoop.hbase.zookeeper.ZKUtil.createEphemeralNodeAndWatch(ZKUtil.java:980)
          5035   at org.apache.hadoop.hbase.zookeeper.ZKUtil.createEphemeralNodeAndWatch(ZKUtil.java:980)
          5036   at org.apache.hadoop.hbase.zookeeper.ZKUtil.createEphemeralNodeAndWatch(ZKUtil.java:980)
          5037   at org.apache.hadoop.hbase.zookeeper.ZKUtil.createEphemeralNodeAndWatch(ZKUtil.java:980)
      
      ...
      

      Later I get:

      15578737 Caused by: java.lang.StackOverflowError
      15578738   at java.security.AccessController.doPrivileged(Native Method)
      15578739   at java.io.PrintWriter.<init>(PrintWriter.java:116)
      15578740   at java.io.PrintWriter.<init>(PrintWriter.java:100)
      15578741   at org.apache.log4j.DefaultThrowableRenderer.render(DefaultThrowableRenderer.java:58)
      15578742   at org.apache.log4j.spi.ThrowableInformation.getThrowableStrRep(ThrowableInformation.java:87)
      15578743   at org.apache.log4j.spi.LoggingEvent.getThrowableStrRep(LoggingEvent.java:413)
      15578744   at org.apache.log4j.WriterAppender.subAppend(WriterAppender.java:313)
      15578745   at org.apache.log4j.WriterAppender.append(WriterAppender.java:162)
      15578746   at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
      15578747   at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
      15578748   at org.apache.log4j.Category.callAppenders(Category.java:206)
      15578749   at org.apache.log4j.Category.forcedLog(Category.java:391)
      15578750   at org.apache.log4j.Category.log(Category.java:856)
      15578751   at org.apache.commons.logging.impl.Log4JLogger.warn(Log4JLogger.java:208)
      15578752   at org.apache.hadoop.hbase.zookeeper.ZKUtil.watchAndCheckExists(ZKUtil.java:378)
      15578753   at org.apache.hadoop.hbase.zookeeper.ZKUtil.createEphemeralNodeAndWatch(ZKUtil.java:978)
      15578754   at org.apache.hadoop.hbase.zookeeper.ZKUtil.createEphemeralNodeAndWatch(ZKUtil.java:980)
      15578755   at org.apache.hadoop.hbase.zookeeper.ZKUtil.createEphemeralNodeAndWatch(ZKUtil.java:980)
      15578756   at org.apache.hadoop.hbase.zookeeper.ZKUtil.createEphemeralNodeAndWatch(ZKUtil.java:980)
      15578757   at org.apache.hadoop.hbase.zookeeper.ZKUtil.createEphemeralNodeAndWatch(ZKUtil.java:980)
      15578758   at org.apache.hadoop.hbase.zookeeper.ZKUtil.createEphemeralNodeAndWatch(ZKUtil.java:980)
      15578759   at org.apache.hadoop.hbase.zookeeper.ZKUtil.createEphemeralNodeAndWatch(ZKUtil.java:980)
      15578760   at org.apache.hadoop.hbase.zookeeper.ZKUtil.createEphemeralNodeAndWatch(ZKUtil.java:980)
      ....
      

      Let me backport carp84's patch (HBASE-18441) at least.
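
The repeated `createEphemeralNodeAndWatch` frames in the traces suggest a retry-by-recursion that never terminates while the interrupt keeps surfacing. The sketch below models that shape under stated assumptions; it is not the HBase source. All names (`RecursiveRetrySketch`, `blockingExists`, `DEMO_DEPTH_CAP`) are hypothetical, the depth cap exists only so the demo is safe to run, and it assumes the interrupt status gets restored somewhere on the path, which the issue does not state.

```java
public final class RecursiveRetrySketch {

    /** Demo-only cap; the recursion being modelled has no such limit. */
    static final int DEMO_DEPTH_CAP = 1000;

    /** Hypothetical blocking check: fails if the thread is interrupted. */
    static boolean blockingExists() throws InterruptedException {
        if (Thread.interrupted()) { // reads and clears the status
            throw new InterruptedException();
        }
        return true;
    }

    /** Swallowing variant: the interrupt is reported as "node not there". */
    static int createSwallowing(int depth) {
        if (depth >= DEMO_DEPTH_CAP) {
            return depth; // without the cap this ends in StackOverflowError
        }
        boolean exists;
        try {
            exists = blockingExists();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // assumption: status restored
            exists = false;                     // but no exception reaches the caller
        }
        return exists ? depth : createSwallowing(depth + 1); // "try again"
    }

    /** Propagating variant: the exception escapes and ends the retries. */
    static int createPropagating(int depth) throws InterruptedException {
        boolean exists = blockingExists();
        return exists ? depth : createPropagating(depth + 1);
    }

    /** Depth reached by the swallowing variant on an interrupted thread. */
    static int depthWhenSwallowing() {
        Thread.currentThread().interrupt();
        int depth = createSwallowing(0);
        Thread.interrupted(); // clear the status left behind by the demo
        return depth;
    }

    /** True if the propagating variant stops on the first attempt. */
    static boolean propagatingStopsImmediately() {
        Thread.currentThread().interrupt();
        try {
            createPropagating(0);
            return false;
        } catch (InterruptedException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("swallowing variant recursed to depth " + depthWhenSwallowing());
        System.out.println("propagating variant stopped at once: " + propagatingStopsImmediately());
    }
}
```

In this model the swallowing variant only stops because of the artificial cap, while the propagating variant ends on the first attempt, which is the behaviour a backport that throws from the interrupt handler would be expected to give.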

            People

              Assignee: stack Michael Stack
              Reporter: stack Michael Stack