QPID-8366

[Broker-J] The loss of BDB HA majority on invocation of house keeping operations can crash the broker


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • qpid-java-broker-7.1.0, qpid-java-broker-7.0.4, qpid-java-broker-7.0.5, qpid-java-broker-7.0.6, qpid-java-broker-7.0.7, qpid-java-broker-7.1.1, qpid-java-broker-7.1.2, qpid-java-broker-7.0.8, qpid-java-broker-7.1.3, qpid-java-broker-7.1.4
    • Broker-J
    • None

    Description

      A ConnectionScopedRuntimeException thrown on the VirtualHost housekeeping thread, when a housekeeping operation such as checkMessageStatus invokes a MessageStore operation, can crash the broker. An example stack trace (from Qpid Broker version 7.0.6) is provided below:

      2019-09-27 07:53:38,168 ERROR [virtualhost-test-pool-1] (o.a.q.s.Main) - Uncaught exception, shutting down.
      org.apache.qpid.server.util.ConnectionScopedRuntimeException: Required number of nodes not reachable
              at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.handleDatabaseException(ReplicatedEnvironmentFacade.java:495)
              at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.commit(ReplicatedEnvironmentFacade.java:332)
              at org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.removeMessage(AbstractBDBMessageStore.java:288)
              at org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$StoredBDBMessage.remove(AbstractBDBMessageStore.java:1090)
              at org.apache.qpid.server.message.AbstractServerMessageImpl.decrementReference(AbstractServerMessageImpl.java:118)
              at org.apache.qpid.server.message.AbstractServerMessageImpl.access$500(AbstractServerMessageImpl.java:37)
              at org.apache.qpid.server.message.AbstractServerMessageImpl$Reference.release(AbstractServerMessageImpl.java:309)
              at org.apache.qpid.server.queue.QueueEntryImpl.dispose(QueueEntryImpl.java:557)
              at org.apache.qpid.server.queue.QueueEntryImpl.delete(QueueEntryImpl.java:572)
              at org.apache.qpid.server.queue.AbstractQueue$11.postCommit(AbstractQueue.java:1729)
              at org.apache.qpid.server.txn.AutoCommitTransaction.dequeue(AutoCommitTransaction.java:92)
              at org.apache.qpid.server.queue.AbstractQueue.dequeueEntry(AbstractQueue.java:1722)
              at org.apache.qpid.server.queue.AbstractQueue.dequeueEntry(AbstractQueue.java:1717)
              at org.apache.qpid.server.queue.AbstractQueue.deleteEntry(AbstractQueue.java:1761)
              at org.apache.qpid.server.queue.AbstractQueue.checkMessageStatus(AbstractQueue.java:2165)
              at org.apache.qpid.server.virtualhost.AbstractVirtualHost$VirtualHostHouseKeepingTask.execute(AbstractVirtualHost.java:1965)
              at org.apache.qpid.server.virtualhost.HouseKeepingTask$1.run(HouseKeepingTask.java:56)
              at java.security.AccessController.doPrivileged(Native Method)
              at org.apache.qpid.server.virtualhost.HouseKeepingTask.run(HouseKeepingTask.java:51)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
              at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
              at org.apache.qpid.server.bytebuffer.QpidByteBufferFactory.lambda$null$0(QpidByteBufferFactory.java:464)
              at java.lang.Thread.run(Thread.java:748)
      Caused by: com.sleepycat.je.rep.InsufficientAcksException: (JE 7.4.5) Transaction: -3459038252  VLSN: 10,380,435,448, initiated at: 07:53:20.  Insufficient acks for policy:SIMPLE_MAJORITY. Need replica acks: 2. Missing replica acks: 2. Timeout: 15000ms. FeederState=acc3_2(3)[MASTER]
      Current feeds:
       acc3_1: feederVLSN=10,380,435,456 replicaTxnEndVLSN=10,380,435,396
       acc3: feederVLSN=10,380,435,456 replicaTxnEndVLSN=10,380,435,396
      
              at com.sleepycat.je.rep.impl.node.DurabilityQuorum.ensureSufficientAcks(DurabilityQuorum.java:205)
              at com.sleepycat.je.rep.stream.FeederTxns.awaitReplicaAcks(FeederTxns.java:189)
              at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHookInternal(RepImpl.java:1426)
              at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHook(RepImpl.java:1385)
              at com.sleepycat.je.rep.txn.MasterTxn.postLogCommitHook(MasterTxn.java:228)
              at com.sleepycat.je.txn.Txn.commit(Txn.java:772)
              at com.sleepycat.je.Transaction.doCommit(Transaction.java:621)
              at com.sleepycat.je.Transaction.commit(Transaction.java:401)
              at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.commit(ReplicatedEnvironmentFacade.java:328)
              ... 25 common frames omitted
      
      

      The issue reported with the stack trace above occurred when a BDB HA VirtualHost was deleting an expired message and its BDB HA group lost the majority while the VirtualHost was committing the BDB HA transaction for the message deletion. The majority loss is communicated to the caller as a ConnectionScopedRuntimeException. As the log line "Uncaught exception, shutting down." shows, nothing on the housekeeping thread catches the exception, so the broker shuts down. It seems we need to catch and handle ConnectionScopedRuntimeException in housekeeping operations.
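
      One possible shape for such handling is sketched below. This is only an illustration, not the actual fix: GuardedHouseKeepingTask and HouseKeepingSketch are hypothetical names, and the nested ConnectionScopedRuntimeException is a local stand-in for org.apache.qpid.server.util.ConnectionScopedRuntimeException so that the example compiles without Broker-J on the classpath. The sketch assumes it is acceptable to skip a housekeeping pass when the store is unavailable and let the next scheduled run retry.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class HouseKeepingSketch {

    /** Local stand-in for org.apache.qpid.server.util.ConnectionScopedRuntimeException. */
    static class ConnectionScopedRuntimeException extends RuntimeException {
        ConnectionScopedRuntimeException(String message) {
            super(message);
        }
    }

    /**
     * Hypothetical wrapper around one housekeeping pass. It contains
     * connection-scoped store failures instead of letting them reach the
     * thread's uncaught-exception handler.
     */
    static class GuardedHouseKeepingTask implements Runnable {
        private final Runnable delegate;
        private final AtomicInteger skippedRuns = new AtomicInteger();

        GuardedHouseKeepingTask(Runnable delegate) {
            this.delegate = delegate;
        }

        @Override
        public void run() {
            try {
                delegate.run();
            } catch (ConnectionScopedRuntimeException e) {
                // Assumption: a lost majority is transient, so this pass is
                // skipped and the next scheduled run tries again.
                skippedRuns.incrementAndGet();
                System.err.println("Housekeeping pass skipped: " + e.getMessage());
            }
            // Any other exception still propagates unchanged.
        }

        int skippedRuns() {
            return skippedRuns.get();
        }
    }

    public static void main(String[] args) {
        // Simulates a housekeeping pass whose store commit fails.
        GuardedHouseKeepingTask task = new GuardedHouseKeepingTask(() -> {
            throw new ConnectionScopedRuntimeException("Required number of nodes not reachable");
        });
        task.run();
        System.out.println("Skipped runs: " + task.skippedRuns());
    }
}
```

      Only the connection-scoped exception is caught here, so unexpected failures would still surface as before. Whether a skipped pass should also be logged at a higher level or trigger other recovery is left open in this sketch.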

          People

            Assignee: Alex Rudyy (orudyy)
            Reporter: Alex Rudyy (orudyy)