Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: qpid-java-broker-7.1.0, qpid-java-broker-7.0.4, qpid-java-broker-7.0.5, qpid-java-broker-7.0.6, qpid-java-broker-7.0.7, qpid-java-broker-7.1.1, qpid-java-broker-7.1.2, qpid-java-broker-7.0.8, qpid-java-broker-7.1.3, qpid-java-broker-7.1.4
None
Description
A ConnectionScopedRuntimeException thrown from the VirtualHost housekeeping thread while it invokes MessageStore operations (for example, removing an expired message from within checkMessageStatus) can crash the broker. An example stack trace from Qpid Broker version 7.0.6 is shown below:
2019-09-27 07:53:38,168 ERROR [virtualhost-test-pool-1] (o.a.q.s.Main) - Uncaught exception, shutting down.
org.apache.qpid.server.util.ConnectionScopedRuntimeException: Required number of nodes not reachable
	at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.handleDatabaseException(ReplicatedEnvironmentFacade.java:495)
	at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.commit(ReplicatedEnvironmentFacade.java:332)
	at org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore.removeMessage(AbstractBDBMessageStore.java:288)
	at org.apache.qpid.server.store.berkeleydb.AbstractBDBMessageStore$StoredBDBMessage.remove(AbstractBDBMessageStore.java:1090)
	at org.apache.qpid.server.message.AbstractServerMessageImpl.decrementReference(AbstractServerMessageImpl.java:118)
	at org.apache.qpid.server.message.AbstractServerMessageImpl.access$500(AbstractServerMessageImpl.java:37)
	at org.apache.qpid.server.message.AbstractServerMessageImpl$Reference.release(AbstractServerMessageImpl.java:309)
	at org.apache.qpid.server.queue.QueueEntryImpl.dispose(QueueEntryImpl.java:557)
	at org.apache.qpid.server.queue.QueueEntryImpl.delete(QueueEntryImpl.java:572)
	at org.apache.qpid.server.queue.AbstractQueue$11.postCommit(AbstractQueue.java:1729)
	at org.apache.qpid.server.txn.AutoCommitTransaction.dequeue(AutoCommitTransaction.java:92)
	at org.apache.qpid.server.queue.AbstractQueue.dequeueEntry(AbstractQueue.java:1722)
	at org.apache.qpid.server.queue.AbstractQueue.dequeueEntry(AbstractQueue.java:1717)
	at org.apache.qpid.server.queue.AbstractQueue.deleteEntry(AbstractQueue.java:1761)
	at org.apache.qpid.server.queue.AbstractQueue.checkMessageStatus(AbstractQueue.java:2165)
	at org.apache.qpid.server.virtualhost.AbstractVirtualHost$VirtualHostHouseKeepingTask.execute(AbstractVirtualHost.java:1965)
	at org.apache.qpid.server.virtualhost.HouseKeepingTask$1.run(HouseKeepingTask.java:56)
	at java.security.AccessController.doPrivileged(Native Method)
	at org.apache.qpid.server.virtualhost.HouseKeepingTask.run(HouseKeepingTask.java:51)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at org.apache.qpid.server.bytebuffer.QpidByteBufferFactory.lambda$null$0(QpidByteBufferFactory.java:464)
	at java.lang.Thread.run(Thread.java:748)
Caused by: com.sleepycat.je.rep.InsufficientAcksException: (JE 7.4.5) Transaction: -3459038252 VLSN: 10,380,435,448, initiated at: 07:53:20. Insufficient acks for policy:SIMPLE_MAJORITY. Need replica acks: 2. Missing replica acks: 2. Timeout: 15000ms. FeederState=acc3_2(3)[MASTER] Current feeds:
 acc3_1: feederVLSN=10,380,435,456 replicaTxnEndVLSN=10,380,435,396
 acc3: feederVLSN=10,380,435,456 replicaTxnEndVLSN=10,380,435,396
	at com.sleepycat.je.rep.impl.node.DurabilityQuorum.ensureSufficientAcks(DurabilityQuorum.java:205)
	at com.sleepycat.je.rep.stream.FeederTxns.awaitReplicaAcks(FeederTxns.java:189)
	at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHookInternal(RepImpl.java:1426)
	at com.sleepycat.je.rep.impl.RepImpl.postLogCommitHook(RepImpl.java:1385)
	at com.sleepycat.je.rep.txn.MasterTxn.postLogCommitHook(MasterTxn.java:228)
	at com.sleepycat.je.txn.Txn.commit(Txn.java:772)
	at com.sleepycat.je.Transaction.doCommit(Transaction.java:621)
	at com.sleepycat.je.Transaction.commit(Transaction.java:401)
	at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.commit(ReplicatedEnvironmentFacade.java:328)
	... 25 common frames omitted
The failure above occurred while a BDB HA VirtualHost was deleting an expired message. Its BDB HA group lost the majority while the VirtualHost was committing the BDB HA transaction for the deletion, so the commit failed with InsufficientAcksException: under the SIMPLE_MAJORITY policy the master needed 2 replica acks and received none within the 15000 ms timeout. The majority loss is reported to the caller as a ConnectionScopedRuntimeException, which propagated out of the housekeeping task uncaught and made the broker shut down. It seems we need to catch and handle ConnectionScopedRuntimeException in housekeeping operations.
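The handling suggested in the last paragraph could look roughly like the following. This is a minimal, self-contained sketch, not the actual Qpid fix: HouseKeepingSketch, HousekeptQueue, queue and runHouseKeepingPass are hypothetical names, ConnectionScopedRuntimeException is a local stand-in for the Qpid class of the same name, and java.util.logging is used instead of the broker's own logging. It assumes a housekeeping pass iterates over queues and calls checkMessageStatus on each.

```java
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

public class HouseKeepingSketch {

    // Hypothetical stand-in for org.apache.qpid.server.util.ConnectionScopedRuntimeException,
    // which the BDB HA store raises when, e.g., the replication group loses its majority.
    static class ConnectionScopedRuntimeException extends RuntimeException {
        ConnectionScopedRuntimeException(String message) {
            super(message);
        }
    }

    // Hypothetical minimal view of a queue; not the real Qpid Queue interface.
    interface HousekeptQueue {
        String getName();

        // May delete expired entries, which commits a store transaction.
        void checkMessageStatus();
    }

    // Convenience factory for building hypothetical queues in the usage example.
    static HousekeptQueue queue(String name, Runnable check) {
        return new HousekeptQueue() {
            @Override
            public String getName() {
                return name;
            }

            @Override
            public void checkMessageStatus() {
                check.run();
            }
        };
    }

    private static final Logger LOGGER = Logger.getLogger(HouseKeepingSketch.class.getName());

    // One housekeeping pass. Returns true if every queue was checked, false if the
    // pass was abandoned because the store reported a connection-scoped failure.
    // Any other RuntimeException still propagates to the caller unchanged.
    static boolean runHouseKeepingPass(List<HousekeptQueue> queues) {
        for (HousekeptQueue queue : queues) {
            try {
                queue.checkMessageStatus();
            } catch (ConnectionScopedRuntimeException e) {
                // The store is unavailable (e.g. majority lost), so further commits in
                // this pass would likely fail too; stop here and let the next run retry.
                LOGGER.log(Level.WARNING, "Housekeeping of queue '" + queue.getName()
                        + "' failed; skipping the rest of this pass", e);
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] checked = {0};
        List<HousekeptQueue> queues = List.of(
                queue("q1", () -> checked[0]++),
                queue("q2", () -> {
                    throw new ConnectionScopedRuntimeException("Required number of nodes not reachable");
                }),
                queue("q3", () -> checked[0]++));

        boolean completed = runHouseKeepingPass(queues);
        // Prints "completed=false, checked=1": q3 is skipped after q2 fails.
        System.out.println("completed=" + completed + ", checked=" + checked[0]);
    }
}
```

Abandoning the remaining queues after the first failure is a judgment call in this sketch: once the HA group has lost its majority, every further commit is likely to wait for the ack timeout and fail in the same way, so retrying on the next scheduled run seems cheaper than continuing. Only ConnectionScopedRuntimeException is caught, so unrelated programming errors still surface as before.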