Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
5.15.2
-
None
-
None
-
None
-
Two nodes master and slave.
GlusterFS is configured to replicate the journal across nodes. Only one node is active and writing.
Relevant from activemq.xml
<destinationPolicy> <policyMap> <policyEntries> <policyEntry queue=">" memoryLimit="40mb" producerFlowControl="true" optimizedDispatch="true"> </policyEntry> <policyEntry topic=">" memoryLimit="600mb" producerFlowControl="false" optimizedDispatch="true"> </policyEntry> </policyEntries> </policyMap> </destinationPolicy> <persistenceAdapter> <kahaDB directory="${persistence.brokerData}" enableJournalDiskSyncs="true" ignoreMissingJournalfiles="true" checkForCorruptJournalFiles="true" checksumJournalFiles="true"> <locker> <shared-file-locker lockAcquireSleepInterval="3000"/> </locker> </kahaDB> </persistenceAdapter>
Two nodes master and slave. GlusterFS is configured to replicate the journal across nodes. Only one node is active and writing. Relevant from activemq.xml <destinationPolicy> <policyMap> <policyEntries> <policyEntry queue=">" memoryLimit="40mb" producerFlowControl="true" optimizedDispatch="true"> </policyEntry> <policyEntry topic=">" memoryLimit="600mb" producerFlowControl="false" optimizedDispatch="true"> </policyEntry> </policyEntries> </policyMap> </destinationPolicy> <persistenceAdapter> <kahaDB directory="${persistence.brokerData}" enableJournalDiskSyncs="true" ignoreMissingJournalfiles="true" checkForCorruptJournalFiles="true" checksumJournalFiles="true"> <locker> <shared-file-locker lockAcquireSleepInterval="3000"/> </locker> </kahaDB> </persistenceAdapter>
Description
Hi,
I am trying to solve the issue that with ActiveMQ in GlusterFS setup. I tend to blame GlusterFS because without it, ActiveMQ performs well under the same load. However, I would like to understand why it might fail with GlusterFS or maybe some parts could be tuned to play nicely. Another point to mention is that GlusterFS doesn't report any errors whatsoever when this issue happens. Would be great to understand what the error is about.
There is not way to reproduce it with 100% probability, however there are steps that are likely to cause the issue:
- add several (10+) consumers for a queue
- add several (100+) async producers to the queue to produce very high load spike around 50-100k msg/s
2018-08-23 16:24:08,506 | ERROR | Failed to load message at: 181:32956327 | org.apache.activemq.store.kahadb.KahaDBStore | ActiveMQ BrokerService[AMQ01] Task-236 java.io.IOException: Unexpected error on journal read at: 181:32956327 at org.apache.activemq.util.IOExceptionSupport.create(IOExceptionSupport.java:28) at org.apache.activemq.store.kahadb.KahaDBStore.loadMessage(KahaDBStore.java:1260) at org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore$4.execute(KahaDBStore.java:594) at org.apache.activemq.store.kahadb.disk.page.Transaction.execute(Transaction.java:779) at org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore.recoverNextMessages(KahaDBStore.java:583) at org.apache.activemq.store.ProxyMessageStore.recoverNextMessages(ProxyMessageStore.java:110) at org.apache.activemq.broker.region.cursors.QueueStorePrefetch.doFillBatch(QueueStorePrefetch.java:127) at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.fillBatch(AbstractStoreCursor.java:448) at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.reset(AbstractStoreCursor.java:168) at org.apache.activemq.broker.region.cursors.StoreQueueCursor.reset(StoreQueueCursor.java:169) at org.apache.activemq.broker.region.Queue.doPageInForDispatch(Queue.java:1981) at org.apache.activemq.broker.region.Queue.pageInMessages(Queue.java:2210) at org.apache.activemq.broker.region.Queue.iterate(Queue.java:1644) at org.apache.activemq.thread.PooledTaskRunner.runTask(PooledTaskRunner.java:133) at org.apache.activemq.thread.PooledTaskRunner$1.run(PooledTaskRunner.java:48) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: Could not load journal record, unexpected command type: KAHA_TRACE_COMMAND at location: 181:32956327 at org.apache.activemq.store.kahadb.KahaDBStore.loadMessage(KahaDBStore.java:1252) ... 16 more 2018-08-23 16:24:08,509 | INFO | Stopping BrokerService[AMQ01] due to exception, java.io.IOException: Unexpected error on journal read at: 181:32956327 | org.apache.activemq.util.DefaultIOExceptionHandler | ActiveMQ BrokerService[AMQ01] Task-236 java.io.IOException: Unexpected error on journal read at: 181:32956327 at org.apache.activemq.util.IOExceptionSupport.create(IOExceptionSupport.java:28) at org.apache.activemq.store.kahadb.KahaDBStore.loadMessage(KahaDBStore.java:1260) at org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore$4.execute(KahaDBStore.java:594) at org.apache.activemq.store.kahadb.disk.page.Transaction.execute(Transaction.java:779) at org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore.recoverNextMessages(KahaDBStore.java:583) at org.apache.activemq.store.ProxyMessageStore.recoverNextMessages(ProxyMessageStore.java:110) at org.apache.activemq.broker.region.cursors.QueueStorePrefetch.doFillBatch(QueueStorePrefetch.java:127) at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.fillBatch(AbstractStoreCursor.java:448) at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.reset(AbstractStoreCursor.java:168) at org.apache.activemq.broker.region.cursors.StoreQueueCursor.reset(StoreQueueCursor.java:169) at org.apache.activemq.broker.region.Queue.doPageInForDispatch(Queue.java:1981) at org.apache.activemq.broker.region.Queue.pageInMessages(Queue.java:2210) at org.apache.activemq.broker.region.Queue.iterate(Queue.java:1644) at org.apache.activemq.thread.PooledTaskRunner.runTask(PooledTaskRunner.java:133) at org.apache.activemq.thread.PooledTaskRunner$1.run(PooledTaskRunner.java:48) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: Could not load journal record, unexpected command type: KAHA_TRACE_COMMAND at location: 181:32956327 at org.apache.activemq.store.kahadb.KahaDBStore.loadMessage(KahaDBStore.java:1252) ... 16 more 2018-08-23 16:24:08,511 | ERROR | org.apache.activemq.broker.region.cursors.QueueStorePrefetch@ef6b8e9:TEST,batchResetNeeded=false,size=654,cacheEnabled=false,maxBatchSize:200,hasSpace:true,pendingCachedIds.size:0,lastSyncCachedId:null,lastSyncCachedId-seq:null,lastAsyncCachedId:null,lastAsyncCachedId-seq:null,store=permits:10000,sd=nextSeq:19935,lastRet:MessageOrderCursor:[def:19165, low:19163, high:19163],pending:116 - Failed to fill batch | org.apache.activemq.broker.region.cursors.AbstractStoreCursor | ActiveMQ BrokerService[AMQ01] Task-236 org.apache.activemq.broker.SuppressReplyException: ShutdownBrokerInitiated at org.apache.activemq.util.DefaultIOExceptionHandler.handle(DefaultIOExceptionHandler.java:165) at org.apache.activemq.broker.BrokerService.handleIOException(BrokerService.java:2772) at org.apache.activemq.store.kahadb.KahaDBStore.loadMessage(KahaDBStore.java:1262) at org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore$4.execute(KahaDBStore.java:594) at org.apache.activemq.store.kahadb.disk.page.Transaction.execute(Transaction.java:779) at org.apache.activemq.store.kahadb.KahaDBStore$KahaDBMessageStore.recoverNextMessages(KahaDBStore.java:583) at org.apache.activemq.store.ProxyMessageStore.recoverNextMessages(ProxyMessageStore.java:110) at org.apache.activemq.broker.region.cursors.QueueStorePrefetch.doFillBatch(QueueStorePrefetch.java:127) at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.fillBatch(AbstractStoreCursor.java:448) at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.reset(AbstractStoreCursor.java:168) at org.apache.activemq.broker.region.cursors.StoreQueueCursor.reset(StoreQueueCursor.java:169) at org.apache.activemq.broker.region.Queue.doPageInForDispatch(Queue.java:1981) at org.apache.activemq.broker.region.Queue.pageInMessages(Queue.java:2210) at org.apache.activemq.broker.region.Queue.iterate(Queue.java:1644) at org.apache.activemq.thread.PooledTaskRunner.runTask(PooledTaskRunner.java:133) at org.apache.activemq.thread.PooledTaskRunner$1.run(PooledTaskRunner.java:48) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: Unexpected error on journal read at: 181:32956327 at org.apache.activemq.util.IOExceptionSupport.create(IOExceptionSupport.java:28) at org.apache.activemq.store.kahadb.KahaDBStore.loadMessage(KahaDBStore.java:1260) ... 16 more Caused by: java.io.IOException: Could not load journal record, unexpected command type: KAHA_TRACE_COMMAND at location: 181:32956327 at org.apache.activemq.store.kahadb.KahaDBStore.loadMessage(KahaDBStore.java:1252) ... 16 more 2018-08-23 16:24:08,511 | INFO | Apache ActiveMQ 5.15.2 (AMQ01, ***) is shutting down | org.apache.activemq.broker.BrokerService | IOExceptionHandler: stopping BrokerService[AMQ01]