  1. Cassandra
  2. CASSANDRA-14375

Digest mismatch Exception when sending raw hints in cluster


Details

    • Bug
    • Status: Open
    • Normal
    • Resolution: Unresolved
    • None
    • Consistency/Hints
    • None
    • CentOS 7.3

    • Normal

    Description

      We have a 14-node cluster where we have seen hints files getting corrupted, resulting in the following error:

      ERROR [HintsDispatcher:1] 2018-04-06 16:26:44,423 CassandraDaemon.java:228 - Exception in thread Thread[HintsDispatcher:1,1,main]
       org.apache.cassandra.io.FSReadError: java.io.IOException: Digest mismatch exception
       at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:298) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:263) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at org.apache.cassandra.hints.HintsDispatcher.sendHints(HintsDispatcher.java:169) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at org.apache.cassandra.hints.HintsDispatcher.sendHintsAndAwait(HintsDispatcher.java:128) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:113) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:94) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.deliver(HintsDispatchExecutor.java:278) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:260) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:238) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.run(HintsDispatchExecutor.java:217) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_141]
       at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_141]
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_141]
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_141]
       at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) [apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_141]
       Caused by: java.io.IOException: Digest mismatch exception
       at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNextInternal(HintsReader.java:315) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:289) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       ... 16 common frames omitted
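
      For context, here is a minimal sketch of the kind of per-record checksum validation that can raise an error like the one above. The record layout (length, payload, CRC32), the class name DigestCheckSketch, and the methods writeRecord and readRecord are assumptions made purely for illustration; this is not Cassandra's HintsReader code or its actual on-disk hints format.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Illustrative only: a simplified record format, NOT Cassandra's hints file layout.
final class DigestCheckSketch {

    // Serializes one record as [int length][payload bytes][int crc32(payload)].
    public static byte[] writeRecord(byte[] payload) {
        ByteBuffer out = ByteBuffer.allocate(Integer.BYTES + payload.length + Integer.BYTES);
        out.putInt(payload.length);
        out.put(payload);
        out.putInt((int) crc32(payload));
        return out.array();
    }

    // Reads one record back; fails if the stored checksum does not match the payload.
    public static byte[] readRecord(byte[] record) throws IOException {
        ByteBuffer in = ByteBuffer.wrap(record);
        if (in.remaining() < Integer.BYTES) {
            throw new IOException("Truncated record");
        }
        int length = in.getInt();
        if (length < 0 || length > in.remaining() - Integer.BYTES) {
            throw new IOException("Truncated or corrupt record");
        }
        byte[] payload = new byte[length];
        in.get(payload);
        int storedChecksum = in.getInt();
        if (storedChecksum != (int) crc32(payload)) {
            // Same message as in the log above, used here only to mirror the symptom.
            throw new IOException("Digest mismatch exception");
        }
        return payload;
    }

    // Computes the CRC32 of the given bytes.
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) throws IOException {
        byte[] record = writeRecord("hint payload".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(readRecord(record), StandardCharsets.UTF_8));

        record[6] ^= 0x01; // simulate on-disk corruption by flipping one payload bit
        try {
            readRecord(record);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

      In this sketch, a single flipped bit in the payload is enough to make the read fail, which is why a reader built this way cannot skip past a damaged record without discarding or truncating the file.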
      

      Notes on the cluster and the investigation done so far:
      1. The Cassandra used here is built locally from the 3.11.1 branch, with the following patch from CASSANDRA-14080 applied:
      https://github.com/apache/cassandra/commit/68079e4b2ed4e58dbede70af45414b3d4214e195
      2. The 14 nodes are bootstrapped in the following way:

      • Of the 14 nodes, only 3 are picked as seed nodes.
      • Only 1 of the 3 seed nodes is started first, and the schema is created if it was not created previously.
      • After this, the rest of the nodes are bootstrapped.
      • In the failure scenario, only 5 of the 14 nodes successfully formed the Cassandra cluster. The failed nodes include two seed nodes.

      3. We confirmed that the patch from CASSANDRA-13696 has been applied. Jay Zhuang confirmed that this is a different issue from the one fixed there:
      "this should be a different issue, as HintsDispatcher.java:128 sends hints with {{buffer}}s, this patch is only to fix the digest mismatch for HintsDispatcher.java:129, which sends hints one by one."
      4. The application uses the Java driver with the quorum consistency setting for Cassandra.
      5. We also saw this issue on a separate 7-node cluster.
      6. We are able to work around the problem by running nodetool truncatehints on the failed nodes and restarting Cassandra.


          People

            Assignee: Unassigned
            Reporter: Vineet Ghatge (vinegh)
