CASSANDRA-14375

Digest mismatch Exception when sending raw hints in cluster


Details

    • Bug
    • Status: Open
    • Normal
    • Resolution: Unresolved
    • None
    • Consistency/Hints
    • None
    • CentOS 7.3

    • Normal

    Description

      We have a 14-node cluster where we have seen hints files getting corrupted, resulting in the following error:

      ERROR [HintsDispatcher:1] 2018-04-06 16:26:44,423 CassandraDaemon.java:228 - Exception in thread Thread[HintsDispatcher:1,1,main]
       org.apache.cassandra.io.FSReadError: java.io.IOException: Digest mismatch exception
       at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:298) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:263) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at org.apache.cassandra.hints.HintsDispatcher.sendHints(HintsDispatcher.java:169) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at org.apache.cassandra.hints.HintsDispatcher.sendHintsAndAwait(HintsDispatcher.java:128) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:113) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at org.apache.cassandra.hints.HintsDispatcher.dispatch(HintsDispatcher.java:94) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.deliver(HintsDispatchExecutor.java:278) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:260) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.dispatch(HintsDispatchExecutor.java:238) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at org.apache.cassandra.hints.HintsDispatchExecutor$DispatchHintsTask.run(HintsDispatchExecutor.java:217) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_141]
       at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_141]
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_141]
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_141]
       at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:81) [apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_141]
       Caused by: java.io.IOException: Digest mismatch exception
       at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNextInternal(HintsReader.java:315) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       at org.apache.cassandra.hints.HintsReader$BuffersIterator.computeNext(HintsReader.java:289) ~[apache-cassandra-3.11.1.jar:3.11.1-SNAPSHOT]
       ... 16 common frames omitted
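
For readers unfamiliar with this class of error: a "digest mismatch" generally means that a checksum stored alongside a record no longer matches the checksum recomputed from the bytes read back. The sketch below illustrates that check in isolation. It is not Cassandra's HintsReader code, and the record layout (length, payload, CRC32) and the names ChecksummedRecordDemo, writeRecord and readRecord are hypothetical, chosen only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class ChecksummedRecordDemo {

    // Hypothetical layout (an assumption, not the real hints file format):
    // [int length][payload bytes][int CRC32 of payload]
    static byte[] writeRecord(byte[] payload) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        out.writeInt(payload.length);
        out.write(payload);
        out.writeInt((int) crc.getValue());
        out.flush();
        return bytes.toByteArray();
    }

    // Reads one record and recomputes the checksum over the payload.
    // Throws if the stored and recomputed values differ.
    static byte[] readRecord(byte[] record) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(record));
        int length = in.readInt();
        byte[] payload = new byte[length];
        in.readFully(payload);
        int stored = in.readInt();
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        if ((int) crc.getValue() != stored) {
            throw new IOException("Digest mismatch exception");
        }
        return payload;
    }

    public static void main(String[] args) throws IOException {
        byte[] record = writeRecord("hint".getBytes(StandardCharsets.UTF_8));
        readRecord(record);   // intact record passes the check

        record[4] ^= 0x01;    // flip one bit in the first payload byte
        try {
            readRecord(record);
        } catch (IOException e) {
            System.out.println(e.getMessage()); // prints "Digest mismatch exception"
        }
    }
}
```

In this sketch a single flipped bit in the payload is enough to fail the check, which is why on-disk corruption of a hints file would surface as soon as the affected record is read for dispatch.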
      

      Notes on cluster and investigation done so far
      1. The Cassandra used here was built locally from the 3.11.1 branch, with the following patch from CASSANDRA-14080 applied:
      https://github.com/apache/cassandra/commit/68079e4b2ed4e58dbede70af45414b3d4214e195
      2. The 14 nodes are bootstrapped as follows:

      • Of the 14 nodes, only 3 are picked as seed nodes.
      • Only 1 of the 3 seed nodes is started, and the schema is created if it does not already exist.
      • After this, the rest of the nodes are bootstrapped.
      • In the failure scenario, only 5 of the 14 nodes successfully formed the Cassandra cluster. The failed nodes include two seed nodes.
      3. We confirmed that the patch from CASSANDRA-13696 has been applied. Jay Zhuang confirmed that this is a different issue from the one previously fixed:
      "this should be a different issue, as HintsDispatcher.java:128 sends hints with {{buffer}}s, this patch is only to fix the digest mismatch for HintsDispatcher.java:129, which sends hints one by one."
      4. The application uses the Java driver with the quorum setting for Cassandra.
      5. We saw this issue on a 7-node cluster too (separate from the 14-node cluster).
      6. We are able to work around the issue by running nodetool truncatehints on the failed nodes and restarting Cassandra.

          People

            Assignee: Unassigned
            Reporter: Vineet Ghatge (vinegh)
            Votes: 0
            Watchers: 3
