CASSANDRA-13538

Cassandra tasks permanently block after the following assertion occurs during compaction: "java.lang.AssertionError: Interval min > max"

Details

    • Bug
    • Status: Open
    • Normal
    • Resolution: Unresolved
    • 2.1.x
    • Local/Compaction
    • None
    • This happens on a 7-node system with 2 data centers. We're using Cassandra version 2.1.15; I upgraded to 2.1.17 and it still occurs.

    • Normal

    Description

      We noticed this problem because the commit logs proliferate until we eventually run out of disk space. nodetool tpstats shows several of the thread pools backed up:

      Pool Name                    Active   Pending      Completed   Blocked  All time blocked
      MutationStage                     0         0      134335315         0                 0
      ReadStage                         0         0      643986790         0                 0
      RequestResponseStage              0         0         114298         0                 0
      ReadRepairStage                   0         0             36         0                 0
      CounterMutationStage              0         0              0         0                 0
      MiscStage                         0         0              0         0                 0
      AntiEntropySessions               1         1          79357         0                 0
      HintedHandoff                     0         0             90         0                 0
      GossipStage                       0         0        6595098         0                 0
      CacheCleanupExecutor              0         0              0         0                 0
      InternalResponseStage             0         0        1638369         0                 0
      CommitLogArchiver                 0         0              0         0                 0
      CompactionExecutor                2       175        2922542         0                 0
      ValidationExecutor                0         0        1465374         0                 0
      MigrationStage                    1        76            600         0                 0
      AntiEntropyStage                  1       923        8291098         0                 0
      PendingRangeCalculator            0         0             20         0                 0
      Sampler                           0         0              0         0                 0
      MemtableFlushWriter               0         0          53017         0                 0
      MemtablePostFlush                 1      4584        1545141         0                 0
      MemtableReclaimMemory             0         0          70639         0                 0
      Native-Transport-Requests         0         0         352559         0                 0
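
To spot the backed-up pools in output like the table above without reading every row, the Pending column can be filtered mechanically. The following is a minimal sketch, not part of Cassandra or nodetool: the class name TpstatsBacklog, the method backedUp, and the threshold parameter are all hypothetical, and it assumes the six-column layout shown above (pool name, Active, Pending, Completed, Blocked, All time blocked).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not a Cassandra API): flags pools with a large Pending count. */
final class TpstatsBacklog {

    /** Returns "pool=pending" for every data row whose Pending value exceeds the threshold. */
    static List<String> backedUp(String tpstatsOutput, long threshold) {
        List<String> result = new ArrayList<>();
        for (String line : tpstatsOutput.split("\\R")) {
            String[] fields = line.trim().split("\\s+");
            // Assumed data row layout: name, active, pending, completed, blocked, all-time blocked.
            if (fields.length != 6) {
                continue; // header line, blank line, or an unexpected format
            }
            try {
                long pending = Long.parseLong(fields[2]);
                if (pending > threshold) {
                    result.add(fields[0] + "=" + pending);
                }
            } catch (NumberFormatException e) {
                // Not a numeric row; skip it.
            }
        }
        return result;
    }
}
```

Applied to the output above with a threshold of 100, this would pick out CompactionExecutor, AntiEntropyStage, and MemtablePostFlush, the three pools with the largest pending counts.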
      

      This all starts after the following exception is raised in Cassandra:

      ERROR [MemtableFlushWriter:2437] 2017-05-15 01:53:23,380 CassandraDaemon.java:231 - Exception in thread Thread[MemtableFlushWriter:2437,5,main]
      java.lang.AssertionError: Interval min > max
      	at org.apache.cassandra.utils.IntervalTree$IntervalNode.<init>(IntervalTree.java:249) ~[apache-cassandra-2.1.15.jar:2.1.15]
      	at org.apache.cassandra.utils.IntervalTree.<init>(IntervalTree.java:72) ~[apache-cassandra-2.1.15.jar:2.1.15]
      	at org.apache.cassandra.db.DataTracker$SSTableIntervalTree.<init>(DataTracker.java:603) ~[apache-cassandra-2.1.15.jar:2.1.15]
      	at org.apache.cassandra.db.DataTracker$SSTableIntervalTree.<init>(DataTracker.java:597) ~[apache-cassandra-2.1.15.jar:2.1.15]
      	at org.apache.cassandra.db.DataTracker.buildIntervalTree(DataTracker.java:578) ~[apache-cassandra-2.1.15.jar:2.1.15]
      	at org.apache.cassandra.db.DataTracker$View.replaceFlushed(DataTracker.java:740) ~[apache-cassandra-2.1.15.jar:2.1.15]
      	at org.apache.cassandra.db.DataTracker.replaceFlushed(DataTracker.java:172) ~[apache-cassandra-2.1.15.jar:2.1.15]
      	at org.apache.cassandra.db.compaction.AbstractCompactionStrategy.replaceFlushed(AbstractCompactionStrategy.java:234) ~[apache-cassandra-2.1.15.jar:2.1.15]
      	at org.apache.cassandra.db.ColumnFamilyStore.replaceFlushed(ColumnFamilyStore.java:1521) ~[apache-cassandra-2.1.15.jar:2.1.15]
      	at org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:336) ~[apache-cassandra-2.1.15.jar:2.1.15]
      	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.1.15.jar:2.1.15]
      	at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) ~[guava-16.0.jar:na]
      	at org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1127) ~[apache-cassandra-2.1.15.jar:2.1.15]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_121]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_121]
      	at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121]
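
For readers unfamiliar with the message, the sketch below illustrates the kind of invariant that an "Interval min > max" assertion guards: every interval handed to an interval-tree node must satisfy min <= max. This is a simplified stand-in, not Cassandra's IntervalTree code. The class names SketchInterval and SketchIntervalNode are hypothetical, long bounds are used in place of whatever keys the real intervals carry, and an explicit check replaces the Java assert keyword so the behaviour does not depend on the JVM being started with -ea.

```java
import java.util.List;

/** Hypothetical stand-in for an interval with comparable bounds. */
final class SketchInterval {
    final long min;
    final long max;

    SketchInterval(long min, long max) {
        this.min = min;
        this.max = max;
    }
}

/** Hypothetical node that validates its intervals while computing the range it covers. */
final class SketchIntervalNode {
    final long low;
    final long high;

    SketchIntervalNode(List<SketchInterval> intervals) {
        if (intervals.isEmpty()) {
            throw new IllegalArgumentException("no intervals");
        }
        long lo = Long.MAX_VALUE;
        long hi = Long.MIN_VALUE;
        for (SketchInterval interval : intervals) {
            // One inverted interval is enough to abort construction of the whole node.
            if (interval.min > interval.max) {
                throw new AssertionError("Interval min > max");
            }
            lo = Math.min(lo, interval.min);
            hi = Math.max(hi, interval.max);
        }
        this.low = lo;
        this.high = hi;
    }

    /** Returns "ok" if the node can be built, otherwise the error message. */
    static String tryBuild(List<SketchInterval> intervals) {
        try {
            new SketchIntervalNode(intervals);
            return "ok";
        } catch (AssertionError e) {
            return e.getMessage();
        }
    }
}
```

In this sketch a single inverted interval prevents the node from being built at all. If the real tree is rebuilt from the full set of SSTables on every flush, as the replaceFlushed and buildIntervalTree frames in the trace suggest, one bad interval could make every subsequent flush on that table fail the same way, which would fit the regularity described in this report.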
      

      This has occurred on only one of our system testers' setups, but it occurs there with regularity. I couldn't begin to tell you how to reproduce it. We have many systems deployed, and only this one setup encounters the issue. I have included the jstack output, config file, log file, and schema. I also have a heap dump available if needed. After looking at the heap dump, the best I can tell is that the assertion failure left a lock (i.e., a latch) in a locked state, which then causes a backlog of pending tasks.
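
The latch hypothesis above can be illustrated in isolation. The sketch below is not Cassandra code and makes no claim about where the real latch lives: LatchLeakSketch and its methods are hypothetical names. It shows the general pattern in which a task that throws before calling countDown() leaves anything awaiting the latch blocked, and how a finally block avoids that.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical demonstration of a latch that is never released after a failure. */
final class LatchLeakSketch {

    /** Counts down only if the task completes normally; a thrown error skips countDown(). */
    static Runnable leaky(Runnable task, CountDownLatch latch) {
        return () -> {
            task.run();
            latch.countDown();
        };
    }

    /** Counts down in a finally block, so waiters are released even when the task fails. */
    static Runnable safe(Runnable task, CountDownLatch latch) {
        return () -> {
            try {
                task.run();
            } finally {
                latch.countDown();
            }
        };
    }

    /** Runs the wrapped task on its own thread and reports whether the latch opened shortly after. */
    static boolean released(Runnable wrapped, CountDownLatch latch) {
        Thread worker = new Thread(wrapped);
        // Swallow the worker's uncaught error so the demonstration stays quiet.
        worker.setUncaughtExceptionHandler((thread, error) -> { });
        worker.start();
        try {
            worker.join();
            // A real waiter calling await() without a timeout would block here indefinitely.
            return latch.await(100, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With a task that throws an AssertionError, released(leaky(task, latch), latch) returns false while released(safe(task, latch), latch) returns true. Whether releasing the latch on failure would be the right behaviour for Cassandra is a separate question, since letting dependent tasks proceed after a failed flush may not be safe.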

      I'm hoping this assertion will mean something to the Cassandra development community and that it has perhaps been fixed in a newer release.

      Attachments

        1. tpstats.out (2 kB, Andy Klages)
        2. system.log (16.52 MB, Andy Klages)
        3. schema.cql3 (107 kB, Andy Klages)
        4. jstack.out (142 kB, Andy Klages)
        5. cassandra.yaml (38 kB, Andy Klages)

          People

            Assignee: Unassigned
            Reporter: Andy Klages