Cassandra / CASSANDRA-9120

OutOfMemoryError when reading an auto-saved cache (probably corrupted)


Details

    • Type: Bug
    • Status: Resolved
    • Resolution: Not A Problem
    • Priority: Normal
    • Environment: Linux

    Description

      Found during tests on a 100-node cluster. After a restart, one node constantly crashed with an OutOfMemoryError. I guess the auto-saved cache was corrupted and Cassandra can't recognize that. I see that similar issues were already fixed (where a negative size of some structure was read). Does the auto-saved cache have a checksum? It would help to reject a corrupted cache at the very beginning.
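The checksum idea could look roughly like the following sketch: compute a CRC32 over the serialized cache payload when saving, store it alongside the data, and verify it before deserializing anything. The class and method names here are illustrative, not Cassandra's actual API.

```java
import java.util.zip.CRC32;

// Hypothetical sketch: verify a stored CRC32 before trusting a saved cache
// file, so a corrupted file is rejected up front instead of being
// deserialized into garbage values.
public class CacheChecksum {
    // Compute a CRC32 over the serialized cache payload.
    static long checksum(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return crc.getValue();
    }

    // True only if the checksum stored at save time still matches the bytes.
    static boolean verify(byte[] payload, long storedCrc) {
        return checksum(payload) == storedCrc;
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3, 4};
        long crc = checksum(data);       // would be written next to the payload
        System.out.println(verify(data, crc));
        data[0] = 99;                    // simulate on-disk corruption
        System.out.println(verify(data, crc));
    }
}
```

On a mismatch the loader could simply discard the cache and start cold, which is always safe since the cache is only an optimization.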

      As far as I can see, the current code still has this problem. The stack trace is:

      INFO [main] 2015-03-28 01:04:13,503 AutoSavingCache.java (line 114) reading saved cache /storage/core/loginsight/cidata/cassandra/saved_caches/system-sstable_activity-KeyCache-b.db
      ERROR [main] 2015-03-28 01:04:14,718 CassandraDaemon.java (line 513) Exception encountered during startup
      java.lang.OutOfMemoryError: Java heap space
              at java.util.ArrayList.<init>(Unknown Source)
              at org.apache.cassandra.db.RowIndexEntry$Serializer.deserialize(RowIndexEntry.java:120)
              at org.apache.cassandra.service.CacheService$KeyCacheSerializer.deserialize(CacheService.java:365)
              at org.apache.cassandra.cache.AutoSavingCache.loadSaved(AutoSavingCache.java:119)
              at org.apache.cassandra.db.ColumnFamilyStore.<init>(ColumnFamilyStore.java:262)
              at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:421)
              at org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:392)
              at org.apache.cassandra.db.Keyspace.initCf(Keyspace.java:315)
              at org.apache.cassandra.db.Keyspace.<init>(Keyspace.java:272)
              at org.apache.cassandra.db.Keyspace.open(Keyspace.java:114)
              at org.apache.cassandra.db.Keyspace.open(Keyspace.java:92)
              at org.apache.cassandra.db.SystemKeyspace.checkHealth(SystemKeyspace.java:536)
              at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:261)
              at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
              at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:585)
      

      I looked at the Cassandra source code and found:
      http://grepcode.com/file/repo1.maven.org/maven2/org.apache.cassandra/cassandra-all/2.0.10/org/apache/cassandra/db/RowIndexEntry.java

      119 int entries = in.readInt();
      120 List<IndexHelper.IndexInfo> columnsIndex = new ArrayList<IndexHelper.IndexInfo>(entries);

      It seems the value of entries read from the corrupted file is garbage: a huge positive count makes the ArrayList constructor attempt an enormous allocation and hit the OOM (a negative count would instead fail immediately with an IllegalArgumentException). I deleted the saved_caches directory and was able to start the node correctly. We should expect this to happen in the real world: Cassandra should be able to skip corrupted cached data and start.
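The defensive check this report asks for could be sketched as follows: validate the deserialized entry count before using it to size the ArrayList, rather than trusting whatever int the (possibly corrupted) cache file contains. The bound MAX_REASONABLE_ENTRIES and the class/method names are assumptions for illustration, not Cassandra's actual code.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical sketch: reject an implausible entry count read from a saved
// cache file instead of passing it straight to new ArrayList<>(entries).
public class SafeDeserialize {
    // Assumed upper bound on a believable index size for one cache entry.
    static final int MAX_REASONABLE_ENTRIES = 1 << 20;

    // True if the count could plausibly come from an uncorrupted file.
    static boolean plausible(int entries) {
        return entries >= 0 && entries <= MAX_REASONABLE_ENTRIES;
    }

    // Reads the entry count and fails with a clear IOException on garbage,
    // instead of an OutOfMemoryError deep inside the ArrayList constructor.
    static int readEntryCount(DataInputStream in) throws IOException {
        int entries = in.readInt();
        if (!plausible(entries))
            throw new IOException("corrupt saved cache: bogus entry count " + entries);
        return entries;
    }

    public static void main(String[] args) throws IOException {
        // 0x00000005 -> a valid count of 5
        DataInputStream ok = new DataInputStream(
                new ByteArrayInputStream(new byte[]{0, 0, 0, 5}));
        System.out.println(readEntryCount(ok));

        // 0x7FFFFFFF -> rejected instead of triggering a huge allocation
        DataInputStream bad = new DataInputStream(
                new ByteArrayInputStream(new byte[]{127, -1, -1, -1}));
        try {
            readEntryCount(bad);
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

An IOException here lets the caller log a warning, discard the saved cache, and continue startup with a cold cache, which matches the behavior the report requests.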

            People

              Assignee: Unassigned
              Reporter: Vladimir Kuzmin (kuzminva)
              Ariel Weisberg
              Votes: 0
              Watchers: 5
