CASSANDRA-18589

NPE during reads after complex column drop

Details

    • Bug
    • Status: Open
    • Normal
    • Resolution: Unresolved
    • None
    • Local/SSTable
    • None
    • Correctness - Recoverable Corruption / Loss
    • Normal
    • Normal
    • Adhoc Test
    • All
    • None

    Description

      When data is written in parallel with dropping a complex column, subsequent reads may fail with an NPE until the affected sstable is compacted.

      The scenario leading to NPE is as follows: there exists a row which contains data for a complex column that is now dropped. There are no other complex columns. The removed column is not skipped during deserialization of the row (ColumnFilter is not aware of dropped columns).

      At the same time, Row$Merger$ColumnDataReducer is not aware of the existence of a complex column (hasComplex==false) and thus doesn't have a builder for complex data, eventually yielding an NPE when it processes said complex column (backtrace from 3.11):

      ERROR [ReadStage-2] node2 2023-06-13 11:00:46,756 Uncaught exception on thread Thread[ReadStage-2,5,node2]
      java.lang.RuntimeException: java.lang.NullPointerException
              at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2777)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
              at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162)
              at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134)
              at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:113)
              at java.lang.Thread.run(Thread.java:748)
      Caused by: java.lang.NullPointerException: null
              at org.apache.cassandra.db.rows.Row$Merger$ColumnDataReducer.getReduced(Row.java:789)
              at org.apache.cassandra.db.rows.Row$Merger$ColumnDataReducer.getReduced(Row.java:726)
              at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:217)
              at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:156)
              at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
              at org.apache.cassandra.db.rows.Row$Merger.merge(Row.java:703)
              at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator$MergeReducer.getReduced(UnfilteredRowIterators.java:587)
              at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator$MergeReducer.getReduced(UnfilteredRowIterators.java:551)
              at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:217)
              at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:156)
              at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
              at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:533)
              at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:390)
              at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
              at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:100)
              at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:32)
              at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
              at org.apache.cassandra.db.transform.BaseRows.hasNext(BaseRows.java:133)
              at org.apache.cassandra.db.transform.UnfilteredRows.isEmpty(UnfilteredRows.java:74)
              at org.apache.cassandra.db.partitions.PurgeFunction.applyToPartition(PurgeFunction.java:75)
              at org.apache.cassandra.db.partitions.PurgeFunction.applyToPartition(PurgeFunction.java:26)
              at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:96)
              at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:305)
              at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:187)
              at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:180)
              at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:176)
              at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:76)
              at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:360)
              at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:2007)
              at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2773)
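
      The failure mode described above can be modelled in a few lines. This is a simplified sketch, not the actual Cassandra code: ReducerSketch, ColumnData and the column name "dropped_map" are hypothetical stand-ins, and the only thing taken from the real reducer is the idea that the complex-data builder exists only when hasComplex is true.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for the merge reducer; NOT the actual Cassandra class. */
final class ReducerSketch {
    /** Column data as seen by the reducer: a name plus a simple/complex flag. */
    static final class ColumnData {
        final String column;
        final boolean complex;

        ColumnData(String column, boolean complex) {
            this.column = column;
            this.complex = complex;
        }
    }

    private final List<String> simpleOut = new ArrayList<>();
    // Allocated only when the schema in use declares at least one complex column.
    private final List<String> complexOut;

    ReducerSketch(boolean hasComplex) {
        this.complexOut = hasComplex ? new ArrayList<>() : null;
    }

    void reduce(ColumnData data) {
        if (data.complex)
            complexOut.add(data.column); // throws NPE when hasComplex == false
        else
            simpleOut.add(data.column);
    }

    /** Returns true if complex data fed to a reducer built with hasComplex=false throws an NPE. */
    static boolean reproducesNpe() {
        // Schema after the drop: no complex columns are left.
        ReducerSketch reducer = new ReducerSketch(false);
        try {
            // Stale complex data that was not skipped during deserialization.
            reducer.reduce(new ColumnData("dropped_map", true));
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("NPE reproduced: " + reproducesNpe());
    }
}
```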

      The NPE problem races with another problem in that scenario (CASSANDRA-18591), so when running the reproduction test, which of the two you hit may vary.

      While it may be tempting to fix the NPE by lazily initializing the needed builder structures, there seems to be an implicit assumption that columns like the dropped one should not enter the read path machinery at all at this point.

      Thus, rather than just fixing the NPE and hoping that no other class makes the same assumption, I intend to make the assumption valid by cutting out the dropped column as early as possible (i.e. during deserialization).
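
      To illustrate the idea of cutting out dropped columns at deserialization time, here is a minimal sketch. All names in it (DroppedColumnFilterSketch, MetadataSnapshot, deserializeRow) are hypothetical and do not correspond to Cassandra classes. The sketch also matches dropped columns by name only; an actual fix would presumably have to take into account when the column was dropped, which is ignored here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of skipping dropped columns while deserializing a row. */
final class DroppedColumnFilterSketch {
    /** Hypothetical metadata snapshot; here it only carries the names of dropped columns. */
    static final class MetadataSnapshot {
        final Set<String> droppedColumns;

        MetadataSnapshot(Set<String> droppedColumns) {
            this.droppedColumns = droppedColumns;
        }
    }

    /**
     * Returns the columns of a serialized row that should reach the rest of the
     * read path: everything except columns the snapshot marks as dropped.
     */
    static List<String> deserializeRow(List<String> serializedColumns, MetadataSnapshot metadata) {
        List<String> kept = new ArrayList<>();
        for (String column : serializedColumns) {
            if (!metadata.droppedColumns.contains(column))
                kept.add(column);
        }
        return kept;
    }

    public static void main(String[] args) {
        MetadataSnapshot metadata = new MetadataSnapshot(new HashSet<>(Arrays.asList("tags")));
        // "tags" stands for the dropped complex column still present in the sstable.
        System.out.println(deserializeRow(Arrays.asList("id", "tags", "name"), metadata));
    }
}
```

      With the dropped column removed this early, later stages such as the merge reducer never see complex data they have no builder for.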

      I don't know whether I also need to handle memtable contents, or sstable contents only.

      I don't think schema agreement etc. is relevant: currently the ColumnFilter uses some specific TableMetadata, so if I use the very same TableMetadata as the source of dropped column info, there should be internal consistency between the ColumnFilter and the ColumnDataReducer (or, potentially, other classes).
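
      The consistency argument can be made concrete with a toy model. Everything below is an assumption made for illustration: Snapshot is a hypothetical stand-in for a TableMetadata instance, and isConsistent merely checks that no complex column survives filtering while the reducer believes there are no complex columns.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model: the filter and the reducer each consult a (possibly different) metadata snapshot. */
final class SnapshotConsistencySketch {
    /** Hypothetical stand-in for a table metadata instance. */
    static final class Snapshot {
        final Set<String> liveComplexColumns;
        final Set<String> droppedColumns;

        Snapshot(Set<String> liveComplexColumns, Set<String> droppedColumns) {
            this.liveComplexColumns = liveComplexColumns;
            this.droppedColumns = droppedColumns;
        }

        boolean hasComplex() {
            return !liveComplexColumns.isEmpty();
        }
    }

    /** False if a complex column passes the filter while the reducer has no complex builder. */
    static boolean isConsistent(List<String> complexColumnsOnDisk, Snapshot forFilter, Snapshot forReducer) {
        for (String column : complexColumnsOnDisk) {
            boolean survivesFilter = !forFilter.droppedColumns.contains(column);
            if (survivesFilter && !forReducer.hasComplex())
                return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Snapshot beforeDrop = new Snapshot(new HashSet<>(Arrays.asList("tags")), Collections.<String>emptySet());
        Snapshot afterDrop = new Snapshot(Collections.<String>emptySet(), new HashSet<>(Arrays.asList("tags")));
        List<String> onDisk = Arrays.asList("tags");

        // Same snapshot on both sides: the dropped column is filtered out.
        System.out.println(isConsistent(onDisk, afterDrop, afterDrop));
        // Mixed snapshots: the column survives, but the reducer expects no complex data.
        System.out.println(isConsistent(onDisk, beforeDrop, afterDrop));
    }
}
```

      In this model, using one snapshot on both sides is consistent both before and after the drop; only mixing snapshots breaks it.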

      Thoughts? blerer blambov 

          People

            jakubzytka Jakub Zytka