Kudu / KUDU-2030

Tablet server crashes on using deallocated Mutex object

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.4.0
    • Fix Version/s: n/a
    • Component/s: tserver

    Description

      The code in RaftConsensus::UpdateReplica() (src/kudu/consensus/raft_consensus.cc) instantiates a Synchronizer on the stack and then uses the StatusCallback derived from it in a way that, on certain code paths, leads to an attempt to use an already deallocated Mutex object, CountDownLatch::lock_. The CountDownLatch instance is a member of the Synchronizer object itself, so it is destroyed together with the Synchronizer when the enclosing stack frame unwinds.
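
The lifetime problem described above can be illustrated with a minimal, self-contained sketch. It is not Kudu code: Latch, SharedSynchronizer, and LateCallbackIsSafe are hypothetical names, and the shared-ownership approach shown is only one common way to avoid this class of bug, not necessarily the fix adopted in Kudu. The idea is that the callback co-owns the latch, so a callback that fires after the waiter has given up still finds a live mutex.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <memory>
#include <mutex>

// Hypothetical one-shot latch; a stand-in, not Kudu's CountDownLatch.
class Latch {
 public:
  void CountDown() {
    std::lock_guard<std::mutex> l(lock_);
    done_ = true;
    cv_.notify_all();
  }

  // Returns true if CountDown() was called before the timeout expired.
  bool WaitFor(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> l(lock_);
    return cv_.wait_for(l, timeout, [this] { return done_; });
  }

 private:
  std::mutex lock_;
  std::condition_variable cv_;
  bool done_ = false;
};

// Hypothetical synchronizer: the latch lives on the heap and every
// callback holds a shared_ptr to it, so the latch outlives the
// synchronizer for as long as any callback still exists.
class SharedSynchronizer {
 public:
  SharedSynchronizer() : latch_(std::make_shared<Latch>()) {}

  std::function<void()> AsCallback() const {
    std::shared_ptr<Latch> latch = latch_;  // co-ownership, not a raw pointer
    return [latch] { latch->CountDown(); };
  }

  bool WaitFor(std::chrono::milliseconds timeout) {
    return latch_->WaitFor(timeout);
  }

 private:
  std::shared_ptr<Latch> latch_;
};

// Simulates the problematic ordering: the waiter times out and its
// stack-allocated synchronizer is destroyed before the callback runs.
// Returns true if the wait timed out, as this scenario intends.
bool LateCallbackIsSafe() {
  std::function<void()> cb;
  bool timed_out = false;
  {
    SharedSynchronizer sync;  // lives on the stack, as in the report
    cb = sync.AsCallback();
    timed_out = !sync.WaitFor(std::chrono::milliseconds(10));
  }  // sync is destroyed here; the latch survives because cb co-owns it
  cb();  // late callback locks a still-valid mutex
  return timed_out;
}
```

If the callback captured a raw pointer or reference to a latch stored directly inside the stack object, the final cb() call would lock a destroyed mutex, which is the failure mode this issue reports.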

      Under certain scenarios, tserver crashes with the following stack trace:

      F0605 18:22:23.583866 14144 mutex.cc:76] Check failed: rv == 0 || rv == 16 . Invalid argument. Owner tid: 23156096; Self tid: 144; To collect the owner stack trace, enable the flag --debug_mutex_collect_stacktrace
      *** Check failure stack trace: ***                                              
          @     0x7fab619a62fd  google::LogMessage::Fail() at ??:0                    
          @     0x7fab619a81bd  google::LogMessage::SendToLog() at ??:0               
          @     0x7fab619a5e39  google::LogMessage::Flush() at ??:0                   
          @     0x7fab619a8c5f  google::LogMessageFatal::~LogMessageFatal() at ??:0   
          @     0x7fab627eb453  kudu::Mutex::TryAcquire() at ??:0                     
          @     0x7fab627eb82c  kudu::Mutex::Acquire() at ??:0                        
          @     0x7fab6aec6b7a  kudu::CountDownLatch::CountDown() at ??:0             
          @     0x7fab6aec526a  kudu::CountDownLatch::CountDown() at ??:0             
          @     0x7fab69339633  kudu::Synchronizer::StatusCB() at ??:0                
          @     0x7fab69339a21  kudu::internal::RunnableAdapter<>::Run() at ??:0      
          @     0x7fab69339964  kudu::internal::InvokeHelper<>::MakeItSo() at ??:0    
          @     0x7fab693398f2  kudu::internal::Invoker<>::Run() at ??:0              
          @     0x7fab692bce26  kudu::Callback<>::Run() at ??:0  
      

      The check fails because the pthread_mutex_trylock() call at mutex.cc:74 returns EINVAL ("Invalid argument") instead of 0 or EBUSY (16), since the underlying pthread mutex has already been deallocated.
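
As a rough illustration of the check that fired, the sketch below shows the two return values a try-lock wrapper normally expects from pthread_mutex_trylock() on a valid mutex: 0 when the lock is acquired and EBUSY when it is already held. TryAcquire and DemoTryAcquire are hypothetical helpers written for this example, not the code in mutex.cc. The sketch deliberately does not call pthread_mutex_trylock() on a destroyed mutex, because that is undefined behavior.

```cpp
#include <cassert>
#include <cerrno>
#include <pthread.h>

// Hypothetical wrapper with the same shape as the failed check:
// any return value other than 0 or EBUSY is treated as a fatal error.
bool TryAcquire(pthread_mutex_t* m) {
  int rv = pthread_mutex_trylock(m);
  assert(rv == 0 || rv == EBUSY);
  return rv == 0;
}

// Exercises both expected outcomes on a valid, default-type mutex.
bool DemoTryAcquire() {
  pthread_mutex_t m;
  pthread_mutex_init(&m, nullptr);

  bool first = TryAcquire(&m);   // mutex is free: acquired, rv == 0
  bool second = TryAcquire(&m);  // mutex is held: not acquired, rv == EBUSY

  pthread_mutex_unlock(&m);
  pthread_mutex_destroy(&m);
  return first && !second;
}
```

A return value of EINVAL falls outside both expected outcomes, which is why the check aborts the process rather than reporting ordinary lock contention.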

      To reproduce, run ClientFailoverOnNegotiationTimeoutITest.Kudu1580ConnectToTServer from client-negotiation-failover-itest, built from revision 5f8442ff67fe87b019c71a09f0556bdcb6868428 in DEBUG configuration, with --stress-cpu-threads=8 about 1K times. A batch of 1K runs usually produces about 3-4 such crashes.

            People

              Assignee: Unassigned
              Reporter: Alexey Serbin (aserbin)