IMPALA-3034

MemTracker leak on PHJ failure to spill

    Description

      In the process of running test workloads for admission control, I noticed that some queries would not release all of their consumed memory via their MemTrackers. Without admission control enabled this is difficult to observe, because the query's parent tracker is the proc tracker, which uses tcmalloc for its consumption metric; with AC enabled, however, the parent is a pool tracker, which shows non-zero consumption after some workloads have run. To isolate the issue I added a simple DCHECK to ensure consumption is 0 on MemTracker destruction:

      diff --git a/be/src/runtime/mem-tracker.cc b/be/src/runtime/mem-tracker.cc
      index 099a155..27da6b7 100644
      --- a/be/src/runtime/mem-tracker.cc
      +++ b/be/src/runtime/mem-tracker.cc
      @@ -216,6 +216,7 @@ shared_ptr<MemTracker> MemTracker::GetQueryMemTracker(
       }
      
       MemTracker::~MemTracker() {
      +  DCHECK_EQ(consumption_->current_value(), 0);
         lock_guard<mutex> l(static_mem_trackers_lock_);
         if (auto_unregister_) UnregisterFromParent();
         // Erase the weak ptr reference from the map.
      

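      To make the hierarchy concrete, here is a minimal sketch, assuming a simplified two-level tracker hierarchy; the class and method names are illustrative, not Impala's actual MemTracker API. It shows why a tracker dying with non-zero consumption leaves the parent (pool) tracker inflated:

      // A minimal sketch, assuming a simplified two-level hierarchy; names
      // here are illustrative, not Impala's actual MemTracker API.
      #include <cassert>
      #include <cstdint>
      #include <cstdio>

      class ToyMemTracker {
       public:
        explicit ToyMemTracker(ToyMemTracker* parent = nullptr) : parent_(parent) {}

        // The equivalent of the DCHECK added above: every byte consumed
        // through this tracker must be released before it is destroyed.
        ~ToyMemTracker() { assert(consumption_ == 0); }

        // Consumption is charged to this tracker and every ancestor, which is
        // why a leaking query tracker leaves non-zero consumption on the pool.
        void Consume(int64_t bytes) {
          for (ToyMemTracker* t = this; t != nullptr; t = t->parent_) {
            t->consumption_ += bytes;
          }
        }
        void Release(int64_t bytes) { Consume(-bytes); }
        int64_t consumption() const { return consumption_; }

       private:
        ToyMemTracker* const parent_;
        int64_t consumption_ = 0;
      };

      int main() {
        ToyMemTracker pool;            // stands in for the AC pool tracker
        {
          ToyMemTracker query(&pool);  // per-query tracker parented to the pool
          query.Consume(59392);
          query.Release(59392);        // drop this line to trip the destructor assert
        }
        std::printf("pool consumption after query: %lld\n",
                    static_cast<long long>(pool.consumption()));
        return 0;
      }
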
      Running test_mem_usage_scaling.py, this check fails very quickly, and the stacks show:

      I0218 17:54:30.967980 17576 partitioned-hash-join-node.cc:382] Not enough memory to switch to IO-sized buffer for partition 0x5f71740 of join=3 build small buffers=0 probe small buffers=1
      I0218 17:54:30.968037 17576 partitioned-hash-join-node.cc:382] Not enough memory to switch to IO-sized buffer for partition 0x5f70180 of join=3 build small buffers=0 probe small buffers=1
      I0218 17:54:30.968053 17576 partitioned-hash-join-node.cc:382] Not enough memory to switch to IO-sized buffer for partition 0x5f718c0 of join=3 build small buffers=0 probe small buffers=1
      I0218 17:54:30.968065 17576 partitioned-hash-join-node.cc:382] Not enough memory to switch to IO-sized buffer for partition 0x5f70750 of join=3 build small buffers=0 probe small buffers=1
      I0218 17:54:30.988139 17576 status.cc:45] Memory limit exceeded
          @     0x7eff410198da  impala::Status::Status()
          @     0x7eff4101964a  impala::Status::MemLimitExceeded()
          @     0x7eff3ec86900  impala::BufferedBlockMgr::MemLimitTooLowError()
          @     0x7eff40c91aa4  impala::PartitionedHashJoinNode::SpillPartition()
          @     0x7eff40c9184c  impala::PartitionedHashJoinNode::AppendRowStreamFull()
          @     0x7eff0bf65744  (unknown)
      I0218 17:54:30.988181 17576 buffered-block-mgr.cc:406] Query: e24589af696cf816:b5ffbcb937f3398c. Node=3 ran out of memory:
      Buffered block mgr
        Num writes outstanding: 0
        Num free io buffers: 0
        Num unpinned blocks: 0
        Num available buffers: -104
        Total pinned buffers: 148
        Unfullfilled reserved buffers: 104
        Remaining memory: 6815744 (#blocks=0)
        Block write threshold: 2
      Client 0xfb987c0
       PartitionedHashJoinNode id=3 ptr=0x11581680
        num_reserved_buffers=34
        num_tmp_reserved_buffers=0
        num_pinned_buffers=4
      I0218 17:54:30.988255 17576 data-stream-mgr.cc:207] DeregisterRecvr(): fragment_instance_id=e24589af696cf816:b5ffbcb937f33992, node=7
      I0218 17:54:30.988368 17576 data-stream-recvr.cc:233] cancelled stream: fragment_instance_id_=e24589af696cf816:b5ffbcb937f33992 node_id=7
      F0218 17:54:30.988528 17576 mem-tracker.cc:219] Check failed: consumption_->current_value() == 0 (59392 vs. 0)
      *** Check failure stack trace: ***
          @     0x7eff3c8479fd  google::LogMessage::Fail()
          @     0x7eff3c84a326  google::LogMessage::SendToLog()
          @     0x7eff3c84751d  google::LogMessage::Flush()
          @     0x7eff3c84adce  google::LogMessageFatal::~LogMessageFatal()
          @     0x7eff3edbb18e  impala::MemTracker::~MemTracker()
          @     0x7eff40b399ef  boost::checked_delete<>()
          @     0x7eff40b38edf  boost::scoped_ptr<>::~scoped_ptr()
          @     0x7eff40b3936f  boost::scoped_ptr<>::reset()
          @     0x7eff3ed5a683  impala::DataStreamRecvr::Close()
          @     0x7eff40b57cc7  impala::ExchangeNode::Close()
          @     0x7eff40b09452  impala::BlockingJoinNode::BuildSideThread()
          @     0x7eff40b0e9d6  boost::_mfi::mf2<>::operator()()
          @     0x7eff40b0e882  boost::_bi::list3<>::operator()<>()
          @     0x7eff40b0e5df  boost::_bi::bind_t<>::operator()()
          @     0x7eff40b0e2dc  boost::detail::function::void_function_obj_invoker0<>::invoke()
          @           0x71da04  boost::function0<>::operator()()
          @     0x7eff3d3b7567  impala::Thread::SuperviseThread()
          @     0x7eff3d3bf8f0  boost::_bi::list4<>::operator()<>()
          @     0x7eff3d3bf833  boost::_bi::bind_t<>::operator()()
          @     0x7eff3d3bf7f6  boost::detail::thread_data<>::run()
          @           0x7f542a  thread_proxy
          @     0x7eff3cece182  start_thread
          @     0x7eff3a1c747d  (unknown)
      

      The DCHECK fires in DataStreamRecvr::Close(), reached via ExchangeNode::Close() on the build-side thread, so it seems something isn't cleaned up properly on the PHJ spill path when spilling fails.
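
      For illustration, here is a hedged sketch of the kind of bug the stack suggests: memory charged to a tracker on the append/spill path, with the release skipped on the early-return error path when spilling fails. Every name below is hypothetical, not taken from the actual PHJ code:

      // Hypothetical reconstruction of the suspected pattern; none of these
      // names come from the Impala sources.
      #include <cassert>
      #include <cstdint>

      struct Tracker {
        int64_t consumption = 0;
        ~Tracker() { assert(consumption == 0); }  // mirrors the DCHECK above
      };

      // A buffer that charges the tracker on acquire but relies on its caller
      // to release -- the error-prone shape that an early return can leak.
      struct Buffer {
        Tracker* tracker = nullptr;
        int64_t bytes = 0;
        void Acquire(Tracker* t, int64_t n) { tracker = t; bytes = n; t->consumption += n; }
        void ReleaseMem() { tracker->consumption -= bytes; bytes = 0; }
      };

      // Stand-in for the spill call; returns false on "Memory limit exceeded".
      bool TrySpill(bool fail) { return !fail; }

      void AppendRow(Tracker* t, bool spill_fails) {
        Buffer buf;
        buf.Acquire(t, 59392);
        if (!TrySpill(spill_fails)) {
          return;  // BUG: skips ReleaseMem(), leaving 59392 bytes charged
        }
        buf.ReleaseMem();
      }

      int main() {
        Tracker t;
        AppendRow(&t, /*spill_fails=*/false);
        // Flipping spill_fails to true leaves t.consumption at 59392, and the
        // destructor assert fails with "59392 vs. 0", matching the log above.
        return 0;
      }

      If this is the shape of the bug, the usual fix is to release on every exit path (e.g. with an RAII guard) rather than relying on the happy path reaching the release call.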

      I don't think this will be a blocker for either the AC or the non-AC case, but I'm filing it as Critical so we can triage and assess the impact before downgrading.

People

  Assignee: Michael Ho (kwho)
  Reporter: Matthew Jacobs (mjacobs)