  1. Hive
  2. HIVE-26669

Hive Metastore becomes unresponsive

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Not A Problem
    • Affects Version/s: 3.1.0
    • Fix Version/s: None
    • Component/s: Metastore
    • Labels: None

    Description

      We are experiencing issues with the Hive Metastore where it becomes unresponsive. Initial investigation shows thousands of threads in the WAITING (parking) state, as shown below:

      1 java.lang.Thread.State: BLOCKED (on object monitor)
      772 java.lang.Thread.State: RUNNABLE
      2 java.lang.Thread.State: TIMED_WAITING (on object monitor)
      13 java.lang.Thread.State: TIMED_WAITING (parking)
      5 java.lang.Thread.State: TIMED_WAITING (sleeping)
      3 java.lang.Thread.State: WAITING (on object monitor)
      14308 java.lang.Thread.State: WAITING (parking)
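A per-state tally like the one above can be reproduced from a saved thread dump. The following is a minimal sketch, assuming the dump was captured as plain text (for example with jstack); the function name `count_thread_states` is illustrative and not part of the original report.

```python
import re
from collections import Counter

# Matches lines such as "   java.lang.Thread.State: WAITING (parking)".
STATE_RE = re.compile(r"^\s*java\.lang\.Thread\.State:\s*(.+?)\s*$")


def count_thread_states(dump_text):
    """Count how many threads in a thread dump are in each state."""
    counts = Counter()
    for line in dump_text.splitlines():
        match = STATE_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Calling `count_thread_states(open(path).read()).most_common()` on a dump file (the `path` is whatever file the dump was saved to) lists the states from most to least frequent, which makes a pile-up in a single state easy to spot.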

      ==============

      Almost all of the threads are stuck at 'parking to wait for <0x00007f9ad0795c48> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)'

      15 - parking to wait for <0x00007f9ad06c9c10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      14288 - parking to wait for <0x00007f9ad0795c48> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
      1 - parking to wait for <0x00007f9ad0a161f8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      1 - parking to wait for <0x00007f9ad0a39248> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      1 - parking to wait for <0x00007f9ad0adb0a0> (a java.util.concurrent.SynchronousQueue$TransferQueue)
      5 - parking to wait for <0x00007f9ad0b12278> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      1 - parking to wait for <0x00007f9ad0b12518> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      1 - parking to wait for <0x00007f9ad0b44878> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      1 - parking to wait for <0x00007f9ad0cbe8f0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      1 - parking to wait for <0x00007f9ad1318d60> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      1 - parking to wait for <0x00007f9ad1478c10> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      5 - parking to wait for <0x00007f9ad1494ff8> (a java.util.concurrent.SynchronousQueue$TransferQueue)
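The grouping above (waiting threads per lock address) can be derived from the same dump. This is a sketch under the assumption that each waiting thread has a `parking to wait for <address> (a class)` line in the format shown in this report; `count_parked_locks` is a hypothetical helper name.

```python
import re
from collections import Counter

# Captures the lock address and the synchronizer class from lines such as
# "- parking to wait for <0x00007f9ad0795c48> (a java.util...NonfairSync)".
PARK_RE = re.compile(
    r"parking to wait for\s+<(0x[0-9a-fA-F]+)>\s+\(a ([^)]+)\)"
)


def count_parked_locks(dump_text):
    """Return [((address, class_name), waiters), ...], most contended first."""
    counts = Counter()
    for match in PARK_RE.finditer(dump_text):
        counts[(match.group(1), match.group(2))] += 1
    return counts.most_common()
```

The first entry of the returned list is the most contended synchronizer. Searching the dump for that address under a thread's "Locked ownable synchronizers" section is one way to look for the thread that currently holds it.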

      ======================
      complete stack:
      "pool-8-thread-62238" #3582305 prio=5 os_prio=0 tid=0x00007f977bfc9800 nid=0x62011 waiting on condition [0x00007f959d917000]
      java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x00007f9ad0795c48> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
        at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
        at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
        at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:351)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:77)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:137)
        at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:59)
        at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStoreForConf(HiveMetaStore.java:750)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMSForConf(HiveMetaStore.java:718)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:712)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database_core(HiveMetaStore.java:1488)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_database(HiveMetaStore.java:1470)
        at sun.reflect.GeneratedMethodAccessor40.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:147)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:108)
        at com.sun.proxy.$Proxy30.get_database(Unknown Source)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_database.getResult(ThriftHiveMetastore.java:15014)
        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_database.getResult(ThriftHiveMetastore.java:14998)
        at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
        at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
        at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:636)
        at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:631)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.hadoop.hive.metastore.security.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:631)
        at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)

      Locked ownable synchronizers:
        - <0x00007fae9f0d8c20> (a java.util.concurrent.ThreadPoolExecutor$Worker)

      ======================
      Looking at the Linux process, Hive exhausts its 'max processes' limit while the issue is happening. The limit is set to:
      Max processes 16000 16000 processes
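To compare the live thread count against this limit while the issue is happening, something like the following could be used. It is a Linux-only sketch that assumes the limit line comes from `/proc/<pid>/limits` and that `/proc/<pid>/task` has one entry per thread; both helper names are hypothetical. Note that Linux applies the process limit per user rather than per process, and each Java thread counts toward it, so other processes run by the same user also consume the budget.

```python
import os
import re


def parse_max_processes(limits_text):
    """Return (soft, hard) from the 'Max processes' row, or None if absent.

    A value of 'unlimited' is returned as None in that position.
    """
    def to_int(value):
        return None if value == "unlimited" else int(value)

    for line in limits_text.splitlines():
        match = re.match(r"Max processes\s+(\S+)\s+(\S+)", line)
        if match:
            return to_int(match.group(1)), to_int(match.group(2))
    return None


def thread_count(pid):
    """Count the threads of a process via its /proc task directory (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/task"))
```

With the Metastore's PID, `parse_max_processes(open(f"/proc/{pid}/limits").read())` and `thread_count(pid)` can be sampled periodically to see how quickly the process approaches the limit between restarts.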

      As a workaround, we restart the Metastores, after which they work fine for a few days.

          People

            Assignee: Chris Nauroth (cnauroth)
            Reporter: Sandeep Gade (sandygade)