OOZIE-3722 (project: Oozie)

Workflow actions can get stuck in RUNNING state when DB connections are killed on the DB side


Details

    • Type: Bug
    • Status: In Progress
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 5.2.1
    • Fix Version/s: None
    • Component/s: core
    • Labels: None

    Description

      Apache Oozie 5.2.1 uses OpenJPA 2.4.2, commons-dbcp 1.4, and commons-pool 1.5.4. These are ancient versions, I know.

      Description

      When network issues or "maintenance work" on the DB side (especially with PostgreSQL) cause DB connections to be closed, the connection pool on the client side becomes exhausted. Many threads end up waiting at this point:

      "pool-2-thread-4" #20 prio=5 os_prio=31 tid=0x00007faf7903b800 nid=0x8603 waiting on condition [0x000000030f3e7000]
         java.lang.Thread.State: WAITING (parking)
      	at sun.misc.Unsafe.park(Native Method)
      	- parking to wait for  <0x000000066aca8e70> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
      	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
      	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
      	at org.apache.commons.pool2.impl.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:1324) 
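
      The thread above is parked in takeFirst(), which blocks until an idle object is returned to the pool. In commons-pool2, GenericObjectPool takes this unbounded takeFirst() path when its maximum wait is negative, and DBCP's maxWaitMillis defaults to -1 (wait indefinitely), which would be consistent with the dump. The stdlib-only sketch below is a simplified model, not commons-pool code: the hypothetical ToyPool hands out connection tokens from a java.util.concurrent.LinkedBlockingDeque. When tokens are never returned, the unbounded borrow parks forever, while a borrow bounded by pollFirst(timeout) fails fast.

```java
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

// Hypothetical toy pool (NOT commons-pool code): models only the deque of
// idle connections that borrowers wait on.
class ToyPool {
    private final LinkedBlockingDeque<String> idle = new LinkedBlockingDeque<>();

    ToyPool(int size) {
        for (int i = 0; i < size; i++) {
            idle.addLast("conn-" + i);
        }
    }

    // Unbounded wait: parks in takeFirst() until a connection is returned,
    // like "pool-2-thread-4" in the thread dump. Only an interrupt ends it early.
    String borrowBlocking() throws InterruptedException {
        return idle.takeFirst();
    }

    // Bounded wait: gives up after maxWaitMillis and reports exhaustion.
    String borrow(long maxWaitMillis) throws InterruptedException {
        String conn = idle.pollFirst(maxWaitMillis, TimeUnit.MILLISECONDS);
        if (conn == null) {
            throw new IllegalStateException(
                    "Pool exhausted: no connection within " + maxWaitMillis + " ms");
        }
        return conn;
    }

    // Returns a healthy connection to the idle deque.
    void giveBack(String conn) {
        idle.addLast(conn);
    }

    int idleCount() {
        return idle.size();
    }
}
```

      The bounded variant does not fix the leak itself, but it turns a silent, indefinite hang into an exception that can be logged and acted on.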

      According to my observation, this happens because neither the underlying JDBC driver connection nor the DBCP-level connection (org.apache.commons.dbcp2.PoolableConnection) gets closed on the client side.
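
      If a connection that the server has already closed is neither returned nor destroyed on the client side, its pool slot is never freed, and after enough such failures every borrower waits. The sketch below illustrates one way this can happen; it does not claim to be the actual code path in OpenJPA or DBCP. It models pool slots as permits of a java.util.concurrent.Semaphore; SlotPool, LeakDemo, leakyCall and safeCall are hypothetical names. commons-pool2's ObjectPool has real counterparts for freeing a slot (returnObject for a healthy object, invalidateObject for a broken one), and the sketch collapses both into a single release() because either way the slot must go back.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical model: each permit is one slot in a fixed-size connection pool.
class SlotPool {
    private final Semaphore slots;

    SlotPool(int maxTotal) {
        this.slots = new Semaphore(maxTotal, true);
    }

    // Returns true if a slot was obtained within the timeout.
    boolean borrow(long timeoutMillis) throws InterruptedException {
        return slots.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
    }

    // Stands in for both "return" (healthy) and "invalidate" (broken).
    void release() {
        slots.release();
    }

    int available() {
        return slots.availablePermits();
    }
}

class LeakDemo {
    // Simulates a DB call on a connection the server has already killed.
    static void useConnection(boolean serverClosed) {
        if (serverClosed) {
            throw new IllegalStateException("connection closed by server");
        }
    }

    // Leaky pattern: the exception skips release(), so the slot is lost.
    static void leakyCall(SlotPool pool) throws InterruptedException {
        if (!pool.borrow(50)) throw new IllegalStateException("pool exhausted");
        useConnection(true);
        pool.release();
    }

    // Safe pattern: release in finally, so a broken connection still frees its slot.
    static void safeCall(SlotPool pool) throws InterruptedException {
        if (!pool.borrow(50)) throw new IllegalStateException("pool exhausted");
        try {
            useConnection(true);
        } finally {
            pool.release();
        }
    }
}
```

      With two slots, two calls on killed connections are enough to exhaust the leaky pool, whereas the safe pattern keeps both slots available no matter how many calls fail.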

       

      This can leave workflow actions stuck in RUNNING state: the thread that would update the DB after XActionExecutor.check() never obtains a connection, so it blocks indefinitely.
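
      Threads stuck like this can be spotted from inside the JVM without running an external thread dump. The sketch below uses Thread.getAllStackTraces() to list threads that are in WAITING state with a LinkedBlockingDeque.takeFirst frame on the stack, the signature seen in the thread dump above. StuckBorrowerDetector is a hypothetical name, and matching on class and method names is a heuristic: it also reports unrelated threads blocked on any LinkedBlockingDeque, including idle executor workers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical diagnostic: lists threads parked indefinitely in takeFirst().
class StuckBorrowerDetector {
    static List<String> findStuckBorrowers() {
        List<String> stuck = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            if (t.getState() != Thread.State.WAITING) {
                continue; // TIMED_WAITING (e.g. pollFirst with a timeout) will return eventually
            }
            for (StackTraceElement frame : e.getValue()) {
                // Matches both java.util.concurrent.LinkedBlockingDeque and
                // org.apache.commons.pool2.impl.LinkedBlockingDeque.
                if (frame.getClassName().endsWith("LinkedBlockingDeque")
                        && frame.getMethodName().equals("takeFirst")) {
                    stuck.add(t.getName());
                    break;
                }
            }
        }
        return stuck;
    }
}
```

      A check like this could run periodically and log the offending thread names, which might surface the hang before actions pile up in RUNNING state.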

       

      Workaround

      Restart Oozie and/or fix the DB/network issue.

      Repro

      (Un)fortunately, I can reproduce the issue using the latest and greatest commons-dbcp 2.11.0 and commons-pool 2.12.0 along with OpenJPA 3.2.2.

      I've just created a Java application to reproduce the issue: https://github.com/dionusos/pool_exhausted_repro . See README.md for detailed repro steps.

       

      DBCP-595 was created to ask the DBCP/Pool teams for help. I am working on providing them with the necessary information.

People

    • Assignee: Dénes Bodó (dionusos)
    • Reporter: Dénes Bodó (dionusos)
    • Votes: 0
    • Watchers: 1