  1. Oozie
  2. OOZIE-3254

[coordinator] LAST_ONLY and NONE execution modes: possible OutOfMemoryError when there are too many coordinator actions to materialize

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.0.0
    • Fix Version/s: 5.3.0
    • Component/s: coordinator
    • Labels: None

    Description

      If a coordinator job is defined with a per-minute frequency (e.g. frequency="* * * * *"), its start time lies well in the past, and its execution mode is LAST_ONLY or NONE, too many CoordinatorActionBean instances can accumulate on the JVM heap in CoordMaterializeTransitionXCommand#insertList, because those execution modes skip the check against the throttle value.

      As a consequence, the log can contain several hundred thousand entries written while CoordMaterializeTransitionXCommand#insertList keeps growing:

      [user@host ~]$ grep 'In storeToDB() coord action id' /var/log/oozie/oozie-HOSTNAME.log.out | wc -l
      478408
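
      To put that count in perspective, the following is a rough sketch of the arithmetic, assuming one logged entry per materialized action and a fixed one-minute frequency. BacklogEstimate and its methods are hypothetical names used only for illustration; they are not part of Oozie.

```java
import java.time.Duration;

/** Hypothetical helper (not Oozie code): relates a backlog duration to an action count. */
final class BacklogEstimate {

    /** Number of actions a fixed-frequency coordinator has to materialize to cover the backlog. */
    static long pendingActions(Duration backlog, Duration frequency) {
        return backlog.toMillis() / frequency.toMillis();
    }

    /** Length of the backlog that corresponds to a given number of actions. */
    static Duration backlogFor(long actions, Duration frequency) {
        return frequency.multipliedBy(actions);
    }

    public static void main(String[] args) {
        Duration perMinute = Duration.ofMinutes(1);
        // 478408 one-minute actions cover roughly 332 days of backlog.
        System.out.println(backlogFor(478408L, perMinute).toDays() + " days");
        // A start time one year (365 days) in the past yields 525600 pending actions.
        System.out.println(pendingActions(Duration.ofDays(365), perMinute) + " actions");
    }
}
```

      Under these assumptions, a start time less than a year in the past is already enough to produce the number of entries shown above.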
      

      A much worse consequence is that those CoordinatorActionBean instances stay reachable from a GC root (through the insertList itself), so the JVM cannot free them until a subsequent call to insertList.clear(). In the worst case this results in an OutOfMemoryError.

      The size of CoordMaterializeTransitionXCommand#insertList should be checked against a configurable limit (default value something like 1000), and the list should be persisted and cleared whenever that limit is reached.
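
      The proposal above could look roughly like the following minimal sketch. It is not the attached patch and does not use any Oozie API: BoundedInsertBuffer, its limit parameter and the persister callback are hypothetical names, and the real fix would have to persist through Oozie's own storage layer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper (not Oozie code): buffers items and persists them in bounded batches. */
final class BoundedInsertBuffer<T> {
    private final int limit;                      // assumed configurable, e.g. 1000
    private final Consumer<List<T>> persister;    // stands in for the database write
    private final List<T> insertList = new ArrayList<>();

    BoundedInsertBuffer(int limit, Consumer<List<T>> persister) {
        if (limit < 1) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.limit = limit;
        this.persister = persister;
    }

    /** Adds one item and flushes as soon as the buffer holds 'limit' items. */
    void add(T item) {
        insertList.add(item);
        if (insertList.size() >= limit) {
            flush();
        }
    }

    /** Persists a copy of the buffered items, then clears the buffer so they can be collected. */
    void flush() {
        if (insertList.isEmpty()) {
            return;
        }
        persister.accept(new ArrayList<>(insertList));
        insertList.clear();
    }

    /** Number of items currently held in memory. */
    int pending() {
        return insertList.size();
    }

    public static void main(String[] args) {
        BoundedInsertBuffer<String> buffer = new BoundedInsertBuffer<>(
                1000, batch -> System.out.println("persisting " + batch.size() + " actions"));
        for (int i = 0; i < 2500; i++) {
            buffer.add("action-" + i);
        }
        buffer.flush(); // persist the remainder that never reached the limit
    }
}
```

      With this shape, at most limit items are held in memory at any time, regardless of how many actions have to be materialized; the final flush() handles the remainder.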

      Attachments

        1. OOZIE-3254-01-wip.patch
          21 kB
          Andras Salamon
        2. OOZIE-3254-002.patch
          57 kB
          János Makai

            People

              jmakai János Makai
              andras.piros Andras Piros
              Votes: 0
              Watchers: 9
