Uploaded image for project: 'Oozie'
  1. Oozie
  2. OOZIE-3669

Fix purge process for bundles to prevent orphan coordinators

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 5.2.1
    • 5.3.0
    • core
    • None

    Description

      The Oozie purge process for bundles is creating orphan coordinators. When purging bundle jobs and bundle actions, it does not always purge coordinator jobs, etc. This causes orphaned coordinators, meaning neither they nor their children will ever be purged due to the purge logic.

       


       

      When purging bundles, it first compiles a list of any coordinators which are not ready to purge [0]. It checks the coord list for status and coordOlderThan. If the no child coordinator meets these criteria, it adds it to the coordsToPurge list.

      Being added to the list does not guarantee that the coordinator will be purged however. The processCoordinators method also has logic to check if the children workflows are older than wfOlderThan [1]. If a purge command is started where wfOlderThan is much higher than coordOlderThan (for example the default values are 30 days for workflows and 7 days for coordinators), then the bundle will be purged, but the coordinator will not.

      Since the bundle is now purged, the child coordinator will never be purged because only parentless coordinators will be checked, since coordinators with parents will only be purged when the bundle is purged

      [0]

      PurgeXCommand
       380 long numChildrenNotReady = jpaService.execute(
       381 new CoordJobsCountNotForPurgeFromParentIdJPAExecutor(coordOlderThan, bundleId));
      CoordinatorJobBean
       192 @NamedQuery(name = "GET_COORD_COUNT_WITH_PARENT_ID_NOT_READY_FOR_PURGE", query = "select count(w) from CoordinatorJobBean"
       193 + " w where w.bundleId = :parentId and (w.statusStr NOT IN ('SUCCEEDED', 'FAILED', 'KILLED', 'DONEWITHERROR') "
       194 + "OR w.lastModifiedTimestamp >= :lastModTime)"),
      

       

      [1]

      PurgeXCommand
       343 List<String> workflowChildren = fetchTerminatedWorkflow(wfjBeanList);
       344
      private boolean isWorkflowPurgeable(WorkflowJobBean wfjBean, long wfOlderThanMS) {
       308 final Date wfEndTime = wfjBean.getEndTime();
       309 final boolean isFinished = wfjBean.inTerminalState();
       310 if (isFinished && wfEndTime != null && wfEndTime.getTime() < wfOlderThanMS)
      { 311 return true; 312 }
      313 else {
       314 final Date lastModificationTime = wfjBean.getLastModifiedTime();
       315 if (isFinished && lastModificationTime != null && lastModificationTime.getTime() < wfOlderThanMS)
      { 316 return true; 317 }
      318 }
       319 return false;
      345 // if all workflow are ready to purge add them and add the coordinator and their actions
       346 if(workflowChildren.size() == wfjBeanList.size()) {
       347 LOG.debug("Purging coordinator " + coordId);
       348 wfsToPurge.addAll(workflowChildren);
       349 coordsToPurge.add(coordId);
      

      Attachments

        1. OOZIE-3669-001.patch
          13 kB
          János Makai
        2. OOZIE-3669-002.patch
          6 kB
          János Makai
        3. OOZIE-3669-003.patch
          13 kB
          János Makai

        Activity

          People

            jmakai János Makai
            jmakai János Makai
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: