Oozie / OOZIE-2018

Coordinator materialization problems with cron syntax


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: trunk, 4.0.0
    • Fix Version/s: None
    • Component/s: coordinator
    • Labels: None

    Description

      Suppose you submit the following coordinator job:

      <coordinator-app name="DailySleep"
        frequency="*/2 * * * *"
        start="2013-06-01T00:00Z" end="2013-06-05T00:00Z" timezone="America/Los_Angeles"
        xmlns="uri:oozie:coordinator:0.2"
        >
        <controls>
          <timeout>-1</timeout>
          <concurrency>1</concurrency>
          <execution>FIFO</execution>
          <throttle>2</throttle>
        </controls>
        <datasets>
          <dataset name="sleep_time" frequency="${coord:days(1)}"
                   initial-instance="2012-05-31T00:00Z" timezone="America/Los_Angeles">
            <uri-template>${DAY}</uri-template>
            <done-flag></done-flag>
          </dataset>
        </datasets>
        <action>
          <workflow>
            <app-path>${wf_application_path}</app-path>
            <configuration>
              <property>
                <name>REDUCER_SLEEP_TIME</name>
                <value>120000</value>
              </property>
              <property>
                <name>oozie.use.system.libpath</name>
                <value>true</value>
              </property>
            </configuration>
          </workflow>
        </action>
      </coordinator-app>
      

      Here, ${wf_application_path} points to a workflow that simply runs a sleep MR job for 2 minutes.
      Note that the coordinator job above uses the cron-style frequency */2 * * * *, which means every 2 minutes, and a throttle of 2.

      When you run this job, you’ll see a few anomalies:

      1. After the first action, each nominal time is materialized twice. The action numbers increment normally, but there are two actions for each nominal time, as the job info below shows.
      2. You can’t see this in the job info below, but while the job is running there are actually 3 actions READY at the same time, when there should be at most 2 (because throttle is set to 2).
      3. OOZIE-1680 added an oozie-site property, oozie.service.coord.check.maximum.frequency=true, which is supposed to block jobs with a frequency faster than 5 minutes; it did not stop this coordinator.

      Points 1 and 2 above are likely the same problem. Point 3 is somewhat trivial.
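
      For reference, the check from point 3 is an ordinary oozie-site.xml property. The sketch below shows the setting that was in effect, assuming the standard Hadoop-style configuration/property layout that oozie-site.xml uses. With this value, a coordinator with a frequency faster than 5 minutes is supposed to be blocked, yet the cron-based job above was accepted:

      <configuration>
        <!-- Check added by OOZIE-1680: should block frequencies faster than 5 minutes -->
        <property>
          <name>oozie.service.coord.check.maximum.frequency</name>
          <value>true</value>
        </property>
      </configuration>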

      Here’s the job info (I killed the job before it finished and cut out irrelevant info to make it easier to read):

      ------------------------------------------------------------------------------------------------------------------------------
      ID                                      External ID                           Created                  Nominal Time
      ------------------------------------------------------------------------------------------------------------------------------
      0000005-140922161548481-oozie-oozi-C@1  0000006-140922161548481-oozie-oozi-W  2014-09-22 23:34:38 GMT  2013-06-01 00:00:00 GMT
      0000005-140922161548481-oozie-oozi-C@2  0000007-140922161548481-oozie-oozi-W  2014-09-22 23:34:38 GMT  2013-06-01 00:02:00 GMT
      0000005-140922161548481-oozie-oozi-C@3  0000008-140922161548481-oozie-oozi-W  2014-09-22 23:36:11 GMT  2013-06-01 00:02:00 GMT
      0000005-140922161548481-oozie-oozi-C@4  0000009-140922161548481-oozie-oozi-W  2014-09-22 23:36:11 GMT  2013-06-01 00:04:00 GMT
      0000005-140922161548481-oozie-oozi-C@5  0000005-140922161548481-oozie-oozi-C  2014-09-22 23:41:11 GMT  2013-06-01 00:04:00 GMT
      0000005-140922161548481-oozie-oozi-C@6  0000005-140922161548481-oozie-oozi-C  2014-09-22 23:41:11 GMT  2013-06-01 00:06:00 GMT
      ------------------------------------------------------------------------------------------------------------------------------
      

      I tried the same coordinator job with the old frequency syntax (${coord:minutes(2)}), and even though we don’t recommend a 2-minute frequency, it worked correctly (once I set oozie.service.coord.check.maximum.frequency=false, of course). So this appears to be a problem with the cron syntax. If ${coord:minutes(2)} hadn’t worked either, I’d say it was just one of the quirks of too high a frequency, but that’s not the case here.
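
      For comparison, here is a sketch of the variant that worked. It assumes only the frequency attribute changed; the comment stands in for the controls, datasets and action elements, which are the same as in the cron-based job. This variant also needs oozie.service.coord.check.maximum.frequency set to false in oozie-site.xml:

      <coordinator-app name="DailySleep"
        frequency="${coord:minutes(2)}"
        start="2013-06-01T00:00Z" end="2013-06-05T00:00Z" timezone="America/Los_Angeles"
        xmlns="uri:oozie:coordinator:0.2">
        <!-- controls, datasets and action unchanged from the cron-based job -->
      </coordinator-app>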


People

    • Assignee: Unassigned
    • Reporter: Robert Kanter (rkanter)
    • Votes: 0
    • Watchers: 1
