Uploaded image for project: 'HBase'
  1. HBase
  2. HBASE-21854

Race condition in TestProcedureSkipPersistence

    XMLWordPrintableJSON

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 2.1.3
    • Fix Version/s: 3.0.0, 2.2.0, 2.0.5, 2.3.0, 2.1.4
    • Component/s: proc-v2
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      There is a race condition inĀ TestProcedureSkipPersistence. After the procedure is added, the test stops ProcedureExecutor. In some cases the procedure is not added to the queue in time.

      Failing execution:

      2019-02-06 14:18:11,133 INFO  [Time-limited test] procedure2.TimeoutExecutorThread(82): ADDED pid=-1, state=WAITING_TIMEOUT; org.apache.hadoop.hbase.procedure2.CompletedProcedureCleaner; timeout=30000, timestamp=1549491521133
      2019-02-06 14:18:11,135 INFO  [PEWorker-1] procedure2.TimeoutExecutorThread(82): ADDED pid=1, state=WAITING_TIMEOUT, locked=true; org.apache.hadoop.hbase.procedure2.TestProcedureSkipPersistence$TestProcedure; timeout=2000, timestamp=1549491493135
      2019-02-06 14:18:11,137 INFO  [Time-limited test] hbase.Waiter(189): Waiting up to [30,000] milli-secs(wait.for.ratio=[1])
      2019-02-06 14:18:11,139 INFO  [Time-limited test] procedure2.ProcedureTestingUtility(125): RESTART - Stop
      2019-02-06 14:18:11,139 INFO  [Time-limited test] procedure2.ProcedureExecutor(702): Stopping
      2019-02-06 14:18:11,139 INFO  [Time-limited test] wal.WALProcedureStore(331): Stopping the WAL Procedure Store, isAbort=false
      2019-02-06 14:18:11,140 DEBUG [PEWorker-1] procedure2.RootProcedureState(153): Add procedure pid=1, state=WAITING_TIMEOUT, locked=true; org.apache.hadoop.hbase.procedure2.TestProcedureSkipPersistence$TestProcedure as the 0th rollback step
      2019-02-06 14:18:11,141 WARN  [PEWorker-1] procedure2.ProcedureExecutor$WorkerThread(2074): Worker terminating UNNATURALLY null
      java.lang.RuntimeException: the store must be running before inserting data
         at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.pushData(WALProcedureStore.java:710)
         at org.apache.hadoop.hbase.procedure2.store.wal.WALProcedureStore.update(WALProcedureStore.java:603)
         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.updateStoreOnExec(ProcedureExecutor.java:1943)
         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1809)
         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1481)
         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$1200(ProcedureExecutor.java:78)
         at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:2058)
      2019-02-06 14:18:11,145 INFO  [Time-limited test] procedure2.ProcedureTestingUtility(137): RESTART - Start

      In a successful run the ProcExecutor is stopped AFTER the procedure is actually in the queue.

      Successful:

      2019-02-07 15:48:08,731 INFO  [Time-limited test] procedure2.TimeoutExecutorThread(82): ADDED pid=-1, state=WAITING_TIMEOUT; org.apache.hadoop.hbase.procedure2.CompletedProcedureCleaner; timeout=30000, timestamp=1549550918731
      2019-02-07 15:48:08,731 INFO  [PEWorker-1] procedure2.TimeoutExecutorThread(82): ADDED pid=1, state=WAITING_TIMEOUT, locked=true; org.apache.hadoop.hbase.procedure2.TestProcedureSkipPersistence$TestProcedure; timeout=2000, timestamp=1549550890731
      2019-02-07 15:48:08,732 DEBUG [PEWorker-1] procedure2.RootProcedureState(153): Add procedure pid=1, state=WAITING_TIMEOUT, locked=true; org.apache.hadoop.hbase.procedure2.TestProcedureSkipPersistence$TestProcedure as the 0th rollback step
      2019-02-07 15:48:08,732 INFO  [WALProcedureStoreSyncThread] wal.WALProcedureStore(1217): Remove all state logs with ID less than 1, since all the active procedures are in the latest log
      2019-02-07 15:48:08,733 DEBUG [WALProcedureStoreSyncThread] wal.WALProcedureStore(1239): Removed log=file:/Users/peter.somogyi/Cloudera/hbase/hbase-procedure/target/test-data/b9a1969a-85a4-15e8-7da5-6198f5acf2de/proc-logs/pv2-00000000000000000001.log, activeLogs=[/Users/peter.somogyi/Cloudera/hbase/hbase-procedure/target/test-data/b9a1969a-85a4-15e8-7da5-6198f5acf2de/proc-logs/pv2-00000000000000000002.log]
      2019-02-07 15:48:08,734 INFO  [Time-limited test] hbase.Waiter(189): Waiting up to [30,000] milli-secs(wait.for.ratio=[1])
      2019-02-07 15:48:08,734 INFO  [Time-limited test] procedure2.ProcedureTestingUtility(125): RESTART - Stop
      2019-02-07 15:48:08,734 INFO  [Time-limited test] procedure2.ProcedureExecutor(702): Stopping
      2019-02-07 15:48:08,735 INFO  [Time-limited test] wal.WALProcedureStore(331): Stopping the WAL Procedure Store, isAbort=false
      2019-02-07 15:48:08,735 INFO  [Time-limited test] procedure2.ProcedureTestingUtility(137): RESTART - Start

        Attachments

        1. HBASE-21854.patch
          2 kB
          Duo Zhang

          Activity

            People

            • Assignee:
              psomogyi Peter Somogyi
              Reporter:
              psomogyi Peter Somogyi
            • Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: