  1. HBase
  2. HBASE-19147 All branch-2 unit tests pass
  3. HBASE-20015

TestMergeTableRegionsProcedure and TestRegionMergeTransactionOnCluster flakey


Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0-beta-2, 2.0.0
    • Component/s: test
    • Labels: None

    Description

      MergeTableRegionsProcedure seems incomplete. The ProcedureExecutor framework can run in a test mode in which it kills the Procedure before it can persist state, and it does this repeatedly to shake out places where a Procedure may not be preserving all needed state at each step. The kill causes the Procedure to 'fail', and the framework then runs the Procedure's rollback. MergeTableRegionsProcedure is not able to roll back the last few steps of a merge; it throws an UnsupportedOperationException. (The hope was that the missing rollback steps would be filled in, but they are hard to complete because they are themselves multi-step.)

      So....

      Well, it turns out that Split has a mechanism whereby it will not fail the Procedure if it gets to a stage from which it cannot roll back. Instead, it just retries, and keeps retrying until it eventually succeeds. Merge has this facility only half-implemented, so the Merge tests are flakey. They do stuff like this:

      2018-02-17 04:04:02,999 WARN  [PEWorker-1] assignment.MergeTableRegionsProcedure(311): Failed rollback attempt step MERGE_TABLE_REGIONS_UPDATE_META for merging the regions [485dd0c2a5d14601d61fed791f793158, 8af34a614f064c162ab1d05eac7fca4c] in table testRollbackAndDoubleExecution
      java.lang.UnsupportedOperationException: pid=44, state=FAILED:MERGE_TABLE_REGIONS_UPDATE_META, exception=org.apache.hadoop.hbase.procedure2.ProcedureAbortedException via MergeTableRegionsProcedure:org.apache.hadoop.hbase.procedure2.ProcedureAbortedException: abort requested; MergeTableRegionsProcedure table=testRollbackAndDoubleExecution, regions=[485dd0c2a5d14601d61fed791f793158, 8af34a614f064c162ab1d05eac7fca4c], forcibly=false unhandled state=MERGE_TABLE_REGIONS_UPDATE_META
      	at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.rollbackState(MergeTableRegionsProcedure.java:291)
      	at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.rollbackState(MergeTableRegionsProcedure.java:78)
      	at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:199)
      	at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:859)
      	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1356)
      	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1312)
      	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1181)
      	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
      	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1734)
      2018-02-17 04:04:03,007 ERROR [PEWorker-1] helpers.MarkerIgnoringBase(159): CODE-BUG: Uncaught runtime exception for pid=44, state=FAILED:MERGE_TABLE_REGIONS_UPDATE_META, exception=org.apache.hadoop.hbase.procedure2.ProcedureAbortedException via MergeTableRegionsProcedure:org.apache.hadoop.hbase.procedure2.ProcedureAbortedException: abort requested; MergeTableRegionsProcedure table=testRollbackAndDoubleExecution, regions=[485dd0c2a5d14601d61fed791f793158, 8af34a614f064c162ab1d05eac7fca4c], forcibly=false
      java.lang.UnsupportedOperationException: pid=44, state=FAILED:MERGE_TABLE_REGIONS_UPDATE_META, exception=org.apache.hadoop.hbase.procedure2.ProcedureAbortedException via MergeTableRegionsProcedure:org.apache.hadoop.hbase.procedure2.ProcedureAbortedException: abort requested; MergeTableRegionsProcedure table=testRollbackAndDoubleExecution, regions=[485dd0c2a5d14601d61fed791f793158, 8af34a614f064c162ab1d05eac7fca4c], forcibly=false unhandled state=MERGE_TABLE_REGIONS_UPDATE_META
      	at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.rollbackState(MergeTableRegionsProcedure.java:291)
      	at org.apache.hadoop.hbase.master.assignment.MergeTableRegionsProcedure.rollbackState(MergeTableRegionsProcedure.java:78)
      	at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.rollback(StateMachineProcedure.java:199)
      	at org.apache.hadoop.hbase.procedure2.Procedure.doRollback(Procedure.java:859)
      	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1356)
      	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeRollback(ProcedureExecutor.java:1312)
      	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.executeProcedure(ProcedureExecutor.java:1181)
      	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$800(ProcedureExecutor.java:78)
      	at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$WorkerThread.run(ProcedureExecutor.java:1734)
      

      That is, they throw up their hands, which makes for a CODE-BUG: a condition the framework cannot process. The test fails.
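
      The difference between rolling back, retrying, and hitting the CODE-BUG path can be sketched with a toy model. This is not HBase code: the class, the simplified step names, and the `retryPastPointOfNoReturn` flag are all hypothetical, and the assumption that everything from the meta update onward cannot be undone is only an illustration of the behavior described above.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only; none of these names come from the HBase code base. */
public class MergeRetrySketch {

    /** Hypothetical, simplified merge steps. */
    enum Step { PREPARE, CLOSE_REGIONS, UPDATE_META, OPEN_MERGED_REGION }

    /** Assumption for this sketch: steps from UPDATE_META onward cannot be undone. */
    static boolean isRollbackSupported(Step step) {
        return step.ordinal() < Step.UPDATE_META.ordinal();
    }

    /**
     * Runs the steps in order. At step killAt the simulated test harness
     * kills the procedure `kills` times before the step can complete.
     */
    static List<String> run(Step killAt, int kills, boolean retryPastPointOfNoReturn) {
        List<String> trace = new ArrayList<>();
        for (Step step : Step.values()) {
            int killsLeft = (step == killAt) ? kills : 0;
            while (killsLeft > 0) {
                killsLeft--;
                if (isRollbackSupported(step)) {
                    // Early steps can still be undone, so the procedure fails cleanly.
                    trace.add("ROLLED_BACK at " + step);
                    return trace;
                }
                if (!retryPastPointOfNoReturn) {
                    // Half-implemented case: no rollback and no retry.
                    throw new UnsupportedOperationException("unhandled state=" + step);
                }
                // Past the point of no return: do not fail, just try the step again.
                trace.add("RETRY " + step);
            }
            trace.add("DONE " + step);
        }
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(run(Step.CLOSE_REGIONS, 1, true));
        System.out.println(run(Step.UPDATE_META, 3, true));
        try {
            run(Step.UPDATE_META, 1, false);
        } catch (UnsupportedOperationException e) {
            System.out.println("CODE-BUG: " + e.getMessage());
        }
    }
}
```

      With the flag set, a kill at the meta update is retried until the step completes; with it unset, the same kill surfaces as an UnsupportedOperationException, which is the shape of the failure in the log above.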

      Attachments

        1. HBASE-20015.branch-2.001.patch
          3 kB
          Michael Stack

            People

              Assignee: Michael Stack (stack)
              Reporter: Michael Stack (stack)