Details
- Type: Bug
- Status: Resolved
- Priority: Major
- Resolution: Fixed
Description
This metric serves two monitoring purposes:
1) Catching regions that sit on dead servers because of long WAL splitting or other delays in the ServerCrashProcedure (SCP). During that window the regions are not listed as regions in transition (RITs), so we'd like to be able to alert on such cases.
2) Catching bugs in assignment, procWAL corruption, etc. that leave a region "OPEN" on a server that no longer exists, again so that a metric can alert the administrator.
Later, it might be possible to add more logic to distinguish case 1 from case 2, to mitigate case 2 automatically, and to set a metric that alerts the administrator to investigate afterwards.
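As a rough illustration of what such a metric could count, the sketch below tallies regions that are recorded as "OPEN" on a server that is not in the live-server set. It is not the actual patch: the class `DeadServerRegionCounter`, the nested `RegionAssignment` type, and the plain string server names are all hypothetical stand-ins, and no HBase API is used.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical helper for illustration only; not an HBase class. */
final class DeadServerRegionCounter {

    /** Hypothetical stand-in for the master's record of one region. */
    static final class RegionAssignment {
        final String state;       // e.g. "OPEN", "OPENING", "CLOSED"
        final String serverName;  // server the region is recorded on, may be null

        RegionAssignment(String state, String serverName) {
            this.state = state;
            this.serverName = serverName;
        }
    }

    /**
     * Counts regions whose recorded state is "OPEN" but whose recorded
     * server is absent from the set of live servers.
     */
    static long countOpenRegionsOnDeadServers(
            Map<String, RegionAssignment> assignments, Set<String> liveServers) {
        return assignments.values().stream()
                .filter(a -> "OPEN".equals(a.state))
                .filter(a -> a.serverName != null && !liveServers.contains(a.serverName))
                .count();
    }
}
```

A periodic task could publish this count as a gauge, with an alert firing when it stays above zero. The count alone does not separate case 1 from case 2; that would need the extra logic mentioned above.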
Issue Links
- relates to HBASE-23613 ProcedureExecutor check StuckWorkers blocked by DeadServerMetricRegionChore (Resolved)