[HBASE-24545] Add backoff to SCP check on WAL split completion - ASF JIRA

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 3.0.0-alpha-1, 2.3.0
Component/s: None
Labels: None

Hadoop Flags: Reviewed
Release Note:

Adds backoff to the ServerCrashProcedure wait on WAL split completion when there is a large backlog of files to split. (It's possible to avoid SCP blocking while waiting on WALs to split if you use procedure-based splitting: set 'hbase.split.wal.zk.coordinated' to false to enable procedure-based WAL splitting.)

Description

Crashed cluster. Lots of backed-up WALs. Startup. Recover hundreds of servers; each has a running SCP. Taking a thread dump during recovery, I noticed 160 threads, each in an SCP waiting on split WAL completion. Each thread was scanning the zk splitWAL directory every 100ms. The dir had thousands of entries in it, so each check was pulling down MB from zk, times 160 threads (max configured PE threads (16) * 10 for the KeepAlive factor, which has us use 10 * configured PEs as the max for the PE worker pool).
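
To put rough numbers on the above, here is a minimal sketch of the arithmetic. The class and method names are hypothetical and only the figures (16 configured PE threads, a KeepAlive factor of 10, a 100ms check interval) come from this report. The scan rate is an upper bound, since it ignores the time each zk listing itself takes.

```java
// Hypothetical helper, not HBase code: reproduces the arithmetic in the description.
final class ZkScanLoadSketch {

    // Max PE worker pool size: configured threads times the KeepAlive factor.
    static int maxWorkerThreads(int configuredThreads, int keepAliveFactor) {
        return configuredThreads * keepAliveFactor;
    }

    // Upper bound on directory scans per second if every thread polls at a fixed interval.
    static long scansPerSecond(int threads, long checkIntervalMs) {
        return threads * (1000L / checkIntervalMs);
    }

    public static void main(String[] args) {
        int threads = maxWorkerThreads(16, 10);
        System.out.println("waiting threads: " + threads);
        System.out.println("scans per second (upper bound): " + scansPerSecond(threads, 100L));
    }
}
```

With each scan listing thousands of entries, even this upper-bound rate makes clear why a fixed 100ms poll is costly for zk.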

If there are lots of remaining WALs to split, have the SCP back off on its wait so it checks less frequently.
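
The idea can be sketched as below. This is an illustration only, not the code from the pull request: the class name, the constants, and the linear scaling rule are all assumptions. The wait grows with the number of WALs still to split and is capped, so a large backlog is checked rarely and a nearly finished one is checked at close to the original 100ms interval.

```java
import java.util.function.IntSupplier;

// Hypothetical sketch of a backlog-proportional wait; not the actual HBase patch.
final class WalSplitBackoffSketch {

    static final long BASE_WAIT_MS = 100L;   // the original fixed check interval
    static final long PER_WAL_MS = 1L;       // assumed extra wait per remaining WAL
    static final long MAX_WAIT_MS = 5_000L;  // assumed cap on any single wait

    // Wait before the next check: longer when more WALs remain, never above the cap.
    static long waitMillis(int remainingWals) {
        if (remainingWals < 0) {
            throw new IllegalArgumentException("remainingWals must be >= 0");
        }
        long scaled = BASE_WAIT_MS + remainingWals * PER_WAL_MS;
        return Math.min(scaled, MAX_WAIT_MS);
    }

    // Polls the supplier until no WALs remain; returns how many checks were made.
    static int awaitCompletion(IntSupplier remainingWals) throws InterruptedException {
        int checks = 1;
        int remaining = remainingWals.getAsInt();
        while (remaining > 0) {
            Thread.sleep(waitMillis(remaining));
            remaining = remainingWals.getAsInt();
            checks++;
        }
        return checks;
    }
}
```

Under these assumed constants, a backlog of 400 WALs gives a 500ms wait, and anything above 4,900 remaining WALs hits the 5 second cap.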

Issue Links

links to GitHub Pull Request #1891

People

Assignee: Michael Stack

Reporter: Michael Stack

Votes: 0

Watchers: 3

Dates

Created: 12/Jun/20 06:09

Updated: 13/Jun/20 15:57

Resolved: 12/Jun/20 15:00