[HBASE-18971] Limit the concurrent opened wal writers when splitting - ASF JIRA

Details

Type: Improvement
Status: Resolved
Priority: Major
Resolution: Duplicate
Affects Version/s: None
Fix Version/s: None
Component/s: Recovery, wal
Labels: None

Description

A whole-cluster restart can easily fail under the current architecture when a single region server hosts many regions.

On a small cluster, although each recovered edits file is very small, the NameNode (NN) reserves a full block for it when the file is opened, so the cluster can easily run out of space.

On a large cluster, even with the max xceiver count already set to 4096, it is still easy to exhaust that quota and have the DataNodes (DN) reject our requests when a single region server (RS) hosts 1k+ regions, because each block is written in 3 copies.
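A rough back-of-the-envelope estimate of the xceiver pressure described above, as a sketch only. It assumes that each running split task keeps one writer open per region, that every open writer occupies one transfer thread on each of its 3 replica DataNodes, and that load spreads evenly across DataNodes co-located one per region server. The class name, method name, and the figure of 2 concurrent split tasks per node are illustrative and not taken from HBase.

```java
public class XceiverEstimate {

    /**
     * Average number of DataNode transfer threads needed per node,
     * under the even-spread assumption stated above.
     */
    static long perDataNodeDemand(int concurrentSplitsPerNode, int regionsPerWal, int replication) {
        return (long) concurrentSplitsPerNode * regionsPerWal * replication;
    }

    public static void main(String[] args) {
        // Illustrative inputs: 2 concurrent split tasks, 1000 regions, 3 replicas.
        long demand = perDataNodeDemand(2, 1000, 3);
        System.out.println("estimated xceivers per DataNode: " + demand);
        System.out.println("exceeds a 4096 limit: " + (demand > 4096));
    }
}
```

Under these assumptions the demand scales linearly with the number of regions per server, which is why raising the xceiver limit alone only postpones the problem.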

Under the current architecture we need to choose 'hbase.regionserver.wal.max.splitters' and 'hbase.master.executor.serverops.threads' carefully to limit the concurrency of WAL splitting. But this is only a compromise, as it also slows down failure recovery.

So here we want to limit the number of concurrently open WAL writers when splitting. It could work like a memstore: buffer the WAL entries in memory and, when the buffer is full, flush some entries out.
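One possible shape for the memstore-like idea, as a minimal sketch and not the implementation adopted by HBase. All names here (BoundedSplitBuffer, RegionSink, maxBufferedBytes) are hypothetical, regions are identified by plain strings, and the policy of flushing the largest per-region buffer first is an assumption chosen for illustration. The sink is expected to open a writer, write the batch, and close it again, so at most one writer is open at a time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: buffers WAL entries per region under a global memory cap. */
public class BoundedSplitBuffer {

    /** Hypothetical sink; a real one would open, write, and close a recovered edits file. */
    public interface RegionSink {
        void write(String region, List<byte[]> entries);
    }

    private final long maxBufferedBytes;
    private final RegionSink sink;
    private final Map<String, List<byte[]>> buffers = new HashMap<>();
    private final Map<String, Long> sizes = new HashMap<>();
    private long bufferedBytes = 0;

    public BoundedSplitBuffer(long maxBufferedBytes, RegionSink sink) {
        if (maxBufferedBytes < 0) {
            throw new IllegalArgumentException("maxBufferedBytes must be non-negative");
        }
        this.maxBufferedBytes = maxBufferedBytes;
        this.sink = sink;
    }

    /** Buffers one entry, flushing the largest region buffers while the cap is exceeded. */
    public void append(String region, byte[] entry) {
        buffers.computeIfAbsent(region, r -> new ArrayList<>()).add(entry);
        sizes.merge(region, (long) entry.length, Long::sum);
        bufferedBytes += entry.length;
        while (bufferedBytes > maxBufferedBytes) {
            flushLargest();
        }
    }

    /** Flushes whatever is still buffered, one region at a time. */
    public void close() {
        for (String region : new ArrayList<>(buffers.keySet())) {
            flush(region);
        }
    }

    public long bufferedBytes() {
        return bufferedBytes;
    }

    private void flushLargest() {
        String largest = null;
        long max = -1;
        for (Map.Entry<String, Long> e : sizes.entrySet()) {
            if (e.getValue() > max) {
                max = e.getValue();
                largest = e.getKey();
            }
        }
        flush(largest);
    }

    private void flush(String region) {
        List<byte[]> entries = buffers.remove(region);
        bufferedBytes -= sizes.remove(region);
        sink.write(region, entries);
    }
}
```

The trade-off in this sketch is that a region may be flushed several times and so produce several small output files, in exchange for a bound on both memory use and the number of simultaneously open writers.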

Suggestions are welcome.

Issue Links

is duplicated by

HBASE-19358 Improve the stability of splitting log when do fail over

Resolved

People

Assignee: Unassigned

Reporter: Duo Zhang

Votes: 0

Watchers: 7

Dates

Created: 09/Oct/17 08:51

Updated: 03/Jan/18 12:33

Resolved: 03/Jan/18 12:33