[HBASE-830] Debugging HCM.locateRegionInMeta is painful - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: 0.2.0
Fix Version/s: 0.2.1, 0.18.0
Component/s: Client
Labels: None

Description

I've been debugging a case where a bunch of reduces were hanging for no apparent reason and then got killed because they did nothing for 600 seconds. I figured out that it's because we are stuck in a very long wait caused by the retry backoffs.

public static int RETRY_BACKOFF[] = { 1, 1, 1, 1, 2, 4, 8, 16, 32, 64 };

With a 10-second base pause, that means we wait 10 sec, 10, 10, 10, 20, 40, ... and finally 640 sec. That's a long time; do we really need to wait that long before finally being warned that there's a bug in HBase?
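
To make the arithmetic concrete, here is a minimal sketch that sums the waits implied by the array above. It assumes each sleep is simply the base pause times the multiplier, takes the 10-second base pause and the 600-second kill timeout from this description, and ignores the time spent in the calls themselves. The class and method names are hypothetical and are not part of HBase.

```java
public class BackoffSketch {
    // Multipliers as quoted in the description above.
    static final int[] RETRY_BACKOFF = { 1, 1, 1, 1, 2, 4, 8, 16, 32, 64 };

    /** Total seconds slept if every retry fails (assumes sleep = base pause * multiplier). */
    static long totalWaitSeconds(long basePauseSeconds, int[] backoff) {
        long total = 0;
        for (int multiplier : backoff) {
            total += basePauseSeconds * multiplier;
        }
        return total;
    }

    /**
     * 1-based index of the sleep during which the cumulative wait reaches
     * limitSeconds, or -1 if the whole schedule finishes before the limit.
     */
    static int sleepCrossingLimit(long basePauseSeconds, int[] backoff, long limitSeconds) {
        long elapsed = 0;
        for (int i = 0; i < backoff.length; i++) {
            elapsed += basePauseSeconds * backoff[i];
            if (elapsed >= limitSeconds) {
                return i + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(totalWaitSeconds(10, RETRY_BACKOFF));        // prints 1300
        System.out.println(sleepCrossingLimit(10, RETRY_BACKOFF, 600)); // prints 9
    }
}
```

Under these assumptions the full schedule sleeps for 1300 seconds, and the 600-second mark falls inside the ninth sleep, so a task with that timeout is killed before the client ever reports its final failure.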

Also, the places where we get this:

LOG.debug("reloading table servers because: " + t.getMessage());

should be more verbose. In my logs, these are caused by a table not being found, but the only thing I see is "reloading table servers because: tableName".
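
One possible way to get a more informative line, sketched here as an illustration rather than as what the attached patches do, is to include the exception type alongside its message. The exception class below is a hypothetical stand-in whose message is just the table name, mirroring the behaviour described above; none of these names come from HBase.

```java
public class VerboseLogSketch {
    // Hypothetical exception whose message is only the table name.
    static class TableNotFoundSketchException extends Exception {
        TableNotFoundSketchException(String tableName) {
            super(tableName);
        }
    }

    /** The current form: only the message, which here is just the table name. */
    static String terse(Throwable t) {
        return "reloading table servers because: " + t.getMessage();
    }

    /** A more verbose form: the exception's simple class name plus its message. */
    static String verbose(Throwable t) {
        return "reloading table servers because: "
            + t.getClass().getSimpleName() + ": " + t.getMessage();
    }

    public static void main(String[] args) {
        Throwable t = new TableNotFoundSketchException("tableName");
        System.out.println(terse(t));
        System.out.println(verbose(t));
    }
}
```

With the exception type in the line, a reader of the log can tell a missing table apart from any other failure that happens to carry the table name as its message.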

Attachments

hbase-826-v1.patch, 15/Aug/08 16:32, 2 kB, Jean-Daniel Cryans
830-v2-shortertimeouts.patch, 15/Aug/08 18:09, 2 kB, Michael Stack

People

Assignee: Unassigned

Reporter: Jean-Daniel Cryans

Votes: 0

Watchers: 0

Dates

Created: 14/Aug/08 15:28

Updated: 13/Sep/08 23:22

Resolved: 15/Aug/08 18:31