HBase / HBASE-16288

HFile intermediate block level indexes might recurse forever creating multi TB files


    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0, 1.1.6, 0.98.21, 1.2.3, 2.0.0
    • Component/s: HFile
    • Labels:
      None
    • Hadoop Flags:
      Reviewed
    • Release Note:
      A new hfile configuration, "hfile.index.block.min.entries" (default 16), determines the minimum number of entries an hfile index block must contain. The configuration that caps the index block size ("hfile.index.block.max.size") is ignored as long as the block has fewer than "hfile.index.block.min.entries" entries. This ensures that a multi-level index does not build up with too many levels.
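The tunables described in the release note could be set in hbase-site.xml along these lines. The min-entries value shown is the stated default; the max-size value of 131072 bytes (128 KB) is an assumption about the default in your HBase version, so check before relying on it:

```xml
<!-- Sketch: tuning the index-block thresholds described in the release note. -->
<property>
  <name>hfile.index.block.min.entries</name>
  <!-- Per the release note: an index block keeps at least this many entries
       before the max-size limit below is enforced (default 16). -->
  <value>16</value>
</property>
<property>
  <name>hfile.index.block.max.size</name>
  <!-- Maximum index block size in bytes. 131072 (128 KB) is an assumed
       default; verify against your version's hbase-default.xml. -->
  <value>131072</value>
</property>
```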

      Description

      Mighty Josh Elser was debugging an opentsdb cluster where some region directories ended up having 5TB+ files under <regiondir>/.tmp/

      After further debugging and analysis, we were able to reproduce the problem locally: we were recursing forever in this code path for writing intermediate-level indices:

      HFileBlockIndex.java
      if (curInlineChunk != null) {
        while (rootChunk.getRootSize() > maxChunkSize) {
          rootChunk = writeIntermediateLevel(out, rootChunk);
          numLevels += 1;
        }
      }

      The problem happens when we end up with a very large rowKey (larger than "hfile.index.block.max.size") as the first key in a block, all the way up through root-level index building. We then keep writing and building the next level of intermediate indices, each containing that single very large key, so the loop never terminates. This can happen during flush, compaction, or region recovery, causing cluster inoperability due to ever-growing files.
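The failure mode and the kind of guard that fixes it can be sketched as follows. This is a minimal, self-contained illustration and not the actual HBase code: the Chunk class, sizes, and method names are stand-ins, and the min-entries guard mirrors the "hfile.index.block.min.entries" behavior described in the release note.

```java
// Sketch of the runaway-index condition from HBASE-16288 and a guard that
// stops it. All names here are illustrative, not HBase internals.
public class IndexGuardSketch {
  // Stand-in for an index chunk: just the sizes of the entries it holds.
  static class Chunk {
    final int[] entrySizes;
    Chunk(int... entrySizes) { this.entrySizes = entrySizes; }
    int rootSize() { int s = 0; for (int e : entrySizes) s += e; return s; }
    int numEntries() { return entrySizes.length; }
  }

  // "Writing" an intermediate level over a chunk whose first key is huge
  // yields another chunk holding that same huge key, so without a
  // minimum-entry guard the loop below could never shrink the root chunk.
  static Chunk writeIntermediateLevel(Chunk root) {
    return new Chunk(root.entrySizes[0]);
  }

  // Returns the number of index levels built before stopping.
  static int buildRoot(Chunk rootChunk, int maxChunkSize, int minEntries) {
    int numLevels = 1;
    // The extra numEntries() condition is the essence of the fix: stop
    // splitting once the chunk holds no more than the minimum number of
    // entries, even if it is still larger than maxChunkSize.
    while (rootChunk.rootSize() > maxChunkSize
        && rootChunk.numEntries() > minEntries) {
      rootChunk = writeIntermediateLevel(rootChunk);
      numLevels++;
    }
    return numLevels;
  }

  public static void main(String[] args) {
    // One 200 KB key against a 128 KB max chunk size: without the
    // min-entries guard this would recurse forever; with it we stop
    // immediately at a single level.
    Chunk huge = new Chunk(200_000);
    System.out.println(buildRoot(huge, 128 * 1024, 16)); // prints 1
  }
}
```

The guard trades a slightly oversized index block for guaranteed termination, which matches the release note's statement that the max-size limit is ignored below the entry minimum.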

      It seems the issue was also reported earlier, with a temporary workaround:
      https://github.com/OpenTSDB/opentsdb/issues/490

        Attachments

        1. hbase-16288_v1.patch
          3 kB
          Enis Soztutar
        2. hbase-16288_v2.patch
          7 kB
          Enis Soztutar
        3. hbase-16288_v3.patch
          8 kB
          Enis Soztutar
        4. hbase-16288_v4.patch
          11 kB
          Enis Soztutar


              People

              • Assignee: enis (Enis Soztutar)
              • Reporter: enis (Enis Soztutar)
              • Votes: 0
              • Watchers: 13
