HBASE-23679: FileSystem instance leaks due to bulk loads with Kerberos enabled


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.0.0-alpha-1, 2.3.0, 2.1.9, 2.2.4
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed
    • Release Note:
      This fixes an issue with bulk loading on installations with Kerberos enabled and more than a single RegionServer. When multiple RegionServers host Regions of a table being bulk-loaded into, every RegionServer except the one hosting the table's first Region "leaks" one DistributedFileSystem object onto the heap and never frees that memory. Eventually, with enough bulk loads, RegionServers are left with no free heap space and either spend all of their time in JVM GC, lose their ZK session, or crash with an OutOfMemoryError.

      The only mitigation for this issue is to periodically restart RegionServers. All earlier versions of HBase 2.x are subject to this issue (2.0.x, <=2.1.8, <=2.2.3).

    Description

      Spent the better part of a week chasing an issue on HBase 2.x where the number of DistributedFileSystem instances on the heap of a RegionServer would grow unbounded. Looking at multiple heap dumps, it was obvious that we had an immense number of DFS instances cached (in FileSystem$Cache) for the same user, with each instance distinguished only by the Tokens contained in its UGI member (one HBase delegation token and two HDFS delegation tokens; we only do this for bulk loads). The user's clusters eventually experienced a 10x performance degradation as RegionServers spent all of their time in JVM GC (they were unlucky enough not to have RegionServers crash outright, which would have, albeit temporarily, fixed the issue).
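
      To make the caching behavior concrete, here is a minimal sketch (not HBase code; the class name, method name, and user name are invented for illustration) of why FileSystem$Cache accumulates one instance per bulk load: the cache key includes the UGI, UGI equality is Subject-identity based, and each bulk load builds a fresh UGI carrying its delegation tokens, so every load misses the cache and leaves a new FileSystem behind.

      {code:java}
      import java.security.PrivilegedExceptionAction;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.security.UserGroupInformation;

      public class BulkLoadFsCacheLeakSketch {

        // Mimics what each bulk load effectively does: build a fresh UGI (which, in the
        // real code path, carries the delegation tokens for that load) and fetch a
        // FileSystem while running as that UGI.
        static FileSystem openAsBulkLoadUser(Configuration conf) throws Exception {
          UserGroupInformation ugi = UserGroupInformation.createRemoteUser("bulkload-client");
          return ugi.doAs((PrivilegedExceptionAction<FileSystem>) () ->
              // FileSystem.CACHE keys on (scheme, authority, UGI); a brand-new UGI always
              // misses the cache, so a new FileSystem (a DistributedFileSystem when
              // fs.defaultFS points at HDFS) is created and retained.
              FileSystem.get(conf));
        }

        public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          FileSystem a = openAsBulkLoadUser(conf);
          FileSystem b = openAsBulkLoadUser(conf);
          // Same user name, two distinct cached instances; unless each one is closed
          // explicitly, both stay on the heap for the life of the process.
          System.out.println(a == b);                         // false
          System.out.println(a.makeQualified(new Path("/")));
        }
      }
      {code}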

      The problem seems to be two-fold, with the changes from HBASE-15291 being largely the cause. That issue tried to close FileSystem instances which were being leaked; however, it did so by instrumenting the method SecureBulkLoadManager.cleanupBulkLoad(..). There are two big problems with this approach:

      1. It relies on clients to call this method (a client hanging up will leak resources in RegionServers).
      2. This method is only called on the RegionServer hosting the first Region of the table that was bulk-loaded into; all other RegionServers are left to leak resources.

      HBASE-21342 later fixed an issue where FS objects were being closed prematurely by adding reference-counting (which appears to work fine), but it does not address the two issues above. Point #2 makes debugging this issue harder than normal because it does not manifest on a single-node instance.
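
      To illustrate that lifecycle (a paraphrased sketch, not the actual SecureBulkLoadManager code; the class and parameter names are made up), the shape of the current approach is roughly:

      {code:java}
      import java.security.PrivilegedExceptionAction;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.security.UserGroupInformation;

      class LeakyBulkLoadLifecycleSketch {
        private final Configuration conf = new Configuration();

        void secureBulkLoadHFiles(UserGroupInformation bulkLoadUgi) throws Exception {
          // A cached FileSystem is acquired for the bulk-load UGI here ...
          FileSystem fs = bulkLoadUgi.doAs(
              (PrivilegedExceptionAction<FileSystem>) () -> FileSystem.get(conf));
          // ... the staged HFiles are moved into place using fs ...
          // ... but fs is NOT released here; the code relies on cleanupBulkLoad() below.
        }

        void cleanupBulkLoad(UserGroupInformation bulkLoadUgi) throws Exception {
          // Only runs if the client remembers to call it, and it is routed only to the
          // RegionServer hosting the table's first Region; every other RegionServer
          // keeps its cached DistributedFileSystem on the heap forever.
          FileSystem.closeAllForUGI(bulkLoadUgi);
        }
      }
      {code}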

      Through all of this, I (re)learned the dirty history of UGI and how poorly its caching works (HADOOP-6670). I see continuing to lean on the FileSystem$CACHE as potentially dangerous (we've been back here multiple times already). My opinion at this point is that we should cleanly create a new FileSystem instance during the call to SecureBulkLoadManager#secureBulkLoadHFiles(..) and close it in a finally block in that same method. This both simplifies the lifecycle of a FileSystem instance in the bulk-load codepath and helps us avoid future problems with UGI and FS caching. The one downside is that we pay the cost of creating a new FileSystem instance on every bulk load, but I'm of the opinion that we can cross that bridge when we get there.
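
      As a rough sketch of what I'm proposing (the method shape and names are for illustration only, not a patch), the bulk-load call would own the FileSystem for its entire lifetime:

      {code:java}
      import java.security.PrivilegedExceptionAction;

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.security.UserGroupInformation;

      class ScopedBulkLoadFsSketch {
        private final Configuration conf = new Configuration();

        void secureBulkLoadHFiles(UserGroupInformation bulkLoadUgi, Path stagingDir) throws Exception {
          // newInstance() always returns a fresh, uncached FileSystem, so closing it
          // cannot disturb FileSystem objects shared elsewhere in the RegionServer.
          FileSystem fs = bulkLoadUgi.doAs(
              (PrivilegedExceptionAction<FileSystem>) () -> FileSystem.newInstance(conf));
          try {
            // ... move the staged HFiles under stagingDir into the region directories ...
          } finally {
            // The instance's lifecycle ends with the call; nothing lingers in the cache
            // and nothing depends on the client issuing a separate cleanup RPC.
            fs.close();
          }
        }
      }
      {code}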

      Thanks to jdcryans and busbey for their help along the way.


          People

            Assignee: elserj Josh Elser
            Reporter: elserj Josh Elser
            Votes: 0
            Watchers: 5

            Dates

              Created:
              Updated:
              Resolved: