HBASE-21081

Trim Master memory usage, part 2


Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 2.0.1
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

    Description

      A good one found by misha@cloudera.com spelunking with jxray on a 700-node cluster with 500k+ regions. For some reason, there are >1M instances of each column family name when there should be only 500k (by rights there should be only as many instances as there are column families in the table, rather than repeating these bytes per region – TODO).
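
      For the TODO, the natural shape of a fix would be interning: hand out one canonical byte[] per distinct family name instead of a fresh copy per region. A minimal sketch, assuming a simple map-based interner (a hypothetical helper, not existing HBase API):

      import java.util.Map;
      import java.util.TreeMap;
      import org.apache.hadoop.hbase.util.Bytes;

      // Hypothetical sketch only; the name and shape are made up for illustration.
      public class FamilyNameInterner {
        // Canonical copies keyed by content, via a byte[]-aware Comparator.
        private final Map<byte[], byte[]> canonical = new TreeMap<>(Bytes.BYTES_COMPARATOR);

        // Returns the single canonical byte[] for this family name's content.
        public synchronized byte[] intern(byte[] familyName) {
          byte[] existing = canonical.get(familyName);
          if (existing != null) {
            return existing;
          }
          canonical.put(familyName, familyName);
          return familyName;
        }
      }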

      The code below, added by HBASE-19496, looks suspicious. It is making HashMaps with byte[]s for keys, but byte[] doesn't implement content-based hashCode/equals (it inherits the identity versions from Object). Usually when we have byte[]s for keys, we use a sorted map (e.g. ConcurrentSkipListMap) and pass a Comparator in the constructor that knows how to compare byte[]s.

      .setStoreSequenceIds(regionLoadPB.getStoreCompleteSequenceIdList().stream()
        .collect(Collectors.toMap(
          (ClusterStatusProtos.StoreSequenceId s) -> s.getFamilyName().toByteArray(),
            ClusterStatusProtos.StoreSequenceId::getSequenceId)))
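
      To make the hashCode/equals point concrete, here is a minimal sketch (it leans on HBase's Bytes utility, the same Bytes.BYTES_COMPARATOR the patch below reaches for): two content-identical byte[] keys land as two entries in a HashMap but collapse to one in a TreeMap.

      import java.util.HashMap;
      import java.util.Map;
      import java.util.TreeMap;
      import org.apache.hadoop.hbase.util.Bytes;

      public class ByteArrayKeyDemo {
        public static void main(String[] args) {
          // byte[] inherits identity-based hashCode()/equals() from Object,
          // so two arrays with the same content are distinct HashMap keys.
          byte[] a = Bytes.toBytes("cf");
          byte[] b = Bytes.toBytes("cf");

          Map<byte[], Long> hashMap = new HashMap<>();
          hashMap.put(a, 1L);
          hashMap.put(b, 2L);
          System.out.println(hashMap.size()); // prints 2: duplicated key

          // A TreeMap with a content-aware Comparator dedupes as expected.
          Map<byte[], Long> treeMap = new TreeMap<>(Bytes.BYTES_COMPARATOR);
          treeMap.put(a, 1L);
          treeMap.put(b, 2L);
          System.out.println(treeMap.size()); // prints 1
        }
      }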
      

      But looking back through the code, even if it is a HashMap, it should only have one item in the Map. Where's the other coming from?

      Here's how to get a TreeMap w/ a Comparator into the mix... but need to check whether this fixes the issue (I don't think so).

      @@ -66,12 +70,13 @@ public final class RegionMetricsBuilder {
               .setStoreCount(regionLoadPB.getStores())
               .setStoreFileCount(regionLoadPB.getStorefiles())
               .setStoreFileSize(new Size(regionLoadPB.getStorefileSizeMB(), Size.Unit.MEGABYTE))
      -        .setStoreSequenceIds(regionLoadPB.getStoreCompleteSequenceIdList().stream()
      -          .collect(Collectors.toMap(
      -            (ClusterStatusProtos.StoreSequenceId s) -> s.getFamilyName().toByteArray(),
      -              ClusterStatusProtos.StoreSequenceId::getSequenceId)))
      +        .setStoreSequenceIds(regionLoadPB.getStoreCompleteSequenceIdList().stream().collect(
      +            Collectors.toMap(s -> s.getFamilyName().toByteArray(),
      +                ClusterStatusProtos.StoreSequenceId::getSequenceId,
      +                (k1, k2) -> k1, // Should never happen; only one completed sequenceid per Store
      +                () -> new TreeMap<byte [], Long>(Bytes.BYTES_COMPARATOR))))
               .setUncompressedStoreFileSize(
      -          new Size(regionLoadPB.getStoreUncompressedSizeMB(),Size.Unit.MEGABYTE))
      +            new Size(regionLoadPB.getStoreUncompressedSizeMB(), Size.Unit.MEGABYTE))
               .build();
         }
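
      For reference, the four-argument Collectors.toMap overload used above takes a merge function plus a map supplier; the supplier is what lets us swap in the Comparator-backed TreeMap. A self-contained sketch, with made-up family names and sequence ids standing in for the protobuf list:

      import java.util.Map;
      import java.util.TreeMap;
      import java.util.stream.Collectors;
      import java.util.stream.Stream;
      import org.apache.hadoop.hbase.util.Bytes;

      public class ToTreeMapDemo {
        public static void main(String[] args) {
          // Stand-in for the StoreSequenceId list: family name -> sequence id.
          Map<byte[], Long> storeSequenceIds = Stream.of("info", "extra")
              .collect(Collectors.toMap(
                  Bytes::toBytes,    // keyMapper: family name as byte[]
                  family -> 100L,    // valueMapper: made-up sequence id
                  (v1, v2) -> v1,    // mergeFunction: should never fire
                  () -> new TreeMap<byte[], Long>(Bytes.BYTES_COMPARATOR)));
          System.out.println(storeSequenceIds.size()); // prints 2
        }
      }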
      

People

    Assignee: Michael Stack (stack)
    Reporter: Michael Stack (stack)
    Votes: 0
    Watchers: 4
