HBASE-26850

Optimize the implementation of LRUCache in LRUDictionary


Details

    • Type: Improvement
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Affects Versions: 1.7.1, 3.0.0-alpha-2, 2.4.11
    • Fix Versions: None
    • Component: wal
    • Labels: None

    Description

      This issue is to unify the behavior of the read and write paths.

      During my research on HBASE-26849, I found that there seems to be something wrong with the implementation of LRUDictionary.

      It uses an array (indexToNode) to store the cached entries, and a HashMap (nodeToIndex) to map each entry's contents back to its array index.
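
      In other words (a declaration-level sketch, not the actual class; the real backing store also maintains a doubly linked list of nodes to track LRU order, and initSize is an assumed capacity parameter):

  // Simplified view of the backing store's state: the array maps
  // index -> node, the HashMap maps node contents -> index.
  Node[] indexToNode = new Node[initSize];
  Map<Node, Short> nodeToIndex = new HashMap<>(initSize);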

      On the write path:

      CompressedKvEncoder#write
      ->
      LRUDictionary#findEntry
      ->
      LRUDictionary#addEntry
      

      And on the read path:

      CompressedKvDecoder#readIntoArray
      ->
      LRUDictionary#addEntry
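
      For reference, here is a hedged paraphrase of the decoder side (a simplified reconstruction, not the verbatim HBase source: the stream framing and the readShortIndex helper are assumptions, and java.io plus the HBase Dictionary interface are taken as given). The key point is that, unlike the write path, there is no findEntry-style lookup before addEntry:

  // Sketch of CompressedKvDecoder#readIntoArray (simplified paraphrase).
  private static byte[] readIntoArray(InputStream in, int length, Dictionary dict)
      throws IOException {
    byte status = (byte) in.read();
    if (status == Dictionary.NOT_IN_DICTIONARY) {
      byte[] arr = new byte[length];
      int off = 0;
      while (off < length) { // read the literal bytes fully
        int n = in.read(arr, off, length - off);
        if (n < 0) {
          throw new EOFException();
        }
        off += n;
      }
      dict.addEntry(arr, 0, length); // always adds, even for duplicate contents
      return arr;
    }
    short dictIdx = readShortIndex(status, in); // hypothetical helper, below
    return dict.getEntry(dictIdx);              // reuse the cached entry
  }

  // Hypothetical helper: rebuilds the dictionary index from the status byte
  // and the next stream byte (the exact framing is an assumption).
  private static short readShortIndex(byte status, InputStream in) throws IOException {
    return (short) (((status & 0xff) << 8) | (in.read() & 0xff));
  }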
      

      Then we can look at the logic in findEntry:

        @Override
        public short findEntry(byte[] data, int offset, int length) {
          short ret = backingStore.findIdx(data, offset, length);
          if (ret == NOT_IN_DICTIONARY) {
            addEntry(data, offset, length);
          }
          return ret;
        }
      
          private short findIdx(byte[] array, int offset, int length) {
            Short s;
            final Node comparisonNode = new Node();
            comparisonNode.setContents(array, offset, length);
            if ((s = nodeToIndex.get(comparisonNode)) != null) {
              moveToHead(indexToNode[s]);
              return s;
            } else {
              return -1;
            }
          }
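
      This lookup only works because Node implements value equality over the referenced byte range, so a freshly built comparisonNode with equal contents hashes and compares equal to a stored node. A minimal sketch of that contract (a hypothetical simplification; the real Node also carries prev/next links for the LRU list):

  // Sketch of contents-based equality (simplified; not the real Node).
  static class Node {
    byte[] container;
    int offset;
    int length;

    void setContents(byte[] data, int offset, int length) {
      this.container = data;
      this.offset = offset;
      this.length = length;
    }

    @Override
    public int hashCode() {
      int h = 1;
      for (int i = offset; i < offset + length; i++) {
        h = 31 * h + container[i]; // hash only the referenced range
      }
      return h;
    }

    @Override
    public boolean equals(Object other) {
      if (!(other instanceof Node)) {
        return false;
      }
      Node that = (Node) other;
      if (length != that.length) {
        return false;
      }
      for (int i = 0; i < length; i++) { // compare the referenced ranges
        if (container[offset + i] != that.container[that.offset + i]) {
          return false;
        }
      }
      return true;
    }
  }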
      

      First it tries to find an equal node in the cache; if one exists, it is reused directly.

      The problem is that when the same contents appear more than once, the read and write paths behave inconsistently:
      on the write path only one node is stored and then reused, while on the read path a separate copy is stored in the array for each occurrence,
      yet the map holds only a single mapping from those contents to an array index.
      So, on the read path, evicting the first of the duplicate nodes removes that mapping from the map, and when a later duplicate node is evicted it can no longer be found in the map, which can lead to an NPE.
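
      A minimal, self-contained model of the failure (this is deliberately not the real LRUDictionary: it uses a tiny capacity and simple FIFO slot reuse instead of the real linked-list LRU, and all names are illustrative). It shows how duplicate adds on the read path leave the map one mapping short and blow up at eviction time:

  import java.util.Arrays;
  import java.util.HashMap;
  import java.util.Map;

  public class LruDictionaryBugDemo {

    // Contents-based equality, mirroring the behavior sketched above.
    static final class Node {
      final byte[] contents;
      Node(byte[] contents) { this.contents = contents; }
      @Override public boolean equals(Object o) {
        return o instanceof Node && Arrays.equals(contents, ((Node) o).contents);
      }
      @Override public int hashCode() { return Arrays.hashCode(contents); }
    }

    static final int CAPACITY = 2; // tiny capacity to force eviction quickly
    static final Node[] indexToNode = new Node[CAPACITY];
    static final Map<Node, Short> nodeToIndex = new HashMap<>();
    static int currSize = 0;
    static int evictCursor = 0; // FIFO slot reuse stands in for real LRU order

    // Shaped like the dictionary's put(): fill free slots first, then evict
    // and reuse the oldest slot.
    static short addEntry(byte[] data) {
      if (currSize < CAPACITY) {
        short idx = (short) currSize++;
        indexToNode[idx] = new Node(data);
        // A duplicate add overwrites the previous mapping: two array slots,
        // but only one map entry.
        nodeToIndex.put(indexToNode[idx], idx);
        return idx;
      }
      Node victim = indexToNode[evictCursor];
      // Unboxing NPE here once an earlier eviction of equal contents has
      // already removed the single shared mapping.
      short s = nodeToIndex.remove(victim);
      Node fresh = new Node(data);
      indexToNode[evictCursor] = fresh;
      nodeToIndex.put(fresh, s);
      evictCursor = (evictCursor + 1) % CAPACITY;
      return s;
    }

    public static void main(String[] args) {
      addEntry("A".getBytes()); // slot 0, map {A -> 0}
      addEntry("A".getBytes()); // read path stores a copy: slot 1, map {A -> 1}
      addEntry("B".getBytes()); // evicts slot 0: remove(A) deletes the only A mapping
      addEntry("C".getBytes()); // evicts slot 1 (also A): remove(A) returns null -> NPE
    }
  }

      Note that in this model the third add already corrupts the state even before the NPE: B is stored in slot 0 but mapped to index 1, which is the same read/write inconsistency described above.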

People

    Assignee: tianhang tang
    Reporter: tianhang tang