[AVRO-3751] FastReaderBuilder in multithread lead to infinite loop also blocking other threads - ASF JIRA

XML

Word

Printable

JSON

Details

Type: Bug
Status: Resolved
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 1.12.0
Component/s: java
Labels:
- pull-request-available

Description

In Java implementation we encountered a case where initiating an RecordReader lead to application blocking due to concurrent requests.

I analysed thread-dumps for our application and in the very initial timeline of the issue - there were in total 9 threads doing some operation on FastReaderBuilder.

Category-1 : 6 threads - Blocked on object lock as reap is a synchronised operation

java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.avro.util.WeakIdentityHashMap.reap(WeakIdentityHashMap.java:161)
 - waiting to lock <0x00007f4f4a239ab0> (a org.apache.avro.util.WeakIdentityHashMap)
at org.apache.avro.util.WeakIdentityHashMap.get(WeakIdentityHashMap.java:115)
at java.util.Map.computeIfAbsent(java.base@11.0.18/Map.java:1001)
at org.apache.avro.io.FastReaderBuilder.getRecordReaderFromCache(FastReaderBuilder.java:246)

Category-2: 2 threads - Runnable trying to insert entry into backingStore HashMap

java.lang.Thread.State: RUNNABLE
at java.util.HashMap$TreeNode.balanceInsertion(java.base@11.0.18/HashMap.java:2304)
at java.util.HashMap$TreeNode.treeify(java.base@11.0.18/HashMap.java:2010)
at java.util.HashMap$TreeNode.split(java.base@11.0.18/HashMap.java:2245)
at java.util.HashMap.resize(java.base@11.0.18/HashMap.java:710)
at java.util.HashMap.putVal(java.base@11.0.18/HashMap.java:659)
at java.util.HashMap.put(java.base@11.0.18/HashMap.java:608)
at org.apache.avro.util.WeakIdentityHashMap.put(WeakIdentityHashMap.java:122)
at java.util.Map.computeIfAbsent(java.base@11.0.18/Map.java:1004)
at org.apache.avro.io.FastReaderBuilder.getRecordReaderFromCache(FastReaderBuilder.java:246)

Category-3: 1 thread - Runnable trying to remove an entry from backingStore HashMap

java.lang.Thread.State: RUNNABLE
at java.util.HashMap$TreeNode.root(java.base@11.0.18/HashMap.java:1889)
at java.util.HashMap$TreeNode.getTreeNode(java.base@11.0.18/HashMap.java:1954)
at java.util.HashMap.removeNode(java.base@11.0.18/HashMap.java:820)
at java.util.HashMap.remove(java.base@11.0.18/HashMap.java:795)
at org.apache.avro.util.WeakIdentityHashMap.reap(WeakIdentityHashMap.java:165)
 - locked <0x00007f4f4a239ab0> (a org.apache.avro.util.WeakIdentityHashMap)
at org.apache.avro.util.WeakIdentityHashMap.get(WeakIdentityHashMap.java:115)
at java.util.Map.computeIfAbsent(java.base@11.0.18/Map.java:1001)
at org.apache.avro.io.FastReaderBuilder.getRecordReaderFromCache(FastReaderBuilder.java:246)

Over time more and more thread started to enter in the BLOCKED state, but the three RUNNABLE threads remained constant.

I think the problem is backingStore being a non concurrent HashMap (a relevant piece on it: https://stackoverflow.com/questions/35534906/java-hashmap-getobject-infinite-loop)

Found a similar old issue report in GenericDatumReader: https://issues.apache.org/jira/browse/AVRO-3531

Attachments

Issue Links

links to

GitHub Pull Request #2436

Activity

People

Assignee:: Christophe Le Saec

Reporter:: Bharat Jindal

Votes:: 0 Vote for this issue

Watchers:: 4 Start watching this issue

Dates

Created:: 04/May/23 09:10

Updated:: 17/Aug/23 19:23

Resolved:: 17/Aug/23 07:06

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

20m