[HDFS-15402] Requesting http jmx metrics leads to too much CLOSE-WAIT on datanode - ASF JIRA

Details

Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 3.1.3
Fix Version/s: None
Component/s: metrics
Labels: None

Description

We periodically access http://127.0.0.1:50075/jmx to collect datanode metrics. Over time, so many sockets accumulate in the CLOSE-WAIT state that normal webhdfs requests fail.

$ ss -ant|grep 127.0.0.1:50075 |grep CLOSE-WAIT |head -10
CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:37296 
CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:26499 
CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:47470 
CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:42852 
CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:40281
$ ss -ant|grep 127.0.0.1:50075 |grep CLOSE-WAIT | wc -l 
6729
$ lsof -i:37296
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
java 101015 hdfs 3044u IPv4 271157177 0t0 TCP localhost:50075->localhost:37296 (CLOSE_WAIT)

The pid 101015 is the datanode's process id.
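
To track the count over time, the ss pipeline above can be wrapped in a small helper. This is only a sketch and not part of the original report: count_close_wait is a hypothetical name, and it assumes the `ss -ant` column layout shown above (state in the first column, local address in the fourth). Unlike the plain grep, it matches the address only in the local-address column, so client-side sockets that merely connect to port 50075 are not counted.

```shell
# Hypothetical helper: count sockets in CLOSE-WAIT whose LOCAL address
# equals the given addr:port. Reads `ss -ant` style output from stdin,
# so it can also be run against saved output.
count_close_wait() {
  local addr="$1"
  # Column 1 is the socket state, column 4 is the local address:port.
  # "n + 0" makes awk print 0 when no line matched.
  awk -v addr="$addr" \
    '$1 == "CLOSE-WAIT" && $4 == addr { n++ } END { print n + 0 }'
}
```

For example, `ss -ant | count_close_wait 127.0.0.1:50075` prints a single number. Running it at intervals while the metrics script is active, and again after stopping it, should show whether the count grows only while /jmx is being polled.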

I run cdh6.1.1 and apache-hadoop-3.1.3 in production, and both show the same issue. When the metrics-retrieving script is stopped, the number of CLOSE-WAIT sockets stops increasing.

Version apache-hadoop-2.9.2 does not have this issue with the same metrics-retrieving script.

People

Assignee: Unassigned
Reporter: Sean Chow
Votes: 0
Watchers: 3

Dates

Created: 09/Jun/20 14:00
Updated: 09/Dec/20 03:12