  1. Hadoop HDFS
  2. HDFS-15402

Requesting http jmx metrics leads to too much CLOSE-WAIT on datanode

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 3.1.3
    • Fix Version/s: None
    • Component/s: metrics
    • Labels: None

    Description

      We periodically request http://127.0.0.1:50075/jmx to collect datanode metrics. However, a large number of sockets on the datanode accumulate in the CLOSE-WAIT state, which causes normal webhdfs requests to fail.

       

      $ ss -ant|grep 127.0.0.1:50075 |grep CLOSE-WAIT |head -10
      CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:37296 
      CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:26499 
      CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:47470 
      CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:42852 
      CLOSE-WAIT 122 0 127.0.0.1:50075 127.0.0.1:40281
      $ ss -ant|grep 127.0.0.1:50075 |grep CLOSE-WAIT | wc -l 
      6729
      $ lsof -i:37296
      COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
      java 101015 hdfs 3044u IPv4 271157177 0t0 TCP localhost:50075->localhost:37296 (CLOSE_WAIT)
      

       

      The pid 101015 is the datanode's process id.
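      To track the CLOSE-WAIT count over time instead of running the `ss` pipeline by hand, something like the following could be used. This is only a minimal sketch: the function names `count_close_wait` and `sample_close_wait` are made up for illustration, and it assumes `ss -ant` prints the columns in the order shown above (State, Recv-Q, Send-Q, local address, peer address).

```python
import subprocess


def count_close_wait(ss_output: str, local_addr: str = "127.0.0.1:50075") -> int:
    """Count CLOSE-WAIT sockets whose local address equals local_addr.

    ss_output is the raw text printed by `ss -ant`.
    """
    count = 0
    for line in ss_output.splitlines():
        fields = line.split()
        # Assumed column order: State Recv-Q Send-Q Local Peer
        if len(fields) >= 5 and fields[0] == "CLOSE-WAIT" and fields[3] == local_addr:
            count += 1
    return count


def sample_close_wait(local_addr: str = "127.0.0.1:50075") -> int:
    """Run `ss -ant` once and return the current CLOSE-WAIT count (hypothetical helper)."""
    result = subprocess.run(
        ["ss", "-ant"], capture_output=True, text=True, check=True
    )
    return count_close_wait(result.stdout, local_addr)
```

      Calling `sample_close_wait()` at a fixed interval while the metric script runs, and again after it stops, should show whether the count follows the script's activity.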

      I run cdh6.1.1 and apache-hadoop-3.1.3 in production, and both show the same issue. When the metric-retrieving script is stopped, the number of CLOSE-WAIT sockets stops increasing.

      apache-hadoop-2.9.2 does not have this issue with the same metric-retrieving script.
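      The metric script itself is not attached, so the following is only a hypothetical stand-in that could be used to try to reproduce the behaviour. It assumes the `/jmx` endpoint returns a JSON document with a top-level `beans` array; the names `fetch_jmx` and `poll` and the timeout value are illustrative. The client closes its side of every connection, so any CLOSE-WAIT sockets left on port 50075 would be on the datanode side.

```python
import json
import time
import urllib.request

JMX_URL = "http://127.0.0.1:50075/jmx"  # endpoint taken from the report


def fetch_jmx(url=JMX_URL, timeout=5.0, opener=urllib.request.urlopen):
    """Fetch the JMX JSON once; the with-block closes the client connection."""
    with opener(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def poll(n, interval, fetch=fetch_jmx, sleep=time.sleep):
    """Fetch n times, sleeping `interval` seconds after each request.

    Returns the number of beans seen in each response (assumes a "beans" key).
    """
    sizes = []
    for _ in range(n):
        doc = fetch()
        sizes.append(len(doc.get("beans", [])))
        sleep(interval)
    return sizes
```

      For example, `poll(100, 1.0)` issues one request per second for 100 requests; comparing the CLOSE-WAIT count before and after on 3.1.3 and on 2.9.2 would show whether this simple client is enough to trigger the problem.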

       

          People

            Assignee: Unassigned
            Reporter: Sean Chow (seanlook)
            Votes: 0
            Watchers: 3
