  1. Hadoop Common
  2. HADOOP-18800

Bad ipc.client.connection.idle-scan-interval.ms causes resource leaks

Details

    Description

      When ipc.client.connection.idle-scan-interval.ms is set to a bad value (e.g. a negative value), the Hadoop Server fails to schedule the idle connection scan task. Idle connections are then never closed, which leaks resources.

      Buggy code:

      private void scheduleIdleScanTask() {
        ...
        TimerTask idleScanTask = new TimerTask(){
          @Override
          public void run() {
            ...
            try {
              closeIdle(false);
            } finally {
              // explicitly reschedule so next execution occurs relative
              // to the end of this scan, not the beginning
              scheduleIdleScanTask();
            }
          }
        };
        idleScanTimer.schedule(idleScanTask, idleScanInterval);   // <--- idleScanInterval is a negative value
      }
      

      java.util.Timer#schedule throws an IllegalArgumentException when the delay is negative, so idleScanTask is never scheduled. Because the idle scan never runs, idle connections are not closed and their resources leak.

      public void schedule(TimerTask task, long delay) {
          if (delay < 0)
              throw new IllegalArgumentException("Negative delay.");
          sched(task, System.currentTimeMillis()+delay, 0);        // <-- the task will not be scheduled when delay is negative
      }
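
The rejection can be checked in isolation with the JDK alone. The following is a minimal sketch, not Hadoop code: the class name NegativeDelayDemo, the helper scheduleRejects, and the timer name are made up for illustration. It only shows that Timer#schedule refuses a negative delay, which is the condition the Server hits.

```java
import java.util.Timer;
import java.util.TimerTask;

public class NegativeDelayDemo {

  /** Returns true if Timer#schedule rejects the delay with IllegalArgumentException. */
  static boolean scheduleRejects(long delayMs) {
    // Daemon timer so the JVM can exit even if a task is still pending.
    Timer timer = new Timer("idle-scan-demo", true);
    try {
      timer.schedule(new TimerTask() {
        @Override
        public void run() {
          // No-op; stands in for the idle connection scan.
        }
      }, delayMs);
      return false; // the task was accepted
    } catch (IllegalArgumentException e) {
      return true;  // negative delay: the task was never queued
    } finally {
      timer.cancel();
    }
  }

  public static void main(String[] args) {
    System.out.println("delay=-1 rejected: " + scheduleRejects(-1L));
    System.out.println("delay=10 rejected: " + scheduleRejects(10L));
  }
}
```

With a delay of -1 the call is rejected and nothing is queued; with a non-negative delay it is accepted.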
      

      How to reproduce:

      We can use the test org.apache.hadoop.ipc.TestIPC#testSocketLeak to check the resource leaks.
      (1) Set ipc.client.connection.idle-scan-interval.ms to -1;
      (2) Run test org.apache.hadoop.ipc.TestIPC#testSocketLeak
      (3) You will see the following message (note that the number of leaked descriptors can vary from run to run):

      java.lang.AssertionError: Leaked 142 file descriptors
              at org.junit.Assert.fail(Assert.java:89)
              at org.junit.Assert.assertTrue(Assert.java:42)
              at org.apache.hadoop.ipc.TestIPC.testSocketLeak(TestIPC.java:1155)
              at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
              at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.base/java.lang.reflect.Method.invoke(Method.java:566)
              at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
              at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
              at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
              at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
              at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
              at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
              at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
              at java.base/java.lang.Thread.run(Thread.java:829)
      

      You can use the attached reproduce.sh to reproduce the bug.

      We are happy to provide a patch if this issue is confirmed. 
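
One possible direction for such a patch is to validate the interval before it reaches the timer. The sketch below is only an illustration under stated assumptions, not the actual Hadoop change: the class IdleScanIntervalGuard, the method sanitizeInterval, and the fallback value of 10000 ms are hypothetical names and numbers chosen for this example.

```java
public class IdleScanIntervalGuard {

  // Hypothetical fallback used when the configured value is invalid.
  // The real default should come from Hadoop's configuration keys.
  static final long FALLBACK_INTERVAL_MS = 10_000L;

  /**
   * Returns a delay that java.util.Timer#schedule will accept.
   * Negative values are replaced by the fallback; other values pass through.
   */
  static long sanitizeInterval(long configuredMs) {
    if (configuredMs < 0) {
      System.err.println("Invalid idle scan interval " + configuredMs
          + " ms; falling back to " + FALLBACK_INTERVAL_MS + " ms");
      return FALLBACK_INTERVAL_MS;
    }
    return configuredMs;
  }

  public static void main(String[] args) {
    System.out.println(sanitizeInterval(-1L));
    System.out.println(sanitizeInterval(500L));
  }
}
```

An alternative design would be to fail fast at Server startup with a clear error message instead of silently substituting a value; which behavior is preferable is a decision for the maintainers.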

      Attachments

        1. reproduce.sh
          0.7 kB
          ConfX

            People

              Unassigned Unassigned
              FuzzingTeam ConfX