  1. Hadoop YARN
  2. YARN-10334

TestDistributedShell leaks resources on timeout/failure


Details

    • Reviewed

    Description

      TestDistributedShell times out on trunk. I found that the application and its containers stay running in the background long after the unit test has failed.
      This causes other test cases to fail and produces several false-positive failures because:

      • Ports stay busy, so other test cases fail to launch.
      • Unit tests fail because of memory restrictions.

      Although the unit test is already broken on trunk, we do not want its failures to spill over into other unit tests.
      TestDistributedShell needs to be revisited to make sure that all YarnClients and YarnApplications are closed properly at the end of each unit test (including on exceptions and timeouts).
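
      One way to approach the cleanup described above is to register every closeable resource as soon as it is created and close all of them from the test's teardown path. The sketch below is only an illustration of that pattern using plain `AutoCloseable`; the class name `TestResourceTracker` and its methods are hypothetical and are not part of Hadoop or of the actual fix for this issue.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical helper (not a Hadoop class): remembers every resource a test
 * opens so that teardown can close all of them, even after a failure.
 */
public class TestResourceTracker implements AutoCloseable {
  private final Deque<AutoCloseable> resources = new ArrayDeque<>();

  /** Register a resource right after creating it, before anything else can throw. */
  public synchronized <T extends AutoCloseable> T track(T resource) {
    resources.push(resource);
    return resource;
  }

  /** Number of resources that are registered and not yet closed. */
  public synchronized int openCount() {
    return resources.size();
  }

  /** Close in reverse order of registration; keep going if one close fails. */
  @Override
  public synchronized void close() {
    RuntimeException first = null;
    while (!resources.isEmpty()) {
      try {
        resources.pop().close();
      } catch (Exception e) {
        if (first == null) {
          first = new RuntimeException("failed to close test resource", e);
        } else {
          first.addSuppressed(e);
        }
      }
    }
    if (first != null) {
      throw first;
    }
  }
}
```

      In a test class, `close()` would be called from a teardown hook or a `finally` block. Whether teardown hooks actually run after a timeout depends on how the timeout is applied, so that needs to be checked separately for this test.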

      Steps to reproduce:

      mvn test -Dtest=TestDistributedShell#testDSShellWithOpportunisticContainers
      
      ## this will time out with:
      [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 90.234 s <<< FAILURE! - in org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell
      [ERROR] testDSShellWithOpportunisticContainers(org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell)  Time elapsed: 90.018 s  <<< ERROR!
      org.junit.runners.model.TestTimedOutException: test timed out after 90000 milliseconds
              at java.lang.Thread.sleep(Native Method)
              at org.apache.hadoop.yarn.applications.distributedshell.Client.monitorApplication(Client.java:1117)
              at org.apache.hadoop.yarn.applications.distributedshell.Client.run(Client.java:1089)
              at org.apache.hadoop.yarn.applications.distributedshell.TestDistributedShell.testDSShellWithOpportunisticContainers(TestDistributedShell.java:1438)
              at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
              at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:498)
              at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
              at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
              at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
              at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
              at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
              at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
              at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:298)
              at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
              at java.util.concurrent.FutureTask.run(FutureTask.java:266)
              at java.lang.Thread.run(Thread.java:748)
      
      [INFO] 
      [INFO] Results:
      [INFO] 
      [ERROR] Errors: 
      [ERROR]   TestDistributedShell.testDSShellWithOpportunisticContainers:1438 » TestTimedOut
      [INFO] 
      [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0
      

      Using the ps command, you can see that the YARN processes are still running in the background:

      /bin/bash -c $JRE_HOME/bin/java -Xmx512m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_type OPPORTUNISTIC --container_memory 128 --container_vcores 1 --num_containers 2 --priority 0 --appname DistributedShell --homedir file:/Users/ahussein 1>$WORK_DIR8/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/TestDistributedShell/TestDistributedShell-logDir-nm-0_0/application_1593554710896_0001/container_1593554710896_0001_01_000001/AppMaster.stdout 2>$WORK_DIR8/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/target/TestDistributedShell/TestDistributedShell-logDir-nm-0_0/application_1593554710896_0001/container_1593554710896_0001_01_000001/AppMaster.stderr
      
      
      $JRE_HOME/bin/java -Xmx512m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_type OPPORTUNISTIC --container_memory 128 --container_vcores 1 --num_containers 2 --priority 0 --appname DistributedShell --homedir file:/Users/ahussein
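
      The manual ps check above could also be automated after a test run. The following is a rough sketch, assuming Java 9 or later for `ProcessHandle` (the stack trace above appears to come from Java 8, where this API is unavailable). The class name `LeakedAppMasterFinder` is hypothetical; the match string is the ApplicationMaster main class from the ps output. The command line may not be readable on every platform or for every process, in which case that process is simply skipped.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: reports processes whose command line names the DistributedShell AM. */
public class LeakedAppMasterFinder {
  // Main class name as it appears in the ps output of the leaked processes.
  public static final String AM_MAIN_CLASS =
      "org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster";

  /** True if the given command line mentions the ApplicationMaster main class. */
  public static boolean isAppMaster(String commandLine) {
    return commandLine != null && commandLine.contains(AM_MAIN_CLASS);
  }

  /** Lists other processes visible to this JVM that look like a leaked ApplicationMaster. */
  public static List<ProcessHandle> findLeaked() {
    long self = ProcessHandle.current().pid();
    return ProcessHandle.allProcesses()
        .filter(p -> p.pid() != self)
        .filter(p -> isAppMaster(p.info().commandLine().orElse(null)))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    for (ProcessHandle p : findLeaked()) {
      // Only report here; p.destroyForcibly() could be used to kill the process.
      System.out.println("leaked pid " + p.pid() + ": "
          + p.info().commandLine().orElse("?"));
    }
  }
}
```

      Note that the `/bin/bash -c` wrapper process also carries the class name in its command line, so it would be reported alongside the Java process.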
      

            People

              ahussein Ahmed Hussein
              ahussein Ahmed Hussein
                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0h
                  Logged: 1h 10m