Uploaded image for project: 'Hadoop HDFS'
  1. Hadoop HDFS
  2. HDFS-17278

Detect order dependent flakiness in TestViewfsWithNfs3.java under hadoop-hdfs-nfs module

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Minor
    • Resolution: Fixed
    • 3.4.0
    • 3.4.0
    • nfs, test
    • openjdk version "17.0.9"
      Apache Maven 3.9.5

    • Reviewed

    Description

      The order dependent flakiness was detected if the test class TestDFSClientCache.java runs before TestRpcProgramNfs3.java.

      The error message looks like below:

      [ERROR] Failures: 
      [ERROR]   TestRpcProgramNfs3.testAccess:279 Incorrect return code expected:<0> but was:<13>
      [ERROR]   TestRpcProgramNfs3.testCommit:764 Incorrect return code: expected:<13> but was:<5>
      [ERROR]   TestRpcProgramNfs3.testCreate:493 Incorrect return code: expected:<13> but was:<5>
      [ERROR]   TestRpcProgramNfs3.testEncryptedReadWrite:359->createFileUsingNfs:393 Incorrect response:  expected:<null> but was:<org.apache.hadoop.nfs.nfs3.response.WRITE3Response@42752a9b>
      [ERROR]   TestRpcProgramNfs3.testFsinfo:714 Incorrect return code: expected:<13> but was:<5>
      [ERROR]   TestRpcProgramNfs3.testFsstat:696 Incorrect return code: expected:<0> but was:<13>
      [ERROR]   TestRpcProgramNfs3.testGetattr:205 Incorrect return code expected:<0> but was:<13>
      [ERROR]   TestRpcProgramNfs3.testLookup:249 Incorrect return code expected:<13> but was:<5>
      [ERROR]   TestRpcProgramNfs3.testMkdir:517 Incorrect return code: expected:<13> but was:<5>
      [ERROR]   TestRpcProgramNfs3.testPathconf:738 Incorrect return code: expected:<13> but was:<5>
      [ERROR]   TestRpcProgramNfs3.testRead:341 Incorrect return code: expected:<0> but was:<13>
      [ERROR]   TestRpcProgramNfs3.testReaddir:642 Incorrect return code: expected:<13> but was:<5>
      [ERROR]   TestRpcProgramNfs3.testReaddirplus:666 Incorrect return code: expected:<13> but was:<5>
      [ERROR]   TestRpcProgramNfs3.testReadlink:297 Incorrect return code: expected:<0> but was:<5>
      [ERROR]   TestRpcProgramNfs3.testRemove:570 Incorrect return code: expected:<13> but was:<5>
      [ERROR]   TestRpcProgramNfs3.testRename:618 Incorrect return code: expected:<13> but was:<5>
      [ERROR]   TestRpcProgramNfs3.testRmdir:594 Incorrect return code: expected:<13> but was:<5>
      [ERROR]   TestRpcProgramNfs3.testSetattr:225 Incorrect return code expected:<13> but was:<5>
      [ERROR]   TestRpcProgramNfs3.testSymlink:546 Incorrect return code: expected:<13> but was:<5>
      [ERROR]   TestRpcProgramNfs3.testWrite:468 Incorrect return code: expected:<13> but was:<5>
      [INFO] 
      [ERROR] Tests run: 25, Failures: 20, Errors: 0, Skipped: 0
      [INFO] 
      [ERROR] There are test failures. 

      The polluter that led to this flakiness was the test method
      testGetUserGroupInformationSecure() in TestDFSClientCache.java. There was a line 

      UserGroupInformation.setLoginUser(currentUserUgi);

      which modifies some shared state and resource, something like pre-setup the config. To fix this issue, I added the cleanup methods in TestDFSClientCache.java to reset the UserGroupInformation to ensure the isolation among each test class.

      @AfterClass
      public static void cleanup() {
          UserGroupInformation.reset();
      }

      Including setting

      authenticationMethod = null;
      conf = null; // set configuration to null
      setLoginUser(null); // reset login user to default null

      ..., and so on. The reset() methods can be referred to hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java.

      After the fix, the error was no longer exist and the succeed message was:

      [INFO] -------------------------------------------------------
      [INFO]  T E S T S
      [INFO] -------------------------------------------------------
      [INFO] Running org.apache.hadoop.hdfs.nfs.nfs3.CustomTest
      [INFO] Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 18.457 s - in org.apache.hadoop.hdfs.nfs.nfs3.CustomTest
      [INFO] 
      [INFO] Results:
      [INFO] 
      [INFO] Tests run: 25, Failures: 0, Errors: 0, Skipped: 0
      [INFO] 
      [INFO] ------------------------------------------------------------------------
      [INFO] BUILD SUCCESS
      [INFO] ------------------------------------------------------------------------ 

      Here is the CustomTest.java file that I used to run these two tests in order, the error can be reproduce by running this CustomTest.java. 

      package org.apache.hadoop.hdfs.nfs.nfs3;
      
      import org.junit.runner.RunWith;import org.junit.runners.Suite;
      
      
      
      @RunWith(Suite.class)
      @Suite.SuiteClasses({
          TestDFSClientCache.class,
          TestRpcProgramNfs3.class
      })
      public class CustomTest {} 

      Attachments

        1. failed-1.png
          363 kB
          Ruby
        2. failed-2.png
          275 kB
          Ruby
        3. success.png
          156 kB
          Ruby

        Issue Links

          Activity

            People

              yijujt2 Ruby
              yijujt2 Ruby
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: