HBase / HBASE-27874

Problem in generated flaky test report causes pre-commit run to fail


Details

    • Reviewed

    Description

      I noticed the UT pre-commit run failed on this latest PR for branch-2 with the following output:

      Thu May 18 10:37:32 AM UTC 2023
      cd /home/jenkins/jenkins-home/workspace/Base-PreCommit-GitHub-PR_PR-5241/yetus-jdk8-hadoop2-check/src/hbase-server
      /opt/maven/bin/mvn --batch-mode -Dmaven.repo.local=/home/jenkins/jenkins-home/workspace/Base-PreCommit-GitHub-PR_PR-5241/yetus-m2/hbase-branch-2-patch-1 --threads=4 -Djava.io.tmpdir=/home/jenkins/jenkins-home/workspace/Base-PreCommit-GitHub-PR_PR-5241/yetus-jdk8-hadoop2-check/src/target -DHBasePatchProcess -PrunAllTests -Dtest.exclude.pattern=**/regionserver.TestMetricsRegionServer.java,**/master.procedure.TestSnapshotProcedureRSCrashes.java,**/security.access.TestAccessController.java,**/conf.TestConfigurationManagerWARNING: package jdk.internal.util.random not in java.base.java,**/io.hfile.bucket.TestPrefetchPersistence.java,**/client.TestFromClientSide3.java,**/replication.TestReplicationMetricsforUI.java,**/io.hfile.bucket.TestBucketCache.java,**/replication.regionserver.TestReplicationValueCompressedWAL.java,**/master.procedure.TestHBCKSCP.java,**/http.TestInfoServersACL.java,**/io.hfile.bucket.TestBucketCachePersister.java,**/replication.TestReplicationKillSlaveRS.java,**/regionserver.TestClearRegionBlockCache.java,**/master.TestUnknownServers.java,**/replication.TestReplicationKillSlaveRSWithSeparateOldWALs.java,**/quotas.TestClusterScopeQuotaThrottle.java,**/io.hfile.TestBlockEvictionOnRegionMovement.java,**/replication.regionserver.TestMetaRegionReplicaReplicationEndpoint.java,**/regionserver.TestRegionReplicas.java,**/coprocessor.TestCoprocessorEndpointTracing.java,**/master.region.TestMasterRegionCompaction.java,**/io.hfile.TestPrefetchRSClose.java -Dsurefire.firstPartForkCount=0.5C -Dsurefire.secondPartForkCount=0.5C clean test -fae
      ....
      ------------------------------------------------------------------------
      [INFO] BUILD FAILURE
      [INFO] ------------------------------------------------------------------------
      [INFO] Total time:  0.861 s (Wall Clock)
      [INFO] Finished at: 2023-05-18T10:37:34Z
      [INFO] ------------------------------------------------------------------------
      [ERROR] Unknown lifecycle phase "jdk.internal.util.random". You must specify a valid lifecycle phase or a goal in the format <plugin-prefix>:<goal> or <plugin-group-id>:<plugin-artifact-id>[:<plugin-version>]:<goal>. Available lifecycle phases are: validate, initialize, generate-sources, process-sources, generate-resources, process-resources, compile, process-classes, generate-test-sources, process-test-sources, generate-test-resources, process-test-resources, test-compile, process-test-classes, test, prepare-package, package, pre-integration-test, integration-test, post-integration-test, verify, install, deploy, pre-clean, clean, post-clean, pre-site, site, post-site, site-deploy. -> [Help 1]
      [ERROR] 
      

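      Note that "**/conf.TestConfigurationManagerWARNING: package jdk.internal.util.random not in java.base.java" was passed as one of the supposedly flaky tests. Since this entry contains spaces, the command line gets split at them, so Maven treats the fragments that follow ("package", "jdk.internal.util.random", ...) as lifecycle phases and fails on "jdk.internal.util.random", which is not a valid one. Looking through our build scripts, I found that we pull the list of flaky tests from the "excludes" artifact generated by the latest "find flakey" build. It seems the latest branch-2 run of that build had already generated this artifact with the malformed test name: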

      **/replication.TestReplicationMetricsforUI.java,**/conf.TestConfigurationManagerWARNING: package jdk.internal.util.random not in java.base.java,**/master.region.TestMasterRegionCompaction.java,**/regionserver.TestRegionReplicas.java,**/replication.regionserver.TestReplicationValueCompressedWAL.java,**/coprocessor.TestCoprocessorEndpointTracing.java,**/quotas.TestClusterScopeQuotaThrottle.java,**/replication.TestReplicationKillSlaveRSWithSeparateOldWALs.java,**/client.TestFromClientSide3.java,**/io.hfile.TestBlockEvictionOnRegionMovement.java,**/io.hfile.bucket.TestPrefetchPersistence.java,**/regionserver.TestMetricsRegionServer.java,**/io.hfile.bucket.TestBucketCachePersister.java,**/regionserver.TestClearRegionBlockCache.java,**/master.procedure.TestHBCKSCP.java,**/replication.regionserver.TestMetaRegionReplicaReplicationEndpoint.java,**/security.access.TestAccessController.java,**/io.hfile.bucket.TestBucketCache.java,**/io.hfile.TestPrefetchRSClose.java,**/replication.TestReplicationKillSlaveRS.java,**/master.TestUnknownServers.java,**/http.TestInfoServersACL.java
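      To illustrate why a single malformed entry breaks the whole Maven invocation, here is a minimal Python sketch that word-splits the command the way a POSIX shell would, using shlex.split. It assumes the excludes list is interpolated unquoted into the command line, as the failing command in the log suggests; EXCLUDES is a shortened copy of the artifact above, and maven_args/goals are hypothetical helper names used only for illustration.

```python
import shlex
from typing import List

# Shortened copy of the excludes artifact above; the middle entry is the
# malformed one, with a JVM warning glued onto the test class name.
EXCLUDES = (
    "**/replication.TestReplicationMetricsforUI.java,"
    "**/conf.TestConfigurationManagerWARNING: package jdk.internal.util.random"
    " not in java.base.java,"
    "**/master.region.TestMasterRegionCompaction.java"
)


def maven_args(excludes: str) -> List[str]:
    """Word-split the command as a shell would for an unquoted value.

    Assumption: the excludes list is substituted without quotes.
    """
    cmd = f"mvn -Dtest.exclude.pattern={excludes} clean test -fae"
    return shlex.split(cmd)


def goals(args: List[str]) -> List[str]:
    """Tokens after the executable that are not options.

    Maven interprets these as lifecycle phases or plugin goals.
    """
    return [a for a in args[1:] if not a.startswith("-")]


if __name__ == "__main__":
    for goal in goals(maven_args(EXCLUDES)):
        print(goal)
```

      Running it lists "package", "jdk.internal.util.random", "not" and "in" among the goals, next to the intended "clean" and "test". Quoting the value would keep it as one argument, but the entry still would not name a real test, so keeping malformed names out of the artifact is the cleaner fix.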
      

      Digging deeper, I found that the "find flakey" build parses the UT output of the latest nightly and flakey builds to generate the report. In some of these builds, such as this one, the output can be malformed, with a test name and a WARNING message merged onto the same line:

      [INFO] Running org.apache.hadoop.hbase.conf.TestConfigurationManagerWARNING: package jdk.internal.util.random not in java.base
      
      [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.145 s - in org.apache.hadoop.hbase.conf.TestConfigurationManager
      

      I'm thinking about modifying the Python script that generates the flaky report so that it also handles this malformed pattern when parsing test names.
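      A minimal sketch of what such a change could look like, assuming the report script extracts test names from Surefire's "[INFO] Running <class>" lines. The regexes, the function names and the stripping of the "org.apache.hadoop.hbase." prefix (inferred from the entries in the excludes artifact above) are all hypothetical and not taken from the actual script.

```python
import re
from typing import Optional

# Lazy match on the class name that stops right before a "WARNING:" glued
# onto it, or at the first whitespace / end of line for well-formed lines.
RUNNING_RE = re.compile(r"\[INFO\] Running ([\w.$]+?)(?=WARNING:|\s|$)")

# The per-test summary line was intact in the sample, so it can serve as a
# cross-check for the name parsed from the "Running" line.
SUMMARY_RE = re.compile(r"Tests run: .*? - in ([\w.$]+)")

# Hypothetical: prefix dropped when building "**/<pkg>.<Class>.java" entries.
HBASE_PREFIX = "org.apache.hadoop.hbase."


def parse_test_name(line: str) -> Optional[str]:
    """Return the test class from a Surefire 'Running' line, or None."""
    m = RUNNING_RE.search(line)
    return m.group(1) if m else None


def parse_summary_name(line: str) -> Optional[str]:
    """Return the test class from a 'Tests run: ... - in <class>' line."""
    m = SUMMARY_RE.search(line)
    return m.group(1) if m else None


def to_exclude_entry(test_class: str) -> str:
    """Turn a fully qualified test class into an excludes entry."""
    if test_class.startswith(HBASE_PREFIX):
        test_class = test_class[len(HBASE_PREFIX):]
    return f"**/{test_class}.java"


if __name__ == "__main__":
    line = ("[INFO] Running org.apache.hadoop.hbase.conf."
            "TestConfigurationManagerWARNING: package "
            "jdk.internal.util.random not in java.base")
    print(to_exclude_entry(parse_test_name(line)))
```

      The lookahead only covers a WARNING glued directly onto the class name, which is the case seen here; relying on the "- in <class>" summary line instead would be more robust if other interleavings show up.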


          People

            Assignee: Wellington Chevreuil (wchevreuil)
            Reporter: Wellington Chevreuil (wchevreuil)
            Votes: 0
            Watchers: 3
