Kudu / KUDU-2668

TestKuduClient.readYourWrites tests are flaky

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 1.9.0
    • Fix Version/s: 1.9.0
    • Component/s: java, test
    • Labels: None

    Description

      I looped TestKuduClient 1000 times in dist-test while working on another problem, and saw the following failures:

      1 testReadYourWritesBatchLeaderReplica
      14 testReadYourWritesSyncClosestReplica
      15 testReadYourWritesSyncLeaderReplica
      

      In all cases, the stack trace of the failure was effectively this:

      java.util.concurrent.ExecutionException: java.lang.AssertionError
              at java.util.concurrent.FutureTask.report(FutureTask.java:122)
              at java.util.concurrent.FutureTask.get(FutureTask.java:192)
              at org.apache.kudu.client.TestKuduClient.readYourWrites(TestKuduClient.java:1113)
              ...
      Caused by: java.lang.AssertionError
              at org.junit.Assert.fail(Assert.java:86)
              at org.junit.Assert.assertTrue(Assert.java:41)
              at org.junit.Assert.assertTrue(Assert.java:52)
              at org.apache.kudu.client.TestKuduClient$2.call(TestKuduClient.java:1098)
              at org.apache.kudu.client.TestKuduClient$2.call(TestKuduClient.java:1055)
              ...
      

      The offending lines:

                    AsyncKuduScanner scanner = asyncClient.newScannerBuilder(table)
                            .readMode(AsyncKuduScanner.ReadMode.READ_YOUR_WRITES)
                            .replicaSelection(replicaSelection)
                            .build();
                    KuduScanner syncScanner = new KuduScanner(scanner);
                    long preTs = asyncClient.getLastPropagatedTimestamp();
                    assertNotEquals(AsyncKuduClient.NO_TIMESTAMP,
                        asyncClient.getLastPropagatedTimestamp());
      
                    long row_count = countRowsInScan(syncScanner);
                    long expected_count = 100L * (i + 1);
                    assertTrue(expected_count <= row_count);
      
                    // After the scan, verify that the chosen snapshot timestamp is
                    // returned from the server and it is larger than the previous
                    // propagated timestamp.
                    assertNotEquals(AsyncKuduClient.NO_TIMESTAMP, scanner.getSnapshotTimestamp());
      -->           assertTrue(preTs < scanner.getSnapshotTimestamp());
      

      It's possible that this is just test flakiness, but I'm setting a higher priority so we can understand whether that's the case, or whether there's something wrong with read-your-writes scans.
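
The stack trace above carries no message, so it does not show how far apart the two timestamps were when the check failed. As a minimal sketch, assuming only that both values are plain `long` timestamps as in the excerpt, a check like the following would report them on failure. The `TimestampCheck` class and `assertTimestampAdvanced` helper are hypothetical names for illustration, not part of TestKuduClient or the Kudu client API.

```java
public class TimestampCheck {
    /**
     * Hypothetical helper: performs the same strict comparison as the
     * failing line, but puts both values in the failure message.
     */
    static void assertTimestampAdvanced(long preTs, long snapshotTs) {
        if (!(preTs < snapshotTs)) {
            throw new AssertionError(String.format(
                "expected snapshot timestamp > previously propagated timestamp, "
                    + "but preTs=%d, snapshotTs=%d (delta=%d)",
                preTs, snapshotTs, snapshotTs - preTs));
        }
    }

    public static void main(String[] args) {
        // Strictly increasing timestamps: no exception is thrown.
        assertTimestampAdvanced(100L, 101L);

        // Equal timestamps: the strict comparison fails and the message
        // shows both values.
        try {
            assertTimestampAdvanced(100L, 100L);
        } catch (AssertionError e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With the values in the message, a failed run would show whether the snapshot timestamp was equal to the propagated timestamp or actually behind it, which may help separate a test-only issue from a problem in read-your-writes scans.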

          People

            Assignee: Hao Hao (hahao)
            Reporter: Adar Dembo (adar)