Description
I looped TestKuduClient 1000 times in dist-test while working on another problem, and saw the following failures:
1 testReadYourWritesBatchLeaderReplica 14 testReadYourWritesSyncClosestReplica 15 testReadYourWritesSyncLeaderReplica
In all cases, the stack trace of the failure was effectively this:
java.util.concurrent.ExecutionException: java.lang.AssertionError at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:192) at org.apache.kudu.client.TestKuduClient.readYourWrites(TestKuduClient.java:1113) ... Caused by: java.lang.AssertionError at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.kudu.client.TestKuduClient$2.call(TestKuduClient.java:1098) at org.apache.kudu.client.TestKuduClient$2.call(TestKuduClient.java:1055) ...
The offending lines:
AsyncKuduScanner scanner = asyncClient.newScannerBuilder(table) .readMode(AsyncKuduScanner.ReadMode.READ_YOUR_WRITES) .replicaSelection(replicaSelection) .build(); KuduScanner syncScanner = new KuduScanner(scanner); long preTs = asyncClient.getLastPropagatedTimestamp(); assertNotEquals(AsyncKuduClient.NO_TIMESTAMP, asyncClient.getLastPropagatedTimestamp()); long row_count = countRowsInScan(syncScanner); long expected_count = 100L * (i + 1); assertTrue(expected_count <= row_count); // After the scan, verify that the chosen snapshot timestamp is // returned from the server and it is larger than the previous // propagated timestamp. assertNotEquals(AsyncKuduClient.NO_TIMESTAMP, scanner.getSnapshotTimestamp()); --> assertTrue(preTs < scanner.getSnapshotTimestamp());
It's possible that this is just test flakiness, but I'm setting a higher priority so we can understand whether that's the case, or whether there's something wrong with read-your-writes scans.