SOLR-11638: CloudSolrClientTest periodic failures

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 8.0
    • Fix Version/s: 7.2, 8.0
    • Component/s: SolrJ, Tests
    • Labels: None

    Description

      The test randomization recently added as part of SOLR-11507 has caused CloudSolrClientTest.testRetryUpdatesWhenClusterStateIsStale to fail semi-regularly on master. For me, the test succeeds on only 3 out of 10 runs. It fails with the message:

         [junit4]   2> 14848 ERROR (TEST-CloudSolrClientTest.testRetryUpdatesWhenClusterStateIsStale-seed#[64E89FBB977E15AA]) [    ] o.a.s.c.s.i.CloudSolrClient Request to collection [stale_state_test_col] failed due to (404) org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://127.0.0.1:38925/solr/stale_state_test_col_shard1_replica_n1: Expected mime type application/octet-stream but got text/html. <html>
         [junit4]   2> <head>
         [junit4]   2> <meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
         [junit4]   2> <title>Error 404 </title>
         [junit4]   2> </head>
         [junit4]   2> <body>
         [junit4]   2> <h2>HTTP ERROR: 404</h2>
         [junit4]   2> <p>Problem accessing /solr/stale_state_test_col_shard1_replica_n1/update. Reason:
         [junit4]   2> <pre>    Can not find: /solr/stale_state_test_col_shard1_replica_n1/update</pre></p>
         [junit4]   2> <hr /><a href="http://eclipse.org/jetty">Powered by Jetty:// 9.3.20.v20170531</a><hr/>
         [junit4]   2> </body>
         [junit4]   2> </html>
         [junit4]   2> , retry? 0
         [junit4]   2> 14851 INFO  (TEST-CloudSolrClientTest.testRetryUpdatesWhenClusterStateIsStale-seed#[64E89FBB977E15AA]) [    ] o.a.s.SolrTestCaseJ4 ###Ending testRetryUpdatesWhenClusterStateIsStale
         [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=CloudSolrClientTest -Dtests.method=testRetryUpdatesWhenClusterStateIsStale -Dtests.seed=64E89FBB977E15AA -Dtests.slow=true -Dtests.locale=es-VE -Dtests.timezone=Indian/Chagos -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
         [junit4] ERROR   5.86s | CloudSolrClientTest.testRetryUpdatesWhenClusterStateIsStale <<<
         [junit4]    > Throwable #1: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://127.0.0.1:38925/solr/stale_state_test_col_shard1_replica_n1: Expected mime type application/octet-stream but got text/html. <html>
         [junit4]    > <head>
         [junit4]    > <meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1"/>
         [junit4]    > <title>Error 404 </title>
         [junit4]    > </head>
         [junit4]    > <body>
         [junit4]    > <h2>HTTP ERROR: 404</h2>
         [junit4]    > <p>Problem accessing /solr/stale_state_test_col_shard1_replica_n1/update. Reason:
         [junit4]    > <pre>    Can not find: /solr/stale_state_test_col_shard1_replica_n1/update</pre></p>
         [junit4]    > <hr /><a href="http://eclipse.org/jetty">Powered by Jetty:// 9.3.20.v20170531</a><hr/>
         [junit4]    > </body>
         [junit4]    > </html>
         [junit4]    > 	at __randomizedtesting.SeedInfo.seed([64E89FBB977E15AA:D0D9075374976386]:0)
         [junit4]    > 	at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:607)
         [junit4]    > 	at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
         [junit4]    > 	at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
         [junit4]    > 	at org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:483)
         [junit4]    > 	at org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:413)
         [junit4]    > 	at org.apache.solr.client.solrj.impl.CloudSolrClient.directUpdate(CloudSolrClient.java:559)
         [junit4]    > 	at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1016)
         [junit4]    > 	at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:883)
         [junit4]    > 	at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:816)
         [junit4]    > 	at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
         [junit4]    > 	at org.apache.solr.client.solrj.request.UpdateRequest.commit(UpdateRequest.java:233)
         [junit4]    > 	at org.apache.solr.client.solrj.impl.CloudSolrClientTest.testRetryUpdatesWhenClusterStateIsStale(CloudSolrClientTest.java:844)
         [junit4]    > 	at java.lang.Thread.run(Thread.java:748)
      

      After some digging, it looks like the issue is that testRetryUpdatesWhenClusterStateIsStale implicitly relies on the values of directUpdatesToLeadersOnly and parallelUpdates, which are now randomized when a client is created through SolrTestCaseJ4's CloudSolrClient creation helpers.

      Attached is a patch that makes testRetryUpdatesWhenClusterStateIsStale set those two update-related properties explicitly instead of taking the randomized defaults. Without the patch, the test passes maybe 5 out of 20 times. With the patch, it passes consistently (20 out of 20 runs).
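
      To illustrate the failure mode, here is a minimal, self-contained sketch. It is not the attached patch and does not use SolrJ. The classes ClientOptions and OptionsBuilder and the method distinctConfigs are hypothetical stand-ins, and the pinned values (true/true) are arbitrary placeholders, since this report does not say which values the patch sets. The sketch only shows why a test that leaves both properties unset can see different client configurations from run to run, while a test that sets them explicitly always sees the same one.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class PinnedClientOptionsSketch {

    /** Hypothetical stand-in for the two update-related client properties. */
    static final class ClientOptions {
        final boolean directUpdatesToLeadersOnly;
        final boolean parallelUpdates;

        ClientOptions(boolean directUpdatesToLeadersOnly, boolean parallelUpdates) {
            this.directUpdatesToLeadersOnly = directUpdatesToLeadersOnly;
            this.parallelUpdates = parallelUpdates;
        }
    }

    /** Hypothetical builder; a null field means "the test did not set this". */
    static final class OptionsBuilder {
        private Boolean directUpdatesToLeadersOnly;
        private Boolean parallelUpdates;

        OptionsBuilder directUpdatesToLeadersOnly(boolean value) {
            this.directUpdatesToLeadersOnly = value;
            return this;
        }

        OptionsBuilder parallelUpdates(boolean value) {
            this.parallelUpdates = value;
            return this;
        }

        /** Mimics a randomizing test helper: only unset options are randomized. */
        ClientOptions build(Random random) {
            boolean direct = directUpdatesToLeadersOnly != null
                    ? directUpdatesToLeadersOnly : random.nextBoolean();
            boolean parallel = parallelUpdates != null
                    ? parallelUpdates : random.nextBoolean();
            return new ClientOptions(direct, parallel);
        }
    }

    /** Counts how many distinct configurations show up across simulated runs. */
    static int distinctConfigs(boolean pin, int runs, long seed) {
        Random random = new Random(seed);
        Set<String> seen = new HashSet<>();
        for (int i = 0; i < runs; i++) {
            OptionsBuilder builder = new OptionsBuilder();
            if (pin) {
                // Placeholder values; the report does not state the patch's choices.
                builder.directUpdatesToLeadersOnly(true).parallelUpdates(true);
            }
            ClientOptions options = builder.build(random);
            seen.add(options.directUpdatesToLeadersOnly + "/" + options.parallelUpdates);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("distinct configs, unpinned: " + distinctConfigs(false, 200, 42L));
        System.out.println("distinct configs, pinned:   " + distinctConfigs(true, 200, 42L));
    }
}
```

      In the pinned case every simulated run uses the same configuration, which is the property the test needs; in the unpinned case the configuration depends on the random draws, much as the real test's outcome depended on the test seed.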

      Attachments

        1. SOLR-11638.patch
          1 kB
          Jason Gerlowski

          People

            Assignee: Shalin Shekhar Mangar (shalin)
            Reporter: Jason Gerlowski (gerlowskija)