Solr / SOLR-16086

Issues with TestReplicationHandler.doTestIndexFetchOnLeaderRestart

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 9.0
    • Fix Version/s: None
    • Component/s: replication (java), Tests
    • Labels: None

    Description

      Since early December 2021, the doTestIndexFetchOnLeaderRestart test has been failing around 3% of the time. The failures appear to have been introduced by SOLR-15590. In the failing runs, replication never happens on the follower: there is no logging at all from the replication handler or the index fetcher. This suggests that something hangs in the first replication request. The indexFetcher starts its fetching thread after a random delay of between 1 ms and 1000 ms. After the follower is started, the leader is restarted, which (from my observation) generally takes around 30 ms. A window of roughly 30 ms out of 1000 ms means that about 3% of runs will send the first indexFetcher request while the leader is restarting, which is in line with the failure rate we are seeing.
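
      The timing argument above can be checked with a rough model. The sketch below is not Solr code: it assumes the first fetch fires after a uniformly distributed delay in [1, 1000] ms and that the leader is unavailable for the first 30 ms (both figures come from the description; the uniform distribution and the class and method names are assumptions made for illustration).

```java
import java.util.Random;

// Hypothetical model of the race described above; not part of Solr.
public class RestartWindowEstimate {

    // Exact fraction of delays in [1, maxDelayMs] that fall inside the restart window.
    static double exactOverlap(int restartWindowMs, int maxDelayMs) {
        int window = Math.min(Math.max(restartWindowMs, 0), maxDelayMs);
        return (double) window / maxDelayMs;
    }

    // Monte Carlo estimate of the same fraction, assuming a uniform delay.
    static double simulatedOverlap(int restartWindowMs, int maxDelayMs, int trials, long seed) {
        Random random = new Random(seed);
        int hits = 0;
        for (int i = 0; i < trials; i++) {
            int delayMs = 1 + random.nextInt(maxDelayMs); // uniform in [1, maxDelayMs]
            if (delayMs <= restartWindowMs) {
                hits++; // first fetch would be sent while the leader is restarting
            }
        }
        return (double) hits / trials;
    }

    public static void main(String[] args) {
        System.out.printf("exact:     %.4f%n", exactOverlap(30, 1000));
        System.out.printf("simulated: %.4f%n", simulatedOverlap(30, 1000, 1_000_000, 42L));
    }
}
```

      Under these assumptions the exact overlap is 30/1000 = 3%, which matches the observed failure rate. That agreement supports the conjecture but does not prove it.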

      Mike Drob and I could not reproduce the hanging indexFetcher request locally, so this is still conjecture, and we are unsure how SOLR-15590 would be affecting it.

      Side note: Looking at the history of the test, it appears that its original purpose is no longer being tested either. Originally, the last part of the test verified that there was only 1 successful index replication; that check has since been moved to before the leader is started up again (this was changed in SOLR-13577). As a result, the test no longer checks that a full replication happens after the leader starts. We just need to add that check back at the end of the test.
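
      One possible shape for the restored check is a bounded poll on a replication counter once the leader is back up. The sketch below is generic Java, not the test's actual code: `pollUntil` and the `fullReplications` counter are hypothetical stand-ins for however the test reads the follower's successful-replication count.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

// Hypothetical helper; the real test would read the count from the follower instead.
public class ReplicationCountCheck {

    // Polls the counter until it equals expected or the timeout elapses.
    // Returns the last value observed so the caller can assert on it.
    static int pollUntil(IntSupplier counter, int expected, long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        int value = counter.getAsInt();
        while (value != expected && System.nanoTime() - deadline < 0) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
                return value;
            }
            value = counter.getAsInt();
        }
        return value;
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in for the follower completing one full replication after the restart.
        AtomicInteger fullReplications = new AtomicInteger(0);
        Thread follower = new Thread(() -> {
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            fullReplications.incrementAndGet();
        });
        follower.start();

        int seen = pollUntil(fullReplications::get, 1, 2000);
        follower.join();
        if (seen != 1) {
            throw new AssertionError("expected exactly 1 full replication, saw " + seen);
        }
        System.out.println("full replications after restart: " + seen);
    }
}
```

      A bounded poll fails with a clear message instead of hanging if replication never starts, which is the symptom described above. To enforce "only 1" strictly, the test would also need to confirm that the count does not grow after it is first reached.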

            People

              Assignee: Unassigned
              Reporter: Houston Putman
              Votes: 0
              Watchers: 3

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 40m