SOLR-17234: LBHttp2SolrClient does not skip "zombie" endpoints

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: main (10.0)
    • Fix Version/s: main (10.0)
    • Component/s: SolrJ
    • Labels: None

    Description

      While working on SOLR-14763, I found different behavior with LBHttp2SolrClient between branch_9x and main/10.x.

      If the first Endpoint in the list has previously failed, branch_9x skips it on subsequent requests and starts with the second Endpoint. Only if all remaining Endpoints fail does it retry the first Endpoint.

      If the first Endpoint in the list has previously failed, main/10.x still tries it first on every request, even though it is in the "zombie list". When the first Endpoint fails again, it moves on to the second Endpoint.

      The branch_9x behavior seems more desirable, as it minimizes unnecessary work by avoiding Endpoints that are known to fail. Indeed, main/10.x has an obvious bug in EndpointIterator#fetchNext: it looks up the map holding the zombies with the wrong type of key. I believe this difference is a regression in main/10.x.
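
      To illustrate how a wrong-typed key can silently defeat a zombie check, here is a minimal, self-contained sketch. It is not the SolrJ code: the class ZombieSkipSketch, the Endpoint stand-in, and the methods tryOrder and demo are all hypothetical, and keying the zombie map by URL string is an assumption made only for the example. The point is that Map#containsKey accepts any Object, so a lookup with the wrong key type compiles but never matches.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names are SolrJ's actual classes.
class ZombieSkipSketch {

    // Stand-in endpoint type that deliberately does not override equals/hashCode.
    static final class Endpoint {
        final String url;

        Endpoint(String url) {
            this.url = url;
        }
    }

    // Returns the order in which endpoints would be tried.
    // Assumption for this example: the zombie map is keyed by URL string.
    static List<String> tryOrder(List<Endpoint> endpoints,
                                 Map<String, Endpoint> zombies,
                                 boolean wrongKeyType) {
        List<String> live = new ArrayList<>();
        List<String> skipped = new ArrayList<>();
        for (Endpoint e : endpoints) {
            // Map.containsKey accepts Object, so a key of the wrong type
            // compiles cleanly but never matches a String key.
            boolean isZombie = wrongKeyType
                    ? zombies.containsKey(e)
                    : zombies.containsKey(e.url);
            (isZombie ? skipped : live).add(e.url);
        }
        // Known-bad endpoints are retried only after all live ones.
        live.addAll(skipped);
        return live;
    }

    // Three endpoints, the first of which is already marked as a zombie.
    static List<String> demo(boolean wrongKeyType) {
        List<Endpoint> endpoints = new ArrayList<>();
        for (String u : new String[] {"http://a", "http://b", "http://c"}) {
            endpoints.add(new Endpoint(u));
        }
        Map<String, Endpoint> zombies = new HashMap<>();
        zombies.put("http://a", endpoints.get(0));
        return tryOrder(endpoints, zombies, wrongKeyType);
    }

    public static void main(String[] args) {
        System.out.println("correct key type: " + demo(false));
        System.out.println("wrong key type:   " + demo(true));
    }
}
```

      With the correct key type the zombie is moved to the end of the order (the branch_9x-style behavior described above); with the wrong key type the zombie check never fires and the failed endpoint is tried first again.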

      The differing behavior is recorded in the test LBHttp2SolrClientTest#testAsyncWithFailures, which was added after the fact with SOLR-14763. I needed to change its assertions when backporting to branch_9x to account for the changed behavior.

            People

              Assignee: James Dyer (jdyer)
              Reporter: James Dyer (jdyer)
              Votes: 0
              Watchers: 3

                Time Tracking

                  Original Estimate: Not Specified
                  Remaining Estimate: 0h
                  Time Spent: 40m