Solr / SOLR-16871

Race condition for coordinator node init

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: SolrCloud
    • Labels: None

    Description

      A unit test case that issues concurrent select queries to coordinator nodes exposed three race conditions:

      1. If multiple concurrent requests find that the synthetic collection has not been created yet, they may all attempt to create it. The requests that lose the race can then fail with a SolrException: `collection already exists`.

      2. Similarly, if multiple concurrent requests find that the synthetic collection has no replica on the current node (the multiple coordinator node scenario), CoordinatorHttpSolrCall#addReplica can be invoked multiple times. This should not trigger any exception, but it would create multiple replicas on the same node in the synthetic collection.

      3. The existing logic assumes that if syntheticColl.getReplicas(solrCall.cores.getZkController().getNodeName()) returns a non-empty result, the call that follows returns a core. Unfortunately, the first call can return a non-empty list that contains only a DOWN replica while another request is still in the process of creating that replica. In this case, solrCall.getCoreByCollection(syntheticCollectionName, isPreferLeader) calls super.getCoreByCollection, which returns null because the super implementation only returns active replicas. CoordinatorHttpSolrCall#getCoreByCollection then ends up calling CoordinatorHttpSolrCall#getCore, which results in infinite recursion and a stack overflow.
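
      The three races above can be illustrated with a minimal sketch. It is not the fix that was committed for this issue and it does not use Solr's API: the class `CoordinatorInitSketch`, its in-memory sets and maps, and the methods `ensureCollection`, `ensureReplica`, `markActive`, and `waitForActive` are all hypothetical stand-ins. It assumes that tolerating "already exists" covers race 1, that a JVM-local lock is enough for race 2 because the competing requests run on the same coordinator node, and that a bounded wait for an ACTIVE replica replaces the recursion in race 3.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

// Hypothetical in-memory stand-in for cluster state; none of this is Solr's API.
class CoordinatorInitSketch {
    private final Set<String> collections = ConcurrentHashMap.newKeySet();
    private final Map<String, Integer> replicaCounts = new ConcurrentHashMap<>();
    private final Set<String> activeNodes = ConcurrentHashMap.newKeySet();
    private final Object replicaLock = new Object();

    // Stand-in for the create call: fails when the collection already exists.
    private void createCollection(String name) {
        if (!collections.add(name)) {
            throw new IllegalStateException("collection already exists: " + name);
        }
    }

    // Race 1: a request that loses the creation race treats "already exists" as success.
    void ensureCollection(String name) {
        if (collections.contains(name)) {
            return;
        }
        try {
            createCollection(name);
        } catch (IllegalStateException e) {
            // Another request created the collection first; it exists, which is all we need.
        }
    }

    // Race 2: check and add under one lock, so a node gets at most one replica.
    void ensureReplica(String node) {
        synchronized (replicaLock) {
            if (replicaCount(node) == 0) {
                replicaCounts.merge(node, 1, Integer::sum); // the new replica starts DOWN
            }
        }
    }

    int replicaCount(String node) {
        return replicaCounts.getOrDefault(node, 0);
    }

    // Simulates the replica finishing startup and becoming ACTIVE.
    void markActive(String node) {
        activeNodes.add(node);
    }

    // Race 3: poll with a deadline instead of recursing while the replica is DOWN.
    boolean waitForActive(String node, long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!activeNodes.contains(node)) {
            if (System.nanoTime() >= deadline) {
                return false; // give up; the caller can report an error instead of looping
            }
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

      In this sketch, any number of concurrent callers of `ensureCollection` and `ensureReplica` leave exactly one collection and one replica per node, and a caller that finds only a DOWN replica gets a bounded `false` from `waitForActive` rather than re-entering the lookup.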

          People

            Assignee: Unassigned
            Reporter: Patson Luk
            Votes: 0
            Watchers: 3

              Time Tracking

                Estimated: Not Specified
                Remaining: 0h
                Logged: 3h 50m
