Solr / SOLR-16560

Pull replication appears to consume excessive CPU as it does not use Segment Reader pooling

Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: 8.8, main (10.0)
    • Fix Version/s: None
    • Component/s: SolrCloud
    • Labels: None

    Description

      While experimenting with adding PULL replicas to our Solr cluster, we found from profiling that `IndexFetcher` imposes much more CPU overhead than anticipated. Most of that CPU time is spent in `SegmentReader.init`, which is called by `SolrCore.openNewSearcher` from the `IndexFetcher`.

      With some debugging, we found that on every replication of an updated collection, a new `SegmentReader` is created for every segment of that collection (not only for the segments that were pulled down). In contrast, when `SolrCore.openNewSearcher` is triggered by a regular commit, it creates a `SegmentReader` ONLY for new segments; the existing `SegmentReader`s are obtained from the `ReaderPool`.

      Unfortunately, this pool does not help `IndexFetcher`, because it opens a new `IndexWriter` on every run, and each new `IndexWriter` creates a new `ReaderPool`.
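
      To make the difference concrete, here is a toy model of the two paths in plain Java. It is only a sketch of the behavior described above, not Solr or Lucene code: `FakeReaderPool`, `FakeSegmentReader`, `openSearcher` and the segment names are all hypothetical stand-ins, and the only thing it models is "one cached reader per segment name, lost when the pool is recreated".

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReaderPoolSketch {
    /** Hypothetical stand-in for a segment reader; constructing one is the costly step. */
    static final class FakeSegmentReader {
        final String segment;
        FakeSegmentReader(String segment) { this.segment = segment; }
    }

    /** Hypothetical stand-in for a reader pool: one cached reader per segment name. */
    static final class FakeReaderPool {
        private final Map<String, FakeSegmentReader> readers = new HashMap<>();
        int opened = 0; // number of readers this pool had to construct

        FakeSegmentReader get(String segment) {
            return readers.computeIfAbsent(segment, name -> {
                opened++;
                return new FakeSegmentReader(name);
            });
        }
    }

    /** Touches every segment through the pool; returns how many readers were newly constructed. */
    static int openSearcher(FakeReaderPool pool, List<String> segments) {
        int before = pool.opened;
        for (String segment : segments) {
            pool.get(segment);
        }
        return pool.opened - before;
    }

    public static void main(String[] args) {
        List<String> oldSegments = List.of("_0", "_1", "_2");
        List<String> newSegments = List.of("_0", "_1", "_2", "_3"); // one segment added

        // Commit-like path: the same pool survives across reopens.
        FakeReaderPool shared = new FakeReaderPool();
        openSearcher(shared, oldSegments);
        System.out.println("shared pool, readers opened: " + openSearcher(shared, newSegments)); // 1

        // Replication-like path as described: a fresh pool on every run.
        openSearcher(new FakeReaderPool(), oldSegments);
        System.out.println("fresh pool, readers opened: "
                + openSearcher(new FakeReaderPool(), newSegments)); // 4
    }
}
```

      In this model the shared pool constructs one reader for the single new segment, while the fresh pool constructs one per segment, which matches the pattern seen in the profile: the cost grows with the total segment count rather than with the number of segments pulled.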

      I am not familiar enough with the code to tell whether this new `IndexWriter` is always needed, but opening a `SegmentReader` for every segment of a collection seems excessive for pull replication.

      Any thoughts on this? Many thanks!

          People

            Assignee: Unassigned
            Reporter: Patson Luk