SOLR-13061

Solr replicas remain in down status after Solr servers are restarted when the state update queue hits its maxQueueSize of 20000

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 7.2, 7.3, 7.3.1, 7.4, 7.5
    • Fix Version/s: None
    • Component/s: SolrCloud
    • Cluster info: 6 nodes, 30 Solr servers (5 Solr servers per node)

      1000 collections, 10 shards per collection, 3 replicas per shard

      The exception occurred while restarting the Solr cluster.

    Description

      1. Cluster info: 6 nodes, 30 Solr servers

      1000 collections, 10 shards per collection, 3 replicas per shard.

      The exception occurred while restarting the Solr cluster.

       

      2. The exception occurred while restarting the Solr cluster. The problem is that no exception handler is defined for "java.lang.IllegalStateException: queue is full", which is thrown when the queue reaches the threshold STATE_UPDATE_MAX_QUEUE (20000) defined in Overseer. The core then fails in preRegister and never comes up again.
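A rough back-of-the-envelope check of why this cluster can reach the threshold, using only the numbers from the report. It assumes that every replica publishes at least one state message during startup (the stack trace shows publish being called from preRegister) and that the Overseer does not drain the queue while cores are loading. Both are simplifying assumptions, not measured behavior, and the class name is made up for illustration.

```java
public class QueuePressureEstimate {
    // Cluster figures taken from the report.
    static final int COLLECTIONS = 1000;
    static final int SHARDS_PER_COLLECTION = 10;
    static final int REPLICAS_PER_SHARD = 3;
    static final int SOLR_SERVERS = 30;

    // Threshold quoted in the report (Overseer.STATE_UPDATE_MAX_QUEUE).
    static final int STATE_UPDATE_MAX_QUEUE = 20000;

    /** Total number of replicas (cores) in the cluster. */
    static int totalReplicas() {
        return COLLECTIONS * SHARDS_PER_COLLECTION * REPLICAS_PER_SHARD;
    }

    /** Average number of cores each Solr server has to load on restart. */
    static int replicasPerServer() {
        return totalReplicas() / SOLR_SERVERS;
    }

    /**
     * True if one state message per replica would already exceed the queue
     * limit, under the assumption that nothing is consumed in the meantime.
     */
    static boolean oneMessagePerReplicaOverflows() {
        return totalReplicas() > STATE_UPDATE_MAX_QUEUE;
    }

    public static void main(String[] args) {
        System.out.println("total replicas:      " + totalReplicas());
        System.out.println("replicas per server: " + replicasPerServer());
        System.out.println("queue limit:         " + STATE_UPDATE_MAX_QUEUE);
        System.out.println("overflow possible:   " + oneMessagePerReplicaOverflows());
    }
}
```

With 30000 replicas and a limit of 20000, a full cluster restart can in principle enqueue more state updates than the queue accepts, which is consistent with the "queue is full" error reported here.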

       

      3. Suggestions:

      a. Is the STATE_UPDATE_MAX_QUEUE value reasonable? Is there any plan to enlarge this queue, or any risk in doing so? A size of 20000 seems far too small.

      b. Should STATE_UPDATE_MAX_QUEUE be configurable by the user? Currently it is hard-coded in Overseer.java:

          public static final int STATE_UPDATE_MAX_QUEUE = 20000;

      c. The IllegalStateException should be handled, and retry logic should be added.
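Suggestions b and c could look roughly like the sketch below. This is not Solr code: the StateQueue interface is a stand-in for a bounded queue whose offer throws IllegalStateException when full, the system property name example.stateUpdateMaxQueue is hypothetical, and the backoff values are arbitrary. It only illustrates reading the limit from configuration and retrying with exponential backoff instead of failing core creation on the first "queue is full".

```java
public class StateUpdateRetrySketch {
    // Hypothetical property name for illustration; not an existing Solr setting.
    static final String MAX_QUEUE_PROP = "example.stateUpdateMaxQueue";
    static final int DEFAULT_MAX_QUEUE = 20000;
    static final long MAX_BACKOFF_MS = 5000L;

    /** Suggestion b: read the limit from a system property, falling back to 20000. */
    static int maxQueueSize() {
        return Integer.getInteger(MAX_QUEUE_PROP, DEFAULT_MAX_QUEUE);
    }

    /** Stand-in for a bounded queue that throws IllegalStateException when full. */
    public interface StateQueue {
        void offer(byte[] data);
    }

    /**
     * Suggestion c: retry the offer with exponential backoff.
     * Returns the number of attempts used; rethrows the last
     * IllegalStateException once maxAttempts is exhausted.
     */
    static int offerWithRetry(StateQueue queue, byte[] data,
                              int maxAttempts, long initialBackoffMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long backoff = initialBackoffMs;
        for (int attempt = 1; ; attempt++) {
            try {
                queue.offer(data);
                return attempt;
            } catch (IllegalStateException full) {
                if (attempt >= maxAttempts) {
                    throw full; // give up and surface the original error
                }
                try {
                    Thread.sleep(backoff);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while retrying", ie);
                }
                backoff = Math.min(backoff * 2, MAX_BACKOFF_MS);
            }
        }
    }
}
```

A caller would wrap the publish step, for example offerWithRetry(queue, payload, 10, 100L), so that a temporarily full queue delays the core instead of leaving it down permanently. Whether retrying is safe at this point in the real code path, and what limits are sensible, would need to be decided by the Solr developers.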

       

      4. The detailed error is given below.

      2018-12-12 11:20:24,737 | ERROR | coreContainerWorkExecutor-2-thread-1-processing-n:8.5.165.7:21101_solr | Error waiting for SolrCore to be created | org.apache.solr.core.CoreContainer.lambda$load$1(CoreContainer.java:578)
      java.util.concurrent.ExecutionException: org.apache.solr.common.SolrException: Unable to create core [collection9_shard1_replica3]
      at java.util.concurrent.FutureTask.report(FutureTask.java:122)
      at java.util.concurrent.FutureTask.get(FutureTask.java:192)
      at org.apache.solr.core.CoreContainer.lambda$load$1(CoreContainer.java:574)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
      at java.util.concurrent.FutureTask.run(FutureTask.java:266)
      at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
      Caused by: org.apache.solr.common.SolrException: Unable to create core [collection9_shard1_replica3]
      at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1087)
      at org.apache.solr.core.CoreContainer.lambda$load$0(CoreContainer.java:546)
      ... 5 more
      Caused by: java.lang.IllegalStateException: queue is full
      at org.apache.solr.cloud.ZkDistributedQueue.offer(ZkDistributedQueue.java:311)
      at org.apache.solr.cloud.ZkController.publish(ZkController.java:1346)
      at org.apache.solr.cloud.ZkController.publish(ZkController.java:1245)
      at org.apache.solr.cloud.ZkController.preRegister(ZkController.java:1634)
      at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1061)
      ... 6 more

          People

            Unassigned Unassigned
            A Fat Horse Zhaohui Ma