Solr / SOLR-13291

Failed to create collection due to lock held by this virtual machine

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 7.5, 7.7
    • Fix Version/s: None
    • Component/s: SolrCloud
    • Labels: None

    Description

      We have an unusual workload that at times deletes and re-creates collections with the same name within a short period of time (don't ask why).

       

      When running in a SolrCloud cluster, this occasionally leaves a random core lying around and locked, even though the collection deletion was reported to have finished successfully.
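To check whether a deletion really left a core behind, the node's Solr home can be scanned for directories that still carry the collection's name. The following is only a minimal sketch: the helper name `find_leftover_cores` is hypothetical, and it assumes the `<collection>_shard*_replica_*` directory naming seen in the log entries in this report. Note that a `write.lock` file on disk is only a hint; its presence alone does not prove that the lock is still held inside the JVM.

```python
from pathlib import Path


def find_leftover_cores(solr_home, collection):
    """List core directories of `collection` that still exist under solr_home.

    Returns (directory_name, has_write_lock_file) tuples, sorted by name.
    Hypothetical helper; assumes the <collection>_shard*_replica_* naming
    that appears in the log entries of this report.
    """
    leftovers = []
    pattern = f"{collection}_shard*_replica_*"
    for core_dir in sorted(Path(solr_home).glob(pattern)):
        if not core_dir.is_dir():
            continue
        # The lock file location matches the path reported in the CREATE error.
        lock_file = core_dir / "data" / "index" / "write.lock"
        leftovers.append((core_dir.name, lock_file.exists()))
    return leftovers
```

Run after a DELETE that reported success, for example `find_leftover_cores("example/cloud/node1/solr", "myCollection")`; any entry in the result is a core directory that was not cleaned up.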

       

      This results in an error the next time a collection with that name is created.

       

      The attached shell script consistently reproduces the error state within a small number of iterations against the 7.7.1 binary distribution running the default cloud example (`solr start -e cloud`, accepting all default values).
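The script itself is not reproduced on this page. As a rough illustration of the same idea, a create/delete loop against the Collections API could look like the sketch below. It is not the attached `tortureSolr.sh`: the base URL, the shard and replica counts, `maxShardsPerNode=2`, the iteration limit, and the function names are all assumptions chosen to match the two-node default cloud example.

```python
import urllib.error
import urllib.parse
import urllib.request

# Assumption: a node of the default cloud example listening on port 8983.
SOLR_URL = "http://localhost:8983/solr"


def collections_url(action, **params):
    """Build a Collections API URL for the given action and parameters."""
    query = urllib.parse.urlencode({"action": action, **params, "wt": "json"})
    return f"{SOLR_URL}/admin/collections?{query}"


def call(url):
    """Send a GET request and return (http_status, response_text)."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8")


def reproduce(name="myCollection", max_iterations=50):
    """Create and delete the same collection repeatedly.

    Returns the iteration at which CREATE first failed, or None if no
    failure was seen within max_iterations.
    """
    for i in range(1, max_iterations + 1):
        status, body = call(collections_url(
            "CREATE", name=name, numShards=2,
            replicationFactor=2, maxShardsPerNode=2))
        # Depending on the version, a failure may be reported in the
        # response body with HTTP 200, so check both.
        if status != 200 or '"failure"' in body:
            print(f"iteration {i}: CREATE failed ({status}): {body[:300]}")
            return i
        call(collections_url("DELETE", name=name))
    return None
```

Calling `reproduce()` against a freshly started cloud example should stop at the first iteration where re-creation fails; the exact number of iterations needed will vary.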

       

      Log entries that seemed relevant to me are:

      At the time when the collection is deleted:

      2019-03-04 16:56:44.037 WARN  (Thread-24) [c:myCollection s:shard2 r:core_node4 x:myCollection_shard2_replica_n2] o.a.s.c.ZkController listener throws error
      org.apache.solr.common.SolrException: Unable to reload core [myCollection_shard2_replica_n2]
              at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:1463) ~[solr-core-7.7.1.jar:7.7.1 5bf96d32f88eb8a2f5e775339885cd6ba84a3b58 - ishan - 2019-02-23 02:39:07]
              at org.apache.solr.core.SolrCore.lambda$getConfListener$20(SolrCore.java:3041) ~[solr-core-7.7.1.jar:7.7.1 5bf96d32f88eb8a2f5e775339885cd6ba84a3b58 - ishan - 2019-02-23 02:39:07]
              at org.apache.solr.cloud.ZkController.lambda$fireEventListeners$21(ZkController.java:2803) [solr-core-7.7.1.jar:7.7.1 5bf96d32f88eb8a2f5e775339885cd6ba84a3b58 - ishan - 2019-02-23 02:39:07]
              at java.lang.Thread.run(Thread.java:834) [?:?]
      Caused by: org.apache.solr.common.SolrException
              at org.apache.solr.core.SolrCore.<init>(SolrCore.java:1048) ~[solr-core-7.7.1.jar:7.7.1 5bf96d32f88eb8a2f5e775339885cd6ba84a3b58 - ishan - 2019-02-23 02:39:07]
              at org.apache.solr.core.SolrCore.reload(SolrCore.java:666) ~[solr-core-7.7.1.jar:7.7.1 5bf96d32f88eb8a2f5e775339885cd6ba84a3b58 - ishan - 2019-02-23 02:39:07]
              at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:1439) ~[solr-core-7.7.1.jar:7.7.1 5bf96d32f88eb8a2f5e775339885cd6ba84a3b58 - ishan - 2019-02-23 02:39:07]
              ... 3 more
      Caused by: java.lang.NullPointerException
              at org.apache.solr.metrics.SolrMetricManager.loadShardReporters(SolrMetricManager.java:1160) ~[solr-core-7.7.1.jar:7.7.1 5bf96d32f88eb8a2f5e775339885cd6ba84a3b58 - ishan - 2019-02-23 02:39:07]
              at org.apache.solr.metrics.SolrCoreMetricManager.loadReporters(SolrCoreMetricManager.java:92) ~[solr-core-7.7.1.jar:7.7.1 5bf96d32f88eb8a2f5e775339885cd6ba84a3b58 - ishan - 2019-02-23 02:39:07]
              at org.apache.solr.core.SolrCore.<init>(SolrCore.java:920) ~[solr-core-7.7.1.jar:7.7.1 5bf96d32f88eb8a2f5e775339885cd6ba84a3b58 - ishan - 2019-02-23 02:39:07]
              at org.apache.solr.core.SolrCore.reload(SolrCore.java:666) ~[solr-core-7.7.1.jar:7.7.1 5bf96d32f88eb8a2f5e775339885cd6ba84a3b58 - ishan - 2019-02-23 02:39:07]
              at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:1439) ~[solr-core-7.7.1.jar:7.7.1 5bf96d32f88eb8a2f5e775339885cd6ba84a3b58 - ishan - 2019-02-23 02:39:07]
      

       

      Later, when trying to re-create the collection:

       

      2019-03-04 16:56:51.982 ERROR (OverseerThreadFactory-9-thread-5-processing-n:127.0.1.1:8983_solr) [   ] o.a.s.c.a.c.OverseerCollectionMessageHandler Error from shard: http://127.0.1.1:8983/solr
      org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://127.0.1.1:8983/solr: Error CREATEing SolrCore 'myCollection_shard2_replica_n2': Unable to create core [myCollection_shard2_replica_n2] Caused by: Lock held by this virtual machine: /home/joachim/workspaces/devtools/solr-7.7.1/example/cloud/node1/solr/myCollection_shard2_replica_n2/data/index/write.lock
              at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:643) ~[solr-solrj-7.7.1.jar:7.7.1 5bf96d32f88eb8a2f5e775339885cd6ba84a3b58 - ishan - 2019-02-23 02:39:09]
              at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255) ~[solr-solrj-7.7.1.jar:7.7.1 5bf96d32f88eb8a2f5e775339885cd6ba84a3b58 - ishan - 2019-02-23 02:39:09]
              at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244) ~[solr-solrj-7.7.1.jar:7.7.1 5bf96d32f88eb8a2f5e775339885cd6ba84a3b58 - ishan - 2019-02-23 02:39:09]
              at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1260) ~[solr-solrj-7.7.1.jar:7.7.1 5bf96d32f88eb8a2f5e775339885cd6ba84a3b58 - ishan - 2019-02-23 02:39:09]
              at org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:173) ~[solr-core-7.7.1.jar:7.7.1 5bf96d32f88eb8a2f5e775339885cd6ba84a3b58 - ishan - 2019-02-23 02:39:07]
              at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
              at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
              at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176) ~[metrics-core-3.2.6.jar:3.2.6]
              at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209) [solr-solrj-7.7.1.jar:7.7.1 5bf96d32f88eb8a2f5e775339885cd6ba84a3b58 - ishan - 2019-02-23 02:39:09]
              at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
              at java.lang.Thread.run(Thread.java:834) [?:?]
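When sifting through many iterations of log output, it can help to pull the affected core and lock path out of the CREATE error automatically. The sketch below assumes the message wording shown in the log entry above; the regular expression and the function name `parse_lock_error` are illustrative and not part of Solr.

```python
import re

# Matches the "Unable to create core [...] ... Lock held by this virtual
# machine: <path>" message, tolerating a line break before the closing "]".
LOCK_ERROR = re.compile(
    r"Unable to create core \[(?P<core>[^\]\s]+)\s*\]"
    r".*?Lock held by this virtual machine: (?P<lock>\S+)",
    re.DOTALL,
)


def parse_lock_error(log_text):
    """Return (core_name, lock_path) from the first lock error, or None."""
    match = LOCK_ERROR.search(log_text)
    if match is None:
        return None
    return match.group("core"), match.group("lock")
```

The returned core name can then be compared with the core named in the earlier "Unable to reload core" warning to see whether the same replica is involved in both events.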
      

       

       

      Attachments

        1. tortureSolr.sh
          1 kB
          Joachim Sauer

            People

              Assignee: Unassigned
              Reporter: Joachim Sauer (jsauer)
              Votes: 0
              Watchers: 2
