SOLR-5075

SolrCloud commit process is too time consuming, even if documents are light


Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 4.1
    • None
    • Environment: SolrCloud 4.1, internal ZooKeeper, 16 shards, custom Java importer.
      Server: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz, 32 cores, 192 GB RAM, 10 TB SSD and 50 TB SAS storage

    Description

      We have a client whose business model requires indexing a billion rows each month from MySQL into Solr within a short time frame. The documents are very light, but there are a great many of them, and we need to sustain around 80-100k documents/s. The built-in Solr indexer tops out at 40-50k/s, slows down as the hours go by, and crashes after roughly 12 hours.

      Therefore we have developed a custom Java importer that connects directly to MySQL and to SolrCloud via ZooKeeper, fetches data from MySQL, builds documents, and imports them into Solr. This helps because we open ~50 threads, which speeds up indexing. We have optimized the MySQL queries (MySQL was the initial bottleneck) and now get over 100k documents/s, but as the index grows, Solr takes longer and longer to add documents. I assume some setting in solrconfig.xml makes Solr stall, and even block, once about 100 million documents have been indexed.
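As an aside to the threading described above: one common way to keep many importer threads from piling up unbounded work when the indexing side slows down is a fixed pool with a bounded queue. The following is a minimal JDK-only sketch, not the importer's actual code. `BoundedIndexPool`, `queueCapacity`, and the `sink` callback are hypothetical names; the sink stands in for whatever call sends a batch to Solr.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical helper: a fixed number of workers fed through a bounded queue.
final class BoundedIndexPool {
    private final ThreadPoolExecutor pool;
    private final AtomicLong processed = new AtomicLong();

    BoundedIndexPool(int threads, int queueCapacity) {
        // CallerRunsPolicy: when the queue is full, the submitting thread runs
        // the task itself, which naturally slows the producer down.
        pool = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
    }

    // Queues one batch; the sink is an assumption standing in for a Solr add call.
    <T> void submit(List<T> batch, Consumer<List<T>> sink) {
        pool.execute(() -> {
            sink.accept(batch);
            processed.addAndGet(batch.size());
        });
    }

    // Stops accepting work, waits for queued batches, returns items processed.
    long shutdownAndAwait(long timeout, TimeUnit unit) {
        pool.shutdown();
        try {
            if (!pool.awaitTermination(timeout, unit)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return processed.get();
    }
}
```

With this shape, a slow sink throttles the thread reading from MySQL instead of letting batches accumulate in memory.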

      Here is the java code that creates documents and then adds to solr server:

      public void createDocuments() throws SQLException, SolrServerException, IOException
      {
          App.logger.write("Creating documents..");
          this.docs = new ArrayList<SolrInputDocument>();
          App.logger.incrementNumberOfRows(this.size);

          // Build one SolrInputDocument per row of the result set.
          while (this.results.next())
          {
              this.docs.add(this.getDocumentFromResultSet(this.results));
          }

          // Close the result set before the statement that produced it.
          this.results.close();
          this.statement.close();
      }

      public void commitDocuments() throws SolrServerException, IOException
      {
          App.logger.write("Committing..");
          App.solrServer.add(this.docs); // here it stays very long and then blocks
          App.logger.incrementNumberOfRows(this.docs.size());
          this.docs.clear();
      }
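The two methods above collect the whole result set in `this.docs` and hand it to a single `add` call. One common alternative is to send bounded batches as rows arrive, so that memory use stays flat and each request stays small. The following is a minimal JDK-only sketch under that assumption. `BatchingSender`, `batchSize`, and the `sink` callback are hypothetical names, and the sink stands in for a call such as `App.solrServer.add(batch)`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: groups items into fixed-size batches and hands each
// batch to a sink (an assumption standing in for a Solr add call).
final class BatchingSender {
    private BatchingSender() {}

    // Returns the total number of items passed to the sink.
    static <T> long sendInBatches(Iterator<T> source, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<>(batchSize);
        long sent = 0;
        while (source.hasNext()) {
            batch.add(source.next());
            if (batch.size() == batchSize) {
                sink.accept(batch);
                sent += batch.size();
                // Start a fresh list because the sink may still hold the old one.
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) {
            // Flush the final, partially filled batch.
            sink.accept(batch);
            sent += batch.size();
        }
        return sent;
    }
}
```

Whether batching helps in this particular setup is not established by the report; the sketch only shows the shape of the technique.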

      I am also pasting the solrconfig.xml parameters that are relevant to this discussion:
      <maxIndexingThreads>128</maxIndexingThreads>
      <useCompoundFile>false</useCompoundFile>
      <ramBufferSizeMB>10000</ramBufferSizeMB>
      <maxBufferedDocs>1000000</maxBufferedDocs>
      <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
        <int name="maxMergeAtOnce">20000</int>
        <int name="segmentsPerTier">1000000</int>
        <int name="maxMergeAtOnceExplicit">10000</int>
      </mergePolicy>
      <mergeFactor>100</mergeFactor>
      <termIndexInterval>1024</termIndexInterval>
      <autoCommit>
        <maxTime>15000</maxTime>
        <maxDocs>1000000</maxDocs>
        <openSearcher>false</openSearcher>
      </autoCommit>
      <autoSoftCommit>
        <maxTime>2000000</maxTime>
      </autoSoftCommit>
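To make the `<autoCommit>` values above concrete, here is a rough back-of-the-envelope estimate of how often a hard commit would fire under steady ingest. It assumes that a commit is triggered by whichever of `maxDocs` or `maxTime` is reached first, that the thresholds apply per core, and (for the second case in the note below) that documents spread evenly over the 16 shards. `AutoCommitEstimate` and `secondsBetweenCommits` are hypothetical names used only for this illustration.

```java
// Hypothetical helper: estimates the interval between hard autoCommits.
final class AutoCommitEstimate {
    private AutoCommitEstimate() {}

    // Seconds until the first of the two thresholds (maxDocs, maxTime) is hit,
    // assuming a constant ingest rate into a single core.
    static double secondsBetweenCommits(double docsPerSecond, long maxDocs, long maxTimeMs) {
        if (docsPerSecond <= 0) {
            throw new IllegalArgumentException("docsPerSecond must be positive");
        }
        double byDocs = maxDocs / docsPerSecond; // time to accumulate maxDocs
        double byTime = maxTimeMs / 1000.0;      // maxTime converted to seconds
        return Math.min(byDocs, byTime);
    }
}
```

With the reported 100k documents/s going to a single core, `secondsBetweenCommits(100_000.0, 1_000_000L, 15_000L)` gives 10.0, so `maxDocs` would trigger first. With the same stream split evenly across 16 shards (6,250 documents/s each), `secondsBetweenCommits(6_250.0, 1_000_000L, 15_000L)` gives 15.0, so `maxTime` would trigger first. The `<autoSoftCommit>` `maxTime` of 2,000,000 ms corresponds to roughly 33 minutes.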

      Thanks a lot for any answers, and please excuse the long text; I'm new to this JIRA. If any other info is needed, please let me know.

          People

            Assignee: Unassigned
            Reporter: Radu Ghita (radu@wmds.ro)