SOLR-12684

Document speed gotchas and partitionKeys usage for ParallelStream

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 7.5, 8.0
    • Component/s: None
    • Labels:
      None

      Description

      The aim of this Jira is to beef up the ref guide around parallel stream.

      There are two things I want to address:

       

      First, usage of partitionKeys:

      This line in the ref guide implies that the partitionKeys of a parallel stream should always be the same as the underlying sort criteria:

      The parallel function maintains the sort order of the tuples returned by the worker nodes, so the sort criteria of the parallel function must match up with the sort order of the tuples returned by the workers.
      

      But as discussed in SOLR-12635, Joel provided an example where fewer keys suffice:

      The hash partitioner just needs to send documents to the same worker node. You could do that with just one partitioning key
      
      For example if you sort on year, month and day. You could partition on year only and still be fine as long as there was enough different years to spread the records around the worker nodes.

      So we should make this more clear in the ref guide.
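
A minimal sketch of Joel's point, under stated assumptions: the record fields, the worker count, and the stand-in hash (`sum(key) % workers`) are all hypothetical, and Solr's real hash function differs. The only property that matters is that equal partition key values always map to the same worker.

```python
import heapq
from collections import defaultdict

WORKERS = 3  # hypothetical worker count


def worker_for(record, partition_keys, workers=WORKERS):
    """Pick a worker from the partition key values only.

    Stand-in hash (assumption): Solr's real hash differs. What matters is
    that equal key values always land on the same worker.
    """
    key = tuple(record[k] for k in partition_keys)
    return sum(key) % workers


def run_parallel(records, sort_fields, partition_keys, workers=WORKERS):
    """Partition the records, sort each worker's share, merge the streams."""
    shares = defaultdict(list)
    for rec in records:
        shares[worker_for(rec, partition_keys, workers)].append(rec)

    def sort_key(rec):
        return tuple(rec[f] for f in sort_fields)

    # Each worker returns its share in the full sort order.
    streams = [sorted(shares[w], key=sort_key) for w in range(workers)]
    # The parallel function merges the already sorted worker streams.
    merged = list(heapq.merge(*streams, key=sort_key))
    return shares, merged


# Sixteen made-up records spread over four years.
records = [
    {"year": y, "month": m, "day": d}
    for y in (2015, 2016, 2017, 2018)
    for m in (1, 6)
    for d in (3, 20)
]
sort_fields = ["year", "month", "day"]

# Sort on year, month and day, but partition on year only.
shares, merged = run_parallel(records, sort_fields, partition_keys=["year"])
```

Because every record with the same year lands on one worker, records that share the full (year, month, day) key are never split across workers, and the merged stream still comes back in the full sort order.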

      Let's also document that, after SOLR-12683, specifying more than 4 partitionKeys will throw an error.
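
A client could catch this before sending the expression. The helper below is a hypothetical sketch, not part of Solr or SolrJ: the function name and error message are made up, and only the limit of 4 comes from this issue.

```python
MAX_PARTITION_KEYS = 4  # limit described in this issue (see SOLR-12683)


def check_partition_keys(partition_keys):
    """Split a comma-separated partitionKeys value and enforce the limit.

    Hypothetical client-side check; Solr performs its own validation.
    """
    keys = [k.strip() for k in partition_keys.split(",") if k.strip()]
    if len(keys) > MAX_PARTITION_KEYS:
        raise ValueError(
            "partitionKeys has %d keys, at most %d are allowed"
            % (len(keys), MAX_PARTITION_KEYS)
        )
    return keys
```

For example, `check_partition_keys("year_i,month_i")` returns the two keys, while a value listing five fields raises `ValueError`.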

       

      At this point the user will understand how to use partitionKeys: they are related to the sort criteria but need not include all of the sort fields.

       

      We should then mention a trick: the user can warm up the hash queries, since they are always run on the whole document set (irrespective of the filter criteria).

      Also, users should only use parallel when the number of docs still matching after the filter criteria are applied is very large.

      <listener event="newSearcher" class="solr.QuerySenderListener">
        <arr name="queries">
          <!-- One warming query per worker: workers=6, worker=0..5 -->
          <lst><str name="q">*:*</str><str name="fq">{!hash workers=6 worker=0}</str><str name="partitionKeys">myPartitionKey</str></lst>
          <lst><str name="q">*:*</str><str name="fq">{!hash workers=6 worker=1}</str><str name="partitionKeys">myPartitionKey</str></lst>
          <lst><str name="q">*:*</str><str name="fq">{!hash workers=6 worker=2}</str><str name="partitionKeys">myPartitionKey</str></lst>
          <lst><str name="q">*:*</str><str name="fq">{!hash workers=6 worker=3}</str><str name="partitionKeys">myPartitionKey</str></lst>
          <lst><str name="q">*:*</str><str name="fq">{!hash workers=6 worker=4}</str><str name="partitionKeys">myPartitionKey</str></lst>
          <lst><str name="q">*:*</str><str name="fq">{!hash workers=6 worker=5}</str><str name="partitionKeys">myPartitionKey</str></lst>
        </arr>
      </listener>
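
Since the listener needs one entry per worker, the entries can be generated rather than written by hand. This is a hedged sketch: the function name is hypothetical, and it assumes the warming query is the match-all query `*:*` with the same three parameters as the entries above.

```python
from xml.sax.saxutils import escape


def warming_queries(workers, partition_keys):
    """Build one <lst> warming entry per worker for a QuerySenderListener.

    Hypothetical helper: it only formats text for solrconfig.xml and
    assumes a match-all query (*:*) as the warming query.
    """
    entries = []
    for worker in range(workers):
        fq = "{!hash workers=%d worker=%d}" % (workers, worker)
        entries.append(
            '<lst><str name="q">*:*</str>'
            '<str name="fq">%s</str>'
            '<str name="partitionKeys">%s</str></lst>'
            % (escape(fq), escape(partition_keys))
        )
    return entries


if __name__ == "__main__":
    # Print the six entries for workers=6 and a single partition key.
    for entry in warming_queries(6, "myPartitionKey"):
        print(entry)
```

The worker count passed here has to match the number of workers used by the parallel expression, because the hash filter is specific to each (workers, worker) pair.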

       

        Attachments

        1. SOLR-12684.patch
          4 kB
          Varun Thacker
        2. SOLR-12684.patch
          4 kB
          Varun Thacker
        3. SOLR-12684.patch
          4 kB
          Amrit Sarkar
        4. SOLR-12684.patch
          18 kB
          Amrit Sarkar

              People

              • Assignee: Varun Thacker
              • Reporter: Varun Thacker
              • Votes: 0
              • Watchers: 3
