Uploaded image for project: 'Jackrabbit Oak'
  1. Jackrabbit Oak
  2. OAK-1966

Add Hint for selecting more performant index in MongoDocumentStore#query

    XMLWordPrintableJSON

Details

    • Improvement
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 1.0.3, 1.1.0
    • mongomk
    • None

    Description

      In MongoDocumentStore#query we make a call like

      db.nodes.find({ _id: { $gt: "3:/content/foo/01/", $lt: "3:/content/foo010" }, _modified: { $gte: 1405085300 } }).sort({_id:1})

      Further we have indexes

      • {_id : 1}
      • {_modified : -1}

      Depending on scenario one of the two index would perform better

      • If very few changes have happened in the time interval then _modified would perform better
      • In other cases _id index would perform better

      Ideally this should be decided by Mongo Query planner but it seems that at times it is not making the right choice. For example getting plan for the query with same shape yields following results

      A - planSummary: COLLSCAN ntoreturn:0 ntoskip:0 nscanned:19046849 nscannedObjects:19046849 scanAndOrder:1 keyUpdates:0 numYields:41359 locks(micros) r:103894585 nreturned:2 reslen:1173 61334ms
      B - 2014-07-10T15:59:19.994-0400 [conn1365] query aem-author.nodes query: {$query: { _id:{ $gt: "3:/content/foo/01/", $lt: "3:/content/foo010" }, _modified:{ $gte: 1404245090 }}, $orderby:{ _id: 1 }} planSummary: IXSCAN{ _modified: -1 } ntoreturn:0 ntoskip:0 nscanned:15626664 nscannedObjects:15626664 scanAndOrder:1 keyUpdates:0 numYields:324016 locks(micros) r:453960384 nreturned:1 reslen:582 972125ms
      C - 2014-07-11T15:22:44.579-0400 [conn1387] query aem-author.nodes query: { $query: { _id: { $gt: "4:/oak:index/uuid/:index/", $lt: "4:/oak:index/uuid/:index0" }, _modified: { $gte: 1405106530 } }, $orderby: { _id: 1 }, $hint: { _id: 1 } } planSummary: IXSCAN { _id: 1 } ntoreturn:0 ntoskip:0 nscanned:701631 nscannedObjects:701631 keyUpdates:0 numYields:42 locks(micros) r:4471112 nreturned:17 reslen:6557 2540ms 
      

      So Mongo used a BasicCursor, _id index, _modified index in different runs. Now lets see whats the difference between time of query and _modified and nscanned

      • B - 15626664, 8 days, _modified - Should have used _id index as duration is too large
      • C - 701631, 34 sec, _id - Might have used modified index as duration to check for is less

      Mongo 2.6 uses heuristics to determine which plan to use. As mentioned in SERVER-13866

      Some background: to choose a query plan to use for a given query when multiple candidate plans exist, the query engine runs each candidate plan and then picks the plan that produced the most results during a trial period on a subset of the data to be scanned. The winning query plan is then cached, and used for subsequent queries of the same shape until the cache entry is invalidated (which happens under certain conditions, such as when the data distribution in the collection changes sufficiently or when the chosen query plan performs consistently much worse than it did during initial selection).

      So at times Mongo might make a right guess at times not! So we need to determine ways such that right index is used by Mongo to execute a given query

      Attachments

        1. out.csv
          41 kB
          Chetan Mehrotra

        Issue Links

          Activity

            People

              chetanm Chetan Mehrotra
              chetanm Chetan Mehrotra
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: