Jackrabbit Oak / OAK-9625

Support ordered index for first value of a multi-valued property, node name, and path


Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.42.0
    • Component/s: indexing, lucene, query
    • Labels: None

    Description

      Keyset pagination (https://jackrabbit.apache.org/oak/docs/query/query-engine.html#Keyset_Pagination) requires an ordered index on a property.

      If all we have is a property "x" that is set on "nt:base" (or a similar node type), then an ordered index on the property "x" can be used for pagination. However, if the property is sometimes multi-valued, this is not possible, because ordered indexes on multi-valued properties are not supported. An ordered function index on the first value of the property would allow it:

      /jcr:root//element(*, nt:base)
      [jcr:first(@alias) >= $lastEntry]
      order by jcr:first(@alias), @jcr:path
      
      /oak:index/aliasIndex
        - type = lucene
        - compatVersion = 2
        - async = async
        - includedPaths = [ "/" ]
        - queryPaths = [ "/" ]
        + indexRules
          + nt:base
            + properties
              + firstAlias
                - function = "first([alias])"
                - propertyIndex = true
                - ordered = true
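
As a rough illustration of why the first value gives a usable sort key, the plain-Java sketch below models an ordered index keyed on the first value of a possibly multi-valued "alias" property. It does not use the Oak API. The class and method names and the sample data are hypothetical, and skipping nodes that have no value at all is an assumption of this sketch, not something the issue states.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of an ordered index on first([alias]); not the Oak API.
final class FirstValueSketch {

    // Maps the first value of each node's "alias" property to the node paths.
    // Nodes with no value are skipped (an assumption of this sketch).
    static TreeMap<String, List<String>> buildIndex(Map<String, List<String>> aliasByPath) {
        TreeMap<String, List<String>> index = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : aliasByPath.entrySet()) {
            List<String> values = e.getValue();
            if (values.isEmpty()) {
                continue;
            }
            // Only the first value becomes the sort key; later values are ignored.
            index.computeIfAbsent(values.get(0), k -> new ArrayList<>()).add(e.getKey());
        }
        return index;
    }

    // Mimics "[jcr:first(@alias) >= lastEntry] order by jcr:first(@alias)".
    static List<String> keysFrom(TreeMap<String, List<String>> index, String lastEntry) {
        return new ArrayList<>(index.tailMap(lastEntry, true).keySet());
    }
}
```

Each node contributes exactly one key, so the ordering is well defined even when some nodes hold several values.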
      

      If the property is set on a mixin type (or a primary node type), the index can be much smaller, because only nodes of that type need to be indexed. Even then, pagination needs something to order by. One option is to order by the lower-case version of the node name, but that is an odd workaround, and node names are not necessarily unique, which complicates things further. It would be better to be able to define an ordered index on the path itself, which is unique:

      select [jcr:path], * from [nt:file] as a
      where path(a) >= $lastEntry
      and isdescendantnode(a, '/content')
      order by path(a)
      
      /oak:index/fileIndex
        - type = lucene
        - compatVersion = 2
        - async = async
        - includedPaths = [ "/content" ]
        - queryPaths = [ "/content" ]
        + indexRules
          + nt:file
            + properties
              + path
                - function = "path()"
                - propertyIndex = true
                - ordered = true
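
The pagination loop that such a path-ordered index would serve can be sketched in plain Java, with an in-memory sorted set standing in for the index. This is only a model, not the Oak API, and the class and method names are hypothetical. Because the condition is ">=", each page begins with the last entry of the previous page, which the caller drops.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

// Hypothetical in-memory stand-in for an ordered index on path(); not the Oak API.
final class PathKeysetSketch {

    // Mimics "where path() >= lastEntry order by path()" with a row limit.
    static List<String> page(NavigableSet<String> index, String lastEntry, int limit) {
        List<String> rows = new ArrayList<>();
        for (String path : index.tailSet(lastEntry, true)) {
            if (rows.size() == limit) {
                break;
            }
            rows.add(path);
        }
        return rows;
    }

    // Reads all paths page by page. Paths are unique, so at most one row
    // (the previous lastEntry) repeats at the start of each page.
    static List<String> readAll(NavigableSet<String> index, int limit) {
        if (limit < 2) {
            // With ">=" a page of size 1 would return the same row forever.
            throw new IllegalArgumentException("limit must be at least 2");
        }
        List<String> all = new ArrayList<>();
        String lastEntry = "";
        while (true) {
            List<String> rows = page(index, lastEntry, limit);
            for (String path : rows) {
                if (path.compareTo(lastEntry) > 0) {
                    all.add(path);
                }
            }
            if (rows.size() < limit) {
                return all;
            }
            lastEntry = rows.get(rows.size() - 1);
        }
    }
}
```

For example, `PathKeysetSketch.readAll(new TreeSet<>(paths), 100)` would walk a hypothetical collection `paths` in pages of 100. Note that the sketch orders by Java string comparison, which is an assumption and may differ from the order the index uses.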
      

      It would also be good if ordering by node name used the function index. Test case:

      select [jcr:path], * from [nt:file] as a
      where name(a) >= $lastEntry
      and isdescendantnode(a, '/content')
      order by name(a), [jcr:path]
      
      /oak:index/fileIndex
        - type = lucene
        - compatVersion = 2
        - async = async
        - includedPaths = [ "/content" ]
        - queryPaths = [ "/content" ]
        + indexRules
          + nt:file
            + properties
              + nodeName
                - function = "name()"
                - propertyIndex = true
                - ordered = true
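
Because node names are not unique, a caller paging by name has to use the path as a tie-breaker and skip rows it has already seen. The plain-Java sketch below models that under stated assumptions: it is an in-memory model, not the Oak API, and the class, method, and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical in-memory model of "order by name(a), [jcr:path]"; not the Oak API.
final class NameKeysetSketch {

    static final class Row {
        final String name;
        final String path;

        Row(String name, String path) {
            this.name = name;
            this.path = path;
        }
    }

    // Order by name first, then by path as the unique tie-breaker.
    static final Comparator<Row> ORDER =
            Comparator.comparing((Row r) -> r.name).thenComparing((Row r) -> r.path);

    // Mimics "where name(a) >= lastName order by name(a), [jcr:path]" with a row limit.
    static List<Row> query(List<Row> nodes, String lastName, int limit) {
        List<Row> sorted = new ArrayList<>(nodes);
        sorted.sort(ORDER);
        List<Row> rows = new ArrayList<>();
        for (Row r : sorted) {
            if (r.name.compareTo(lastName) >= 0 && rows.size() < limit) {
                rows.add(r);
            }
        }
        return rows;
    }

    // Keeps only rows strictly after (lastName, lastPath), i.e. rows not yet seen.
    static List<String> unseenPaths(List<Row> rows, String lastName, String lastPath) {
        Row last = new Row(lastName, lastPath);
        List<String> paths = new ArrayList<>();
        for (Row r : rows) {
            if (ORDER.compare(r, last) > 0) {
                paths.add(r.path);
            }
        }
        return paths;
    }
}
```

In this model, if more nodes share one name than fit into a page, the query keeps returning the same rows and the caller makes no progress, so the page size would have to exceed the largest group of equal names. Ordering by path alone avoids the problem.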
      

            People

              Assignee: Thomas Mueller (thomasm)
              Reporter: Thomas Mueller (thomasm)