Jackrabbit Oak / OAK-9596

Full text search using Lucene index for binary content


Details

    • Type: Task
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: indexing, lucene
    • Labels: None

    Description

      I am trying out Jackrabbit Oak with Lucene on a file node store. The index definition node is created successfully, but the index itself does not seem to be built. The code that sets up the repository and creates the Lucene index:

       

      public void initRepository() {
          LuceneIndexProvider provider = new LuceneIndexProvider();
          Jcr jcr = new Jcr(nodeStore)
                  .withAsyncIndexing("async", 3)
                  .with(new LuceneIndexEditorProvider())
                  .with((QueryIndexProvider) provider)
                  .with((Observer) provider)
                  .withAsyncIndexing("async", 3);
          repository = jcr.createRepository();
          log.info("Repository initialized");
      }

      public void createLuceneIndex() throws RepositoryException {
          Session session = createAdminSession();
          Node lucene = JcrUtils.getOrCreateByPath("/oak:index/lucene", "oak:Unstructured",
                  "oak:QueryIndexDefinition", session, false);
          lucene.setProperty("compatVersion", 2);
          lucene.setProperty("type", "lucene");
          lucene.setProperty("async", "async");
          Node rules = lucene.addNode("indexRules", "nt:unstructured");
          Node allProps = rules.addNode("nt:base")
                  .addNode("properties", "nt:unstructured")
                  .addNode("allProps", "oak:Unstructured");
          allProps.setProperty(Property.JCR_DATA, ".*");
          allProps.setProperty("isRegexp", true);
          allProps.setProperty("nodeScopeIndex", true);
          session.save();
          session.logout();
          log.info("Lucene index created");
      }
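
For comparison, the definition that the code above seems to be aiming for is sketched here in the node-tree notation used by the Oak documentation (`+` is a child node, `-` is a property). It assumes the rule layout from the Oak Lucene documentation, where the regular expression is stored in a property called `name`. The snippet above stores `".*"` under `jcr:data` instead, which may be why the rule never matches any property. This is a sketch of the documented shape, not a confirmed fix for this issue.

```
/oak:index/lucene
  - jcr:primaryType = "oak:QueryIndexDefinition"
  - compatVersion = 2
  - type = "lucene"
  - async = "async"
  + indexRules (nt:unstructured)
    + nt:base
      + properties (nt:unstructured)
        + allProps (oak:Unstructured)
          - name = ".*"
          - isRegexp = true
          - nodeScopeIndex = true
```

In the Java code this would correspond to calling `allProps.setProperty("name", ".*")` in place of the `Property.JCR_DATA` call.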
      

       

      After creating the Lucene index, I uploaded a test.doc file to the node store using the code below:

       

      log.info("Setting the JCR content for file name: test.doc, under path: " + folderNode.getPath());
      final Binary binary = new BinaryImpl(fileBytes);
      final Node content = folderNode.addNode(Property.JCR_CONTENT, NodeType.NT_RESOURCE);
      content.setProperty(Property.JCR_DATA, binary);
      //JCR session save code here
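
A possible gap in the upload code above: as far as I know, Oak's Lucene indexer only runs text extraction on a binary when the same node also carries a `jcr:mimeType` property, and the snippet sets only `jcr:data`. Under that assumption, the content node would need to look roughly like this (node-tree notation; `<folderNode>` stands in for the actual parent path, and `application/msword` is assumed as the MIME type for a .doc file):

```
<folderNode>
  + jcr:content (nt:resource)
    - jcr:mimeType = "application/msword"
    - jcr:data = <binary content of test.doc>
```

Extracting text from a Word document also appears to depend on the Apache Tika parsers being available on the classpath, so that is worth checking as well.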
      

       

      test.doc file contents:

      HelloWorld, Test file contents for Full text search using Lucene index.

       

      I used the query below to fetch results:

       

      final Query query = queryManager.createQuery(
               "select * from [nt:base] where contains(*, 'HelloWorld')",
               Query.JCR_SQL2);
      final QueryResult result = query.execute();
      final NodeIterator nodeIter = result.getNodes();
      log.info("Number of nodes: " + nodeIter.getSize());

      But this query does not return the node where the file contents are stored. I get the following result:

      Number of nodes: 0

      Logs:

      "Traversal query (query without index): select * from [nt:base] where contains(*,'HelloWorld'); consider creating an index"
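
Because the index is declared with `async = "async"` and the repository is built with `withAsyncIndexing("async", 3)`, the index is updated by a background job rather than at `session.save()`. A query issued right after the upload can therefore run before the indexer has caught up. Below is a minimal sketch of a retry loop that uses only the JDK. `AsyncIndexWait` and `waitUntil` are hypothetical names, and the lambda in `main` is only a stand-in for "the query returned at least one node".

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

final class AsyncIndexWait {

    /**
     * Re-runs the check until it returns true or the timeout elapses.
     * Returns false on timeout or if the waiting thread is interrupted.
     */
    static boolean waitUntil(BooleanSupplier check, Duration timeout, Duration pause) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (check.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pause.toMillis());
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in condition: succeeds on the third attempt.
        int[] attempts = {0};
        boolean found = waitUntil(() -> ++attempts[0] >= 3,
                Duration.ofSeconds(2), Duration.ofMillis(10));
        System.out.println("found=" + found + " after " + attempts[0] + " attempts");
    }
}
```

In the scenario above, the check would re-run the `contains(*, 'HelloWorld')` query and test whether the result has any nodes, with a timeout comfortably longer than the 3-second indexing interval. If the query still falls back to traversal after several indexing cycles, the delay is not the cause and the index definition itself is the more likely problem.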

      Could you please let me know where I am making a mistake, or how to create a proper index and run a full-text search on binary content?

          People

            Assignee: Unassigned
            Reporter: Ankush Nagapure (ankush28)