Jackrabbit Oak / OAK-4585

Text extraction: runtime status monitoring

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.4.7, 1.5.8, 1.6.0
    • Component/s: lucene
    • Labels: None

    Description

      Text extraction is sometimes slow, and, in case of a bug in the text extraction library, can even get stuck in an endless loop.

      Right now, it is hard to tell what text extraction is doing, even when looking at full thread dumps. (Debug) log information about the current state of text extraction would be nice as well.

      I suggest we add debug-level logging for the binary currently being extracted (its content identity). For larger binaries, we can also temporarily change the thread name (append "Extracting <contentIdentity>"). That way, it is relatively easy to see whether text extraction is stuck simply by looking at full thread dumps, without having to change the log level and then reindex.
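
      The proposal above could be sketched roughly as follows. This is only an illustration, not the change that was committed for this issue: the class name ExtractionMonitor, the method monitored, and the 1 MB threshold are hypothetical, java.util.logging (level FINE) stands in for whatever debug logger the indexer actually uses, and Supplier is used only to keep the example free of checked exceptions.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not part of Oak: wraps one text extraction call.
final class ExtractionMonitor {
    private static final Logger LOG =
            Logger.getLogger(ExtractionMonitor.class.getName());

    // Assumed cut-off for "larger binaries"; the issue does not name a value.
    static final long LARGE_BINARY_BYTES = 1024 * 1024;

    static <T> T monitored(String contentIdentity, long length,
                           Supplier<T> extraction) {
        // Debug-level record of which binary is being extracted.
        LOG.log(Level.FINE, "Extracting {0} ({1} bytes)",
                new Object[] {contentIdentity, length});

        if (length < LARGE_BINARY_BYTES) {
            // Small binary: leave the thread name untouched.
            return extraction.get();
        }

        Thread current = Thread.currentThread();
        String oldName = current.getName();
        // The suffix makes the binary visible in a full thread dump.
        current.setName(oldName + " Extracting " + contentIdentity);
        try {
            return extraction.get();
        } finally {
            // Always restore the name, even if extraction throws.
            current.setName(oldName);
        }
    }
}
```

      With such a wrapper, a thread stuck in extraction would show up in a thread dump with the content identity in its name, and the original name comes back once extraction returns or fails.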

          People

            Assignee: Thomas Mueller (thomasm)
            Reporter: Thomas Mueller (thomasm)