  1. Jackrabbit Oak
  2. OAK-6587

Provide a way to "force" Tika to treat binaries with a different mime type than the jcr:mimeType property

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.7.7, 1.8.0
    • Component/s: lucene
    • Labels: None

    Description

      There are occasions when an existing Tika parser could parse a binary, but Tika doesn't "think" it can because the binary's mime type isn't in the parser's list of supported types. There appears to be no way to configure this in Tika: editing the config.xml file only allows types that a parser already declares as supported to be mapped to different parsers; it doesn't change the set of supported types.

      To deal with this, I'd like to add a new configuration node structure named mimeTypes under the tika node of lucene indexes. Using this structure, a mapped type can be defined for a given mime type, and the mapped type will be used in place of the jcr:mimeType value for interaction with Tika.
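
      The lookup this proposal implies can be sketched roughly as follows. This is only an illustration of the idea in Python, not the Oak implementation: the nested dictionary stands in for the mimeTypes node structure, and the layout (type/subtype child nodes), the property name mappedType, the example types, and the function name resolve_mime_type are all assumptions made for the example.

```python
# Hypothetical stand-in for <index>/tika/mimeTypes. Assumed layout:
# one child per main type, one grandchild per subtype, and a property
# (here called "mappedType", an assumed name) holding the type to use.
MIME_TYPES_NODE = {
    "application": {
        "vnd.mycompany-document": {"mappedType": "application/pdf"},
    },
}


def resolve_mime_type(jcr_mime_type, mime_types_node):
    """Return the type to hand to Tika.

    If a mapping is configured for the binary's jcr:mimeType, the mapped
    type is returned; otherwise the original value is returned unchanged.
    """
    if not jcr_mime_type:
        return jcr_mime_type
    # Ignore parameters such as "; charset=utf-8" and compare case-insensitively.
    base = jcr_mime_type.split(";", 1)[0].strip().lower()
    if "/" not in base:
        return jcr_mime_type
    main_type, sub_type = base.split("/", 1)
    entry = mime_types_node.get(main_type, {}).get(sub_type, {})
    return entry.get("mappedType", jcr_mime_type)
```

      With such a mapping in place, a binary stored as application/vnd.mycompany-document would be presented to Tika as application/pdf, while types without a mapping pass through untouched.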

          People

            Assignee: Chetan Mehrotra (chetanm)
            Reporter: Justin Edelson (justinedelson)
            Votes: 0
            Watchers: 2
