  1. Tika
  2. TIKA-3476

Remove tag reports from default tika-eval reports

Details

    • Type: Task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.0
    • Component/s: None
    • Labels: None

    Description

      tika-eval can run on XHTML output from Tika. When it does, it counts the tags in each file, and it can then report sums of those tags per file type and comparisons of the tags extracted.

      When tika-eval is run against text output from Tika, the tag queries still take 30 seconds per tag type on a million files because of the joins.

      In Tika 2.x, let's turn off tag reports by default, but allow users to include them if needed with the existing -rf (reports file) command-line option.
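
A rough sketch of the proposed behaviour, for discussion only. This is not tika-eval code: the `Report` type, the report names, and `select_reports` are all hypothetical, and the sketch assumes a reports file is just an explicit list of reports to run. The idea is that tag reports are skipped unless the user supplies their own list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Report:
    """Hypothetical report definition; not a tika-eval class."""
    name: str
    is_tag_report: bool = False


# Illustrative catalogue only; these names are assumptions, not real tika-eval reports.
ALL_REPORTS = [
    Report("mime_counts"),
    Report("exceptions_by_type"),
    Report("tag_counts_by_mime", is_tag_report=True),
    Report("tag_diffs", is_tag_report=True),
]


def select_reports(custom_reports: Optional[List[Report]] = None) -> List[Report]:
    """Return the reports to run.

    If the user supplied an explicit list (standing in for a reports file),
    run exactly that list. Otherwise run the defaults, which leave out the
    tag reports.
    """
    if custom_reports is not None:
        return list(custom_reports)
    return [r for r in ALL_REPORTS if not r.is_tag_report]
```

Calling `select_reports()` with no argument returns only the non-tag reports, while passing a list that includes the tag reports opts back in to them.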

          People

            Assignee: Unassigned
            Reporter: Tim Allison (tallison)