Nutch / NUTCH-1458

Support for raw HTML field added to Solr

Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: 1.5.1
    • Fix Version/s: None
    • Component/s: indexer, parser

Description

    At the moment, the “content” field holds only the text parsed from the page. It would be useful to have a separate field in the Solr document that holds the raw HTML of the crawled page.
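The difference between the two fields can be sketched in Python. This is a minimal illustration under stated assumptions, not Nutch code: the naive text extractor below only stands in for Nutch's HTML parser, and the field name "rawcontent" (like the helper names build_solr_doc and _TextExtractor) is hypothetical, not an existing Nutch or Solr field.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical stand-in for an HTML parser: collects visible text,
    skipping the bodies of <script> and <style> elements."""

    def __init__(self) -> None:
        super().__init__()
        self._chunks: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside <script>/<style>.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def build_solr_doc(url: str, raw_html: str) -> dict:
    """Build a Solr-style document holding both parsed text and raw HTML.

    "content" mirrors the current behaviour (parsed text only);
    "rawcontent" is a hypothetical extra field carrying the page markup.
    """
    extractor = _TextExtractor()
    extractor.feed(raw_html)
    extractor.close()
    return {
        "id": url,
        "content": extractor.text(),
        "rawcontent": raw_html,
    }


if __name__ == "__main__":
    page = "<html><head><style>p{}</style></head><body><p>Hello <b>Solr</b></p></body></html>"
    doc = build_solr_doc("http://example.com/", page)
    print(doc["content"])
```

Keeping the markup in its own field would leave the existing "content" field, and any searches built on it, unchanged.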

People

    • Assignee: Unassigned
    • Reporter: Max Dzyuba (dzyubam)
    • Votes: 0
    • Watchers: 2