  1. Nutch
  2. NUTCH-1098

better url-normalizer basic

Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Invalid
    • Affects Version/s: 1.3
    • Fix Version/s: 1.5
    • Component/s: fetcher
    • Environment: Any
    • Patch Info: Patch Available

    Description

      The basic URL normalizer lacks two important features:

      Encoding spaces in URLs as %20, so that httpclient and possibly other clients that do not expect a space inside a URL do not break.

      Decoding percent-encoded characters in URLs, such as %33 (the character 3). This is important for avoiding duplicates.
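
The two requested behaviours can be illustrated with a minimal Java sketch. It is not the attached patch and not the code of Nutch's basic normalizer; the class name UrlEscapeSketch and the method normalize are hypothetical. It assumes that only escapes of RFC 3986 "unreserved" characters (letters, digits, "-", ".", "_", "~") are safe to decode, leaves every other escape untouched, and does not handle non-ASCII characters.

```java
public final class UrlEscapeSketch {
    private UrlEscapeSketch() {}

    // Unreserved characters per RFC 3986 section 2.3; decoding their
    // percent-escapes does not change the meaning of the URL.
    private static boolean isUnreserved(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
            || (c >= '0' && c <= '9')
            || c == '-' || c == '.' || c == '_' || c == '~';
    }

    // Value of an ASCII hex digit, or -1 if c is not one.
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    // Hypothetical helper: encodes spaces as %20 and decodes escapes of
    // unreserved characters (for example %33 becomes 3).
    public static String normalize(String url) {
        StringBuilder out = new StringBuilder(url.length() + 8);
        int i = 0;
        while (i < url.length()) {
            char c = url.charAt(i);
            if (c == ' ') {
                out.append("%20");
                i++;
            } else if (c == '%' && i + 2 < url.length()
                    && hexValue(url.charAt(i + 1)) >= 0
                    && hexValue(url.charAt(i + 2)) >= 0) {
                char decoded = (char) (hexValue(url.charAt(i + 1)) * 16
                        + hexValue(url.charAt(i + 2)));
                if (isUnreserved(decoded)) {
                    out.append(decoded);           // safe to decode
                } else {
                    out.append(url, i, i + 3);     // keep the escape as written
                }
                i += 3;
            } else {
                out.append(c);                     // ordinary character or stray '%'
                i++;
            }
        }
        return out.toString();
    }
}
```

With this sketch, UrlEscapeSketch.normalize("http://example.com/a b/%33.html") returns "http://example.com/a%20b/3.html", while an escape of a reserved character such as %2F is kept as written, because decoding it would change the path structure.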

      Attachments

        1. patch-with-utf8-encoding.diff
          8 kB
          Markus Jelsma

          People

            Assignee: Unassigned
            Reporter: Radim Kolar (hsn)
            Votes: 0
            Watchers: 0

              Time Tracking

                Estimated: 4h
                Remaining: 4h
                Logged: Not Specified