  1. Nutch
  2. NUTCH-802

Problems managing outlinks with large url length


Details

    • Patch Available

    Description

      Nutch can stall while collecting outlinks if an outlink's URL is too long.

      The maximum URL sizes accepted by the main web servers are:

      • Apache: 4,000 bytes
      • Microsoft Internet Information Server (IIS): 16,384 bytes
      • Perl HTTP::Daemon: 8,000 bytes

      URLs longer than 4,000 bytes are problematic, so a limit should be set in the nutch-default.xml configuration file.
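
      The idea of capping outlink URL length can be sketched as below. This is only an illustration and not the attached patch: the class name OutlinkLengthFilter, the constant MAX_URL_BYTES, and both method names are hypothetical, and in a real fix the limit would presumably be read from a property in nutch-default.xml rather than hard-coded.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Nutch: drops outlinks whose URL is too long.
class OutlinkLengthFilter {

    // Assumed default taken from the 4,000-byte figure in the description.
    static final int MAX_URL_BYTES = 4000;

    // Returns true if the URL, encoded as UTF-8, fits within maxBytes.
    static boolean withinLimit(String url, int maxBytes) {
        return url.getBytes(StandardCharsets.UTF_8).length <= maxBytes;
    }

    // Keeps only non-null URLs whose encoded size is within the limit,
    // preserving the original order.
    static List<String> filterOutlinks(List<String> urls, int maxBytes) {
        List<String> kept = new ArrayList<>();
        for (String url : urls) {
            if (url != null && withinLimit(url, maxBytes)) {
                kept.add(url);
            }
        }
        return kept;
    }
}
```

      Calling OutlinkLengthFilter.filterOutlinks(urls, OutlinkLengthFilter.MAX_URL_BYTES) before outlinks are stored would discard oversized URLs early instead of letting them slow down later processing. Measuring in bytes rather than characters matches the units used for the server limits above.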

      I have attached a patch.

      Attachments

        Activity


          People

            ab Andrzej Bialecki
            elaragon Pablo Aragón
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved:
