  1. Nutch
  2. NUTCH-1929

Consider implementing dependency injection for crawling HTTPS sites that use self-signed certificates


Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved

    Description

      It was mentioned a while ago that being able to crawl sites with a self-signed certificate "required a simple code modification [to] the protocol-httpclient plugin."

      In org.apache.nutch.protocol.httpclient.Http:
      
      Replace:
      
      ProtocolSocketFactory factory = new SSLProtocolSocketFactory();
      
      With:
      
      ProtocolSocketFactory factory = new DummySSLProtocolSocketFactory();
      

      I can confirm that this change fixes the issue. However, the thread ends on a question that was never answered:

      "Is there dependency injection that can be used?"

      This issue should investigate what logic we can implement so that the choice of socket factory is made at runtime rather than hard-coded.
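
One possible shape for such a runtime decision, sketched under assumptions: the factory class name is read from configuration and instantiated by reflection, falling back to the strict factory when nothing is set. To keep the sketch compilable on its own, the nested `SocketFactory`, `StrictFactory` and `TrustAllFactory` types are hypothetical stand-ins for `ProtocolSocketFactory`, `SSLProtocolSocketFactory` and `DummySSLProtocolSocketFactory`; `java.util.Properties` stands in for the Nutch configuration object; and the property name `http.ssl.socket.factory.class` is invented for illustration, not an existing Nutch setting.

```java
import java.util.Properties;

public class SocketFactorySelection {

    /** Hypothetical stand-in for ProtocolSocketFactory so the sketch compiles alone. */
    interface SocketFactory {
        String describe();
    }

    /** Stand-in for SSLProtocolSocketFactory (validates certificates). */
    public static class StrictFactory implements SocketFactory {
        public String describe() { return "strict"; }
    }

    /** Stand-in for DummySSLProtocolSocketFactory (accepts self-signed certificates). */
    public static class TrustAllFactory implements SocketFactory {
        public String describe() { return "trust-all"; }
    }

    // Assumed property name, invented for this sketch.
    static final String FACTORY_KEY = "http.ssl.socket.factory.class";

    /** Instantiates the configured factory, defaulting to the strict one. */
    static SocketFactory createFactory(Properties conf) {
        String className = conf.getProperty(FACTORY_KEY, StrictFactory.class.getName());
        try {
            return Class.forName(className)
                    .asSubclass(SocketFactory.class)   // reject classes of the wrong type
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // No property set: the certificate-validating default is used.
        System.out.println(createFactory(conf).describe());

        // Opt in to the permissive factory for sites with self-signed certificates.
        conf.setProperty(FACTORY_KEY, TrustAllFactory.class.getName());
        System.out.println(createFactory(conf).describe());
    }
}
```

Defaulting to the strict factory keeps certificate validation on unless a crawl explicitly opts out, which seems the safer choice; whether this is done through a plain property, a plugin extension point, or a real dependency injection container is exactly the open question in this issue.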

            People

              Assignee: Lewis John McGibbney (lewismc)
              Reporter: Lewis John McGibbney (lewismc)
              Votes: 0
              Watchers: 1
