Description
When running large crawls, we would like to know how many URLs are being discarded by the regex filters. Currently this count is only reported by the Injector class:
Injector: Total number of urls rejected by filters: 0
It would be nice to have a similar counter in the CrawlDb class, so that in every round we know how many URLs were discarded by our filters:
CrawlDb update: Total number of URLs filtered by regex filters: 31415
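To illustrate the kind of counter being requested, here is a minimal, self-contained sketch in plain Java. It is not the actual Nutch code: the class name FilterCounterSketch, the REJECT pattern, and the sample URLs are all hypothetical, and the only convention it assumes is that a filter returns null for a rejected URL.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch, not the Nutch implementation.
public class FilterCounterSketch {

    // Assumed exclusion rule for illustration only; real rules would come
    // from the regex filter configuration.
    private static final Pattern REJECT =
            Pattern.compile(".*\\.(gif|jpg|png|css|js)$");

    // Number of URLs rejected so far.
    private long filtered = 0;

    /** Returns the URL unchanged if it passes, or null if it is rejected. */
    public String filter(String url) {
        if (REJECT.matcher(url).matches()) {
            filtered++; // count every rejected URL
            return null;
        }
        return url;
    }

    public long getFiltered() {
        return filtered;
    }

    public static void main(String[] args) {
        FilterCounterSketch sketch = new FilterCounterSketch();
        List<String> urls = Arrays.asList(
                "http://example.com/index.html",
                "http://example.com/logo.png",
                "http://example.com/style.css");
        for (String url : urls) {
            sketch.filter(url);
        }
        // Report the tally in the format proposed above.
        System.out.println(
                "CrawlDb update: Total number of URLs filtered by regex filters: "
                        + sketch.getFiltered());
    }
}
```

In a real MapReduce job the tally would presumably be kept in a Hadoop job counter rather than an instance field, so that values from all tasks are aggregated before the final log line is printed.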