Crawl https sitemap with http urls #639
Comments
I would try |
No difference with |
Have you tried using |
same issue here. |
That is a tough one because any URL, even within the same site, might support a different scheme. We could investigate a new feature that always tries the "preferred" scheme first and, if that fails, falls back to the other. But for some sites, it could pretty much double the number of "hits" on the server. My preference would be to identify those sites instead and handle them separately, in their own crawler config, let's say. That may not always be realistic, but it is preferable when possible. If you have better options, I would like to hear them. |
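To make the "preferred scheme first, then fall back" idea concrete, here is a minimal Python sketch. It is not Norconex code or configuration: `candidate_urls`, `fetch_with_fallback`, and the injected `fetch` callable are hypothetical names used only for illustration, and the sketch assumes a failed fetch is signalled by returning `None`.

```python
from typing import Callable, List, Optional, Tuple
from urllib.parse import urlsplit, urlunsplit


def candidate_urls(url: str, preferred: str = "https") -> List[str]:
    """Return the URL under the preferred scheme first, then the other one.

    Hypothetical helper: only http/https URLs are rewritten, anything else
    is returned unchanged. Explicit ports are left as they are.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return [url]
    other = "http" if preferred == "https" else "https"
    return [urlunsplit(parts._replace(scheme=s)) for s in (preferred, other)]


def fetch_with_fallback(
    url: str,
    fetch: Callable[[str], Optional[str]],
    preferred: str = "https",
) -> Optional[Tuple[str, str]]:
    """Try each candidate in order and return (url_used, body) on success.

    `fetch` is supplied by the caller and is assumed to return None when
    the request fails. Returns None if every candidate fails.
    """
    for candidate in candidate_urls(url, preferred):
        body = fetch(candidate)
        if body is not None:
            return candidate, body
    return None
```

In the worst case every URL is requested twice, which is the doubling of server hits mentioned above, so a real crawler would probably want to remember which scheme worked per host rather than retry on every URL.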
Hi Pascal,
How can I crawl a sitemap which is reachable over https but contains URLs with http? Norconex is not identifying any startURLs in that case.
I have already tried setting lenient to true,
as well as stayOnProtocol to false.
Any other recommendations?
Best regards
Sascha