NUTCH-2990 HttpRobotRulesParser to follow 5 redirects as specified by RFC 9309 #779
Merged
sebastian-nagel merged 3 commits on Oct 21, 2023
… RFC 9309
- follow multiple redirects when fetching robots.txt
- the number of followed redirects is configurable via the property http.robots.redirect.max (default: 5)
- bug fix: the passed agent names need to be transferred to the property http.robots.agents earlier, before the protocol plugins are configured
- more verbose debug logging
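The redirect handling described in the commit message can be illustrated with a minimal sketch. This is not the code from the PR: the class `RobotsRedirectSketch`, the `Response` holder, and the `fetch` function are hypothetical stand-ins for the protocol plugin, and treating every 3xx status as a redirect is a simplifying assumption. Only the idea of a configurable limit (default 5, as for http.robots.redirect.max) is taken from the description above.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch, not the Nutch implementation.
class RobotsRedirectSketch {

  /** Minimal stand-in for a protocol response: status code and Location header (may be null). */
  static final class Response {
    final int status;
    final String location;

    Response(int status, String location) {
      this.status = status;
      this.location = location;
    }
  }

  /**
   * Follows at most maxRedirects redirects, starting at robotsUrl.
   * Returns the URL that finally answered with a non-redirect status,
   * or null if the limit was exceeded or a redirect had no target.
   */
  static String resolve(String robotsUrl, int maxRedirects,
      Function<String, Response> fetch) {
    String url = robotsUrl;
    for (int followed = 0;; followed++) {
      Response response = fetch.apply(url);
      // Assumption: any 3xx status counts as a redirect.
      boolean isRedirect = response.status >= 300 && response.status < 400;
      if (!isRedirect) {
        return url;
      }
      if (response.location == null || followed >= maxRedirects) {
        return null;
      }
      // Resolve relative Location values against the current URL.
      url = URI.create(url).resolve(response.location).toString();
    }
  }

  public static void main(String[] args) {
    // Simulated site: two redirects before the final robots.txt.
    Map<String, Response> site = new HashMap<>();
    site.put("http://a.example/robots.txt",
        new Response(301, "https://a.example/robots.txt"));
    site.put("https://a.example/robots.txt", new Response(302, "/final.txt"));
    site.put("https://a.example/final.txt", new Response(200, null));

    System.out.println(resolve("http://a.example/robots.txt", 5, site::get));
  }
}
```

With a limit of 5 the chain above resolves to the final URL; with a limit of 1 the second redirect is not followed and the sketch returns null, leaving it to the caller to decide how to treat the unresolved robots.txt.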
http://wikipedia.org/robots.txt

Note: this works with protocol-http; for protocol-okhttp, the fix for NUTCH-3002 also needs to be applied.

An additional note: this PR removes the secondary lookup for a lower-cased "location" header. Case-insensitive lookup of protocol metadata should be implemented at the protocol level.
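To illustrate what a case-insensitive header lookup at the protocol level could look like, here is a minimal sketch using only JDK collections. It does not use or describe Nutch's metadata classes; `CaseInsensitiveHeadersSketch` and `newHeaderMap` are hypothetical names chosen for this example.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the Nutch metadata API.
class CaseInsensitiveHeadersSketch {

  /** Returns a map whose keys compare without regard to case. */
  static Map<String, String> newHeaderMap() {
    return new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
  }

  public static void main(String[] args) {
    Map<String, String> headers = newHeaderMap();
    // The server sent the header name in lower case.
    headers.put("location", "https://www.example.org/robots.txt");
    // A single lookup with the canonical spelling finds it.
    System.out.println(headers.get("Location"));
  }
}
```

If the header container behaves like this, callers such as a robots.txt parser need only one lookup for "Location" and no second lookup for the lower-cased name.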
… RFC 9309 - fix comment
