fix(utils): only strip leading scheme/www in `_log_pretty_url` #5050
Open
JSap0914 wants to merge 1 commit into
_log_pretty_url is documented to remove the protocol and a leading 'www.' prefix, but it used str.replace, which removes those substrings anywhere in the string. This mangled domains containing the substring (e.g. 'awwww.com' became 'awcom') and corrupted URLs whose path or query legitimately contained 'http(s)://' or 'www.' (e.g. 'https://example.com/r?u=https://other.com' lost its embedded scheme). Strip only a leading scheme and a single leading 'www.' so the rest of the URL is preserved, matching the documented behavior.
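The failure mode is easy to reproduce. The sketch below assumes the old helper chained `str.replace` calls for `https://`, `http://`, and `www.`. That is an assumption about its shape, and `old_strip_prefixes` is a hypothetical name, not the project's code. Because `str.replace` rewrites every occurrence, matches in the domain, path, or query are removed along with the leading prefix:

```python
def old_strip_prefixes(url: str) -> str:
    # Hypothetical reconstruction of the pre-fix behavior (assumption):
    # each str.replace call removes EVERY occurrence, not just a leading one.
    return url.replace('https://', '').replace('http://', '').replace('www.', '')


for url in (
    'https://awwww.com',                          # 'www.' hidden inside the domain
    'https://example.com/r?u=https://other.com',  # scheme embedded in the query
    'https://site.com/www.assets/img',            # 'www.' inside the path
):
    print(f'{url} -> {old_strip_prefixes(url)}')
```

This prints `awcom`, `example.com/r?u=other.com`, and `site.com/assets/img`, which are the three corruptions described in this PR.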

What
`_log_pretty_url` is documented to "remove the protocol and www. prefix", but it implemented this with `str.replace`, which removes every occurrence rather than only the leading prefix. As a result, the helper corrupts otherwise valid URLs:

- `https://awwww.com` → `awcom` (the substring `www.` inside the domain is stripped)
- `https://example.com/r?u=https://other.com` → `example.com/r?u=other.com` (the embedded scheme in the query is dropped)
- `https://site.com/www.assets/img` → `site.com/assets/img` (`www.` inside the path is stripped)

This shows up in the tab-info debug logging in `browser/session.py`, making logged URLs misleading.

Fix
Strip only a leading scheme and a single leading `www.`, matching the documented behavior. The rest of the URL (path, query, fragment) is left untouched.
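The approach can be sketched as follows. Per the summary below, it uses an anchored `re.sub(r'^https?://', ...)` followed by a prefix check for `www.`. This is a minimal sketch, not the patched function: `strip_leading_prefixes` is a hypothetical name, and the existing truncation step is omitted.

```python
import re

# '^' anchors the match, so only a scheme at the very start is removed.
_LEADING_SCHEME = re.compile(r'^https?://')


def strip_leading_prefixes(url: str) -> str:
    """Remove a leading http(s):// and then a single leading 'www.'.

    Hypothetical helper illustrating the fix; truncation is omitted.
    """
    s = _LEADING_SCHEME.sub('', url, count=1)
    if s.startswith('www.'):
        s = s[len('www.'):]  # drop exactly one 'www.' prefix
    return s


print(strip_leading_prefixes('https://www.example.com/path'))
print(strip_leading_prefixes('https://example.com/r?u=https://other.com'))
```

The first call prints `example.com/path`. The second prints `example.com/r?u=https://other.com`, so the embedded scheme survives. Because only one `www.` is checked, an input such as `https://www.www.example.com` keeps its second `www.`, which matches "a single leading `www.`".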
Verification
Added `tests/ci/infrastructure/test_log_pretty_url.py` covering prefix stripping, the three corruption cases above, and the existing truncation behavior. Before the fix, the three "does not corrupt" cases fail; after the fix, all 5 pass.
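A regression test in this spirit can be written as a table of input/expected pairs. This sketch does not reproduce the actual test file. The helper is a hypothetical stand-in for the fixed prefix-stripping logic rather than an import of `_log_pretty_url`, and the truncation case is left out because its length limit is not given in this description.

```python
import re


def strip_leading_prefixes(url: str) -> str:
    # Hypothetical stand-in for the fixed prefix-stripping logic.
    s = re.sub(r'^https?://', '', url)
    return s[len('www.'):] if s.startswith('www.') else s


# (input, expected) pairs: prefix stripping plus the three corruption cases.
CASES = [
    ('https://www.example.com/path', 'example.com/path'),
    ('http://example.com', 'example.com'),
    ('https://awwww.com', 'awwww.com'),
    ('https://example.com/r?u=https://other.com', 'example.com/r?u=https://other.com'),
    ('https://site.com/www.assets/img', 'site.com/www.assets/img'),
]


def test_strip_leading_prefixes() -> None:
    # Plain asserts: pytest collects this function, and it can also be called directly.
    for url, expected in CASES:
        assert strip_leading_prefixes(url) == expected, url


if __name__ == '__main__':
    test_strip_leading_prefixes()
    print('all cases pass')
```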
`ruff check` and `ruff format --check` pass on the changed files.

Summary by cubic
Fix `_log_pretty_url` to only strip a leading `http(s)://` and a single leading `www.`, preventing mangled domains and embedded URLs in logs. Adds focused tests; truncation behavior is unchanged.

- Replaced `str.replace` with `re.sub(r'^https?://', ...)` and a prefix check for `www.`.
- Occurrences of `https://` or `www.` later in the URL (domain, path, or query) are preserved.
- Added tests in `tests/ci/infrastructure/test_log_pretty_url.py`.

Written for commit aaa6033. Summary will update on new commits.