fix(utils): only strip leading scheme/www in _log_pretty_url by JSap0914 · Pull Request #5050 · browser-use/browser-use · GitHub

fix(utils): only strip leading scheme/www in _log_pretty_url #5050

Open
JSap0914 wants to merge 1 commit into browser-use:main from JSap0914:fix-log-pretty-url-prefix

Conversation

JSap0914 commented Jun 16, 2026

What

_log_pretty_url is documented to "remove the protocol and www. prefix", but it implemented this with str.replace:

s = s.replace('https://', '').replace('http://', '').replace('www.', '')

str.replace removes every occurrence, not just the leading prefix, so the helper corrupts otherwise valid URLs:

  • https://awwww.com → awcom (the substring www. inside the domain is stripped)
  • https://example.com/r?u=https://other.com → example.com/r?u=other.com (embedded scheme in the query is dropped)
  • https://site.com/www.assets/img → site.com/assets/img (www. inside the path is stripped)
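
The three cases above can be reproduced with a short standalone sketch. The function name strip_everywhere is a hypothetical stand-in that only mirrors the chained str.replace line quoted earlier; it is not the project's helper and omits anything else the real function does.

```python
def strip_everywhere(url: str) -> str:
    """Hypothetical stand-in: chained str.replace removes every occurrence."""
    return url.replace('https://', '').replace('http://', '').replace('www.', '')


cases = [
    'https://awwww.com',
    'https://example.com/r?u=https://other.com',
    'https://site.com/www.assets/img',
]

for url in cases:
    # Each printed result is missing text that was not a leading prefix.
    print(f'{url} -> {strip_everywhere(url)}')
```

In the first case the match is the last three w characters plus the dot, which is why only "aw" and "com" survive.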

This shows up in the tab-info debug logging in browser/session.py, making logged URLs misleading.

Fix

Strip only a leading scheme and a single leading www., matching the documented behavior:

s = re.sub(r'^https?://', '', s)
if s.startswith('www.'):
    s = s[len('www.') :]

The rest of the URL (path, query, fragment) is left untouched.
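
As a rough illustration, the same two steps can be wrapped in a standalone function and checked against the corruption cases. The name strip_leading_prefix is an assumption made for this sketch, and the real helper's truncation logic is deliberately left out.

```python
import re


def strip_leading_prefix(url: str) -> str:
    """Hypothetical helper: drop one leading http(s):// and one leading www."""
    # The ^ anchor restricts the match to the very start of the string.
    url = re.sub(r'^https?://', '', url)
    # Slice off a single leading 'www.'; later occurrences are untouched.
    if url.startswith('www.'):
        url = url[len('www.'):]
    return url


assert strip_leading_prefix('https://www.example.com/a') == 'example.com/a'
assert strip_leading_prefix('https://awwww.com') == 'awwww.com'
assert (
    strip_leading_prefix('https://example.com/r?u=https://other.com')
    == 'example.com/r?u=https://other.com'
)
assert strip_leading_prefix('https://site.com/www.assets/img') == 'site.com/www.assets/img'
# Only one leading 'www.' is removed.
assert strip_leading_prefix('www.www.example.com') == 'www.example.com'
```

One property of this sketch worth noting: the pattern is case-sensitive, so an uppercase scheme such as HTTPS:// would be left in place.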

Verification

Added tests/ci/infrastructure/test_log_pretty_url.py covering prefix stripping, the three corruption cases above, and the existing truncation behavior.

$ uv run pytest tests/ci/infrastructure/test_log_pretty_url.py -p no:xdist -o addopts="" -q
5 passed

Before the fix the three "does not corrupt" cases fail; after the fix all 5 pass. ruff check and ruff format --check pass on the changed files.

Note: drafted with AI assistance and verified by a human (RED/GREEN tests run locally).


Summary by cubic

Fix _log_pretty_url to only strip a leading http(s):// and a single leading www., preventing mangled domains and embedded URLs in logs. Adds focused tests; truncation behavior is unchanged.

  • Bug Fixes
    • Replaced global str.replace with re.sub(r'^https?://', ...) and a prefix check for www..
    • Preserves path/query and any embedded https:// or www. substrings.
    • Added tests in tests/ci/infrastructure/test_log_pretty_url.py.

Written for commit aaa6033. Summary will update on new commits.


_log_pretty_url is documented to remove the protocol and a leading 'www.'
prefix, but it used str.replace which removes those substrings anywhere in
the string. This mangled domains containing the substring (e.g. 'awwww.com'
became 'awcom') and corrupted URLs whose path or query legitimately contained
'http(s)://' or 'www.' (e.g. 'https://example.com/r?u=https://other.com' lost
its embedded scheme).

Strip only a leading scheme and a single leading 'www.' so the rest of the URL
is preserved, matching the documented behavior.
Copilot AI review requested due to automatic review settings June 16, 2026 04:04

Copilot AI left a comment


Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

@CLAassistant


@cubic-dev-ai cubic-dev-ai Bot left a comment

Contributor


No issues found across 2 files




3 participants