fix(llm): include cache-read tokens in Anthropic total_tokens #5053
Open
he-yufeng wants to merge 2 commits into

What
`_get_usage` computes `prompt_tokens` as `input_tokens + cache_read_input_tokens` (Anthropic reports cached prompt tokens separately, so they have to be added back), but `total_tokens` was left as just `input_tokens + output_tokens`. As soon as prompt caching kicks in, `total_tokens` is smaller than `prompt_tokens + completion_tokens`, which breaks anything downstream that assumes the totals add up (cost tracking, budget/limit checks).

Concrete example with a cached prompt: `input_tokens=500`, `cache_read_input_tokens=10000`, `output_tokens=200` gives `prompt_tokens=10500`, `completion_tokens=200`, but `total_tokens=700` instead of `10700`.

This affects both `ChatAnthropic` (`browser_use/llm/anthropic/chat.py`) and `ChatAnthropicBedrock` (`browser_use/llm/aws/chat_anthropic.py`), which share the same usage logic.

Fix
Add the cache-read tokens to `total_tokens` so it mirrors the `prompt_tokens` formula and the invariant `total_tokens == prompt_tokens + completion_tokens` holds again. One line per file.

Verifying
Added `tests/ci/models/test_anthropic_usage.py` covering both clients: the cached case (asserts the totals add up) and the no-cache case (asserts the number is unchanged).

The cached-case assertions fail on current `main` (700 != 10700) and pass with this change; the no-cache case stays at 700 both ways. `ruff check` / `ruff format` clean on the touched files.

Note: #4294 proposed the same one-liner for `chat.py` but went stale before it landed, and it never touched the Bedrock client. This covers both and adds a regression test.

Summary by cubic
Fixes Anthropic usage accounting by adding cache-read prompt tokens to `total_tokens`, restoring `total_tokens == prompt_tokens + completion_tokens`. Applies to both Anthropic and Bedrock clients.

- Include `cache_read_input_tokens` in `total_tokens` in `browser_use/llm/anthropic/chat.py` and `browser_use/llm/aws/chat_anthropic.py`.
- Add `tests/ci/models/test_anthropic_usage.py` for cached/no-cache cases and type the mock response helper for Pyright.

Written for commit 42351e4. Summary will update on new commits.
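
For reviewers who want the accounting spelled out, here is a minimal sketch of the corrected formula and the invariant it restores. It is not the code in this PR: `AnthropicUsage`, `NormalizedUsage`, and `get_usage` are hypothetical stand-ins for the real response and usage types, and only the three token fields named in the description are modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnthropicUsage:
    """Hypothetical stand-in for the usage fields Anthropic reports."""

    input_tokens: int
    output_tokens: int
    # Cached prompt tokens are reported separately and may be absent.
    cache_read_input_tokens: Optional[int] = None


@dataclass
class NormalizedUsage:
    """Hypothetical stand-in for the normalized usage record."""

    prompt_tokens: int
    completion_tokens: int
    total_tokens: int


def get_usage(raw: AnthropicUsage) -> NormalizedUsage:
    """Fold cache-read tokens into both prompt_tokens and total_tokens."""
    cached = raw.cache_read_input_tokens or 0
    prompt_tokens = raw.input_tokens + cached
    return NormalizedUsage(
        prompt_tokens=prompt_tokens,
        completion_tokens=raw.output_tokens,
        # Previously input_tokens + output_tokens, which dropped `cached`.
        total_tokens=prompt_tokens + raw.output_tokens,
    )


# Cached case from the description: 500 + 10000 + 200.
cached_usage = get_usage(AnthropicUsage(500, 200, cache_read_input_tokens=10000))
# No-cache case: the total is the same before and after the fix.
plain_usage = get_usage(AnthropicUsage(500, 200))
```

With the numbers from the description, `cached_usage.total_tokens` is 10700 and `plain_usage.total_tokens` stays at 700, so `total_tokens == prompt_tokens + completion_tokens` holds in both cases.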