{{ message }}
fix(cost): surface unpriced models and add provider pricing hook#279
Merged
Conversation
Cost summaries silently summed only the priced subset of agents and presented it as the complete total, so spend on models without available pricing vanished with no signal (#265). - Surface unpriced agents: WorkflowUsage gains unpriced_agents / unpriced_models / has_unpriced (total_cost_usd stays a partial sum). The CLI summary and dashboard status bar now render "~$X (N agents unpriced: ...)" instead of a clean-looking wrong number. - Add AgentProvider.get_model_pricing hook (base default None). Pricing resolves override -> provider hook -> DEFAULT_PRICING -> unpriced. Copilot derives USD from the SDK billing.token_prices (AI Credits, 100 credits = $1; guarded, never raises); other providers fall back to the table. Refresh DEFAULT_PRICING with current Claude/GPT-5.x models. - Bridge the async hook to the sync UsageTracker.record via WorkflowEngine._ensure_pricing_resolved (per-model asyncio.Lock so concurrent parallel/for-each siblings sharing a model are all priced). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addresses PR review findings on the #265 provider pricing hook. - copilot: reject negative / NaN / inf token prices (a malformed SDK price would supersede the static table and emit a confident-wrong cost); a genuine 0.0 is still priced as free. Move model-dict building inside the try and validate via a non-raising TypeGuard so get_model_pricing truly honors its never-raises contract. Ignore billing.multiplier and the long_context tier (documented). - engine: surface a systemic pricing-hook failure once per run via a warning latch (mirrors _budget_unpriced_warned) so a broken live-pricing integration is diagnosable instead of silently degrading every model. - pricing: make get_pricing's overrides / provider_pricing keyword-only so the two same-typed dicts can't be transposed; fix the gpt-5-mini table comment and several docstrings to match actual behavior. - tests: add real-path parallel-group and for-each E2E coverage of the pricing wiring (previously only the single-agent path ran end-to-end); add negative/NaN/inf/zero-price, multiplier-ignored, malformed-list, and warning-latch cases; cover the CLI no-model-list and fully-priced branches. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
_is_finite_nonneg sat outside the SDK try/except, so a pathologically large int price (unconvertible to float, e.g. 10**400) would make math.isfinite raise OverflowError -- a latent hole in get_model_pricing's never-raises contract. Catch it in the guard so such a value is rejected (falls back to the static table) rather than raised. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The for-each E2E test passed for the wrong reason: finder shared _UNPRICED_MODEL, so it resolved (and cached) the price at its main-loop record site before the items ran, making the for-each site's resolution an untested no-op fast path. Give finder a table-priced model (gpt-4o) so the for-each site is the sole resolver of _UNPRICED_MODEL; dropping _ensure_pricing_resolved from _execute_for_each_group now fails the test. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
901dc75 to
3a2ca92
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Summary
Fixes #265. The end-of-run cost summary silently summed only the priced subset of agents and presented it as the complete total, so spend on models without available pricing vanished with no signal. This addresses both parts of the issue.
Part 1 — surface unpriced agents (no more silent undercount)
WorkflowUsagegainsunpriced_agents/unpriced_models/has_unpriced.total_cost_usdstill returns the partial sum (backward-compatible).unpriced_agent_count/unpriced_models.Total: ~$X (N agents unpriced: model-a, …)andCost data unavailable (…)instead of a clean-looking wrong number.unpricedCountand the StatusBar shows a~$X (N unpriced)badge.Part 2 — pricing as a provider hook (static table as fallback)
AgentProvider.get_model_pricing(model) -> ModelPricing | None(base defaultNone), mirroring the existingget_max_prompt_tokensprecedent.engine/pricing.py: workflowcost.pricingoverride → provider hook →DEFAULT_PRICING→ unpriced.billing.token_prices(AI Credits per batch; documented100 credits = $1constant; guarded — never raises, falls back to the table). Claude / hermes inherit theNonedefault.WorkflowEngine._ensure_pricing_resolvedbridges the async hook to the syncUsageTracker.record(called before all 5 record sites), resolving each model once and caching on the tracker.DEFAULT_PRICINGwith current Claude (Opus 4.7/4.8 +1m, Sonnet 5) and GPT-5.x (5.3/5.4/5.5, GPT-5 mini) models.Concurrency fix (caught in review)
The first cut marked a model resolved before awaiting the hook, so parallel/for-each siblings sharing a model raced ahead and recorded as unpriced — reintroducing the exact undercount on the concurrent path. Fixed with a per-model
asyncio.Lockand mark-resolved-after-cache. Regression testtest_pricing_hook.py::test_concurrent_same_model_agents_all_pricedwas verified to fail on the old logic ([40.0, None, None, None]) and pass on the fix, with the hook still called exactly once.Testing
uv run pytest -m "not real_api and not install_scripts"→ 3729 passed, 13 skipped.ruff check+ruff format --checkclean;ty check srcclean (one pre-existing unrelateddialog_evaluator.pydiagnostic).make validate-examples→ all 30 examples validate.make build-frontend.New tests: unpriced surfacing (
test_usage.py), pricing precedence (test_pricing.py), base hook default (test_base.py), Copilot derivation + guards (test_copilot.py), engine end-to-end + concurrency (test_pricing_hook.py), CLI rendering (test_logging.py).🤖 Draft PR opened via git-workflow skill.