fix(cost): surface unpriced models and add provider pricing hook by jrob5756 · Pull Request #279 · microsoft/conductor · GitHub
Skip to content

fix(cost): surface unpriced models and add provider pricing hook#279

Merged
jrob5756 merged 4 commits into
mainfrom
fix/265-cost-summary-unpriced-pricing
Jul 2, 2026
Merged

fix(cost): surface unpriced models and add provider pricing hook#279
jrob5756 merged 4 commits into
mainfrom
fix/265-cost-summary-unpriced-pricing

Conversation

@jrob5756

@jrob5756 jrob5756 commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator

Summary

Fixes #265. The end-of-run cost summary silently summed only the priced subset of agents and presented it as the complete total, so spend on models without available pricing vanished with no signal. This addresses both parts of the issue.

Part 1 — surface unpriced agents (no more silent undercount)

  • WorkflowUsage gains unpriced_agents / unpriced_models / has_unpriced. total_cost_usd still returns the partial sum (backward-compatible).
  • Execution summary exposes unpriced_agent_count / unpriced_models.
  • CLI renders Total: ~$X (N agents unpriced: model-a, …) and Cost data unavailable (…) instead of a clean-looking wrong number.
  • Web dashboard tracks an unpricedCount and the StatusBar shows a ~$X (N unpriced) badge.

Part 2 — pricing as a provider hook (static table as fallback)

  • New AgentProvider.get_model_pricing(model) -> ModelPricing | None (base default None), mirroring the existing get_max_prompt_tokens precedent.
  • Resolution order in engine/pricing.py: workflow cost.pricing override → provider hookDEFAULT_PRICING → unpriced.
  • Copilot derives USD from the SDK billing.token_prices (AI Credits per batch; documented 100 credits = $1 constant; guarded — never raises, falls back to the table). Claude / hermes inherit the None default.
  • WorkflowEngine._ensure_pricing_resolved bridges the async hook to the sync UsageTracker.record (called before all 5 record sites), resolving each model once and caching on the tracker.
  • Refreshed DEFAULT_PRICING with current Claude (Opus 4.7/4.8 +1m, Sonnet 5) and GPT-5.x (5.3/5.4/5.5, GPT-5 mini) models.

Concurrency fix (caught in review)

The first cut marked a model resolved before awaiting the hook, so parallel/for-each siblings sharing a model raced ahead and recorded as unpriced — reintroducing the exact undercount on the concurrent path. Fixed with a per-model asyncio.Lock and mark-resolved-after-cache. Regression test test_pricing_hook.py::test_concurrent_same_model_agents_all_priced was verified to fail on the old logic ([40.0, None, None, None]) and pass on the fix, with the hook still called exactly once.

Testing

  • uv run pytest -m "not real_api and not install_scripts"3729 passed, 13 skipped.
  • ruff check + ruff format --check clean; ty check src clean (one pre-existing unrelated dialog_evaluator.py diagnostic).
  • make validate-examples → all 30 examples validate.
  • Frontend rebuilt via make build-frontend.

New tests: unpriced surfacing (test_usage.py), pricing precedence (test_pricing.py), base hook default (test_base.py), Copilot derivation + guards (test_copilot.py), engine end-to-end + concurrency (test_pricing_hook.py), CLI rendering (test_logging.py).

🤖 Draft PR opened via git-workflow skill.

jrob5756 and others added 4 commits July 2, 2026 12:59
Cost summaries silently summed only the priced subset of agents and
presented it as the complete total, so spend on models without available
pricing vanished with no signal (#265).

- Surface unpriced agents: WorkflowUsage gains unpriced_agents /
  unpriced_models / has_unpriced (total_cost_usd stays a partial sum).
  The CLI summary and dashboard status bar now render
  "~$X (N agents unpriced: ...)" instead of a clean-looking wrong number.
- Add AgentProvider.get_model_pricing hook (base default None). Pricing
  resolves override -> provider hook -> DEFAULT_PRICING -> unpriced.
  Copilot derives USD from the SDK billing.token_prices (AI Credits,
  100 credits = $1; guarded, never raises); other providers fall back to
  the table. Refresh DEFAULT_PRICING with current Claude/GPT-5.x models.
- Bridge the async hook to the sync UsageTracker.record via
  WorkflowEngine._ensure_pricing_resolved (per-model asyncio.Lock so
  concurrent parallel/for-each siblings sharing a model are all priced).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Addresses PR review findings on the #265 provider pricing hook.

- copilot: reject negative / NaN / inf token prices (a malformed SDK price
  would supersede the static table and emit a confident-wrong cost); a
  genuine 0.0 is still priced as free. Move model-dict building inside the
  try and validate via a non-raising TypeGuard so get_model_pricing truly
  honors its never-raises contract. Ignore billing.multiplier and the
  long_context tier (documented).
- engine: surface a systemic pricing-hook failure once per run via a
  warning latch (mirrors _budget_unpriced_warned) so a broken live-pricing
  integration is diagnosable instead of silently degrading every model.
- pricing: make get_pricing's overrides / provider_pricing keyword-only so
  the two same-typed dicts can't be transposed; fix the gpt-5-mini table
  comment and several docstrings to match actual behavior.
- tests: add real-path parallel-group and for-each E2E coverage of the
  pricing wiring (previously only the single-agent path ran end-to-end);
  add negative/NaN/inf/zero-price, multiplier-ignored, malformed-list, and
  warning-latch cases; cover the CLI no-model-list and fully-priced branches.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
_is_finite_nonneg sat outside the SDK try/except, so a pathologically
large int price (unconvertible to float, e.g. 10**400) would make
math.isfinite raise OverflowError -- a latent hole in get_model_pricing's
never-raises contract. Catch it in the guard so such a value is rejected
(falls back to the static table) rather than raised.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The for-each E2E test passed for the wrong reason: finder shared
_UNPRICED_MODEL, so it resolved (and cached) the price at its main-loop
record site before the items ran, making the for-each site's resolution
an untested no-op fast path. Give finder a table-priced model (gpt-4o) so
the for-each site is the sole resolver of _UNPRICED_MODEL; dropping
_ensure_pricing_resolved from _execute_for_each_group now fails the test.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jrob5756 jrob5756 force-pushed the fix/265-cost-summary-unpriced-pricing branch from 901dc75 to 3a2ca92 Compare July 2, 2026 17:03
@jrob5756 jrob5756 marked this pull request as ready for review July 2, 2026 17:11
@jrob5756 jrob5756 merged commit 3ecdfea into main Jul 2, 2026
9 checks passed
@jrob5756 jrob5756 deleted the fix/265-cost-summary-unpriced-pricing branch July 2, 2026 17:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cost summary silently undercounts unpriced models; pricing should be a provider hook, not just a static table

1 participant