feat(pricing): embed models.dev snapshot for offline pricing #1242
ccusage prices models from the embedded LiteLLM snapshot plus a runtime models.dev fallback that was only consulted when online. Newly released Anthropic models (e.g. claude-fable-5) ship on models.dev before LiteLLM publishes them, so they could not be priced offline at all.

Pin the models.dev source as a flake input and reproducibly regenerate a compacted, Anthropic-only pricing snapshot from it. models.dev ships per-model TOML rather than a prebuilt catalog, so the snapshot is built with the project's own generateCatalog routine (Bun + the zero-dependency remeda/zod vendored from the pinned bun.lock hashes) and then trimmed to the pricing fields ccusage consumes. The result is committed to the repo and embedded via include_str!, so every platform (Nix and plain cargo on macOS/Windows) ships identical, pinned data with no build-time network access.

At runtime the embedded models.dev data is kept as a separate fallback map, consulted only when the primary table misses, so it never perturbs the primary table's fuzzy alias matching. Unlike the network source it stays available offline.

- flake.nix/flake.lock: pin anomalyco/models.dev input
- nix/models-dev-gen.ts + nix/models-dev-pricing.nix: reproducible generator, exposed as the .#models-dev-pricing package
- justfile: gen-models-dev-pricing / update-models-dev-pricing recipes
- .github/workflows/update-pricing.yaml: refresh LiteLLM and models.dev snapshots hourly, each opening its own PR when the pricing actually changes
- pricing.rs: embed the committed snapshot and resolve it as an offline-capable fallback, with tests covering offline resolution of models.dev-only models
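The lookup order described above can be sketched roughly as follows. This is a minimal illustration, not the code in pricing.rs: the `Pricing` and `PricingMap` shapes, the field names, the toy substring-based fuzzy match, the `claude-example-1` model, and all prices are assumptions made for the example.

```rust
use std::collections::HashMap;

/// Hypothetical pricing record (USD per million tokens); not the real ccusage type.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Pricing {
    input: f64,
    output: f64,
}

type Table = HashMap<String, Pricing>;

/// Illustrative three-tier pricing map. Field names are assumptions.
struct PricingMap {
    /// Primary table (LiteLLM snapshot); the only tier that is fuzzy-matched.
    primary: Table,
    /// Network models.dev data; `None` when offline.
    network_models_dev: Option<Table>,
    /// Embedded models.dev snapshot; always present, matched exactly.
    embedded_models_dev: Table,
}

impl PricingMap {
    /// Primary first, then the network data if any, then the embedded snapshot.
    fn find(&self, model: &str) -> Option<Pricing> {
        self.find_primary(model)
            .or_else(|| self.network_models_dev.as_ref()?.get(model).copied())
            .or_else(|| self.embedded_models_dev.get(model).copied())
    }

    /// Toy fuzzy match: exact hit, else the longest primary key contained in `model`.
    fn find_primary(&self, model: &str) -> Option<Pricing> {
        if let Some(p) = self.primary.get(model) {
            return Some(*p);
        }
        self.primary
            .iter()
            .filter(|(key, _)| model.contains(key.as_str()))
            .max_by_key(|(key, _)| key.len())
            .map(|(_, p)| *p)
    }
}

/// Builds a small map with placeholder prices (not real rates).
fn sample_map(online: bool) -> PricingMap {
    let primary = Table::from([(
        "claude-example-1".to_string(),
        Pricing { input: 3.0, output: 15.0 },
    )]);
    let models_dev = Table::from([(
        "claude-fable-5".to_string(),
        Pricing { input: 1.0, output: 2.0 },
    )]);
    PricingMap {
        primary,
        network_models_dev: online.then(|| models_dev.clone()),
        embedded_models_dev: models_dev,
    }
}
```

Because the embedded entries live in their own map, they are never candidates for the primary table's fuzzy match; they only answer exact lookups after the earlier tiers miss.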
Deploying with
| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! View logs | ccusage-guide | 70dca9e | Commit Preview URL, Branch Preview URL | Jun 09 2026, 10:48 PM |
Reviewed changes: embeds a pinned, generated models.dev pricing snapshot as a third fallback tier in `PricingMap`, enabling offline resolution of newly released Anthropic models that aren't yet in LiteLLM.
- Embedded models.dev snapshot: committed at `rust/crates/ccusage/src/models-dev-pricing.json`, embedded via `include_str!`, and consulted by `find()`/`context_limit()` after the primary table and the network models.dev cache.
- Reproducible generator: `nix/models-dev-gen.ts` imports models.dev's own `generateCatalog`, filters to Anthropic models, and outputs compacted JSON with stable key ordering. `nix/models-dev-pricing.nix` vendors `remeda`/`zod` by hash for reproducibility.
- Runtime fallback chain: `find()` now chains through the embedded snapshot with its own `enable_embedded_models_dev_fallback` flag, keeping the embedded map separate so it never interferes with the primary table's fuzzy alias matching.
- Hourly automation: `update-pricing.yaml` gains an `update-models-dev-pricing` job alongside the existing LiteLLM job, with per-job permissions scoping.
- Tests: three new tests cover snapshot parseability, offline fallback for models only in the embedded snapshot, and resolution of `claude-fable-5` specifically.
ℹ️ No critical issues — one minor suggestion inline.
DeepSeek Pro (free via Pullfrog for OSS) | 𝕏
Actionable comments posted: 1
🧹 Nitpick comments (2)
.github/workflows/update-pricing.yaml (1)

76-133: ⚖️ Poor tradeoff

Consider potential race condition between parallel jobs.

Both `update-pricing` and `update-models-dev-pricing` jobs update `flake.lock` (different inputs) and can run simultaneously. While `--force-with-lease` and different branch names reduce the risk, concurrent updates to `flake.lock` could cause one job to fail if they finish within the same minute.

Impact is minimal since the failed job will retry hourly, but you could serialize them using `needs:` if deterministic execution order is preferred.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/update-pricing.yaml around lines 76 - 133, Two jobs can race updating flake.lock: serialize the jobs by adding a dependency so they don't run in parallel; specifically add a needs: reference on the update-models-dev-pricing job (or on update-pricing depending which should run first) so GitHub Actions will wait for the prior job to finish before starting the other, e.g., make update-models-dev-pricing depend on update-pricing (use the job name update-pricing in needs) to prevent concurrent flake.lock edits and eliminate the --force-with-lease collision risk.

nix/models-dev-gen.ts (1)
79-79: 💤 Low value

Consider adding OUTFILE validation for better error messaging.

While the non-null assertion is safe in the Nix build context (where `OUTFILE` is always set), adding an explicit check would provide a clearer error if the script is accidentally run outside Nix:

```diff
+const outfile = process.env.OUTFILE;
+if (!outfile) {
+  throw new Error('OUTFILE environment variable is required');
+}
-await Bun.write(process.env.OUTFILE!, `${JSON.stringify(sortObject(out), null, 2)}\n`);
+await Bun.write(outfile, `${JSON.stringify(sortObject(out), null, 2)}\n`);
```

However, since this is strictly a Nix-invoked build script and any failure would be caught at build time, the current approach is acceptable.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@nix/models-dev-gen.ts` at line 79, Replace the non-null assertion on process.env.OUTFILE before calling Bun.write with an explicit validation: check that process.env.OUTFILE is defined (and optionally non-empty) and if not, throw or log a clear error and exit (so the failure message explains OUTFILE is missing), then call Bun.write with process.env.OUTFILE; reference the OUTFILE env var, the Bun.write(...) call, and the sortObject(out) usage when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/update-pricing.yaml:
- Line 17: The checkout steps currently use "uses:
actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10" without disabling
persisted credentials; update both checkout actions (the ones using
actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10) to include the input
"persist-credentials: false" under their step configuration so the runner will
not leak git credentials (add the single key/value to each checkout step).
---
Nitpick comments:
In @.github/workflows/update-pricing.yaml:
- Around line 76-133: Two jobs can race updating flake.lock: serialize the jobs
by adding a dependency so they don't run in parallel; specifically add a needs:
reference on the update-models-dev-pricing job (or on update-pricing depending
which should run first) so GitHub Actions will wait for the prior job to finish
before starting the other, e.g., make update-models-dev-pricing depend on
update-pricing (use the job name update-pricing in needs) to prevent concurrent
flake.lock edits and eliminate the --force-with-lease collision risk.
In `@nix/models-dev-gen.ts`:
- Line 79: Replace the non-null assertion on process.env.OUTFILE before calling
Bun.write with an explicit validation: check that process.env.OUTFILE is defined
(and optionally non-empty) and if not, throw or log a clear error and exit (so
the failure message explains OUTFILE is missing), then call Bun.write with
process.env.OUTFILE; reference the OUTFILE env var, the Bun.write(...) call, and
the sortObject(out) usage when making the change.
⛔ Files ignored due to path filters (1)
- `flake.lock` is excluded by `!**/*.lock`

📒 Files selected for processing (9)
- `.github/workflows/update-pricing.yaml`
- `flake.nix`
- `justfile`
- `nix/models-dev-gen.ts`
- `nix/models-dev-pricing.nix`
- `nix/packages.nix`
- `package.nix`
- `rust/crates/ccusage/src/models-dev-pricing.json`
- `rust/crates/ccusage/src/pricing.rs`
2 issues found across 10 files
The update-models-dev-pricing recipe used just dependencies for generation and validation, which run before the recipe body. That regenerated and checked against the old models-dev lock, then updated only flake.lock afterward. Run the models.dev input update first, then regenerate the committed snapshot and run the normal check. Also make the claude-fable-5 offline pricing test assert that the embedded snapshot actually contains the target model instead of silently returning.
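The test change described above, asserting that the snapshot contains the model rather than returning early, can be illustrated with a small contrast. This is a hedged sketch: `weak_check`, `strict_check`, the `HashMap<&str, f64>` snapshot shape, and the prices are hypothetical and do not mirror the real test in pricing.rs.

```rust
use std::collections::HashMap;

/// Weak form: returns early when the model is missing, so the check passes vacuously.
fn weak_check(snapshot: &HashMap<&str, f64>, model: &str) -> bool {
    let Some(price) = snapshot.get(model) else {
        return true; // nothing was actually verified
    };
    *price > 0.0
}

/// Strict form: a missing model is itself a failure, with a message that says why.
fn strict_check(snapshot: &HashMap<&str, f64>, model: &str) -> Result<(), String> {
    let price = snapshot
        .get(model)
        .ok_or_else(|| format!("pinned models.dev snapshot is expected to contain {model}"))?;
    if *price > 0.0 {
        Ok(())
    } else {
        Err(format!("{model} has a non-positive price"))
    }
}
```

With an empty snapshot the weak form still reports success, which is the silent pass the commit removes; the strict form surfaces the missing entry as an error.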
ccusage performance comparison

This compares the PR package against the configured base package on the same CI runner.

Package runner startup
Execution setup measures any pre-benchmark package materialization used by the execution benchmark.

Cached bunx execution performance
Fixtures: Claude

Package runtime diagnostics
Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself. Fixtures: Claude

Committed fixture performance
Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage. Fixtures: Claude

Large real-world-shaped fixture performance
Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures. Fixtures: Claude

Artifact size

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.
ccusage performance comparison

This compares the Rust PR release binary against the configured base package on the same CI runner.

Package runner startup
Execution setup measures any pre-benchmark package materialization used by the execution benchmark.

Cached bunx execution performance
Fixtures: Claude

Package runtime diagnostics
Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself. Fixtures: Claude

Committed fixture performance
Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage. Fixtures: Claude

Large real-world-shaped fixture performance
Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures. Fixtures: Claude

Artifact size

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.
@coderabbitai review
@ryoppippi I have started the AI code review. It will take a few minutes to complete.
Address reviewer feedback on the pricing refresh workflow. Disable persisted checkout credentials because the jobs configure git identity explicitly and use GH_TOKEN only for pull-request operations. Serialize the models.dev refresh after the LiteLLM refresh to avoid simultaneous flake.lock update jobs, and validate OUTFILE before writing the generated models.dev pricing snapshot so local/manual failures are explicit.
@coderabbitai review
@ryoppippi I have started the AI code review. It will take a few minutes to complete.
CodeRabbit pointed out that the embedded models.dev fallback ignored parse failures from the compile-time snapshot. Treat that as a build-time invariant violation instead of silently returning an empty fallback map. Keep the claude-fable-5 offline test because it is the concrete motivation for this PR, but make the assertion failure explain that the pinned models.dev snapshot is expected to contain it.
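A rough sketch of the fail-fast behaviour described above, under stated assumptions: the real code parses JSON embedded with `include_str!`, while this std-only illustration uses a toy `model=price` line format, a hypothetical `parse_snapshot` helper, and a placeholder price. Only the `embedded_models_dev_pricing` name and the `.expect(...)` idea come from the discussion.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Stand-in for the committed snapshot; the real project embeds a JSON file.
/// The price is a placeholder, not a real rate.
const EMBEDDED_SNAPSHOT: &str = "claude-fable-5=1.0\n";

/// Hypothetical parser for the toy format; returns an error instead of skipping bad lines.
fn parse_snapshot(raw: &str) -> Result<HashMap<String, f64>, String> {
    raw.lines()
        .filter(|line| !line.trim().is_empty())
        .map(|line| -> Result<(String, f64), String> {
            let (model, price) = line
                .split_once('=')
                .ok_or_else(|| format!("malformed line: {line:?}"))?;
            let price: f64 = price
                .trim()
                .parse()
                .map_err(|e| format!("bad price for {model}: {e}"))?;
            Ok((model.trim().to_string(), price))
        })
        .collect()
}

/// Parses the embedded data once. A parse failure panics instead of
/// yielding an empty map, because the data is fixed at compile time.
fn embedded_models_dev_pricing() -> &'static HashMap<String, f64> {
    static CACHE: OnceLock<HashMap<String, f64>> = OnceLock::new();
    CACHE.get_or_init(|| {
        parse_snapshot(EMBEDDED_SNAPSHOT).expect("embedded models.dev snapshot must parse")
    })
}
```

Since the snapshot is baked into the binary, a parse error can only mean the committed file is corrupt, so panicking on first use makes the problem visible in tests rather than silently disabling the fallback.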
@coderabbitai Addressed the valid runtime invariant from your latest analysis in 85c285b: embedded models.dev snapshot parse failures now fail fast instead of silently producing an empty fallback map. The OUTFILE validation and workflow serialization items were already fixed in ed99589. I kept the claude-fable-5 assertion intentionally because it is the concrete regression target for this PR, and added a clear assertion message. Validation: focused cargo test for embedded_models_dev, just fmt, and pre-push clippy/treefmt/gitleaks/cargo test passed.
@coderabbitai review
@ryoppippi I have started the AI code review. It will take a few minutes to complete.
Everything looks good. Thanks for the thorough follow-up!
✅ No new issues found.
Reviewed changes: the two follow-up commits harden CI, fix recipe ordering, validate env vars, and strengthen tests.
- CI hardening: both checkout steps now use `persist-credentials: false`; `update-models-dev-pricing` is serialized after `update-pricing` via `needs:`; the LiteLLM job captures pre-update JSON to compare content and skip lock-only churn.
- Recipe ordering fix: `update-models-dev-pricing` now runs `nix flake update models-dev` before `just gen-models-dev-pricing` and `just check`.
- `OUTFILE` validation: `nix/models-dev-gen.ts` replaces `process.env.OUTFILE!` with an explicit null/empty check and a descriptive error.
- Test assertion strengthening: `offline_prices_new_anthropic_model_from_embedded_models_dev` now asserts the entry exists instead of conditionally returning.
Addressed ✅

One remaining question: both jobs still check out with `persist-credentials: false`. If the later `git push` steps need credentials, consider adding a step such as:

```yaml
- name: Configure git credentials
  run: gh auth setup-git
  env:
    GH_TOKEN: ${{ github.token }}
```

Remaining low-priority items (not blocking, can be addressed in follow-up)

Overall the PR is in solid shape: the three-tier fallback design, test coverage, and automation are all clean. Just need to confirm the credential helper question before merging.
✅ No new issues found.
Reviewed changes: a single follow-up commit makes the embedded snapshot parse failure a hard error and adds a descriptive assertion message.
- Fail-fast on parse errors: `embedded_models_dev_pricing()` now calls `.expect("...")` on `load_models_dev_json_missing` so a corrupt committed snapshot halts immediately instead of being silently skipped.
- Descriptive assertion: the `claude-fable-5` resolution test uses `assert!(..., "...")` with a failure message instead of a bare assertion.
1 issue found across 10 files
Keep the models.dev pricing updater serialized after the LiteLLM updater to avoid concurrent flake.lock edits, but run it with always() so a LiteLLM refresh failure does not skip the independent models.dev refresh.
@coderabbitai review
@ryoppippi I have started the AI code review. It will take a few minutes to complete.
🔴 Unresolved —
| Item | Status |
|---|---|
| `embedded_models_dev_pricing()`: `.expect("…must parse")` | ✓ |
| `OUTFILE` guard in `nix/models-dev-gen.ts` | ✓ |
| `claude-fable-5` assertion with clear message | ✓ |
| `needs: update-pricing` + `if: ${{ always() }}` serialization | ✓ |

🟡 Remaining low-priority nits (not blocking)
- SRI hashes in `nix/models-dev-pricing.nix`: still require manual sync when `nix flake update models-dev` bumps the input; consider a follow-up to automate hash extraction or add a CI guard.
- `load_models_dev_json_missing` drops provider namespace: intentional for Anthropic scope, but a one-line comment confirming this would help future maintainers.
- `KEEP = /claude|anthropic/i`: still substring-matches; `\b(claude|anthropic)\b` would be more precise (cosmetic).
1 issue found across 10 files
✅ No new issues found.
Reviewed changes: the single follow-up commit adds `if: ${{ always() }}` to the `update-models-dev-pricing` job so a LiteLLM refresh failure does not silently skip the models.dev snapshot refresh.
- CI resilience: `update-models-dev-pricing` now uses `if: ${{ always() }}` alongside `needs: update-pricing`, preserving serialized execution order while preventing a failed LiteLLM update from blocking the independent models.dev refresh.
Keep checkout credential persistence disabled in the pricing update workflow while restoring an explicit authentication path for later git push commands. Both pricing automation jobs already receive GH_TOKEN for gh CLI operations. Running gh auth setup-git configures Git to use that token-backed auth before pushing the automation branches, so the workflow no longer depends on credentials persisted by actions/checkout.
@coderabbitai review
@ryoppippi I have started the AI code review. It will take a few minutes to complete.
All blocking issues are now addressed:
The three remaining low-priority nits (SRI hash automation in
ccusage performance comparison
PR SHA:
This compares the Rust PR release binary against the configured base package on the same CI runner.
Package runner startup
Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one
Cached bunx execution performance
Runs the same large fixture through
Fixtures: Claude
Package runtime diagnostics
Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.
Fixtures: Claude
Committed fixture performance
Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.
Fixtures: Claude
Large real-world-shaped fixture performance
Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.
Fixtures: Claude
Artifact size
Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.


Summary
Prices brand-new Anthropic models offline by embedding a pinned, self-generated
models.dev pricing snapshot. Models such as claude-fable-5 ship on models.dev
before LiteLLM publishes them, so they previously could not be priced without
network access (the models.dev fallback was online-only).
What changed
- Pin anomalyco/models.dev as a flake input.
- Reproducible generator (nix/models-dev-gen.ts + nix/models-dev-pricing.nix): models.dev ships per-model TOML, not a prebuilt catalog, so the snapshot is built with the project's own generateCatalog (Bun + the zero-dependency remeda/zod vendored from the pinned bun.lock hashes) and compacted to the Anthropic models and pricing fields ccusage consumes. Exposed as the .#models-dev-pricing package.
- Snapshot committed as rust/crates/ccusage/src/models-dev-pricing.json and embedded via include_str!. No build-time network on any platform (Nix builds and plain cargo build on macOS/Windows ship identical, pinned data). build.rs is unchanged.
- Runtime (pricing.rs): the embedded models.dev data is a separate, offline-capable fallback map, consulted only when the primary table misses, so it never perturbs the primary table's fuzzy alias matching.
- Workflow (.github/workflows/update-pricing.yaml): refresh the LiteLLM and models.dev snapshots hourly, each opening its own PR only when the pricing actually changes. Workflow permissions scoped per job.
- Recipes (justfile): gen-models-dev-pricing and update-models-dev-pricing.
Why
The hardcoded built-in pricing table only covers a fixed set of models, and the
models.dev fallback required network access. Embedding a pinned snapshot closes
the offline gap for newly released models while keeping the data reproducible and
reviewable in git (refreshed automatically).
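The fallback ordering described under "What changed" can be sketched roughly as below. This is a minimal illustration, not the code in pricing.rs: the `Price` and `PricingTable` types, the prefix-based fuzzy match, the `example-model-1` key, and all price values are assumptions made up for the example. The only point it demonstrates is that fuzzy matching runs against the primary map alone, and the models.dev-style fallback is matched exactly and only after the primary table misses.

```rust
use std::collections::HashMap;

/// Illustrative price record; field names and units are assumptions,
/// not the schema ccusage actually uses.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Price {
    input_per_mtok: f64,
    output_per_mtok: f64,
}

/// Hypothetical two-tier table: a primary map (LiteLLM-style) and a
/// separate fallback map (models.dev-style) that is never fuzzy-matched.
struct PricingTable {
    primary: HashMap<String, Price>,
    fallback: HashMap<String, Price>,
}

impl PricingTable {
    fn resolve(&self, model: &str) -> Option<Price> {
        // 1. Exact hit in the primary table.
        if let Some(price) = self.primary.get(model) {
            return Some(*price);
        }
        // 2. Fuzzy alias match over primary keys only. A longest-prefix
        //    match stands in for whatever matching the real code does.
        let fuzzy = self
            .primary
            .iter()
            .filter(|(name, _)| model.starts_with(name.as_str()))
            .max_by_key(|(name, _)| name.len())
            .map(|(_, price)| *price);
        // 3. Only when the primary table misses entirely, consult the
        //    fallback map, using exact lookup.
        fuzzy.or_else(|| self.fallback.get(model).copied())
    }
}

/// Builds a tiny table with placeholder prices (not real pricing data).
fn demo_table() -> PricingTable {
    let mut primary = HashMap::new();
    primary.insert(
        "example-model-1".to_string(),
        Price { input_per_mtok: 1.0, output_per_mtok: 2.0 },
    );
    let mut fallback = HashMap::new();
    fallback.insert(
        "claude-fable-5".to_string(),
        Price { input_per_mtok: 3.0, output_per_mtok: 4.0 },
    );
    PricingTable { primary, fallback }
}
```

With this shape, adding or removing fallback entries cannot change which primary entry a fuzzy lookup selects, because the fallback keys never take part in step 2.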
Testing
- just check (clippy, treefmt, schema drift, gitleaks, nix build): passed
- cargo test -p ccusage pricing suite (52 tests, incl. offline resolution of models.dev-only models like claude-fable-5): passed
- nix build .#ccusage and nix build .#models-dev-pricing: passed
Summary by cubic
Embed a pinned models.dev pricing snapshot into ccusage to enable offline pricing for new Anthropic models when LiteLLM hasn't published them yet. Builds reproducibly with no build-time network, keeps primary pricing behavior unchanged, and fails fast if the embedded snapshot is invalid.
New Features
- Pin anomalyco/models.dev as a flake input.
- Reproducible generator (nix/models-dev-gen.ts + nix/models-dev-pricing.nix) using Bun with vendored remeda/zod; compacts to Anthropic models and required fields; validates OUTFILE before write.
- Commit rust/crates/ccusage/src/models-dev-pricing.json and embed via include_str!; Nix and cargo builds ship identical, pinned data.
- Separate models.dev fallback map in pricing.rs, used only on misses so fuzzy alias matching in the primary table is unaffected; works offline; treat embedded snapshot parse errors as build-time failures.
Automation
- .github/workflows/update-pricing.yaml: hourly refresh for LiteLLM and models.dev; serialize the models.dev updater after LiteLLM; run models.dev refresh even if the LiteLLM job fails (always()).
- gh auth setup-git for pushes using GH_TOKEN; disable persisted checkout credentials; job-scoped permissions.
Written for commit 70dca9e. Summary will update on new commits.
Summary by CodeRabbit
New Features
Chores