
ryoppippi · 2026-06-09T21:52:41Z

Summary

Prices brand-new Anthropic models offline by embedding a pinned, self-generated
models.dev pricing snapshot. Models such as claude-fable-5 ship on models.dev
before LiteLLM publishes them, so they previously could not be priced without
network access (the models.dev fallback was online-only).

What changed

- Pin the source: add anomalyco/models.dev as a flake input.
- Reproducible generator (nix/models-dev-gen.ts + nix/models-dev-pricing.nix): models.dev ships per-model TOML, not a prebuilt catalog, so the snapshot is built with models.dev's own generateCatalog (Bun plus the zero-dependency remeda/zod vendored from the pinned bun.lock hashes) and compacted to the Anthropic models and pricing fields ccusage consumes. Exposed as the .#models-dev-pricing package.
- Commit + embed: the compacted snapshot is committed at rust/crates/ccusage/src/models-dev-pricing.json and embedded via include_str!. No build-time network access is needed on any platform (Nix builds and plain cargo build on macOS/Windows ship identical, pinned data). build.rs is unchanged.
- Runtime (pricing.rs): the embedded models.dev data is a separate, offline-capable fallback map, consulted only when the primary table misses, so it never perturbs the primary table's fuzzy alias matching.
- Automation (.github/workflows/update-pricing.yaml): refresh the LiteLLM and models.dev snapshots hourly, each opening its own PR only when the pricing actually changes. Workflow permissions are scoped per job.
- Recipes (justfile): gen-models-dev-pricing and update-models-dev-pricing.
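The fallback ordering described under Runtime can be pictured with a small, std-only Rust sketch. Everything in it (`Table`, `Price`, `lookup_exact`, `lookup_fuzzy`, `resolve`, and the longest-prefix rule standing in for fuzzy alias matching) is hypothetical and is not ccusage's actual `PricingMap` API; it only illustrates why keeping the embedded data in a separate, exact-match map means it cannot change which entry the primary table's fuzzy step picks.

```rust
use std::collections::HashMap;

/// Hypothetical price record (USD per million tokens); placeholder fields only.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Price {
    input_per_mtok: f64,
    output_per_mtok: f64,
}

/// Hypothetical stand-in for a pricing table (not ccusage's `PricingMap`).
#[derive(Default)]
struct Table {
    entries: HashMap<String, Price>,
}

impl Table {
    /// Exact-key lookup only: no alias or prefix matching.
    fn lookup_exact(&self, model: &str) -> Option<Price> {
        self.entries.get(model).copied()
    }

    /// Exact match first, then a simplified "fuzzy" step: the longest key
    /// that is a prefix of the requested id (e.g. a dated model id).
    fn lookup_fuzzy(&self, model: &str) -> Option<Price> {
        self.lookup_exact(model).or_else(|| {
            self.entries
                .iter()
                .filter(|(key, _)| model.starts_with(key.as_str()))
                .max_by_key(|(key, _)| key.len())
                .map(|(_, price)| *price)
        })
    }
}

/// Primary table (with fuzzy matching) first; the embedded snapshot is a
/// separate map, consulted only on a primary miss and only by exact key.
fn resolve(primary: &Table, embedded: &Table, model: &str) -> Option<Price> {
    primary
        .lookup_fuzzy(model)
        .or_else(|| embedded.lookup_exact(model))
}
```

With `claude-known` in the primary table and `claude-known-extra` only in the embedded map, a request for `claude-known-extra-20260101` still resolves to the primary entry. If both maps were merged into one table, the longer embedded key would have won the prefix match instead.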

Why

The hardcoded built-in pricing table only covers a fixed set of models, and the
models.dev fallback required network access. Embedding a pinned snapshot closes
the offline gap for newly released models while keeping the data reproducible and
reviewable in git (refreshed automatically).

Testing

- just check (clippy, treefmt, schema drift, gitleaks, nix build): passed
- cargo test -p ccusage pricing suite (52 tests, including offline resolution of models.dev-only models such as claude-fable-5): passed
- nix build .#ccusage and nix build .#models-dev-pricing: passed

Summary by cubic

Embed a pinned models.dev pricing snapshot into ccusage to enable offline pricing for new Anthropic models when LiteLLM hasn’t published them yet. Builds reproducibly with no build-time network, keeps primary pricing behavior unchanged, and fails fast if the embedded snapshot is invalid.

New Features
- Pin anomalyco/models.dev as a flake input.
- Reproducible snapshot generator (nix/models-dev-gen.ts + nix/models-dev-pricing.nix) using Bun with vendored remeda/zod; compacts to Anthropic models and required fields; validates OUTFILE before write.
- Commit snapshot to rust/crates/ccusage/src/models-dev-pricing.json and embed via include_str!; Nix and cargo builds ship identical, pinned data.
- Runtime adds a separate embedded models.dev fallback map in pricing.rs, used only on misses so fuzzy alias matching in the primary table is unaffected; it works offline, and embedded snapshot parse errors are treated as build-time failures.
Automation
- .github/workflows/update-pricing.yaml: hourly refresh for LiteLLM and models.dev; serialize the models.dev updater after LiteLLM; run models.dev refresh even if the LiteLLM job fails (always()).
- Configure git auth with gh auth setup-git for pushes using GH_TOKEN; disable persisted checkout credentials; job-scoped permissions.
- Open PRs only when pricing changes; skip lock-only churn.

Written for commit 70dca9e. Summary will update on new commits.

Summary by CodeRabbit

New Features
- Offline embedded models.dev pricing snapshot for reliable local access to model pricing and context limits.
- Improved offline resolution so previously-unavailable Anthropic models can be resolved without network access.
- Build now includes the generated models.dev pricing snapshot so it’s available at runtime.
Chores
- Pricing refresh workflow runs hourly and adds an automated job to regenerate, validate, and publish the models.dev pricing snapshot (suppresses PR noise when unchanged).
- CI step added to ensure git auth before pushing automation branches; added local commands to regenerate the snapshot.

ccusage prices models from the embedded LiteLLM snapshot plus a runtime models.dev fallback that was only consulted when online. Newly released Anthropic models (e.g. claude-fable-5) ship on models.dev before LiteLLM publishes them, so they could not be priced offline at all.

Pin the models.dev source as a flake input and reproducibly regenerate a compacted, Anthropic-only pricing snapshot from it. models.dev ships per-model TOML rather than a prebuilt catalog, so the snapshot is built with models.dev's own generateCatalog routine (Bun plus the zero-dependency remeda/zod vendored from the pinned bun.lock hashes) and then trimmed to the pricing fields ccusage consumes. The result is committed to the repo and embedded via include_str!, so every platform (Nix and plain cargo on macOS/Windows) ships identical, pinned data with no build-time network access.

At runtime the embedded models.dev data is kept as a separate fallback map, consulted only when the primary table misses, so it never perturbs the primary table's fuzzy alias matching. Unlike the network source, it stays available offline.

- flake.nix/flake.lock: pin anomalyco/models.dev input
- nix/models-dev-gen.ts + nix/models-dev-pricing.nix: reproducible generator, exposed as the .#models-dev-pricing package
- justfile: gen-models-dev-pricing / update-models-dev-pricing recipes
- .github/workflows/update-pricing.yaml: refresh LiteLLM and models.dev snapshots hourly, each opening its own PR when the pricing actually changes
- pricing.rs: embed the committed snapshot and resolve it as an offline-capable fallback, with tests covering offline resolution of models.dev-only models

coderabbitai · 2026-06-09T21:52:54Z

No actionable comments were generated in the recent review. 🎉


📥 Commits

Reviewing files that changed from the base of the PR and between ab1eb97 and 70dca9e.

📒 Files selected for processing (1)

.github/workflows/update-pricing.yaml

🚧 Files skipped from review as they are similar to previous changes (1)

.github/workflows/update-pricing.yaml

📝 Walkthrough


Adds a build-time models.dev pricing snapshot: generator script, Nix derivation and packaging, CI job to update the flake and snapshot, Just recipes, and Rust integration that uses the embedded snapshot as an offline pricing fallback with tests.

Changes

Embedded models-dev pricing fallback for offline resolution

Layer / File(s)	Summary
Flake input and workflow trigger updates `flake.nix`, `.github/workflows/update-pricing.yaml`	Add `models-dev` non-flake input under `inputs.litellm`, update workflow to run hourly, and tighten permissions scoping.
Automated snapshot update job `.github/workflows/update-pricing.yaml`	Add `update-models-dev-pricing` job that updates the `models-dev` flake input, regenerates the snapshot, validates via `nix develop --command just check`, and creates/updates an automation PR with `flake.lock` and the snapshot. Includes `gh auth setup-git` step.
Models-dev Nix module & build `nix/models-dev-pricing.nix`, `nix/packages.nix`	Nix module vendors pinned npm deps and runs the TypeScript generator via Bun to produce a deterministic `models-dev-pricing.json` derivation; package exposed under `perSystem`.
Generator script `nix/models-dev-gen.ts`	Generator filters Anthropic/Claude models by id/key, emits minimized runtime pricing entries, stable-sorts keys, and writes pretty JSON to `OUTFILE`.
Packaging, source inclusion, and recipes `package.nix`, `justfile`	Include `models-dev-pricing.json` in crate sources; add `gen-models-dev-pricing` and `update-models-dev-pricing` `just` recipes to regenerate and update the committed snapshot.
PricingMap embedded snapshot fallback and resolution `rust/crates/ccusage/src/pricing.rs`	Embed snapshot as build-time constant, add `enable_embedded_models_dev_fallback` flag, extend `find()` and `context_limit()` with a third lookup stage for the embedded snapshot, add lazy-loading helper, and include parseability and offline-resolution tests.
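The generator step summarized above (filter to Anthropic/Claude models, keep only runtime pricing fields, stable-sort keys) lives in TypeScript in nix/models-dev-gen.ts. As an illustration only, here is the same idea sketched in Rust; `CatalogModel`, `CompactEntry`, the `provider == "anthropic"` / `claude` prefix rule, and dropping entries without prices are assumptions, not the script's actual logic.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified catalog entry: only the fields this sketch needs;
/// a real models.dev entry carries many more.
struct CatalogModel {
    provider: String,
    id: String,
    input_per_mtok: Option<f64>,
    output_per_mtok: Option<f64>,
    context_limit: Option<u64>,
}

/// Hypothetical compact entry: only what a runtime pricing lookup would read.
#[derive(Debug, PartialEq)]
struct CompactEntry {
    input_per_mtok: f64,
    output_per_mtok: f64,
    context_limit: Option<u64>,
}

/// Keep Anthropic/Claude models that carry both prices, drop everything else,
/// and collect into a BTreeMap so iteration (and any serialization) is sorted
/// by model id regardless of the catalog's input order.
fn compact(catalog: &[CatalogModel]) -> BTreeMap<String, CompactEntry> {
    catalog
        .iter()
        .filter(|m| m.provider == "anthropic" || m.id.starts_with("claude"))
        .filter_map(|m| {
            Some((
                m.id.clone(),
                CompactEntry {
                    // `?` skips models that lack either price.
                    input_per_mtok: m.input_per_mtok?,
                    output_per_mtok: m.output_per_mtok?,
                    context_limit: m.context_limit,
                },
            ))
        })
        .collect()
}
```

Sorting by key is the property that keeps a regenerated snapshot's diff limited to actual pricing changes rather than reordering noise.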

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant GH as GitHub Actions (update-models-dev-pricing)
  participant Flake as `flake.nix` / nix flake update
  participant NixBuild as `nix/models-dev-pricing.nix` (runCommand)
  participant Generator as `nix/models-dev-gen.ts`
  participant Repo as Repository (rust/crates/ccusage/src/models-dev-pricing.json)
  participant PR as GitHub (automation/models-dev-pricing PR)

  GH->>Flake: run `nix flake update models-dev`
  GH->>NixBuild: build `models-dev-pricing` derivation
  NixBuild->>Generator: execute generator (Bun) with OUTFILE
  Generator->>Repo: write models-dev-pricing.json (build output)
  GH->>Repo: copy output into repo and run `just check`
  GH->>PR: create/update automation/models-dev-pricing PR if files changed
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

ryoppippi/ccusage#1186: CI workflow changes to update-pricing scheduling and automation that overlap with this PR’s workflow edits.
ryoppippi/ccusage#1185: Adds a lazy runtime models.dev fallback; closely related to embedded vs runtime models.dev fallback behavior.
ryoppippi/ccusage#1061: Modifies embedded/offline pricing logic in pricing.rs, overlapping resolution behavior changes.

Suggested reviewers

pullfrog

Poem

🐰
I hopped through flakes and Bun’s bright light,
Collected Claude prices late at night,
Sorted keys neat, wrote JSON with care,
Embedded a snapshot to carry us there,
Offline we hum, with prices to spare.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 71.43% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately captures the main change: embedding a models.dev pricing snapshot for offline pricing functionality. It is concise, specific, and directly reflects the core objective of the PR.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/embed-models-dev-pricing


cloudflare-workers-and-pages · 2026-06-09T21:53:05Z

Deploying with Cloudflare Workers


Status	Name	Latest Commit	Preview URL	Updated (UTC)
✅ Deployment successful! View logs	ccusage-guide	`70dca9e`	Commit Preview URL Branch Preview URL	Jun 09 2026, 10:48 PM

pullfrog

Reviewed changes — embeds a pinned, generated models.dev pricing snapshot as a third fallback tier in PricingMap, enabling offline resolution of newly released Anthropic models that aren't yet in LiteLLM.

- Embedded models.dev snapshot: committed at rust/crates/ccusage/src/models-dev-pricing.json, embedded via include_str!, and consulted by find()/context_limit() after the primary table and the network models.dev cache.
- Reproducible generator: nix/models-dev-gen.ts imports models.dev's own generateCatalog, filters to Anthropic models, and outputs compacted JSON with stable key ordering. nix/models-dev-pricing.nix vendors remeda/zod by hash for reproducibility.
- Runtime fallback chain: find() now chains through the embedded snapshot with its own enable_embedded_models_dev_fallback flag, keeping the embedded map separate so it never interferes with the primary table's fuzzy alias matching.
- Hourly automation: update-pricing.yaml gains an update-models-dev-pricing job alongside the existing LiteLLM job, with per-job permissions scoping.
- Tests: three new tests cover snapshot parseability, offline fallback for models only in the embedded snapshot, and resolution of claude-fable-5 specifically.

ℹ️ No critical issues — one minor suggestion inline.

Using DeepSeek Pro (free via Pullfrog for OSS)

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

.github/workflows/update-pricing.yaml (1)
76-133: ⚖️ Poor tradeoff

Consider potential race condition between parallel jobs.

Both update-pricing and update-models-dev-pricing jobs update flake.lock (different inputs) and can run simultaneously. While --force-with-lease and different branch names reduce the risk, concurrent updates to flake.lock could cause one job to fail if they finish within the same minute.

Impact is minimal since the failed job will retry hourly, but you could serialize them using needs: if deterministic execution order is preferred.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/update-pricing.yaml around lines 76 - 133, Two jobs can
race updating flake.lock: serialize the jobs by adding a dependency so they
don't run in parallel; specifically add a needs: reference on the
update-models-dev-pricing job (or on update-pricing depending which should run
first) so GitHub Actions will wait for the prior job to finish before starting
the other, e.g., make update-models-dev-pricing depend on update-pricing (use
the job name update-pricing in needs) to prevent concurrent flake.lock edits and
eliminate the --force-with-lease collision risk.
nix/models-dev-gen.ts (1)
79-79: 💤 Low value

Consider adding OUTFILE validation for better error messaging.

While the non-null assertion is safe in the Nix build context (where OUTFILE is always set), adding an explicit check would provide a clearer error if the script is accidentally run outside Nix:
```diff
+const outfile = process.env.OUTFILE;
+if (!outfile) {
+  throw new Error('OUTFILE environment variable is required');
+}
-await Bun.write(process.env.OUTFILE!, `${JSON.stringify(sortObject(out), null, 2)}\n`);
+await Bun.write(outfile, `${JSON.stringify(sortObject(out), null, 2)}\n`);
```
However, since this is strictly a Nix-invoked build script and any failure would be caught at build time, the current approach is acceptable.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@nix/models-dev-gen.ts` at line 79, Replace the non-null assertion on
process.env.OUTFILE before calling Bun.write with an explicit validation: check
that process.env.OUTFILE is defined (and optionally non-empty) and if not, throw
or log a clear error and exit (so the failure message explains OUTFILE is
missing), then call Bun.write with process.env.OUTFILE; reference the OUTFILE
env var, the Bun.write(...) call, and the sortObject(out) usage when making the
change.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/update-pricing.yaml:
- Line 17: The checkout steps currently use "uses:
actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10" without disabling
persisted credentials; update both checkout actions (the ones using
actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10) to include the input
"persist-credentials: false" under their step configuration so the runner will
not leak git credentials (add the single key/value to each checkout step).



📥 Commits

Reviewing files that changed from the base of the PR and between fae44a5 and cbc9ec5.

⛔ Files ignored due to path filters (1)

flake.lock is excluded by !**/*.lock

📒 Files selected for processing (9)

.github/workflows/update-pricing.yaml
flake.nix
justfile
nix/models-dev-gen.ts
nix/models-dev-pricing.nix
nix/packages.nix
package.nix
rust/crates/ccusage/src/models-dev-pricing.json
rust/crates/ccusage/src/pricing.rs

pkg-pr-new · 2026-06-09T22:01:53Z


ccusage

npx https://pkg.pr.new/ccusage@1242

@ccusage/ccusage-darwin-arm64

npx https://pkg.pr.new/@ccusage/ccusage-darwin-arm64@1242

@ccusage/ccusage-darwin-x64

npx https://pkg.pr.new/@ccusage/ccusage-darwin-x64@1242

@ccusage/ccusage-linux-arm64

npx https://pkg.pr.new/@ccusage/ccusage-linux-arm64@1242

@ccusage/ccusage-linux-x64

npx https://pkg.pr.new/@ccusage/ccusage-linux-x64@1242

@ccusage/ccusage-win32-arm64

npx https://pkg.pr.new/@ccusage/ccusage-win32-arm64@1242

@ccusage/ccusage-win32-x64

npx https://pkg.pr.new/@ccusage/ccusage-win32-x64@1242

commit: 70dca9e

cubic-dev-ai

2 issues found across 10 files


The update-models-dev-pricing recipe declared generation and validation as just recipe dependencies, which run before the recipe body. As a result, it regenerated and checked the snapshot against the old models-dev lock and only then updated flake.lock. Run the models.dev input update first, then regenerate the committed snapshot and run the normal check. Also make the claude-fable-5 offline pricing test assert that the embedded snapshot actually contains the target model instead of silently returning.

github-actions · 2026-06-09T22:04:05Z

ccusage performance comparison

PR SHA: cbc9ec533b53
Base SHA: fae44a52f183

This compares the PR package against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

Package	SHA	Execution setup	Bunx temp cache	Bunx warm median	Warm samples
Base pkg.pr.new	`fae44a52f183`	1.508s	863.2ms	30.0ms	3
PR pkg.pr.new	`cbc9ec5`	1.261s	1.143s	30.5ms	3

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: fae44a52f183; PR package: cbc9ec5. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`bunx -p <pkg> ccusage claude --offline --json`	1.01 GiB	546.7ms	549.0ms	1.00x	315.95 MiB	319.20 MiB	1.01x	1.84 GiB/s	1.83 GiB/s
`bunx -p <pkg> ccusage codex --offline --json`	1.01 GiB	368.7ms	362.1ms	1.02x	82.33 MiB	69.70 MiB	0.85x	2.73 GiB/s	2.78 GiB/s

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 run. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

Command	Runtime	Input	Median	Throughput	Samples
`claude --offline --json`	Package wrapper	1.01 GiB	542.1ms	1.86 GiB/s	1
`claude --offline --json`	Installed native binary	1.01 GiB	525.2ms	1.92 GiB/s	1
`codex --offline --json`	Package wrapper	1.01 GiB	362.5ms	2.78 GiB/s	1
`codex --offline --json`	Installed native binary	1.01 GiB	335.5ms	3.00 GiB/s	1

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 file)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude daily --offline --json`	29.7ms	29.1ms	1.02x	43.61 MiB	43.48 MiB	1.00x	0.05 MiB/s	0.05 MiB/s
`claude session --offline --json`	30.0ms	29.8ms	1.00x	43.61 MiB	43.48 MiB	1.00x	0.05 MiB/s	0.05 MiB/s
`codex daily --offline --json`	29.0ms	28.9ms	1.00x	43.48 MiB	43.48 MiB	1.00x	0.03 MiB/s	0.03 MiB/s
`codex session --offline --json`	29.2ms	28.9ms	1.01x	43.61 MiB	43.48 MiB	1.00x	0.03 MiB/s	0.03 MiB/s

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude --offline --json`	1.01 GiB	554.4ms	582.2ms	0.95x	322.33 MiB	320.20 MiB	0.99x	1.82 GiB/s	1.73 GiB/s
`codex --offline --json`	1.01 GiB	384.8ms	374.2ms	1.03x	82.70 MiB	70.70 MiB	0.85x	2.62 GiB/s	2.69 GiB/s

Artifact size

Artifact	Base	PR	Delta	Ratio
packed `ccusage-*.tgz`	17.30 KiB	17.30 KiB	+0.00 KiB	1.00x
installed native package binary	3353.74 KiB	3417.74 KiB	+64.00 KiB	0.98x

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.
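The ratio columns can be reproduced from the reported values. The following is inferred from the numbers in these tables, not taken from the benchmark script: "PR vs base" and the artifact "Ratio" appear to be base / PR, while "PR/base RSS" is PR / base, so values above 1.00x are better in the first case and values below 1.00x are better in the second. A minimal Rust sketch with hypothetical helper names:

```rust
/// "PR vs base" (timings) and artifact "Ratio": base / PR.
/// Above 1.00x means the PR is faster (or, for artifacts, smaller).
fn speedup(base: f64, pr: f64) -> f64 {
    base / pr
}

/// "PR/base RSS": PR / base, so below 1.00x means the PR uses less memory.
fn rss_ratio(base_mib: f64, pr_mib: f64) -> f64 {
    pr_mib / base_mib
}

/// Render a ratio with two decimals, the way the tables print it.
fn fmt_ratio(r: f64) -> String {
    format!("{:.2}x", r)
}
```

For example, the artifact row gives 3353.74 / 3417.74 ≈ 0.98x, i.e. the PR's installed native binary is about 2% (+64.00 KiB) larger.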

github-actions · 2026-06-09T22:04:23Z

ccusage performance comparison

PR SHA: cbc9ec533b53
Base SHA: fae44a52f183

This compares the Rust PR release binary against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

Package	SHA	Execution setup	Bunx temp cache	Bunx warm median	Warm samples
Base pkg.pr.new	`fae44a52f183`	765.8ms	636.1ms	32.0ms	3
PR pkg.pr.new	`cbc9ec5`	858.5ms	673.5ms	32.7ms	3

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: fae44a52f183; PR package: cbc9ec5. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`bunx -p <pkg> ccusage claude --offline --json`	1.01 GiB	572.4ms	548.0ms	1.04x	342.20 MiB	326.83 MiB	0.96x	1.76 GiB/s	1.84 GiB/s
`bunx -p <pkg> ccusage codex --offline --json`	1.01 GiB	367.8ms	371.2ms	0.99x	82.45 MiB	79.70 MiB	0.97x	2.74 GiB/s	2.71 GiB/s

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 run. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

Command	Runtime	Input	Median	Throughput	Samples
`claude --offline --json`	Package wrapper	1.01 GiB	551.7ms	1.82 GiB/s	1
`claude --offline --json`	Installed native binary	1.01 GiB	533.6ms	1.89 GiB/s	1
`codex --offline --json`	Package wrapper	1.01 GiB	362.8ms	2.77 GiB/s	1
`codex --offline --json`	Installed native binary	1.01 GiB	336.1ms	3.00 GiB/s	1

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 file)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude daily --offline --json`	29.5ms	4.0ms	7.31x	43.61 MiB	2.70 MiB	0.06x	0.05 MiB/s	0.38 MiB/s
`claude session --offline --json`	30.4ms	4.1ms	7.46x	43.48 MiB	2.70 MiB	0.06x	0.05 MiB/s	0.38 MiB/s
`codex daily --offline --json`	29.6ms	3.8ms	7.84x	43.48 MiB	2.70 MiB	0.06x	0.03 MiB/s	0.23 MiB/s
`codex session --offline --json`	29.3ms	3.9ms	7.58x	43.48 MiB	2.70 MiB	0.06x	0.03 MiB/s	0.22 MiB/s

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude --offline --json`	1.01 GiB	556.0ms	529.6ms	1.05x	338.70 MiB	318.95 MiB	0.94x	1.81 GiB/s	1.90 GiB/s
`codex --offline --json`	1.01 GiB	367.0ms	337.2ms	1.09x	70.70 MiB	71.08 MiB	1.01x	2.74 GiB/s	2.99 GiB/s

Artifact size

Artifact	Base	PR	Delta	Ratio
packed `ccusage-*.tgz`	17.30 KiB	17.30 KiB	+0.00 KiB	1.00x
installed native package binary	3353.74 KiB	3417.74 KiB	+64.00 KiB	0.98x

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

ryoppippi · 2026-06-09T22:04:59Z

@coderabbitai review
@cubic-dev-ai review

cubic-dev-ai · 2026-06-09T22:05:09Z

@ryoppippi I have started the AI code review. It will take a few minutes to complete.

Address reviewer feedback on the pricing refresh workflow. Disable persisted checkout credentials because the jobs configure git identity explicitly and use GH_TOKEN only for pull-request operations. Serialize the models.dev refresh after the LiteLLM refresh to avoid simultaneous flake.lock update jobs, and validate OUTFILE before writing the generated models.dev pricing snapshot so local/manual failures are explicit.

ryoppippi · 2026-06-09T22:06:41Z

@coderabbitai review
@cubic-dev-ai review

cubic-dev-ai · 2026-06-09T22:06:50Z

@ryoppippi I have started the AI code review. It will take a few minutes to complete.

coderabbitai · 2026-06-09T22:07:23Z

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Read the main Rust pricing file
cat rust/crates/ccusage/src/pricing.rs

echo "=== models-dev-gen.ts ==="
cat nix/models-dev-gen.ts

echo "=== models-dev-pricing.nix ==="
cat nix/models-dev-pricing.nix

echo "=== update-pricing.yaml ==="
cat .github/workflows/update-pricing.yaml

echo "=== justfile relevant section ==="
cat justfile

echo "=== nix/packages.nix ==="
cat nix/packages.nix

echo "=== package.nix ==="
cat package.nix

echo "=== flake.nix relevant section ==="
cat flake.nix
```

Length of output: 92594

@ryoppippi: Clean, well-thought-out PR. The three-tier fallback design (primary → network models.dev → embedded) is sound, the fuzzy-match isolation is correct (embedded map uses find_entry not find, preventing recursion), and the automation avoids lock-only PR noise. A few issues worth addressing:

`rust/crates/ccusage/src/pricing.rs`

Silently ignored parse failure in embedded_models_dev_pricing

fn embedded_models_dev_pricing() -> &'static PricingMap {
    static EMBEDDED_MODELS_DEV_PRICING: OnceLock<PricingMap> = OnceLock::new();
    EMBEDDED_MODELS_DEV_PRICING.get_or_init(|| {
        let mut map = PricingMap::default();
        map.load_models_dev_json_missing(BUILD_TIME_MODELS_DEV_JSON);  // return value ignored
        map
    })
}

load_models_dev_json_missing returns Option<usize> — None on parse failure. Because BUILD_TIME_MODELS_DEV_JSON is embedded at compile time and validated by the embedded_models_dev_snapshot_is_parseable test, a parse failure at runtime would be a bug (corrupted binary or schema change). Consider panicking with a descriptive message so such bugs are caught early rather than manifesting as silent zero-cost lookups:

map.load_models_dev_json_missing(BUILD_TIME_MODELS_DEV_JSON)
    .expect("embedded models-dev-pricing.json failed to parse; this is a build-time bug");
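A self-contained sketch of the fail-fast pattern. It assumes a hypothetical whitespace-separated `model input output` format in place of the embedded JSON, so only the standard library is needed. `PricingMap`, `load_missing`, and `embedded_pricing` are illustrative names modeled on, not copied from, `pricing.rs`. With this pattern, a snapshot that fails to parse panics on first access instead of yielding an empty fallback map.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Hypothetical stand-in for the compile-time snapshot. The real file is JSON
/// embedded with include_str!; a tiny line format keeps this sketch std-only.
const BUILD_TIME_SNAPSHOT: &str = "claude-fable-5 1.0 2.0\nclaude-other 0.5 1.5\n";

#[derive(Default)]
struct PricingMap {
    entries: HashMap<String, (f64, f64)>,
}

impl PricingMap {
    /// Same contract shape as `load_models_dev_json_missing`: Some(number of
    /// entries added), or None if the input cannot be parsed. Ids already
    /// present are skipped so existing entries keep precedence.
    fn load_missing(&mut self, text: &str) -> Option<usize> {
        let mut added = 0;
        for line in text.lines().filter(|l| !l.trim().is_empty()) {
            let mut parts = line.split_whitespace();
            let id = parts.next()?;
            let input: f64 = parts.next()?.parse().ok()?;
            let output: f64 = parts.next()?.parse().ok()?;
            if self.entries.contains_key(id) {
                continue;
            }
            self.entries.insert(id.to_string(), (input, output));
            added += 1;
        }
        Some(added)
    }
}

fn embedded_pricing() -> &'static PricingMap {
    static EMBEDDED: OnceLock<PricingMap> = OnceLock::new();
    EMBEDDED.get_or_init(|| {
        let mut map = PricingMap::default();
        // Fail fast: an unparseable embedded snapshot is a build-time bug,
        // not something to hide behind an empty fallback map.
        map.load_missing(BUILD_TIME_SNAPSHOT)
            .expect("embedded pricing snapshot must parse");
        map
    })
}

fn main() {
    let map = embedded_pricing();
    println!("loaded {} embedded entries", map.entries.len());
}
```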

Fragile test dependency on claude-fable-5

#[test]
fn offline_prices_new_anthropic_model_from_embedded_models_dev() {
    assert!(embedded_models_dev_pricing()
        .find_entry("claude-fable-5")
        .is_some());
    ...
}

This test will start failing as soon as models.dev removes or renames claude-fable-5. The existing offline_resolves_models_only_in_embedded_models_dev test already covers the general case (any model present in embedded but absent from the primary table). Consider either removing the claude-fable-5 assertion or adding a comment explaining it's expected to be updated when the snapshot is regenerated.

`nix/models-dev-gen.ts`

Non-null assertion on process.env.OUTFILE

await Bun.write(process.env.OUTFILE!, `${JSON.stringify(sortObject(out), null, 2)}\n`);

In a Nix build OUTFILE is always set, but a plain bun run nix/models-dev-gen.ts from outside Nix will throw an opaque TypeError: Argument must be a string or Blob. A guard at the top improves the developer experience:

const outfile = process.env.OUTFILE;
if (!outfile) {
  throw new Error('OUTFILE environment variable is required (set by the Nix build or just gen-models-dev-pricing)');
}
// ...
await Bun.write(outfile, `${JSON.stringify(sortObject(out), null, 2)}\n`);

`.github/workflows/update-pricing.yaml`

Hourly cron with two jobs touching flake.lock

Both jobs run concurrently on the same schedule and both call nix flake update on the checked-out main HEAD. While they push to separate branches (automation/litellm-pricing vs automation/models-dev-pricing), each job reads the current flake.lock independently, so a concurrent run can produce two PRs with inconsistent lockfile states. This is unlikely to cause real harm (each PR is self-contained and validated), but it's worth noting. Adding concurrency groups per job would make the intent explicit:

jobs:
  update-pricing:
    concurrency:
      group: update-litellm-pricing
      cancel-in-progress: true
    ...
  update-models-dev-pricing:
    concurrency:
      group: update-models-dev-pricing
      cancel-in-progress: true
    ...

`nix/models-dev-pricing.nix`

remeda/zod hashes must be kept in sync with upstream bun.lock

The comment "Keep these in sync with the matching entries in ${modelsDevSrc}/bun.lock whenever the input is bumped" is good — but the update-models-dev-pricing workflow and just update-models-dev-pricing recipe don't update these hashes automatically. A failed nix build .#models-dev-pricing after bumping the input is the only signal. Consider adding a brief comment in the justfile recipe or workflow step to remind maintainers to re-check the hashes on build failure.

Minor / nits

nix/models-dev-gen.ts line const { providers } = (await generateCatalog('.')) as { providers: ... } — the cast is unvalidated; a shape mismatch silently produces an empty snapshot. A runtime check on providers being a non-null object before iterating would make failures explicit.
package.nix source filter: the models-dev-pricing.json suffix check uses lib.hasSuffix which correctly matches rust/crates/ccusage/src/models-dev-pricing.json. ✓

coderabbitai · 2026-06-09T22:08:59Z


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read the main files for review
echo "=== nix/models-dev-gen.ts ==="
cat nix/models-dev-gen.ts

echo ""
echo "=== nix/models-dev-pricing.nix ==="
cat nix/models-dev-pricing.nix

echo ""
echo "=== rust/crates/ccusage/src/pricing.rs ==="
cat rust/crates/ccusage/src/pricing.rs

echo ""
echo "=== .github/workflows/update-pricing.yaml ==="
cat .github/workflows/update-pricing.yaml

echo ""
echo "=== justfile (relevant recipes) ==="
cat justfile

echo ""
echo "=== nix/packages.nix ==="
cat nix/packages.nix

echo ""
echo "=== package.nix ==="
cat package.nix

echo ""
echo "=== flake.nix (head) ==="
head -60 flake.nix

Length of output: 93054

@ryoppippi: Here is the review for PR #1242.

Review: `feat(pricing): embed models.dev snapshot for offline pricing`

The overall design is clean — a separate embedded fallback map that never participates in fuzzy alias matching is the right call, and the 3-tier resolution chain (primary → network models.dev → embedded models.dev) is easy to follow. The Nix derivation for reproducible, offline-capable generation is well-structured. A few items worth addressing:

🔴 Issue — `update-models-dev-pricing` unnecessarily blocks on `update-pricing`

File: .github/workflows/update-pricing.yaml

update-models-dev-pricing:
  needs: update-pricing   # ← blocks even on unrelated LiteLLM failures

Both jobs check out main independently, update different flake inputs (litellm vs models-dev), and push to different branches (automation/litellm-pricing vs automation/models-dev-pricing). There is no shared mutable state between them, so the needs: constraint means a transient LiteLLM outage silently prevents the models.dev snapshot from being refreshed. Consider removing the dependency to let the jobs run in parallel:

update-models-dev-pricing:
  runs-on: ubuntu-24.04-arm
  permissions:
    contents: write
    pull-requests: write
  steps:
    ...

If sequential execution is desired for rate-limiting reasons, using concurrency: at the workflow level is a more explicit way to express that.

🟡 Issue — SRI hashes in `models-dev-pricing.nix` require silent manual updates

File: nix/models-dev-pricing.nix

# Keep these in sync with the matching entries in `${modelsDevSrc}/bun.lock`
# whenever the input is bumped.
remeda = pkgs.fetchurl {
  url = "https://registry.npmjs.org/remeda/-/remeda-2.33.7.tgz";
  hash = "sha512-cXlyjevWx5AcslOUEETG4o8XYi9UkoCXcJmj7XhPFVbla+...";
};

When nix flake update models-dev bumps the input (e.g. for a new remeda/zod version), the hashes will silently go stale and the Nix build will fail with a hash mismatch. Options to make this more robust:

Derive the hashes from modelsDevSrc at evaluation time — read them out of ${modelsDevSrc}/bun.lock using builtins.fromJSON / a small Nix expression instead of hardcoding them. This keeps them auto-synced.
Or add a nix/update-models-dev-deps.sh helper that re-extracts the hashes and is called from update-models-dev-pricing, and document it clearly.
Minimum viable: add a CI check that verifies the hashes match bun.lock, so a stale hash is caught early rather than at user build time.

🟡 Issue — `persist-credentials: false` with bare `git push`

File: .github/workflows/update-pricing.yaml (both jobs)

- uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10
  with:
    persist-credentials: false   # removes the default credential helper
...
- run: git push --force-with-lease origin "$branch"  # ← no credentials configured

With persist-credentials: false, the default GITHUB_TOKEN-backed credential helper is removed. The bare git push will fail unless credentials are re-established somewhere. If .github/actions/setup-nix calls gh auth setup-git, that is sufficient — but it should be explicitly documented. If it does not, adding an explicit step is needed:

- name: Configure git credentials
  run: gh auth setup-git
  env:
    GH_TOKEN: ${{ github.token }}

🟡 Issue — `load_models_dev_json_missing` iterates `into_values()`, losing provider keys

File: rust/crates/ccusage/src/pricing.rs, line ~278

for provider in raw.into_values() {
    for (model_key, model) in provider.models {
        let model_id = model.id.unwrap_or(model_key);
        // stored under model_id only

The provider identifier (e.g. "anthropic") is dropped. For entries where model.id is absent (so model_id = model_key), provider-namespaced keys like anthropic/claude-fable-5 that appear in models.dev won't be stored under their namespaced form — they'll only be found as claude-fable-5. This is probably intentional given the Anthropic-first scope, but it means users who log usage with provider-prefixed model names will get a miss from the embedded fallback. A comment confirming this is deliberate would help:

// Provider namespace is intentionally dropped here; ccusage looks up models
// by their canonical id (e.g. "claude-fable-5"), not by provider-prefixed keys
// (e.g. "anthropic/claude-fable-5"). The network fallback's fuzzy matching
// handles provider prefixes; the embedded snapshot does not.
for provider in raw.into_values() {
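The behavior this comment documents can be sketched as follows. The sketch assumes a hypothetical, already-deserialized catalog shape (provider id to model key to optional explicit id) instead of the real serde types, and a table whose values only record which source priced an id. The loop discards the provider key, keys entries by model id, and skips ids the primary table already has. LiteLLM pricing therefore keeps precedence, while a provider-prefixed id gets no embedded match.

```rust
use std::collections::HashMap;

/// Hypothetical catalog shape: provider id -> (model key -> optional explicit id).
type Catalog = HashMap<String, HashMap<String, Option<String>>>;

/// Flatten the catalog into a table keyed by model id only. The provider key
/// is dropped, and ids already present in the table are left untouched.
fn flatten_missing(raw: Catalog, table: &mut HashMap<String, &'static str>) -> usize {
    let mut added = 0;
    for provider in raw.into_values() {
        for (model_key, explicit_id) in provider {
            // An explicit id wins; otherwise the map key is the id.
            let model_id = explicit_id.unwrap_or(model_key);
            if table.contains_key(&model_id) {
                continue; // the primary (LiteLLM) entry keeps precedence
            }
            table.insert(model_id, "models.dev");
            added += 1;
        }
    }
    added
}

fn main() {
    let raw: Catalog = HashMap::from([(
        "anthropic".to_string(),
        HashMap::from([
            ("claude-sonnet-4".to_string(), None),
            ("claude-fable-5".to_string(), None),
        ]),
    )]);
    let mut table: HashMap<String, &'static str> =
        HashMap::from([("claude-sonnet-4".to_string(), "litellm")]);

    let added = flatten_missing(raw, &mut table);
    // prints "added 1; prefixed id found: false"
    println!("added {added}; prefixed id found: {}", table.contains_key("anthropic/claude-fable-5"));
}
```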

🟢 Nit — `gen-models-dev-pricing` copies from a Nix store path; the `--no-link` flag is implicit

File: justfile

gen-models-dev-pricing:
    cp "$(nix build .#models-dev-pricing --no-link --print-out-paths)" \
       rust/crates/ccusage/src/models-dev-pricing.json
    chmod u+w rust/crates/ccusage/src/models-dev-pricing.json
    nix fmt rust/crates/ccusage/src/models-dev-pricing.json

The sequence is correct. One note: nix build with --print-out-paths prints the path of the output file (which is the JSON itself since the derivation's $out is the file). cp here works because the derivation writes directly to $out. This is fine, but a brief comment would aid future readers who expect $out to be a directory.

🟢 Nit — `KEEP` regex also matches non-Anthropic providers

File: nix/models-dev-gen.ts

const KEEP = /claude|anthropic/i;

The anthropic alternative also matches hypothetical third-party providers whose name merely contains that string (e.g. "not-anthropic"). For a snapshot that only needs to cover Anthropic/Claude models this is fine in practice, but anchoring to /^(claude|anthropic)/i would be more precise. Word boundaries alone would not help here: in "not-anthropic" the hyphen is a non-word character, so /\b(claude|anthropic)\b/i still matches.
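For ASCII model ids, the two patterns reduce to a case-insensitive substring test and a case-insensitive prefix test. The sketch below uses plain `str` methods as a stand-in for the regexes (it is not the generator's code) to show which ids each one keeps.

```rust
/// Stand-in for the unanchored /claude|anthropic/i on ASCII ids:
/// a case-insensitive substring test.
fn keep_substring(id: &str) -> bool {
    let id = id.to_ascii_lowercase();
    id.contains("claude") || id.contains("anthropic")
}

/// Stand-in for the anchored /^(claude|anthropic)/i on ASCII ids:
/// a case-insensitive prefix test.
fn keep_anchored(id: &str) -> bool {
    let id = id.to_ascii_lowercase();
    id.starts_with("claude") || id.starts_with("anthropic")
}

fn main() {
    // prints:
    // anthropic: substring=true anchored=true
    // Claude-Fable-5: substring=true anchored=true
    // not-anthropic: substring=true anchored=false
    for id in ["anthropic", "Claude-Fable-5", "not-anthropic"] {
        println!("{id}: substring={} anchored={}", keep_substring(id), keep_anchored(id));
    }
}
```

Anchoring trades one imprecision for another: it also drops any id in which "claude" or "anthropic" appears only after some prefix, so whether it is safe depends on which keys `KEEP` is actually tested against.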

✅ Positive observations

The three-tier find() chain with fallback flags is clean and makes the offline-vs-network distinction explicit.
embedded_models_dev_pricing() correctly initialises a plain PricingMap::default() (no flags set) and queries it via find_entry() — no risk of infinite recursion or double-fallback.
load_models_dev_json_missing correctly skips entries already present in the primary map (if self.entries.contains_key(&model_id) { continue; }), preserving LiteLLM precedence.
Stable key ordering (sortObject) in the generator keeps diffs minimal — good for automated PRs.
force-with-lease over force in the workflow — safe push pattern.
Per-job permissions scoped to minimum required (contents: write, pull-requests: write).

CodeRabbit pointed out that the embedded models.dev fallback ignored parse failures from the compile-time snapshot. Treat that as a build-time invariant violation instead of silently returning an empty fallback map. Keep the claude-fable-5 offline test because it is the concrete motivation for this PR, but make the assertion failure explain that the pinned models.dev snapshot is expected to contain it.

ryoppippi · 2026-06-09T22:10:22Z

@coderabbitai Addressed the valid runtime invariant from your latest analysis in 85c285b: embedded models.dev snapshot parse failures now fail fast instead of silently producing an empty fallback map. The OUTFILE validation and workflow serialization items were already fixed in ed99589. I kept the claude-fable-5 assertion intentionally because it is the concrete regression target for this PR, and added a clear assertion message. Validation: focused cargo test for embedded_models_dev, just fmt, and pre-push clippy/treefmt/gitleaks/cargo test passed.

ryoppippi · 2026-06-09T22:10:22Z

@coderabbitai review
@cubic-dev-ai review

cubic-dev-ai · 2026-06-09T22:10:32Z


@ryoppippi I have started the AI code review. It will take a few minutes to complete.

coderabbitai · 2026-06-09T22:11:04Z


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify the fixes in the two commits
git show 85c285b6 --stat
echo "---"
git show 85c285b6 -- rust/crates/ccusage/src/pricing.rs | head -80

echo "==========================="
git show ed995893 --stat
echo "---"
git show ed995893 -- .github/workflows/update-pricing.yaml | head -100
git show ed995893 -- nix/models-dev-gen.ts | head -50

Length of output: 6674

@ryoppippi: Confirmed all three fixes via the diffs:

85c285b — .expect("embedded models-dev-pricing.json must parse") correctly converts a silent no-op into a build-time invariant violation. The claude-fable-5 assertion message clearly communicates its intent as a regression anchor. ✓
ed99589 — persist-credentials: false added to both checkout steps, needs: update-pricing serialises the jobs (accepted tradeoff over full parallelism), and the OUTFILE guard is explicit and correct. ✓

Everything looks good. Thanks for the thorough follow-up!


pullfrog

✅ No new issues found.

Reviewed changes — the two follow-up commits harden CI, fix recipe ordering, validate env vars, and strengthen tests.

CI hardening — both checkout steps now use persist-credentials: false; update-models-dev-pricing is serialized after update-pricing via needs:; the LiteLLM job captures pre-update JSON to compare content and skip lock-only churn.
Recipe ordering fix — update-models-dev-pricing now runs nix flake update models-dev before just gen-models-dev-pricing and just check.
OUTFILE validation — nix/models-dev-gen.ts replaces process.env.OUTFILE! with an explicit null/empty check and a descriptive error.
Test assertion strengthening — offline_prices_new_anthropic_model_from_embedded_models_dev now asserts the entry exists instead of conditionally returning.

Using DeepSeek Pro (free via Pullfrog for OSS).

coderabbitai · 2026-06-09T22:11:58Z


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== rust/crates/ccusage/src/pricing.rs ==="
cat rust/crates/ccusage/src/pricing.rs

echo ""
echo "=== nix/models-dev-gen.ts ==="
cat nix/models-dev-gen.ts

echo ""
echo "=== .github/workflows/update-pricing.yaml ==="
cat .github/workflows/update-pricing.yaml

echo ""
echo "=== justfile ==="
cat justfile

echo ""
echo "=== nix/models-dev-pricing.nix ==="
cat nix/models-dev-pricing.nix

echo ""
echo "=== rust/crates/ccusage/src/models-dev-pricing.json (first 50 lines) ==="
head -50 rust/crates/ccusage/src/models-dev-pricing.json 2>/dev/null || echo "File not found"

Length of output: 89893

@ryoppippi: The two confirmed commits look good.

Addressed ✅

embedded_models_dev_pricing() now .expect("embedded models-dev-pricing.json must parse") — fail-fast is correct.
OUTFILE guard in nix/models-dev-gen.ts now throws a typed, descriptive error.
claude-fable-5 assertion now carries a clear message — the intent is self-documenting.
needs: update-pricing keeping sequential execution is a valid resolution of the concurrent flake.lock concern I raised; the two jobs now can't race on the same main HEAD.

One remaining question — persist-credentials: false + bare git push

Both jobs still check out with persist-credentials: false and then run git push --force-with-lease origin "$branch" inside the "Create pull request" step (where GH_TOKEN is set as an env var). GH_TOKEN is consumed by the gh CLI for PR operations, but plain git push over HTTPS won't use it unless gh auth setup-git (or an equivalent credential helper configuration) has been called first.

If .github/actions/setup-nix calls gh auth setup-git or sets up a credential helper, this is fine — could you confirm? If it doesn't, an explicit step is needed:

- name: Configure git credentials
  run: gh auth setup-git
  env:
    GH_TOKEN: ${{ github.token }}

Remaining low-priority items (not blocking, can be addressed in follow-up)

SRI hashes in nix/models-dev-pricing.nix — still require manual sync when nix flake update models-dev bumps the input; no automation was added. The existing comment is sufficient for now, but a follow-up to automate hash extraction would remove the silent build-failure footgun.
Provider key comment in load_models_dev_json_missing — dropping the provider namespace is intentional (Anthropic-first, find() handles prefix variants via fuzzy matching) but undocumented. Minor nit.
KEEP = /claude|anthropic/i still matches substrings such as "not-anthropic". Anchoring it as /^(claude|anthropic)/i would be more precise; /\b(claude|anthropic)\b/i would not, because the hyphen in "not-anthropic" counts as a word boundary. Cosmetic either way.

Overall the PR is in solid shape — the three-tier fallback design, test coverage, and automation are all clean. Just need to confirm the credential helper question before merging.

pullfrog

✅ No new issues found.

Reviewed changes — a single follow-up commit makes the embedded snapshot parse failure a hard error and adds a descriptive assertion message.

Fail-fast on parse errors — embedded_models_dev_pricing() now calls .expect("...") on load_models_dev_json_missing so a corrupt committed snapshot halts immediately instead of being silently skipped.
Descriptive assertion — the claude-fable-5 resolution test uses assert!(..., "...") with a failure message instead of a bare assertion.


cubic-dev-ai

1 issue found across 10 files


github-actions · 2026-06-09T22:20:28Z

ccusage performance comparison

PR SHA: e7be91e324b6
Base SHA: fae44a52f183

This compares the Rust PR release binary against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

Package	SHA	Execution setup	Bunx temp cache	Bunx warm median	Warm samples
Base pkg.pr.new	`fae44a52f183`	708.1ms	645.7ms	31.5ms	3
PR pkg.pr.new	`e7be91e`	1.245s	901.7ms	31.1ms	3

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: fae44a52f183; PR package: e7be91e. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`bunx -p <pkg> ccusage claude --offline --json`	1.01 GiB	550.3ms	558.2ms	0.99x	325.95 MiB	309.95 MiB	0.95x	1.83 GiB/s	1.80 GiB/s
`bunx -p <pkg> ccusage codex --offline --json`	1.01 GiB	366.6ms	370.2ms	0.99x	68.08 MiB	77.20 MiB	1.13x	2.75 GiB/s	2.72 GiB/s

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 run. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

Command	Runtime	Input	Median	Throughput	Samples
`claude --offline --json`	Package wrapper	1.01 GiB	557.5ms	1.81 GiB/s	1
`claude --offline --json`	Installed native binary	1.01 GiB	534.6ms	1.88 GiB/s	1
`codex --offline --json`	Package wrapper	1.01 GiB	368.2ms	2.73 GiB/s	1
`codex --offline --json`	Installed native binary	1.01 GiB	335.9ms	3.00 GiB/s	1

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 file)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude daily --offline --json`	30.9ms	4.4ms	7.08x	43.73 MiB	2.70 MiB	0.06x	0.05 MiB/s	0.35 MiB/s
`claude session --offline --json`	30.5ms	4.3ms	7.16x	43.48 MiB	2.70 MiB	0.06x	0.05 MiB/s	0.36 MiB/s
`codex daily --offline --json`	29.6ms	3.8ms	7.72x	43.54 MiB	2.70 MiB	0.06x	0.03 MiB/s	0.22 MiB/s
`codex session --offline --json`	30.5ms	3.8ms	8.04x	43.61 MiB	2.70 MiB	0.06x	0.03 MiB/s	0.23 MiB/s

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude --offline --json`	1.01 GiB	563.2ms	525.5ms	1.07x	-	328.33 MiB	-	1.79 GiB/s	1.92 GiB/s
`codex --offline --json`	1.01 GiB	367.8ms	338.8ms	1.09x	81.70 MiB	79.70 MiB	0.98x	2.74 GiB/s	2.97 GiB/s

Artifact size

Artifact	Base	PR	Delta	Ratio (base/PR)
packed `ccusage-*.tgz`	17.30 KiB	17.30 KiB	+0.00 KiB	1.00x
installed native package binary	3353.74 KiB	3417.74 KiB	+64.00 KiB	0.98x

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

github-actions · 2026-06-09T22:20:35Z

ccusage performance comparison

PR SHA: e7be91e324b6
Base SHA: fae44a52f183

This compares the PR package against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

Package	SHA	Execution setup	Bunx temp cache	Bunx warm median	Warm samples
Base pkg.pr.new	`fae44a52f183`	628.4ms	543.3ms	32.4ms	3
PR pkg.pr.new	`e7be91e`	742.3ms	513.0ms	32.6ms	3

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: fae44a52f183; PR package: e7be91e. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`bunx -p <pkg> ccusage claude --offline --json`	1.01 GiB	573.0ms	569.6ms	1.01x	323.83 MiB	292.58 MiB	0.90x	1.76 GiB/s	1.77 GiB/s
`bunx -p <pkg> ccusage codex --offline --json`	1.01 GiB	369.0ms	374.9ms	0.98x	68.08 MiB	79.83 MiB	1.17x	2.73 GiB/s	2.69 GiB/s

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 run. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

Command	Runtime	Input	Median	Throughput	Samples
`claude --offline --json`	Package wrapper	1.01 GiB	558.9ms	1.80 GiB/s	1
`claude --offline --json`	Installed native binary	1.01 GiB	528.8ms	1.90 GiB/s	1
`codex --offline --json`	Package wrapper	1.01 GiB	378.3ms	2.66 GiB/s	1
`codex --offline --json`	Installed native binary	1.01 GiB	350.0ms	2.88 GiB/s	1

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 file)
Base and PR both use the published ccusage package from pkg.pr.new, installed before measurement, and run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude daily --offline --json`	30.5ms	30.1ms	1.01x	43.48 MiB	43.61 MiB	1.00x	0.05 MiB/s	0.05 MiB/s
`claude session --offline --json`	29.8ms	30.1ms	0.99x	43.48 MiB	43.61 MiB	1.00x	0.05 MiB/s	0.05 MiB/s
`codex daily --offline --json`	30.3ms	30.6ms	0.99x	43.61 MiB	43.61 MiB	1.00x	0.03 MiB/s	0.03 MiB/s
`codex session --offline --json`	30.3ms	30.5ms	0.99x	43.48 MiB	43.61 MiB	1.00x	0.03 MiB/s	0.03 MiB/s

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base and PR both use the published ccusage package from pkg.pr.new, installed before measurement, and run --offline --json, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude --offline --json`	1.01 GiB	584.2ms	564.6ms	1.03x	-	313.45 MiB	-	1.72 GiB/s	1.78 GiB/s
`codex --offline --json`	1.01 GiB	372.1ms	371.3ms	1.00x	82.20 MiB	74.08 MiB	0.90x	2.71 GiB/s	2.71 GiB/s

Artifact size

Artifact	Base	PR	Delta	Ratio (base/PR)
packed `ccusage-*.tgz`	17.30 KiB	17.30 KiB	+0.00 KiB	1.00x
installed native package binary	3353.74 KiB	3417.74 KiB	+64.00 KiB	0.98x

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

github-actions · 2026-06-09T22:21:58Z

ccusage performance comparison

PR SHA: ed9958934534
Base SHA: fae44a52f183

This compares the PR package against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

Package	SHA	Execution setup	Bunx temp cache	Bunx warm median	Warm samples
Base pkg.pr.new	`fae44a52f183`	552.6ms	445.8ms	31.7ms	3
PR pkg.pr.new	`ed99589`	1.735s	726.9ms	33.0ms	3

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: fae44a52f183; PR package: ed99589. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`bunx -p <pkg> ccusage claude --offline --json`	1.01 GiB	571.0ms	560.8ms	1.02x	319.70 MiB	316.45 MiB	0.99x	1.76 GiB/s	1.80 GiB/s
`bunx -p <pkg> ccusage codex --offline --json`	1.01 GiB	370.8ms	365.0ms	1.02x	73.95 MiB	82.45 MiB	1.11x	2.72 GiB/s	2.76 GiB/s

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 run. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

Command	Runtime	Input	Median	Throughput	Samples
`claude --offline --json`	Package wrapper	1.01 GiB	550.7ms	1.83 GiB/s	1
`claude --offline --json`	Installed native binary	1.01 GiB	526.5ms	1.91 GiB/s	1
`codex --offline --json`	Package wrapper	1.01 GiB	364.2ms	2.76 GiB/s	1
`codex --offline --json`	Installed native binary	1.01 GiB	341.7ms	2.95 GiB/s	1

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 file)
Base and PR both use the published ccusage package from pkg.pr.new, installed before measurement, and run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude daily --offline --json`	29.0ms	30.1ms	0.96x	43.73 MiB	43.73 MiB	1.00x	0.05 MiB/s	0.05 MiB/s
`claude session --offline --json`	30.1ms	30.5ms	0.99x	43.48 MiB	-	-	0.05 MiB/s	0.05 MiB/s
`codex daily --offline --json`	29.3ms	29.1ms	1.01x	43.61 MiB	43.48 MiB	1.00x	0.03 MiB/s	0.03 MiB/s
`codex session --offline --json`	29.2ms	29.7ms	0.99x	43.61 MiB	43.48 MiB	1.00x	0.03 MiB/s	0.03 MiB/s

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude --offline --json`	1.01 GiB	566.9ms	538.8ms	1.05x	-	338.58 MiB	-	1.78 GiB/s	1.87 GiB/s
`codex --offline --json`	1.01 GiB	362.7ms	371.0ms	0.98x	72.20 MiB	77.33 MiB	1.07x	2.78 GiB/s	2.71 GiB/s

Artifact size

Artifact	Base	PR	Delta	Ratio
packed `ccusage-*.tgz`	17.30 KiB	17.30 KiB	+0.00 KiB	1.00x
installed native package binary	3353.74 KiB	3417.74 KiB	+64.00 KiB	0.98x

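Lower medians and smaller artifacts are better. PR vs base and the artifact Ratio are base divided by PR, so values above 1.00x favor the PR; PR/base RSS is PR divided by base, so values below 1.00x favor the PR. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.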
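The derived columns can be recomputed from the raw values, which is a quick way to confirm which direction each ratio runs. The sketch below uses the cached bunx `claude` row of this comment (base 571.0 ms / 319.70 MiB, PR 560.8 ms / 316.45 MiB, 1.01 GiB input). The conventions in the comments are inferred from the published numbers, and `compare` is an illustrative helper, not the benchmark script's code. The table's input size is rounded to 1.01 GiB, so a recomputed throughput can occasionally differ from the table in the last digit.

```typescript
// One benchmark row: medians in ms, peak RSS in MiB, input in GiB.
interface BenchRow {
  inputGiB: number;
  baseMedianMs: number;
  prMedianMs: number;
  baseRssMiB: number;
  prRssMiB: number;
}

// Illustrative recomputation of the table's derived columns.
function compare(row: BenchRow) {
  return {
    // "PR vs base": base / PR, so > 1.00x means the PR is faster.
    speedup: row.baseMedianMs / row.prMedianMs,
    // "PR/base RSS": PR / base, so < 1.00x means the PR uses less memory.
    rssRatio: row.prRssMiB / row.baseRssMiB,
    // Throughput: input size divided by median wall time in seconds.
    prThroughputGiBs: row.inputGiB / (row.prMedianMs / 1000),
  };
}

const claudeBunx: BenchRow = {
  inputGiB: 1.01,
  baseMedianMs: 571.0,
  prMedianMs: 560.8,
  baseRssMiB: 319.7,
  prRssMiB: 316.45,
};

const r = compare(claudeBunx);
console.log(`PR vs base ${r.speedup.toFixed(2)}x`); // PR vs base 1.02x
console.log(`PR/base RSS ${r.rssRatio.toFixed(2)}x`); // PR/base RSS 0.99x
console.log(`PR throughput ${r.prThroughputGiBs.toFixed(2)} GiB/s`); // PR throughput 1.80 GiB/s
```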

github-actions · 2026-06-09T22:21:59Z

ccusage performance comparison

PR SHA: ed9958934534
Base SHA: fae44a52f183

This compares the Rust PR release binary against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

Package	SHA	Execution setup	Bunx temp cache	Bunx warm median	Warm samples
Base pkg.pr.new	`fae44a52f183`	1.091s	965.8ms	30.8ms	3
PR pkg.pr.new	`ed99589`	1.130s	728.1ms	32.3ms	3

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: fae44a52f183; PR package: ed99589. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`bunx -p <pkg> ccusage claude --offline --json`	1.01 GiB	557.8ms	545.5ms	1.02x	315.20 MiB	351.33 MiB	1.11x	1.80 GiB/s	1.85 GiB/s
`bunx -p <pkg> ccusage codex --offline --json`	1.01 GiB	368.4ms	410.6ms	0.90x	82.08 MiB	78.08 MiB	0.95x	2.73 GiB/s	2.45 GiB/s

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 run. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

Command	Runtime	Input	Median	Throughput	Samples
`claude --offline --json`	Package wrapper	1.01 GiB	544.8ms	1.85 GiB/s	1
`claude --offline --json`	Installed native binary	1.01 GiB	519.0ms	1.94 GiB/s	1
`codex --offline --json`	Package wrapper	1.01 GiB	374.4ms	2.69 GiB/s	1
`codex --offline --json`	Installed native binary	1.01 GiB	342.5ms	2.94 GiB/s	1

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 file)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude daily --offline --json`	29.3ms	4.0ms	7.29x	43.73 MiB	2.70 MiB	0.06x	0.05 MiB/s	0.38 MiB/s
`claude session --offline --json`	29.3ms	3.9ms	7.43x	43.48 MiB	2.70 MiB	0.06x	0.05 MiB/s	0.39 MiB/s
`codex daily --offline --json`	28.6ms	3.7ms	7.77x	43.61 MiB	2.70 MiB	0.06x	0.03 MiB/s	0.23 MiB/s
`codex session --offline --json`	28.0ms	3.6ms	7.81x	43.61 MiB	2.70 MiB	0.06x	0.03 MiB/s	0.24 MiB/s

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude --offline --json`	1.01 GiB	576.8ms	510.2ms	1.13x	317.20 MiB	333.45 MiB	1.05x	1.75 GiB/s	1.97 GiB/s
`codex --offline --json`	1.01 GiB	364.0ms	328.9ms	1.11x	79.45 MiB	77.70 MiB	0.98x	2.77 GiB/s	3.06 GiB/s

Artifact size

Artifact	Base	PR	Delta	Ratio
packed `ccusage-*.tgz`	17.30 KiB	17.30 KiB	+0.00 KiB	1.00x
installed native package binary	3353.74 KiB	3417.74 KiB	+64.00 KiB	0.98x

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

github-actions · 2026-06-09T22:25:32Z

ccusage performance comparison

PR SHA: 85c285b65899
Base SHA: fae44a52f183

This compares the PR package against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

Package	SHA	Execution setup	Bunx temp cache	Bunx warm median	Warm samples
Base pkg.pr.new	`fae44a52f183`	1.083s	738.6ms	32.6ms	3
PR pkg.pr.new	`85c285b`	668.8ms	797.8ms	30.4ms	3

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: fae44a52f183; PR package: 85c285b. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`bunx -p <pkg> ccusage claude --offline --json`	1.01 GiB	552.4ms	563.2ms	0.98x	336.83 MiB	323.45 MiB	0.96x	1.82 GiB/s	1.79 GiB/s
`bunx -p <pkg> ccusage codex --offline --json`	1.01 GiB	368.9ms	364.8ms	1.01x	70.83 MiB	72.33 MiB	1.02x	2.73 GiB/s	2.76 GiB/s

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 run. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

Command	Runtime	Input	Median	Throughput	Samples
`claude --offline --json`	Package wrapper	1.01 GiB	550.4ms	1.83 GiB/s	1
`claude --offline --json`	Installed native binary	1.01 GiB	520.2ms	1.94 GiB/s	1
`codex --offline --json`	Package wrapper	1.01 GiB	359.1ms	2.80 GiB/s	1
`codex --offline --json`	Installed native binary	1.01 GiB	335.2ms	3.00 GiB/s	1

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 file)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude daily --offline --json`	29.4ms	28.4ms	1.03x	43.48 MiB	43.48 MiB	1.00x	0.05 MiB/s	0.05 MiB/s
`claude session --offline --json`	29.0ms	28.5ms	1.02x	43.73 MiB	43.48 MiB	0.99x	0.05 MiB/s	0.05 MiB/s
`codex daily --offline --json`	28.2ms	28.7ms	0.99x	43.48 MiB	43.61 MiB	1.00x	0.03 MiB/s	0.03 MiB/s
`codex session --offline --json`	28.4ms	28.2ms	1.01x	43.48 MiB	43.48 MiB	1.00x	0.03 MiB/s	0.03 MiB/s

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude --offline --json`	1.01 GiB	554.7ms	553.8ms	1.00x	293.83 MiB	331.20 MiB	1.13x	1.82 GiB/s	1.82 GiB/s
`codex --offline --json`	1.01 GiB	362.8ms	365.3ms	0.99x	80.20 MiB	81.20 MiB	1.01x	2.78 GiB/s	2.76 GiB/s

Artifact size

Artifact	Base	PR	Delta	Ratio
packed `ccusage-*.tgz`	17.30 KiB	17.30 KiB	+0.00 KiB	1.00x
installed native package binary	3353.74 KiB	3417.74 KiB	+64.00 KiB	0.98x

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

github-actions · 2026-06-09T22:26:05Z

ccusage performance comparison

PR SHA: 85c285b65899
Base SHA: fae44a52f183

This compares the Rust PR release binary against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

Package	SHA	Execution setup	Bunx temp cache	Bunx warm median	Warm samples
Base pkg.pr.new	`fae44a52f183`	1.685s	1.935s	30.3ms	3
PR pkg.pr.new	`85c285b`	1.063s	1.158s	30.8ms	3

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: fae44a52f183; PR package: 85c285b. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`bunx -p <pkg> ccusage claude --offline --json`	1.01 GiB	572.7ms	566.0ms	1.01x	294.70 MiB	318.08 MiB	1.08x	1.76 GiB/s	1.78 GiB/s
`bunx -p <pkg> ccusage codex --offline --json`	1.01 GiB	385.6ms	385.2ms	1.00x	68.70 MiB	80.70 MiB	1.17x	2.61 GiB/s	2.61 GiB/s

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 run. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

Command	Runtime	Input	Median	Throughput	Samples
`claude --offline --json`	Package wrapper	1.01 GiB	584.0ms	1.72 GiB/s	1
`claude --offline --json`	Installed native binary	1.01 GiB	553.1ms	1.82 GiB/s	1
`codex --offline --json`	Package wrapper	1.01 GiB	379.9ms	2.65 GiB/s	1
`codex --offline --json`	Installed native binary	1.01 GiB	359.1ms	2.80 GiB/s	1

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 file)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude daily --offline --json`	29.1ms	3.9ms	7.39x	43.61 MiB	2.70 MiB	0.06x	0.05 MiB/s	0.39 MiB/s
`claude session --offline --json`	28.6ms	3.9ms	7.35x	43.61 MiB	2.70 MiB	0.06x	0.05 MiB/s	0.40 MiB/s
`codex daily --offline --json`	28.9ms	3.7ms	7.89x	43.48 MiB	2.70 MiB	0.06x	0.03 MiB/s	0.23 MiB/s
`codex session --offline --json`	29.2ms	3.7ms	7.86x	43.61 MiB	2.70 MiB	0.06x	0.03 MiB/s	0.23 MiB/s

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude --offline --json`	1.01 GiB	554.2ms	512.4ms	1.08x	332.70 MiB	317.45 MiB	0.95x	1.82 GiB/s	1.96 GiB/s
`codex --offline --json`	1.01 GiB	365.8ms	339.4ms	1.08x	77.45 MiB	70.33 MiB	0.91x	2.75 GiB/s	2.97 GiB/s

Artifact size

Artifact	Base	PR	Delta	Ratio
packed `ccusage-*.tgz`	17.30 KiB	17.30 KiB	+0.00 KiB	1.00x
installed native package binary	3353.74 KiB	3417.74 KiB	+64.00 KiB	0.98x

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

Keep the models.dev pricing updater serialized after the LiteLLM updater to avoid concurrent flake.lock edits, but run it with always() so a LiteLLM refresh failure does not skip the independent models.dev refresh.
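This commit depends on how GitHub Actions schedules dependent jobs, and the rule fits in a few lines. By default, a job listed under `needs:` must succeed for the dependent job to run. If the needed job fails, is cancelled, or is skipped, the dependent job is skipped too. Adding `if: ${{ always() }}` keeps the ordering that `needs:` provides, so the two jobs never edit flake.lock at the same time, but lets the dependent job run regardless of the upstream result. The TypeScript below is a toy model of that documented behavior, not code from the workflow. The `JobResult`, `Condition`, and `dependentJobRuns` names are made up for illustration.

```typescript
// Possible results of the upstream update-pricing job.
type JobResult = "success" | "failure" | "cancelled" | "skipped";

// "default": `needs: update-pricing` with no `if:`.
// "always":  `needs: update-pricing` plus `if: ${{ always() }}`.
type Condition = "default" | "always";

// Toy model: does update-models-dev-pricing run after update-pricing concludes?
function dependentJobRuns(upstream: JobResult, condition: Condition): boolean {
  if (condition === "always") {
    // always() evaluates to true however the needed job ended.
    return true;
  }
  // Implicit success(): any non-successful dependency skips the job.
  return upstream === "success";
}

const results: JobResult[] = ["success", "failure", "cancelled", "skipped"];
for (const upstream of results) {
  console.log(
    upstream,
    dependentJobRuns(upstream, "default"),
    dependentJobRuns(upstream, "always"),
  );
}
```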

ryoppippi · 2026-06-09T22:27:35Z

@coderabbitai review
@cubic-dev-ai review

cubic-dev-ai · 2026-06-09T22:27:44Z

@ryoppippi I have started the AI code review. It will take a few minutes to complete.

coderabbitai · 2026-06-09T22:28:33Z

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== .github/workflows/update-pricing.yaml ==="
cat .github/workflows/update-pricing.yaml

echo ""
echo "=== .github/actions/setup-nix (if exists) ==="
find . -path './.github/actions/setup-nix*' -exec cat {} \;

echo ""
echo "=== rust/crates/ccusage/src/pricing.rs ==="
cat rust/crates/ccusage/src/pricing.rs

echo ""
echo "=== nix/models-dev-gen.ts ==="
cat nix/models-dev-gen.ts

echo ""
echo "=== nix/models-dev-pricing.nix ==="
cat nix/models-dev-pricing.nix

echo ""
echo "=== justfile (update-models-dev-pricing recipe) ==="
cat justfile

Length of output: 92106

@ryoppippi: Thanks for the follow-ups. After re-reading the current state of the files, everything from the previous rounds is addressed except one item that still needs a fix.

🔴 Unresolved — `git push` will fail without a credential helper

Both jobs check out with persist-credentials: false, so actions/checkout does not leave the GITHUB_TOKEN in the local git config. Inside the "Create pull request" step, GH_TOKEN is set as an environment variable, but that variable is read by the gh CLI, not by git itself. A plain git push --force-with-lease origin "$branch" over HTTPS therefore has no credential source and will fail with an authentication error at runtime.

Looking at .github/actions/setup-nix/action.yml, it does not call gh auth setup-git, so there's no credential helper set up anywhere in the chain.

The fix is a one-liner in the "Create pull request" step (or a dedicated step before it):

- name: Create pull request
  env:
    GH_TOKEN: ${{ github.token }}
  run: |
    # Configure git to authenticate using the gh credential helper
    gh auth setup-git

    git config user.name "github-actions[bot]"
    git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
    ...

This needs to be applied to both update-pricing and update-models-dev-pricing jobs.

✅ Confirmed addressed from prior rounds

Item	Status
`embedded_models_dev_pricing()` — `.expect("…must parse")`	✓
`OUTFILE` guard in `nix/models-dev-gen.ts`	✓
`claude-fable-5` assertion with clear message	✓
`needs: update-pricing` + `if: ${{ always() }}` serialization	✓

🟡 Remaining low-priority nits (not blocking)

SRI hashes in nix/models-dev-pricing.nix — still require manual sync when nix flake update models-dev bumps the input; consider a follow-up to automate hash extraction or add a CI guard.
load_models_dev_json_missing drops provider namespace — intentional for Anthropic scope, but a one-line comment confirming this would help future maintainers.
KEEP = /claude|anthropic/i — still substring-matches; \b(claude|anthropic)\b would be more precise (cosmetic).
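Running both patterns against a few model IDs shows the difference the nit describes. Apart from claude-fable-5, the IDs below are hypothetical inputs chosen to show where the patterns diverge. `KEEP_BOUNDED` is a made-up name for the suggested variant.

```typescript
// Filter quoted in the review: matches the words anywhere in the ID.
const KEEP = /claude|anthropic/i;
// Reviewer's suggestion: require a word boundary on both sides.
const KEEP_BOUNDED = /\b(claude|anthropic)\b/i;

const ids: string[] = [
  "claude-fable-5", // from this PR
  "anthropic/claude-opus", // hypothetical provider-prefixed ID
  "claudette-model", // hypothetical: substring hit only
  "claude_v2", // hypothetical: `_` counts as a word character
  "gpt-4o", // hypothetical non-Anthropic ID
];

for (const id of ids) {
  // Neither regex has the g flag, so test() keeps no state between calls.
  console.log(id, KEEP.test(id), KEEP_BOUNDED.test(id));
}
```

One caveat for the bounded pattern is that `\b` treats letters, digits, and `_` as word characters. An ID such as `claude_v2` or `claude3` would therefore stop matching. This is probably harmless for dash-separated model IDs, but it is worth checking against the real models.dev keys before switching.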

cubic-dev-ai

1 issue found across 10 files

pullfrog

✅ No new issues found.

Reviewed changes — the single follow-up commit adds if: ${{ always() }} to the update-models-dev-pricing job so a LiteLLM refresh failure does not silently skip the models.dev snapshot refresh.

CI resilience — update-models-dev-pricing now uses if: ${{ always() }} alongside needs: update-pricing, preserving serialized execution order while preventing a failed LiteLLM update from blocking the independent models.dev refresh.

Using DeepSeek Pro (free via Pullfrog for OSS)

github-actions · 2026-06-09T22:42:54Z

ccusage performance comparison

PR SHA: ab1eb978574c
Base SHA: fae44a52f183

This compares the PR package against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

Package	SHA	Execution setup	Bunx temp cache	Bunx warm median	Warm samples
Base pkg.pr.new	`fae44a52f183`	599.1ms	502.5ms	30.3ms	3
PR pkg.pr.new	`ab1eb97`	636.8ms	734.5ms	30.2ms	3

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: fae44a52f183; PR package: ab1eb97. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`bunx -p <pkg> ccusage claude --offline --json`	1.01 GiB	576.5ms	636.3ms	0.91x	342.70 MiB	359.83 MiB	1.05x	1.75 GiB/s	1.58 GiB/s
`bunx -p <pkg> ccusage codex --offline --json`	1.01 GiB	367.3ms	363.4ms	1.01x	75.70 MiB	76.70 MiB	1.01x	2.74 GiB/s	2.77 GiB/s

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 run. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

Command	Runtime	Input	Median	Throughput	Samples
`claude --offline --json`	Package wrapper	1.01 GiB	554.1ms	1.82 GiB/s	1
`claude --offline --json`	Installed native binary	1.01 GiB	523.2ms	1.92 GiB/s	1
`codex --offline --json`	Package wrapper	1.01 GiB	355.7ms	2.83 GiB/s	1
`codex --offline --json`	Installed native binary	1.01 GiB	338.4ms	2.98 GiB/s	1

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 file)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude daily --offline --json`	28.1ms	28.0ms	1.00x	43.73 MiB	43.48 MiB	0.99x	0.05 MiB/s	0.06 MiB/s
`claude session --offline --json`	28.2ms	28.4ms	0.99x	43.48 MiB	43.48 MiB	1.00x	0.05 MiB/s	0.05 MiB/s
`codex daily --offline --json`	29.2ms	28.8ms	1.01x	43.48 MiB	-	-	0.03 MiB/s	0.03 MiB/s
`codex session --offline --json`	28.4ms	28.7ms	0.99x	43.48 MiB	43.48 MiB	1.00x	0.03 MiB/s	0.03 MiB/s

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude --offline --json`	1.01 GiB	570.4ms	549.6ms	1.04x	318.70 MiB	320.70 MiB	1.01x	1.77 GiB/s	1.83 GiB/s
`codex --offline --json`	1.01 GiB	356.2ms	360.5ms	0.99x	82.20 MiB	72.45 MiB	0.88x	2.83 GiB/s	2.79 GiB/s

Artifact size

Artifact	Base	PR	Delta	Ratio
packed `ccusage-*.tgz`	17.30 KiB	17.30 KiB	+0.00 KiB	1.00x
installed native package binary	3353.74 KiB	3417.74 KiB	+64.00 KiB	0.98x

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

github-actions · 2026-06-09T22:43:02Z

ccusage performance comparison

PR SHA: ab1eb978574c
Base SHA: fae44a52f183

This compares the Rust PR release binary against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

Package	SHA	Execution setup	Bunx temp cache	Bunx warm median	Warm samples
Base pkg.pr.new	`fae44a52f183`	406.1ms	747.8ms	33.1ms	3
PR pkg.pr.new	`ab1eb97`	601.4ms	609.0ms	32.6ms	3

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: fae44a52f183; PR package: ab1eb97. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`bunx -p <pkg> ccusage claude --offline --json`	1.01 GiB	573.6ms	568.1ms	1.01x	327.08 MiB	309.20 MiB	0.95x	1.76 GiB/s	1.77 GiB/s
`bunx -p <pkg> ccusage codex --offline --json`	1.01 GiB	374.9ms	424.3ms	0.88x	66.83 MiB	77.45 MiB	1.16x	2.69 GiB/s	2.37 GiB/s

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 run. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

Command	Runtime	Input	Median	Throughput	Samples
`claude --offline --json`	Package wrapper	1.01 GiB	575.4ms	1.75 GiB/s	1
`claude --offline --json`	Installed native binary	1.01 GiB	550.2ms	1.83 GiB/s	1
`codex --offline --json`	Package wrapper	1.01 GiB	377.5ms	2.67 GiB/s	1
`codex --offline --json`	Installed native binary	1.01 GiB	350.7ms	2.87 GiB/s	1

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 file)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude daily --offline --json`	30.4ms	4.2ms	7.21x	43.61 MiB	2.70 MiB	0.06x	0.05 MiB/s	0.37 MiB/s
`claude session --offline --json`	30.2ms	4.2ms	7.16x	43.73 MiB	2.70 MiB	0.06x	0.05 MiB/s	0.37 MiB/s
`codex daily --offline --json`	30.4ms	3.9ms	7.82x	43.42 MiB	2.70 MiB	0.06x	0.03 MiB/s	0.22 MiB/s
`codex session --offline --json`	30.4ms	3.9ms	7.73x	43.61 MiB	2.70 MiB	0.06x	0.03 MiB/s	0.22 MiB/s

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude --offline --json`	1.01 GiB	563.7ms	536.0ms	1.05x	330.58 MiB	325.95 MiB	0.99x	1.79 GiB/s	1.88 GiB/s
`codex --offline --json`	1.01 GiB	381.4ms	341.6ms	1.12x	72.20 MiB	81.33 MiB	1.13x	2.64 GiB/s	2.95 GiB/s

Artifact size

Artifact	Base	PR	Delta	Ratio (base/PR)
packed `ccusage-*.tgz`	17.30 KiB	17.30 KiB	+0.00 KiB	1.00x
installed native package binary	3353.74 KiB	3417.74 KiB	+64.00 KiB	0.98x

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.
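The derived columns in these tables (PR vs base, PR/base RSS, throughput, and the artifact Ratio) appear to be simple ratios of the reported values. The Rust sketch below reproduces the `claude --offline --json` large-fixture row and the installed-binary artifact row under that assumption. The formulas are inferred from the numbers shown, not taken from the benchmark script, and the function names are hypothetical.

```rust
/// PR vs base speedup (assumed base / PR): above 1.0 means the PR is faster.
fn speedup(base_median_ms: f64, pr_median_ms: f64) -> f64 {
    base_median_ms / pr_median_ms
}

/// PR/base peak RSS: below 1.0 means the PR uses less memory.
fn rss_ratio(base_rss_mib: f64, pr_rss_mib: f64) -> f64 {
    pr_rss_mib / base_rss_mib
}

/// Throughput in GiB/s for `input_gib` of input processed in `median_ms` milliseconds.
fn throughput_gib_per_s(input_gib: f64, median_ms: f64) -> f64 {
    input_gib / (median_ms / 1000.0)
}

/// Artifact size ratio (assumed base / PR): below 1.0 means the PR artifact is larger.
fn artifact_ratio(base_kib: f64, pr_kib: f64) -> f64 {
    base_kib / pr_kib
}

fn main() {
    // `claude --offline --json` large-fixture row: medians in ms, peak RSS in MiB.
    let (base_ms, pr_ms) = (563.7, 536.0);
    println!("PR vs base:      {:.2}x", speedup(base_ms, pr_ms));
    println!("PR/base RSS:     {:.2}x", rss_ratio(330.58, 325.95));
    println!("Base throughput: {:.2} GiB/s", throughput_gib_per_s(1.01, base_ms));
    println!("PR throughput:   {:.2} GiB/s", throughput_gib_per_s(1.01, pr_ms));
    // Installed native package binary row, sizes in KiB.
    println!("Binary ratio:    {:.2}x", artifact_ratio(3353.74, 3417.74));
}
```

This prints 1.05x, 0.99x, 1.79 GiB/s, 1.88 GiB/s, and 0.98x, matching the table. Recomputing from the rounded display values can occasionally differ in the last digit (for example, the Codex PR throughput), presumably because the script works from unrounded sizes and timings.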

Keep checkout credential persistence disabled in the pricing update workflow while restoring an explicit authentication path for later git push commands. Both pricing automation jobs already receive GH_TOKEN for gh CLI operations. Running gh auth setup-git configures Git to use that token-backed auth before pushing the automation branches, so the workflow no longer depends on credentials persisted by actions/checkout.

ryoppippi · 2026-06-09T22:47:54Z

@coderabbitai review
@cubic-dev-ai review

cubic-dev-ai · 2026-06-09T22:48:01Z

@ryoppippi I have started the AI code review. It will take a few minutes to complete.

coderabbitai · 2026-06-09T22:48:33Z

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== .github/workflows/update-pricing.yaml ==="
cat .github/workflows/update-pricing.yaml

echo ""
echo "=== .github/actions/setup-nix/action.yml (if exists) ==="
find . -path './.github/actions/setup-nix*' | xargs cat 2>/dev/null || echo "Not found"

echo ""
echo "=== git log --oneline -8 ==="
git log --oneline -8

Length of output: 9347

@ryoppippi: Confirmed — commit 70dca9e resolves the last blocking issue. Both jobs now call gh auth setup-git (with GH_TOKEN in env) before git push --force-with-lease, which correctly re-establishes the credential helper after persist-credentials: false. ✓

All blocking issues are now addressed:

Item	Commit	Status
`embedded_models_dev_pricing()` fail-fast `.expect()`	`85c285b`	✅
`OUTFILE` guard in `nix/models-dev-gen.ts`	`ed99589`	✅
`claude-fable-5` assertion message	`85c285b`	✅
`needs: update-pricing` + `if: always()`	`ab1eb97`	✅
`gh auth setup-git` before `git push`	`70dca9e`	✅

The three remaining low-priority nits (SRI hash automation in nix/models-dev-pricing.nix, a comment on provider-key dropping in load_models_dev_json_missing, and tightening KEEP = /claude|anthropic/i) are all non-blocking and fine to handle in follow-up PRs. The PR is in solid shape — good to merge.
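For readers following the blocking items, the sketch below illustrates the lookup order the PR describes: the primary pricing table is consulted first, and the embedded models.dev snapshot is used only as a fallback on a miss. The snapshot is parsed with fail-fast `.expect()` calls because malformed embedded data is a build defect rather than a runtime condition. This is a simplified, std-only illustration, not the crate's code: the snapshot uses a hypothetical `model=input,output` line format instead of JSON, lookups are exact rather than fuzzy, and `ModelPrice`, `parse_snapshot`, `lookup_price`, the model names, and the prices are all placeholders.

```rust
use std::collections::HashMap;

/// Placeholder per-million-token prices for one model (not real rates).
#[derive(Debug, Clone, Copy, PartialEq)]
struct ModelPrice {
    input_per_mtok: f64,
    output_per_mtok: f64,
}

/// Hypothetical embedded snapshot in a `model=input,output` line format.
/// The real snapshot is JSON embedded with `include_str!`.
const EMBEDDED_SNAPSHOT: &str = "\
known-model=1.0,5.0
new-model=1.5,7.5";

/// Parses the embedded snapshot and panics on malformed data: the text ships
/// inside the binary, so a parse failure is a build defect, not a user error.
fn parse_snapshot(text: &str) -> HashMap<String, ModelPrice> {
    text.lines()
        .filter(|line| !line.trim().is_empty())
        .map(|line| {
            let (name, prices) = line.split_once('=').expect("snapshot line missing '='");
            let (input, output) = prices.split_once(',').expect("snapshot line missing ','");
            let price = ModelPrice {
                input_per_mtok: input.trim().parse().expect("invalid input price"),
                output_per_mtok: output.trim().parse().expect("invalid output price"),
            };
            (name.trim().to_string(), price)
        })
        .collect()
}

/// Consults the primary table first and falls back to the snapshot only on a miss.
fn lookup_price(
    primary: &HashMap<String, ModelPrice>,
    fallback: &HashMap<String, ModelPrice>,
    model: &str,
) -> Option<ModelPrice> {
    primary.get(model).or_else(|| fallback.get(model)).copied()
}

fn main() {
    let mut primary = HashMap::new();
    primary.insert(
        "known-model".to_string(),
        ModelPrice { input_per_mtok: 2.0, output_per_mtok: 10.0 },
    );
    let fallback = parse_snapshot(EMBEDDED_SNAPSHOT);

    // In both maps: the primary price (2.0) wins over the snapshot price (1.0).
    println!("{:?}", lookup_price(&primary, &fallback, "known-model"));
    // Only in the snapshot: resolved offline from the fallback.
    println!("{:?}", lookup_price(&primary, &fallback, "new-model"));
    // In neither map.
    println!("{:?}", lookup_price(&primary, &fallback, "unknown-model"));
}
```

Keeping the snapshot in a separate map, rather than merging it into the primary table, means an entry that exists only in the snapshot cannot change how names already covered by the primary table resolve.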

(ᵔ◡ᵔ)🐇

cubic-dev-ai

No issues found across 10 files

github-actions · 2026-06-09T23:03:04Z

ccusage performance comparison

PR SHA: 70dca9e6718f
Base SHA: fae44a52f183

This compares the Rust PR release binary against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

Package	SHA	Execution setup	Bunx temp cache	Bunx warm median	Warm samples
Base pkg.pr.new	`fae44a52f183`	558.1ms	628.3ms	30.3ms	3
PR pkg.pr.new	`70dca9e`	1.098s	841.2ms	30.3ms	3

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: fae44a52f183; PR package: 70dca9e. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`bunx -p <pkg> ccusage claude --offline --json`	1.01 GiB	547.6ms	570.7ms	0.96x	324.83 MiB	323.95 MiB	1.00x	1.84 GiB/s	1.76 GiB/s
`bunx -p <pkg> ccusage codex --offline --json`	1.01 GiB	372.4ms	370.8ms	1.00x	85.83 MiB	73.20 MiB	0.85x	2.70 GiB/s	2.72 GiB/s

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 run. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

Command	Runtime	Input	Median	Throughput	Samples
`claude --offline --json`	Package wrapper	1.01 GiB	557.1ms	1.81 GiB/s	1
`claude --offline --json`	Installed native binary	1.01 GiB	532.0ms	1.89 GiB/s	1
`codex --offline --json`	Package wrapper	1.01 GiB	376.3ms	2.68 GiB/s	1
`codex --offline --json`	Installed native binary	1.01 GiB	338.6ms	2.97 GiB/s	1

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 file)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude daily --offline --json`	29.0ms	3.9ms	7.43x	43.48 MiB	2.70 MiB	0.06x	0.05 MiB/s	0.40 MiB/s
`claude session --offline --json`	28.4ms	3.9ms	7.23x	43.61 MiB	2.70 MiB	0.06x	0.05 MiB/s	0.39 MiB/s
`codex daily --offline --json`	28.0ms	3.8ms	7.30x	43.73 MiB	2.70 MiB	0.06x	0.03 MiB/s	0.22 MiB/s
`codex session --offline --json`	28.5ms	3.7ms	7.78x	43.48 MiB	2.70 MiB	0.06x	0.03 MiB/s	0.23 MiB/s

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude --offline --json`	1.01 GiB	551.1ms	530.5ms	1.04x	295.70 MiB	330.45 MiB	1.12x	1.83 GiB/s	1.90 GiB/s
`codex --offline --json`	1.01 GiB	363.7ms	333.8ms	1.09x	81.33 MiB	77.45 MiB	0.95x	2.77 GiB/s	3.02 GiB/s

Artifact size

Artifact	Base	PR	Delta	Ratio (base/PR)
packed `ccusage-*.tgz`	17.30 KiB	17.30 KiB	+0.00 KiB	1.00x
installed native package binary	3353.74 KiB	3417.74 KiB	+64.00 KiB	0.98x

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

github-actions · 2026-06-09T23:03:05Z

ccusage performance comparison

PR SHA: 70dca9e6718f
Base SHA: fae44a52f183

This compares the PR package against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

Package	SHA	Execution setup	Bunx temp cache	Bunx warm median	Warm samples
Base pkg.pr.new	`fae44a52f183`	669.7ms	664.9ms	33.4ms	3
PR pkg.pr.new	`70dca9e`	766.5ms	544.6ms	32.7ms	3

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: fae44a52f183; PR package: 70dca9e. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`bunx -p <pkg> ccusage claude --offline --json`	1.01 GiB	583.1ms	570.2ms	1.02x	324.20 MiB	324.70 MiB	1.00x	1.73 GiB/s	1.77 GiB/s
`bunx -p <pkg> ccusage codex --offline --json`	1.01 GiB	374.5ms	382.9ms	0.98x	79.58 MiB	82.45 MiB	1.04x	2.69 GiB/s	2.63 GiB/s

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 run. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

Command	Runtime	Input	Median	Throughput	Samples
`claude --offline --json`	Package wrapper	1.01 GiB	587.5ms	1.71 GiB/s	1
`claude --offline --json`	Installed native binary	1.01 GiB	541.2ms	1.86 GiB/s	1
`codex --offline --json`	Package wrapper	1.01 GiB	371.0ms	2.71 GiB/s	1
`codex --offline --json`	Installed native binary	1.01 GiB	340.6ms	2.96 GiB/s	1

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 file)
Base and PR both run the published ccusage package from pkg.pr.new, installed before measurement, with --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude daily --offline --json`	31.2ms	31.1ms	1.00x	-	43.61 MiB	-	0.05 MiB/s	0.05 MiB/s
`claude session --offline --json`	30.7ms	30.6ms	1.00x	43.61 MiB	43.61 MiB	1.00x	0.05 MiB/s	0.05 MiB/s
`codex daily --offline --json`	30.5ms	30.0ms	1.02x	43.61 MiB	43.48 MiB	1.00x	0.03 MiB/s	0.03 MiB/s
`codex session --offline --json`	30.4ms	30.1ms	1.01x	43.48 MiB	-	-	0.03 MiB/s	0.03 MiB/s

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base and PR both run the published ccusage package from pkg.pr.new, installed before measurement, with --offline --json, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

Command	Input	Base median	PR median	PR vs base	Base peak RSS	PR peak RSS	PR/base RSS	Base throughput	PR throughput
`claude --offline --json`	1.01 GiB	565.5ms	561.5ms	1.01x	326.70 MiB	-	-	1.78 GiB/s	1.79 GiB/s
`codex --offline --json`	1.01 GiB	417.1ms	369.8ms	1.13x	81.83 MiB	81.70 MiB	1.00x	2.41 GiB/s	2.72 GiB/s

Artifact size

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

pullfrog Bot reviewed Jun 9, 2026

Comment thread justfile

coderabbitai Bot reviewed Jun 9, 2026

Comment thread .github/workflows/update-pricing.yaml

cubic-dev-ai Bot reviewed Jun 9, 2026

Comment thread .github/workflows/update-pricing.yaml

Comment thread rust/crates/ccusage/src/pricing.rs

pullfrog Bot approved these changes Jun 9, 2026

cubic-dev-ai Bot reviewed Jun 9, 2026

Comment thread .github/workflows/update-pricing.yaml

fix(ci): keep models.dev refresh resilient

ab1eb97

Keep the models.dev pricing updater serialized after the LiteLLM updater to avoid concurrent flake.lock edits, but run it with always() so a LiteLLM refresh failure does not skip the independent models.dev refresh.

cubic-dev-ai Bot reviewed Jun 9, 2026

Comment thread .github/workflows/update-pricing.yaml

pullfrog Bot approved these changes Jun 9, 2026

cubic-dev-ai Bot reviewed Jun 9, 2026

ryoppippi merged commit 15fd42a into main Jun 9, 2026
40 checks passed

ryoppippi deleted the feat/embed-models-dev-pricing branch June 9, 2026 23:04

This was referenced Jun 10, 2026

fix(nix): stabilise crane dependency cache #1243

Merged

perf(pricing): shrink embedded snapshots 58% and binary 2.9% #1244

Merged

Conversation

ryoppippi commented Jun 9, 2026 • edited by coderabbitai Bot

Summary

What changed

Why

Testing

Summary by cubic

Summary by CodeRabbit

coderabbitai Bot commented Jun 9, 2026 • edited

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

cloudflare-workers-and-pages Bot commented Jun 9, 2026 • edited

Deploying with Cloudflare Workers

pullfrog Bot left a comment

coderabbitai Bot left a comment

pkg-pr-new Bot commented Jun 9, 2026 • edited

cubic-dev-ai Bot left a comment • edited

github-actions Bot commented Jun 9, 2026

ccusage performance comparison

Package runner startup

Cached bunx execution performance

Package runtime diagnostics

Committed fixture performance

Large real-world-shaped fixture performance

Artifact size

github-actions Bot commented Jun 9, 2026

ccusage performance comparison

Package runner startup

Cached bunx execution performance

Package runtime diagnostics

Committed fixture performance

Large real-world-shaped fixture performance

Artifact size

ryoppippi commented Jun 9, 2026

cubic-dev-ai Bot commented Jun 9, 2026

ryoppippi commented Jun 9, 2026

cubic-dev-ai Bot commented Jun 9, 2026

coderabbitai Bot commented Jun 9, 2026

rust/crates/ccusage/src/pricing.rs

nix/models-dev-gen.ts

.github/workflows/update-pricing.yaml

nix/models-dev-pricing.nix

Minor / nits

coderabbitai Bot commented Jun 9, 2026

cubic-dev-ai Bot left a comment • edited

`rust/crates/ccusage/src/pricing.rs`

`nix/models-dev-gen.ts`

`.github/workflows/update-pricing.yaml`

`nix/models-dev-pricing.nix`

Review: `feat(pricing): embed models.dev snapshot for offline pricing`

🔴 Issue — `update-models-dev-pricing` unnecessarily blocks on `update-pricing`

🟡 Issue — SRI hashes in `models-dev-pricing.nix` require silent manual updates

🟡 Issue — `persist-credentials: false` with bare `git push`

🟡 Issue — `load_models_dev_json_missing` iterates `into_values()`, losing provider keys

🟢 Nit — `gen-models-dev-pricing` copies from a Nix store path; the `--no-link` flag is implicit

🟢 Nit — `KEEP` regex also matches non-Anthropic providers
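Two of the items above turn on whether provider identity survives loading: iterating `into_values()` discards the provider key, and a name-based `KEEP` pattern such as `/claude|anthropic/i` can admit entries from other providers. The Rust sketch below illustrates the difference on hypothetical data, assuming a provider -> model -> price layout. It does not reproduce `load_models_dev_json_missing` or the TypeScript generator; `Snapshot`, the helper functions, the provider names, and the prices are all placeholders.

```rust
use std::collections::BTreeMap;

/// Hypothetical snapshot layout: provider key -> model id -> placeholder input price.
type Snapshot = BTreeMap<String, BTreeMap<String, f64>>;

fn sample_snapshot() -> Snapshot {
    let mut snapshot = Snapshot::new();
    snapshot
        .entry("anthropic".to_string())
        .or_default()
        .insert("claude-model-x".to_string(), 1.0);
    // Another provider listing the same model id; a name-based
    // /claude|anthropic/i filter would keep this entry too.
    snapshot
        .entry("other-provider".to_string())
        .or_default()
        .insert("claude-model-x".to_string(), 1.2);
    snapshot
}

/// `into_values()` discards provider keys, so same-named models from different
/// providers collide and the last one iterated silently wins.
fn flatten_without_providers(snapshot: Snapshot) -> BTreeMap<String, f64> {
    snapshot.into_values().flatten().collect()
}

/// `into_iter()` keeps each provider key, so entries can be filtered exactly.
fn models_for_provider(snapshot: Snapshot, provider: &str) -> BTreeMap<String, f64> {
    snapshot
        .into_iter()
        .filter(|(key, _)| key == provider)
        .flat_map(|(_, models)| models)
        .collect()
}

fn main() {
    // BTreeMap iterates providers alphabetically, so "other-provider" wins here.
    let flat = flatten_without_providers(sample_snapshot());
    println!("flattened: {:?}", flat);

    let anthropic_only = models_for_provider(sample_snapshot(), "anthropic");
    println!("anthropic only: {:?}", anthropic_only);
}
```

Whether the real loader can hit such a collision depends on the snapshot's contents; the sketch only shows why keeping the provider key makes the filter exact.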

cubic-dev-ai Bot left a comment • edited