feat(pricing): embed models.dev snapshot for offline pricing by ryoppippi · Pull Request #1242 · ccusage/ccusage · GitHub
feat(pricing): embed models.dev snapshot for offline pricing #1242

Merged: ryoppippi merged 6 commits into main from feat/embed-models-dev-pricing, Jun 9, 2026

ryoppippi (Member) commented Jun 9, 2026

Summary

Prices brand-new Anthropic models offline by embedding a pinned, self-generated
models.dev pricing snapshot. Models such as claude-fable-5 ship on models.dev
before LiteLLM publishes them, so they previously could not be priced without
network access (the models.dev fallback was online-only).

What changed

  • Pin the source: add anomalyco/models.dev as a flake input.
  • Reproducible generator (nix/models-dev-gen.ts + nix/models-dev-pricing.nix):
    models.dev ships per-model TOML, not a prebuilt catalog, so the snapshot is
    built with the project's own generateCatalog (Bun + the zero-dependency
    remeda/zod vendored from the pinned bun.lock hashes) and compacted to the
    Anthropic models and pricing fields ccusage consumes. Exposed as the
    .#models-dev-pricing package.
  • Commit + embed: the compacted snapshot is committed at
    rust/crates/ccusage/src/models-dev-pricing.json and embedded via
    include_str!. No build-time network on any platform (Nix builds and plain
    cargo build on macOS/Windows ship identical, pinned data). build.rs is
    unchanged.
  • Runtime (pricing.rs): the embedded models.dev data is a separate,
    offline-capable fallback map, consulted only when the primary table misses, so
    it never perturbs the primary table's fuzzy alias matching.
  • Automation (.github/workflows/update-pricing.yaml): refresh the LiteLLM
    and models.dev snapshots hourly, each opening its own PR only when the
    pricing actually changes. Workflow permissions scoped per job.
  • Recipes (justfile): gen-models-dev-pricing and
    update-models-dev-pricing.
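
The runtime rule above (a separate fallback map, consulted only when the primary table misses) can be sketched as follows. This is a minimal illustration, not the code in pricing.rs: the Price and Resolver types, the find_primary helper, and the substring-based stand-in for fuzzy alias matching are all hypothetical, and the prices used in any example are placeholders.

```rust
use std::collections::HashMap;

/// Illustrative price record; the field names are assumptions, not ccusage's schema.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Price {
    input: f64,
    output: f64,
}

/// Hypothetical resolver: a primary table with fuzzy alias matching plus an
/// exact-match-only fallback table kept in a separate map.
struct Resolver {
    primary: HashMap<String, Price>,
    fallback: HashMap<String, Price>,
}

impl Resolver {
    /// Look up the primary table only: exact key first, then the longest key
    /// contained in the requested name (a simple stand-in for alias matching).
    fn find_primary(&self, model: &str) -> Option<Price> {
        if let Some(price) = self.primary.get(model) {
            return Some(*price);
        }
        self.primary
            .iter()
            .filter(|(key, _)| model.contains(key.as_str()))
            .max_by_key(|(key, _)| key.len())
            .map(|(_, price)| *price)
    }

    /// Consult the fallback only after the primary table misses, and only by
    /// exact key, so the fallback cannot change what the primary table matches.
    fn find(&self, model: &str) -> Option<Price> {
        self.find_primary(model)
            .or_else(|| self.fallback.get(model).copied())
    }
}
```

Because the fallback entries never enter the primary map, a name that the primary table already resolves through its fuzzy match keeps resolving the same way, even when the fallback holds an exact entry for it.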

Why

The hardcoded built-in pricing table only covers a fixed set of models, and the
models.dev fallback required network access. Embedding a pinned snapshot closes
the offline gap for newly released models while keeping the data reproducible and
reviewable in git (refreshed automatically).

Testing

  • just check (clippy, treefmt, schema drift, gitleaks, nix build) — passed
  • cargo test -p ccusage pricing suite (52 tests, incl. offline resolution of
    models.dev-only models like claude-fable-5) — passed
  • nix build .#ccusage and nix build .#models-dev-pricing — passed

Summary by cubic

Embed a pinned models.dev pricing snapshot into ccusage to enable offline pricing for new Anthropic models when LiteLLM hasn’t published them yet. Builds reproducibly with no build-time network, keeps primary pricing behavior unchanged, and fails fast if the embedded snapshot is invalid.

  • New Features

    • Pin anomalyco/models.dev as a flake input.
    • Reproducible snapshot generator (nix/models-dev-gen.ts + nix/models-dev-pricing.nix) using Bun with vendored remeda/zod; compacts to Anthropic models and required fields; validates OUTFILE before write.
    • Commit snapshot to rust/crates/ccusage/src/models-dev-pricing.json and embed via include_str!; Nix and cargo builds ship identical, pinned data.
    • Runtime adds a separate embedded models.dev fallback map in pricing.rs, used only on misses so fuzzy alias matching in the primary table is unaffected; works offline; treat embedded snapshot parse errors as build-time failures.
  • Automation

    • .github/workflows/update-pricing.yaml: hourly refresh for LiteLLM and models.dev; serialize the models.dev updater after LiteLLM; run models.dev refresh even if the LiteLLM job fails (always()).
    • Configure git auth with gh auth setup-git for pushes using GH_TOKEN; disable persisted checkout credentials; job-scoped permissions.
    • Open PRs only when pricing changes; skip lock-only churn.
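
The "open PRs only when pricing changes" rule can be pictured as a check over the files a refresh run modified. The sketch below assumes the run can list its changed paths (for example from git diff --name-only); should_open_pr and SNAPSHOT_PATH are hypothetical names, and the actual workflow may make this decision differently.

```rust
/// Path of the committed snapshot, as given in the PR description.
const SNAPSHOT_PATH: &str = "rust/crates/ccusage/src/models-dev-pricing.json";

/// Decide whether a refresh run produced a change worth a pull request.
/// `changed_files` is assumed to hold the paths modified by the run.
/// A run that only touched flake.lock (lock-only churn) returns false.
fn should_open_pr(changed_files: &[&str]) -> bool {
    changed_files.iter().any(|path| *path == SNAPSHOT_PATH)
}
```

Under this rule, a bump of the pinned input that leaves the compacted Anthropic pricing untouched produces no PR.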

Written for commit 70dca9e. Summary will update on new commits.

Summary by CodeRabbit

  • New Features

    • Offline embedded models.dev pricing snapshot for reliable local access to model pricing and context limits.
    • Improved offline resolution so previously-unavailable Anthropic models can be resolved without network access.
    • Build now includes the generated models.dev pricing snapshot so it’s available at runtime.
  • Chores

    • Pricing refresh workflow runs hourly and adds an automated job to regenerate, validate, and publish the models.dev pricing snapshot (suppresses PR noise when unchanged).
    • CI step added to ensure git auth before pushing automation branches; added local commands to regenerate the snapshot.

ccusage prices models from the embedded LiteLLM snapshot plus a runtime
models.dev fallback that was only consulted when online. Newly released
Anthropic models (e.g. claude-fable-5) ship on models.dev before LiteLLM
publishes them, so they could not be priced offline at all.

Pin the models.dev source as a flake input and reproducibly regenerate a
compacted, Anthropic-only pricing snapshot from it. models.dev ships per-model
TOML rather than a prebuilt catalog, so the snapshot is built with the project's
own generateCatalog routine (Bun + the zero-dependency remeda/zod vendored from
the pinned bun.lock hashes) and then trimmed to the pricing fields ccusage
consumes. The result is committed to the repo and embedded via include_str!, so
every platform (Nix and plain cargo on macOS/Windows) ships identical, pinned
data with no build-time network access.

At runtime the embedded models.dev data is kept as a separate fallback map,
consulted only when the primary table misses, so it never perturbs the primary
table's fuzzy alias matching. Unlike the network source it stays available
offline.

- flake.nix/flake.lock: pin anomalyco/models.dev input
- nix/models-dev-gen.ts + nix/models-dev-pricing.nix: reproducible generator,
  exposed as the .#models-dev-pricing package
- justfile: gen-models-dev-pricing / update-models-dev-pricing recipes
- .github/workflows/update-pricing.yaml: refresh LiteLLM and models.dev
  snapshots hourly, each opening its own PR when the pricing actually changes
- pricing.rs: embed the committed snapshot and resolve it as an offline-capable
  fallback, with tests covering offline resolution of models.dev-only models

coderabbitai Bot commented Jun 9, 2026

cloudflare-workers-and-pages Bot commented Jun 9, 2026

Deploying with Cloudflare Workers

Status | Name | Latest Commit | Updated (UTC)
✅ Deployment successful! | ccusage-guide | 70dca9e | Jun 09 2026, 10:48 PM

pullfrog Bot (Contributor) left a comment

Reviewed changes — embeds a pinned, generated models.dev pricing snapshot as a third fallback tier in PricingMap, enabling offline resolution of newly released Anthropic models that aren't yet in LiteLLM.

  • Embedded models.dev snapshot: committed at rust/crates/ccusage/src/models-dev-pricing.json, embedded via include_str!, and consulted by find()/context_limit() after the primary table and the network models.dev cache.
  • Reproducible generator: nix/models-dev-gen.ts imports models.dev's own generateCatalog, filters to Anthropic models, and outputs compacted JSON with stable key ordering. nix/models-dev-pricing.nix vendors remeda/zod by hash for reproducibility.
  • Runtime fallback chain: find() now chains through the embedded snapshot with its own enable_embedded_models_dev_fallback flag, keeping the embedded map separate so it never interferes with the primary table's fuzzy alias matching.
  • Hourly automation: update-pricing.yaml gains an update-models-dev-pricing job alongside the existing LiteLLM job, with per-job permissions scoping.
  • Tests: three new tests cover snapshot parseability, offline fallback for models only in the embedded snapshot, and resolution of claude-fable-5 specifically.
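
Stable key ordering is what keeps a regenerated snapshot diff-friendly: the same data always serializes to the same bytes. The generator itself is TypeScript and uses its own sortObject helper; the sketch below is only a Rust analogue of the idea, with a hypothetical to_stable_json function and a deliberately tiny JSON writer that assumes plain ASCII keys.

```rust
use std::collections::BTreeMap;

/// Render a model -> (field -> value) table as compact JSON with sorted keys.
/// BTreeMap iterates in key order, so the output does not depend on the
/// order in which entries were inserted. Keys are quoted with Rust's Debug
/// formatting, which matches JSON only for simple ASCII names.
fn to_stable_json(models: &BTreeMap<String, BTreeMap<String, f64>>) -> String {
    let entries: Vec<String> = models
        .iter()
        .map(|(model, fields)| {
            let inner: Vec<String> = fields
                .iter()
                .map(|(name, value)| format!("{:?}:{}", name, value))
                .collect();
            format!("{:?}:{{{}}}", model, inner.join(","))
        })
        .collect();
    format!("{{{}}}", entries.join(","))
}
```

Two maps built from the same entries in different insertion orders produce identical strings, so a refresh that changes no prices yields no textual diff.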

ℹ️ No critical issues — one minor suggestion inline.

Comment thread justfile

coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
.github/workflows/update-pricing.yaml (1)

76-133: ⚖️ Poor tradeoff

Consider potential race condition between parallel jobs.

Both update-pricing and update-models-dev-pricing jobs update flake.lock (different inputs) and can run simultaneously. While --force-with-lease and different branch names reduce the risk, concurrent updates to flake.lock could cause one job to fail if they finish within the same minute.

Impact is minimal since the failed job will retry hourly, but you could serialize them using needs: if deterministic execution order is preferred.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @.github/workflows/update-pricing.yaml around lines 76 - 133, Two jobs can
race updating flake.lock: serialize the jobs by adding a dependency so they
don't run in parallel; specifically add a needs: reference on the
update-models-dev-pricing job (or on update-pricing depending which should run
first) so GitHub Actions will wait for the prior job to finish before starting
the other, e.g., make update-models-dev-pricing depend on update-pricing (use
the job name update-pricing in needs) to prevent concurrent flake.lock edits and
eliminate the --force-with-lease collision risk.
nix/models-dev-gen.ts (1)

79-79: 💤 Low value

Consider adding OUTFILE validation for better error messaging.

While the non-null assertion is safe in the Nix build context (where OUTFILE is always set), adding an explicit check would provide a clearer error if the script is accidentally run outside Nix:

+const outfile = process.env.OUTFILE;
+if (!outfile) {
+  throw new Error('OUTFILE environment variable is required');
+}
-await Bun.write(process.env.OUTFILE!, `${JSON.stringify(sortObject(out), null, 2)}\n`);
+await Bun.write(outfile, `${JSON.stringify(sortObject(out), null, 2)}\n`);

However, since this is strictly a Nix-invoked build script and any failure would be caught at build time, the current approach is acceptable.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@nix/models-dev-gen.ts` at line 79, Replace the non-null assertion on
process.env.OUTFILE before calling Bun.write with an explicit validation: check
that process.env.OUTFILE is defined (and optionally non-empty) and if not, throw
or log a clear error and exit (so the failure message explains OUTFILE is
missing), then call Bun.write with process.env.OUTFILE; reference the OUTFILE
env var, the Bun.write(...) call, and the sortObject(out) usage when making the
change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/update-pricing.yaml:
- Line 17: The checkout steps currently use "uses:
actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10" without disabling
persisted credentials; update both checkout actions (the ones using
actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10) to include the input
"persist-credentials: false" under their step configuration so the runner will
not leak git credentials (add the single key/value to each checkout step).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 51ec09c2-9484-4503-8e7d-5c686e76ca50

📥 Commits

Reviewing files that changed from the base of the PR and between fae44a5 and cbc9ec5.

⛔ Files ignored due to path filters (1)
  • flake.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
  • .github/workflows/update-pricing.yaml
  • flake.nix
  • justfile
  • nix/models-dev-gen.ts
  • nix/models-dev-pricing.nix
  • nix/packages.nix
  • package.nix
  • rust/crates/ccusage/src/models-dev-pricing.json
  • rust/crates/ccusage/src/pricing.rs

Comment thread .github/workflows/update-pricing.yaml

pkg-pr-new Bot commented Jun 9, 2026

ccusage

npx https://pkg.pr.new/ccusage@1242

@ccusage/ccusage-darwin-arm64

npx https://pkg.pr.new/@ccusage/ccusage-darwin-arm64@1242

@ccusage/ccusage-darwin-x64

npx https://pkg.pr.new/@ccusage/ccusage-darwin-x64@1242

@ccusage/ccusage-linux-arm64

npx https://pkg.pr.new/@ccusage/ccusage-linux-arm64@1242

@ccusage/ccusage-linux-x64

npx https://pkg.pr.new/@ccusage/ccusage-linux-x64@1242

@ccusage/ccusage-win32-arm64

npx https://pkg.pr.new/@ccusage/ccusage-win32-arm64@1242

@ccusage/ccusage-win32-x64

npx https://pkg.pr.new/@ccusage/ccusage-win32-x64@1242

commit: 70dca9e

cubic-dev-ai Bot (Contributor) left a comment

2 issues found across 10 files

Comment thread .github/workflows/update-pricing.yaml
Comment thread rust/crates/ccusage/src/pricing.rs

The update-models-dev-pricing recipe used just recipe dependencies for generation and validation, and dependencies run before the recipe body. As a result, the recipe regenerated and checked the snapshot against the old models-dev lock, then updated only flake.lock afterward.

Run the models.dev input update first, then regenerate the committed snapshot and run the normal check. Also make the claude-fable-5 offline pricing test assert that the embedded snapshot actually contains the target model instead of silently returning.

github-actions Bot (Contributor) commented Jun 9, 2026

ccusage performance comparison

PR SHA: cbc9ec533b53
Base SHA: fae44a52f183

This compares the PR package against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

Package | SHA | Execution setup | Bunx temp cache | Bunx warm median | Warm samples
Base pkg.pr.new | fae44a52f183 | 1.508s | 863.2ms | 30.0ms | 3
PR pkg.pr.new | cbc9ec5 | 1.261s | 1.143s | 30.5ms | 3

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: fae44a52f183; PR package: cbc9ec5. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput
bunx -p <pkg> ccusage claude --offline --json | 1.01 GiB | 546.7ms | 549.0ms | 1.00x | 315.95 MiB | 319.20 MiB | 1.01x | 1.84 GiB/s | 1.83 GiB/s
bunx -p <pkg> ccusage codex --offline --json | 1.01 GiB | 368.7ms | 362.1ms | 1.02x | 82.33 MiB | 69.70 MiB | 0.85x | 2.73 GiB/s | 2.78 GiB/s

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 runs. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

Command | Runtime | Input | Median | Throughput | Samples
claude --offline --json | Package wrapper | 1.01 GiB | 542.1ms | 1.86 GiB/s | 1
claude --offline --json | Installed native binary | 1.01 GiB | 525.2ms | 1.92 GiB/s | 1
codex --offline --json | Package wrapper | 1.01 GiB | 362.5ms | 2.78 GiB/s | 1
codex --offline --json | Installed native binary | 1.01 GiB | 335.5ms | 3.00 GiB/s | 1

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput
claude daily --offline --json | 0.00 MiB | 29.7ms | 29.1ms | 1.02x | 43.61 MiB | 43.48 MiB | 1.00x | 0.05 MiB/s | 0.05 MiB/s
claude session --offline --json | 0.00 MiB | 30.0ms | 29.8ms | 1.00x | 43.61 MiB | 43.48 MiB | 1.00x | 0.05 MiB/s | 0.05 MiB/s
codex daily --offline --json | 0.00 MiB | 29.0ms | 28.9ms | 1.00x | 43.48 MiB | 43.48 MiB | 1.00x | 0.03 MiB/s | 0.03 MiB/s
codex session --offline --json | 0.00 MiB | 29.2ms | 28.9ms | 1.01x | 43.61 MiB | 43.48 MiB | 1.00x | 0.03 MiB/s | 0.03 MiB/s

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput
claude --offline --json | 1.01 GiB | 554.4ms | 582.2ms | 0.95x | 322.33 MiB | 320.20 MiB | 0.99x | 1.82 GiB/s | 1.73 GiB/s
codex --offline --json | 1.01 GiB | 384.8ms | 374.2ms | 1.03x | 82.70 MiB | 70.70 MiB | 0.85x | 2.62 GiB/s | 2.69 GiB/s

Artifact size

Artifact | Base | PR | Delta | Ratio
packed ccusage-*.tgz | 17.30 KiB | 17.30 KiB | +0.00 KiB | 1.00x
installed native package binary | 3353.74 KiB | 3417.74 KiB | +64.00 KiB | 0.98x

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

github-actions Bot (Contributor) commented Jun 9, 2026

ccusage performance comparison

PR SHA: cbc9ec533b53
Base SHA: fae44a52f183

This compares the Rust PR release binary against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

Package | SHA | Execution setup | Bunx temp cache | Bunx warm median | Warm samples
Base pkg.pr.new | fae44a52f183 | 765.8ms | 636.1ms | 32.0ms | 3
PR pkg.pr.new | cbc9ec5 | 858.5ms | 673.5ms | 32.7ms | 3

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: fae44a52f183; PR package: cbc9ec5. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput
bunx -p <pkg> ccusage claude --offline --json | 1.01 GiB | 572.4ms | 548.0ms | 1.04x | 342.20 MiB | 326.83 MiB | 0.96x | 1.76 GiB/s | 1.84 GiB/s
bunx -p <pkg> ccusage codex --offline --json | 1.01 GiB | 367.8ms | 371.2ms | 0.99x | 82.45 MiB | 79.70 MiB | 0.97x | 2.74 GiB/s | 2.71 GiB/s

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 runs. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

Command | Runtime | Input | Median | Throughput | Samples
claude --offline --json | Package wrapper | 1.01 GiB | 551.7ms | 1.82 GiB/s | 1
claude --offline --json | Installed native binary | 1.01 GiB | 533.6ms | 1.89 GiB/s | 1
codex --offline --json | Package wrapper | 1.01 GiB | 362.8ms | 2.77 GiB/s | 1
codex --offline --json | Installed native binary | 1.01 GiB | 336.1ms | 3.00 GiB/s | 1

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput
claude daily --offline --json | 0.00 MiB | 29.5ms | 4.0ms | 7.31x | 43.61 MiB | 2.70 MiB | 0.06x | 0.05 MiB/s | 0.38 MiB/s
claude session --offline --json | 0.00 MiB | 30.4ms | 4.1ms | 7.46x | 43.48 MiB | 2.70 MiB | 0.06x | 0.05 MiB/s | 0.38 MiB/s
codex daily --offline --json | 0.00 MiB | 29.6ms | 3.8ms | 7.84x | 43.48 MiB | 2.70 MiB | 0.06x | 0.03 MiB/s | 0.23 MiB/s
codex session --offline --json | 0.00 MiB | 29.3ms | 3.9ms | 7.58x | 43.48 MiB | 2.70 MiB | 0.06x | 0.03 MiB/s | 0.22 MiB/s

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput
claude --offline --json | 1.01 GiB | 556.0ms | 529.6ms | 1.05x | 338.70 MiB | 318.95 MiB | 0.94x | 1.81 GiB/s | 1.90 GiB/s
codex --offline --json | 1.01 GiB | 367.0ms | 337.2ms | 1.09x | 70.70 MiB | 71.08 MiB | 1.01x | 2.74 GiB/s | 2.99 GiB/s

Artifact size

Artifact | Base | PR | Delta | Ratio
packed ccusage-*.tgz | 17.30 KiB | 17.30 KiB | +0.00 KiB | 1.00x
installed native package binary | 3353.74 KiB | 3417.74 KiB | +64.00 KiB | 0.98x

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

ryoppippi (Member, Author) commented

@coderabbitai review
@cubic-dev-ai review

cubic-dev-ai Bot (Contributor) commented Jun 9, 2026

@coderabbitai review
@cubic-dev-ai review

@ryoppippi I have started the AI code review. It will take a few minutes to complete.

Address reviewer feedback on the pricing refresh workflow. Disable persisted checkout credentials because the jobs configure git identity explicitly and use GH_TOKEN only for pull-request operations.

Serialize the models.dev refresh after the LiteLLM refresh to avoid simultaneous flake.lock update jobs, and validate OUTFILE before writing the generated models.dev pricing snapshot so local/manual failures are explicit.

ryoppippi (Member, Author) commented

@coderabbitai review
@cubic-dev-ai review

cubic-dev-ai Bot (Contributor) commented Jun 9, 2026

@coderabbitai review
@cubic-dev-ai review

@ryoppippi I have started the AI code review. It will take a few minutes to complete.

coderabbitai Bot commented Jun 9, 2026

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read the main Rust pricing file
cat rust/crates/ccusage/src/pricing.rs

echo "=== models-dev-gen.ts ==="
cat nix/models-dev-gen.ts

echo "=== models-dev-pricing.nix ==="
cat nix/models-dev-pricing.nix

echo "=== update-pricing.yaml ==="
cat .github/workflows/update-pricing.yaml

echo "=== justfile relevant section ==="
cat justfile

echo "=== nix/packages.nix ==="
cat nix/packages.nix

echo "=== package.nix ==="
cat package.nix

echo "=== flake.nix relevant section ==="
cat flake.nix

Length of output: 92594



@ryoppippi: Clean, well-thought-out PR. The three-tier fallback design (primary → network models.dev → embedded) is sound, the fuzzy-match isolation is correct (embedded map uses find_entry not find, preventing recursion), and the automation avoids lock-only PR noise. A few issues worth addressing:


rust/crates/ccusage/src/pricing.rs

Silently ignored parse failure in embedded_models_dev_pricing

fn embedded_models_dev_pricing() -> &'static PricingMap {
    static EMBEDDED_MODELS_DEV_PRICING: OnceLock<PricingMap> = OnceLock::new();
    EMBEDDED_MODELS_DEV_PRICING.get_or_init(|| {
        let mut map = PricingMap::default();
        map.load_models_dev_json_missing(BUILD_TIME_MODELS_DEV_JSON);  // return value ignored
        map
    })
}

load_models_dev_json_missing returns Option<usize>, which is None on parse failure. Because BUILD_TIME_MODELS_DEV_JSON is embedded at compile time and validated by the embedded_models_dev_snapshot_is_parseable test, a parse failure at runtime would be a bug (corrupted binary or schema change). Consider panicking with a descriptive message so such bugs are caught early rather than surfacing as silent zero-cost lookups:

map.load_models_dev_json_missing(BUILD_TIME_MODELS_DEV_JSON)
    .expect("embedded models-dev-pricing.json failed to parse; this is a build-time bug");
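
To see the fail-fast pattern in isolation, here is a self-contained sketch. It does not reproduce pricing.rs: EMBEDDED_SNAPSHOT stands in for the include_str! constant, parse_snapshot and embedded_pricing are hypothetical names, and a toy model=price line format with a placeholder price replaces the real JSON so that no external crate is needed.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Stand-in for an `include_str!` constant. The `model=price` format and the
/// price value are placeholders for illustration only.
const EMBEDDED_SNAPSHOT: &str = "claude-fable-5=1.5\n";

/// Parse the toy snapshot, returning None if any non-empty line is malformed.
fn parse_snapshot(text: &str) -> Option<HashMap<String, f64>> {
    let mut map = HashMap::new();
    for line in text.lines().filter(|line| !line.trim().is_empty()) {
        let (model, price) = line.split_once('=')?;
        map.insert(model.trim().to_string(), price.trim().parse::<f64>().ok()?);
    }
    Some(map)
}

/// Lazily parsed, process-wide table. A parse failure panics with a clear
/// message on first use instead of silently leaving an empty table behind.
fn embedded_pricing() -> &'static HashMap<String, f64> {
    static TABLE: OnceLock<HashMap<String, f64>> = OnceLock::new();
    TABLE.get_or_init(|| {
        parse_snapshot(EMBEDDED_SNAPSHOT)
            .expect("embedded snapshot failed to parse; this is a build-time bug")
    })
}
```

The design choice is that data baked into the binary is treated like code: if it cannot be parsed, that is a defect to surface loudly, not a condition to recover from at runtime.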

Fragile test dependency on claude-fable-5

#[test]
fn offline_prices_new_anthropic_model_from_embedded_models_dev() {
    assert!(embedded_models_dev_pricing()
        .find_entry("claude-fable-5")
        .is_some());
    ...
}

This test will start failing as soon as models.dev removes or renames claude-fable-5. The existing offline_resolves_models_only_in_embedded_models_dev test already covers the general case (any model present in embedded but absent from the primary table). Consider either removing the claude-fable-5 assertion or adding a comment explaining it's expected to be updated when the snapshot is regenerated.


nix/models-dev-gen.ts

Non-null assertion on process.env.OUTFILE

await Bun.write(process.env.OUTFILE!, `${JSON.stringify(sortObject(out), null, 2)}\n`);

In a Nix build OUTFILE is always set, but a plain bun run nix/models-dev-gen.ts from outside Nix will throw an opaque TypeError: Argument must be a string or Blob. A guard at the top improves the developer experience:

const outfile = process.env.OUTFILE;
if (!outfile) {
  throw new Error('OUTFILE environment variable is required (set by the Nix build or just gen-models-dev-pricing)');
}
// ...
await Bun.write(outfile, `${JSON.stringify(sortObject(out), null, 2)}\n`);

.github/workflows/update-pricing.yaml

Hourly cron with two jobs touching flake.lock

Both jobs run concurrently on the same schedule and both call nix flake update on the checked-out main HEAD. While they push to separate branches (automation/litellm-pricing vs automation/models-dev-pricing), each job reads the current flake.lock independently, so a concurrent run can produce two PRs with inconsistent lockfile states. This is unlikely to cause real harm (each PR is self-contained and validated), but it's worth noting. Adding concurrency groups per job would make the intent explicit:

jobs:
  update-pricing:
    concurrency:
      group: update-litellm-pricing
      cancel-in-progress: true
    ...
  update-models-dev-pricing:
    concurrency:
      group: update-models-dev-pricing
      cancel-in-progress: true
    ...

nix/models-dev-pricing.nix

remeda/zod hashes must be kept in sync with upstream bun.lock

The comment "Keep these in sync with the matching entries in ${modelsDevSrc}/bun.lock whenever the input is bumped" is good — but the update-models-dev-pricing workflow and just update-models-dev-pricing recipe don't update these hashes automatically. A failed nix build .#models-dev-pricing after bumping the input is the only signal. Consider adding a brief comment in the justfile recipe or workflow step to remind maintainers to re-check the hashes on build failure.


Minor / nits

  • nix/models-dev-gen.ts line const { providers } = (await generateCatalog('.')) as { providers: ... } — the cast is unvalidated; a shape mismatch silently produces an empty snapshot. A runtime check on providers being a non-null object before iterating would make failures explicit.
  • package.nix source filter: the models-dev-pricing.json suffix check uses lib.hasSuffix which correctly matches rust/crates/ccusage/src/models-dev-pricing.json. ✓
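
The unvalidated cast in the first nit could be replaced with a small runtime guard. The following is a minimal sketch, not the generator's actual code: assertProviders, isPlainObject, and the ProviderMap shape are hypothetical, since the real return type of generateCatalog is not shown in this thread.

```typescript
// Hypothetical shape; the real catalog type is not shown in this thread.
type ModelEntry = { id?: string; cost?: Record<string, number> };
type ProviderMap = Record<string, { models: Record<string, ModelEntry> }>;

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Throws on a shape mismatch instead of letting it produce an empty snapshot.
function assertProviders(catalog: unknown): ProviderMap {
  const providers = isPlainObject(catalog) ? catalog["providers"] : undefined;
  if (!isPlainObject(providers)) {
    throw new Error("catalog has no `providers` object");
  }
  for (const [name, provider] of Object.entries(providers)) {
    const models = isPlainObject(provider) ? provider["models"] : undefined;
    if (!isPlainObject(models)) {
      throw new Error(`provider "${name}" has no \`models\` object`);
    }
  }
  return providers as ProviderMap;
}

// Usage: wrap the generator result before iterating.
const checked = assertProviders({
  providers: { anthropic: { models: { "claude-fable-5": {} } } },
});
console.log(Object.keys(checked));
```

The guard only checks the two levels the generator iterates over; validating individual pricing fields would be a separate step.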

@coderabbitai

coderabbitai Bot commented Jun 9, 2026


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Read the main files for review
echo "=== nix/models-dev-gen.ts ==="
cat nix/models-dev-gen.ts

echo ""
echo "=== nix/models-dev-pricing.nix ==="
cat nix/models-dev-pricing.nix

echo ""
echo "=== rust/crates/ccusage/src/pricing.rs ==="
cat rust/crates/ccusage/src/pricing.rs

echo ""
echo "=== .github/workflows/update-pricing.yaml ==="
cat .github/workflows/update-pricing.yaml

echo ""
echo "=== justfile (relevant recipes) ==="
cat justfile

echo ""
echo "=== nix/packages.nix ==="
cat nix/packages.nix

echo ""
echo "=== package.nix ==="
cat package.nix

echo ""
echo "=== flake.nix (head) ==="
head -60 flake.nix

Length of output: 93054



@ryoppippi: Here is the review for PR #1242.


Review: feat(pricing): embed models.dev snapshot for offline pricing

The overall design is clean — a separate embedded fallback map that never participates in fuzzy alias matching is the right call, and the 3-tier resolution chain (primary → network models.dev → embedded models.dev) is easy to follow. The Nix derivation for reproducible, offline-capable generation is well-structured. A few items worth addressing:


🔴 Issue — update-models-dev-pricing unnecessarily blocks on update-pricing

File: .github/workflows/update-pricing.yaml

update-models-dev-pricing:
  needs: update-pricing   # ← blocks even on unrelated LiteLLM failures

Both jobs check out main independently, update different flake inputs (litellm vs models-dev), and push to different branches (automation/litellm-pricing vs automation/models-dev-pricing). There is no shared mutable state between them, so the needs: constraint means a transient LiteLLM outage silently prevents the models.dev snapshot from being refreshed. Consider removing the dependency to let the jobs run in parallel:

update-models-dev-pricing:
  runs-on: ubuntu-24.04-arm
  permissions:
    contents: write
    pull-requests: write
  steps:
    ...

If sequential execution is desired for rate-limiting reasons, using concurrency: at the workflow level is a more explicit way to express that.


🟡 Issue — SRI hashes in models-dev-pricing.nix require silent manual updates

File: nix/models-dev-pricing.nix

# Keep these in sync with the matching entries in `${modelsDevSrc}/bun.lock`
# whenever the input is bumped.
remeda = pkgs.fetchurl {
  url = "https://registry.npmjs.org/remeda/-/remeda-2.33.7.tgz";
  hash = "sha512-cXlyjevWx5AcslOUEETG4o8XYi9UkoCXcJmj7XhPFVbla+...";
};

When nix flake update models-dev bumps the input (e.g. for a new remeda/zod version), the hashes will silently go stale and the Nix build will fail with a hash mismatch. Options to make this more robust:

  1. Derive the hashes from modelsDevSrc at evaluation time: read them out of ${modelsDevSrc}/bun.lock with a small Nix expression instead of hardcoding them. This keeps them auto-synced. Note that bun.lock is JSONC and may contain trailing commas, so builtins.fromJSON may not accept it without preprocessing.
  2. Or add a nix/update-models-dev-deps.sh helper that re-extracts the hashes and is called from update-models-dev-pricing, and document it clearly.
  3. Minimum viable: add a CI check that verifies the hashes match bun.lock, so a stale hash is caught early rather than at user build time.
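
Option 3 could be as small as a script that compares the two files as text. This is a rough sketch under stated assumptions: that bun.lock stores each package as an array containing a sha512-... integrity string, and that the same SRI string appears verbatim in the Nix file. lockfileHash and staleHashes are hypothetical helpers, and the sample inputs are fabricated placeholders.

```typescript
// Assumed bun.lock entry layout (not confirmed in this thread):
//   "remeda": ["remeda@2.33.7", "", {}, "sha512-..."],
function lockfileHash(lockText: string, pkg: string): string | undefined {
  // Escape regex metacharacters in the package name.
  const escaped = pkg.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const entry = new RegExp(
    `"${escaped}":\\s*\\[[^\\]]*?"(sha512-[A-Za-z0-9+/=]+)"`,
  );
  return lockText.match(entry)?.[1];
}

// Returns the packages whose lockfile hash is missing from the Nix file.
function staleHashes(nixText: string, lockText: string, pkgs: string[]): string[] {
  return pkgs.filter((pkg) => {
    const expected = lockfileHash(lockText, pkg);
    return expected === undefined || !nixText.includes(expected);
  });
}

// Fabricated placeholder inputs, for illustration only.
const sampleLock = `{ "packages": {
  "remeda": ["remeda@2.33.7", "", {}, "sha512-AAAA"],
  "zod": ["zod@0.0.0", "", {}, "sha512-BBBB"],
} }`;
const sampleNix = `hash = "sha512-AAAA"; hash = "sha512-CCCC";`;
console.log(staleHashes(sampleNix, sampleLock, ["remeda", "zod"]));
```

A text comparison like this avoids parsing either file format, at the cost of depending on the assumed entry layout.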

🟡 Issue — persist-credentials: false with bare git push

File: .github/workflows/update-pricing.yaml (both jobs)

- uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10
  with:
    persist-credentials: false   # removes the default credential helper
...
- run: git push --force-with-lease origin "$branch"  # ← no credentials configured

With persist-credentials: false, the default GITHUB_TOKEN-backed credential helper is removed. The bare git push will fail unless credentials are re-established somewhere. If .github/actions/setup-nix calls gh auth setup-git, that is sufficient — but it should be explicitly documented. If it does not, adding an explicit step is needed:

- name: Configure git credentials
  run: gh auth setup-git
  env:
    GH_TOKEN: ${{ github.token }}

🟡 Issue — load_models_dev_json_missing iterates into_values(), losing provider keys

File: rust/crates/ccusage/src/pricing.rs, line ~278

for provider in raw.into_values() {
    for (model_key, model) in provider.models {
        let model_id = model.id.unwrap_or(model_key);
        // stored under model_id only

The provider identifier (e.g. "anthropic") is dropped. For entries where model.id is absent (so model_id = model_key), provider-namespaced keys like anthropic/claude-fable-5 that appear in models.dev won't be stored under their namespaced form — they'll only be found as claude-fable-5. This is probably intentional given the Anthropic-first scope, but it means users who log usage with provider-prefixed model names will get a miss from the embedded fallback. A comment confirming this is deliberate would help:

// Provider namespace is intentionally dropped here; ccusage looks up models
// by their canonical id (e.g. "claude-fable-5"), not by provider-prefixed keys
// (e.g. "anthropic/claude-fable-5"). The network fallback's fuzzy matching
// handles provider prefixes; the embedded snapshot does not.
for provider in raw.into_values() {
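
The effect of dropping the provider key can be seen in a small TypeScript transliteration of the loop quoted above. It is a sketch only: the real code is Rust in pricing.rs, and everything beyond the quoted lines (the Model and Providers types, the loadMissing name, the sample data) is an assumption made for illustration.

```typescript
// Hypothetical types; only the loop structure mirrors the quoted Rust snippet.
type Model = { id?: string };
type Providers = Record<string, { models: Record<string, Model> }>;

function loadMissing(entries: Map<string, Model>, raw: Providers): void {
  // Object.values discards the provider key, like into_values() in the Rust code.
  for (const provider of Object.values(raw)) {
    for (const [modelKey, model] of Object.entries(provider.models)) {
      const modelId = model.id ?? modelKey;
      if (entries.has(modelId)) continue; // existing (primary) entries win
      entries.set(modelId, model);
    }
  }
}

const entries = new Map<string, Model>();
loadMissing(entries, { anthropic: { models: { "claude-fable-5": {} } } });

console.log(entries.has("claude-fable-5"));           // true
console.log(entries.has("anthropic/claude-fable-5")); // false
```

A lookup by the provider-prefixed name misses because the prefix was never part of the stored key.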

🟢 Nit — gen-models-dev-pricing copies from a Nix store path; the --no-link flag is implicit

File: justfile

gen-models-dev-pricing:
    cp "$(nix build .#models-dev-pricing --no-link --print-out-paths)" \
       rust/crates/ccusage/src/models-dev-pricing.json
    chmod u+w rust/crates/ccusage/src/models-dev-pricing.json
    nix fmt rust/crates/ccusage/src/models-dev-pricing.json

The sequence is correct. One note: nix build with --print-out-paths prints the path of the output file (which is the JSON itself since the derivation's $out is the file). cp here works because the derivation writes directly to $out. This is fine, but a brief comment would aid future readers who expect $out to be a directory.


🟢 Nit — KEEP regex also matches non-Anthropic providers

File: nix/models-dev-gen.ts

const KEEP = /claude|anthropic/i;

anthropic will also match hypothetical third-party providers whose name contains that string (e.g. "not-anthropic" would match). For a snapshot that only needs to cover Anthropic/Claude models this is fine in practice, but anchoring the pattern as /^(claude|anthropic)/i would be more precise. Word boundaries alone would not exclude "not-anthropic", because the hyphen is a non-word character and therefore already forms a boundary.
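
The candidate patterns can be compared directly. In this sketch only claude-fable-5 comes from the PR; the other identifiers are made up, and what string KEEP is actually tested against in the generator is not shown in this thread.

```typescript
const loose = /claude|anthropic/i;              // current KEEP pattern
const wordBounded = /\b(claude|anthropic)\b/i;  // word-boundary variant
const anchored = /^(claude|anthropic)/i;        // anchored variant

function classify(id: string) {
  return {
    id,
    loose: loose.test(id),
    wordBounded: wordBounded.test(id),
    anchored: anchored.test(id),
  };
}

// Hypothetical identifiers, apart from claude-fable-5.
for (const id of ["anthropic", "claude-fable-5", "not-anthropic", "myclaude"]) {
  console.log(classify(id));
}
```

Here "not-anthropic" passes both the loose and the word-bounded pattern and is rejected only by the anchored one, while "myclaude" is rejected by both stricter variants.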


✅ Positive observations

  • The three-tier find() chain with fallback flags is clean and makes the offline-vs-network distinction explicit.
  • embedded_models_dev_pricing() correctly initialises a plain PricingMap::default() (no flags set) and queries it via find_entry() — no risk of infinite recursion or double-fallback.
  • load_models_dev_json_missing correctly skips entries already present in the primary map (if self.entries.contains_key(&model_id) { continue; }), preserving LiteLLM precedence.
  • Stable key ordering (sortObject) in the generator keeps diffs minimal — good for automated PRs.
  • force-with-lease over force in the workflow — safe push pattern.
  • Per-job permissions scoped to minimum required (contents: write, pull-requests: write).
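
To illustrate why stable key ordering keeps automated diffs small, here is a minimal recursive key sorter. The generator's real sortObject is not shown in this thread, so this is a hypothetical reimplementation of the idea rather than the PR's code.

```typescript
// Recursively rebuilds objects with keys in sorted order; arrays keep their order.
function sortObject<T>(value: T): T {
  if (Array.isArray(value)) {
    return value.map((item) => sortObject(item)) as unknown as T;
  }
  if (typeof value === "object" && value !== null) {
    const source = value as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const key of Object.keys(source).sort()) {
      sorted[key] = sortObject(source[key]);
    }
    return sorted as T;
  }
  return value;
}

// Two inputs that differ only in insertion order serialize identically.
const a = JSON.stringify(sortObject({ zod: 1, claude: { output: 2, input: 1 } }));
const b = JSON.stringify(sortObject({ claude: { input: 1, output: 2 }, zod: 1 }));
console.log(a === b);
```

Because the serialized text no longer depends on insertion order, a regenerated snapshot only differs where a value actually changed.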

CodeRabbit pointed out that the embedded models.dev fallback ignored parse failures from the compile-time snapshot. Treat that as a build-time invariant violation instead of silently returning an empty fallback map.

Keep the claude-fable-5 offline test because it is the concrete motivation for this PR, but make the assertion failure explain that the pinned models.dev snapshot is expected to contain it.
@ryoppippi

Member Author

@coderabbitai Addressed the valid runtime invariant from your latest analysis in 85c285b: embedded models.dev snapshot parse failures now fail fast instead of silently producing an empty fallback map. The OUTFILE validation and workflow serialization items were already fixed in ed99589. I kept the claude-fable-5 assertion intentionally because it is the concrete regression target for this PR, and added a clear assertion message. Validation: focused cargo test for embedded_models_dev, just fmt, and pre-push clippy/treefmt/gitleaks/cargo test passed.

@ryoppippi

Member Author

@coderabbitai review
@cubic-dev-ai review

@cubic-dev-ai

cubic-dev-ai Bot commented Jun 9, 2026

Contributor

@coderabbitai review
@cubic-dev-ai review

@ryoppippi I have started the AI code review. It will take a few minutes to complete.

@coderabbitai

coderabbitai Bot commented Jun 9, 2026


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify the fixes in the two commits
git show 85c285b6 --stat
echo "---"
git show 85c285b6 -- rust/crates/ccusage/src/pricing.rs | head -80

echo "==========================="
git show ed995893 --stat
echo "---"
git show ed995893 -- .github/workflows/update-pricing.yaml | head -100
git show ed995893 -- nix/models-dev-gen.ts | head -50

Length of output: 6674



@ryoppippi: Confirmed all three fixes via the diffs:

  • 85c285b: .expect("embedded models-dev-pricing.json must parse") correctly turns a silently empty fallback map into a hard failure when the build-time invariant is violated. The claude-fable-5 assertion message clearly communicates its intent as a regression anchor. ✓
  • ed99589: persist-credentials: false is set on both checkout steps, needs: update-pricing serialises the jobs (an accepted tradeoff over full parallelism), and the OUTFILE guard is explicit and correct. ✓

Everything looks good. Thanks for the thorough follow-up!


@pullfrog pullfrog Bot left a comment

Contributor

✅ No new issues found.

Reviewed changes — the two follow-up commits harden CI, fix recipe ordering, validate env vars, and strengthen tests.

  • CI hardening: both checkout steps now use persist-credentials: false; update-models-dev-pricing is serialized after update-pricing via needs:; the LiteLLM job captures pre-update JSON to compare content and skip lock-only churn.
  • Recipe ordering fix: update-models-dev-pricing now runs nix flake update models-dev before just gen-models-dev-pricing and just check.
  • OUTFILE validation: nix/models-dev-gen.ts replaces process.env.OUTFILE! with an explicit null/empty check and a descriptive error.
  • Test assertion strengthening: offline_prices_new_anthropic_model_from_embedded_models_dev now asserts the entry exists instead of conditionally returning.


@coderabbitai

coderabbitai Bot commented Jun 9, 2026


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== rust/crates/ccusage/src/pricing.rs ==="
cat rust/crates/ccusage/src/pricing.rs

echo ""
echo "=== nix/models-dev-gen.ts ==="
cat nix/models-dev-gen.ts

echo ""
echo "=== .github/workflows/update-pricing.yaml ==="
cat .github/workflows/update-pricing.yaml

echo ""
echo "=== justfile ==="
cat justfile

echo ""
echo "=== nix/models-dev-pricing.nix ==="
cat nix/models-dev-pricing.nix

echo ""
echo "=== rust/crates/ccusage/src/models-dev-pricing.json (first 50 lines) ==="
head -50 rust/crates/ccusage/src/models-dev-pricing.json 2>/dev/null || echo "File not found"

Length of output: 89893



@ryoppippi: The two confirmed commits look good.

Addressed ✅

  • embedded_models_dev_pricing() now .expect("embedded models-dev-pricing.json must parse") — fail-fast is correct.
  • OUTFILE guard in nix/models-dev-gen.ts now throws a typed, descriptive error.
  • claude-fable-5 assertion now carries a clear message — the intent is self-documenting.
  • needs: update-pricing keeping sequential execution is a valid resolution of the concurrent flake.lock concern I raised; the two jobs now can't race on the same main HEAD.

One remaining question — persist-credentials: false + bare git push

Both jobs still check out with persist-credentials: false and then run git push --force-with-lease origin "$branch" inside the "Create pull request" step (where GH_TOKEN is set as an env var). GH_TOKEN is consumed by the gh CLI for PR operations, but plain git push over HTTPS won't use it unless gh auth setup-git (or an equivalent credential helper configuration) has been called first.

If .github/actions/setup-nix calls gh auth setup-git or sets up a credential helper, this is fine — could you confirm? If it doesn't, an explicit step is needed:

- name: Configure git credentials
  run: gh auth setup-git
  env:
    GH_TOKEN: ${{ github.token }}

Remaining low-priority items (not blocking, can be addressed in follow-up)

  • SRI hashes in nix/models-dev-pricing.nix — still require manual sync when nix flake update models-dev bumps the input; no automation was added. The existing comment is sufficient for now, but a follow-up to automate hash extraction would remove the silent build-failure footgun.
  • Provider key comment in load_models_dev_json_missing — dropping the provider namespace is intentional (Anthropic-first, find() handles prefix variants via fuzzy matching) but undocumented. Minor nit.
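  • KEEP = /claude|anthropic/i: still matches substrings such as "not-anthropic". Anchoring with /^(claude|anthropic)/i would be more precise; /\b(claude|anthropic)\b/i would not help for that example, because the hyphen already counts as a word boundary. This is cosmetic either way.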

Overall the PR is in solid shape — the three-tier fallback design, test coverage, and automation are all clean. Just need to confirm the credential helper question before merging.
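
For readers skimming the thread, the three-tier fallback can be pictured roughly as below. This is a TypeScript sketch of the idea only: the real implementation is Rust in pricing.rs, applies fuzzy alias matching to the primary table, and is not reproduced here. All names, the "claude-known" entry, and the price values are made up for illustration.

```typescript
type Price = { input: number; output: number };
type PricingTable = Map<string, Price>;

interface PricingSources {
  primary: PricingTable;            // e.g. the LiteLLM-derived table
  networkModelsDev?: PricingTable;  // only available when online
  embeddedModelsDev: PricingTable;  // snapshot compiled into the binary
}

// Exact-match lookup through the tiers, in order; later tiers are consulted
// only when every earlier tier misses.
function findPrice(model: string, sources: PricingSources): Price | undefined {
  return (
    sources.primary.get(model) ??
    sources.networkModelsDev?.get(model) ??
    sources.embeddedModelsDev.get(model)
  );
}

// Offline case: no network table, so a model missing from the primary table
// is resolved from the embedded snapshot. Prices are placeholders.
const offline: PricingSources = {
  primary: new Map([["claude-known", { input: 1, output: 2 }]]),
  embeddedModelsDev: new Map([["claude-fable-5", { input: 3, output: 4 }]]),
};
console.log(findPrice("claude-fable-5", offline));
console.log(findPrice("unknown-model", offline));
```

Keeping the embedded snapshot as its own table, rather than merging it into the primary one, is what lets the primary table's matching behave exactly as before.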

@pullfrog pullfrog Bot left a comment

Contributor

✅ No new issues found.

Reviewed changes — a single follow-up commit makes the embedded snapshot parse failure a hard error and adds a descriptive assertion message.

  • Fail-fast on parse errors: embedded_models_dev_pricing() now calls .expect("...") on load_models_dev_json_missing so a corrupt committed snapshot halts immediately instead of being silently skipped.
  • Descriptive assertion: the claude-fable-5 resolution test uses assert!(..., "...") with a failure message instead of a bare assertion.


@cubic-dev-ai cubic-dev-ai Bot left a comment

Contributor

1 issue found across 10 files


Comment thread .github/workflows/update-pricing.yaml
@github-actions

github-actions Bot commented Jun 9, 2026

Contributor

ccusage performance comparison

PR SHA: e7be91e324b6
Base SHA: fae44a52f183

This compares the Rust PR release binary against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

| Package | SHA | Execution setup | Bunx temp cache | Bunx warm median | Warm samples |
| --- | --- | --- | --- | --- | --- |
| Base pkg.pr.new | fae44a52f183 | 708.1ms | 645.7ms | 31.5ms | 3 |
| PR pkg.pr.new | e7be91e | 1.245s | 901.7ms | 31.1ms | 3 |

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: fae44a52f183; PR package: e7be91e. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| bunx -p <pkg> ccusage claude --offline --json | 1.01 GiB | 550.3ms | 558.2ms | 0.99x | 325.95 MiB | 309.95 MiB | 0.95x | 1.83 GiB/s | 1.80 GiB/s |
| bunx -p <pkg> ccusage codex --offline --json | 1.01 GiB | 366.6ms | 370.2ms | 0.99x | 68.08 MiB | 77.20 MiB | 1.13x | 2.75 GiB/s | 2.72 GiB/s |

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 runs. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

| Command | Runtime | Input | Median | Throughput | Samples |
| --- | --- | --- | --- | --- | --- |
| claude --offline --json | Package wrapper | 1.01 GiB | 557.5ms | 1.81 GiB/s | 1 |
| claude --offline --json | Installed native binary | 1.01 GiB | 534.6ms | 1.88 GiB/s | 1 |
| codex --offline --json | Package wrapper | 1.01 GiB | 368.2ms | 2.73 GiB/s | 1 |
| codex --offline --json | Installed native binary | 1.01 GiB | 335.9ms | 3.00 GiB/s | 1 |

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| claude daily --offline --json | 0.00 MiB | 30.9ms | 4.4ms | 7.08x | 43.73 MiB | 2.70 MiB | 0.06x | 0.05 MiB/s | 0.35 MiB/s |
| claude session --offline --json | 0.00 MiB | 30.5ms | 4.3ms | 7.16x | 43.48 MiB | 2.70 MiB | 0.06x | 0.05 MiB/s | 0.36 MiB/s |
| codex daily --offline --json | 0.00 MiB | 29.6ms | 3.8ms | 7.72x | 43.54 MiB | 2.70 MiB | 0.06x | 0.03 MiB/s | 0.22 MiB/s |
| codex session --offline --json | 0.00 MiB | 30.5ms | 3.8ms | 8.04x | 43.61 MiB | 2.70 MiB | 0.06x | 0.03 MiB/s | 0.23 MiB/s |

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| claude --offline --json | 1.01 GiB | 563.2ms | 525.5ms | 1.07x | - | 328.33 MiB | - | 1.79 GiB/s | 1.92 GiB/s |
| codex --offline --json | 1.01 GiB | 367.8ms | 338.8ms | 1.09x | 81.70 MiB | 79.70 MiB | 0.98x | 2.74 GiB/s | 2.97 GiB/s |

Artifact size

| Artifact | Base | PR | Delta | Ratio |
| --- | --- | --- | --- | --- |
| packed ccusage-*.tgz | 17.30 KiB | 17.30 KiB | +0.00 KiB | 1.00x |
| installed native package binary | 3353.74 KiB | 3417.74 KiB | +64.00 KiB | 0.98x |

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.
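
The ratio columns in this report appear consistent with the arithmetic below. Both formulas are inferred from the reported values rather than taken from the benchmark script, so treat them as assumptions; speedRatio and rssRatio are hypothetical names.

```typescript
// Assumed: "PR vs base" = base median / PR median, so values above 1.00x mean the PR is faster.
const speedRatio = (baseMs: number, prMs: number): string =>
  `${(baseMs / prMs).toFixed(2)}x`;

// Assumed: "PR/base RSS" = PR peak RSS / base peak RSS, so values below 1.00x mean less memory.
const rssRatio = (baseMiB: number, prMiB: number): string =>
  `${(prMiB / baseMiB).toFixed(2)}x`;

// Values from the cached bunx "claude --offline --json" row of the first report.
console.log(speedRatio(550.3, 558.2)); // 0.99x
console.log(rssRatio(325.95, 309.95)); // 0.95x
```

Rows with very small medians (such as the 4.4ms committed-fixture runs) will not reproduce exactly from the printed figures, since the displayed medians are themselves rounded.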

@github-actions

github-actions Bot commented Jun 9, 2026

Contributor

ccusage performance comparison

PR SHA: e7be91e324b6
Base SHA: fae44a52f183

This compares the PR package against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

| Package | SHA | Execution setup | Bunx temp cache | Bunx warm median | Warm samples |
| --- | --- | --- | --- | --- | --- |
| Base pkg.pr.new | fae44a52f183 | 628.4ms | 543.3ms | 32.4ms | 3 |
| PR pkg.pr.new | e7be91e | 742.3ms | 513.0ms | 32.6ms | 3 |

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: fae44a52f183; PR package: e7be91e. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| bunx -p <pkg> ccusage claude --offline --json | 1.01 GiB | 573.0ms | 569.6ms | 1.01x | 323.83 MiB | 292.58 MiB | 0.90x | 1.76 GiB/s | 1.77 GiB/s |
| bunx -p <pkg> ccusage codex --offline --json | 1.01 GiB | 369.0ms | 374.9ms | 0.98x | 68.08 MiB | 79.83 MiB | 1.17x | 2.73 GiB/s | 2.69 GiB/s |

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 runs. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

| Command | Runtime | Input | Median | Throughput | Samples |
| --- | --- | --- | --- | --- | --- |
| claude --offline --json | Package wrapper | 1.01 GiB | 558.9ms | 1.80 GiB/s | 1 |
| claude --offline --json | Installed native binary | 1.01 GiB | 528.8ms | 1.90 GiB/s | 1 |
| codex --offline --json | Package wrapper | 1.01 GiB | 378.3ms | 2.66 GiB/s | 1 |
| codex --offline --json | Installed native binary | 1.01 GiB | 350.0ms | 2.88 GiB/s | 1 |

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| claude daily --offline --json | 0.00 MiB | 30.5ms | 30.1ms | 1.01x | 43.48 MiB | 43.61 MiB | 1.00x | 0.05 MiB/s | 0.05 MiB/s |
| claude session --offline --json | 0.00 MiB | 29.8ms | 30.1ms | 0.99x | 43.48 MiB | 43.61 MiB | 1.00x | 0.05 MiB/s | 0.05 MiB/s |
| codex daily --offline --json | 0.00 MiB | 30.3ms | 30.6ms | 0.99x | 43.61 MiB | 43.61 MiB | 1.00x | 0.03 MiB/s | 0.03 MiB/s |
| codex session --offline --json | 0.00 MiB | 30.3ms | 30.5ms | 0.99x | 43.48 MiB | 43.61 MiB | 1.00x | 0.03 MiB/s | 0.03 MiB/s |

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| claude --offline --json | 1.01 GiB | 584.2ms | 564.6ms | 1.03x | - | 313.45 MiB | - | 1.72 GiB/s | 1.78 GiB/s |
| codex --offline --json | 1.01 GiB | 372.1ms | 371.3ms | 1.00x | 82.20 MiB | 74.08 MiB | 0.90x | 2.71 GiB/s | 2.71 GiB/s |

Artifact size

| Artifact | Base | PR | Delta | Ratio |
| --- | --- | --- | --- | --- |
| packed ccusage-*.tgz | 17.30 KiB | 17.30 KiB | +0.00 KiB | 1.00x |
| installed native package binary | 3353.74 KiB | 3417.74 KiB | +64.00 KiB | 0.98x |

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

@github-actions

github-actions Bot commented Jun 9, 2026

Contributor

ccusage performance comparison

PR SHA: ed9958934534
Base SHA: fae44a52f183

This compares the PR package against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

| Package | SHA | Execution setup | Bunx temp cache | Bunx warm median | Warm samples |
| --- | --- | --- | --- | --- | --- |
| Base pkg.pr.new | fae44a52f183 | 552.6ms | 445.8ms | 31.7ms | 3 |
| PR pkg.pr.new | ed99589 | 1.735s | 726.9ms | 33.0ms | 3 |

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: fae44a52f183; PR package: ed99589. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| bunx -p <pkg> ccusage claude --offline --json | 1.01 GiB | 571.0ms | 560.8ms | 1.02x | 319.70 MiB | 316.45 MiB | 0.99x | 1.76 GiB/s | 1.80 GiB/s |
| bunx -p <pkg> ccusage codex --offline --json | 1.01 GiB | 370.8ms | 365.0ms | 1.02x | 73.95 MiB | 82.45 MiB | 1.11x | 2.72 GiB/s | 2.76 GiB/s |

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 runs. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

| Command | Runtime | Input | Median | Throughput | Samples |
| --- | --- | --- | --- | --- | --- |
| claude --offline --json | Package wrapper | 1.01 GiB | 550.7ms | 1.83 GiB/s | 1 |
| claude --offline --json | Installed native binary | 1.01 GiB | 526.5ms | 1.91 GiB/s | 1 |
| codex --offline --json | Package wrapper | 1.01 GiB | 364.2ms | 2.76 GiB/s | 1 |
| codex --offline --json | Installed native binary | 1.01 GiB | 341.7ms | 2.95 GiB/s | 1 |

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| claude daily --offline --json | 0.00 MiB | 29.0ms | 30.1ms | 0.96x | 43.73 MiB | 43.73 MiB | 1.00x | 0.05 MiB/s | 0.05 MiB/s |
| claude session --offline --json | 0.00 MiB | 30.1ms | 30.5ms | 0.99x | 43.48 MiB | - | - | 0.05 MiB/s | 0.05 MiB/s |
| codex daily --offline --json | 0.00 MiB | 29.3ms | 29.1ms | 1.01x | 43.61 MiB | 43.48 MiB | 1.00x | 0.03 MiB/s | 0.03 MiB/s |
| codex session --offline --json | 0.00 MiB | 29.2ms | 29.7ms | 0.99x | 43.61 MiB | 43.48 MiB | 1.00x | 0.03 MiB/s | 0.03 MiB/s |

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| claude --offline --json | 1.01 GiB | 566.9ms | 538.8ms | 1.05x | - | 338.58 MiB | - | 1.78 GiB/s | 1.87 GiB/s |
| codex --offline --json | 1.01 GiB | 362.7ms | 371.0ms | 0.98x | 72.20 MiB | 77.33 MiB | 1.07x | 2.78 GiB/s | 2.71 GiB/s |

Artifact size

| Artifact | Base | PR | Delta | Ratio |
| --- | --- | --- | --- | --- |
| packed ccusage-*.tgz | 17.30 KiB | 17.30 KiB | +0.00 KiB | 1.00x |
| installed native package binary | 3353.74 KiB | 3417.74 KiB | +64.00 KiB | 0.98x |

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

@github-actions

github-actions Bot commented Jun 9, 2026

Contributor

ccusage performance comparison

PR SHA: ed9958934534
Base SHA: fae44a52f183

This compares the Rust PR release binary against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

Package SHA Execution setup Bunx temp cache Bunx warm median Warm samples
Base pkg.pr.new fae44a52f183 1.091s 965.8ms 30.8ms 3
PR pkg.pr.new ed99589 1.130s 728.1ms 32.3ms 3

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: fae44a52f183; PR package: ed99589. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command Input Base median PR median PR vs base Base peak RSS PR peak RSS PR/base RSS Base throughput PR throughput
bunx -p <pkg> ccusage claude --offline --json 1.01 GiB 557.8ms 545.5ms 1.02x 315.20 MiB 351.33 MiB 1.11x 1.80 GiB/s 1.85 GiB/s
bunx -p <pkg> ccusage codex --offline --json 1.01 GiB 368.4ms 410.6ms 0.90x 82.08 MiB 78.08 MiB 0.95x 2.73 GiB/s 2.45 GiB/s

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 runs. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

Command Runtime Input Median Throughput Samples
claude --offline --json Package wrapper 1.01 GiB 544.8ms 1.85 GiB/s 1
claude --offline --json Installed native binary 1.01 GiB 519.0ms 1.94 GiB/s 1
codex --offline --json Package wrapper 1.01 GiB 374.4ms 2.69 GiB/s 1
codex --offline --json Installed native binary 1.01 GiB 342.5ms 2.94 GiB/s 1

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command Input Base median PR median PR vs base Base peak RSS PR peak RSS PR/base RSS Base throughput PR throughput
claude daily --offline --json 0.00 MiB 29.3ms 4.0ms 7.29x 43.73 MiB 2.70 MiB 0.06x 0.05 MiB/s 0.38 MiB/s
claude session --offline --json 0.00 MiB 29.3ms 3.9ms 7.43x 43.48 MiB 2.70 MiB 0.06x 0.05 MiB/s 0.39 MiB/s
codex daily --offline --json 0.00 MiB 28.6ms 3.7ms 7.77x 43.61 MiB 2.70 MiB 0.06x 0.03 MiB/s 0.23 MiB/s
codex session --offline --json 0.00 MiB 28.0ms 3.6ms 7.81x 43.61 MiB 2.70 MiB 0.06x 0.03 MiB/s 0.24 MiB/s

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command Input Base median PR median PR vs base Base peak RSS PR peak RSS PR/base RSS Base throughput PR throughput
claude --offline --json 1.01 GiB 576.8ms 510.2ms 1.13x 317.20 MiB 333.45 MiB 1.05x 1.75 GiB/s 1.97 GiB/s
codex --offline --json 1.01 GiB 364.0ms 328.9ms 1.11x 79.45 MiB 77.70 MiB 0.98x 2.77 GiB/s 3.06 GiB/s

Artifact size

Artifact Base PR Delta Ratio
packed ccusage-*.tgz 17.30 KiB 17.30 KiB +0.00 KiB 1.00x
installed native package binary 3353.74 KiB 3417.74 KiB +64.00 KiB 0.98x

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.
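
The ratio columns in these tables can be reproduced from the raw medians and peak RSS values. The following is a minimal sketch, assuming (as the reported numbers suggest) that "PR vs base" is the base median divided by the PR median and "PR/base RSS" is PR peak RSS divided by base peak RSS. The `Row` type and helper names are hypothetical and are not part of the ccusage benchmark tooling; the inputs are copied from the large-fixture rows above.

```typescript
// Hypothetical row shape; field names are illustrative, not taken from the benchmark scripts.
interface Row {
  command: string;
  baseMedianMs: number;
  prMedianMs: number;
  baseRssMiB: number;
  prRssMiB: number;
}

// Assumed convention: "PR vs base" = base median / PR median (above 1.00x means the PR is faster).
function speedup(row: Row): string {
  return `${(row.baseMedianMs / row.prMedianMs).toFixed(2)}x`;
}

// Assumed convention: "PR/base RSS" = PR peak RSS / base peak RSS (below 1.00x means less memory).
function rssRatio(row: Row): string {
  return `${(row.prRssMiB / row.baseRssMiB).toFixed(2)}x`;
}

// Values copied from the large real-world-shaped fixture table above.
const rows: Row[] = [
  { command: "claude --offline --json", baseMedianMs: 576.8, prMedianMs: 510.2, baseRssMiB: 317.2, prRssMiB: 333.45 },
  { command: "codex --offline --json", baseMedianMs: 364.0, prMedianMs: 328.9, baseRssMiB: 79.45, prRssMiB: 77.7 },
];

for (const row of rows) {
  console.log(`${row.command}: ${speedup(row)} speed, ${rssRatio(row)} RSS`);
}
```

Because the large-fixture rows use a single run with no warmups, a ratio that differs from 1.00x by only a few hundredths is probably within runner noise.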

@github-actions

github-actions Bot commented Jun 9, 2026

Contributor

ccusage performance comparison

PR SHA: 85c285b65899
Base SHA: fae44a52f183

This compares the PR package against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

Package SHA Execution setup Bunx temp cache Bunx warm median Warm samples
Base pkg.pr.new fae44a52f183 1.083s 738.6ms 32.6ms 3
PR pkg.pr.new 85c285b 668.8ms 797.8ms 30.4ms 3

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: fae44a52f183; PR package: 85c285b. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command Input Base median PR median PR vs base Base peak RSS PR peak RSS PR/base RSS Base throughput PR throughput
bunx -p <pkg> ccusage claude --offline --json 1.01 GiB 552.4ms 563.2ms 0.98x 336.83 MiB 323.45 MiB 0.96x 1.82 GiB/s 1.79 GiB/s
bunx -p <pkg> ccusage codex --offline --json 1.01 GiB 368.9ms 364.8ms 1.01x 70.83 MiB 72.33 MiB 1.02x 2.73 GiB/s 2.76 GiB/s

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 runs. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

Command Runtime Input Median Throughput Samples
claude --offline --json Package wrapper 1.01 GiB 550.4ms 1.83 GiB/s 1
claude --offline --json Installed native binary 1.01 GiB 520.2ms 1.94 GiB/s 1
codex --offline --json Package wrapper 1.01 GiB 359.1ms 2.80 GiB/s 1
codex --offline --json Installed native binary 1.01 GiB 335.2ms 3.00 GiB/s 1

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command Input Base median PR median PR vs base Base peak RSS PR peak RSS PR/base RSS Base throughput PR throughput
claude daily --offline --json 0.00 MiB 29.4ms 28.4ms 1.03x 43.48 MiB 43.48 MiB 1.00x 0.05 MiB/s 0.05 MiB/s
claude session --offline --json 0.00 MiB 29.0ms 28.5ms 1.02x 43.73 MiB 43.48 MiB 0.99x 0.05 MiB/s 0.05 MiB/s
codex daily --offline --json 0.00 MiB 28.2ms 28.7ms 0.99x 43.48 MiB 43.61 MiB 1.00x 0.03 MiB/s 0.03 MiB/s
codex session --offline --json 0.00 MiB 28.4ms 28.2ms 1.01x 43.48 MiB 43.48 MiB 1.00x 0.03 MiB/s 0.03 MiB/s

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command Input Base median PR median PR vs base Base peak RSS PR peak RSS PR/base RSS Base throughput PR throughput
claude --offline --json 1.01 GiB 554.7ms 553.8ms 1.00x 293.83 MiB 331.20 MiB 1.13x 1.82 GiB/s 1.82 GiB/s
codex --offline --json 1.01 GiB 362.8ms 365.3ms 0.99x 80.20 MiB 81.20 MiB 1.01x 2.78 GiB/s 2.76 GiB/s

Artifact size

Artifact Base PR Delta Ratio
packed ccusage-*.tgz 17.30 KiB 17.30 KiB +0.00 KiB 1.00x
installed native package binary 3353.74 KiB 3417.74 KiB +64.00 KiB 0.98x

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

@github-actions

github-actions Bot commented Jun 9, 2026

Contributor

ccusage performance comparison

PR SHA: 85c285b65899
Base SHA: fae44a52f183

This compares the Rust PR release binary against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

Package SHA Execution setup Bunx temp cache Bunx warm median Warm samples
Base pkg.pr.new fae44a52f183 1.685s 1.935s 30.3ms 3
PR pkg.pr.new 85c285b 1.063s 1.158s 30.8ms 3

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: fae44a52f183; PR package: 85c285b. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command Input Base median PR median PR vs base Base peak RSS PR peak RSS PR/base RSS Base throughput PR throughput
bunx -p <pkg> ccusage claude --offline --json 1.01 GiB 572.7ms 566.0ms 1.01x 294.70 MiB 318.08 MiB 1.08x 1.76 GiB/s 1.78 GiB/s
bunx -p <pkg> ccusage codex --offline --json 1.01 GiB 385.6ms 385.2ms 1.00x 68.70 MiB 80.70 MiB 1.17x 2.61 GiB/s 2.61 GiB/s

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 runs. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

Command Runtime Input Median Throughput Samples
claude --offline --json Package wrapper 1.01 GiB 584.0ms 1.72 GiB/s 1
claude --offline --json Installed native binary 1.01 GiB 553.1ms 1.82 GiB/s 1
codex --offline --json Package wrapper 1.01 GiB 379.9ms 2.65 GiB/s 1
codex --offline --json Installed native binary 1.01 GiB 359.1ms 2.80 GiB/s 1

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command Input Base median PR median PR vs base Base peak RSS PR peak RSS PR/base RSS Base throughput PR throughput
claude daily --offline --json 0.00 MiB 29.1ms 3.9ms 7.39x 43.61 MiB 2.70 MiB 0.06x 0.05 MiB/s 0.39 MiB/s
claude session --offline --json 0.00 MiB 28.6ms 3.9ms 7.35x 43.61 MiB 2.70 MiB 0.06x 0.05 MiB/s 0.40 MiB/s
codex daily --offline --json 0.00 MiB 28.9ms 3.7ms 7.89x 43.48 MiB 2.70 MiB 0.06x 0.03 MiB/s 0.23 MiB/s
codex session --offline --json 0.00 MiB 29.2ms 3.7ms 7.86x 43.61 MiB 2.70 MiB 0.06x 0.03 MiB/s 0.23 MiB/s

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command Input Base median PR median PR vs base Base peak RSS PR peak RSS PR/base RSS Base throughput PR throughput
claude --offline --json 1.01 GiB 554.2ms 512.4ms 1.08x 332.70 MiB 317.45 MiB 0.95x 1.82 GiB/s 1.96 GiB/s
codex --offline --json 1.01 GiB 365.8ms 339.4ms 1.08x 77.45 MiB 70.33 MiB 0.91x 2.75 GiB/s 2.97 GiB/s

Artifact size

Artifact Base PR Delta Ratio
packed ccusage-*.tgz 17.30 KiB 17.30 KiB +0.00 KiB 1.00x
installed native package binary 3353.74 KiB 3417.74 KiB +64.00 KiB 0.98x

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

Keep the models.dev pricing updater serialized after the LiteLLM updater to avoid concurrent flake.lock edits, but run it with always() so a LiteLLM refresh failure does not skip the independent models.dev refresh.
@ryoppippi

Member Author

@coderabbitai review
@cubic-dev-ai review

@cubic-dev-ai

cubic-dev-ai Bot commented Jun 9, 2026

Contributor

@coderabbitai review
@cubic-dev-ai review

@ryoppippi I have started the AI code review. It will take a few minutes to complete.

@coderabbitai

coderabbitai Bot commented Jun 9, 2026



🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== .github/workflows/update-pricing.yaml ==="
cat .github/workflows/update-pricing.yaml

echo ""
echo "=== .github/actions/setup-nix (if exists) ==="
find . -path './.github/actions/setup-nix*' -exec cat {} \;

echo ""
echo "=== rust/crates/ccusage/src/pricing.rs ==="
cat rust/crates/ccusage/src/pricing.rs

echo ""
echo "=== nix/models-dev-gen.ts ==="
cat nix/models-dev-gen.ts

echo ""
echo "=== nix/models-dev-pricing.nix ==="
cat nix/models-dev-pricing.nix

echo ""
echo "=== justfile (update-models-dev-pricing recipe) ==="
cat justfile

Length of output: 92106



@ryoppippi: Thanks for the follow-ups. After re-reading the current state of the files, everything from the previous rounds is addressed except one item that still needs a fix.


🔴 Unresolved — git push will fail without a credential helper

Both jobs check out with persist-credentials: false, which removes the default GITHUB_TOKEN credential helper. Inside the "Create pull request" step, GH_TOKEN is set as an environment variable — but that env var is consumed by the gh CLI, not by git itself. A plain git push --force-with-lease origin "$branch" over HTTPS has no credential source and will fail with a 403/auth error at runtime.

Looking at .github/actions/setup-nix/action.yml, it does not call gh auth setup-git, so there's no credential helper set up anywhere in the chain.

The fix is a one-liner in the "Create pull request" step (or a dedicated step before it):

- name: Create pull request
  env:
    GH_TOKEN: ${{ github.token }}
  run: |
    # Configure git to authenticate using the gh credential helper
    gh auth setup-git

    git config user.name "github-actions[bot]"
    git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
    ...

This needs to be applied to both update-pricing and update-models-dev-pricing jobs.


✅ Confirmed addressed from prior rounds

  • embedded_models_dev_pricing().expect("…must parse")
  • OUTFILE guard in nix/models-dev-gen.ts
  • claude-fable-5 assertion with clear message
  • needs: update-pricing + if: ${{ always() }} serialization

🟡 Remaining low-priority nits (not blocking)

  • SRI hashes in nix/models-dev-pricing.nix — still require manual sync when nix flake update models-dev bumps the input; consider a follow-up to automate hash extraction or add a CI guard.
  • load_models_dev_json_missing drops provider namespace — intentional for Anthropic scope, but a one-line comment confirming this would help future maintainers.
  • KEEP = /claude|anthropic/i — still substring-matches; \b(claude|anthropic)\b would be more precise (cosmetic).
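
To illustrate the last nit, here is a minimal sketch comparing the quoted substring pattern with the suggested word-boundary variant. Only `claude-fable-5` comes from this PR; the other ids and the constant names `KEEP_SUBSTRING` and `KEEP_WORD` are hypothetical and chosen to show where the two patterns disagree.

```typescript
// Pattern quoted in the review, and the suggested word-boundary variant.
const KEEP_SUBSTRING = /claude|anthropic/i;
const KEEP_WORD = /\b(claude|anthropic)\b/i;

// Hypothetical ids (apart from claude-fable-5) used only for comparison.
const ids = [
  "claude-fable-5",
  "anthropic/claude-fable-5",
  "notclaude-1",
  "claude3-mini",
];

// Neither regex has the `g` flag, so repeated .test() calls carry no state between ids.
const comparison = ids.map((id) => ({
  id,
  substring: KEEP_SUBSTRING.test(id),
  word: KEEP_WORD.test(id),
}));

for (const row of comparison) {
  console.log(`${row.id}: substring=${row.substring} word=${row.word}`);
}
```

One trade-off to keep in mind: `\b` treats digits and underscores as word characters, so an id such as the hypothetical `claude3-mini` would be dropped by the stricter pattern even though the substring pattern keeps it.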

@cubic-dev-ai cubic-dev-ai Bot left a comment

Contributor


1 issue found across 10 files


Comment thread .github/workflows/update-pricing.yaml

@pullfrog pullfrog Bot left a comment

Contributor


✅ No new issues found.

Reviewed changes — the single follow-up commit adds if: ${{ always() }} to the update-models-dev-pricing job so a LiteLLM refresh failure does not silently skip the models.dev snapshot refresh.

  • CI resilience: update-models-dev-pricing now uses if: ${{ always() }} alongside needs: update-pricing, preserving serialized execution order while preventing a failed LiteLLM update from blocking the independent models.dev refresh.
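
The interaction between needs and always() can be sketched as a toy model. This is not GitHub Actions code: the types and the `dependentJobRuns` function are hypothetical, and the model only encodes the documented behaviour that a job with `needs` is skipped by default unless every needed job succeeded, while `always()` evaluates to true regardless of upstream results.

```typescript
// Possible conclusions of an upstream job (simplified).
type Result = "success" | "failure" | "cancelled" | "skipped";

// "default" models a job with no explicit `if`; "always" models `if: ${{ always() }}`.
type Condition = "default" | "always";

// Hypothetical helper: decides whether a dependent job runs once all needed jobs have finished.
function dependentJobRuns(needs: Result[], condition: Condition): boolean {
  if (condition === "always") {
    // always() is true no matter how the needed jobs ended.
    return true;
  }
  // Without an explicit condition, every needed job must have succeeded.
  return needs.every((result) => result === "success");
}

// update-models-dev-pricing needs update-pricing (the LiteLLM refresh).
const liteLlmFailed: Result[] = ["failure"];

console.log(dependentJobRuns(liteLlmFailed, "default")); // models.dev refresh would be skipped
console.log(dependentJobRuns(liteLlmFailed, "always")); // models.dev refresh still runs
```

In both cases the dependent job starts only after the needed job has finished, which is why the two updaters stay serialized and do not edit flake.lock concurrently.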


@github-actions

github-actions Bot commented Jun 9, 2026

Contributor

ccusage performance comparison

PR SHA: ab1eb978574c
Base SHA: fae44a52f183

This compares the PR package against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

Package SHA Execution setup Bunx temp cache Bunx warm median Warm samples
Base pkg.pr.new fae44a52f183 599.1ms 502.5ms 30.3ms 3
PR pkg.pr.new ab1eb97 636.8ms 734.5ms 30.2ms 3

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: fae44a52f183; PR package: ab1eb97. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command Input Base median PR median PR vs base Base peak RSS PR peak RSS PR/base RSS Base throughput PR throughput
bunx -p <pkg> ccusage claude --offline --json 1.01 GiB 576.5ms 636.3ms 0.91x 342.70 MiB 359.83 MiB 1.05x 1.75 GiB/s 1.58 GiB/s
bunx -p <pkg> ccusage codex --offline --json 1.01 GiB 367.3ms 363.4ms 1.01x 75.70 MiB 76.70 MiB 1.01x 2.74 GiB/s 2.77 GiB/s

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 runs. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

Command Runtime Input Median Throughput Samples
claude --offline --json Package wrapper 1.01 GiB 554.1ms 1.82 GiB/s 1
claude --offline --json Installed native binary 1.01 GiB 523.2ms 1.92 GiB/s 1
codex --offline --json Package wrapper 1.01 GiB 355.7ms 2.83 GiB/s 1
codex --offline --json Installed native binary 1.01 GiB 338.4ms 2.98 GiB/s 1

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command Input Base median PR median PR vs base Base peak RSS PR peak RSS PR/base RSS Base throughput PR throughput
claude daily --offline --json 0.00 MiB 28.1ms 28.0ms 1.00x 43.73 MiB 43.48 MiB 0.99x 0.05 MiB/s 0.06 MiB/s
claude session --offline --json 0.00 MiB 28.2ms 28.4ms 0.99x 43.48 MiB 43.48 MiB 1.00x 0.05 MiB/s 0.05 MiB/s
codex daily --offline --json 0.00 MiB 29.2ms 28.8ms 1.01x 43.48 MiB - - 0.03 MiB/s 0.03 MiB/s
codex session --offline --json 0.00 MiB 28.4ms 28.7ms 0.99x 43.48 MiB 43.48 MiB 1.00x 0.03 MiB/s 0.03 MiB/s

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command Input Base median PR median PR vs base Base peak RSS PR peak RSS PR/base RSS Base throughput PR throughput
claude --offline --json 1.01 GiB 570.4ms 549.6ms 1.04x 318.70 MiB 320.70 MiB 1.01x 1.77 GiB/s 1.83 GiB/s
codex --offline --json 1.01 GiB 356.2ms 360.5ms 0.99x 82.20 MiB 72.45 MiB 0.88x 2.83 GiB/s 2.79 GiB/s

Artifact size

Artifact Base PR Delta Ratio
packed ccusage-*.tgz 17.30 KiB 17.30 KiB +0.00 KiB 1.00x
installed native package binary 3353.74 KiB 3417.74 KiB +64.00 KiB 0.98x

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

@github-actions

github-actions Bot commented Jun 9, 2026

Contributor

ccusage performance comparison

PR SHA: ab1eb978574c
Base SHA: fae44a52f183

This compares the Rust PR release binary against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

Package SHA Execution setup Bunx temp cache Bunx warm median Warm samples
Base pkg.pr.new fae44a52f183 406.1ms 747.8ms 33.1ms 3
PR pkg.pr.new ab1eb97 601.4ms 609.0ms 32.6ms 3

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: fae44a52f183; PR package: ab1eb97. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command Input Base median PR median PR vs base Base peak RSS PR peak RSS PR/base RSS Base throughput PR throughput
bunx -p <pkg> ccusage claude --offline --json 1.01 GiB 573.6ms 568.1ms 1.01x 327.08 MiB 309.20 MiB 0.95x 1.76 GiB/s 1.77 GiB/s
bunx -p <pkg> ccusage codex --offline --json 1.01 GiB 374.9ms 424.3ms 0.88x 66.83 MiB 77.45 MiB 1.16x 2.69 GiB/s 2.37 GiB/s

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 runs. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

Command Runtime Input Median Throughput Samples
claude --offline --json Package wrapper 1.01 GiB 575.4ms 1.75 GiB/s 1
claude --offline --json Installed native binary 1.01 GiB 550.2ms 1.83 GiB/s 1
codex --offline --json Package wrapper 1.01 GiB 377.5ms 2.67 GiB/s 1
codex --offline --json Installed native binary 1.01 GiB 350.7ms 2.87 GiB/s 1

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command Input Base median PR median PR vs base Base peak RSS PR peak RSS PR/base RSS Base throughput PR throughput
claude daily --offline --json 0.00 MiB 30.4ms 4.2ms 7.21x 43.61 MiB 2.70 MiB 0.06x 0.05 MiB/s 0.37 MiB/s
claude session --offline --json 0.00 MiB 30.2ms 4.2ms 7.16x 43.73 MiB 2.70 MiB 0.06x 0.05 MiB/s 0.37 MiB/s
codex daily --offline --json 0.00 MiB 30.4ms 3.9ms 7.82x 43.42 MiB 2.70 MiB 0.06x 0.03 MiB/s 0.22 MiB/s
codex session --offline --json 0.00 MiB 30.4ms 3.9ms 7.73x 43.61 MiB 2.70 MiB 0.06x 0.03 MiB/s 0.22 MiB/s

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 0 warmups and 1 runs.
Peak RSS is measured separately with /usr/bin/time using 1 runs. Lower RSS ratios are better.

Command Input Base median PR median PR vs base Base peak RSS PR peak RSS PR/base RSS Base throughput PR throughput
claude --offline --json 1.01 GiB 563.7ms 536.0ms 1.05x 330.58 MiB 325.95 MiB 0.99x 1.79 GiB/s 1.88 GiB/s
codex --offline --json 1.01 GiB 381.4ms 341.6ms 1.12x 72.20 MiB 81.33 MiB 1.13x 2.64 GiB/s 2.95 GiB/s

Artifact size

Artifact Base PR Delta Ratio
packed ccusage-*.tgz 17.30 KiB 17.30 KiB +0.00 KiB 1.00x
installed native package binary 3353.74 KiB 3417.74 KiB +64.00 KiB 0.98x

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

Keep checkout credential persistence disabled in the pricing update workflow while restoring an explicit authentication path for later git push commands.

Both pricing automation jobs already receive GH_TOKEN for gh CLI operations. Running gh auth setup-git configures Git to use that token-backed auth before pushing the automation branches, so the workflow no longer depends on credentials persisted by actions/checkout.
@ryoppippi

Member Author

@coderabbitai review
@cubic-dev-ai review

@cubic-dev-ai

cubic-dev-ai Bot commented Jun 9, 2026

Contributor

@coderabbitai review
@cubic-dev-ai review

@ryoppippi I have started the AI code review. It will take a few minutes to complete.

@coderabbitai

coderabbitai Bot commented Jun 9, 2026



🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
echo "=== .github/workflows/update-pricing.yaml ==="
cat .github/workflows/update-pricing.yaml

echo ""
echo "=== .github/actions/setup-nix/action.yml (if exists) ==="
find . -path './.github/actions/setup-nix*' | xargs cat 2>/dev/null || echo "Not found"

echo ""
echo "=== git log --oneline -8 ==="
git log --oneline -8

Length of output: 9347



@ryoppippi: Confirmed — commit 70dca9e resolves the last blocking issue. Both jobs now call gh auth setup-git (with GH_TOKEN in env) before git push --force-with-lease, which correctly re-establishes the credential helper after persist-credentials: false. ✓

All blocking issues are now addressed:

| Item | Commit | Status |
| --- | --- | --- |
| embedded_models_dev_pricing() fail-fast .expect() | 85c285b | |
| OUTFILE guard in nix/models-dev-gen.ts | ed99589 | |
| claude-fable-5 assertion message | 85c285b | |
| needs: update-pricing + if: always() | ab1eb97 | |
| gh auth setup-git before git push | 70dca9e | |

The three remaining low-priority nits (SRI hash automation in nix/models-dev-pricing.nix, a comment on provider-key dropping in load_models_dev_json_missing, and tightening KEEP = /claude|anthropic/i) are all non-blocking and fine to handle in follow-up PRs. The PR is in solid shape — good to merge.
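The fail-fast item and the fallback lookup mentioned in this thread can be illustrated with a minimal sketch. It is not the code from pricing.rs: the snapshot format (one `model=price` pair per line), the model name, the price, and the names `EMBEDDED_SNAPSHOT`, `parse_snapshot`, `embedded_pricing`, and `lookup_price` are all hypothetical stand-ins, and the real crate embeds JSON via include_str!. The sketch only shows the shape of the idea: parse the embedded data once, panic loudly if the committed data is malformed, and consult it only when the primary table misses.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

// Hypothetical stand-in for the committed snapshot. The PR embeds a JSON file
// with include_str!; a tiny inline string keeps this sketch dependency-free.
const EMBEDDED_SNAPSHOT: &str = "hypothetical-new-model=1.0\n";

// Parse `model=price` lines, returning an error on the first malformed entry.
fn parse_snapshot(raw: &str) -> Result<HashMap<String, f64>, String> {
    raw.lines()
        .filter(|line| !line.trim().is_empty())
        .map(|line| -> Result<(String, f64), String> {
            let (model, price) = line
                .split_once('=')
                .ok_or_else(|| format!("malformed line: {line}"))?;
            let price: f64 = price
                .trim()
                .parse()
                .map_err(|e| format!("bad price for {model}: {e}"))?;
            Ok((model.trim().to_string(), price))
        })
        .collect()
}

// Fail fast: embedded data is fixed at build time, so a parse failure is a
// packaging bug and should panic rather than silently yield an empty map.
fn embedded_pricing() -> &'static HashMap<String, f64> {
    static CACHE: OnceLock<HashMap<String, f64>> = OnceLock::new();
    CACHE.get_or_init(|| {
        parse_snapshot(EMBEDDED_SNAPSHOT).expect("embedded pricing snapshot must parse")
    })
}

// The embedded map is consulted only when the primary table has no entry.
fn lookup_price(primary: &HashMap<String, f64>, model: &str) -> Option<f64> {
    primary
        .get(model)
        .or_else(|| embedded_pricing().get(model))
        .copied()
}
```

Calling `lookup_price(&primary, "hypothetical-new-model")` with a primary table that lacks that key falls through to the embedded map, while any key present in the primary table is answered from there without touching the fallback.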

(ᵔ◡ᵔ)🐇

@cubic-dev-ai cubic-dev-ai Bot left a comment

Contributor


No issues found across 10 files


@github-actions

github-actions Bot commented Jun 9, 2026

Contributor

ccusage performance comparison

PR SHA: 70dca9e6718f
Base SHA: fae44a52f183

This compares the Rust PR release binary against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

| Package | SHA | Execution setup | Bunx temp cache | Bunx warm median | Warm samples |
| --- | --- | --- | --- | --- | --- |
| Base pkg.pr.new | fae44a52f183 | 558.1ms | 628.3ms | 30.3ms | 3 |
| PR pkg.pr.new | 70dca9e | 1.098s | 841.2ms | 30.3ms | 3 |

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: fae44a52f183; PR package: 70dca9e. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| bunx -p <pkg> ccusage claude --offline --json | 1.01 GiB | 547.6ms | 570.7ms | 0.96x | 324.83 MiB | 323.95 MiB | 1.00x | 1.84 GiB/s | 1.76 GiB/s |
| bunx -p <pkg> ccusage codex --offline --json | 1.01 GiB | 372.4ms | 370.8ms | 1.00x | 85.83 MiB | 73.20 MiB | 0.85x | 2.70 GiB/s | 2.72 GiB/s |

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 run. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

| Command | Runtime | Input | Median | Throughput | Samples |
| --- | --- | --- | --- | --- | --- |
| claude --offline --json | Package wrapper | 1.01 GiB | 557.1ms | 1.81 GiB/s | 1 |
| claude --offline --json | Installed native binary | 1.01 GiB | 532.0ms | 1.89 GiB/s | 1 |
| codex --offline --json | Package wrapper | 1.01 GiB | 376.3ms | 2.68 GiB/s | 1 |
| codex --offline --json | Installed native binary | 1.01 GiB | 338.6ms | 2.97 GiB/s | 1 |

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 file)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| claude daily --offline --json | 0.00 MiB | 29.0ms | 3.9ms | 7.43x | 43.48 MiB | 2.70 MiB | 0.06x | 0.05 MiB/s | 0.40 MiB/s |
| claude session --offline --json | 0.00 MiB | 28.4ms | 3.9ms | 7.23x | 43.61 MiB | 2.70 MiB | 0.06x | 0.05 MiB/s | 0.39 MiB/s |
| codex daily --offline --json | 0.00 MiB | 28.0ms | 3.8ms | 7.30x | 43.73 MiB | 2.70 MiB | 0.06x | 0.03 MiB/s | 0.22 MiB/s |
| codex session --offline --json | 0.00 MiB | 28.5ms | 3.7ms | 7.78x | 43.48 MiB | 2.70 MiB | 0.06x | 0.03 MiB/s | 0.23 MiB/s |

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base runs the published ccusage package from pkg.pr.new, installed before measurement; PR runs rust/target/release/ccusage directly. Both run --offline --json, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| claude --offline --json | 1.01 GiB | 551.1ms | 530.5ms | 1.04x | 295.70 MiB | 330.45 MiB | 1.12x | 1.83 GiB/s | 1.90 GiB/s |
| codex --offline --json | 1.01 GiB | 363.7ms | 333.8ms | 1.09x | 81.33 MiB | 77.45 MiB | 0.95x | 2.77 GiB/s | 3.02 GiB/s |

Artifact size

| Artifact | Base | PR | Delta | Ratio |
| --- | --- | --- | --- | --- |
| packed ccusage-*.tgz | 17.30 KiB | 17.30 KiB | +0.00 KiB | 1.00x |
| installed native package binary | 3353.74 KiB | 3417.74 KiB | +64.00 KiB | 0.98x |

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.
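The ratio columns in these reports can be reproduced from the other columns. The sketch below assumes, based only on the numbers shown, that "PR vs base" and the artifact "Ratio" are base divided by PR (so values above 1.00x favour the PR), and that "PR/base RSS" is PR divided by base (so lower is better). The function names are illustrative and do not come from the benchmark tooling.

```rust
// Speed-style ratio: base over PR, so values above 1.00x mean the PR is
// faster (for medians) or smaller (for artifact sizes).
fn base_over_pr(base: f64, pr: f64) -> f64 {
    base / pr
}

// Memory-style ratio: PR over base, so values below 1.00x mean the PR
// used less peak RSS than the base.
fn pr_over_base(base: f64, pr: f64) -> f64 {
    pr / base
}

// Format a ratio the way the report prints it: two decimals plus "x".
fn as_ratio(value: f64) -> String {
    format!("{value:.2}x")
}
```

For the large-fixture claude row, `as_ratio(base_over_pr(551.1, 530.5))` gives 1.04x and `as_ratio(pr_over_base(295.70, 330.45))` gives 1.12x, matching the report. Because the displayed medians are already rounded, recomputing from them can differ from the report in the last digit for very small timings such as the committed-fixture rows.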

@github-actions

github-actions Bot commented Jun 9, 2026

Contributor

ccusage performance comparison

PR SHA: 70dca9e6718f
Base SHA: fae44a52f183

This compares the PR package against the configured base package on the same CI runner.

Package runner startup

Execution setup measures any pre-benchmark package materialization used by the execution benchmark. Bunx temp cache measures one bunx -p <url> ccusage --version run with an empty Bun install cache. Warm reuses that cache and reports the median of repeated runs.

| Package | SHA | Execution setup | Bunx temp cache | Bunx warm median | Warm samples |
| --- | --- | --- | --- | --- | --- |
| Base pkg.pr.new | fae44a52f183 | 669.7ms | 664.9ms | 33.4ms | 3 |
| PR pkg.pr.new | 70dca9e | 766.5ms | 544.6ms | 32.7ms | 3 |

Cached bunx execution performance

Runs the same large fixture through bunx -p <pkg.pr.new URL> ccusage after the Bun install cache has already been populated by the startup measurement. This separates cached package-runner execution from first-fetch package materialization.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base package: fae44a52f183; PR package: 70dca9e. Both run through bunx -p <pkg.pr.new URL> ccusage using the warmed Bun install cache from package runner startup, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| bunx -p <pkg> ccusage claude --offline --json | 1.01 GiB | 583.1ms | 570.2ms | 1.02x | 324.20 MiB | 324.70 MiB | 1.00x | 1.73 GiB/s | 1.77 GiB/s |
| bunx -p <pkg> ccusage codex --offline --json | 1.01 GiB | 374.5ms | 382.9ms | 0.98x | 79.58 MiB | 82.45 MiB | 1.04x | 2.69 GiB/s | 2.63 GiB/s |

Package runtime diagnostics

Compares the PR package wrapper, the installed native optional dependency binary, and the workspace release binary on the same large fixture. This identifies whether slow package results come from JavaScript wrapper overhead, the published native binary build, or the Rust core itself.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
All rows run --offline --json, measured by hyperfine with 0 warmups and 1 run. This isolates wrapper overhead from the installed native optional dependency and the workspace release binary built on the runner.

| Command | Runtime | Input | Median | Throughput | Samples |
| --- | --- | --- | --- | --- | --- |
| claude --offline --json | Package wrapper | 1.01 GiB | 587.5ms | 1.71 GiB/s | 1 |
| claude --offline --json | Installed native binary | 1.01 GiB | 541.2ms | 1.86 GiB/s | 1 |
| codex --offline --json | Package wrapper | 1.01 GiB | 371.0ms | 2.71 GiB/s | 1 |
| codex --offline --json | Installed native binary | 1.01 GiB | 340.6ms | 2.96 GiB/s | 1 |

Committed fixture performance

Committed small fixtures for stable PR-to-PR feedback and explicit Claude/Codex command coverage.

Fixtures: Claude apps/ccusage/test/fixtures/claude (0.00 MiB, 2 files), Codex apps/ccusage/test/fixtures/codex (0.00 MiB, 1 file)
Base and PR both run the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 2 warmups and 7 runs.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| claude daily --offline --json | 0.00 MiB | 31.2ms | 31.1ms | 1.00x | - | 43.61 MiB | - | 0.05 MiB/s | 0.05 MiB/s |
| claude session --offline --json | 0.00 MiB | 30.7ms | 30.6ms | 1.00x | 43.61 MiB | 43.61 MiB | 1.00x | 0.05 MiB/s | 0.05 MiB/s |
| codex daily --offline --json | 0.00 MiB | 30.5ms | 30.0ms | 1.02x | 43.61 MiB | 43.48 MiB | 1.00x | 0.03 MiB/s | 0.03 MiB/s |
| codex session --offline --json | 0.00 MiB | 30.4ms | 30.1ms | 1.01x | 43.48 MiB | - | - | 0.03 MiB/s | 0.03 MiB/s |

Large real-world-shaped fixture performance

Generated fixtures shaped from aggregate local log statistics: thousands of JSONL files, many small sessions, and a long tail of larger sessions. No real prompts, paths, or outputs are stored in the fixtures.

Fixtures: Claude /home/runner/work/_temp/ccusage-large-fixture (1.01 GiB, 2,597 files), Codex /home/runner/work/_temp/ccusage-large-codex-fixture (1.01 GiB, 2,597 files)
Base and PR both run the published ccusage package from pkg.pr.new, installed before measurement. Both run --offline --json, measured by hyperfine with 0 warmups and 1 run.
Peak RSS is measured separately with /usr/bin/time using 1 run. Lower RSS ratios are better.

| Command | Input | Base median | PR median | PR vs base | Base peak RSS | PR peak RSS | PR/base RSS | Base throughput | PR throughput |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| claude --offline --json | 1.01 GiB | 565.5ms | 561.5ms | 1.01x | 326.70 MiB | - | - | 1.78 GiB/s | 1.79 GiB/s |
| codex --offline --json | 1.01 GiB | 417.1ms | 369.8ms | 1.13x | 81.83 MiB | 81.70 MiB | 1.00x | 2.41 GiB/s | 2.72 GiB/s |

Artifact size

| Artifact | Base | PR | Delta | Ratio |
| --- | --- | --- | --- | --- |
| packed ccusage-*.tgz | 17.30 KiB | 17.30 KiB | +0.00 KiB | 1.00x |
| installed native package binary | 3353.74 KiB | 3417.74 KiB | +64.00 KiB | 0.98x |

Lower medians and smaller artifacts are better. CI runner noise still applies; use same-run ratios as directional PR feedback, not release guarantees.

@ryoppippi ryoppippi merged commit 15fd42a into main Jun 9, 2026
40 checks passed
@ryoppippi ryoppippi deleted the feat/embed-models-dev-pricing branch June 9, 2026 23:04