test: add summary-cache pytest-benchmark suite with CI regression gate#120
Conversation
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
scripts/check_benchmark_regression.py (1)
95-99: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick winA missing benchmark result silently passes the gate.
When a baseline benchmark has no current result, the script only prints a
WARNand continues. If a gated benchmark is renamed, removed, or fails to run, the regression gate will report success and the performance coverage silently disappears. Consider treating a missing gated benchmark as a failure (or a separate exit code) so coverage gaps are surfaced.♻️ Possible approach
failures: list[str] = [] + missing: list[str] = [] for name, base in baseline_means.items(): cur = flat.get(name) if cur is None: - print(f"WARN: no current result for baseline {name!r}; skipping") + print(f"MISSING: no current result for baseline {name!r}") + missing.append(name) continueThen fail (or warn-as-error) when
missingis non-empty, per your policy.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/check_benchmark_regression.py` around lines 95 - 99, The benchmark gating in check_benchmark_regression.py currently skips missing current results in the baseline loop, which can let renamed, removed, or failing benchmarks pass unnoticed. Update the logic around the baseline_means/flat comparison so missing gated benchmarks are collected and treated as a failure (or a distinct non-zero exit condition) instead of only printing a warning. Make the behavior explicit in the final exit path of the script so coverage gaps are surfaced alongside the existing regression checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@benchmarks/baselines.json`:
- Around line 1-15: The benchmark baselines in the summary-cache group are still
using Windows-derived values with 1.5x slack, but the regression gate runs on
ubuntu-latest. Refresh the values from a green ubuntu-latest benchmark artifact
and keep the _note consistent with the source platform; if platform-matched
baselines are not yet available, make the gate non-blocking until they are.
Focus on the baselines.json entries under groups.summary-cache so the
check_benchmark_regression.py comparisons are against the same OS.
In `@pyproject.toml`:
- Around line 39-44: The pytest configuration currently applies --benchmark-skip
globally in [tool.pytest.ini_options], which breaks the regular integration
pytest run because pytest-benchmark is not installed there. Update the test
setup so the benchmark plugin is available wherever this addopts setting is
used, or move the --benchmark-skip option to only the benchmark-specific job;
check the pytest configuration and the workflows that invoke python -m pytest,
especially the benchmark-related markers and options.
---
Nitpick comments:
In `@scripts/check_benchmark_regression.py`:
- Around line 95-99: The benchmark gating in check_benchmark_regression.py
currently skips missing current results in the baseline loop, which can let
renamed, removed, or failing benchmarks pass unnoticed. Update the logic around
the baseline_means/flat comparison so missing gated benchmarks are collected and
treated as a failure (or a distinct non-zero exit condition) instead of only
printing a warning. Make the behavior explicit in the final exit path of the
script so coverage gaps are surfaced alongside the existing regression checks.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 1cc8514e-6127-42e8-95a6-134382f2ed25
📒 Files selected for processing (7)
.github/workflows/tests.yml.gitignorebenchmarks/baselines.jsonpyproject.tomlscripts/check_benchmark_regression.pytests/benchmarks/conftest.pytests/benchmarks/test_summary_cache_bench.py
|
@CodeRabbit, review this PR in more depth |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
echo "=== .github/workflows/tests.yml ==="
cat -n .github/workflows/tests.yml
echo ""
echo "=== pyproject.toml ==="
cat -n pyproject.toml
echo ""
echo "=== scripts/check_benchmark_regression.py ==="
cat -n scripts/check_benchmark_regression.py
echo ""
echo "=== tests/benchmarks/conftest.py ==="
cat -n tests/benchmarks/conftest.py
echo ""
echo "=== tests/benchmarks/test_summary_cache_bench.py ==="
cat -n tests/benchmarks/test_summary_cache_bench.py
echo ""
echo "=== benchmarks/baselines.json ==="
cat -n benchmarks/baselines.json
echo ""
echo "=== .gitignore (last 20 lines) ==="
tail -20 .gitignore
echo ""
echo "=== requirements-lock.txt ==="
cat -n requirements-lock.txt
echo ""
echo "=== services/summary_cache.py ==="
cat -n services/summary_cache.py
echo ""
echo "=== Directory tree of tests/benchmarks/ ==="
tree tests/benchmarks/ 2>/dev/null || find tests/benchmarks/ -type f
echo ""
echo "=== Directory tree of scripts/ ==="
tree scripts/ 2>/dev/null || find scripts/ -type f
echo ""
echo "=== benchmarks/ directory ==="
tree benchmarks/ 2>/dev/null || find benchmarks/ -type fLength of output: 44058
Deep Review: PR
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/check_benchmark_regression.py`:
- Around line 17-19: The normalize_benchmark_name helper currently uses
rsplit("::", 1), which can incorrectly strip parts of parametrized benchmark
names when parameter values include "::". Update normalize_benchmark_name to
split on the first "::" only so pytest node prefixes are removed without
truncating names like test_x[param::v], and keep the behavior consistent for
both short and full benchmark names.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c745c7f7-791c-4802-9b4a-d7dc19639498
📒 Files selected for processing (8)
.github/workflows/tests.ymlbenchmarks/baselines.jsonpyproject.tomlscripts/check_benchmark_regression.pytests/benchmarks/__init__.pytests/benchmarks/conftest.pytests/benchmarks/test_summary_cache_bench.pytests/test_check_benchmark_regression.py
✅ Files skipped from review due to trivial changes (1)
- benchmarks/baselines.json
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/benchmarks/test_summary_cache_bench.py
- .github/workflows/tests.yml
wpak-ai
left a comment
There was a problem hiding this comment.
Update the docstrings and you're ready to go.

Closes #115
Summary
Adds a
pytest-benchmarksuite forservices/summary_cache.pywith synthetic fixtures and a CI regression gate. No production code changes.tests/benchmarks/test_summary_cache_bench.pymeasures cache hit, cache miss, fingerprint cost (10/50/200 workspace entries), and round-trip write+read latencytests/benchmarks/conftest.pybuilds synthetic workspace trees and isolates cache files to temp dirs (no real Cursor storage)benchmarks/baselines.jsonrecords initial gated means (1.5× slack from local reference run; refresh from ubuntu-latest CI artifact after first green job)scripts/check_benchmark_regression.pyfails CI when any benchmark mean exceeds baseline by >20%pyproject.toml: addpytest-benchmarkdev dep; default--benchmark-skipso normalpyteststays fast.github/workflows/tests.yml: newbenchmarksjob on ubuntu-latestSprint item #7 (June Week 4, Wednesday). Sequences before Thursday #8 (unified perf benchmark suite).
Test plan
pytest tests/benchmarks/test_summary_cache_bench.py --benchmark-only -o addopts=python scripts/check_benchmark_regression.py benchmark-results.json benchmarks/baselines.jsonpytest tests/benchmarks/test_summary_cache_bench.py -o addopts=(benchmarks skipped via--benchmark-skip)python -m unittest discover tests -qbenchmarksCI job is green on ubuntu-latestSummary by CodeRabbit
pytest-benchmarkin dev tooling and configured pytest benchmark defaults.