Target Workflow
Smoke Copilot — selected as highest-AIC eligible workflow (325.64 avg AIC/run) with no prior optimization in the last 14 days. The workflow validates 15 engine integration tests per run and triggers on every PR, making it a high-frequency AIC consumer.
Analysis Period & Runs Audited
| Item | Value |
|------|-------|
| Analysis window | 7 days (2026-06-16 – 2026-06-23) |
| Runs in snapshot | 3 |
| Historical runs reviewed | 20 (via GitHub API) |
| Workflow source | .github/workflows/smoke-copilot.md |
Cost Profile
| Metric | Value |
|--------|-------|
| Total AIC (7-day) | 976.93 |
| Avg AIC/run | 325.64 |
| AIC range (3 runs) | 147 – 477 |
| Input tokens (audited run) | 1,192,459 |
| Output tokens (audited run) | 18,037 |
| Cache hit rate | 91% |
| Avg action-minutes/run | 14 |
| Firewall block rate | 72–74% of requests blocked |
| Failure rate (last 20 runs) | 15% (3 of 20) |
Cache hit rate is excellent at 91%, meaning the system prompt is effectively cached across runs. The primary cost drivers are: (1) the large prompt context (~1.2M input tokens in the audited run), (2) multi-step output generation, and (3) a 15% failure rate, which wastes AIC on re-runs.
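The derived figures in the cost profile can be re-checked from the raw values. The following is a minimal sketch that uses only numbers from the table; the variable names are illustrative and not part of any tooling.

```python
# Figures copied from the cost profile; variable names are illustrative.
total_aic_7d = 976.93
runs_in_snapshot = 3
input_tokens = 1_192_459   # audited run
output_tokens = 18_037     # audited run
failed_runs, reviewed_runs = 3, 20

avg_aic_per_run = total_aic_7d / runs_in_snapshot
failure_rate = failed_runs / reviewed_runs
input_output_ratio = input_tokens / output_tokens

print(f"avg AIC/run: {avg_aic_per_run:.2f}")
print(f"failure rate: {failure_rate:.0%}")
print(f"input:output token ratio: {input_output_ratio:.0f}:1")
```

In the audited run, input tokens outnumber output tokens by well over an order of magnitude, which is why the recommendations focus on prompt context and wasted re-runs rather than on output length.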
Ranked Recommendations
Recommendation 1 — Reduce safe_outputs failure rate by simplifying Output section
Estimated AIC savings: ~30–45 AIC/run equivalent
Evidence: 2 of 3 run failures (27885948833, 27828520670) failed in the safe_outputs stage at the "Process Safe Outputs" step. The agent step completed successfully in both cases, indicating the failure is in how the agent structured its safe-output calls.
The Output section (steps 1–5) has interacting failure modes:
- set_issue_type uses the temporary ID aw_smoke1 set by a prior create_issue (ordering dependency)
- The 2-call add_comment budget constraint is repeated in the Output section even though it's already declared under ## Hard Limit, creating potential double-interpretation
- Conditional add_comment paths (PR vs non-PR) are described in verbose per-step prose, increasing the chance the agent misroutes a call
Action:
- Replace the per-step conditional prose in the Output section with a single compact decision table:
| Trigger | add_comment #1 | add_comment #2 |
|---------|---------------|---------------|
| pull_request | PR summary (auto-target PR) | skip |
| other | discussion #7 result | fun comment on aw_smoke_discussion |
- Remove the redundant add_comment budget reminder from the Output section (it is already declared in ## Hard Limit)
- Make the set_issue_type dependency explicit: "Call set_issue_type immediately after create_issue succeeds"
- Move add_labels/remove_labels logic to an explicit "Label Actions" block separate from the output summary
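The decision table in the first action item can be read as a simple lookup. The Python sketch below is illustrative only: the function name plan_comments and the returned strings are hypothetical and are not part of the workflow or of any safe-outputs API. It only encodes the two table rows and the 2-call add_comment limit.

```python
# Hypothetical illustration of the decision table; not part of the workflow.
MAX_ADD_COMMENT_CALLS = 2  # the 2-call budget declared under "## Hard Limit"

# Rows of the decision table: trigger -> ordered add_comment targets.
COMMENT_PLAN = {
    "pull_request": ["PR summary (auto-target PR)"],
    "other": ["discussion #7 result", "fun comment on aw_smoke_discussion"],
}

def plan_comments(trigger: str) -> list[str]:
    """Return the add_comment calls to make for a given trigger."""
    row = "pull_request" if trigger == "pull_request" else "other"
    plan = COMMENT_PLAN[row]
    # Every row respects the hard limit, so the budget need not be restated.
    assert len(plan) <= MAX_ADD_COMMENT_CALLS
    return list(plan)

print(plan_comments("pull_request"))
print(plan_comments("workflow_dispatch"))
```

Because each trigger maps to exactly one row, there is no per-step branching left for the agent to misroute.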
Reducing the failure rate from 15% to ~5% saves approximately 0.10 × 325 = 32.5 AIC/run equivalent (fewer wasted re-runs).
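The estimate treats each failed run as one wasted run at the average cost. Here is a minimal sketch of that arithmetic, assuming (as the estimate does) that a failure wastes one full average-cost run; the 5% target is an assumed goal rather than a measured value, and the function name is hypothetical.

```python
def wasted_aic_per_run(failure_rate: float, avg_aic_per_run: float) -> float:
    """Expected AIC wasted per run, assuming each failure costs one full re-run."""
    return failure_rate * avg_aic_per_run

AVG_AIC = 325.64          # average AIC/run from the cost profile
CURRENT_FAILURE = 3 / 20  # 15% over the last 20 runs
TARGET_FAILURE = 0.05     # assumed target, not a measured value

savings = (wasted_aic_per_run(CURRENT_FAILURE, AVG_AIC)
           - wasted_aic_per_run(TARGET_FAILURE, AVG_AIC))
print(f"Estimated savings: {savings:.1f} AIC/run equivalent")
```

With the unrounded average this comes to about 32.6 AIC/run, consistent with the ~32.5 figure computed from the rounded average of 325.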
Recommendation 2 — Scope github-queries-mcp-script.md to the one used script
Estimated AIC savings: ~3–5 AIC/run on cache write events; cleaner context
Evidence: The import shared/github-queries-mcp-script.md (16,546 bytes, 445 lines) defines three MCP scripts:
- github-issue-query: not used in any of the 15 tests
- github-pr-query: not used (test 2 uses mcpscripts-gh pr list CLI directly)
- github-discussion-query: used in test 7 only
Two-thirds of this 16.5KB import is dead weight in the context window.
Action: Replace the full github-queries-mcp-script.md import with a minimal import containing only the github-discussion-query script definition (~5,500 bytes vs 16,546 bytes), or extract github-discussion-query into a standalone shared file and import that.
At 91% cache hit rate, per-run savings are modest (~0.3 AIC on cached reads). The benefit is largest on cache write events (~4 AIC) and as a reduction in ambient context noise that can confuse tool selection.
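The ~5,500-byte figure is consistent with assuming that the three script definitions are roughly equal in size. Here is a rough sketch of that estimate; the equal-size split is an assumption, not a measurement, so the actual per-script sizes should be checked in the file before relying on it.

```python
TOTAL_BYTES = 16_546   # size of shared/github-queries-mcp-script.md
SCRIPTS_DEFINED = 3    # github-issue-query, github-pr-query, github-discussion-query
SCRIPTS_USED = 1       # only github-discussion-query (test 7)

# Assumption: the three definitions are roughly equal in size.
kept_bytes = TOTAL_BYTES * SCRIPTS_USED / SCRIPTS_DEFINED
removed_bytes = TOTAL_BYTES - kept_bytes

print(f"kept ~{kept_bytes:,.0f} bytes, removed ~{removed_bytes:,.0f} bytes "
      f"({removed_bytes / TOTAL_BYTES:.0%} of the import)")
```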
Recommendation 3 — Remove unused shared/otlp.md import
Estimated AIC savings: <1 AIC/run; hygiene and attack surface reduction
Evidence: None of the 15 smoke tests reference OTLP, Sentry, or Grafana telemetry. The shared/otlp.md import:
- Adds *.grafana.net and *.sentry.io to the allowed network domain list
- Injects OTLP endpoint configuration (secrets references) into the prompt
- Neither domain appears in any smoke test task
Action: Remove the "- shared/otlp.md" entry from the imports: section of smoke-copilot.md. If OTLP tracing is needed for smoke run observability, it is handled by the framework layer, not the agent prompt.
Primarily a hygiene optimization: reduces unnecessary network grants and removes ~160 tokens of unused prompt context.
Structural Analysis
| Area | Finding |
|------|---------|
| Inline sub-agents | Already present (file-summarizer); no additional candidates identified, since the remaining tests require bash execution with side effects |
| Common setup prefix | No shared setup prefix across sections; each test is independent |
| Model | gpt-5.4 is required (tests the Copilot engine); model downgrade not applicable |
Caveats
- Only 3 runs in the 7-day snapshot; AIC range (147–477) indicates high per-run variability driven by experiment toggles (caveman, subagent_model) and PR context size
- Root cause of safe_outputs failures was inferred from job stage data; full failure log inspection recommended before implementing Recommendation 1
- The 16.5KB MCP script is already cached at 91%+, so token savings from Recommendation 2 are small per run; main value is prompt clarity
Summary
| Rec | Description | Est. AIC/run savings | Risk |
|-----|-------------|----------------------|------|
| #1 | Simplify Output section conditional logic | ~30–45 (failure reduction) | Low |
| #2 | Scope MCP script import to discussion-query only | ~3–5 | Low |
| #3 | Remove OTLP import | <1 | Very low |
Total conservative estimate: ~35–50 AIC/run saved
Generated by Agentic Workflow AIC Usage Optimizer · 140.2 AIC · ⊞ 7.1K · ◷