[agentic-token-optimizer] AIC Optimization: Smoke Copilot (325 avg AIC/run, 15% failure rate)

### Target Workflow

**Smoke Copilot** &mdash; selected as highest-AIC eligible workflow (325.64 avg AIC/run) with no prior optimization in the last 14 days. The workflow validates 15 engine integration tests per run and triggers on every PR, making it a high-frequency AIC consumer.

### Analysis Period & Runs Audited

| | Value |
|---|---|
| Analysis window | 7-day (2026-06-16 &ndash; 2026-06-23) |
| Runs in snapshot | 3 |
| Historical runs reviewed | 20 (via GitHub API) |
| Workflow source | `.github/workflows/smoke-copilot.md` |

### Cost Profile

| Metric | Value |
|---|---|
| Total AIC (7-day) | 976.93 |
| Avg AIC/run | 325.64 |
| AIC range (3 runs) | 147 &ndash; 477 |
| Input tokens (audited run) | 1,192,459 |
| Output tokens (audited run) | 18,037 |
| Cache hit rate | 91% |
| Avg action-minutes/run | 14 |
| Firewall block rate | 72&ndash;74% of requests blocked |
| **Failure rate (last 20 runs)** | **15% (3 of 20)** |

> The 91% cache hit rate is excellent and indicates the system prompt is effectively cached across runs. The primary cost drivers are: (1) the large system prompt, which contributes to ~1.2M input tokens in the audited run, (2) multi-step output generation, and (3) a 15% failure rate that generates wasted re-run AIC.
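
The headline figures in the table can be re-derived from the reported totals. The following is a minimal sketch in Python. Treating the 91% cache hit rate as a uniform share of the audited run's input tokens is an assumption for illustration, not something the report states.

```python
# Reported figures from the Cost Profile table.
TOTAL_AIC_7_DAY = 976.93
RUNS_IN_SNAPSHOT = 3
FAILED_RUNS = 3
HISTORICAL_RUNS = 20
INPUT_TOKENS = 1_192_459
CACHE_HIT_PERCENT = 91

# Average AIC per run over the 7-day snapshot.
avg_aic_per_run = round(TOTAL_AIC_7_DAY / RUNS_IN_SNAPSHOT, 2)

# Failure rate over the last 20 historical runs.
failure_rate = FAILED_RUNS / HISTORICAL_RUNS

# Assumption: the cache hit rate applies uniformly to input tokens.
# Integer arithmetic avoids floating-point drift in the estimate.
uncached_input_tokens = INPUT_TOKENS * (100 - CACHE_HIT_PERCENT) // 100

print(avg_aic_per_run, failure_rate, uncached_input_tokens)
```

Under that assumption, roughly one input token in eleven is billed uncached, which is why the per-run savings from trimming cached prompt content are small.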

### Ranked Recommendations

#### Recommendation 1 &mdash; Reduce `safe_outputs` failure rate by simplifying Output section

**Estimated AIC savings: ~30&ndash;45 AIC/run equivalent**

**Evidence:** 2 of the 3 failed runs (27885948833, 27828520670) failed in the `safe_outputs` stage at the "Process Safe Outputs" step. The agent step completed successfully in both cases, which suggests the failure lies in how the agent structured its safe-output calls.

The Output section (steps 1&ndash;5) has interacting failure modes:
- `set_issue_type` uses the temporary ID `aw_smoke1` set by a prior `create_issue` &mdash; ordering dependency
- The 2-call `add_comment` budget constraint is repeated in the Output section even though it is already declared under `## Hard Limit`, which the agent may read as two separate constraints
- Conditional `add_comment` paths (PR vs non-PR) are described in verbose per-step prose, increasing the chance the agent misroutes a call

**Action:**
1. Replace the per-step conditional prose in the Output section with a single compact decision table:
   ```
   | Trigger | add_comment #1 | add_comment #2 |
   |---------|---------------|---------------|
   | pull_request | PR summary (auto-target PR) | skip |
   | other | discussion #7 result | fun comment on aw_smoke_discussion |
   ```
2. Remove the redundant `add_comment` budget reminder from the Output section (it is already declared in `## Hard Limit`)
3. Make the `set_issue_type` dependency explicit: "Call `set_issue_type` immediately after `create_issue` succeeds"
4. Move `add_labels`/`remove_labels` logic to an explicit "Label Actions" block separate from the output summary
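
The decision table and the ordering rule in the steps above can be expressed as a small lookup plus a check. This is a sketch only: `COMMENT_PLAN`, `planned_comments`, and `issue_type_follows_create` are hypothetical names introduced for illustration and are not part of the workflow or the gh-aw framework.

```python
from typing import Dict, List

ADD_COMMENT_BUDGET = 2  # the 2-call limit declared under "## Hard Limit"

# One row per trigger, mirroring the proposed decision table.
COMMENT_PLAN: Dict[str, List[str]] = {
    "pull_request": ["PR summary (auto-target PR)"],
    "other": ["discussion #7 result", "fun comment on aw_smoke_discussion"],
}


def planned_comments(trigger: str) -> List[str]:
    """Return the add_comment calls for a trigger; unknown triggers use the 'other' row."""
    plan = COMMENT_PLAN.get(trigger, COMMENT_PLAN["other"])
    if len(plan) > ADD_COMMENT_BUDGET:
        raise ValueError("plan exceeds the add_comment budget")
    return list(plan)


def issue_type_follows_create(calls: List[str]) -> bool:
    """True if every set_issue_type call comes directly after a create_issue call."""
    for i, name in enumerate(calls):
        if name == "set_issue_type" and (i == 0 or calls[i - 1] != "create_issue"):
            return False
    return True


print(planned_comments("pull_request"))
print(issue_type_follows_create(["create_issue", "set_issue_type", "add_comment"]))
```

Keeping the routing in one table means the budget is stated once and each trigger has exactly one row to consult, which is the property the prompt rewrite aims for.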

> Reducing the failure rate from 15% to ~5% saves approximately 0.10 &times; 325 = **32.5 AIC/run equivalent** (fewer wasted re-runs).
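
The estimate above follows from a simple expected-cost model. A minimal sketch, assuming each failed run triggers exactly one re-run at the average cost (the report does not state the actual re-run policy); `wasted_aic_per_run` is a hypothetical helper name.

```python
def wasted_aic_per_run(failure_rate: float, avg_aic: float) -> float:
    """Expected re-run AIC per run, assuming one full-cost re-run per failure."""
    return failure_rate * avg_aic


AVG_AIC = 325  # rounded average used in the estimate above

current = wasted_aic_per_run(0.15, AVG_AIC)  # observed failure rate
target = wasted_aic_per_run(0.05, AVG_AIC)   # target failure rate
savings = current - target

print(f"{savings:.1f} AIC/run equivalent")
```

If failed runs are retried more than once, or fail before most of the cost is incurred, the real figure would differ in either direction.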

---

#### Recommendation 2 &mdash; Scope `github-queries-mcp-script.md` to the one used script

**Estimated AIC savings: ~3&ndash;5 AIC/run on cache write events; cleaner context**

**Evidence:** The import `shared/github-queries-mcp-script.md` (16,546 bytes, 445 lines) defines three MCP scripts:
- `github-issue-query` &mdash; **not used** in any of the 15 tests
- `github-pr-query` &mdash; **not used** (test 2 uses `mcpscripts-gh pr list` CLI directly)
- `github-discussion-query` &mdash; **used** in test 7 only

Two-thirds of this 16.5KB import is dead weight in the context window.

**Action:** Replace the full `github-queries-mcp-script.md` import with a minimal import containing only the `github-discussion-query` script definition (~5,500 bytes vs 16,546 bytes), or extract `github-discussion-query` into a standalone shared file and import that.

> At a 91% cache hit rate, per-run savings are modest (~0.3 AIC on cached reads). The larger benefits are on cache write events (~4 AIC) and in reduced ambient context noise, which can confuse tool selection.
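
The size reduction can be checked from the byte counts quoted above. A minimal sketch; the 4-bytes-per-token ratio is a rough heuristic assumed for illustration, not a measured value for this model.

```python
FULL_IMPORT_BYTES = 16_546   # shared/github-queries-mcp-script.md
SCOPED_IMPORT_BYTES = 5_500  # approximate size of github-discussion-query alone
BYTES_PER_TOKEN = 4          # assumption: rough heuristic, not a measured value

bytes_removed = FULL_IMPORT_BYTES - SCOPED_IMPORT_BYTES
fraction_removed = bytes_removed / FULL_IMPORT_BYTES
approx_tokens_removed = bytes_removed // BYTES_PER_TOKEN

print(bytes_removed, round(fraction_removed, 3), approx_tokens_removed)
```

The fraction removed comes out at about two-thirds, consistent with two of the three script definitions being unused.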

---

#### Recommendation 3 &mdash; Remove unused `shared/otlp.md` import

**Estimated AIC savings: <1 AIC/run; hygiene and attack surface reduction**

**Evidence:** None of the 15 smoke tests reference OTLP, Sentry, or Grafana telemetry. The `shared/otlp.md` import:
- Adds `*.grafana.net` and `*.sentry.io` to the allowed network domain list
- Injects OTLP endpoint configuration (secrets references) into the prompt
- Neither domain appears in any smoke test task

**Action:** Remove `- shared/otlp.md` from the `imports:` section of `smoke-copilot.md`. OTLP tracing for smoke-run observability, if needed, is handled by the framework layer rather than the agent prompt.

> Primarily a hygiene optimization: reduces unnecessary network grants and removes ~160 tokens of unused prompt context.
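
Before removing the import, the claim that no test references the telemetry stack can be re-checked mechanically. A minimal sketch; `telemetry_references` is a hypothetical helper, and the sample test text is invented for illustration rather than copied from `smoke-copilot.md`.

```python
import re
from typing import Dict, List

# Case-insensitive match on the three telemetry names cited in the evidence.
TELEMETRY_PATTERN = re.compile(r"otlp|sentry|grafana", re.IGNORECASE)


def telemetry_references(test_descriptions: Dict[str, str]) -> List[str]:
    """Return names of tests whose text mentions OTLP, Sentry, or Grafana."""
    return [
        name
        for name, text in test_descriptions.items()
        if TELEMETRY_PATTERN.search(text)
    ]


# Hypothetical test text for illustration; not copied from smoke-copilot.md.
sample = {
    "test-a": "Run a bash command and report its exit code.",
    "test-b": "Query the latest discussion and summarize it.",
}

print(telemetry_references(sample))
```

An empty result over the real test sections would support the removal. Any match should be reviewed by hand, since the import line itself also contains the string `otlp`.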

---

### Structural Analysis

| Area | Finding |
|---|---|
| Inline sub-agents | Already present (`file-summarizer`); no additional candidates identified &mdash; remaining tests require bash execution with side effects |
| Common setup prefix | No shared setup prefix across sections; each test is independent |
| Model | `gpt-5.4` is required (tests the Copilot engine); model downgrade not applicable |

### Caveats

- Only 3 runs in the 7-day snapshot; AIC range (147&ndash;477) indicates high per-run variability driven by experiment toggles (`caveman`, `subagent_model`) and PR context size
- Root cause of `safe_outputs` failures was inferred from job stage data; full failure log inspection recommended before implementing Recommendation 1
- The 16.5KB MCP script is already cached at 91%+, so token savings from Recommendation 2 are small per run; main value is prompt clarity

### Summary

| Rec | Description | Est. AIC/run savings | Risk |
|-----|-------------|----------------------|------|
| #1 | Simplify Output section conditional logic | ~30&ndash;45 (failure reduction) | Low |
| #2 | Scope MCP script import to discussion-query only | ~3&ndash;5 (cache write events) | Low |
| #3 | Remove OTLP import | <1 | Very low |

**Total conservative estimate: ~35&ndash;50 AIC/run saved**
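
The total can be cross-checked by summing the per-recommendation ranges. A minimal sketch, treating "<1" as a range of 0 to 1 (an assumption). Note that the Recommendation 2 figure applies to cache write events, so the upper bound is optimistic for cached runs.

```python
# (low, high) AIC/run ranges from the Summary table; "<1" is treated as 0 to 1.
estimates = {
    "#1": (30, 45),
    "#2": (3, 5),
    "#3": (0, 1),
}

total_low = sum(low for low, _ in estimates.values())
total_high = sum(high for _, high in estimates.values())

print(total_low, total_high)
```

The raw sum is 33 to 51 AIC/run, which the report rounds to ~35 to 50.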

<details>
<summary>References</summary>

- [&sect;28035178551](https://github.com/github/gh-aw/actions/runs/28035178551) &mdash; audited run (AIC 352, input 1.19M tokens, 91% cache)
- [&sect;28030364860](https://github.com/github/gh-aw/actions/runs/28030364860) &mdash; comparison run (AIC 147, moderate resource profile)
- [&sect;27885948833](https://github.com/github/gh-aw/actions/runs/27885948833) &mdash; safe_outputs failure

</details>







> Generated by [Agentic Workflow AIC Usage Optimizer](https://github.com/github/gh-aw/actions/runs/28038201833) &middot; 140.2 AIC &middot; &#8862; 7.1K &middot; [&#9719;](https://github.com/search?q=repo%3Agithub%2Fgh-aw+is%3Aissue+%22gh-aw-workflow-call-id%3A+github%2Fgh-aw%2Fagentic-token-optimizer%22&type=issues)
> - [x] expires on Jun 30, 2026, 8:02 AM UTC-08:00
