[agentic-token-optimizer] AIC Optimization: Smoke Copilot (325 avg AIC/run, 15% failure rate) · Issue #41074 · github/gh-aw · GitHub

[agentic-token-optimizer] AIC Optimization: Smoke Copilot (325 avg AIC/run, 15% failure rate) #41074

Description

@github-actions

Target Workflow

Smoke Copilot — selected as highest-AIC eligible workflow (325.64 avg AIC/run) with no prior optimization in the last 14 days. The workflow validates 15 engine integration tests per run and triggers on every PR, making it a high-frequency AIC consumer.

Analysis Period & Runs Audited

| | Value |
|---|---|
| Analysis window | 7-day (2026-06-16 – 2026-06-23) |
| Runs in snapshot | 3 |
| Historical runs reviewed | 20 (via GitHub API) |
| Workflow source | .github/workflows/smoke-copilot.md |

Cost Profile

| Metric | Value |
|--------|-------|
| Total AIC (7-day) | 976.93 |
| Avg AIC/run | 325.64 |
| AIC range (3 runs) | 147 – 477 |
| Input tokens (audited run) | 1,192,459 |
| Output tokens (audited run) | 18,037 |
| Cache hit rate | 91% |
| Avg action-minutes/run | 14 |
| Firewall block rate | 72–74% of requests blocked |
| Failure rate (last 20 runs) | 15% (3 of 20) |

Cache hit rate is excellent at 91%, meaning the system prompt is effectively cached across runs. The primary cost drivers are: (1) the large prompt context (~1.2M input tokens in the audited run), (2) multi-step output generation, and (3) a 15% failure rate that wastes AIC on re-runs.
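The derived figures in the Cost Profile can be re-checked from the raw values in the table. The following is a minimal sketch; the variable names are illustrative and not part of the optimizer.

```python
# Raw figures copied from the Cost Profile table.
total_aic_7d = 976.93
runs_in_snapshot = 3
input_tokens = 1_192_459
output_tokens = 18_037
failed_runs, historical_runs = 3, 20

# Derived metrics.
avg_aic_per_run = total_aic_7d / runs_in_snapshot
failure_rate = failed_runs / historical_runs
input_share = input_tokens / (input_tokens + output_tokens)

print(f"avg AIC/run:           {avg_aic_per_run:.2f}")
print(f"failure rate:          {failure_rate:.0%}")
print(f"input share of tokens: {input_share:.1%}")
```

Input tokens make up roughly 98.5% of the audited run's token volume, which is why the recommendations focus on prompt size and wasted re-runs rather than output length.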

Ranked Recommendations

Recommendation 1 — Reduce safe_outputs failure rate by simplifying Output section

Estimated AIC savings: ~30–45 AIC/run equivalent

Evidence: 2 of 3 run failures (27885948833, 27828520670) failed in the safe_outputs stage at the "Process Safe Outputs" step. The agent step completed successfully in both cases, indicating the failure is in how the agent structured its safe-output calls.

The Output section (steps 1–5) has interacting failure modes:

  • set_issue_type uses the temporary ID aw_smoke1 set by a prior create_issue — ordering dependency
  • The 2-call add_comment budget constraint is repeated in the Output section even though it's already declared under ## Hard Limit, creating potential double-interpretation
  • Conditional add_comment paths (PR vs non-PR) are described in verbose per-step prose, increasing the chance the agent misroutes a call

Action:

  1. Replace the per-step conditional prose in the Output section with a single compact decision table:
    | Trigger | add_comment #1 | add_comment #2 |
    |---------|---------------|---------------|
    | pull_request | PR summary (auto-target PR) | skip |
    | other | discussion #7 result | fun comment on aw_smoke_discussion |
    
  2. Remove the redundant add_comment budget reminder from the Output section (it is already declared in ## Hard Limit)
  3. Make the set_issue_type dependency explicit: "Call set_issue_type immediately after create_issue succeeds"
  4. Move add_labels/remove_labels logic to an explicit "Label Actions" block separate from the output summary
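
To show why a table is less error-prone than per-step prose, the routing in item 1 can be written as a plain lookup. This is only an illustrative sketch: `COMMENT_PLAN`, `MAX_ADD_COMMENT_CALLS`, and `planned_comments` are hypothetical names, not part of gh-aw or the workflow.

```python
# Hypothetical encoding of the decision table; names and labels are illustrative.
COMMENT_PLAN = {
    "pull_request": ["PR summary (auto-target PR)"],
    "other": ["discussion #7 result", "fun comment on aw_smoke_discussion"],
}

# The 2-call add_comment budget declared under ## Hard Limit.
MAX_ADD_COMMENT_CALLS = 2


def planned_comments(trigger: str) -> list[str]:
    """Return the add_comment calls planned for a trigger, in order."""
    # Any trigger that is not pull_request falls through to the "other" row.
    plan = COMMENT_PLAN.get(trigger, COMMENT_PLAN["other"])
    if len(plan) > MAX_ADD_COMMENT_CALLS:
        raise ValueError("plan exceeds the add_comment budget")
    return plan


print(planned_comments("pull_request"))
print(planned_comments("schedule"))
```

Each trigger maps to exactly one row, so there is a single place where the budget can be checked.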

Reducing the failure rate from 15% to ~5% saves approximately 0.10 × 325 = 32.5 AIC/run equivalent (fewer wasted re-runs).
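
The estimate above can be reproduced with a short calculation. The sketch assumes that each failed run is re-run once and that the re-run costs the average AIC; neither assumption is confirmed by the run data.

```python
AVG_AIC_PER_RUN = 325  # rounded average used in the estimate above


def wasted_aic_per_run(failure_rate: float, avg_aic: float = AVG_AIC_PER_RUN) -> float:
    """Expected AIC wasted per run, assuming one average-cost re-run per failure."""
    return failure_rate * avg_aic


current = wasted_aic_per_run(0.15)  # observed: 3 of 20 runs failed
target = wasted_aic_per_run(0.05)   # assumed post-fix failure rate
saving = current - target

print(f"estimated saving: {saving:.1f} AIC/run equivalent")
```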


Recommendation 2 — Scope github-queries-mcp-script.md to the one used script

Estimated AIC savings: ~3–5 AIC/run on cache write events; cleaner context

Evidence: The import shared/github-queries-mcp-script.md (16,546 bytes, 445 lines) defines three MCP scripts:

  • github-issue-query: not used in any of the 15 tests
  • github-pr-query: not used (test 2 uses mcpscripts-gh pr list CLI directly)
  • github-discussion-query: used in test 7 only

Two-thirds of this 16.5KB import is dead weight in the context window.

Action: Replace the full github-queries-mcp-script.md import with a minimal import containing only the github-discussion-query script definition (~5,500 bytes vs 16,546 bytes), or extract github-discussion-query into a standalone shared file and import that.

At 91% cache hit rate, per-run savings are modest (~0.3 AIC on cached reads). The benefit is largest on cache write events (~4 AIC) and as a reduction in ambient context noise that can confuse tool selection.
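
As a rough check on the "two-thirds" figure, the byte reduction follows directly from the sizes quoted above. The scoped size of about 5,500 bytes is the estimate from the Action, not a measured value.

```python
full_import_bytes = 16_546   # shared/github-queries-mcp-script.md as imported today
scoped_import_bytes = 5_500  # estimated size with only github-discussion-query

removed_bytes = full_import_bytes - scoped_import_bytes
removed_fraction = removed_bytes / full_import_bytes

print(f"removed: {removed_bytes} bytes ({removed_fraction:.0%} of the import)")
```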


Recommendation 3 — Remove unused shared/otlp.md import

Estimated AIC savings: <1 AIC/run; hygiene and attack surface reduction

Evidence: None of the 15 smoke tests reference OTLP, Sentry, or Grafana telemetry. The shared/otlp.md import:

  • Adds *.grafana.net and *.sentry.io to the allowed network domain list
  • Injects OTLP endpoint configuration (secrets references) into the prompt
  • Neither domain appears in any smoke test task

Action: Remove - shared/otlp.md from the imports: section of smoke-copilot.md. If OTLP tracing is needed for smoke run observability, this is handled by the framework layer, not the agent prompt.

Primarily a hygiene optimization: reduces unnecessary network grants and removes ~160 tokens of unused prompt context.


Structural Analysis

| Area | Finding |
|------|---------|
| Inline sub-agents | Already present (file-summarizer); no additional candidates identified because the remaining tests require bash execution with side effects |
| Common setup prefix | No shared setup prefix across sections; each test is independent |
| Model | gpt-5.4 is required (tests the Copilot engine); model downgrade not applicable |

Caveats

  • Only 3 runs in the 7-day snapshot; AIC range (147–477) indicates high per-run variability driven by experiment toggles (caveman, subagent_model) and PR context size
  • Root cause of safe_outputs failures was inferred from job stage data; full failure log inspection recommended before implementing Recommendation 1
  • The 16.5KB MCP script is already cached at 91%+, so token savings from Recommendation 2 are small per run; main value is prompt clarity

Summary

| Rec | Description | Est. AIC/run savings | Risk |
|-----|-------------|----------------------|------|
| #1 | Simplify Output section conditional logic | ~30–45 (failure reduction) | Low |
| #2 | Scope MCP script import to discussion-query only | ~3–5 | Low |
| #3 | Remove OTLP import | <1 | Very low |

Total conservative estimate: ~35–50 AIC/run saved

References

Generated by Agentic Workflow AIC Usage Optimizer · 140.2 AIC · ⊞ 7.1K ·

  • expires on Jun 30, 2026, 8:02 AM UTC-08:00
