test(e2e): export a fully-featured harness in-project and by ARN by padmak30 · Pull Request #1641 · aws/agentcore-cli · GitHub

test(e2e): export a fully-featured harness in-project and by ARN#1641

Merged
padmak30 merged 2 commits into main from test/export-harness-e2e on Jun 25, 2026
Conversation

@padmak30

@padmak30 padmak30 commented Jun 25, 2026


Description

Adds end-to-end coverage for agentcore export harness across both source modes — in-project (--name) and out-of-project (--arn) — proving each exported Strands runtime agent actually works at runtime, not just that the spec/wiring is generated.

The source harness attaches the following export surfaces together:

  • an existing project memory (referenced by name)
  • an agentcore_code_interpreter tool (managed default)
  • a public GitHub skill (cloned at runtime, no credential)
  • an MCP gateway tool (in-project gateway + mcp-server target)

Flow (e2e-tests/export-harness-full.test.ts):

  1. create --no-agent + add memory + add gateway (mcp-server target)
  2. deploy #1: provisions memory + gateway (ARNs now exist)
  3. add harness attaching the memory (by name) + gateway (by --gateway-arn) + code-interpreter tool + git skill
  4. deploy #2: harness with all surfaces
  5. invoke the harness; assert the code interpreter runs
  6. export --name → deploy → verify the in-project agent (memory wired via discovery env var; gateway + CI as connections)
  7. export --arn into a fresh project → deploy → verify the out-of-project agent (every resource external → all wired as connections)
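
The wiring expectations in steps 6 and 7 can be summarized as a small lookup. This is only a sketch of what the description states: the type names, the `expectedWiring` function, and the string labels are hypothetical and are not taken from the test file. How the skill is wired is not stated in the description, so it is left out.

```typescript
// Hypothetical summary of the wiring described in steps 6 and 7.
// None of these names come from the PR; they only restate the description.
type SourceMode = "name" | "arn"; // export --name (in-project) vs export --arn (out-of-project)
type Wiring = "discovery-env-var" | "connection";
type Surface = "memory" | "gateway" | "codeInterpreter";

function expectedWiring(mode: SourceMode): Record<Surface, Wiring> {
  if (mode === "name") {
    // In-project: memory is wired via a discovery env var; gateway + CI are connections.
    return {
      memory: "discovery-env-var",
      gateway: "connection",
      codeInterpreter: "connection",
    };
  }
  // Out-of-project: every resource is external, so all are wired as connections.
  return {
    memory: "connection",
    gateway: "connection",
    codeInterpreter: "connection",
  };
}
```

The only difference between the two modes in this summary is the memory: it is an in-project resource for `--name` and an external one for `--arn`.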

Each exported agent is behaviorally verified via four invokes:

  • code interpreter: exact factorial value (the model can't fabricate it)
  • gateway tool: lists a gateway-prefixed / Exa MCP tool (provider-specific token, not generic prose)
  • skill: references the cloned returns-policy skill
  • memory: same-session round-trip recall
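
As an illustration of why these checks are behavioral rather than "did it return 200", here is a minimal sketch of content predicates for the code-interpreter and gateway-tool invokes. The helper names (`factorial`, `mentionsFactorial`, `mentionsAnyToken`) and the matching rules are assumptions made for this example; the actual assertions in `e2e-tests/export-harness-full.test.ts` may differ.

```typescript
// Hypothetical helpers: names and matching rules are illustrative, not from the PR.

/** Exact factorial for small n; restricted to n <= 18 so the result stays a safe integer. */
function factorial(n: number): number {
  if (!Number.isInteger(n) || n < 0 || n > 18) {
    throw new RangeError("n must be an integer in [0, 18]");
  }
  let result = 1;
  for (let i = 2; i <= n; i++) result *= i;
  return result;
}

/** True when the response contains the exact factorial value as a standalone number. */
function mentionsFactorial(response: string, n: number): boolean {
  // Drop thousands separators ("1,307,674" -> "1307674") before matching.
  const normalized = response.replace(/(\d),(?=\d)/g, "$1");
  const digits = String(factorial(n));
  // Digit boundaries stop the value from matching inside a longer number.
  return new RegExp("(?<!\\d)" + digits + "(?!\\d)").test(normalized);
}

/** True when the response names at least one expected token, case-insensitively. */
function mentionsAnyToken(response: string, tokens: string[]): boolean {
  const haystack = response.toLowerCase();
  return tokens.some((token) => haystack.includes(token.toLowerCase()));
}
```

An exact numeric match is hard for a model to produce without actually running code, and a provider-specific token is unlikely to appear in generic prose, which is the point of both checks.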

The test is skipIf(!canRun)-gated (AWS creds + npm + git), tears down both projects in afterAll, and uses E2e-prefixed, per-run-unique project names so a failed teardown is still swept by global-setup's stale-stack GC.
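
The gating and naming scheme could look roughly like the sketch below. It assumes the prerequisite checks have already been reduced to booleans and that a base-36 timestamp plus a random suffix is unique enough per run; `Prereqs`, `canRunE2e`, and `uniqueProjectName` are hypothetical names, and the real test may build its names differently.

```typescript
// Hypothetical sketch of the gate and the per-run-unique, E2e-prefixed name.

interface Prereqs {
  hasAwsCredentials: boolean;
  hasNpm: boolean;
  hasGit: boolean;
}

/** The suite should only run when every prerequisite is available. */
function canRunE2e(prereqs: Prereqs): boolean {
  return prereqs.hasAwsCredentials && prereqs.hasNpm && prereqs.hasGit;
}

/**
 * Builds a name such as "E2eExport<timestamp><suffix>".
 * The fixed "E2e" prefix is what lets a stale-stack sweep recognize leftovers;
 * the timestamp and random suffix keep concurrent runs from colliding.
 */
function uniqueProjectName(
  label: string,
  now: number = Date.now(),
  rand: number = Math.random(),
): string {
  const stamp = now.toString(36);
  // Four base-36 characters, left-padded with zeros.
  const suffix = ("0000" + Math.floor(rand * Math.pow(36, 4)).toString(36)).slice(-4);
  return "E2e" + label + stamp + suffix;
}
```

With Vitest, the resulting boolean would typically be passed as `describe.skipIf(!canRun)(...)`, so a machine without credentials skips the suite instead of failing it.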

Type of Change

  • Other: test-only (adds e2e coverage; no product code changes)

Testing

  • I ran npm run typecheck (clean)
  • I ran npm run lint (clean)
  • Verified live end-to-end against a real AWS account (us-east-1): 7/7 steps pass; both exported agents (in-project + by-ARN) deploy and pass all four capability checks; resources auto-torn-down.

Checklist

  • I have added any necessary tests that prove the feature works
  • My changes generate no new warnings

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

End-to-end coverage for `export harness` across both source modes, proving each
exported agent works at runtime (not just that the spec/wiring is generated).

The source harness attaches every export surface: an existing project memory
(by name), an agentcore_code_interpreter tool, a public GitHub skill, and an MCP
gateway tool. Flow: deploy memory+gateway → create the harness attaching both →
deploy → invoke the harness → export --name (in-project) → deploy → verify →
export --arn (new empty project) → deploy → verify.

Each exported agent is behaviorally verified via four invokes: code interpreter
(exact factorial value), gateway tool (assert the gateway-prefixed/Exa MCP tool
is listed), skill (assert the returns-policy skill is referenced), and memory
(same-session round-trip recall). Verified live: 7/7 steps pass; both projects
torn down in afterAll; project names are E2e-prefixed + per-run-unique so a
failed teardown is still swept by global-setup's stale-stack GC.
@padmak30 padmak30 requested a review from a team June 25, 2026 16:10
@github-actions github-actions Bot added the size/l PR size: L label Jun 25, 2026
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 25, 2026
@github-actions github-actions Bot added the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 25, 2026

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 25, 2026
@github-actions


Package Tarball

aws-agentcore-0.21.0.tgz

How to install

gh release download pr-1641-tarball --repo aws/agentcore-cli --pattern "*.tgz" --dir /tmp/pr-tarball
npm install -g /tmp/pr-tarball/aws-agentcore-0.21.0.tgz

@agentcore-cli-automation agentcore-cli-automation left a comment


Test is well-structured and the per-capability behavioral assertions are a clear improvement over "did it return 200" checks. One concern about flakiness needs to be addressed before merging.

Comment thread e2e-tests/export-harness-full.test.ts
@github-actions github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 25, 2026
@github-actions

github-actions Bot commented Jun 25, 2026


Coverage Report

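| Status | Category   | Percentage | Covered / Total |
|--------|------------|------------|-----------------|
| 🔵     | Lines      | 37.42%     | 13771 / 36800   |
| 🔵     | Statements | 36.69%     | 14646 / 39910   |
| 🔵     | Functions  | 32%        | 2356 / 7362     |
| 🔵     | Branches   | 31.46%     | 9180 / 29173    |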
Generated in workflow #3839 for commit e9a124e by the Vitest Coverage Report Action

The per-capability checks asserted on a single LLM response outside the retry —
only invokeAndExpectSuccess's CLI-`success` check was retried. That risks
intermittent CI failures from (a) memory same-session write/read visibility lag
on the recall turn and (b) LLM phrasing nondeterminism on the gateway-tool and
skill checks (a `success: true` response that happens not to name the expected
token).

Add an optional `verify` predicate to invokeAndExpectSuccess that runs INSIDE the
retried unit, and move every content assertion (factorial value, gateway tool
token, skill reference, memory recall) into it — so a flaky sample re-invokes
instead of failing. Mirrors the retry pattern already used in harness-e2e-helper.
@github-actions github-actions Bot added size/l PR size: L and removed size/l PR size: L labels Jun 25, 2026
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 25, 2026

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 25, 2026
@padmak30 padmak30 merged commit aba397a into main Jun 25, 2026
33 checks passed
@padmak30 padmak30 deleted the test/export-harness-e2e branch June 25, 2026 17:41

Labels

size/l PR size: L


3 participants