test(e2e): export a fully-featured harness in-project and by ARN #1641
Merged
End-to-end coverage for `export harness` across both source modes, proving each exported agent works at runtime (not just that the spec/wiring is generated). The source harness attaches every export surface: an existing project memory (by name), an agentcore_code_interpreter tool, a public GitHub skill, and an MCP gateway tool. Flow: deploy memory+gateway → create the harness attaching both → deploy → invoke the harness → export --name (in-project) → deploy → verify → export --arn (new empty project) → deploy → verify. Each exported agent is behaviorally verified via four invokes: code interpreter (exact factorial value), gateway tool (assert the gateway-prefixed/Exa MCP tool is listed), skill (assert the returns-policy skill is referenced), and memory (same-session round-trip recall). Verified live: 7/7 steps pass; both projects torn down in afterAll; project names are E2e-prefixed + per-run-unique so a failed teardown is still swept by global-setup's stale-stack GC.
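The E2e-prefixed, per-run-unique naming that lets global-setup sweep leftovers can be sketched as follows. This is a minimal illustration, not the repository's code. `makeE2eProjectName`, `isStaleE2eResource`, the base36 timestamp-plus-random suffix, and the age threshold are all hypothetical. The sketch also assumes that a leftover resource can be recognized by a name beginning with the `E2e` prefix.

```typescript
// Hypothetical sketch: names, suffix scheme, and age threshold are assumptions,
// not the agentcore-cli test helpers.
const E2E_PREFIX = "E2e";

// Builds a per-run-unique project name that carries the E2e prefix.
// The base36 timestamp plus a 4-char random suffix keeps parallel runs from colliding.
function makeE2eProjectName(
  label: string,
  now: number = Date.now(),
  rand: () => number = Math.random,
): string {
  const random = Math.floor(rand() * 36 ** 4).toString(36).padStart(4, "0");
  return `${E2E_PREFIX}${label}${now.toString(36)}${random}`;
}

// A stale-resource GC can then delete anything E2e-prefixed that is older than maxAgeMs,
// which catches resources whose afterAll teardown failed in an earlier run.
function isStaleE2eResource(
  name: string,
  createdAtMs: number,
  nowMs: number,
  maxAgeMs: number,
): boolean {
  return name.startsWith(E2E_PREFIX) && nowMs - createdAtMs > maxAgeMs;
}
```

Making the name both prefixed and unique means concurrent runs never share a project, while the shared prefix gives the sweeper a single pattern to match.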
Package Tarball
How to install:
gh release download pr-1641-tarball --repo aws/agentcore-cli --pattern "*.tgz" --dir /tmp/pr-tarball
npm install -g /tmp/pr-tarball/aws-agentcore-0.21.0.tgz
agentcore-cli-automation suggested changes on Jun 25, 2026
Test is well-structured and the per-capability behavioral assertions are a clear improvement over "did it return 200" checks. One concern about flakiness needs to be addressed before merging.
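For illustration, here is a minimal sketch of per-capability behavioral assertions, as opposed to a bare "did it return 200" check. The names (`factorial`, `expectsFactorial`, `mentionsToken`, `checks`), the choice of `n = 10`, and the token strings are hypothetical placeholders. The real test's prompts, expected tokens, and helpers are not shown on this page.

```typescript
// Hypothetical sketch: helper names, n, and tokens are illustrative assumptions.
type Check = (response: string) => boolean;

// n! with plain numbers; exact only while the result stays below 2^53 (n <= 18).
function factorial(n: number): number {
  if (!Number.isInteger(n) || n < 0 || n > 18) {
    throw new RangeError("n must be an integer in [0, 18] for an exact result");
  }
  let acc = 1;
  for (let i = 2; i <= n; i++) acc *= i;
  return acc;
}

// Passes only if the exact value of n! appears as a whole number in the response.
// Digit-grouping commas such as "3,628,800" are stripped before matching.
function expectsFactorial(n: number): Check {
  const expected = String(factorial(n));
  const pattern = new RegExp(`\\b${expected}\\b`);
  return (response) => pattern.test(response.replace(/(\d),(\d)/g, "$1$2"));
}

// Case-insensitive substring check, e.g. for a skill name or a recalled memory token.
function mentionsToken(token: string): Check {
  const needle = token.toLowerCase();
  return (response) => response.toLowerCase().includes(needle);
}

// One predicate per exported capability (the gateway and memory tokens are placeholders).
const checks: Record<string, Check> = {
  codeInterpreter: expectsFactorial(10),
  gatewayTool: mentionsToken("gateway-tool-prefix-placeholder"),
  skill: mentionsToken("returns-policy"),
  memory: mentionsToken("recall-token-placeholder"),
};
```

Matching the exact value with word boundaries rejects both approximations ("about 3.6 million") and longer numbers that merely contain the digits.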
The per-capability checks asserted on a single LLM response outside the retry; only `invokeAndExpectSuccess`'s CLI-`success` check was retried. That risks intermittent CI failures from (a) memory same-session write/read visibility lag on the recall turn and (b) LLM phrasing nondeterminism on the gateway-tool and skill checks (a `success: true` response that happens not to name the expected token). Add an optional `verify` predicate to `invokeAndExpectSuccess` that runs INSIDE the retried unit, and move every content assertion (factorial value, gateway tool token, skill reference, memory recall) into it, so a flaky sample triggers a re-invoke instead of a test failure. This mirrors the retry pattern already used in `harness-e2e-helper`.
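The retry-with-`verify` pattern described above can be sketched as follows. This illustrates the idea under stated assumptions and is not the actual `invokeAndExpectSuccess`. The name `invokeWithRetry`, the `InvokeResult` shape, the injected `invoke` callback, and the default of three attempts are all hypothetical.

```typescript
// Hypothetical sketch of the pattern; signature, names, and defaults are assumptions.
interface InvokeResult {
  success: boolean; // the CLI-level success flag
  response: string; // the agent's text output
}

interface RetryOptions {
  attempts?: number;
  delayMs?: number;
  // Content check run INSIDE the retried unit; returning false triggers a re-invoke.
  verify?: (response: string) => boolean;
}

async function invokeWithRetry(
  invoke: () => Promise<InvokeResult>,
  { attempts = 3, delayMs = 0, verify }: RetryOptions = {},
): Promise<InvokeResult> {
  let lastError = "no attempts made";
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      const result = await invoke();
      if (!result.success) {
        lastError = `attempt ${attempt}: CLI reported success=false`;
      } else if (verify && !verify(result.response)) {
        // A success:true response with the wrong content counts as a failed attempt.
        lastError = `attempt ${attempt}: verify predicate rejected the response`;
      } else {
        return result;
      }
    } catch (err) {
      lastError = `attempt ${attempt}: ${String(err)}`;
    }
    // Back off before the next attempt (e.g. to let a memory write become visible).
    if (attempt < attempts && delayMs > 0) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`invoke failed after ${attempts} attempts; last: ${lastError}`);
}
```

Because the predicate runs inside the loop, an oddly phrased answer or a memory read that is not yet visible costs one retry instead of failing the test. The trade-off is that a genuinely broken agent fails only after every attempt is spent, so `attempts` and `delayMs` bound the worst-case runtime.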
avi-alpert approved these changes on Jun 25, 2026

Description
Adds end-to-end coverage for `agentcore export harness` across both source modes, in-project (`--name`) and out-of-project (`--arn`), proving each exported Strands runtime agent actually works at runtime, not just that the spec/wiring is generated. The source harness attaches the following export surfaces together:
- an existing project memory (by name)
- `agentcore_code_interpreter` tool (managed default)
- a public GitHub skill
- an MCP gateway tool
Flow (`e2e-tests/export-harness-full.test.ts`):
1. `create --no-agent` + add memory + add gateway (mcp-server target)
2. `add harness` attaching the memory (by name) + gateway (by `--gateway-arn`) + code-interpreter tool + git skill
3. `export --name` → deploy → verify the in-project agent (memory wired via discovery env var; gateway + CI as connections)
4. `export --arn` into a fresh project → deploy → verify the out-of-project agent (every resource external → all wired as connections)
Each exported agent is behaviorally verified via four invokes:
- code interpreter (exact factorial value)
- gateway tool (assert the gateway-prefixed/Exa MCP tool is listed)
- skill (assert the `returns-policy` skill is referenced)
- memory (same-session round-trip recall)
The test is `skipIf(!canRun)`-gated (AWS creds + npm + git), tears down both projects in `afterAll`, and uses `E2e`-prefixed, per-run-unique project names so a failed teardown is still swept by global-setup's stale-stack GC.
Type of Change
Testing
- `npm run typecheck` (clean)
- `npm run lint` (clean)
Checklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.