fix(templates): key per-session agent state by context.session_id across all HTTP starter templates (#808, #809) by aidandaly24 · Pull Request #1639 · aws/agentcore-cli · GitHub
Skip to content

fix(templates): key per-session agent state by context.session_id across all HTTP starter templates (#808, #809)#1639

Open
aidandaly24 wants to merge 3 commits into
aws:mainfrom
aidandaly24:fix/808-809
Open

fix(templates): key per-session agent state by context.session_id across all HTTP starter templates (#808, #809)#1639
aidandaly24 wants to merge 3 commits into
aws:mainfrom
aidandaly24:fix/808-809

Conversation

@aidandaly24

@aidandaly24 aidandaly24 commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Fixes #808
Fixes #809

Problem

The HTTP starter templates rebuild agent/session state on every invocation, so within one session there is no multi-turn recall (turn 2 can't see turn 1) — directly contradicting agentcore invoke --session-id. The Strands no-memory template is worse: it shares one global _agent across all invocations in a process, so conversation history bleeds between sessions/users in agentcore dev and non-Runtime deployments (Lambda/ECS/FastAPI). Affects strands, openaiagents, googleadk, langchain_langgraph, and autogen.

Fix

Each template now keys in-process agent/session state by context.session_id, bounded to 128 sessions with LRU eviction so a long-running process keeps per-session isolation without growing unbounded:

  • strands / autogen: per-session OrderedDict cache (move_to_end on hit, evict least-recently-used at the cap).
  • openaiagents: @lru_cache(maxsize=128) over per-session SQLiteSession.
  • googleadk: module-scoped InMemorySessionService bounded via an LRU key set, evicting via delete_session.
  • langchain_langgraph: module-scoped InMemorySaver bounded via an LRU thread set, evicting via delete_thread.

In-memory is best-effort (resets on cold start); for durable history, attach a session manager / AgentCore Memory.

Validation

Bug-bashed with real deploy + invoke per framework on a test account: per-session recall works and there is no cross-session leak (the #809 check) on all frameworks. Generated templates py_compile cleanly across flag combinations; asset snapshot regenerated; unit suite green.

…oss all HTTP starter base templates (aws#808, aws#809)

All five HTTP starter base templates (strands, openaiagents, googleadk, langchain_langgraph, autogen) plus the regenerated assets snapshot. Only base/main.py template content changed; the strands hasMemory branch and all other framework files are untouched.

Refs aws#808
Refs aws#809
@github-actions github-actions Bot added the size/m PR size: M label Jun 25, 2026
@github-actions github-actions Bot added the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 25, 2026
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 25, 2026
@github-actions

Copy link
Copy Markdown
Contributor

@agentcore-devx-automation

Copy link
Copy Markdown
Contributor

Claude Security Review: no high-confidence findings. (run)

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 25, 2026

@agentcore-cli-automation agentcore-cli-automation left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix — the per-session memory bug is real and the overall approach (key in-process state by context.session_id) is the right design. Most of the diff looks good; flagging a few correctness/scale issues that I think need to be addressed before merge.

The core concern: the PR description explicitly justifies the 128-entry cap for the openaiagents/autogen/strands templates ("caps process memory and evicted sessions restart fresh"), but the googleadk and langchain_langgraph templates use module-level in-memory stores (InMemorySessionService, InMemorySaver) with no bound at all. That just trades the original cross-session bleed (#809-style) for an unbounded memory leak in any long-running process serving many sessions (e.g. agentcore dev over a long session, or non-Runtime deployments which the PR description specifically calls out as the failure mode of #809). The fix should be consistent across all five templates.

The FIFO-vs-LRU comments inline are smaller but worth correcting since the docstrings claim LRU.

session_service = InMemorySessionService()
session = await session_service.create_session(
# Module-level session service and runner (preserves history across invocations)
_session_service = InMemorySessionService()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unbounded session growth. _session_service = InMemorySessionService() at module scope retains every (app_name, user_id, session_id) triple forever — there is no eviction path. The PR description argues that the 128-entry cap on the other templates "caps process memory and evicted sessions restart fresh", but this template has no cap, so a long-running process (notably agentcore dev over time, and any non-Runtime deployment) will grow without bound as new session ids arrive.

The ADK InMemorySessionService doesn't expose a public eviction API as far as I can see, so options are roughly:

  1. Wrap session creation in a bounded FIFO/LRU and call _session_service.delete_session(...) on eviction (ADK does expose delete_session).
  2. Keep a collections.OrderedDict of recently-touched (user_id, session_id) keys alongside the service and prune via delete_session once it exceeds N (e.g. 128, to match the other templates).
  3. Subclass / wrap to enforce the cap at create-time.

Whichever you pick, please match the 128-entry behavior already documented for openaiagents/autogen/strands so the fix is consistent across templates.

tools = [add_numbers]

# Module-level checkpointer preserves conversation history across invocations
_checkpointer = InMemorySaver()

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unbounded checkpoint growth. _checkpointer = InMemorySaver() is module-level with no eviction, so every distinct thread_id (= session_id) keeps its checkpoint forever. Same correctness concern as the googleadk template: the 128-cap rationale in the PR description doesn't apply here, so this template will leak memory in any long-running process.

A couple of options:

  1. Switch to a bounded saver (e.g. wrap InMemorySaver in something that tracks insertion order and calls _checkpointer.delete_thread(thread_id) past 128 entries — InMemorySaver exposes a storage dict you can prune).
  2. Use SqliteSaver/AsyncSqliteSaver with a temp file path for the base template (durable, doesn't grow process memory).
  3. At minimum, document the unboundedness in a comment and explicitly tell the reader to swap in a durable checkpointer for production — but a 128-cap to match the other templates is the cleanest fix.


# Reuses one AssistantAgent per session_id so each session keeps its own
# in-process conversation history (best-effort; resets on cold start). Caches up
# to 128 active sessions; the oldest is evicted and its history reset.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The eviction policy is FIFO, not LRU, despite the docstring. Plain dict preserves insertion order, and next(iter(_agents)) returns the oldest-inserted key. Reading an existing entry via _agents[session_id] does not move it to the end, so an actively-used session can be evicted while an idle one sticks around.

Two ways to fix:

  1. Either drop the "LRU" claim from the comment (the policy is FIFO/insertion-order, which is fine if intended), or
  2. Use collections.OrderedDict and call _agents.move_to_end(session_id) on every hit so it's actually LRU.

Same issue applies to the new strands base-template cache (src/assets/python/http/strands/base/main.py ~line 433-438), since the PR description there also describes it as "LRU-style eviction".

(Separate, smaller concern on this template: get_or_create_agent isn't concurrency-safe — two concurrent first-invocations for the same new session_id will both fall through the if session_id not in _agents check, both await get_streamable_http_mcp_tools(), and both build an AssistantAgent; the loser's agent is discarded mid-flight. Not catastrophic, but worth an asyncio.Lock around the create path if you want it tight.)

def get_or_create_agent(session_id{{#if hasSkillsFetcher}}, skill_plugins=None{{/if}}):
if session_id not in cache:
if len(cache) >= 128:
cache.pop(next(iter(cache)))

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FIFO, not LRU — same as the autogen template. next(iter(cache)) returns the oldest-inserted key, and reading cache[session_id] doesn't update its position. Either change the comment to say "FIFO eviction (oldest by insertion order)" or switch to collections.OrderedDict + cache.move_to_end(session_id) on hit for real LRU behavior.

This affects the no-memory branch only — the hasMemory branch (lines ~385-423) is unchanged here and was already unbounded; you may want to consider applying a cap there too for consistency, but that's out of scope of #808/#809 so feel free to defer.

# Caches up to 128 active sessions; LRU eviction silently resets history for
# the oldest session. For production use, replace with a durable session store
# (e.g. SQLiteSession with a file path).
@lru_cache(maxsize=128)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not blocking, just confirming: SQLiteSession(session_id) with no db_path uses an in-memory SQLite DB per-instance, so the lru_cache retention semantics here are: "keep up to 128 in-memory SQLite DBs alive; evicted DBs are GC'd and their history is lost on next access." That matches the docstring and is the correct knob for a starter template. ✅

Unlike the autogen/strands dict-based caches, functools.lru_cache is true LRU — accessing an existing entry moves it to most-recently-used — so no FIFO concern here.

@github-actions github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 25, 2026
@aidandaly24 aidandaly24 changed the title fix(templates): Make each HTTP framework template reuse per-session conversa (#808, #809) fix(templates): key per-session agent state by context.session_id across all HTTP starter templates (#808, #809) Jun 25, 2026
… caches true LRU (aws#808, aws#809)

The googleadk (InMemorySessionService) and langchain_langgraph (InMemorySaver)
templates kept per-session state at module scope with no eviction, trading the
cross-session leak for an unbounded memory leak in long-running processes. Bound
both to 128 active sessions with LRU eviction (delete_session / delete_thread on
the least-recently-used key), matching the other three templates.

The autogen and strands no-memory caches were plain dicts evicting by insertion
order (FIFO) despite docstrings claiming LRU. Switch them to OrderedDict with
move_to_end on hit so the eviction policy is genuinely LRU and the docstrings are
accurate. openaiagents already uses functools.lru_cache (true LRU) and is
unchanged.

Regenerated the assets snapshot.

Refs aws#808
Refs aws#809
@github-actions github-actions Bot removed the size/m PR size: M label Jun 25, 2026
@github-actions github-actions Bot added the size/m PR size: M label Jun 25, 2026
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 25, 2026
@github-actions github-actions Bot added the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 25, 2026
@agentcore-devx-automation

Copy link
Copy Markdown
Contributor

Claude Security Review: no high-confidence findings. (run)

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 25, 2026

@agentcore-cli-automation agentcore-cli-automation left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to merge — all the issues raised in the prior review have been addressed in commit 08bfb25:

  • googleadk: _session_service is now bounded to 128 active sessions via _session_keys OrderedDict with LRU eviction calling delete_session on the evicted entry. ✅
  • langchain_langgraph: _checkpointer is now bounded to 128 active threads via _thread_ids OrderedDict with LRU eviction calling delete_thread. ✅
  • autogen: Switched from plain dict to OrderedDict with move_to_end on hit — now true LRU rather than FIFO. ✅
  • strands: Same OrderedDict + move_to_end treatment for the no-memory cache. ✅
  • openaiagents: Unchanged from the prior review — functools.lru_cache was already correct.

All five HTTP templates are now consistent: a 128-entry per-process cap, true LRU eviction, and context.session_id threaded through. The snapshot is regenerated to match.

A couple of minor things I noticed but won't block on (carried over from the prior review or out of scope):

  • The previously-noted concurrency window in autogen's get_or_create_agent (two concurrent first-invocations for the same new session_id both falling through the miss check) still exists, and the same pattern is now present in strands and googleadk. For a starter template this seems acceptable.
  • The strands hasMemory branch (lines ~387-425) is still an unbounded plain-dict cache, but as the prior reviewer noted that's outside the #808/#809 scope.

No new serious issues found.

@github-actions github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 25, 2026
aws#808)

OrderedDict.popitem() returns (key, value) where key is the (user_id,
session_id) tuple and value is True. The previous unpack assigned the key
tuple to old_user_id and True to old_session_id, so delete_session() was
called with garbage and the real session was never deleted -- the
InMemorySessionService grew unbounded past the 128-session cap, defeating
the aws#808 bound for GoogleADK (LangChain unpacked correctly).

Destructure the key tuple and discard the value so the real ids reach
delete_session. Add an asset test asserting the store caps at 128 with
correct eviction ids, and regenerate the asset snapshot.
@github-actions github-actions Bot added size/m PR size: M and removed size/m PR size: M labels Jun 25, 2026
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 25, 2026
@github-actions github-actions Bot added the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 25, 2026
@agentcore-devx-automation

Copy link
Copy Markdown
Contributor

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 25, 2026

@agentcore-cli-automation agentcore-cli-automation left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed the new commit f3dabe9 (added after the prior approval). Looks good to merge.

f3dabe9 is a real fix. Confirmed by reading the rendered template at src/assets/python/http/googleadk/base/main.py:

  • The key stored in _session_keys is a 2-tuple (user_id, session_id) (line 150), with the value always True.
  • OrderedDict.popitem(last=False) returns (key, value), so the outer 2-tuple is (tuple_key, True).
  • The previous form old_user_id, old_session_id = popitem(...) unpacked the outer pair, assigning the whole inner key tuple to old_user_id and True to old_session_iddelete_session(user_id=<tuple>, session_id=True) would have errored or no-op'd, leaving the underlying InMemorySessionService to grow past 128 unbounded. The new form (old_user_id, old_session_id), _ = popitem(...) destructures correctly.
  • LangChain (line 75: evicted, _ = _thread_ids.popitem(last=False)) uses a single-string key, so its unpack was already correct — the commit message's parenthetical is accurate. ✅
  • The other LRU caches (autogen line 107, strands line 441) just call cache.popitem(last=False) and discard the result, which is fine since they don't need an async cleanup callback like delete_session. ✅

Tests: The new googleadk-session-eviction.test.ts has two effective regression guards — the substring assertion for the fixed form and the negative-match guard against the buggy form (lines 59 and 61) — which is exactly what's needed to prevent this from regressing. The simulateEviction portion (lines 27-51, 68-85) is a JS re-implementation of the Python eviction algorithm, so the caps the store at 128 assertion isn't actually exercising the template; it's documenting the intended algorithm. Not blocking, but if you wanted that assertion to mean something, you'd need to actually run the rendered template under Python — not worth the infra for a starter-template asset test, the two string guards are sufficient.

No new serious issues. The unaddressed-but-out-of-scope items from the prior review (concurrency window in get_or_create_agent, unbounded hasMemory strands branch) still stand and are fine to defer.

@github-actions github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 25, 2026
@aidandaly24 aidandaly24 marked this pull request as ready for review June 25, 2026 16:42
@aidandaly24 aidandaly24 requested a review from a team June 25, 2026 16:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/m PR size: M

Projects

None yet

2 participants