Skip to content

fix(templates): give TypeScript templates per-session in-process short-term memory#1666

Open
aidandaly24 wants to merge 3 commits into
mainfrom
fix/ts-template-in-process-stm
Open

fix(templates): give TypeScript templates per-session in-process short-term memory#1666
aidandaly24 wants to merge 3 commits into
mainfrom
fix/ts-template-in-process-stm

Conversation

@aidandaly24

Copy link
Copy Markdown
Contributor

Description

The TypeScript agent templates did not provide correct per-session short-term memory — in-process, turn-to-turn conversation recall, which is distinct from the AgentCore Memory service.

Strands (memory: none) cached a single global Agent, so a process serving more than one session shared one conversation history across all of them. On AgentCore Runtime the 1:1 session→microVM model masks this, but in agentcore dev (one Node process, many sessions) it caused cross-session history bleed and unbounded growth. Reproduced: session A sets "favorite color is teal"; a different session B then asks and is told "teal".

VercelAI called streamText({ prompt }) with no history, so it could not recall earlier turns even within a single session.

This change keys each template's in-process history by sessionId, bounded by an insertion-ordered Map acting as an LRU(128) — mirroring the Python templates. On AgentCore Runtime each microVM serves a single session, so this holds one entry; the bound only matters for many-session processes such as local dev. No AWS service is provisioned: this is free, best-effort short-term memory that resets on cold start. The longAndShortTerm (memory-service) Strands branch is unchanged.

A prior change had disabled TypeScript memory because the Strands TS SDK lacked the AgentCore Memory session manager; that correctly removed the durable service integration but also removed the free in-process continuity, which never needed the session manager. This restores it.

Related Issue

Closes #1665

Documentation PR

N/A — templates are self-documenting via inline comments; no user-facing docs change.

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update
  • Other (please describe):

Testing

How have you tested the change?

  • I ran npm run test:unit and npm run test:integ
  • I ran npm run typecheck
  • I ran npm run lint
  • If I modified src/assets/, I ran npm run test:update-snapshots and committed the updated snapshots

Additionally, both templates were rendered, type-checked (tsc --noEmit), and exercised end-to-end against deployed AgentCore Runtime and a local server (account-isolated test account):

  • Same sessionId across two turns → second turn recalls the first (Strands and VercelAI).
  • Two distinct sessionIds in one process → histories stay isolated (the bleed is gone).
  • Deployed Strands runtime regression → same---session-id recall still works.

Note on test:integ: the unit suite (5454 tests) and the asset snapshot suite pass. Integration tests that require provisioned AWS infra were not run locally; the behavioral change was instead validated by the real-deploy exercises above.

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

…t-term memory

The TypeScript Strands no-memory template cached a single global Agent
shared across every session, so a process serving multiple sessions (e.g.
`agentcore dev`) leaked conversation history between them and grew without
bound. The VercelAI template was stateless and could not recall earlier
turns even within one session.

Key both templates' in-process history by sessionId, bounded by an
insertion-ordered Map acting as an LRU(128) — mirroring the Python
templates. On AgentCore Runtime each microVM serves a single session, so
this holds one entry; the bound only matters for many-session processes
like local dev. No AWS service is provisioned; this is free, best-effort
short-term memory that resets on cold start.

Verified on deployed AgentCore Runtime: same-session recall works,
distinct sessions stay isolated, and bare invokes are unaffected.
@aidandaly24 aidandaly24 requested a review from a team June 29, 2026 18:24
@github-actions github-actions Bot added the size/s PR size: S label Jun 29, 2026
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 29, 2026
@github-actions github-actions Bot added the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 29, 2026
@agentcore-devx-automation

Copy link
Copy Markdown
Contributor

Claude Security Review: no high-confidence findings. (run)

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 29, 2026
@github-actions

Copy link
Copy Markdown
Contributor

Package Tarball

aws-agentcore-0.21.1.tgz

How to install

gh release download pr-1666-tarball --repo aws/agentcore-cli --pattern "*.tgz" --dir /tmp/pr-tarball
npm install -g /tmp/pr-tarball/aws-agentcore-0.21.1.tgz

@agentcore-cli-automation agentcore-cli-automation left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing the cross-session bleed — the approach mirrors the Python templates cleanly and the bounded LRU is a nice touch.

I think there is one correctness issue in the Vercel AI template around error handling that should be addressed before merging: a failure during streamText (synchronous throw, async error, or mid-stream failure) leaves the user turn appended to messages without a corresponding assistant turn, which corrupts subsequent turns on the same session (Bedrock/Anthropic reject consecutive user messages). See the inline comment for options.

Minor concurrency note (non-blocking): two concurrent requests against the same sessionId will interleave pushes into the shared messages array. Unlikely on AgentCore Runtime (one session per microVM) and the feature is documented as best-effort, so I'd be fine leaving this as-is — flagging only in case you want to call it out in the comment.

assistant += chunk;
yield { data: chunk };
}
messages.push({ role: 'assistant', content: assistant });

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Failed turns leave the history in a broken state. The user message is pushed at line 38 before streamText runs, and the assistant message is only pushed at line 52 after the stream completes successfully. If streamText throws, the for await errors mid-stream, or the generator is aborted by the runtime, the user turn stays in messages with no matching assistant turn. On the next invocation for the same sessionId, the history sent to the model will contain two consecutive user messages, which providers like Bedrock/Anthropic strict-validate and reject — so a single transient error effectively poisons the session for the rest of the process's lifetime.

A few ways to fix:

  1. Stage the user message and only commit on success. Build the request as [...messages, { role: 'user', content: prompt }], and only messages.push(userMsg, assistantMsg) after the stream drains successfully.
  2. Wrap in try/finally and roll back on failure. Push the user message up front, then in a catch/finally either pop the trailing user message or push a placeholder assistant message before re-throwing.
  3. Snapshot-and-restore on error. Capture messages.length before the call; on error, truncate back to that length.

Option 1 is the cleanest IMO and matches how most history-managing examples in the ai SDK docs do it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in ccb52f3 with Option 1 (stage, commit on success), with one adjustment based on what I observed testing it.

I reproduced the failure against Bedrock and the mechanism is slightly different from a throw: a provider-level streamText failure surfaces as a silently-empty textStream — the for await completes with zero chunks and does not throw. So a plain try/catch around the loop never fires; the old code committed the user turn plus an empty-content assistant turn, and the next turn for that session was rejected with The content field in the Message object at messages.N is empty.

The fix builds the request from a staged copy and only commits once a non-empty reply has streamed:

const userMessage: ModelMessage = { role: 'user', content: payload.prompt ?? '' };
const result = streamText({ model, system: SYSTEM_PROMPT, messages: [...history, userMessage] });
let assistant = '';
for await (const chunk of result.textStream) { assistant += chunk; yield { data: chunk }; }
if (assistant.length > 0) {
  history.push(userMessage, { role: 'assistant', content: assistant });
}

The assistant.length > 0 guard covers both the silent-empty case and a genuine mid-stream throw (the push is simply never reached), so a failed turn is dropped cleanly instead of poisoning the session. Verified end-to-end: a failed turn leaves history empty, and a subsequent good turn on the same session works and recalls correctly.

On the concurrency note — agreed it's not worth locking given one-session-per-microVM on Runtime and the best-effort framing; the inline comment now states the history-consistency reasoning.

@github-actions github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 29, 2026
@github-actions

github-actions Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Coverage Report

Status Category Percentage Covered / Total
🔵 Lines 38% 14079 / 37042
🔵 Statements 37.29% 14992 / 40200
🔵 Functions 32.61% 2418 / 7413
🔵 Branches 31.89% 9366 / 29362
Generated in workflow #3921 for commit 1540ed6 by the Vitest Coverage Report Action

return cachedAgent;
if (agentCache.size >= AGENT_CACHE_LIMIT) {
const oldest = agentCache.keys().next().value;
if (oldest !== undefined) agentCache.delete(oldest);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qq: do we need to do any clean up here aside from just purging oldest agent?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question — no, purging the oldest agent is sufficient; there's nothing else to tear down.

I checked what a Strands TS Agent holds and what the template gives it:

  • The Agent has no close/dispose/[Symbol.asyncDispose] — it's a plain object holding in-memory state (messages, modelState, the tool/conversation registries). Dropping the last reference makes it GC-eligible; there are no per-agent sockets, file handles, or timers to release.
  • The only things that could hold a connection — the MCP clients and the model — are module-level singletons in the template (const mcpClients = [...], const tools = [...] at top scope), shared by reference across every cached agent. They're intentionally not per-agent, so eviction must not close them (other live sessions are still using them); they persist for the process lifetime by design.

So eviction is just delete of the map entry, matching the Python agent_factory (cache.popitem() with no teardown). If a future change gave each agent its own MCP client, that's when we'd need a disposal hook — but that's not the case here.

nborges-aws
nborges-aws previously approved these changes Jun 29, 2026
streamText errors surface as a silently-empty textStream (the for-await
completes with zero chunks rather than throwing), so the previous code
committed the user turn plus an empty assistant turn. On the next
invocation for the same sessionId, the provider rejected the empty
assistant content, poisoning the session for the rest of the process.

Build the request from a staged copy ([...history, userMessage]) and only
push the exchange into the persistent history after a non-empty reply, so
a failed turn is dropped cleanly. Addresses PR review feedback.
@github-actions github-actions Bot added size/s PR size: S and removed size/s PR size: S labels Jun 29, 2026
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 29, 2026
@github-actions github-actions Bot added the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 29, 2026
@agentcore-devx-automation

Copy link
Copy Markdown
Contributor

Claude Security Review: no high-confidence findings. (run)

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 29, 2026

@agentcore-cli-automation agentcore-cli-automation left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fix overall — keying in-process history by sessionId and bounding the cache via Map-as-LRU mirrors the Python templates cleanly, and the VercelAI commit-on-non-empty refinement in ccb52f3 is solid.

One thing that I think still needs to be addressed before merge: the same "failed turn poisons the session" hazard that the prior review surfaced for VercelAI also applies to the no-memory Strands branch, and the current change doesn't guard against it. Details inline.

{{else}}
const agent = await getOrCreateAgent();
const sessionId = context?.sessionId ?? 'default-session';
const agent = await getOrCreateAgent(sessionId);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mid-stream failure on a cached Strands Agent poisons the session — same class of bug as the one we just fixed in VercelAI, but on this branch it survives because nothing rolls back the agent's internal state.

Looking at @strands-agents/sdk 1.7.0 (dist/src/agent/agent.js), Agent.stream(prompt) appends the user message to this.messages before invoking the model:

// Normalize input and append user messages on first invocation only
if (currentArgs !== undefined) {
  const messagesToAppend = this._normalizeInput(currentArgs);
  for (const message of messagesToAppend) {
    yield this._appendMessage(message, invocationState);   // <-- user msg committed here
  }
  ...
}
...
const modelResult = yield* this._invokeModel(invocationState, structuredOutputChoice);

The outer catch (error) only does special handling for CancelledError / InterruptError; a generic error from _invokeModel (e.g. Bedrock throttling, transient provider error) just re-throws after closing spans — this.messages keeps the dangling user turn. Because we now cache one Agent per sessionId, the next invocation on that session calls _normalizeInput again and appends another user message, so the agent's history ends with two consecutive user messages and Bedrock/Anthropic providers strict-reject the request from then on. Single transient error → session permanently broken for the lifetime of the process.

Before this PR, cachedAgent was a single global, so this was already latent — but limited to "the whole process gets stuck" rather than per-session. Now that we're explicitly selling "per-session in-process short-term memory", the failure mode is more visible and worth closing.

A few ways to fix:

  1. Snapshot/restore via the SDK. Agent exposes takeSnapshot({ include: ['messages'] }) and loadSnapshot(snapshot) (see dist/src/agent/agent.d.ts). Before agent.stream(...), take a messages snapshot; in a catch, loadSnapshot it and rethrow. Cleanest because it uses the supported public API.
  2. Length-based truncation. Capture agent.messages.length before the loop; on error, splice back to that length and rethrow. Lower-level but doesn't depend on snapshot semantics.
  3. Evict on error. Wrap the for await in try/catch; on error agentCache.delete(sessionId) and rethrow. Simpler, but the session loses all prior history on any transient failure — fine as a stopgap if the SDK snapshot API isn't stable yet.

Option 1 matches the SDK's own "keep agent.messages reinvokable at all times" design intent (per the deferred-append comment in agent.js) and is what I'd lean toward. Option 3 is the minimal viable fix if you'd rather not introduce a snapshot dependency in the template.

Worth verifying by reproducing the same way the VercelAI case was: force a provider error on turn 1 (e.g. invalid model id, or temporarily throttled), then send turn 2 on the same --session-id and confirm it isn't rejected with a "consecutive user messages" / role-validation error from Bedrock.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 1540ed6 with Option 1 (snapshot/restore via the SDK's takeSnapshot/loadSnapshot).

I reproduced it as you suggested and want to share the nuance I found, because it changes the severity (not the fix):

The dangling user turn is real — after a forced turn-1 failure on a cached agent, agent.messages is left as ["user"], and a subsequent turn appends another, giving ["user","user","assistant"]. Strands' model adapters don't merge consecutive same-role messages (AnthropicModel._formatMessages is a plain .map), so the doubled user turn is sent as-is.

But whether it actually breaks is provider-dependent:

  • Bedrock (the template's default modelProvider): I tested both via the Agent and a raw ConverseCommand with two consecutive user messages — both accepted, and the model answered correctly. So on the default provider the session is not poisoned.
  • Anthropic / OpenAI direct: those APIs require strict user/assistant alternation, so the doubled user turn would be rejected from then on.

Since the template ships all four providers, I fixed it regardless rather than rely on Bedrock's leniency.

The guard snapshots messages before the stream and restores on error, scoped to the no-memory branch only — the hasMemory branch persists through AgentCoreMemorySessionManager (server-side events via hooks), where rolling back local agent.messages wouldn't undo the server write, so it deliberately keeps its existing try/finally … flush() shape.

const snapshot = agent.takeSnapshot({ include: ['messages'] });
try {
  for await (const event of agent.stream(payload.prompt ?? '')) { /* … */ }
} catch (error) {
  agent.loadSnapshot(snapshot);
  throw error;
}

Verified end-to-end through the BedrockAgentCoreApp handler with a model that throws on turn 1: turn 1 returns an SSE error and rolls back, and turn 2 on the same sessionId runs cleanly with ["user","assistant"] history instead of a poisoned ["user","user",…].

@github-actions github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 29, 2026
Agent.stream() appends the user message to the cached agent's history
before invoking the model, and a generic mid-stream error (e.g. throttling
or a transient provider error) re-throws without rolling that back. Since
the no-memory branch now caches one Agent per sessionId, the next turn on
that session would append a second user message, leaving consecutive user
turns that providers requiring strict role alternation (e.g. Anthropic)
reject — poisoning the session for the process lifetime.

Snapshot the agent's messages before streaming and restore them in a catch
before re-throwing, using the SDK's takeSnapshot/loadSnapshot. Scoped to the
no-memory branch; the memory branch persists via the session manager and
must not roll back local state. Verified end-to-end through the runtime
handler: a forced turn-1 failure leaves history clean and the next turn on
the same session succeeds. Addresses PR review feedback.
@github-actions github-actions Bot added size/m PR size: M and removed size/s PR size: S labels Jun 29, 2026
@agentcore-devx-automation agentcore-devx-automation Bot added the claude-security-reviewing Claude Code /security-review in progress label Jun 29, 2026
@github-actions github-actions Bot added the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 29, 2026
@agentcore-devx-automation

Copy link
Copy Markdown
Contributor

Claude Security Review: no high-confidence findings. (run)

@agentcore-devx-automation agentcore-devx-automation Bot removed the claude-security-reviewing Claude Code /security-review in progress label Jun 29, 2026

@agentcore-cli-automation agentcore-cli-automation left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM — all serious issues raised in earlier reviews have been addressed in the latest two commits, and I don't have anything new to flag.

What I checked:

  • VercelAI failed/empty-turn handling (ccb52f30): Building the streamText request from [...history, userMessage] (rather than pushing first) and only committing on assistant.length > 0 is correct. A mid-stream throw propagates out of the for await before the commit, and a silent-empty stream is dropped — either way history is never mutated for a failed turn, so the next turn on the same session won't be poisoned with a dangling user message or empty assistant message.
  • Strands snapshot/restore (1540ed67): I confirmed Agent#takeSnapshot/loadSnapshot are part of the public API in @strands-agents/sdk 1.5.0 (the pinned ^1.5.0 floor), and that loadSnapshot with a { include: ['messages'] } snapshot truncates and re-pushes deep copies of the messages array (agent.messages.length = 0 + Message.fromJSON(...)). So a mid-_invokeModel error after Strands appends the user turn is correctly rolled back, and the cached Agent stays reusable. Restricting the guard to the {{else}} (no-memory) branch is the right call — on hasMemory, server-side memory writes via the session manager wouldn't be undone by a local rollback.
  • LRU bound: Map-as-LRU with delete+set on hit and keys().next().value eviction on overflow is correct insertion-ordered LRU semantics.
  • Snapshot test: the snapshot file matches the template sources verbatim, so npm run test:snapshots will pass.

Pre-existing/non-blocking (not introduced by this PR, just noting):

  • The hasMemory Strands branch's agentCache is still unbounded, unlike the no-memory branch's LRU(128). Consistent with the Python templates' agent_factory, so fine to leave for now.
  • The PR description says this "mirrors the Python templates", but the Python http/strands/base/main.py no-memory branch actually keeps a single global _agent, not a per-session LRU. The TS change here is strictly better than the Python equivalent in the no-memory path; just calling out for accuracy.

Nothing blocking from me — approving.

@github-actions github-actions Bot removed the agentcore-harness-reviewing AgentCore Harness review in progress label Jun 29, 2026
@aidandaly24

Copy link
Copy Markdown
Contributor Author

Thanks for the thorough re-review. One factual correction on the non-blocking note: the Python http/strands/base/main.py no-memory branch is already a per-session LRU, not a single global agent — it uses agent_factory() with an OrderedDict keyed by session_id, move_to_end on hit, and popitem(last=False) eviction at 128:

def agent_factory():
    cache = OrderedDict()
    def get_or_create_agent(session_id, ...):
        if session_id in cache:
            cache.move_to_end(session_id); return cache[session_id]
        if len(cache) >= 128:
            cache.popitem(last=False)
        cache[session_id] = Agent(...)
        return cache[session_id]

The single-global _agent was the pre-existing state; it was converted to this per-session LRU across all five Python templates in d791bfa ("key per-session agent state by context.session_id … (#808, #809)"). So "mirrors the Python templates" is accurate — this TS change brings the no-memory branch to parity with that pattern (Map-as-LRU ↔ OrderedDict), which is why the bound and keying match. Leaving the description as-is.

Agreed on the other note: the hasMemory Strands agentCache being unbounded matches the Python hasMemory branch and is fine to leave.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/m PR size: M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TypeScript templates: no per-session short-term memory (Strands leaks history across sessions; VercelAI is stateless)

3 participants