RFC: Cross-agent shared memory store with on-demand capsule recall (agent/group/global scopes)

> **Update 2026-05-25**: I've edited this issue to clarify the framing. The original wording made stronger empirical claims ("I've been running a prototype", "~70% reduction", "~55% daily cost drop") than I can actually support — the design is from reading the codebase, not from a measured production deployment. The design discussion below stands; the specific numbers and personal use-case framing have been removed.

---

## Summary

I'd like to gauge interest in a **cross-agent shared memory store with on-demand capsule recall** for AutoGen — sitting alongside (not replacing) each agent's per-instance memory, scoped to `{agent, group, global}`, retrieved as small capsules via a tool surface rather than prefix-loaded into every agent's context.

In a `GroupChat` (or any multi-agent runtime) today, three patterns recur:

1. **Shared facts duplicated N times** — user preferences ("respond in markdown"), project conventions ("use Pydantic v2 only"), and hard policies ("no PII in logs") live separately in every agent's memory config. Updating one means updating N; drift is structurally inevitable.
2. **Cross-agent context handoff is via messages** — when agent A learns a durable fact, the only way for agent B to know is for the GroupChat manager to forward it as a message, which (a) requires manager logic and (b) doubles context cost for B.
3. **Memory grows monotonically in prefix** — most memory-config implementations load the full memory into every turn's prompt prefix, so a long-lived crew with weeks of accumulated facts pays prefix cost on each turn even when most facts are irrelevant to the current task.

Proposal: an opt-in `SharedMemoryStore` registered at the `AgentRuntime` level (or attached to a `GroupChat`), with three retrieval scopes and a tool-shaped surface (`memory_search` / `memory_remember` / `memory_forget`) so capsules land in tool-result position, not prefix.

## Why three scopes

The scope dimension is the part that doesn't exist today and is the reason to have a *shared* store at all:

- **`agent`** — per-agent facts; same as today's per-agent memory_config but unified API. Examples: agent's own self-correction notes, agent-specific tool quirks.
- **`group`** — facts shared within a single `GroupChat` / runtime but not across crews. Examples: facts about the current task, shared scratchpad, decisions the team has made.
- **`global`** — facts shared across all groups/runtimes in this deployment. Examples: user preferences, project conventions, hard rules.

Today's per-agent stores can simulate `group` via manual forwarding and `global` via shared config files, but neither has a clean retrieval API — and neither supports capsule-style recall, so the prefix keeps growing.

## Design sketch

```
┌──────────────────────────────────────────┐
│  AgentRuntime                             │
│   ├── shared_memory: SharedMemoryStore   │
│   │     ├── global.db   (cross-runtime)  │
│   │     ├── group_{id}.db  (per GroupChat)│
│   │     └── agent_{id}.db  (per agent)    │
│   └── agents: [ConversableAgent, ...]    │
└──────────────────────────────────────────┘

Tool surface (registered automatically on each agent):

  memory_search(query: str, scope: "agent"|"group"|"global"|"all",
                top_k: int = 5, max_capsule_bytes: int = 2048)
      → returns ranked facts as a ≤2KB capsule, tool-result not prefix

  memory_remember(fact: str, scope: "agent"|"group"|"global",
                  confidence: float = 1.0, ttl_days: int|None = None)
      → adds to store; auto-tagged with agent_id + timestamp

  memory_forget(fact_id: str)
      → soft-delete; tombstone preserved for audit
```

**Storage**: SQLite per scope + FTS5 for text search. Optional embedding column (any configured embedding provider) for semantic recall. Stays additive — current `memory_config` paths are unchanged.

**Conflict handling**: `memory_remember` to `global` from agent A while agent B is mid-turn — B's next `memory_search` sees the new fact; existing turn isn't interrupted. No transactional cross-agent guarantee; eventual consistency is intentional (matches the conversation grain).

**Capsule recall, not prefix-load**: this is the part that distinguishes from existing per-agent memory configs. The store is *not* loaded into the system prompt on every turn; agents have to *ask* for what they need via the tool. Cost: agents need to learn to query (handled via the system-prompt scaffolding the runtime can inject); benefit: prompt prefix stays small even as memory grows.

## What this is NOT

- **Not** a replacement for existing per-agent memory configs (whether `agentchat`'s memory hooks or `core`'s `ChatCompletionContext`) — they stay, this is additive
- **Not** a vector database adapter — SQLite + FTS5 is the v1 target; embedding column is optional and pluggable
- **Not** a distributed system — single-process / single-runtime; multi-host autogen would need a backend adapter (out of scope for v1)
- **Not** opinionated on what to store — store is a flat fact table; structuring (topics, entities, relations) is up to the caller

## Why this matters in practice

A representative shape (code-reading argument, not measured production data): consider a multi-agent pipeline where 3 agents need shared knowledge of "current week's policy updates". Today each agent either loads the policy text into its own system prefix (3× cost) or re-derives it from a shared file at each turn (no caching). A `global`-scope shared store retrieved via `memory_search("current policy")` only on policy-relevant turns would address both: single source of truth, retrieval cost paid only when the agent's reasoning surfaces the question.

Would be very interested in whether maintainers have seen similar duplication patterns in real `core` / `agentchat` deployments and how they think about it today.

## Questions before opening anything

1. **Roadmap conflict?** AutoGen's `core` has been evolving fast; is there an in-flight memory abstraction (`MemoryProtocol`, `ChatCompletionContext` extensions, etc.) that this should align with?
2. **Scope preference**:
   - **(a)** minimal demo PR — just `SharedMemoryStore` + the three tools + `global` scope only, no embedding
   - **(b)** full version with all three scopes + embedding + auto-injected system-prompt scaffolding teaching agents to query
   - **(c)** ship as an extension package (`autogen-shared-memory`); just expose the registration hook
3. **Where to register** — `AgentRuntime` constructor (cleanest, but only `core` users) vs `GroupChat` (works for `agentchat` users but ties to chat metaphor) vs both?
4. **`global` lifecycle** — cross-runtime sharing requires picking a deployment-wide path. Convention (`~/.autogen/shared_memory/global.db`) or explicit config?

Posting to gauge interest before moving toward a PR.

Thanks!


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: Cross-agent shared memory store with on-demand capsule recall (agent/group/global scopes) #7748

Summary

Why three scopes

Design sketch

What this is NOT

Why this matters in practice

Questions before opening anything

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

RFC: Cross-agent shared memory store with on-demand capsule recall (agent/group/global scopes) #7748

Description

Summary

Why three scopes

Design sketch

What this is NOT

Why this matters in practice

Questions before opening anything

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions