Update 2026-05-25: I've edited this issue to clarify the framing. The original wording made stronger empirical claims ("I've been running a prototype", "~70% reduction", "~55% daily cost drop") than I can actually support — the design is from reading the codebase, not from a measured production deployment. The design discussion below stands; the specific numbers and personal use-case framing have been removed.
Summary
I'd like to gauge interest in a cross-agent shared memory store with on-demand capsule recall for AutoGen — sitting alongside (not replacing) each agent's per-instance memory, scoped to {agent, group, global}, retrieved as small capsules via a tool surface rather than prefix-loaded into every agent's context.
In a GroupChat (or any multi-agent runtime) today, three patterns recur:
- Shared facts duplicated N times — user preferences ("respond in markdown"), project conventions ("use Pydantic v2 only"), and hard policies ("no PII in logs") live separately in every agent's memory config. Updating one means updating N; drift is structurally inevitable.
- Cross-agent context handoff is via messages — when agent A learns a durable fact, the only way for agent B to know is for the GroupChat manager to forward it as a message, which (a) requires manager logic and (b) doubles context cost for B.
- Memory grows monotonically in prefix — most memory-config implementations load the full memory into every turn's prompt prefix, so a long-lived crew with weeks of accumulated facts pays prefix cost on each turn even when most facts are irrelevant to the current task.
Proposal: an opt-in SharedMemoryStore registered at the AgentRuntime level (or attached to a GroupChat), with three retrieval scopes and a tool-shaped surface (memory_search / memory_remember / memory_forget) so capsules land in tool-result position, not prefix.
Why three scopes
The scope dimension is the part that doesn't exist today and is the reason to have a shared store at all:
- agent: per-agent facts; same as today's per-agent memory_config, but behind a unified API. Examples: an agent's own self-correction notes, agent-specific tool quirks.
- group: facts shared within a single GroupChat / runtime but not across crews. Examples: facts about the current task, a shared scratchpad, decisions the team has made.
- global: facts shared across all groups/runtimes in this deployment. Examples: user preferences, project conventions, hard rules.
Today's per-agent stores can simulate group via manual forwarding and global via shared config files, but neither has a clean retrieval API — and neither supports capsule-style recall, so the prefix keeps growing.
Design sketch
┌──────────────────────────────────────────┐
│ AgentRuntime │
│ ├── shared_memory: SharedMemoryStore │
│ │ ├── global.db (cross-runtime) │
│ │ ├── group_{id}.db (per GroupChat)│
│ │ └── agent_{id}.db (per agent) │
│ └── agents: [ConversableAgent, ...] │
└──────────────────────────────────────────┘
Tool surface (registered automatically on each agent):
memory_search(query: str, scope: "agent"|"group"|"global"|"all",
              top_k: int = 5, max_capsule_bytes: int = 2048)
    → returns ranked facts as a ≤2KB capsule, tool-result not prefix
memory_remember(fact: str, scope: "agent"|"group"|"global",
                confidence: float = 1.0, ttl_days: int|None = None)
    → adds to store; auto-tagged with agent_id + timestamp
memory_forget(fact_id: str)
    → soft-delete; tombstone preserved for audit
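To make the scope argument concrete, here is a minimal sketch of how a requested scope might map onto the per-scope files shown in the design sketch. The function name resolve_scope_paths and the narrowest-first ordering for "all" are assumptions for illustration, not part of AutoGen or a settled design.

```python
from typing import List

# Narrowest scope first, so "all" consults agent-local facts before shared ones.
SCOPES = ("agent", "group", "global")


def resolve_scope_paths(scope: str, agent_id: str, group_id: str) -> List[str]:
    """Return the store files a request with this scope would read (hypothetical helper)."""
    paths = {
        "agent": f"agent_{agent_id}.db",
        "group": f"group_{group_id}.db",
        "global": "global.db",
    }
    if scope == "all":
        return [paths[s] for s in SCOPES]
    if scope not in paths:
        raise ValueError(f"unknown scope: {scope!r}")
    return [paths[scope]]
```

Only memory_search accepts "all" in the tool surface above; a write path would presumably reject it, since a fact has to live in exactly one scope.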
Storage: SQLite per scope + FTS5 for text search. Optional embedding column (any configured embedding provider) for semantic recall. Stays additive — current memory_config paths are unchanged.
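As a rough illustration of the storage idea, the sketch below keeps one fact table plus an FTS5 index per scope and implements remember / search / forget against it. The class name ScopeStore, the column names, and the BM25 ordering are assumptions made for this example; it also assumes the local SQLite build has FTS5 compiled in, and it leaves out the optional embedding column.

```python
import sqlite3
import time
import uuid
from typing import List, Optional, Tuple


class ScopeStore:
    """One SQLite database holding the facts of a single scope (hypothetical class)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.executescript(
            """
            CREATE TABLE IF NOT EXISTS facts (
                id TEXT PRIMARY KEY,
                fact TEXT NOT NULL,
                agent_id TEXT NOT NULL,
                confidence REAL NOT NULL,
                created_at REAL NOT NULL,
                expires_at REAL,
                deleted_at REAL
            );
            CREATE VIRTUAL TABLE IF NOT EXISTS facts_fts
                USING fts5(fact, fact_id UNINDEXED);
            """
        )

    def remember(self, fact: str, agent_id: str, confidence: float = 1.0,
                 ttl_days: Optional[int] = None) -> str:
        """Insert a fact, tagged with the writing agent and a timestamp."""
        fact_id = uuid.uuid4().hex
        now = time.time()
        expires = now + ttl_days * 86400 if ttl_days is not None else None
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT INTO facts VALUES (?, ?, ?, ?, ?, ?, NULL)",
                (fact_id, fact, agent_id, confidence, now, expires),
            )
            self.db.execute(
                "INSERT INTO facts_fts (fact, fact_id) VALUES (?, ?)",
                (fact, fact_id),
            )
        return fact_id

    def search(self, query: str, top_k: int = 5) -> List[Tuple[str, str]]:
        """Return (fact_id, fact) pairs, best match first, skipping dead facts.

        The query is passed to FTS5 as-is, so FTS5 query syntax applies;
        a real tool would need to escape or sanitize model-written queries.
        """
        return self.db.execute(
            """
            SELECT f.id, f.fact
            FROM facts_fts JOIN facts AS f ON f.id = facts_fts.fact_id
            WHERE facts_fts MATCH ?
              AND f.deleted_at IS NULL
              AND (f.expires_at IS NULL OR f.expires_at > ?)
            ORDER BY bm25(facts_fts)
            LIMIT ?
            """,
            (query, time.time(), top_k),
        ).fetchall()

    def forget(self, fact_id: str) -> bool:
        """Soft-delete: the row stays as a tombstone, but search no longer returns it."""
        with self.db:
            cur = self.db.execute(
                "UPDATE facts SET deleted_at = ? WHERE id = ? AND deleted_at IS NULL",
                (time.time(), fact_id),
            )
        return cur.rowcount == 1
```

Keeping the tombstone in the main table, rather than deleting the row, is what would let an audit query still see who wrote a fact and when it was forgotten.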
Conflict handling: memory_remember to global from agent A while agent B is mid-turn — B's next memory_search sees the new fact; existing turn isn't interrupted. No transactional cross-agent guarantee; eventual consistency is intentional (matches the conversation grain).
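The visibility behaviour described above can be illustrated with two plain SQLite connections standing in for agents A and B. This is only a sketch of the read-after-commit semantics; the function name demo_visibility and the one-column table are invented for the example.

```python
import os
import sqlite3
import tempfile
from typing import List


def demo_visibility() -> List[list]:
    """Show that B's next read sees a fact A committed in between (hypothetical demo)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "global.db")
        agent_a = sqlite3.connect(path)
        agent_b = sqlite3.connect(path)
        try:
            agent_a.execute("CREATE TABLE facts (fact TEXT)")
            agent_a.commit()

            # B reads while "mid-turn": nothing is stored yet.
            before = agent_b.execute("SELECT fact FROM facts").fetchall()

            # A remembers a fact; no lock is held across B's turn.
            agent_a.execute("INSERT INTO facts VALUES (?)", ("no PII in logs",))
            agent_a.commit()

            # B's next search simply sees the committed row.
            after = agent_b.execute("SELECT fact FROM facts").fetchall()
        finally:
            agent_a.close()
            agent_b.close()
    return [before, after]
```

Nothing here coordinates the two agents beyond SQLite's own commit visibility, which is the intended level of guarantee.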
Capsule recall, not prefix-load: this is the part that distinguishes from existing per-agent memory configs. The store is not loaded into the system prompt on every turn; agents have to ask for what they need via the tool. Cost: agents need to learn to query (handled via the system-prompt scaffolding the runtime can inject); benefit: prompt prefix stays small even as memory grows.
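A capsule could be assembled by packing ranked facts into the byte budget and stopping at the first fact that does not fit. The sketch below is one possible shape; build_capsule and the bullet formatting are assumptions, and only max_capsule_bytes comes from the tool surface above.

```python
from typing import Iterable, List


def build_capsule(ranked_facts: Iterable[str], max_capsule_bytes: int = 2048) -> str:
    """Pack facts, best first, into a text capsule of at most max_capsule_bytes (UTF-8)."""
    lines: List[str] = []
    used = 0
    for fact in ranked_facts:
        line = f"- {fact}"
        # Every line after the first also pays one byte for the joining newline.
        cost = len(line.encode("utf-8")) + (1 if lines else 0)
        if used + cost > max_capsule_bytes:
            break  # stop at the first overflow so rank order is preserved
        lines.append(line)
        used += cost
    return "\n".join(lines)
```

Stopping rather than skipping means a long high-ranked fact can crowd out shorter lower-ranked ones; skipping oversized facts instead would be an equally defensible choice.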
What this is NOT
- Not a replacement for existing per-agent memory configs (whether agentchat's memory hooks or core's ChatCompletionContext): they stay, and this is additive
- Not a vector database adapter — SQLite + FTS5 is the v1 target; embedding column is optional and pluggable
- Not a distributed system — single-process / single-runtime; multi-host autogen would need a backend adapter (out of scope for v1)
- Not opinionated on what to store — store is a flat fact table; structuring (topics, entities, relations) is up to the caller
Why this matters in practice
A representative shape (code-reading argument, not measured production data): consider a multi-agent pipeline where 3 agents need shared knowledge of "current week's policy updates". Today each agent either loads the policy text into its own system prefix (3× cost) or re-derives it from a shared file at each turn (no caching). A global-scope shared store retrieved via memory_search("current policy") only on policy-relevant turns would address both: single source of truth, retrieval cost paid only when the agent's reasoning surfaces the question.
Would be very interested in whether maintainers have seen similar duplication patterns in real core / agentchat deployments and how they think about it today.
Questions before opening anything
- Roadmap conflict? AutoGen's core has been evolving fast; is there an in-flight memory abstraction (MemoryProtocol, ChatCompletionContext extensions, etc.) that this should align with?
- Scope preference:
  - (a) minimal demo PR: just SharedMemoryStore + the three tools + global scope only, no embedding
  - (b) full version with all three scopes + embedding + auto-injected system-prompt scaffolding teaching agents to query
  - (c) ship as an extension package (autogen-shared-memory); just expose the registration hook
- Where to register: AgentRuntime constructor (cleanest, but only core users) vs GroupChat (works for agentchat users but ties to chat metaphor) vs both?
- global lifecycle: cross-runtime sharing requires picking a deployment-wide path. Convention (~/.autogen/shared_memory/global.db) or explicit config?
Posting to gauge interest before moving toward a PR.
Thanks!