RFC: Cross-agent shared memory store with on-demand capsule recall (agent/group/global scopes) #7748

@leavedrop

Update 2026-05-25: I've edited this issue to clarify the framing. The original wording made stronger empirical claims ("I've been running a prototype", "~70% reduction", "~55% daily cost drop") than I can actually support — the design is from reading the codebase, not from a measured production deployment. The design discussion below stands; the specific numbers and personal use-case framing have been removed.


Summary

I'd like to gauge interest in a cross-agent shared memory store with on-demand capsule recall for AutoGen — sitting alongside (not replacing) each agent's per-instance memory, scoped to {agent, group, global}, retrieved as small capsules via a tool surface rather than prefix-loaded into every agent's context.

In a GroupChat (or any multi-agent runtime) today, three patterns recur:

  1. Shared facts duplicated N times — user preferences ("respond in markdown"), project conventions ("use Pydantic v2 only"), and hard policies ("no PII in logs") live separately in every agent's memory config. Updating one means updating N; drift is structurally inevitable.
  2. Cross-agent context handoff is via messages — when agent A learns a durable fact, the only way for agent B to know is for the GroupChat manager to forward it as a message, which (a) requires manager logic and (b) doubles context cost for B.
  3. Memory grows monotonically in prefix — most memory-config implementations load the full memory into every turn's prompt prefix, so a long-lived crew with weeks of accumulated facts pays prefix cost on each turn even when most facts are irrelevant to the current task.

Proposal: an opt-in SharedMemoryStore registered at the AgentRuntime level (or attached to a GroupChat), with three retrieval scopes and a tool-shaped surface (memory_search / memory_remember / memory_forget) so capsules land in tool-result position, not prefix.

Why three scopes

The scope dimension is the part that doesn't exist today and is the reason to have a shared store at all:

  • agent: per-agent facts; equivalent to today's per-agent memory_config, but behind a unified API. Examples: an agent's own self-correction notes, agent-specific tool quirks.
  • group: facts shared within a single GroupChat / runtime but not across groups. Examples: facts about the current task, a shared scratchpad, decisions the team has made.
  • global: facts shared across all groups/runtimes in this deployment. Examples: user preferences, project conventions, hard rules.

Today's per-agent stores can simulate group via manual forwarding and global via shared config files, but neither has a clean retrieval API — and neither supports capsule-style recall, so the prefix keeps growing.

Design sketch

┌──────────────────────────────────────────┐
│  AgentRuntime                             │
│   ├── shared_memory: SharedMemoryStore   │
│   │     ├── global.db   (cross-runtime)  │
│   │     ├── group_{id}.db  (per GroupChat)│
│   │     └── agent_{id}.db  (per agent)    │
│   └── agents: [ConversableAgent, ...]    │
└──────────────────────────────────────────┘
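A minimal sketch of how a scope could resolve to one of the per-scope files in the diagram above. `Scope`, `db_path_for`, and the `root` directory argument are hypothetical names used for illustration; none of them exist in AutoGen today.

```python
from enum import Enum
from pathlib import Path
from typing import Optional


class Scope(str, Enum):
    """The three retrieval scopes from the proposal (hypothetical type)."""
    AGENT = "agent"
    GROUP = "group"
    GLOBAL = "global"


def db_path_for(root: Path, scope: Scope, owner_id: Optional[str] = None) -> Path:
    """Map a scope (plus an agent or group id) to its SQLite file under root."""
    if scope is Scope.GLOBAL:
        # One deployment-wide file, shared by every runtime.
        return root / "global.db"
    if not owner_id:
        raise ValueError(f"scope {scope.value!r} needs an agent or group id")
    # agent_{id}.db or group_{id}.db, matching the diagram's naming.
    return root / f"{scope.value}_{owner_id}.db"
```

Keeping one file per scope means a group's store can be deleted when the GroupChat ends without touching global facts.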

Tool surface (registered automatically on each agent):

  memory_search(query: str, scope: "agent"|"group"|"global"|"all",
                top_k: int = 5, max_capsule_bytes: int = 2048)
      → returns ranked facts as a ≤2KB capsule, tool-result not prefix

  memory_remember(fact: str, scope: "agent"|"group"|"global",
                  confidence: float = 1.0, ttl_days: int|None = None)
      → adds to store; auto-tagged with agent_id + timestamp

  memory_forget(fact_id: str)
      → soft-delete; tombstone preserved for audit
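To make the intended semantics of memory_remember and memory_forget concrete, here is a rough in-memory sketch. `Fact`, `InMemoryFactTable`, and `live_facts` are hypothetical names, and the explicit `agent_id` argument is an assumption: in the proposal the runtime would tag the calling agent automatically.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional

SECONDS_PER_DAY = 86_400


@dataclass
class Fact:
    fact_id: str
    text: str
    scope: str
    agent_id: str                        # auto-tagged author
    created_at: float                    # auto-tagged timestamp (epoch seconds)
    confidence: float = 1.0
    expires_at: Optional[float] = None   # derived from ttl_days
    deleted_at: Optional[float] = None   # tombstone marker for soft-delete


class InMemoryFactTable:
    """Hypothetical stand-in for the store; illustrates semantics only."""

    def __init__(self) -> None:
        self._facts: Dict[str, Fact] = {}

    def memory_remember(self, agent_id: str, fact: str, scope: str,
                        confidence: float = 1.0,
                        ttl_days: Optional[int] = None) -> str:
        if scope not in ("agent", "group", "global"):
            raise ValueError(f"unknown scope: {scope!r}")
        now = time.time()
        expires = now + ttl_days * SECONDS_PER_DAY if ttl_days is not None else None
        fact_id = uuid.uuid4().hex
        self._facts[fact_id] = Fact(fact_id, fact, scope, agent_id, now,
                                    confidence, expires)
        return fact_id

    def memory_forget(self, fact_id: str) -> bool:
        """Soft-delete: mark the fact as deleted but keep it for audit."""
        fact = self._facts.get(fact_id)
        if fact is None or fact.deleted_at is not None:
            return False
        fact.deleted_at = time.time()
        return True

    def live_facts(self, now: Optional[float] = None) -> List[Fact]:
        """Facts that are neither tombstoned nor past their TTL."""
        now = time.time() if now is None else now
        return [f for f in self._facts.values()
                if f.deleted_at is None
                and (f.expires_at is None or f.expires_at > now)]
```

Search would only ever read from `live_facts`, so a forgotten or expired fact stops appearing in capsules while remaining inspectable.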

Storage: SQLite per scope + FTS5 for text search. Optional embedding column (any configured embedding provider) for semantic recall. Stays additive — current memory_config paths are unchanged.
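The storage choice could look roughly like the following, using only Python's standard sqlite3 module. This is a sketch under assumptions: the table and column names are illustrative, the helper functions are hypothetical, and it assumes the local SQLite build has the FTS5 extension enabled. The query string is passed to FTS5 as-is, so a real implementation would need to escape user input.

```python
import sqlite3
import time
from typing import List, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS facts (
    id         INTEGER PRIMARY KEY,
    text       TEXT NOT NULL,
    agent_id   TEXT NOT NULL,
    created_at REAL NOT NULL,
    deleted_at REAL              -- NULL while the fact is live
);
CREATE VIRTUAL TABLE IF NOT EXISTS facts_fts USING fts5(text);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def remember(conn: sqlite3.Connection, text: str, agent_id: str) -> int:
    """Insert a fact and index it; both writes commit together."""
    with conn:
        cur = conn.execute(
            "INSERT INTO facts (text, agent_id, created_at) VALUES (?, ?, ?)",
            (text, agent_id, time.time()),
        )
        conn.execute(
            "INSERT INTO facts_fts (rowid, text) VALUES (?, ?)",
            (cur.lastrowid, text),
        )
    return cur.lastrowid


def forget(conn: sqlite3.Connection, fact_id: int) -> None:
    """Soft-delete: set the tombstone, keep the row for audit."""
    with conn:
        conn.execute("UPDATE facts SET deleted_at = ? WHERE id = ?",
                     (time.time(), fact_id))


def search(conn: sqlite3.Connection, query: str, top_k: int = 5) -> List[Tuple[int, str]]:
    """Full-text search over live facts, best BM25 match first."""
    return conn.execute(
        """
        SELECT f.id, f.text
        FROM facts_fts
        JOIN facts AS f ON f.id = facts_fts.rowid
        WHERE facts_fts MATCH ? AND f.deleted_at IS NULL
        ORDER BY bm25(facts_fts)
        LIMIT ?
        """,
        (query, top_k),
    ).fetchall()
```

An optional embedding column would be an extra nullable column on `facts`; the FTS5 path keeps working when it is empty.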

Conflict handling: if agent A calls memory_remember on the global scope while agent B is mid-turn, B's current turn is not interrupted; B's next memory_search sees the new fact. There is no transactional cross-agent guarantee; eventual consistency is intentional and matches the conversation grain.
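The visibility rule can be illustrated with two plain SQLite connections to the same file, one per agent. This is only a sketch of the behavior described above: the helper names are hypothetical, and a substring match stands in for the real search.

```python
import os
import sqlite3
import tempfile
from typing import List, Tuple


def connect(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS facts (id INTEGER PRIMARY KEY, text TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def remember(conn: sqlite3.Connection, text: str) -> None:
    with conn:  # commits when the block exits
        conn.execute("INSERT INTO facts (text) VALUES (?)", (text,))


def search(conn: sqlite3.Connection, needle: str) -> List[str]:
    rows = conn.execute("SELECT text FROM facts WHERE text LIKE ?", (f"%{needle}%",))
    return [row[0] for row in rows]


def demo() -> Tuple[List[str], List[str]]:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "global.db")
        agent_a, agent_b = connect(path), connect(path)
        before = search(agent_b, "policy")          # B's view earlier in its turn
        remember(agent_a, "policy: no PII in logs")  # A writes while B is mid-turn
        after = search(agent_b, "policy")           # B's next search sees the fact
        agent_a.close()
        agent_b.close()
        return before, after
```

Nothing B already retrieved is revised retroactively; it only observes the write on its next query.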

Capsule recall, not prefix-load: this is what distinguishes the proposal from existing per-agent memory configs. The store is not loaded into the system prompt on every turn; agents ask for what they need via the tool. Cost: agents must learn to query (handled via system-prompt scaffolding that the runtime can inject). Benefit: the prompt prefix stays small even as memory grows.
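One way the byte cap on a capsule could be enforced is sketched below. `build_capsule` and the `[fact_id] text` line format are assumptions for illustration; the proposal only specifies that a capsule holds ranked facts within max_capsule_bytes.

```python
from typing import Iterable, List, Tuple


def build_capsule(ranked_facts: Iterable[Tuple[str, str]],
                  max_capsule_bytes: int = 2048) -> str:
    """Pack (fact_id, text) pairs, best first, into a capsule under the byte cap."""
    lines: List[str] = []
    used = 0
    for fact_id, text in ranked_facts:
        line = f"[{fact_id}] {text}"
        # Count UTF-8 bytes, plus one byte for the newline joining this line.
        cost = len(line.encode("utf-8")) + (1 if lines else 0)
        if used + cost > max_capsule_bytes:
            break  # stop at the first fact that does not fit, keeping rank order
        lines.append(line)
        used += cost
    return "\n".join(lines)
```

The returned string is what would go back to the agent as the tool result, so its size is bounded no matter how large the store grows.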

What this is NOT

  • Not a replacement for existing per-agent memory configs (whether agentchat's memory hooks or core's ChatCompletionContext): they stay; this is additive
  • Not a vector database adapter: SQLite + FTS5 is the v1 target; the embedding column is optional and pluggable
  • Not a distributed system: single-process / single-runtime; multi-host AutoGen would need a backend adapter (out of scope for v1)
  • Not opinionated on what to store: the store is a flat fact table; structuring (topics, entities, relations) is up to the caller

Why this matters in practice

A representative shape (code-reading argument, not measured production data): consider a multi-agent pipeline where 3 agents need shared knowledge of "current week's policy updates". Today each agent either loads the policy text into its own system prefix (3× cost) or re-derives it from a shared file at each turn (no caching). A global-scope shared store retrieved via memory_search("current policy") only on policy-relevant turns would address both: single source of truth, retrieval cost paid only when the agent's reasoning surfaces the question.

Would be very interested in whether maintainers have seen similar duplication patterns in real core / agentchat deployments and how they think about it today.

Questions before opening anything

  1. Roadmap conflict? AutoGen's core has been evolving fast; is there an in-flight memory abstraction (MemoryProtocol, ChatCompletionContext extensions, etc.) that this should align with?
  2. Scope preference:
    • (a) minimal demo PR — just SharedMemoryStore + the three tools + global scope only, no embedding
    • (b) full version with all three scopes + embedding + auto-injected system-prompt scaffolding teaching agents to query
    • (c) ship as an extension package (autogen-shared-memory); just expose the registration hook
  3. Where to register? AgentRuntime constructor (cleanest, but only core users) vs GroupChat (works for agentchat users but ties to chat metaphor) vs both?
  4. global lifecycle — cross-runtime sharing requires picking a deployment-wide path. Convention (~/.autogen/shared_memory/global.db) or explicit config?

Posting to gauge interest before moving toward a PR.

Thanks!
