Official implementation of "MemMA: Coordinating the Memory Cycle through Multi-Agent Reasoning and In-Situ Self-Evolution".
If you find this work helpful, please cite our paper:
```bibtex
@article{lin2026memma,
  title={MemMA: Coordinating the Memory Cycle through Multi-Agent Reasoning and In-Situ Self-Evolution},
  author={Lin, Minhua and Zhang, Zhiwei and Lu, Hanqing and Liu, Hui and Tang, Xianfeng and He, Qi and Zhang, Xiang and Wang, Suhang},
  journal={arXiv preprint arXiv:2603.18718},
  year={2026}
}
```

Existing memory-augmented LLM agents mainly treat memory construction, retrieval, and utilization as isolated subroutines, leading to myopic construction and aimless retrieval (shallow or repetitive searches). MemMA introduces a multi-agent framework that coordinates both the forward and backward paths of the memory cycle.
- Meta-Thinker produces structured guidance that steers both construction and retrieval phases, identifying information importance, redundancy, conflicts, and evidence gaps.
- Memory Manager executes atomic memory edits (`ADD`, `UPDATE`, `DELETE`, `NONE`) conditioned on Meta-Thinker guidance and current context (see the sketch after this list).
- Query Reasoner implements an iterative Refine-and-Probe loop, replacing one-shot retrieval with diagnosis-guided evidence refinement over multiple turns.
- Answer Agent generates final responses from accumulated evidence.
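The following is a minimal, illustrative sketch of how these roles fit together; the names (`MemoryBank`, `apply_edit`, `refine_and_probe`, `diagnose`) are hypothetical and do not mirror the repository's actual code:

```python
# Minimal sketch of the forward path: atomic memory edits plus an
# iterative Refine-and-Probe retrieval loop. All names here are
# hypothetical, not the repo's actual API.
from dataclasses import dataclass, field

@dataclass
class MemoryBank:
    entries: dict = field(default_factory=dict)

    def apply_edit(self, op: str, key: str, value: str | None = None) -> None:
        """Execute one atomic edit proposed by the Memory Manager."""
        if op in ("ADD", "UPDATE"):
            self.entries[key] = value        # write or overwrite a fact
        elif op == "DELETE":
            self.entries.pop(key, None)      # drop a stale/conflicting fact
        elif op == "NONE":
            pass                             # guidance says no change needed
        else:
            raise ValueError(f"unknown op: {op}")

def refine_and_probe(query, retrieve, diagnose, max_turns=5):
    """Iterative retrieval: keep probing until evidence is judged sufficient."""
    evidence = []
    for _ in range(max_turns):
        evidence.extend(retrieve(query))
        verdict, query = diagnose(query, evidence)   # e.g. an LLM judgment
        if verdict == "sufficient":
            break                            # stop early; avoid aimless search
    return evidence
```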
After each session, a three-stage mechanism repairs the provisional memory before commitment:
- Probe Generation -- synthesizes QA pairs testing factual recall, cross-session reasoning, and temporal inference.
- In-situ Verification -- retrieves evidence from provisional memory and generates answers.
- Evidence-Grounded Repair -- failed probes trigger repair proposals (`SKIP`/`MERGE`/`INSERT`), consolidating and correcting memory (see the sketch below).
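A compressed sketch of this backward path; `generate_probes`, `answer_from_memory`, and `propose_repair` are hypothetical stand-ins for the LLM-backed agents, and `merge`/`insert` are assumed memory-bank methods:

```python
# Sketch of the three-stage self-evolution loop run after each session.
# generate_probes / answer_from_memory / propose_repair stand in for the
# LLM-backed agents; merge/insert are assumed memory-bank methods.
def self_evolve(memory, generate_probes, answer_from_memory, propose_repair):
    # Stage 1: Probe Generation -- QA pairs covering factual recall,
    # cross-session reasoning, and temporal inference.
    probes = generate_probes(memory)

    # Stage 2: In-situ Verification -- answer each probe from the
    # provisional memory and record failures.
    failures = [(q, gold) for q, gold in probes
                if answer_from_memory(memory, q) != gold]

    # Stage 3: Evidence-Grounded Repair -- each failure yields a proposal.
    for failure in failures:
        op, payload = propose_repair(memory, failure)
        if op == "MERGE":
            memory.merge(payload)            # consolidate redundant entries
        elif op == "INSERT":
            memory.insert(payload)           # add the missing evidence
        # op == "SKIP": failure judged noisy or unfixable; leave memory as-is
    return memory                            # committed only after repair
```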
```text
memma/
├── scripts/                                 # Experiment runner scripts
│   ├── run_memma_self_refine_single*.py     # MemMA: Single-Agent backend
│   ├── run_memma_self_refine_lightmem*.py   # MemMA: LightMem backend
│   ├── run_memma_self_refine_amem*.py       # MemMA: A-MEM backend
│   ├── run_memma_lightmem*.py               # LightMem (no self-refine)
│   └── run_vanilla_baseline*.py             # Vanilla baselines
│
├── data/                                    # Dataset files
│   └── memory_rl_train_locomo_conv_26.parquet  # Pre-generated probe QA
│
├── data_preprocess/                         # Dataset utilities
│   └── utils.py                             # Data loading (load_locomo_dataset, etc.)
│
├── configs/memory_toolkits/                 # Baseline toolkit configs (JSON)
│
├── examples/                                # Experiment shell scripts
│   ├── run_memma_*.sh                       # MemMA experiments
│   └── run_baseline_*.sh                    # Baseline experiments
│
├── requirements.txt
├── .env.example
└── .gitignore
```
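If you want to load the dataset programmatically, the loader in `data_preprocess/utils.py` can be imported directly. The snippet below is a hypothetical usage that assumes `load_locomo_dataset` takes the dataset path; check the actual signature in `utils.py`:

```python
# Hypothetical usage of the loader in data_preprocess/utils.py; the
# exact signature may differ, so consult utils.py before relying on it.
from data_preprocess.utils import load_locomo_dataset

conversations = load_locomo_dataset("data/locomo10.json")  # assumed path argument
print(f"loaded {len(conversations)} conversations")
```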
```bash
git clone https://github.com/ventr1c/memma.git
cd memma

# Create conda environment
conda create -n memma python=3.11 -y
conda activate memma

# Install PyTorch (CUDA 12.8; adjust for your CUDA version)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128

# Install dependencies
pip install -r requirements.txt
```
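A quick sanity check that the installed wheel matches your driver (plain PyTorch calls, nothing repo-specific):

```python
# Sanity-check the PyTorch install before running experiments.
import torch

print(torch.__version__)           # expect a +cu128 suffix for the CUDA 12.8 wheel
print(torch.cuda.is_available())   # True means the driver/runtime pairing works
```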
The LightMem and FullContext/NaiveRAG/LangMem/A-MEM baseline scripts rely on memory toolkits from LightMem. To run these baselines, install LightMem and set the path:

```bash
git clone https://github.com/zjunlp/LightMem.git
cd LightMem
pip install -e .
cd ..
export LIGHTMEM_ROOT=/path/to/LightMem
```
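The baseline scripts pick up the LightMem checkout via this variable; the pattern below is an illustrative sketch of that lookup, not the repo's exact code:

```python
# Illustrative sketch of how a script can resolve the LightMem checkout
# from LIGHTMEM_ROOT; the repo's actual scripts may differ in detail.
import os
import sys

lightmem_root = os.environ.get("LIGHTMEM_ROOT")
if not lightmem_root:
    sys.exit("Set LIGHTMEM_ROOT to your LightMem checkout before running baselines.")
sys.path.insert(0, lightmem_root)  # make LightMem's memory toolkits importable
```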
Copy the example environment file and fill in your API keys:

```bash
cp .env.example .env
```

| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | OpenAI API key (for GPT-4o-mini backbone) |
| `ANTHROPIC_API_KEY` | Anthropic API key (for Claude Haiku backbone) |
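Before launching a long run, you can verify the keys are visible to Python. The check below assumes the standard `python-dotenv` package is how `.env` gets consumed, which is a guess about this repo's setup:

```python
# Pre-flight check that the .env keys are visible to Python. Assumes
# python-dotenv (pip install python-dotenv); the repo's scripts may
# load .env differently.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
for key in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY"):
    print(key, "set" if os.environ.get(key) else "MISSING")
```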
MemMA is evaluated on the LoCoMo dataset. Download `locomo10.json` and place it under the `data/` directory:

```bash
mkdir -p data
# Download the LoCoMo dataset and place locomo10.json in data/
```

The self-evolution backward path uses a pre-generated probe QA parquet file, which is included in the repo:

```text
data/memory_rl_train_locomo_conv_26.parquet
```
The generation pipeline for this file will be documented in a future update.
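To peek at the probe QA pairs, the parquet can be opened with pandas (a standard `read_parquet` call; the column names are not documented here, so print them rather than assuming any):

```python
# Inspect the pre-generated probe QA parquet. pd.read_parquet requires
# pyarrow or fastparquet to be installed alongside pandas.
import pandas as pd

df = pd.read_parquet("data/memory_rl_train_locomo_conv_26.parquet")
print(df.shape)          # rows = probe QA pairs
print(list(df.columns))  # column names vary; check rather than assume
print(df.head())
```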
Each experiment follows a two-phase pipeline:
- Phase 1 -- Memory Construction: Build memories from conversation sessions (with optional Meta-Thinker guidance and self-evolution).
- Phase 2 -- QA Evaluation: Retrieve from constructed memories and answer questions (with optional iterative Query Reasoner).
Example with the Single-Agent backend using GPT-4o-mini:
```bash
# Set your API key
export OPENAI_API_KEY="your-key-here"

# Phase 1: Build memories with Meta-Thinker + Self-Evolution
python scripts/run_memma_self_refine_single.py \
    --dataset data/locomo10.json \
    --memory-dir ./results/memory_bank/memma_single/ \
    --output_dir ./results/memma_single/ \
    --enable_construction_meta_guidance \
    --build_memories \
    --model gpt-4o-mini \
    --retrieve_k 30 \
    --ratio 0.1 \
    --self_refine_source parquet \
    --self_refine_parquet data/memory_rl_train_locomo_conv_26.parquet \
    --self_refine_log_jsonl ./results/memma_single/self_refine_log.jsonl

# Phase 2: QA Evaluation with iterative retrieval
python scripts/run_memma_self_refine_single.py \
    --dataset data/locomo10.json \
    --memory-dir ./results/memory_bank/memma_single/ \
    --output_dir ./results/memma_single/ \
    --model gpt-4o-mini \
    --retrieve_k 30 \
    --qr_max_turns 5 \
    --ratio 0.1
```

We provide experiment scripts for three memory backends. Each script handles both memory construction and QA evaluation:
| Backend | Scripts |
|---|---|
| Single-Agent | `examples/run_memma_self_refine_single_gpt.sh`, `examples/run_memma_self_refine_single_claude.sh` |
| LightMem | `examples/run_memma_lightmem_gpt.sh`, `examples/run_memma_self_refine_lightmem_gpt.sh`, `examples/run_memma_self_refine_lightmem_claude.sh` |
| A-MEM | `examples/run_memma_self_refine_amem_gpt.sh`, `examples/run_memma_self_refine_amem_claude.sh` |
The LightMem backend requires LightMem to be installed (see External Dependencies).
Single-agent memory construction without Meta-Thinker or self-evolution:
```bash
# GPT-4o-mini
bash examples/run_baseline_single.sh

# Claude Haiku 4.5
bash examples/run_baseline_single_claude.sh
```

These baselines use memory toolkits from LightMem. Make sure LightMem is installed and `LIGHTMEM_ROOT` is set (see Installation).
Evaluate existing memory systems (A-MEM, NaiveRAG, LangMem, FullContext) using the toolkit configs in `configs/memory_toolkits/`:
```bash
# A-MEM
bash examples/run_baseline_amem.sh              # GPT-4o-mini
bash examples/run_baseline_amem_claude.sh       # Claude Haiku 4.5

# NaiveRAG (BM25 + embedding retrieval)
bash examples/run_baseline_naiverag_light.sh
bash examples/run_baseline_naiverag_light_claude.sh

# LangMem
bash examples/run_baseline_langmem.sh
bash examples/run_baseline_langmem_claude.sh

# Full Context
bash examples/run_baseline_full_text.sh
bash examples/run_baseline_full_text_light.sh
bash examples/run_baseline_full_text_light_claude.sh
bash examples/run_baseline_full_text_longmem.sh
```

LightMem without MemMA coordination (requires LightMem):
```bash
bash examples/run_baseline_lightmem.sh                  # GPT-4o-mini
bash examples/run_baseline_lightmem_claude_bedrock.sh   # Claude via Bedrock
```