Official implementation of "MemMA: Coordinating the Memory Cycle through Multi-Agent Reasoning and In-Situ Self-Evolution".
If you find this work helpful, please cite our paper:
```bibtex
@article{lin2026memma,
  title={MemMA: Coordinating the Memory Cycle through Multi-Agent Reasoning and In-Situ Self-Evolution},
  author={Lin, Minhua and Zhang, Zhiwei and Lu, Hanqing and Liu, Hui and Tang, Xianfeng and He, Qi and Zhang, Xiang and Wang, Suhang},
  journal={arXiv preprint arXiv:2603.18718},
  year={2026}
}
```

Existing memory-augmented LLM agents mainly treat memory construction, retrieval, and utilization as isolated subroutines, leading to myopic construction and aimless retrieval (shallow or repetitive searches). MemMA introduces a multi-agent framework that coordinates both the forward and backward paths of the memory cycle.
- Meta-Thinker produces structured guidance that steers both construction and retrieval phases, identifying information importance, redundancy, conflicts, and evidence gaps.
- Memory Manager executes atomic memory edits (`ADD`, `UPDATE`, `DELETE`, `NONE`) conditioned on Meta-Thinker guidance and current context (see the sketch after this list).
- Query Reasoner implements an iterative Refine-and-Probe loop, replacing one-shot retrieval with diagnosis-guided evidence refinement over multiple turns.
- Answer Agent generates final responses from accumulated evidence.
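The following is a minimal, illustrative sketch of how these roles fit together; the names (`MemoryBank`, `apply_edit`, `refine_and_probe`, `diagnose`) are hypothetical and do not mirror the repository's actual code:

```python
# Minimal sketch of the forward path: atomic memory edits plus an
# iterative Refine-and-Probe retrieval loop. All names here are
# hypothetical, not the repo's actual API.
from dataclasses import dataclass, field

@dataclass
class MemoryBank:
    entries: dict = field(default_factory=dict)

    def apply_edit(self, op: str, key: str, value: str | None = None) -> None:
        """Execute one atomic edit proposed by the Memory Manager."""
        if op in ("ADD", "UPDATE"):
            self.entries[key] = value        # write or overwrite a fact
        elif op == "DELETE":
            self.entries.pop(key, None)      # drop a stale/conflicting fact
        elif op == "NONE":
            pass                             # guidance says no change needed
        else:
            raise ValueError(f"unknown op: {op}")

def refine_and_probe(query, retrieve, diagnose, max_turns=5):
    """Iterative retrieval: keep probing until evidence is judged sufficient."""
    evidence = []
    for _ in range(max_turns):
        evidence.extend(retrieve(query))
        verdict, query = diagnose(query, evidence)   # e.g. an LLM judgment
        if verdict == "sufficient":
            break                            # stop early; avoid aimless search
    return evidence
```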
After each session, a three-stage mechanism repairs the provisional memory before commitment:
- Probe Generation -- synthesizes QA pairs testing factual recall, cross-session reasoning, and temporal inference.
- In-situ Verification -- retrieves evidence from provisional memory and generates answers.
- Evidence-Grounded Repair -- failed probes trigger repair proposals (`SKIP`/`MERGE`/`INSERT`), consolidating and correcting memory (see the sketch below).
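A compressed sketch of this backward path; `generate_probes`, `answer_from_memory`, and `propose_repair` are hypothetical stand-ins for the LLM-backed agents, and `merge`/`insert` are assumed memory-bank methods:

```python
# Sketch of the three-stage self-evolution loop run after each session.
# generate_probes / answer_from_memory / propose_repair stand in for the
# LLM-backed agents; merge/insert are assumed memory-bank methods.
def self_evolve(memory, generate_probes, answer_from_memory, propose_repair):
    # Stage 1: Probe Generation -- QA pairs covering factual recall,
    # cross-session reasoning, and temporal inference.
    probes = generate_probes(memory)

    # Stage 2: In-situ Verification -- answer each probe from the
    # provisional memory and record failures.
    failures = [(q, gold) for q, gold in probes
                if answer_from_memory(memory, q) != gold]

    # Stage 3: Evidence-Grounded Repair -- each failure yields a proposal.
    for failure in failures:
        op, payload = propose_repair(memory, failure)
        if op == "MERGE":
            memory.merge(payload)            # consolidate redundant entries
        elif op == "INSERT":
            memory.insert(payload)           # add the missing evidence
        # op == "SKIP": failure judged noisy or unfixable; leave memory as-is
    return memory                            # committed only after repair
```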
```text
memma/
├── scripts/                                 # Experiment runner scripts
│   ├── run_memma_self_refine_single*.py     # MemMA: Single-Agent backend
│   ├── run_memma_self_refine_lightmem*.py   # MemMA: LightMem backend
│   ├── run_memma_self_refine_amem*.py       # MemMA: A-MEM backend
│   ├── run_memma_lightmem*.py               # LightMem (no self-refine)
│   └── run_vanilla_baseline*.py             # Vanilla baselines
│
├── data/                                    # Dataset files
│   └── memory_rl_train_locomo_conv_26.parquet  # Pre-generated probe QA
│
├── data_preprocess/                         # Dataset utilities
│   └── utils.py                             # Data loading (load_locomo_dataset, etc.)
│
├── configs/memory_toolkits/                 # Baseline toolkit configs (JSON)
│
├── examples/                                # Experiment shell scripts
│   ├── run_memma_*.sh                       # MemMA experiments
│   └── run_baseline_*.sh                    # Baseline experiments
│
├── requirements.txt
├── .env.example
└── .gitignore
```
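If you want to load the dataset programmatically, the loader in `data_preprocess/utils.py` can be imported directly. The snippet below is a hypothetical usage that assumes `load_locomo_dataset` takes the dataset path; check the actual signature in `utils.py`:

```python
# Hypothetical usage of the loader in data_preprocess/utils.py; the
# exact signature may differ, so consult utils.py before relying on it.
from data_preprocess.utils import load_locomo_dataset

conversations = load_locomo_dataset("data/locomo10.json")  # assumed path argument
print(f"loaded {len(conversations)} conversations")
```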
```bash
git clone https://github.com/ventr1c/memma.git
cd memma

# Create conda environment
conda create -n memma python=3.11 -y
conda activate memma

# Install PyTorch (CUDA 12.8; adjust for your CUDA version)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128

# Install dependencies
pip install -r requirements.txt
```
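A quick sanity check that the installed wheel matches your driver (plain PyTorch calls, nothing repo-specific):

```python
# Sanity-check the PyTorch install before running experiments.
import torch

print(torch.__version__)           # expect a +cu128 suffix for the CUDA 12.8 wheel
print(torch.cuda.is_available())   # True means the driver/runtime pairing works
```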
The LightMem and FullContext/NaiveRAG/LangMem/A-MEM baseline scripts rely on memory toolkits from LightMem. To run these baselines, install LightMem and set the path:

```bash
git clone https://github.com/zjunlp/LightMem.git
cd LightMem
pip install -e .
cd ..
export LIGHTMEM_ROOT=/path/to/LightMem
```
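The baseline scripts pick up the LightMem checkout via this variable; the pattern below is an illustrative sketch of that lookup, not the repo's exact code:

```python
# Illustrative sketch of how a script can resolve the LightMem checkout
# from LIGHTMEM_ROOT; the repo's actual scripts may differ in detail.
import os
import sys

lightmem_root = os.environ.get("LIGHTMEM_ROOT")
if not lightmem_root:
    sys.exit("Set LIGHTMEM_ROOT to your LightMem checkout before running baselines.")
sys.path.insert(0, lightmem_root)  # make LightMem's memory toolkits importable
```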
Copy the example environment file and fill in your API keys:

```bash
cp .env.example .env
```

| Variable | Description |
|---|---|
| `OPENAI_API_KEY` | OpenAI API key (for GPT-4o-mini backbone) |
| `ANTHROPIC_API_KEY` | Anthropic API key (for Claude Haiku backbone) |
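Before launching a long run, you can verify the keys are visible to Python. The check below assumes the standard `python-dotenv` package is how `.env` gets consumed, which is a guess about this repo's setup:

```python
# Pre-flight check that the .env keys are visible to Python. Assumes
# python-dotenv (pip install python-dotenv); the repo's scripts may
# load .env differently.
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
for key in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY"):
    print(key, "set" if os.environ.get(key) else "MISSING")
```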
MemMA is evaluated on the LoCoMo dataset. Download `locomo10.json` and place it under the `data/` directory:

```bash
mkdir -p data
# Download the LoCoMo dataset and place locomo10.json in data/
```

The self-evolution backward path uses a pre-generated probe QA parquet file, which is included in the repo:

```text
data/memory_rl_train_locomo_conv_26.parquet
```
The generation pipeline for this file will be documented in a future update.
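To peek at the probe QA pairs, the parquet can be opened with pandas (a standard `read_parquet` call; the column names are not documented here, so print them rather than assuming any):

```python
# Inspect the pre-generated probe QA parquet. pd.read_parquet requires
# pyarrow or fastparquet to be installed alongside pandas.
import pandas as pd

df = pd.read_parquet("data/memory_rl_train_locomo_conv_26.parquet")
print(df.shape)          # rows = probe QA pairs
print(list(df.columns))  # column names vary; check rather than assume
print(df.head())
```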
Each experiment follows a two-phase pipeline:
- Phase 1 -- Memory Construction: Build memories from conversation sessions (with optional Meta-Thinker guidance and self-evolution).
- Phase 2 -- QA Evaluation: Retrieve from constructed memories and answer questions (with optional iterative Query Reasoner).
Example with the Single-Agent backend using GPT-4o-mini:
```bash
# Set your API key
export OPENAI_API_KEY="your-key-here"

# Phase 1: Build memories with Meta-Thinker + Self-Evolution
python scripts/run_memma_self_refine_single.py \
    --dataset data/locomo10.json \
    --memory-dir ./results/memory_bank/memma_single/ \
    --output_dir ./results/memma_single/ \
    --enable_construction_meta_guidance \
    --build_memories \
    --model gpt-4o-mini \
    --retrieve_k 30 \
    --ratio 0.1 \
    --self_refine_source parquet \
    --self_refine_parquet data/memory_rl_train_locomo_conv_26.parquet \
    --self_refine_log_jsonl ./results/memma_single/self_refine_log.jsonl

# Phase 2: QA Evaluation with iterative retrieval
python scripts/run_memma_self_refine_single.py \
    --dataset data/locomo10.json \
    --memory-dir ./results/memory_bank/memma_single/ \
    --output_dir ./results/memma_single/ \
    --model gpt-4o-mini \
    --retrieve_k 30 \
    --qr_max_turns 5 \
    --ratio 0.1
```

We provide experiment scripts for three memory backends. Each script handles both memory construction and QA evaluation:
| Backend | Scripts |
|---|---|
| Single-Agent | `examples/run_memma_self_refine_single_gpt.sh`, `examples/run_memma_self_refine_single_claude.sh` |
| LightMem | `examples/run_memma_lightmem_gpt.sh`, `examples/run_memma_self_refine_lightmem_gpt.sh`, `examples/run_memma_self_refine_lightmem_claude.sh` |
| A-MEM | `examples/run_memma_self_refine_amem_gpt.sh`, `examples/run_memma_self_refine_amem_claude.sh` |
The LightMem backend requires LightMem to be installed (see External Dependencies).
Single-agent memory construction without Meta-Thinker or self-evolution:
```bash
# GPT-4o-mini
bash examples/run_baseline_single.sh

# Claude Haiku 4.5
bash examples/run_baseline_single_claude.sh
```

These baselines use memory toolkits from LightMem. Make sure LightMem is installed and `LIGHTMEM_ROOT` is set (see Installation).
Evaluate existing memory systems (A-MEM, NaiveRAG, LangMem, FullContext) using the toolkit configs in `configs/memory_toolkits/`:
```bash
# A-MEM
bash examples/run_baseline_amem.sh              # GPT-4o-mini
bash examples/run_baseline_amem_claude.sh       # Claude Haiku 4.5

# NaiveRAG (BM25 + embedding retrieval)
bash examples/run_baseline_naiverag_light.sh
bash examples/run_baseline_naiverag_light_claude.sh

# LangMem
bash examples/run_baseline_langmem.sh
bash examples/run_baseline_langmem_claude.sh

# Full Context
bash examples/run_baseline_full_text.sh
bash examples/run_baseline_full_text_light.sh
bash examples/run_baseline_full_text_light_claude.sh
bash examples/run_baseline_full_text_longmem.sh
```

LightMem without MemMA coordination (requires LightMem):
```bash
bash examples/run_baseline_lightmem.sh                  # GPT-4o-mini
bash examples/run_baseline_lightmem_claude_bedrock.sh   # Claude via Bedrock
```