Research framework for measuring trajectory stability in LLM-based agents under state perturbations.
This project evaluates how agent action trajectories change when their observations and memory are perturbed. It applies three perturbation types across multiple LLM models to measure robustness and reproducibility.
Perturbations:
- MEM-REORDER: shuffles the order of memory entries
- OBS-PARAPHRASE: rephrases observations into semantically equivalent forms
- CONTEXT-INJECT: injects irrelevant information into the context
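Two of the perturbations above can be sketched as simple text/state transforms (OBS-PARAPHRASE requires a language model and is omitted). This is an illustrative sketch, not the repo's implementation; function names and the distractor string are made up:

```python
import random

def mem_reorder(memory, seed=None):
    """MEM-REORDER: shuffle the order of memory entries (seedable for reproducibility)."""
    rng = random.Random(seed)
    shuffled = list(memory)
    rng.shuffle(shuffled)
    return shuffled

def context_inject(observation, distractor="Note: the cafeteria closes at 5pm."):
    """CONTEXT-INJECT: append an irrelevant sentence to the observation text."""
    return f"{observation}\n{distractor}"
```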
Models:
- GPT-4o-mini
- Claude-3-Haiku
- Llama-3-8B
Metrics:
- Trajectory Divergence Rate (TDR)
- Degree of Similarity (DoS)
- Recovery Rate
- Perturbation Effect on Trajectory (PET)
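The metrics compare a baseline trajectory against its perturbed counterpart. The exact definitions live in `src/metrics/`; as a hedged sketch, assuming trajectories are lists of action strings, TDR might look like:

```python
def trajectory_divergence_rate(baseline, perturbed):
    """TDR sketch: fraction of steps where the perturbed trajectory's action
    differs from the baseline's; positions beyond the shorter trajectory
    count as divergent."""
    length = max(len(baseline), len(perturbed))
    if length == 0:
        return 0.0
    diverged = sum(
        1 for i in range(length)
        if i >= len(baseline) or i >= len(perturbed) or baseline[i] != perturbed[i]
    )
    return diverged / length
```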
Install dependencies:

```bash
pip install -r requirements.txt
```

Set up your API keys in `.env`:

```
OPENAI_API_KEY=your_key
ANTHROPIC_API_KEY=your_key
HF_TOKEN=your_token  # Optional, for Llama
```
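If you prefer not to add a dependency like `python-dotenv`, a minimal stdlib loader for the `.env` format above could look like this (an illustrative helper, not part of the repo):

```python
import os
from pathlib import Path

def load_env(path=".env"):
    """Minimal .env loader: KEY=value lines, '#' starts a comment.
    Existing environment variables are not overwritten."""
    for line in Path(path).read_text().splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```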
Run a quick example:

```bash
python run_example.py
```

Run the full multi-model experiment:

```bash
python run_multimodel_experiment.py
```

Analyze the results:

```bash
python final_analysis.py
```

Project layout:

```
src/
├── agents/         # ReAct agent implementation
├── environment/    # FileWorld navigation environment
├── perturbations/  # Perturbation implementations
├── metrics/        # Stability metrics
├── models/         # LLM client wrappers
└── experiments/    # Experiment runners and analysis
```
Results are saved to `results/` with trajectories, metrics, and reproducibility manifests in JSON format.
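Since every artifact is plain JSON, loading a run for ad-hoc analysis is straightforward. A minimal sketch, assuming flat `*.json` files directly under `results/` (the actual layout may be nested):

```python
import json
from pathlib import Path

def load_results(results_dir="results"):
    """Load every top-level JSON file in results_dir into a dict keyed by filename."""
    return {
        p.name: json.loads(p.read_text())
        for p in Path(results_dir).glob("*.json")
    }
```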