Research framework for measuring trajectory stability in LLM-based agents under state perturbations.
This project evaluates how agent action trajectories change when their observations and memory are perturbed. It applies three perturbation types across multiple LLM models to measure robustness and reproducibility.
Perturbations:
- MEM-REORDER: shuffles the order of memory entries
- OBS-PARAPHRASE: rephrases observations into semantically equivalent forms
- CONTEXT-INJECT: injects irrelevant information into the context
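Two of the perturbations above can be sketched as simple text/state transforms (OBS-PARAPHRASE requires a language model and is omitted). This is an illustrative sketch, not the repo's implementation; function names and the distractor string are made up:

```python
import random

def mem_reorder(memory, seed=None):
    """MEM-REORDER: shuffle the order of memory entries (seedable for reproducibility)."""
    rng = random.Random(seed)
    shuffled = list(memory)
    rng.shuffle(shuffled)
    return shuffled

def context_inject(observation, distractor="Note: the cafeteria closes at 5pm."):
    """CONTEXT-INJECT: append an irrelevant sentence to the observation text."""
    return f"{observation}\n{distractor}"
```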
Models:
- GPT-4o-mini
- Claude-3-Haiku
- Llama-3-8B
Metrics:
- Trajectory Divergence Rate (TDR)
- Degree of Similarity (DoS)
- Recovery Rate
- Perturbation Effect on Trajectory (PET)
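The metrics compare a baseline trajectory against its perturbed counterpart. The exact definitions live in `src/metrics/`; as a hedged sketch, assuming trajectories are lists of action strings, TDR might look like:

```python
def trajectory_divergence_rate(baseline, perturbed):
    """TDR sketch: fraction of steps where the perturbed trajectory's action
    differs from the baseline's; positions beyond the shorter trajectory
    count as divergent."""
    length = max(len(baseline), len(perturbed))
    if length == 0:
        return 0.0
    diverged = sum(
        1 for i in range(length)
        if i >= len(baseline) or i >= len(perturbed) or baseline[i] != perturbed[i]
    )
    return diverged / length
```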
Install dependencies:

```bash
pip install -r requirements.txt
```

Set up your API keys in `.env`:

```
OPENAI_API_KEY=your_key
ANTHROPIC_API_KEY=your_key
HF_TOKEN=your_token  # Optional, for Llama
```
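If you prefer not to add a dependency like `python-dotenv`, a minimal stdlib loader for the `.env` format above could look like this (an illustrative helper, not part of the repo):

```python
import os
from pathlib import Path

def load_env(path=".env"):
    """Minimal .env loader: KEY=value lines, '#' starts a comment.
    Existing environment variables are not overwritten."""
    for line in Path(path).read_text().splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" in line:
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```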
Run a quick example:

```bash
python run_example.py
```

Run the full multi-model experiment:

```bash
python run_multimodel_experiment.py
```

Analyze the results:

```bash
python final_analysis.py
```

Project layout:

```
src/
├── agents/         # ReAct agent implementation
├── environment/    # FileWorld navigation environment
├── perturbations/  # Perturbation implementations
├── metrics/        # Stability metrics
├── models/         # LLM client wrappers
└── experiments/    # Experiment runners and analysis
```
Results are saved to `results/` with trajectories, metrics, and reproducibility manifests in JSON format.
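Since every artifact is plain JSON, loading a run for ad-hoc analysis is straightforward. A minimal sketch, assuming flat `*.json` files directly under `results/` (the actual layout may be nested):

```python
import json
from pathlib import Path

def load_results(results_dir="results"):
    """Load every top-level JSON file in results_dir into a dict keyed by filename."""
    return {
        p.name: json.loads(p.read_text())
        for p in Path(results_dir).glob("*.json")
    }
```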