vedevpatel/Trajectory-Stability

Trajectory Stability

Research framework for measuring trajectory stability in LLM-based agents under state perturbations.

Overview

This project evaluates how agent action trajectories change when the agent's observations and memory are perturbed. It tests three perturbation types across multiple LLM models to measure robustness and reproducibility.

Perturbations:

  • MEM-REORDER: Shuffles the order of memory entries
  • OBS-PARAPHRASE: Rephrases observations into semantically equivalent wording
  • CONTEXT-INJECT: Adds irrelevant information to the context
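Two of these perturbations can be sketched as small pure functions over an agent's state. This is an illustrative sketch only: the function names and the list-of-strings memory representation are assumptions, not the repository's actual API (see src/perturbations/ for the real implementations).

```python
import random

def mem_reorder(memory, seed=None):
    """MEM-REORDER sketch: return the memory entries in shuffled order.

    `memory` is assumed to be a list of strings, one entry per past step.
    Content is untouched; only the ordering changes. A fixed seed makes
    the shuffle reproducible across runs.
    """
    rng = random.Random(seed)
    shuffled = list(memory)  # copy so the original trajectory state is preserved
    rng.shuffle(shuffled)
    return shuffled

def context_inject(observation, distractor):
    """CONTEXT-INJECT sketch: append an irrelevant sentence to the observation."""
    return f"{observation} {distractor}"
```

Seeding the shuffle matters for the reproducibility goal stated above: the same seed yields the same perturbed memory, so a divergent trajectory can be replayed exactly.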

Models:

  • GPT-4o-mini
  • Claude-3-Haiku
  • Llama-3-8B

Metrics:

  • Trajectory Divergence Rate (TDR)
  • Degree of Similarity (DoS)
  • Recovery Rate
  • Perturbation Effect on Trajectory (PET)
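As an illustration of the first metric, a minimal position-wise divergence rate might look like the following. The exact definitions live in src/metrics/; this function is an assumption for intuition, not the repository's implementation.

```python
def trajectory_divergence_rate(baseline, perturbed):
    """Illustrative TDR: fraction of steps whose actions differ.

    Both trajectories are lists of action strings. Steps are compared
    position-wise; any length mismatch counts the extra steps as divergent.
    Returns a value in [0, 1], where 0 means identical trajectories.
    """
    n = max(len(baseline), len(perturbed))
    if n == 0:
        return 0.0  # two empty trajectories are trivially identical
    matches = sum(a == b for a, b in zip(baseline, perturbed))
    return 1.0 - matches / n
```

Under this toy definition, an unperturbed rerun that reproduces the baseline scores 0.0, and a perturbation that changes one of two actions scores 0.5.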

Installation

pip install -r requirements.txt

Set up your API keys in .env:

OPENAI_API_KEY=your_key
ANTHROPIC_API_KEY=your_key
HF_TOKEN=your_token  # Optional, for Llama
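A stdlib-only sketch of how such a .env file could be read into the process environment follows; the repository may well use a library such as python-dotenv instead, and the `load_env` name is hypothetical.

```python
import os

def load_env(path=".env"):
    """Minimal .env loader: KEY=value lines, '#' starts a comment.

    Existing environment variables are not overwritten, so values
    exported in the shell take precedence over the file.
    """
    with open(path) as f:
        for line in f:
            line = line.split("#", 1)[0].strip()  # drop comments and whitespace
            if "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())
```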

Usage

Run a quick example:

python run_example.py

Run full multi-model experiment:

python run_multimodel_experiment.py

Analyze results:

python final_analysis.py

Structure

src/
├── agents/          # ReAct agent implementation
├── environment/     # FileWorld navigation environment
├── perturbations/   # Perturbation implementations
├── metrics/         # Stability metrics
├── models/          # LLM client wrappers
└── experiments/     # Experiment runners and analysis

Results

Results are saved to results/ with trajectories, metrics, and reproducibility manifests in JSON format.
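Since everything is written as JSON, the outputs can be collected with a few lines of standard-library Python. The directory layout and the `load_metrics` name below are assumptions; adapt the glob pattern to the actual file names the experiment runners emit.

```python
import json
from pathlib import Path

def load_metrics(results_dir="results"):
    """Load every JSON file under the results directory into a list of dicts."""
    records = []
    for path in sorted(Path(results_dir).glob("**/*.json")):
        with path.open() as f:
            records.append(json.load(f))
    return records
```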
