EigenBench: A Comparative Behavioral Measure of Value Alignment

The official repository for EigenBench: A Comparative Behavioral Measure of Value Alignment.

EigenBench is a black-box framework for quantifying value alignment across language models without relying on ground-truth labels. Given a model ensemble, a constitution describing a value system, and a scenario dataset, models judge each other’s responses in pairwise comparisons; these judgments are fit with a Bradley-Terry-Davidson (BTD) model and aggregated with EigenTrust into consensus alignment scores.
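As background on the BTD model: Davidson's extension of Bradley-Terry gives each model a latent strength π = exp(s) and adds a tie parameter ν ≥ 0, so that P(i preferred) = π_i / (π_i + π_j + ν√(π_i π_j)) and P(tie) = ν√(π_i π_j) / (π_i + π_j + ν√(π_i π_j)). The sketch below implements only this scalar form. It is not the repo's code: the models in pipeline/train/bt_models.py (VectorBT, VectorBTD, CriteriaVectorBTD) use learned vector embeddings, and the function names and the (i, j, outcome) record shape here are hypothetical.

```python
import math


# Hypothetical helper: scalar Davidson (1970) tie extension of Bradley-Terry.
def btd_probabilities(score_i, score_j, tie_param):
    """Return (P[i preferred], P[j preferred], P[tie]) for log-strengths s_i, s_j."""
    if tie_param < 0:
        raise ValueError("tie_param must be non-negative")
    pi_i, pi_j = math.exp(score_i), math.exp(score_j)
    tie = tie_param * math.sqrt(pi_i * pi_j)  # nu * sqrt(pi_i * pi_j)
    denom = pi_i + pi_j + tie
    return pi_i / denom, pi_j / denom, tie / denom


# Hypothetical helper: log-likelihood of (i, j, outcome) records,
# where outcome is "i", "j", or "tie".
def btd_log_likelihood(scores, comparisons, tie_param):
    total = 0.0
    for i, j, outcome in comparisons:
        p_i, p_j, p_tie = btd_probabilities(scores[i], scores[j], tie_param)
        total += math.log({"i": p_i, "j": p_j, "tie": p_tie}[outcome])
    return total
```

With tie_param = 0 the tie probability vanishes and this reduces to plain Bradley-Terry, P(i preferred) = π_i / (π_i + π_j). Fitting amounts to maximizing the log-likelihood over the scores (and, in practice, the tie parameter).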

Install

python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

Set API keys in .env:

OPENROUTER_API_KEY — required for OpenRouter models
HF_TOKEN — required for gated/private Hugging Face models (also reads from ~/.huggingface/token via huggingface-cli login)
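A quick preflight check can catch a missing key before a long collection run. This is a hypothetical helper, not part of the repo; it reads os.environ directly (so it assumes .env has already been loaded or the variables exported) and treats every model id without the hf_local: prefix as an OpenRouter model.

```python
import os


# Hypothetical preflight helper, not part of the repo.
def missing_api_keys(models, env=None):
    """Return required API-key variables that are unset for RUN_SPEC["models"]."""
    env = os.environ if env is None else env
    missing = []
    # Any model id without the hf_local: prefix is assumed to go through OpenRouter.
    needs_openrouter = any(
        not model_id.startswith("hf_local:") for model_id in models.values()
    )
    if needs_openrouter and not env.get("OPENROUTER_API_KEY"):
        missing.append("OPENROUTER_API_KEY")
    # HF_TOKEN is not checked: it is only needed for gated/private repos and
    # may come from ~/.huggingface/token instead of the environment.
    return missing
```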

Quick Start

Create a run folder and copy the example spec.

mkdir -p runs/my_run
cp runs/example/spec.py runs/my_run/spec.py

Edit runs/my_run/spec.py (required fields: models, dataset.path, constitution.path, constitution.num_criteria).
Run:

Option A: Local (collect + train locally)

python scripts/run.py runs/my_run/spec.py

Option B: Cloud (collect locally, train + upload on ValueArena Space)

Add to your spec:

"upload": {
    "enabled": True,
    "name": "my-run",
    "group": "",
    "note": "optional note",
},

Then run:

export SPACE_SECRET="your-secret"
python scripts/run.py runs/my_run/spec.py

Collection runs locally; the evaluations are then sent to the Space, which handles BTD training, bootstrap, EigenTrust, and upload to ValueArena in the background.

If you already have evaluations.jsonl, set collection.enabled=False (with collection.evaluations_path pointing at the file, as in Train Only mode) to skip collection and only train and upload via the Space.

Mixed-model runs work out of the box — just prefix local model paths with hf_local: in your spec. The pipeline auto-detects and batches local models through vLLM while routing API models through OpenRouter.

Run Spec

Top-level keys in RUN_SPEC:

models: {display_name: openrouter_model_id} or {display_name: hf_local:<hf_model_path>}
dataset: scenario source and slicing.
constitution: constitution file path and criterion count.
collection: evaluation sampling/collection settings.
training: BT/BTD training settings.
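spec.py defines a RUN_SPEC dict with the keys above. The sketch below pairs a minimal spec with a hypothetical check for the required fields listed in Quick Start (models, dataset.path, constitution.path, constitution.num_criteria). The scenario file path is a placeholder, and runs/example/spec.py remains the authoritative template.

```python
# Minimal RUN_SPEC; the scenario path is a placeholder, not a file in the repo.
RUN_SPEC = {
    "models": {
        "Claude 4 Sonnet": "anthropic/claude-sonnet-4",
        "Qwen": "hf_local:Qwen/Qwen2.5-7B-Instruct",
    },
    "dataset": {"path": "data/scenarios/my_scenarios.json", "count": 50},
    "constitution": {"path": "data/constitutions/kindness.json", "num_criteria": 8},
    "collection": {"enabled": True},
    "training": {"enabled": True},
}

# Required fields from Quick Start, as key paths into RUN_SPEC.
REQUIRED_FIELDS = [
    ("models",),
    ("dataset", "path"),
    ("constitution", "path"),
    ("constitution", "num_criteria"),
]


# Hypothetical helper, not part of the repo.
def missing_required_fields(spec):
    """Return dotted names of required fields that are absent or None."""
    missing = []
    for path in REQUIRED_FIELDS:
        node = spec
        for key in path:
            if not isinstance(node, dict) or node.get(key) is None:
                missing.append(".".join(path))
                break
            node = node[key]
    return missing
```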

Dataset controls

path: JSON file of scenarios.
start: start offset (default 0).
count: number of scenarios after start (omit for all remaining).
shuffle: shuffle before slicing.
shuffle_seed: reproducible shuffle seed.
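Because shuffling happens before slicing, start and count index into the shuffled order. A hypothetical helper that follows the documented semantics (the loaders in pipeline/config/ may differ in details such as which RNG they use):

```python
import random


# Hypothetical helper mirroring the dataset controls: shuffle first, then slice.
def select_scenarios(scenarios, start=0, count=None, shuffle=False, shuffle_seed=None):
    items = list(scenarios)
    if shuffle:
        # A dedicated Random instance keeps the global RNG untouched.
        random.Random(shuffle_seed).shuffle(items)
    end = None if count is None else start + count  # omitted count -> all remaining
    return items[start:end]
```

A fixed shuffle_seed means two runs with the same start and count see the same scenarios, which matters when comparing runs or resuming collection.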

Constitution controls

path: constitution JSON file.
num_criteria (required): maximum number of constitution criteria used for collection and comparison extraction.

Spec Modes

Spec Mode: Full Pipeline

"collection": {
    "enabled": True,
    "cached_responses_path": "data/responses/main_cache.jsonl",  # optional
},
"training": {
    "enabled": True,
}

Behavior:

If cached_responses_path is set, cache stage runs first.
Then evaluation collection runs.
Then training/eigentrust runs.
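The final step aggregates per-judge trust into one consensus score. In the original EigenTrust formulation, each row of a trust matrix T (how much judge i trusts model j) is normalized to sum to 1, and the consensus vector t is the stationary solution of t = Tᵀt, found by power iteration. A minimal sketch of that step follows; how the repo builds T from the fitted BTD model (pipeline/trust/) is not shown, and the function name is hypothetical. Practical EigenTrust variants often mix in a damping or pre-trust term, which this sketch omits.

```python
# Hypothetical helper: EigenTrust-style consensus by power iteration.
def eigentrust(trust, iterations=1000, tol=1e-12):
    """Return consensus scores t with t = T^T t, where T is row-normalized trust.

    trust[i][j] >= 0 is how much judge i trusts model j (square matrix).
    """
    n = len(trust)
    rows = []
    for row in trust:
        total = sum(row)
        if total <= 0:
            raise ValueError("every judge must assign some positive trust")
        rows.append([x / total for x in row])  # each row now sums to 1
    t = [1.0 / n] * n  # start from uniform trust
    for _ in range(iterations):
        # new_t[j] = sum_i t[i] * T[i][j], i.e. one multiplication by T^T
        new_t = [sum(t[i] * rows[i][j] for i in range(n)) for j in range(n)]
        norm = sum(new_t)
        new_t = [x / norm for x in new_t]  # guard against rounding drift
        if max(abs(a - b) for a, b in zip(new_t, t)) < tol:
            return new_t
        t = new_t
    return t
```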

Spec Mode: Train Only

"collection": {
    "enabled": False,
    "evaluations_path": "runs/my_run/evaluations.jsonl",
},
"constitution": {
    "path": "data/constitutions/kindness.json",
    "num_criteria": 8,
},
"training": {
    "enabled": True,
}

Use this when you already have evaluation transcripts and only want BT/BTD + EigenTrust outputs.

Spec Mode: Collect Only

"collection": {
    "enabled": True,
},
"training": {
    "enabled": False,
}

Use this to build/append evaluations.jsonl without running model fitting.
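evaluations.jsonl stores one JSON record per line, which is what makes appending cheap: new records go at the end without rewriting earlier ones. A generic sketch of append and read; the helper names are hypothetical, and the README does not document the record schema.

```python
import json


# Hypothetical helpers; the README does not document the record schema.
def append_records(path, records):
    """Append each record as one JSON line; creates the file if it is missing."""
    with open(path, "a", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")


def read_records(path):
    """Load every non-blank line of a JSONL file as a dict."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```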

Spec Mode: Cache Only

"collection": {
    "enabled": False,
    "cached_responses_path": "data/responses/main_cache.jsonl",
},
"training": {
    "enabled": False,
}

Use this to precompute model responses for scenarios.
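The four modes above differ only in which flags are set. A hypothetical helper that encodes the behavior described for them (it ignores the upload section, and scripts/run.py remains the source of truth):

```python
# Hypothetical helper: which stages a spec triggers, per the mode descriptions.
def planned_stages(spec):
    collection = spec.get("collection", {})
    training = spec.get("training", {})
    stages = []
    if collection.get("cached_responses_path"):
        stages.append("cache")  # runs even when collection is disabled (Cache Only)
    if collection.get("enabled"):
        stages.append("collect")
    if training.get("enabled"):
        stages.append("train")  # BT/BTD fitting + EigenTrust
    return stages
```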

Spec Mode: Mixed HF Local + OpenRouter

Mix OpenRouter API models and local Hugging Face models in the same run. Local models are automatically batched through vLLM for efficient GPU inference, while API models are called through OpenRouter. Use hf_local: prefixes in your models dict:

"models": {
    "Claude 4 Sonnet": "anthropic/claude-sonnet-4",                      # OpenRouter
    "Qwen-sarcasm": "hf_local:maius/qwen-2.5-7b-it-personas/sarcasm",     # lora
    "Qwen": "hf_local:Qwen/Qwen2.5-7B-Instruct",                       # local
},
"collection": {
    "enabled": True,
    "sampler_mode": "random_judge_group",  # or "all_to_all"
},
"training": {
    "enabled": True,
}

The pipeline auto-detects hf_local: models and routes to the mixed collection path, which runs in 3 batched phases:

Responses — all evaluee responses (OpenRouter sequential, vLLM batched)
Reflections — all judge reflections (OpenRouter sequential, vLLM batched)
Comparisons — all pairwise comparisons (OpenRouter sequential, vLLM batched)

This is significantly faster than one-at-a-time API-style calls for local models.
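The auto-detection step amounts to partitioning the models dict by prefix. A hypothetical sketch (the helper name is illustrative, not the repo's API):

```python
HF_LOCAL_PREFIX = "hf_local:"


# Hypothetical helper: split RUN_SPEC["models"] into OpenRouter and local groups.
def split_models(models):
    """Return (openrouter, local); local ids have the hf_local: prefix stripped."""
    openrouter, local = {}, {}
    for name, model_id in models.items():
        if model_id.startswith(HF_LOCAL_PREFIX):
            local[name] = model_id[len(HF_LOCAL_PREFIX):]  # served through vLLM
        else:
            openrouter[name] = model_id  # called through the OpenRouter API
    return openrouter, local
```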

LoRA adapter syntax: hf_local:org/repo/subfolder — the subfolder is resolved as a LoRA adapter on the base model detected from adapter_config.json.
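Under that syntax, the first two path segments name the Hugging Face repo and anything after them is the adapter subfolder. PEFT records the base model id under base_model_name_or_path in adapter_config.json, which is presumably what the detection reads. The sketch below is hypothetical: it assumes the adapter has already been downloaded to a local directory and that every id has an org/repo form.

```python
import json
import os


# Hypothetical helper: split "hf_local:org/repo[/subfolder]".
def parse_hf_local(model_id):
    """Return (repo_id, subfolder), with subfolder None for plain models."""
    if not model_id.startswith("hf_local:"):
        raise ValueError(f"not an hf_local model id: {model_id!r}")
    parts = model_id[len("hf_local:"):].split("/")
    if len(parts) < 2:
        raise ValueError(f"expected org/repo, got {model_id!r}")
    repo_id = "/".join(parts[:2])
    subfolder = "/".join(parts[2:]) or None  # remaining segments -> LoRA adapter
    return repo_id, subfolder


# Hypothetical helper: read the base model from a local adapter directory.
def base_model_from_adapter(adapter_dir):
    with open(os.path.join(adapter_dir, "adapter_config.json"), encoding="utf-8") as f:
        return json.load(f)["base_model_name_or_path"]
```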

Spec Mode: All-to-All Collection

Use sampler_mode: "all_to_all" for exhaustive evaluation where every model judges every other model's response on every scenario:

"collection": {
    "enabled": True,
    "sampler_mode": "all_to_all",
},
"training": {
    "enabled": True,
}

In all-to-all mode:

Every model acts as a judge for every scenario
Every model's response is evaluated by every judge
Reflections are per-judge (each judge reflects independently on each response)
All ordered pairs (eval1, eval2) are compared

This produces the most complete evaluation matrix, but the number of comparisons scales as O(scenarios × models³): every judge compares every ordered pair of evaluee responses on every scenario.
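To estimate cost before launching an all-to-all run, the calls can be tallied directly from the rules above. This hypothetical helper assumes one response per model per scenario, one reflection per (judge, response), and one comparison per (judge, ordered evaluee pair); whether a response is also paired with itself is not stated, so that is a parameter.

```python
# Hypothetical cost estimator for all-to-all collection.
def all_to_all_call_counts(num_scenarios, num_models, include_self_pairs=False):
    m = num_models
    # Ordered evaluee pairs (eval1, eval2); self-pairs optional.
    pairs = m * m if include_self_pairs else m * (m - 1)
    return {
        "responses": num_scenarios * m,            # every model answers every scenario
        "reflections": num_scenarios * m * m,      # every judge reflects on every response
        "comparisons": num_scenarios * m * pairs,  # every judge compares every pair
    }
```

For example, 10 scenarios and 4 models give 40 responses, 160 reflections, and 480 comparisons when self-pairs are excluded.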

Bootstrap Resampling

Adds error bars to EigenBench Elo scores by resampling comparisons and retraining BT/BTD models.

"training": {
    "bootstrap": {
        "enabled": True,
        "n_bootstraps": 100,
        "random_seed": 42,
        "save_models": False,
        "save_trust_matrices": True,
    },
}
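The resampling loop itself is short: draw as many comparisons as the original set, with replacement, refit, and read percentile intervals off the per-model score samples. In the sketch below, n_bootstraps and random_seed mirror the spec keys, but the function names are hypothetical, and the stand-in win_rate scorer replaces the BT/BTD retraining and Elo conversion the repo performs.

```python
import random


# Hypothetical helper: percentile bootstrap over comparison records.
def bootstrap_intervals(comparisons, fit, n_bootstraps=100, random_seed=42, alpha=0.05):
    """fit maps a list of records to {model: score}; returns {model: (low, high)}."""
    rng = random.Random(random_seed)
    samples = {}
    for _ in range(n_bootstraps):
        # Resample with replacement, same size as the original set.
        resample = [rng.choice(comparisons) for _ in comparisons]
        for model, score in fit(resample).items():
            samples.setdefault(model, []).append(score)
    intervals = {}
    for model, scores in samples.items():
        scores.sort()
        low = scores[int(alpha / 2 * (len(scores) - 1))]
        high = scores[int((1 - alpha / 2) * (len(scores) - 1))]
        intervals[model] = (low, high)
    return intervals


# Stand-in scorer for (i, j, outcome) records: share of comparisons won, ties count half.
def win_rate(records):
    wins, games = {}, {}
    for i, j, outcome in records:
        for m in (i, j):
            games[m] = games.get(m, 0) + 1
            wins.setdefault(m, 0.0)
        if outcome == "tie":
            wins[i] += 0.5
            wins[j] += 0.5
        else:
            wins[i if outcome == "i" else j] += 1
    return {m: wins[m] / games[m] for m in games}
```

A model missing from a resample simply contributes no sample for that draw; the real pipeline may handle that case differently.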

Warning

Bootstrap only retrains the BT/BTD model. Run it locally on CPU to avoid wasting GPU compute time.

Outputs

Per run folder (runs/<run_name>/):

evaluations.jsonl (if collection ran)
btd_d<dim>/ folders (if training ran), containing:
- training_loss.png
- model.pt
- eigentrust.txt
- uv_embeddings_pca.png
- eigenbench.png
- log_train.txt
- bootstrap/ (if bootstrap enabled):
  - samples.json
  - summary.json
  - bootstrap_elo.png

Repo Layout

EigenBench/
├── pipeline/
│   ├── eval/          # collection orchestration + sampling
│   │   ├── collect.py             # OpenRouter-only collection
│   │   ├── mixed_collect.py       # mixed OpenRouter + vLLM collection (+ all-to-all)
│   │   ├── criteria_collectors.py # prompt builders + single-group collection
│   │   ├── samplers.py            # judge/evaluee sampling strategies
│   │   └── flows.py               # response-only collection
│   ├── train/         # BT/BTD fitting + plots
│   │   ├── bt_models.py           # VectorBT, VectorBTD, CriteriaVectorBTD
│   │   ├── train.py               # training loop + utilities
│   │   └── plots.py               # embedding + Elo visualizations
│   ├── trust/         # trust matrix + EigenTrust
│   ├── utils/         # record IO + comparison extraction
│   ├── config/        # run-spec + dataset/constitution loaders
│   └── providers/     # model API calls (OpenRouter + vLLM)
├── scripts/
│   ├── run.py                    # only user entrypoint
│   ├── run_collect.py            # internal: routes to mixed or OpenRouter-only collection
│   ├── run_collect_responses.py  # internal: response cache stage
│   ├── run_train.py              # internal: training stage
│   └── upload_results.py         # manual upload to ValueArena
├── notebooks/
│   ├── mixed_openrouter_local_collection.ipynb  # legacy notebook (now integrated into CLI)
│   └── bootstrap_resampling.ipynb               # bootstrap analysis
├── runs/
│   └── <run_name>/
│       ├── spec.py            # per-run config
│       ├── evaluations.jsonl  # collected judgments
│       └── btd_d<dim>/        # training outputs
└── data/
    ├── constitutions/         # committed constitutions
    ├── scenarios/             # local scenario datasets
    └── responses/             # shared cached responses

Datasets Used in the Paper

AskReddit: https://www.kaggle.com/datasets/rodmcn/askreddit-questions-and-answers
OpenAssistant: https://huggingface.co/datasets/OpenAssistant/oasst1
AIRiskDilemmas (LitmusValues): https://huggingface.co/datasets/kellycyy/AIRiskDilemmas

ValueArena

Upload run results to the ValueArena leaderboard.

Auto-upload via Space

Add an upload section to your spec to automatically train and upload results to ValueArena after collection finishes. Training runs on the HF Space (free CPU), so no local GPU is needed.

"upload": {
    "enabled": True,
    "name": "oct/goodness",       # run slug on ValueArena
    "group": "oct",               # optional grouping
    "note": "LoRA-only (12 personas)",  # shows in the table
},

Set the SPACE_SECRET env var (or upload.secret in spec) before running:

export SPACE_SECRET="your-secret"
python scripts/run.py runs/my_run/spec.py

When upload.enabled=True, local training is skipped. After collection, the evaluations and spec are sent to the Space, which handles BTD training, bootstrap, EigenTrust, and upload to ValueArena in the background.

Manual upload

# Single run
python3 scripts/upload_results.py --name "my-run" --run-dir runs/my_run/ --note "optional note"

# Batch upload (all sub-runs in a folder)
python3 scripts/upload_results.py --batch-dir runs/matrix/ --name "matrix" --note "12 persona LoRAs"

--name is the run slug on HF. For batch, it's the prefix (matrix → matrix/goodness, matrix/humor, etc.)
--note shows in the table on the website
Re-uploading with the same name overwrites the previous entry
Git commit hash and scenario range are captured automatically
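The batch naming rule can be sketched as follows, assuming each immediate subfolder of --batch-dir is one run and its folder name becomes the suffix (the README's example suggests this but does not state it; the helper is hypothetical, not part of upload_results.py).

```python
from pathlib import Path


# Hypothetical helper: map sub-run folders to "<prefix>/<folder name>" slugs.
def batch_slugs(batch_dir, prefix):
    return {
        sub.name: f"{prefix}/{sub.name}"
        for sub in sorted(Path(batch_dir).iterdir())
        if sub.is_dir()  # plain files in the batch folder are ignored
    }
```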

Citation

@misc{chang2025eigenbenchcomparativebehavioralmeasure,
      title={EigenBench: A Comparative Behavioral Measure of Value Alignment},
      author={Jonathn Chang and Leonhard Piff and Suvadip Sana and Jasmine X. Li and Lionel Levine},
      year={2025},
      eprint={2509.01938},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2509.01938},
}
