ValueArena: Which Model Shares Your Values?

A comparative behavioral measure of value alignment across language models, built on EigenBench.

What is this?

ValueArena lets you explore how different LLMs align with specific human values. EigenBench works by having an ensemble of models judge each other's responses across value-loaded scenarios, fitting pairwise comparisons to a Bradley-Terry model, and aggregating consensus scores via EigenTrust.

The site has three sections:

Chat — Pick two models and a constitution (e.g. Kindness, Humor, Sarcasm), then chat side-by-side. Vote on which model better reflects the chosen value. Uses OpenRouter for inference directly from the browser.
Leaderboard — Per-constitution Elo rankings from EigenBench experiment runs. View as a ranked table, horizontal bar plot, or cross-constitution pareto heatmap. Group by model or by lab.
Experiments — Browse all EigenBench runs with filtering, sorting, and drill-down into individual run details (Elo distributions, training loss, UV embeddings, bootstrap confidence intervals).

Architecture

Static HTML/CSS/JS — no build step, no backend. Run data lives on a HuggingFace dataset repo and is fetched at page load. Chat uses OpenRouter with the user's own API key. Votes are stored in localStorage.

index.html          Single-page app shell with sidebar
css/style.css       All styles (dark/light theme, DM Serif Display + DM Sans)
js/config.js        Model list, constitutions, API endpoints
js/tabs.js          Sidebar tab switching
js/utils.js         Shared helpers (esc, debounce)
js/chat.js          Side-by-side chat, streaming, voting, session persistence
js/leaderboard.js   Per-constitution rankings, plot, pareto views
js/index.js         Experiments table with filters and grouping
js/hf-fetch.js      HuggingFace dataset fetching
run.html            Individual run detail page

Upload results

From the EigenBench repo:

# Single run
python3 scripts/upload_results.py --name "my-run" --run-dir runs/my_run/

# Batch (all sub-runs in a folder)
python3 scripts/upload_results.py --batch-dir runs/matrix/ --name "matrix" --note "12 persona LoRAs"

Or use the HF Space for auto-upload by adding upload config to your spec — see the EigenBench docs.

New runs appear on the site immediately after upload.

Local dev

python3 -m http.server

Open http://localhost:8000.

Data

All experiment data is stored on HuggingFace at invi-bhagyesh/ValueArena:

index.json                  Manifest of all runs
runs/{name}/
  meta.json                 Spec, training log, eigentrust scores, git info
  summary.json              Bootstrap Elo ratings per model
  evaluations.jsonl         Raw evaluation transcripts
  images/                   Plots (Elo, training loss, UV embeddings, etc.)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ValueArena: Which Model Shares Your Values?

What is this?

Architecture

Upload results

Local dev

Data

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

ValueArena: Which Model Shares Your Values?

What is this?

Architecture

Upload results

Local dev

Data