PathScore: an SVG generation benchmark for open-source LLMs via OpenRouter.
Compare any set of models on SVG generation quality using pairwise judging by a vision-language model (VLM) and ELO rankings.
https://github.com/adamholter/pathscore/raw/main/demo.mp4
- Configure — Select models, define prompts, pick a judge model
- Run — All SVG generations fire in parallel; pairwise VLM judging starts as generations complete
- Results — ELO leaderboard + win-rate heatmap + full comparison browser
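In the heatmap, each cell can be read as the row model's win rate against the column model. The sketch below shows one way to compute that matrix from pairwise judgments. It assumes a hypothetical comparison record `{ a, b, winner }` and counts a tie as half a win, which matches the Tie = 0.5 convention used for ELO. `winRateMatrix` is an illustrative name, not PathScore's code.

```javascript
// Build a win-rate matrix: cell [i][j] is model i's average score against model j.
// Assumed (hypothetical) record shape: { a, b, winner } with winner "A", "B", or "tie".
function winRateMatrix(models, comparisons) {
  const idx = new Map(models.map((m, i) => [m, i]));
  const n = models.length;
  const zeros = () => Array.from({ length: n }, () => new Array(n).fill(0));
  const points = zeros(); // accumulated score of the row model vs the column model
  const games = zeros();  // number of judgments between the two models

  for (const { a, b, winner } of comparisons) {
    const i = idx.get(a);
    const j = idx.get(b);
    if (i === undefined || j === undefined) continue; // ignore models not in the list
    // Score for model a: 1 if "A" won, 0 if "B" won, anything else counts as a tie (0.5)
    const s = winner === "A" ? 1 : winner === "B" ? 0 : 0.5;
    points[i][j] += s;
    points[j][i] += 1 - s;
    games[i][j] += 1;
    games[j][i] += 1;
  }
  // null marks pairs with no judgments (including the diagonal)
  return points.map((row, i) =>
    row.map((p, j) => (games[i][j] > 0 ? p / games[i][j] : null))
  );
}
```

Because A/B positions are randomized, the record stores which model appeared as A, and the function maps the verdict back to the model that actually won.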
npm install
cp .env.example .env # Add your OpenRouter API key
node server.cjs

Run tests:
npm test

Backend architecture notes: BACKEND_ARCHITECTURE.md
Environment variables (set in `.env`):
OPENROUTER_API_KEY=sk-or-...
PORT=7642
PATHSCORE_EXTENSION_RUNTIME=legacy
PATHSCORE_INVARIANT_CHECKS=0
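OpenRouter exposes an OpenAI-compatible chat-completions endpoint, and `OPENROUTER_API_KEY` is sent as a Bearer token. The sketch below builds, but does not send, one SVG generation request against that endpoint. It assumes a single-turn prompt. `buildSvgRequest`, `extractSvg`, and the system-prompt wording are hypothetical and not taken from PathScore's source.

```javascript
// OpenRouter's OpenAI-compatible chat-completions endpoint.
const OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions";

// Hypothetical helper: returns { url, init } suitable for fetch(url, init).
function buildSvgRequest(model, prompt, apiKey = process.env.OPENROUTER_API_KEY) {
  if (!apiKey) throw new Error("OPENROUTER_API_KEY is not set");
  return {
    url: OPENROUTER_URL,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model, // an OpenRouter model id, e.g. "vendor/model-name"
        messages: [
          // Assumed instruction; the real benchmark prompt may differ.
          { role: "system", content: "Reply with a single standalone <svg> element and nothing else." },
          { role: "user", content: prompt },
        ],
      }),
    },
  };
}

// Models often wrap SVG in prose or code fences; keep only the first <svg>...</svg>.
function extractSvg(text) {
  const match = text.match(/<svg[\s\S]*?<\/svg>/i);
  return match ? match[0] : null;
}
```

On Node 18+, `const res = await fetch(url, init)` sends the request; the reply text is at `(await res.json()).choices[0].message.content`, which `extractSvg` trims to the bare `<svg>` element.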
Export formats:
- JSON — Full dataset (configs, SVGs, comparisons, metadata) for reproducibility
- HTML — Standalone report page with leaderboard and heatmap
- PDF — Browser print-to-PDF
For N models and M prompts:
- Generates N × M SVGs in parallel
- Creates all N × (N-1) / 2 unique pairs per prompt
- Randomly assigns A/B positions to counter the judge's position bias
- Runs each pair through the VLM judge (1 to 5 runs per pair, configurable)
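As a worked example of the counts above: with N = 5 models, M = 10 prompts and 3 judge runs per pair, a run generates 5 × 10 = 50 SVGs, forms 5 × 4 / 2 = 10 pairs per prompt (100 pairs overall), and makes 300 judge calls. The sketch below enumerates the unique pairs for one prompt with a coin flip for A/B order. `buildPairs` and `totalJudgeCalls` are hypothetical names, and the injectable `rng` is an assumption added so results can be reproduced in tests.

```javascript
// Enumerate every unordered model pair once and randomize which model is shown as "A".
function buildPairs(models, rng = Math.random) {
  const pairs = [];
  for (let i = 0; i < models.length; i++) {
    for (let j = i + 1; j < models.length; j++) {
      const swap = rng() < 0.5; // coin flip decides display order
      pairs.push(swap ? { a: models[j], b: models[i] } : { a: models[i], b: models[j] });
    }
  }
  return pairs; // length is N * (N - 1) / 2
}

// Total judge calls for a run: M prompts x N(N-1)/2 pairs x runs per pair (1 to 5).
function totalJudgeCalls(n, m, runsPerPair) {
  if (runsPerPair < 1 || runsPerPair > 5) throw new RangeError("runsPerPair must be 1 to 5");
  return (m * n * (n - 1) / 2) * runsPerPair;
}
```

The judge call count grows quadratically in N, so adding models raises cost much faster than adding prompts.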
Ratings use standard ELO: every model starts at 1000 with K = 32, and each judgment scores 1 for a win, 0.5 for a tie, and 0 for a loss.
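The standard ELO update with these parameters is sketched below. It assumes ratings are updated one judgment at a time. `updateElo` and the `Map`-based ratings store are illustrative, not PathScore's implementation.

```javascript
const K = 32;       // update step size
const START = 1000; // initial rating for every model

// Expected score of a model rated ra against one rated rb (logistic curve, 400-point scale).
function expectedScore(ra, rb) {
  return 1 / (1 + 10 ** ((rb - ra) / 400));
}

// Apply one judgment. scoreA is 1 (A wins), 0.5 (tie), or 0 (B wins).
function updateElo(ratings, a, b, scoreA) {
  const ra = ratings.get(a) ?? START;
  const rb = ratings.get(b) ?? START;
  const ea = expectedScore(ra, rb);
  const eb = 1 - ea;
  ratings.set(a, ra + K * (scoreA - ea));
  ratings.set(b, rb + K * ((1 - scoreA) - eb)); // zero-sum: B gains what A loses
}
```

Two fresh models each expect 0.5, so a win moves them to 1016 and 984. Because updates are sequential, the final ratings depend on the order in which judgments are processed.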
- Backend: Node.js + Express + SQLite (better-sqlite3)
- Models: OpenRouter API (400+ models available)
- Frontend: Vanilla JS SPA, PathScore brand identity
- Streaming: Server-Sent Events for live run updates
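Server-Sent Events are plain text over a long-lived HTTP response. Each message is optional `event:` and `data:` lines followed by a blank line. The sketch below frames such messages and opens a stream on a Node `http.ServerResponse`, which Express's `res` extends. `formatSseEvent`, `openSseStream`, and any event names used with them are hypothetical.

```javascript
// Serialize one SSE message. JSON.stringify (without indentation) never emits raw
// newlines, so a single "data:" line is enough; the blank line ends the message.
function formatSseEvent(event, payload) {
  const eventLine = event ? `event: ${event}\n` : "";
  return `${eventLine}data: ${JSON.stringify(payload)}\n\n`;
}

// Start an SSE stream on a Node/Express response and return a send(event, payload) helper.
function openSseStream(res) {
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
    Connection: "keep-alive",
  });
  return (event, payload) => res.write(formatSseEvent(event, payload));
}
```

In an Express route handler (route path hypothetical), `const send = openSseStream(res)` could push generation and judgment progress as it happens. The browser consumes it with `new EventSource(url)` and `addEventListener(eventName, ...)`.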
License: MIT