Quoridor web player by jonbinney · Pull Request #371 · jonbinney/deep_rabbit_hole

jonbinney · 2026-05-31T19:12:39Z

This PR includes two unrelated changes, since I am lazy: W&B metrics for Rust self-play, and a web app for playing against the Rust AlphaZero implementation.

W&B Metrics for rust self-play

To avoid complicating the rust code too much, the rust self-play just logs metrics to files in the run directory being used. A separate python script, started by train_v2.py, reads those files and uploads the metrics to W&B. So far the metrics are all about the MCTS search. They provide some insight into how many paths in the search tree are being explored, how many reach terminal states, the entropy of the distribution at the root node, etc.
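As an illustration of one of the metrics mentioned above, here is a minimal sketch of the entropy of the visit distribution at the root node. The function name and the `&[u32]` visit-count input are assumptions made for this example, not the PR's actual API.

```rust
/// Shannon entropy (in nats) of the root visit distribution.
/// `visit_counts` holds one visit count per child of the root (assumed layout).
pub fn root_visit_entropy(visit_counts: &[u32]) -> f64 {
    let total: u64 = visit_counts.iter().map(|&c| u64::from(c)).sum();
    if total == 0 {
        // No visits recorded: treat the entropy as zero.
        return 0.0;
    }
    visit_counts
        .iter()
        .filter(|&&c| c > 0) // unvisited children contribute nothing
        .map(|&c| {
            let p = f64::from(c) / total as f64;
            -p * p.ln()
        })
        .sum()
}
```

A value near zero means the search put almost all visits on one move; a value near `ln(k)` means visits were spread evenly over `k` moves.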

Web-app (and server) for playing against rust implementation of alphazero

This exists mostly so we can easily try playing against new models without having to compile and run everything locally. One Rust binary serves a simple JS+HTML page and acts as an HTTP API server. There is not much intelligence in the frontend. The user can choose any model that is in a particular directory. Each session gets its own state and can have its own game against the AI.

Rust self-play exposes no visibility into MCTS behavior during a run. Spec adds per-model-version diagnostics (terminal/truncation sim fraction, tree depth, root-visit spread, nodes/branching, unique games) written by Rust to JSON records and logged to W&B by a new Python process in the same run group, reset on each model update.

Seven-task TDD plan: SearchStats from search(), per-game GameMetrics plus move hashing, a SelfPlayAccumulator that flushes per-version JSON records, wiring into the continuous loop's reload/shutdown points, and a Python aggregator + W&B logger process spawned by train_v2.
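A rough sketch of the accumulate-then-flush idea, under stated assumptions: the field names, the hand-written JSON, and the one-record-per-line file layout are all hypothetical, and only the `SelfPlayAccumulator` name comes from the plan. The real code may well use a serialization crate instead.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical per-model-version accumulator; fields are illustrative only.
#[derive(Default)]
pub struct SelfPlayAccumulator {
    model_version: u64,
    games: u64,
    total_sims: u64,
    terminal_sims: u64,
}

impl SelfPlayAccumulator {
    pub fn new(model_version: u64) -> Self {
        Self { model_version, ..Self::default() }
    }

    /// Fold one finished game's counters into the running totals.
    pub fn record_game(&mut self, sims: u64, terminal_sims: u64) {
        self.games += 1;
        self.total_sims += sims;
        self.terminal_sims += terminal_sims;
    }

    /// Render the current totals as a single-line JSON record.
    pub fn to_json(&self) -> String {
        let terminal_fraction = if self.total_sims == 0 {
            0.0
        } else {
            self.terminal_sims as f64 / self.total_sims as f64
        };
        format!(
            "{{\"model_version\":{},\"games\":{},\"terminal_sim_fraction\":{}}}",
            self.model_version, self.games, terminal_fraction
        )
    }

    /// Append the record to `path`, then reset the totals for `next_version`.
    pub fn flush(&mut self, path: &Path, next_version: u64) -> io::Result<()> {
        let mut file = OpenOptions::new().create(true).append(true).open(path)?;
        writeln!(file, "{}", self.to_json())?;
        *self = Self::new(next_version);
        Ok(())
    }
}
```

Calling `flush` at each model reload and at shutdown would give one record per model version, which a separate process can then read and upload.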

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

A local web app for playing Quoridor against the project's Rust AlphaZero agent. One rust binary using tiny_http (no framework) serves embedded HTML/CSS/JS plus a small JSON API; each browser session gets its own in-memory game vs the AI. The user picks the ONNX model from <play-dir>/models/*.onnx (board config inferred from a sibling config.yaml), the mcts_n slider, and who plays first.
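The per-session in-memory state could be organized roughly as follows. This is a sketch only: the `GameSession` fields, the numeric session ids, and the method names are assumptions for illustration, not the PR's actual types.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Placeholder for one browser session's game (hypothetical fields).
#[derive(Debug, Clone, PartialEq)]
pub struct GameSession {
    pub model_name: String,
    pub mcts_n: u32,
    pub human_player: u8,
    pub move_history: Vec<usize>,
}

#[derive(Default)]
struct Inner {
    next_id: u64,
    sessions: HashMap<u64, GameSession>,
}

/// Registry shared by all request handlers; one mutex guards the whole map.
#[derive(Default)]
pub struct GameRegistry {
    inner: Mutex<Inner>,
}

impl GameRegistry {
    /// Store a new session and return its id.
    pub fn create(&self, session: GameSession) -> u64 {
        let mut inner = self.inner.lock().unwrap();
        let id = inner.next_id;
        inner.next_id += 1;
        inner.sessions.insert(id, session);
        id
    }

    /// Run `f` on the session with this id, or return None if it is unknown.
    pub fn with_session<T>(&self, id: u64, f: impl FnOnce(&mut GameSession) -> T) -> Option<T> {
        let mut inner = self.inner.lock().unwrap();
        inner.sessions.get_mut(&id).map(f)
    }
}
```

A single coarse lock keeps the sketch simple; it would serialize AI searches across sessions, so a real server might prefer a lock per session.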

Nine-task TDD plan: scaffold the play_server module with pure StateView and action enrichment, then config loader, GameSession + registry, JSON handlers, the tiny_http bin with embedded static assets, an end-to-end test against the B5W2 fixture, and the vanilla HTML/CSS/JS frontend.

Scaffolds play_server module with state.rs containing EnrichedAction, WallOrientation, StateView, WallEntry types and enrich_action / enrich_legal_actions helpers. ACTION constants differ from plan assumptions: WALL_VERTICAL=0, WALL_HORIZONTAL=1, MOVE=2 — match arms updated accordingly.
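To show why the constant values matter for the match arms, here is a minimal sketch. The constant values and the type names come from the commit message; the `(row, col, kind)` action shape and the enum layout are assumptions made for this example.

```rust
pub const WALL_VERTICAL: u8 = 0;
pub const WALL_HORIZONTAL: u8 = 1;
pub const MOVE: u8 = 2;

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum WallOrientation {
    Vertical,
    Horizontal,
}

/// Frontend-friendly description of an action (hypothetical shape).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum EnrichedAction {
    Move { row: u8, col: u8 },
    Wall { row: u8, col: u8, orientation: WallOrientation },
}

/// Translate a raw (row, col, kind) triple; unknown kinds yield None.
pub fn enrich_action(row: u8, col: u8, kind: u8) -> Option<EnrichedAction> {
    match kind {
        WALL_VERTICAL => Some(EnrichedAction::Wall {
            row,
            col,
            orientation: WallOrientation::Vertical,
        }),
        WALL_HORIZONTAL => Some(EnrichedAction::Wall {
            row,
            col,
            orientation: WallOrientation::Horizontal,
        }),
        MOVE => Some(EnrichedAction::Move { row, col }),
        _ => None,
    }
}
```

Matching on the named constants rather than literal numbers means a wrong assumption about their values cannot silently swap a move for a wall.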

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Comment said move_history has at least 2 entries after the human+AI atomic round, but the assertion was >= 1 which would pass even if the AI step silently did not fire.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Vanilla HTML/CSS/JS implementing the spec:

- (2N-1)x(2N-1) CSS grid with alternating pawn-cell, wall-slot, and post tracks.
- Click handlers attached to legal-action targets keyed by the bare action index (no client-side encoding logic).
- Mirrors coordinates 180 deg when human_player == 1 so the human's home row always sits at the bottom of the rendered board.
- Side panel with new-game form (model select, mcts_n slider, who-goes-first toggle), turn indicator with AI-thinking spinner, walls remaining per player, step counter, and a game-over banner.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
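The (2N-1)x(2N-1) layout can be sketched as a parity rule on grid coordinates. The frontend itself is JavaScript; this Rust version, with hypothetical names, only illustrates the logic, assuming even tracks hold pawn cells and odd tracks hold wall slots.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum CellKind {
    PawnCell,
    WallSlot,
    Post,
}

/// Side length of the rendered grid for an N x N board (requires n >= 1).
pub fn grid_size(n: usize) -> usize {
    assert!(n >= 1, "board size must be at least 1");
    2 * n - 1
}

/// Classify a grid position by the parity of its row and column tracks.
pub fn classify(grid_row: usize, grid_col: usize) -> CellKind {
    match (grid_row % 2, grid_col % 2) {
        (0, 0) => CellKind::PawnCell, // both tracks are pawn tracks
        (1, 1) => CellKind::Post,     // intersection of two wall tracks
        _ => CellKind::WallSlot,      // exactly one wall track
    }
}
```

For a 5x5 board this gives a 9x9 grid with 25 pawn cells, 40 wall-slot cells, and 16 posts.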

Silences a longstanding unused_imports warning -- the trait is only referenced inside #[cfg(test)] via new_with_evaluator.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Four issues addressed:

- Empty wall slots now use a mid-tone groove color (--slot) that is clearly distinct from both pawn cells (--cell, lighter) and placed walls (--wall, dark walnut). Previously empty slots and placed walls were both dark, which made a placed vertical wall look like a cross at the post intersection.
- Hovering a legal wall now highlights the entire wall (both halves + the post between them) by tagging the three cells with a shared data-wall-group and synchronizing mouseenter/leave on all of them.
- Removed the cross artifact on placed vertical walls -- a side effect of the slot-color change above.
- The UI now reflects the human's move immediately via an optimistic local update; the server's atomic human+AI response then replaces the state when it arrives. Errors roll back via GET /api/games/<id>.

Also refines the palette toward a warm wood-and-stone aesthetic: warmer beiges, walnut walls, inset pawn shading, board border with depth.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The previous commit added && !STATE.pending to the humanTurn check in render(), intending to suppress clicks while a network request was in flight. But the post-AI render happens before setPending(false) runs (it's in the finally block), so STATE.pending is still true at that point and no click handlers got attached -- the board became inert.

Drop the guard. sendMove() already short-circuits on STATE.pending, so gating handler attachment is redundant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Resolves three coupled UI problems:

1. Overlapping wall highlights. The right half of one horizontal wall slot is the left half of the next, so hovering it used to fire mouseenter listeners on both walls' groups -- two walls would light up. Now each wall is interactive only on its display top-left cell (left for H, top for V), so every grid cell triggers at most one wall.
2. Ambiguity at intersection posts. Posts were part of two walls' groups before, so hovering one lit up both H and V. With the anchor-only rule, posts are never any wall's anchor and trigger nothing on hover.
3. Human rendered at the top. The mirror condition was inverted vs the spec's intent: the canonical view in CSS has display row 0 at the top, so to put P0's home row at the bottom we have to vertical-flip the board when the human is P0. (P1 already starts at server row N-1, which is the display bottom without a flip.) Switched to a simple vertical flip -- cols stay put -- so the H wall's left half stays the left half in display space and the anchor-only rule reads naturally as 'extends right (H) or down (V)'.

Also drops the now-redundant brightness:hover race-cover style; with anchor-only attachment there's no race.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
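The vertical-flip rule from item 3 can be sketched as a row mapping for pawn cells. The frontend is JavaScript; this Rust function and its name are hypothetical and only illustrate the rule as described: flip rows when the human is P0, leave them unchanged when the human is P1, and never touch columns.

```rust
/// Map a server pawn row to a display row, where display row 0 is the top.
/// P0's home row is server row 0, so it must be flipped to reach the bottom.
pub fn display_row(server_row: usize, board_size: usize, human_player: u8) -> usize {
    assert!(server_row < board_size, "row out of range");
    if human_player == 0 {
        board_size - 1 - server_row // vertical flip
    } else {
        server_row // P1 already starts at the display bottom
    }
}
```

Either way, the human's home row ends up at display row `board_size - 1`, the bottom of the rendered board.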

The status panel was wired by server player index, so when the human plays as P1 the 'You' row showed P0's wall count and vice versa. Map by role (human vs ai) using human_player instead. Also reword from 'walls left' to 'walls remaining'.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
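The role-based mapping might look like the following sketch. The function name and the two-element array input are assumptions for illustration; the actual fix lives in the JavaScript frontend.

```rust
/// Return (human_walls, ai_walls) given wall counts indexed by server player.
pub fn walls_by_role(walls_remaining: [u32; 2], human_player: usize) -> (u32, u32) {
    assert!(human_player < 2, "human_player must be 0 or 1");
    let ai_player = 1 - human_player;
    (walls_remaining[human_player], walls_remaining[ai_player])
}
```

With `walls_remaining = [3, 7]` and the human playing as P1, the 'You' row shows 7 and the AI row shows 3.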

- Bump the default --default-mcts-n CLI value from 400 to 1000 so the slider lands somewhere more capable on startup.
- Drop the (P1)/(P2) suffixes from the who-goes-first radio labels; they're noise in a strictly human-vs-AI app.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The lib was declared as both cdylib and rlib in Cargo.toml, which triggers cargo issue #6313 (output filename collision) on every cargo test invocation -- three warnings on each run, no errors, but noise that crowds out anything else.

Drop cdylib from Cargo.toml and add a pyproject.toml that tells maturin to use the pyo3 bindings; that backend implicitly compiles with --crate-type cdylib for the wheel build, so plain 'maturin build' still produces an importable .so without cdylib being a default cargo output. Verified by building and importing the wheel.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Clippy lint (clippy::filter_map_bool_then).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
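For readers unfamiliar with this lint, here is a small before/after sketch on made-up data. The function names are hypothetical and unrelated to the PR's code.

```rust
/// Pattern Clippy flags: `bool::then` used inside `filter_map`.
pub fn doubled_evens_before(xs: &[i32]) -> Vec<i32> {
    xs.iter()
        .filter_map(|&x| (x % 2 == 0).then(|| x * 2))
        .collect()
}

/// Preferred form: an explicit `filter` followed by `map`.
pub fn doubled_evens_after(xs: &[i32]) -> Vec<i32> {
    xs.iter()
        .filter(|&&x| x % 2 == 0)
        .map(|&x| x * 2)
        .collect()
}
```

Both functions return the same result; the second states the filtering and the transformation as separate steps.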

jonbinney · 2026-06-04T17:54:25Z

Although this is a big change, it isn't as big as the diff seems. A couple thousand lines of changes are just the planning docs that claude made. Then there's all the web frontend and API stuff, which ..... isn't "real" code :-P

jonbinney and others added 30 commits May 29, 2026 11:00

Update B9W10 config

f997cc8

vibe: return per-search MCTS SearchStats

c5befa0

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

vibe: collect per-game MCTS metrics and move hashes

eac3726

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

vibe: add SelfPlayAccumulator with JSON metric flush

3ea932c

vibe: flush self-play MCTS metrics per model version

e936b69

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

vibe: add self-play metrics aggregator and W&B logger

defa27b

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

vibe: spawn self-play metrics logger from train_v2

12f00c7

vibe: cargo fmt + ruff format

9ed6c73

vibe: add tiny_http dep + play_server bin entry

e58cad1

vibe: add play_server config loader and models scan

8386d2f

vibe: add GameSession + GameRegistry

e8a391f

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

vibe: add play_server JSON handlers

d2d31a9

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

vibe: tighten handler test to assert AI step ran

1dc17ad

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vibe: add play_server tiny_http binary

9d4fa7f

vibe: add play_server end-to-end test

8499e08

vibe: gate test-only Evaluator import behind cfg(test)

ab19c84

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vibe: cargo fmt

8304135

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

vibe: prefer filter().map() over bool::then in filter_map

de6683c

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

jonbinney requested a review from alejandromarcu June 4, 2026 16:22

jonbinney marked this pull request as ready for review June 4, 2026 16:22

alejandromarcu approved these changes Jun 4, 2026

jonbinney merged commit 27bcf1c into main Jun 5, 2026
5 checks passed

jonbinney merged 30 commits into main from jdb/rust-self-play-logging
