Quoridor web player #371

Merged
jonbinney merged 30 commits into main from jdb/rust-self-play-logging on Jun 5, 2026

jonbinney commented May 31, 2026

Owner

This PR includes two unrelated changes, since I am lazy: W&B metrics for Rust self-play, and a web app for playing against the Rust AlphaZero implementation.

W&B metrics for Rust self-play

To avoid complicating the Rust code too much, the Rust self-play just logs metrics to files in the run directory. A separate Python script, started by train_v2.py, reads those files and uploads the metrics to W&B. So far the metrics are all about the MCTS search: how many paths in the search tree are being explored, how many reach terminal states, the entropy of the distribution at the root node, etc.
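As a rough illustration of the root-entropy metric, here is a minimal sketch. It assumes the distribution in question is the normalized visit counts of the root's children; the PR description does not say which distribution is used, and the function name `root_entropy` is hypothetical.

```python
import math


def root_entropy(visit_counts):
    """Shannon entropy (in nats) of the normalized root visit counts.

    Assumption: `visit_counts` holds one non-negative count per root child.
    """
    total = sum(visit_counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in visit_counts:
        if count > 0:  # zero-visit children contribute nothing
            p = count / total
            entropy -= p * math.log(p)
    return entropy
```

Under this definition, a search that puts every visit on one move gives entropy 0, while one that spreads visits evenly over k moves gives log(k).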

Web app (and server) for playing against the Rust implementation of AlphaZero

Mostly so we can easily try playing against new models without having to compile and run everything locally. One Rust binary serves a simple JS+HTML page and acts as an HTTP API server. There is not much intelligence in the frontend. It lets the user choose any model in a particular directory. Each session gets its own state and can have its own game against the AI.

jonbinney and others added 30 commits May 29, 2026 11:00
Rust self-play exposes no visibility into MCTS behavior during a run.
Spec adds per-model-version diagnostics (terminal/truncation sim
fraction, tree depth, root-visit spread, nodes/branching, unique games)
written by Rust to JSON records and logged to W&B by a new Python
process in the same run group, reset on each model update.
Seven-task TDD plan: SearchStats from search(), per-game GameMetrics
plus move hashing, a SelfPlayAccumulator that flushes per-version JSON
records, wiring into the continuous loop's reload/shutdown points, and
a Python aggregator + W&B logger process spawned by train_v2.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
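The aggregation step described above might look roughly like the sketch below. Everything about the record layout is an assumption made for illustration: one JSON object per line, with hypothetical fields `model_version`, `sims`, `terminal_sims`, and `games`. The actual file format and the W&B upload call are not shown in this PR page and are left out.

```python
import json
from collections import defaultdict


def aggregate_records(lines):
    """Sum per-version counters from JSON-lines records and derive fractions.

    Field names are hypothetical; the real records may differ.
    """
    totals = defaultdict(lambda: {"sims": 0, "terminal_sims": 0, "games": 0})
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        bucket = totals[record["model_version"]]
        bucket["sims"] += record["sims"]
        bucket["terminal_sims"] += record["terminal_sims"]
        bucket["games"] += record["games"]

    summary = {}
    for version, bucket in sorted(totals.items()):
        fraction = bucket["terminal_sims"] / bucket["sims"] if bucket["sims"] else 0.0
        summary[version] = {"games": bucket["games"], "terminal_sim_fraction": fraction}
    return summary
```

Keying the totals by model version mirrors the "reset on each model update" behavior: each version gets its own bucket, so a logger could emit one W&B point per version.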
A local web app for playing Quoridor against the project's Rust
AlphaZero agent. One rust binary using tiny_http (no framework) serves
embedded HTML/CSS/JS plus a small JSON API; each browser session gets
its own in-memory game vs the AI. The user picks the ONNX model from
<play-dir>/models/*.onnx (board config inferred from a sibling
config.yaml), the mcts_n slider, and who plays first.
Nine-task TDD plan: scaffold the play_server module with pure StateView
and action enrichment, then config loader, GameSession + registry, JSON
handlers, the tiny_http bin with embedded static assets, an end-to-end
test against the B5W2 fixture, and the vanilla HTML/CSS/JS frontend.
Scaffolds play_server module with state.rs containing EnrichedAction,
WallOrientation, StateView, WallEntry types and enrich_action /
enrich_legal_actions helpers. ACTION constants differ from plan
assumptions: WALL_VERTICAL=0, WALL_HORIZONTAL=1, MOVE=2 — match arms
updated accordingly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment said move_history has at least 2 entries after the human+AI
atomic round, but the assertion was >= 1 which would pass even if
the AI step silently did not fire.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Vanilla HTML/CSS/JS implementing the spec:

- (2N-1)x(2N-1) CSS grid with alternating pawn-cell, wall-slot, and
  post tracks.
- Click handlers attached to legal-action targets keyed by the bare
  action index (no client-side encoding logic).
- Mirrors coordinates 180 deg when human_player == 1 so the human's
  home row always sits at the bottom of the rendered board.
- Side panel with new-game form (model select, mcts_n slider, who-
  goes-first toggle), turn indicator with AI-thinking spinner, walls
  remaining per player, step counter, and a game-over banner.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
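The (2N-1)x(2N-1) layout can be sketched as follows. The frontend itself is vanilla JS; this is a Python transliteration for illustration, and the names `track_kind` and `build_grid` are hypothetical. With N pawn tracks alternating inside 2N-1 tracks, the even indices must be pawn tracks and the odd ones wall tracks.

```python
def track_kind(grid_row, grid_col):
    """Classify one grid position by the parity of its row and column."""
    row_is_pawn_track = grid_row % 2 == 0
    col_is_pawn_track = grid_col % 2 == 0
    if row_is_pawn_track and col_is_pawn_track:
        return "pawn-cell"
    if not row_is_pawn_track and not col_is_pawn_track:
        return "post"  # intersection between four pawn cells
    return "wall-slot"  # exactly one of the two tracks is a wall track


def build_grid(n):
    """Return the (2n-1) x (2n-1) grid of position kinds for an n x n board."""
    size = 2 * n - 1
    return [[track_kind(r, c) for c in range(size)] for r in range(size)]
```

For a 3x3 board this gives a 5x5 grid with 9 pawn cells, 4 posts, and 12 wall-slot halves.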
Silences a longstanding unused_imports warning -- the trait is only
referenced inside #[cfg(test)] via new_with_evaluator.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four issues addressed:

- Empty wall slots now use a mid-tone groove color (--slot) that is
  clearly distinct from both pawn cells (--cell, lighter) and placed
  walls (--wall, dark walnut). Previously empty slots and placed walls
  were both dark, which made a placed vertical wall look like a cross
  at the post intersection.
- Hovering a legal wall now highlights the entire wall (both halves +
  the post between them) by tagging the three cells with a shared
  data-wall-group and synchronizing mouseenter/leave on all of them.
- Removed the cross artifact on placed vertical walls -- a side effect
  of the slot-color change above.
- The UI now reflects the human's move immediately via an optimistic
  local update; the server's atomic human+AI response then replaces
  the state when it arrives. Errors roll back via GET /api/games/<id>.

Also refines the palette toward a warm wood-and-stone aesthetic: warmer
beiges, walnut walls, inset pawn shading, board border with depth.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
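The optimistic-update flow from the last bullet can be sketched like this. It is a Python stand-in for the JS frontend; all callables (`apply_locally`, `post_move`, `fetch_state`, `render`) are hypothetical placeholders, with `fetch_state` playing the role of GET /api/games/<id>.

```python
def play_move(state, action, apply_locally, post_move, fetch_state, render):
    """Render an optimistic state, then the server's state (re-fetched on error)."""
    render(apply_locally(state, action))  # optimistic: the human's move shows at once
    try:
        new_state = post_move(action)  # server applies human + AI moves atomically
    except RuntimeError:  # stand-in for any request failure
        new_state = fetch_state()  # roll back to the server's view of the game
    render(new_state)
    return new_state
```

The point of the design is that the server stays authoritative: the local update is only ever a preview that the next render overwrites.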
The previous commit added && !STATE.pending to the humanTurn check in
render(), intending to suppress clicks while a network request was in
flight. But the post-AI render happens before setPending(false) runs
(it's in the finally block), so STATE.pending is still true at that
point and no click handlers got attached -- the board became inert.

Drop the guard. sendMove() already short-circuits on STATE.pending, so
gating handler attachment is redundant.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
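The ordering problem is easy to reproduce in miniature. The sketch below is a Python simulation of the JS control flow, not the real frontend code; the class `Ui` and its fields are hypothetical.

```python
class Ui:
    """Toy model of render() running inside try while pending is still set."""

    def __init__(self, guard_on_pending):
        self.pending = False
        self.guard_on_pending = guard_on_pending
        self.handlers_attached = False

    def render(self, human_turn):
        if self.guard_on_pending:
            # The buggy guard: suppress handlers while a request is in flight.
            human_turn = human_turn and not self.pending
        self.handlers_attached = human_turn

    def send_move(self):
        if self.pending:  # the short-circuit that makes the guard redundant
            return False
        self.pending = True
        try:
            # The server replied; it is the human's turn again.
            self.render(human_turn=True)
        finally:
            self.pending = False  # runs only after the render above
        return True
```

With `guard_on_pending=True`, `send_move()` leaves `handlers_attached` false even though `pending` has been cleared by the time it returns, which matches the inert board described above.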
Resolves three coupled UI problems:

1. Overlapping wall highlights. The right half of one horizontal wall
   slot is the left half of the next, so hovering it used to fire
   mouseenter listeners on both walls' groups -- two walls would light
   up. Now each wall is interactive only on its display top-left cell
   (left for H, top for V), so every grid cell triggers at most one
   wall.

2. Ambiguity at intersection posts. Posts were part of two walls'
   groups before, so hovering one lit up both H and V. With the
   anchor-only rule, posts are never any wall's anchor and trigger
   nothing on hover.

3. Human rendered at the top. The mirror condition was inverted vs
   the spec's intent: the canonical view in CSS has display row 0 at
   the top, so to put P0's home row at the bottom we have to vertical-
   flip the board when the human is P0. (P1 already starts at server
   row N-1, which is the display bottom without a flip.) Switched to a
   simple vertical flip -- cols stay put -- so the H wall's left half
   stays the left half in display space and the anchor-only rule reads
   naturally as 'extends right (H) or down (V)'.

Also drops the now-redundant brightness:hover race-cover style; with
anchor-only attachment there's no race.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
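A small sketch of the two rules from this commit, transliterated to Python for illustration. Assumptions: P0's home row is server row 0 (the message only states that P1 starts at row N-1), a wall spans two adjacent slots, and walls are identified here by the column of their left half. Both function names are hypothetical.

```python
def display_row(server_row, n, human_player):
    """Vertical flip only when the human is P0; columns are never changed."""
    return n - 1 - server_row if human_player == 0 else server_row


def walls_triggered_by_slot(slot_col, n, anchor_only):
    """Horizontal walls (named by left column) that react to hovering one slot.

    A wall starting at column s covers slots s and s + 1 in the same row.
    """
    starts = range(n - 1)
    if anchor_only:
        return [s for s in starts if s == slot_col]  # only the wall's left half
    return [s for s in starts if slot_col in (s, s + 1)]  # either half
```

With the old either-half rule, an interior slot belongs to two walls and both light up; with the anchor-only rule each slot triggers at most one wall, and the last column triggers none.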
The status panel was wired by server player index, so when the human
plays as P1 the 'You' row showed P0's wall count and vice versa.
Map by role (human vs ai) using human_player instead.

Also reword from 'walls left' to 'walls remaining'.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
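The role-based mapping amounts to indexing by human_player rather than by a fixed player index. A minimal Python sketch, assuming the server reports wall counts as a two-element list indexed by player (the name `walls_by_role` and that list shape are hypothetical):

```python
def walls_by_role(walls_remaining, human_player):
    """Map per-player wall counts to the 'You' and AI rows of the status panel."""
    ai_player = 1 - human_player  # two-player game: the other index is the AI
    return {"you": walls_remaining[human_player], "ai": walls_remaining[ai_player]}
```

For example, with counts `[3, 7]` and the human playing as P1, the 'You' row shows 7 and the AI row shows 3.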
- Bump the default --default-mcts-n CLI value from 400 to 1000 so the
  slider lands somewhere more capable on startup.
- Drop the (P1)/(P2) suffixes from the who-goes-first radio labels;
  they're noise in a strictly human-vs-AI app.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The lib was declared as both cdylib and rlib in Cargo.toml, which
triggers cargo issue #6313 (output filename collision) on every
cargo test invocation -- three warnings on each run, no errors, but
noise that crowds out anything else.

Drop cdylib from Cargo.toml and add a pyproject.toml that tells
maturin to use the pyo3 bindings; that backend implicitly compiles
with --crate-type cdylib for the wheel build, so plain 'maturin
build' still produces an importable .so without cdylib being a
default cargo output. Verified by building and importing the wheel.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Clippy lint (clippy::bool_then_in_filter_map).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jonbinney jonbinney requested a review from alejandromarcu June 4, 2026 16:22
@jonbinney jonbinney marked this pull request as ready for review June 4, 2026 16:22
@jonbinney

Owner Author

Although this is a big change, it isn't as big as the diff seems. A couple thousand lines of changes are just the planning docs that Claude made. Then there's all the web frontend and API stuff, which ... isn't "real" code :-P

@jonbinney jonbinney merged commit 27bcf1c into main Jun 5, 2026
5 checks passed

2 participants