Quoridor web player #371
Merged
Rust self-play exposes no visibility into MCTS behavior during a run. Spec adds per-model-version diagnostics (terminal/truncation sim fraction, tree depth, root-visit spread, nodes/branching, unique games) written by Rust to JSON records and logged to W&B by a new Python process in the same run group, reset on each model update.
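One plausible way to quantify the root-visit spread mentioned above is the Shannon entropy of the normalized root visit counts. The sketch below is illustrative only: the function name `root_visit_entropy` and the choice of nats are assumptions, and the PR's actual diagnostic is computed in Rust and may be defined differently.

```python
import math


def root_visit_entropy(visit_counts):
    """Shannon entropy (in nats) of the normalized root visit counts.

    Hypothetical helper: 0.0 means every simulation went through one child;
    log(k) means the visits were spread evenly over k children.
    """
    total = sum(visit_counts)
    if total == 0:
        return 0.0  # no simulations recorded, treat as no spread
    entropy = 0.0
    for count in visit_counts:
        if count > 0:  # zero-visit children contribute nothing
            p = count / total
            entropy -= p * math.log(p)
    return entropy
```

A low value suggests the search has committed to a single move, while a value near the log of the number of legal moves suggests the visits are close to uniform.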
Seven-task TDD plan: SearchStats from search(), per-game GameMetrics plus move hashing, a SelfPlayAccumulator that flushes per-version JSON records, wiring into the continuous loop's reload/shutdown points, and a Python aggregator + W&B logger process spawned by train_v2.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
A local web app for playing Quoridor against the project's Rust AlphaZero agent. One rust binary using tiny_http (no framework) serves embedded HTML/CSS/JS plus a small JSON API; each browser session gets its own in-memory game vs the AI. The user picks the ONNX model from <play-dir>/models/*.onnx (board config inferred from a sibling config.yaml), the mcts_n slider, and who plays first.
Nine-task TDD plan: scaffold the play_server module with pure StateView and action enrichment, then config loader, GameSession + registry, JSON handlers, the tiny_http bin with embedded static assets, an end-to-end test against the B5W2 fixture, and the vanilla HTML/CSS/JS frontend.
Scaffolds play_server module with state.rs containing EnrichedAction, WallOrientation, StateView, WallEntry types and enrich_action / enrich_legal_actions helpers. ACTION constants differ from plan assumptions: WALL_VERTICAL=0, WALL_HORIZONTAL=1, MOVE=2 — match arms updated accordingly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment said move_history has at least 2 entries after the human+AI atomic round, but the assertion was >= 1 which would pass even if the AI step silently did not fire. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Vanilla HTML/CSS/JS implementing the spec:
- (2N-1)x(2N-1) CSS grid with alternating pawn-cell, wall-slot, and post tracks.
- Click handlers attached to legal-action targets keyed by the bare action index (no client-side encoding logic).
- Mirrors coordinates 180 deg when human_player == 1 so the human's home row always sits at the bottom of the rendered board.
- Side panel with new-game form (model select, mcts_n slider, who-goes-first toggle), turn indicator with AI-thinking spinner, walls remaining per player, step counter, and a game-over banner.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Silences a longstanding unused_imports warning -- the trait is only referenced inside #[cfg(test)] via new_with_evaluator. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four issues addressed:
- Empty wall slots now use a mid-tone groove color (--slot) that is clearly distinct from both pawn cells (--cell, lighter) and placed walls (--wall, dark walnut). Previously empty slots and placed walls were both dark, which made a placed vertical wall look like a cross at the post intersection.
- Hovering a legal wall now highlights the entire wall (both halves + the post between them) by tagging the three cells with a shared data-wall-group and synchronizing mouseenter/leave on all of them.
- Removed the cross artifact on placed vertical walls -- a side effect of the slot-color change above.
- The UI now reflects the human's move immediately via an optimistic local update; the server's atomic human+AI response then replaces the state when it arrives. Errors roll back via GET /api/games/<id>.
Also refines the palette toward a warm wood-and-stone aesthetic: warmer beiges, walnut walls, inset pawn shading, board border with depth.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous commit added && !STATE.pending to the humanTurn check in render(), intending to suppress clicks while a network request was in flight. But the post-AI render happens before setPending(false) runs (it's in the finally block), so STATE.pending is still true at that point and no click handlers got attached -- the board became inert.
Drop the guard. sendMove() already short-circuits on STATE.pending, so gating handler attachment is redundant.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Resolves three coupled UI problems:
1. Overlapping wall highlights. The right half of one horizontal wall slot is the left half of the next, so hovering it used to fire mouseenter listeners on both walls' groups -- two walls would light up. Now each wall is interactive only on its display top-left cell (left for H, top for V), so every grid cell triggers at most one wall.
2. Ambiguity at intersection posts. Posts were part of two walls' groups before, so hovering one lit up both H and V. With the anchor-only rule, posts are never any wall's anchor and trigger nothing on hover.
3. Human rendered at the top. The mirror condition was inverted vs the spec's intent: the canonical view in CSS has display row 0 at the top, so to put P0's home row at the bottom we have to vertical-flip the board when the human is P0. (P1 already starts at server row N-1, which is the display bottom without a flip.) Switched to a simple vertical flip -- cols stay put -- so the H wall's left half stays the left half in display space and the anchor-only rule reads naturally as 'extends right (H) or down (V)'.
Also drops the now-redundant brightness:hover race-cover style; with anchor-only attachment there's no race.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
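The vertical flip and the anchor-only rule described in this commit can be sketched as follows. This is a Python illustration of logic that lives in the JS frontend, under stated assumptions: a wall at (row, col) is taken to start next to pawn cell (row, col) and extend right (H) or down (V) in server coordinates, and the function names are hypothetical.

```python
def display_row(grid_row, n, human_player):
    """Map a server grid row (0..2N-2) to a display row.

    The board is flipped vertically only when the human is P0, so that
    P0's home row ends up at the bottom. Columns are never changed.
    """
    size = 2 * n - 1
    return (size - 1 - grid_row) if human_player == 0 else grid_row


def wall_cells(row, col, orientation):
    """Grid cells covered by one wall: two slot halves plus the post.

    Assumed layout: pawn cell (r, c) sits at grid (2r, 2c); wall slots and
    posts occupy the odd tracks of the (2N-1)x(2N-1) grid.
    """
    if orientation == "H":
        r = 2 * row + 1
        return [(r, 2 * col), (r, 2 * col + 1), (r, 2 * col + 2)]
    if orientation == "V":
        c = 2 * col + 1
        return [(2 * row, c), (2 * row + 1, c), (2 * row + 2, c)]
    raise ValueError(f"unknown orientation: {orientation!r}")


def wall_anchor(row, col, orientation, n, human_player):
    """Display cell that alone carries the wall's hover/click handlers.

    After flipping, the anchor is the top-left covered cell: the leftmost
    cell for H walls and the topmost cell for V walls.
    """
    cells = [
        (display_row(r, n, human_player), c)
        for r, c in wall_cells(row, col, orientation)
    ]
    return min(cells)
```

Under these assumptions an H anchor always has an odd row and even column, and a V anchor an even row and odd column, so a post (odd row, odd column) is never an anchor and two adjacent walls never share one.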
The status panel was wired by server player index, so when the human plays as P1 the 'You' row showed P0's wall count and vice versa. Map by role (human vs ai) using human_player instead. Also reword from 'walls left' to 'walls remaining'. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
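The role-based mapping can be illustrated with a minimal sketch. The names `walls_by_role` and `walls_remaining` are hypothetical, and the real fix is in the JS status panel; the point is only that both rows are looked up through human_player rather than by a fixed player index.

```python
def walls_by_role(walls_remaining, human_player):
    """Return wall counts keyed by role instead of server player index.

    walls_remaining is assumed to be a two-element sequence indexed by
    server player (0 or 1); human_player says which index is the human.
    """
    if human_player not in (0, 1):
        raise ValueError("human_player must be 0 or 1")
    ai_player = 1 - human_player
    return {
        "you": walls_remaining[human_player],
        "ai": walls_remaining[ai_player],
    }
```

With the human as P1 and counts `[3, 7]`, the "You" row shows 7 and the AI row shows 3, which is the case the index-based wiring got backwards.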
- Bump the default --default-mcts-n CLI value from 400 to 1000 so the slider lands somewhere more capable on startup.
- Drop the (P1)/(P2) suffixes from the who-goes-first radio labels; they're noise in a strictly human-vs-AI app.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The lib was declared as both cdylib and rlib in Cargo.toml, which triggers cargo issue #6313 (output filename collision) on every cargo test invocation -- three warnings on each run, no errors, but noise that crowds out anything else. Drop cdylib from Cargo.toml and add a pyproject.toml that tells maturin to use the pyo3 bindings; that backend implicitly compiles with --crate-type cdylib for the wheel build, so plain 'maturin build' still produces an importable .so without cdylib being a default cargo output. Verified by building and importing the wheel. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Clippy lint (clippy::bool_then_in_filter_map). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Although this is a big change, it isn't as big as the diff suggests. A couple thousand lines of changes are just the planning docs that Claude made. Then there's all the web frontend and API stuff, which... isn't "real" code :-P
alejandromarcu approved these changes Jun 4, 2026
This PR includes two unrelated changes, since I am lazy: W&B metrics for Rust self-play and a web app for playing against the Rust AlphaZero implementation.
W&B metrics for Rust self-play
To avoid complicating the Rust code too much, the Rust self-play just writes metrics to files in the run directory being used. A separate Python script, started by train_v2.py, reads those files and uploads the metrics to W&B. So far the metrics are all about the MCTS search. They provide some insight into how many paths in the search tree are being explored, how many reach terminal states, the entropy of the visit distribution at the root node, etc.
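The reader side of this file-based handoff might look roughly like the sketch below. It is not the script added in this PR: the file pattern, the `model_version` field, and the function names are all assumptions made for illustration, and the logging call is injected so the example runs without W&B installed.

```python
import json
from collections import defaultdict
from pathlib import Path


def aggregate_records(metrics_dir, pattern="*.json"):
    """Average every numeric field of the JSON records, per model version.

    Assumes one JSON object per file with a "model_version" key (a
    hypothetical schema). Unreadable or half-written files are skipped.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for path in sorted(Path(metrics_dir).glob(pattern)):
        try:
            record = json.loads(path.read_text())
        except (OSError, ValueError):
            continue  # file vanished or is still being written
        if not isinstance(record, dict) or "model_version" not in record:
            continue
        version = record["model_version"]
        for key, value in record.items():
            if key == "model_version" or isinstance(value, bool):
                continue
            if isinstance(value, (int, float)):
                grouped[version][key].append(value)
    return {
        version: {key: sum(vals) / len(vals) for key, vals in fields.items()}
        for version, fields in grouped.items()
    }


def log_summary(summary, log_fn=print):
    """Send one dict per model version to log_fn, in version order."""
    for version in sorted(summary):
        log_fn({"model_version": version, **summary[version]})
```

Keeping `log_fn` as a parameter means the same code can print during local debugging, or be handed a W&B logging callable inside an initialized run, without the aggregation logic depending on either.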
Web app (and server) for playing against the Rust implementation of AlphaZero
Mostly so we can easily try playing against new models without having to compile and run everything locally. One Rust binary serves a simple JS+HTML page and acts as an HTTP API server. There is not much intelligence in the frontend. It lets the user choose any model in a particular directory. Each session gets its own state and can have its own game against the AI.
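The per-session state can be pictured as a small registry keyed by game id. The sketch below is a Python stand-in for the Rust GameSession + registry named in the commits; the field names, the id format, and the locking strategy are assumptions, not the PR's actual design.

```python
import threading
import uuid
from dataclasses import dataclass, field


@dataclass
class GameSession:
    """In-memory state for one human-vs-AI game (fields are illustrative)."""
    model_name: str
    mcts_n: int
    human_player: int
    move_history: list = field(default_factory=list)


class SessionRegistry:
    """Thread-safe map from game id to GameSession."""

    def __init__(self):
        self._lock = threading.Lock()
        self._sessions = {}

    def create(self, model_name, mcts_n, human_player):
        """Start a new game and return its id."""
        game_id = uuid.uuid4().hex
        session = GameSession(model_name, mcts_n, human_player)
        with self._lock:
            self._sessions[game_id] = session
        return game_id

    def get(self, game_id):
        """Return the session for game_id, or None if it is unknown."""
        with self._lock:
            return self._sessions.get(game_id)
```

Because every game lives under its own id, two browser tabs can play different models at different mcts_n settings without sharing any state.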