Use compact representation for rust selfplay #369

Merged
jonbinney merged 6 commits into main from jdb/rust-selfplay-use-compact-repr
May 18, 2026

@jonbinney jonbinney commented May 17, 2026

Owner

I made this change in preparation for adding parallelism and caching to the rust selfplay. The compact states will save a lot of space. In this PR, I also updated the compact representation to handle the full B9W10 case, which takes 24 bytes instead of just 8 for B5W3. This slows down the generation of the policy database a bit, but it isn't terrible. The resulting policy database parquet files for B5W2 are actually about the same size, I assume because of compression.

There are a ton of changed lines in this PR for three main reasons:

  • the game state type shows up in lots of places, so lots of code gets touched
  • some things, like board rotation and NN feature generation, are now implemented for both grid and compact representations
  • more tests

I've tested that B5W2 training still works with this PR, and that the policy database stuff still works. I've also uploaded new policy databases to W&B for B5W2 and B5W1.

jonbinney and others added 6 commits May 17, 2026 19:21
On 0.65 I'm getting compilation errors for our code.
Needed for rust self-play code.
ActionSelector, Evaluator, MCTS nodes, and game_runner now operate on
(u64 data, &QGameMechanics) instead of cloning GameState. MCTS nodes
store their data eagerly at creation, removing the lazy
get_or_create_game caching path. Adds compact_state_to_resnet_input and
rotate_compact_state mirroring the existing GameState equivalents, plus
get_action_mask / apply_action_index / is_game_over / winner helpers on
QGameMechanics. game_runner only materializes a GameState for observer
callbacks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
QGameMechanics owns goal_rows and does not flip them under rotation, so
get_action_mask_immut on rotated data treats walls that block the
rotated player's path as legal. Use remap_mask on the original mask
instead, and add a test that pins this contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
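
To illustrate the idea in the commit above (compute the mask once on the unrotated state, then permute it, rather than recomputing it on rotated data), here is a minimal sketch. It is not the PR's `remap_mask`: the function names, the row-major one-entry-per-cell mask layout, and the 180° rotation are all assumptions made for illustration, and the real mask also covers wall actions.

```rust
/// Hypothetical helper: maps a row-major cell index to its position
/// after rotating a square board by 180 degrees.
/// (r, c) -> (n - 1 - r, n - 1 - c), which is n*n - 1 - index.
pub fn rotate_index_180(index: usize, board_size: usize) -> usize {
    board_size * board_size - 1 - index
}

/// Hypothetical stand-in for a mask remap: permutes an already-computed
/// legality mask instead of recomputing legality on the rotated state.
/// Legality is decided once, against the unrotated goal rows, so the
/// rotated mask cannot disagree with the original one.
pub fn remap_mask_180(mask: &[bool], board_size: usize) -> Vec<bool> {
    assert_eq!(mask.len(), board_size * board_size);
    let mut rotated = vec![false; mask.len()];
    for (index, &legal) in mask.iter().enumerate() {
        rotated[rotate_index_180(index, board_size)] = legal;
    }
    rotated
}
```

Because the remap is a pure permutation, the number of legal actions is preserved and applying it twice returns the original mask, which is the kind of contract a test can pin.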
The single-u64 packed state could not hold a 9x9 board (128 wall bits
alone exceed u64). Split the layout so the wall bitmap lives in a u128
and the scalar fields in a u64; every accessor stays within one
primitive. Policy DB schema switches to FixedSizeBinary(24) with lex
byte ordering; PyO3 functions accept/return state as 24-byte bytes
buffers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
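
As a rough sketch of the split layout described in the commit above (wall bitmap in a `u128`, scalar fields in a `u64`, serialized to 24 bytes), the following shows one way such a state could be packed. The struct name, field order, and method names are hypothetical, and the big-endian encoding is an assumption chosen so that lexicographic byte comparison agrees with comparing `(walls, scalars)` numerically; the PR may order the bytes differently.

```rust
/// Hypothetical 24-byte compact state: 128 wall bits plus packed scalars.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CompactState {
    pub walls: u128,  // one bit per wall slot; fits the 128 bits a 9x9 board needs
    pub scalars: u64, // packed player positions, walls remaining, etc. (layout not shown)
}

impl CompactState {
    pub const BYTES: usize = 24;

    /// Serializes as walls followed by scalars, both big-endian (assumed),
    /// so byte-wise lexicographic order matches (walls, scalars) order.
    pub fn to_bytes(&self) -> [u8; 24] {
        let mut out = [0u8; 24];
        out[..16].copy_from_slice(&self.walls.to_be_bytes());
        out[16..].copy_from_slice(&self.scalars.to_be_bytes());
        out
    }

    /// Inverse of `to_bytes`.
    pub fn from_bytes(bytes: &[u8; 24]) -> Self {
        let mut walls = [0u8; 16];
        let mut scalars = [0u8; 8];
        walls.copy_from_slice(&bytes[..16]);
        scalars.copy_from_slice(&bytes[16..]);
        CompactState {
            walls: u128::from_be_bytes(walls),
            scalars: u64::from_be_bytes(scalars),
        }
    }

    /// Sets one wall bit; the access stays within the single u128.
    pub fn set_wall(&mut self, index: u32) {
        assert!(index < 128);
        self.walls |= 1u128 << index;
    }

    /// Reads one wall bit.
    pub fn has_wall(&self, index: u32) -> bool {
        assert!(index < 128);
        (self.walls >> index) & 1 == 1
    }
}
```

Keeping each field inside one primitive means every accessor is a shift and mask on a single integer, and the fixed 24-byte encoding maps directly onto a fixed-size binary column.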
@jonbinney jonbinney marked this pull request as ready for review May 17, 2026 23:48

@alejandromarcu alejandromarcu left a comment

Collaborator

Can't really review the code but what you're doing LGTM

@jonbinney jonbinney merged commit d5fcb3a into main May 18, 2026
5 checks passed

2 participants