Description: Implement the Monte Carlo Tree Search (MCTS) algorithm in adversarial/mcts.py. Unlike Minimax, which uses a fixed evaluation function, MCTS estimates the value of a move by running random "rollouts" (simulations) to the end of the game. It uses the UCT (Upper Confidence Bound for Trees) formula to balance exploring new moves vs. exploiting moves that have already shown success.
Objectives:
Implement the 4-step MCTS cycle: Selection, Expansion, Simulation, and Backpropagation.
Implement the UCT formula to guide the search.
Support an "Anytime" search: The algorithm should be able to stop at any time and return the best move found so far based on visit counts.
Compare MCTS performance against Minimax for a 3x3 grid (Tic-Tac-Toe) and potentially a larger 4x4 grid.
Tasks:
[ ] Implement adversarial/mcts.py:
MCTSNode: A specialized node class that tracks visit_count and total_reward.
select(node): Use the UCT formula to navigate to a leaf.
expand(node): Add a new child for an untried action.
simulate(state): Play a random game until a terminal state is reached.
backpropagate(node, result): Update statistics from the leaf up to the root.
[ ] Implement the UCT Formula:
UCT=niwi+CnilnNi
(wi = wins, ni = node visits, Ni = parent visits, C = exploration constant).
[ ] Testing & Benchmarking:
Verify that MCTS converges to the same "best move" as Minimax given enough simulations.
Document the memory usage vs. Minimax in CONCEPTS.md.
Acceptance Criteria:
[ ] The MCTS agent can play a legal and competitive game of Tic-Tac-Toe.
[ ] The simulate function correctly reaches terminal states and returns the game outcome.
[ ] The "Anytime" property is verified (the AI returns a move even if given only 100 simulations).
Dependencies:
Issue #2 (Core Architecture)
Issue #6 (AdversarialGame Protocol)
Description: Implement the Monte Carlo Tree Search (MCTS) algorithm in adversarial/mcts.py. Unlike Minimax, which uses a fixed evaluation function, MCTS estimates the value of a move by running random "rollouts" (simulations) to the end of the game. It uses the UCT (Upper Confidence Bound for Trees) formula to balance exploring new moves vs. exploiting moves that have already shown success.
Objectives:
Tasks:
Acceptance Criteria:
Dependencies:
Issue #2 (Core Architecture)
Issue #6 (AdversarialGame Protocol)