An open-source reinforcement learning project training AI agents to master multi-turn domino strategies using self-play learning and a persistent game history database.
Build the most competitive dominoes AI by leveraging:
- Multi-Agent Reinforcement Learning (MARL) - Independent agents learning optimal strategies
- Self-Play Training - Agents improving by playing against themselves
- Game History Database - Persistent storage of game states/actions for offline learning
- CUDA Optimization - GPU-accelerated training tuned for roughly 8 GB of VRAM (RTX 2070, RTX 3070, and similar)
✅ Dominoes Game Engine - Full implementation of Block/Draw dominoes rules
✅ Multi-Agent RL Agents - DQN, PPO, and custom policy gradient implementations
✅ Self-Play Arena - Tournament system for agent evaluation
✅ Game Database - SQLite persistent storage of game histories
✅ CUDA Support - Mixed-precision training, gradient accumulation, batch optimization
✅ Open Source - Leverages Stable Baselines3, PyTorch, Gymnasium
# Clone the repository
git clone https://github.com/o7adam/dominos-ai.git
cd dominos-ai
# Create virtual environment
python -m venv venv
source venv/bin/activate # or `venv\Scripts\activate` on Windows
# Install dependencies
pip install -r requirements.txt
# Train an agent on the GPU
python -m src.training.trainer --config config/config.yaml --cuda
# Run self-play and save the games
python -m src.training.self_play --num-episodes 1000 --save-games
# Open the analysis notebook
jupyter notebook notebooks/analysis.ipynb
dominos-ai/
├── README.md # Project overview and quick start
├── requirements.txt # Python dependencies (PyTorch, etc)
├── setup.py # Package installation
├── .gitignore # Git ignore patterns
├── ARCHITECTURE.md # Technical design decisions
├── STRATEGIES.md # Documented domino strategies
│
├── config/
│ ├── config.yaml # Training hyperparameters (batch_size, lr, etc)
│ ├── model_config.json # Neural network architecture
│ └── cuda_config.py # CUDA/GPU memory settings
│
├── src/
│ ├── __init__.py
│ │
│ ├── game/ # 🎮 GAME ENGINE
│ │ ├── __init__.py
│ │ ├── dominoes.py # Core game logic, move validation
│ │ ├── rules.py # Block/Draw rules, scoring
│ │ └── state.py # Game state representation, observation space
│ │
│ ├── agents/ # 🤖 RL AGENTS
│ │ ├── __init__.py
│ │ ├── base_agent.py # Abstract agent interface
│ │ ├── dqn_agent.py # Deep Q-Network implementation
│ │ ├── ppo_agent.py # Proximal Policy Optimization
│ │ └── network.py # PyTorch neural network models
│ │
│ ├── training/ # 📚 TRAINING LOOP
│ │ ├── __init__.py
│ │ ├── trainer.py # Main training logic, loss calculation
│ │ ├── self_play.py # Self-play tournament, league play
│ │ ├── replay_buffer.py # Experience replay for DQN
│ │ └── curriculum.py # Progressive difficulty (optional)
│ │
│ └── utils/ # 🛠️ UTILITIES
│   ├── __init__.py
│   ├── database.py # SQLite game history storage
│   ├── logger.py # Training/evaluation logging
│   └── metrics.py # Win rates, avg score, strategy metrics
│
├── data/
│ ├── games.db # SQLite database (auto-created)
│ └── checkpoints/ # Model weights (auto-created)
│
├── notebooks/
│ ├── analysis.ipynb # Jupyter: Game analysis & visualization
│ └── training_progress.ipynb # Jupyter: Training curves & metrics
│
├── tests/
│ ├── __init__.py
│ ├── test_game.py # Unit tests for game engine
│ └── test_agents.py # Unit tests for agent logic
│
└── docs/
  ├── DOMINOES_RULES.md # Complete dominoes rules
  └── ALGORITHM_GUIDE.md # MARL algorithms explained
| Component | Technology | Purpose |
|---|---|---|
| Framework | PyTorch | Neural network training |
| RL Library | Stable Baselines3 | Pre-built RL algorithms |
| Environment | Gymnasium | Standard RL environment interface |
| Multi-Agent | PettingZoo | Multi-agent game environments |
| Database | SQLite + SQLAlchemy | Game history storage |
| GPU | CUDA (tuned for ~8 GB VRAM) | Mixed-precision, batch tuning |
| Logging | Weights & Biases | Experiment tracking |
- Tile counting and information hiding
- Minimax evaluation of board positions
- Nash equilibrium concepts for competitive play
- Agents learn to predict opponent tile distribution
- Blocking strategies (forcing passes)
- End-game optimization (minimizing pip count)
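Tile counting, the basis for predicting an opponent's tile distribution, can be illustrated with a small sketch. It assumes a standard double-six set and represents each tile as a `(low, high)` tuple. The names `FULL_SET`, `unseen_tiles`, and `suit_counts` are hypothetical and not taken from the project code.

```python
from itertools import combinations_with_replacement

# Assumption: double-six set, each tile stored as a sorted (low, high) tuple.
FULL_SET = frozenset(combinations_with_replacement(range(7), 2))  # 28 tiles


def unseen_tiles(my_hand, played):
    """Tiles that are neither in my hand nor already on the board."""
    return FULL_SET - set(my_hand) - set(played)


def suit_counts(tiles):
    """How many of the given tiles carry each pip value (0 to 6)."""
    counts = [0] * 7
    for low, high in tiles:
        counts[low] += 1
        if high != low:  # a double carries its pip value only once
            counts[high] += 1
    return counts


my_hand = [(6, 6), (3, 5), (0, 1)]
played = [(4, 6), (3, 4)]
hidden = unseen_tiles(my_hand, played)
sixes_hidden = suit_counts(hidden)[6]  # hidden tiles that could answer a 6
```

A low count for a pip value among the hidden tiles suggests that leaving that value open on the board is more likely to force a pass.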
- State Space: Board layout, hand composition, game history
- Action Space: Valid domino placements
- Reward Signal:
  - +1 for winning
  - -1 for losing
  - Intermediate rewards for advantageous positions
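One possible way to turn the state, action, and reward description above into arrays is sketched below. The 28-slot tile indexing, the function names, and the reward of 0 for a drawn game are assumptions made for illustration. The project's actual encoding in `state.py` may differ, and intermediate rewards are left out.

```python
from itertools import combinations_with_replacement

# Assumption: every tile of the double-six set gets a fixed index from 0 to 27.
TILES = list(combinations_with_replacement(range(7), 2))
TILE_INDEX = {tile: i for i, tile in enumerate(TILES)}


def encode_tiles(tiles):
    """28-dimensional 0/1 vector marking which tiles are present."""
    vec = [0] * len(TILES)
    for tile in tiles:
        vec[TILE_INDEX[tuple(sorted(tile))]] = 1
    return vec


def action_mask(hand, open_ends):
    """1 for each tile in hand that matches an open end of the board.

    An empty open_ends means the board is empty, so every tile is legal.
    """
    mask = [0] * len(TILES)
    for tile in hand:
        if not open_ends or set(tile) & set(open_ends):
            mask[TILE_INDEX[tuple(sorted(tile))]] = 1
    return mask


def terminal_reward(winner, agent):
    """+1 for a win, -1 for a loss, 0 for a draw (assumed: winner is None)."""
    if winner is None:
        return 0.0
    return 1.0 if winner == agent else -1.0


hand = [(4, 6), (2, 2), (0, 1)]
mask = action_mask(hand, open_ends=(4, 3))  # only (4, 6) matches a 4 or a 3
```

The mask here selects a tile only. A fuller action space would also encode which end of the board the tile is played on.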
- Train against fixed random policy
- Train against previous checkpoint (week-old weights)
- Tournament-style league play (current vs. historical versions)
- Hard opponent mining (select hardest opponents)
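Hard opponent mining in a league could look roughly like the sketch below: opponents are sampled with probability proportional to how often the current agent loses to them. The function name, the `floor` parameter, and the opponent names are hypothetical, and the win rates are made-up inputs rather than measured results.

```python
import random


def pick_opponent(win_rates, rng, floor=0.05):
    """Sample an opponent name, favouring those we beat least often.

    win_rates maps opponent name -> our win rate against it (0.0 to 1.0).
    """
    names = list(win_rates)
    # Weight = how often we lose, plus a floor so easy opponents still appear.
    weights = [(1.0 - win_rates[name]) + floor for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded for reproducibility
pool = {"random": 0.95, "ckpt_010": 0.70, "ckpt_020": 0.45}  # illustrative values
draws = [pick_opponent(pool, rng) for _ in range(1000)]
```

The floor keeps weak opponents in rotation, which may help guard against forgetting how to beat them.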
- Training Time: 24-48 hours on RTX 2070 for competitive agent
- Win Rate: 65%+ vs. random baseline, 55%+ vs. human players
- Database Size: 100k-1M games (~500MB SQLite)
✅ Mixed Precision Training - FP16 autocast can cut activation memory by up to about half
✅ Gradient Accumulation - Effective batch size of 512 from micro-batches of 32 (16 accumulation steps)
✅ Dynamic Memory Allocation - Clear the CUDA cache between episodes
✅ Model Pruning - Compact architectures (hidden layers of 64→128→64 units)
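The gradient accumulation arithmetic can be checked without a GPU. The sketch below uses a one-parameter linear model in plain Python (an assumption chosen for brevity, not the project's network). It shows that 16 micro-batches of 32, each scaled by 1/16, give the same gradient as one batch of 512.

```python
import random

BATCH_SIZE = 32        # micro-batch that fits in memory
EFFECTIVE_BATCH = 512  # batch size the optimizer should effectively see
ACCUM_STEPS = EFFECTIVE_BATCH // BATCH_SIZE  # 16 micro-batches per update


def mse_grad(w, batch):
    """d/dw of mean((w*x - y)^2) over a batch of (x, y) pairs."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


rng = random.Random(0)
xs = [rng.uniform(-1.0, 1.0) for _ in range(EFFECTIVE_BATCH)]
data = [(x, 3.0 * x) for x in xs]  # toy targets from y = 3x
w = 0.5

# Accumulate: each micro-batch gradient is divided by ACCUM_STEPS before summing.
accumulated = 0.0
for step in range(ACCUM_STEPS):
    micro = data[step * BATCH_SIZE:(step + 1) * BATCH_SIZE]
    accumulated += mse_grad(w, micro) / ACCUM_STEPS

full = mse_grad(w, data)  # gradient of one true 512-sample batch
matches = abs(accumulated - full) < 1e-9
```

In a PyTorch loop the same idea usually means dividing the loss by the number of accumulation steps and calling the optimizer step only after every 16th backward pass.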
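# Example: Memory-optimized training
# (replay_buffer, compute_loss, and optimizer are assumed to be defined elsewhere)
import torch
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()
for state, action, reward in replay_buffer:
    optimizer.zero_grad(set_to_none=True)  # clear gradients from the previous step
    with autocast():  # run the forward pass in FP16 where it is safe
        loss = compute_loss(state, action, reward)
    scaler.scale(loss).backward()  # scale the loss to avoid FP16 gradient underflow
    scaler.step(optimizer)  # unscale gradients, then step (skipped on inf/NaN)
    scaler.update()  # adjust the scale factor for the next iteration
torch.cuda.empty_cache()  # release cached GPU memory once the loop is done
Setup: Each player draws 7 dominoes. Highest double leads.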
Gameplay:
- Match domino ends to board ends (e.g., [6|4] to [4|3])
- If you can't play, pass (in Block) or draw (in Draw)
- Round ends when someone plays all their tiles or the game is blocked (no player can move)
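The setup and matching rules above can be sketched in a few functions. This is an illustration under stated assumptions, not the engine in `dominoes.py`: tiles are `(low, high)` tuples from a double-six set, the function names are hypothetical, and `first_player` returns `None` when nobody holds a double because the rules above do not cover that case.

```python
import random
from itertools import combinations_with_replacement


def deal(num_players, rng, hand_size=7):
    """Shuffle the double-six set and deal hand_size tiles to each player."""
    tiles = list(combinations_with_replacement(range(7), 2))
    rng.shuffle(tiles)
    hands = [tiles[i * hand_size:(i + 1) * hand_size] for i in range(num_players)]
    boneyard = tiles[num_players * hand_size:]  # tiles left to draw from
    return hands, boneyard


def first_player(hands):
    """Index of the player holding the highest double, or None if nobody has one."""
    doubles = [
        (tile[0], player)
        for player, hand in enumerate(hands)
        for tile in hand
        if tile[0] == tile[1]
    ]
    return max(doubles)[1] if doubles else None


def playable(hand, open_ends):
    """Tiles in hand that share a pip value with either open end of the board."""
    return [tile for tile in hand if set(tile) & set(open_ends)]


hands, boneyard = deal(2, random.Random(0))
moves = playable([(4, 6), (2, 2)], open_ends=(4, 3))  # [6|4] fits next to [4|3]
```

In Block, an empty `playable` result means a pass. In Draw, the player would take tiles from `boneyard` until one fits or it runs out.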
Scoring:
- The player who goes out scores the pips remaining in the opponents' hands
- If the game is blocked, the player with the lowest pip count wins
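A minimal scoring sketch for the two cases above follows. Two details are assumptions, because the rules above leave them open: a blocked-game winner is awarded the opponents' pips, and a tie on the lowest pip count produces no winner. The function names are hypothetical.

```python
def pips(hand):
    """Total pip count of a hand of (low, high) tiles."""
    return sum(low + high for low, high in hand)


def score_round(hands, went_out=None):
    """Return (winner_index, points) for a finished round.

    went_out is the index of the player who emptied their hand,
    or None when the game is blocked.
    """
    totals = [pips(hand) for hand in hands]
    if went_out is not None:
        return went_out, sum(totals)  # the winner's own hand is empty
    low = min(totals)
    if totals.count(low) > 1:
        return None, 0  # assumed: a tie on lowest pips has no winner
    winner = totals.index(low)
    # Assumed convention: the blocked-game winner scores the opponents' pips.
    return winner, sum(totals) - low


out_result = score_round([[], [(6, 6), (1, 2)]], went_out=0)
blocked_result = score_round([[(0, 1)], [(5, 5)]])
```

Some rule sets score a blocked game by the difference in pip counts instead, so this part would need to follow whatever `rules.py` defines.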
Strategy:
- Control the Board → Lead toward your suit
- Track Tiles → Know what's played, deduce hands
- Block Early → Force passes when ahead
- End Strong → Save low tiles for finish
┌───────────────────────┐
│   Initialize Agents   │  (Random or pretrained weights)
└───────────┬───────────┘
            │
            ▼
┌───────────────────────┐
│  Self-Play Episodes   │  (N agents play M games)
└───────────┬───────────┘
            │
            ▼
┌───────────────────────┐
│  Store Game History   │  (SQLite database)
└───────────┬───────────┘
            │
            ▼
┌───────────────────────┐
│  Sample Experiences   │  (Replay buffer batches)
└───────────┬───────────┘
            │
            ▼
┌───────────────────────┐
│  Update Policy/Value  │  (Gradient descent, CUDA)
└───────────┬───────────┘
            │
            ▼
┌───────────────────────┐
│ Evaluate vs Baselines │  (Win rate, checkpoint)
└───────────┬───────────┘
            │
┌───────────▼───────────┐
│      Converged?       │
└───────────┬───────────┘
            │ No
            └──────────► (Loop back to Self-Play)
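The loop in the diagram can be written as a small driver function. The sketch below is hypothetical: `train` and its callback parameters are illustrative names rather than the API of `trainer.py`, and the default stopping threshold of 0.65 is only a placeholder. Toy stand-ins replace the game engine and the network so that the control flow can be run on its own.

```python
import random


def train(play_game, store, sample, update, evaluate,
          games_per_iter=10, max_iters=100, target_win_rate=0.65):
    """Run the loop: self-play -> store -> sample -> update -> evaluate."""
    history = []
    for iteration in range(1, max_iters + 1):
        for _ in range(games_per_iter):
            store(play_game())            # self-play episode into the game history
        update(sample())                  # one policy/value update on a sampled batch
        win_rate = evaluate()             # win rate against the baselines
        history.append(win_rate)
        if win_rate >= target_win_rate:   # the "Converged?" check
            break
    return iteration, history


# Toy stand-ins: no real learning happens, "skill" simply rises on each update.
games = []
skill = {"value": 0.5}
iters, curve = train(
    play_game=lambda: {"winner": random.choice(["a", "b"])},
    store=games.append,
    sample=lambda: games[-32:],
    update=lambda batch: skill.update(value=skill["value"] + 0.125),
    evaluate=lambda: skill["value"],
)
```

Passing the steps in as callables keeps the loop independent of whether the update is DQN or PPO, which is one way the same driver could serve both agents.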
Game → Database → Learning
-- Games table schema
CREATE TABLE games (
    game_id INTEGER PRIMARY KEY,
    timestamp DATETIME,
    agents TEXT, -- "agent_v1 vs agent_v2"
    winner TEXT, -- "agent_v1"
    total_turns INT,
    final_scores JSON
);
-- States table (game positions)
CREATE TABLE states (
    state_id INTEGER PRIMARY KEY,
    game_id INTEGER,
    turn INT,
    agent_name TEXT,
    board_state BLOB, -- Encoded board layout
    hand BLOB, -- Agent's tiles
    action INT, -- Tile index played
    reward FLOAT, -- Immediate reward
    next_state BLOB,
    done BOOLEAN
);
- Win Rate vs. each baseline
- Average Score (pips when losing, opponent sum when winning)
- Move Validity (% of attempted moves that are legal)
- Training Loss (DQN/Policy gradient)
- Episode Length (turns per game)
- GPU Memory Usage (monitoring VRAM)
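Two of these metrics, win rate and episode length, can be computed directly from the `games` table with Python's built-in `sqlite3` module. The sketch below uses an in-memory database and a few made-up rows purely for illustration. The `win_rate` helper is hypothetical and matches agents by substring, so it would need tightening if agent names can be prefixes of one another.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database for the example
conn.execute(
    """CREATE TABLE games (
        game_id INTEGER PRIMARY KEY,
        timestamp DATETIME,
        agents TEXT,
        winner TEXT,
        total_turns INT,
        final_scores JSON
    )"""
)

# Made-up sample rows, not real training results.
rows = [
    ("2026-05-01 10:00:00", "agent_v1 vs agent_v2", "agent_v1", 21, {"agent_v1": 14, "agent_v2": 0}),
    ("2026-05-01 10:05:00", "agent_v1 vs agent_v2", "agent_v1", 18, {"agent_v1": 9, "agent_v2": 0}),
    ("2026-05-01 10:10:00", "agent_v1 vs agent_v2", "agent_v2", 25, {"agent_v1": 0, "agent_v2": 11}),
    ("2026-05-01 10:15:00", "agent_v1 vs agent_v2", "agent_v1", 20, {"agent_v1": 6, "agent_v2": 0}),
]
conn.executemany(
    "INSERT INTO games (timestamp, agents, winner, total_turns, final_scores) "
    "VALUES (?, ?, ?, ?, ?)",
    [(ts, agents, winner, turns, json.dumps(scores))
     for ts, agents, winner, turns, scores in rows],
)


def win_rate(conn, agent):
    """Fraction of games involving `agent` that it won."""
    played, won = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(winner = ?), 0) FROM games "
        "WHERE instr(agents, ?) > 0",
        (agent, agent),
    ).fetchone()
    return won / played if played else 0.0


v1_rate = win_rate(conn, "agent_v1")
avg_turns = conn.execute("SELECT AVG(total_turns) FROM games").fetchone()[0]
```

Storing `final_scores` as JSON text keeps the schema flexible for games with more than two players, at the cost of decoding it in Python for score-based metrics.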
Contributions welcome! Areas:
- New RL algorithms (QMIX, MADDPG, AlphaZero variants)
- Dominoes variants (Mexican Train, All Fives, etc.)
- Opponent agents (minimax, Monte Carlo tree search)
- Web UI for game visualization
- Performance optimizations
- Deep Reinforcement Learning Hands-On
- Stable Baselines3
- Mastering the game of Go (AlphaGo)
- AlphaZero: Mastering Chess, Shogi, and Go
- Domino Strategy Guide
MIT License - See LICENSE file
o7adam - @o7adam
Status: 🚧 In Development
Last Updated: May 2026
Stars: ⭐ (Coming soon!)