🎲 Dominos AI: Learning to Play Competitive Dominoes

An open-source reinforcement learning project training AI agents to master multi-turn domino strategies using self-play learning and a persistent game history database.

🎯 Project Vision

Build a highly competitive dominoes AI by leveraging:

  • Multi-Agent Reinforcement Learning (MARL) - Independent agents learning optimal strategies
  • Self-Play Training - Agents improving by playing against themselves
  • Game History Database - Persistent storage of game states/actions for offline learning
  • CUDA Optimization - GPU-accelerated training on 9GB VRAM (RTX 2070/3070/etc)

✨ Key Features

  • Dominoes Game Engine - Full implementation of Block/Draw dominoes rules
  • Multi-Agent RL - DQN, PPO, and custom policy gradient implementations
  • Self-Play Arena - Tournament system for agent evaluation
  • Game Database - Persistent SQLite storage of game histories
  • CUDA Support - Mixed-precision training, gradient accumulation, batch optimization
  • Open Source - Built on Stable Baselines3, PyTorch, and Gymnasium

🚀 Quick Start

Installation

# Clone the repository
git clone https://github.com/o7adam/dominos-ai.git
cd dominos-ai

# Create virtual environment
python -m venv venv
source venv/bin/activate  # or `venv\Scripts\activate` on Windows

# Install dependencies
pip install -r requirements.txt

Train an Agent

python -m src.training.trainer --config config/config.yaml --cuda

Run Self-Play Tournament

python -m src.training.self_play --num-episodes 1000 --save-games

Analyze Game History

jupyter notebook notebooks/analysis.ipynb

📁 Project Structure

dominos-ai/
├── README.md                    # Project overview and quick start
├── requirements.txt             # Python dependencies (PyTorch, etc)
├── setup.py                     # Package installation
├── .gitignore                   # Git ignore patterns
├── ARCHITECTURE.md              # Technical design decisions
├── STRATEGIES.md                # Documented domino strategies
│
├── config/
│   ├── config.yaml             # Training hyperparameters (batch_size, lr, etc)
│   ├── model_config.json       # Neural network architecture
│   └── cuda_config.py          # CUDA/GPU memory settings
│
├── src/
│   ├── __init__.py
│   │
│   ├── game/                    # 🎮 GAME ENGINE
│   │   ├── __init__.py
│   │   ├── dominoes.py         # Core game logic, move validation
│   │   ├── rules.py            # Block/Draw rules, scoring
│   │   └── state.py            # Game state representation, observation space
│   │
│   ├── agents/                  # 🤖 RL AGENTS
│   │   ├── __init__.py
│   │   ├── base_agent.py       # Abstract agent interface
│   │   ├── dqn_agent.py        # Deep Q-Network implementation
│   │   ├── ppo_agent.py        # Proximal Policy Optimization
│   │   └── network.py          # PyTorch neural network models
│   │
│   ├── training/               # 📚 TRAINING LOOP
│   │   ├── __init__.py
│   │   ├── trainer.py          # Main training logic, loss calculation
│   │   ├── self_play.py        # Self-play tournament, league play
│   │   ├── replay_buffer.py    # Experience replay for DQN
│   │   └── curriculum.py       # Progressive difficulty (optional)
│   │
│   └── utils/                   # 🛠️ UTILITIES
│       ├── __init__.py
│       ├── database.py         # SQLite game history storage
│       ├── logger.py           # Training/evaluation logging
│       └── metrics.py          # Win rates, avg score, strategy metrics
│
├── data/
│   ├── games.db               # SQLite database (auto-created)
│   └── checkpoints/           # Model weights (auto-created)
│
├── notebooks/
│   ├── analysis.ipynb         # Jupyter: Game analysis & visualization
│   └── training_progress.ipynb # Jupyter: Training curves & metrics
│
├── tests/
│   ├── __init__.py
│   ├── test_game.py           # Unit tests for game engine
│   └── test_agents.py         # Unit tests for agent logic
│
└── docs/
    ├── DOMINOES_RULES.md      # Complete dominoes rules
    └── ALGORITHM_GUIDE.md     # MARL algorithms explained

🎓 Technical Stack

| Component   | Technology           | Purpose                           |
| ----------- | -------------------- | --------------------------------- |
| Framework   | PyTorch              | Neural network training           |
| RL Library  | Stable Baselines3    | Pre-built RL algorithms           |
| Environment | Gymnasium            | Standard RL environment interface |
| Multi-Agent | PettingZoo           | Multi-agent game environments     |
| Database    | SQLite + SQLAlchemy  | Game history storage              |
| GPU         | CUDA (9GB optimized) | Mixed-precision, batch tuning     |
| Logging     | Weights & Biases     | Experiment tracking               |

🧠 Learning Strategies

1. Game Theory Foundation

  • Tile counting and information hiding
  • Minimax evaluation of board positions
  • Nash equilibrium concepts for competitive play

2. Multi-Turn Planning

  • Agents learn to predict opponent tile distribution
  • Blocking strategies (forcing passes)
  • End-game optimization (minimizing pip count)
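
Tile tracking of the kind listed above can be sketched with plain set arithmetic. This is a minimal illustration, assuming a double-six set and tiles stored as sorted `(low, high)` tuples. The helper names `full_set`, `unseen_tiles`, and `suit_counts` are hypothetical and not taken from the repository.

```python
from itertools import combinations_with_replacement

def full_set(max_pip=6):
    """All tiles of a double-N set as (low, high) tuples; 28 tiles for N=6."""
    return set(combinations_with_replacement(range(max_pip + 1), 2))

def unseen_tiles(hand, board, max_pip=6):
    """Tiles that are neither in our hand nor already on the board."""
    return full_set(max_pip) - set(hand) - set(board)

def suit_counts(tiles, max_pip=6):
    """For each pip value, count how many of the given tiles carry it."""
    return {p: sum(1 for t in tiles if p in t) for p in range(max_pip + 1)}

hand = [(6, 6), (4, 6), (0, 3)]   # assumed example hand
board = [(5, 6), (2, 5)]          # assumed tiles already played
unseen = unseen_tiles(hand, board)
counts = suit_counts(unseen)
print(len(unseen))   # 23 tiles unaccounted for
print(counts[6])     # 4 tiles with a six are still hidden
```

A learned agent would not need these counts spelled out, but they make a convenient hand-crafted feature or a sanity check on what the policy appears to have inferred.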

3. Reinforcement Learning

  • State Space: Board layout, hand composition, game history
  • Action Space: Valid domino placements
  • Reward Signal:
    • +1 for winning
    • -1 for losing
    • Intermediate rewards for advantageous positions
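
One possible encoding of the state, action, and reward items above is sketched below. It assumes a double-six set, a 28-entry multi-hot vector for the hand, and 56 discrete actions (tile index times two, plus the board side). These choices and the names `encode_hand`, `action_mask`, and `terminal_reward` are illustrative assumptions, not the repository's actual observation or action space.

```python
from itertools import combinations_with_replacement

TILES = list(combinations_with_replacement(range(7), 2))  # 28 tiles, fixed order
TILE_INDEX = {tile: i for i, tile in enumerate(TILES)}

def encode_hand(hand):
    """Multi-hot vector of length 28: 1.0 where the tile is in the hand."""
    vec = [0.0] * len(TILES)
    for tile in hand:
        vec[TILE_INDEX[tuple(sorted(tile))]] = 1.0
    return vec

def action_mask(hand, left_end, right_end):
    """Mask over 56 actions: index = 2 * tile_index + side (0 = left, 1 = right)."""
    mask = [0] * (2 * len(TILES))
    for tile in hand:
        idx = TILE_INDEX[tuple(sorted(tile))]
        if left_end in tile:
            mask[2 * idx] = 1
        if right_end in tile:
            mask[2 * idx + 1] = 1
    return mask

def terminal_reward(winner, agent):
    """Sparse terminal reward: +1 for a win, -1 for a loss."""
    return 1.0 if winner == agent else -1.0

hand = [(6, 4), (3, 3), (0, 1)]
mask = action_mask(hand, left_end=6, right_end=3)
print(sum(encode_hand(hand)))  # 3.0
print(sum(mask))               # 2 legal placements
```

Masking illegal actions before sampling keeps the policy from wasting probability on moves the engine would reject. The empty-board opening move is not handled in this sketch.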

4. Self-Play Curriculum

  1. Train against fixed random policy
  2. Train against previous checkpoint (week-old weights)
  3. Tournament-style league play (current vs. historical versions)
  4. Hard opponent mining (select hardest opponents)
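
Steps 3 and 4 can be approximated by sampling opponents from a checkpoint pool with weights that favour the ones the learner currently struggles against. The sketch below is one way to do that. The weighting formula, the `temperature` parameter, the small floor, and the checkpoint names are all assumptions made for illustration.

```python
import random

def sample_opponent(win_rates, rng=random, temperature=2.0):
    """Pick an opponent name, favouring those the learner beats least often.

    win_rates maps opponent name -> learner's win rate against it (0.0 to 1.0).
    The weight is (1 - win_rate) ** temperature plus a small floor, so that
    already-beaten opponents are still revisited occasionally.
    """
    names = list(win_rates)
    weights = [(1.0 - win_rates[n]) ** temperature + 0.01 for n in names]
    return rng.choices(names, weights=weights, k=1)[0]

# Hypothetical pool: learner's measured win rate against each opponent.
pool = {"random": 0.95, "ckpt_0100": 0.70, "ckpt_0200": 0.48}
rng = random.Random(0)
picks = [sample_opponent(pool, rng) for _ in range(1000)]
print({name: picks.count(name) for name in pool})
```

Raising `temperature` concentrates training on the hardest opponents. Lowering it toward zero approaches uniform league play.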

📊 Expected Performance

  • Training Time: 24-48 hours on an RTX 2070 for a competitive agent
  • Win Rate: 65%+ vs. random baseline, 55%+ vs. human players
  • Database Size: 100k-1M games (~500MB SQLite)

🔧 CUDA Optimization (9GB VRAM)

  • Mixed Precision Training - FP16 activations can cut activation memory by up to about 50% (weights stay in FP32)
  • Gradient Accumulation - Effective batch size 512 from 16 accumulated micro-batches of 32
  • Dynamic Memory Allocation - Clear the CUDA cache between episodes
  • Compact Models - Small architectures (64→128→64 hidden units)

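# Example: memory-optimized training (mixed precision + gradient accumulation)
# `replay_buffer`, `compute_loss`, and `optimizer` are assumed to be defined elsewhere.
import torch
from torch.cuda.amp import GradScaler

ACCUM_STEPS = 16  # 16 micro-batches of 32 -> effective batch size 512
scaler = GradScaler()
optimizer.zero_grad(set_to_none=True)
for step, (state, action, reward) in enumerate(replay_buffer, start=1):
    with torch.autocast(device_type="cuda", dtype=torch.float16):  # FP16 forward pass
        loss = compute_loss(state, action, reward) / ACCUM_STEPS
    scaler.scale(loss).backward()  # accumulate scaled gradients
    if step % ACCUM_STEPS == 0:
        scaler.step(optimizer)  # unscale gradients and apply the update
        scaler.update()
        optimizer.zero_grad(set_to_none=True)
torch.cuda.empty_cache()  # release cached blocks between episodes, not on every step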

📖 Dominoes Rules (Block Variant)

Setup: Each player draws 7 dominoes. Highest double leads.

Gameplay:

  • Match domino ends to board ends (e.g., [6|4] to [4|3])
  • If you can't play, pass (in Block) or draw from the boneyard (in Draw)
  • The round ends when a player plays their last tile or the game is blocked (no player can make a legal move)
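
The matching rule above reduces to a small amount of code. The following is a minimal sketch of move generation and board updates, assuming tiles are `(a, b)` tuples and the board is tracked only by its two open ends. The function names `legal_moves` and `play` are hypothetical and may not match `src/game/dominoes.py`.

```python
def legal_moves(hand, left_end, right_end):
    """Return (tile, side) pairs playable on the current board ends.

    On an empty board (both ends None) every tile can be played.
    """
    if left_end is None and right_end is None:
        return [(tile, "left") for tile in hand]
    moves = []
    for tile in hand:
        if left_end in tile:
            moves.append((tile, "left"))
        # When both ends show the same value, one side is enough.
        if right_end in tile and right_end != left_end:
            moves.append((tile, "right"))
    return moves

def play(tile, side, left_end, right_end):
    """Return the new (left_end, right_end) after placing tile on side."""
    a, b = tile
    if left_end is None:  # first tile of the round
        return a, b
    if side == "left":
        return (b if a == left_end else a), right_end
    return left_end, (b if a == right_end else a)

hand = [(6, 4), (2, 2), (3, 5)]
moves = legal_moves(hand, left_end=4, right_end=3)
print(moves)                       # [((6, 4), 'left'), ((3, 5), 'right')]
print(play((6, 4), "left", 4, 3))  # (6, 3)
```

An empty result from `legal_moves` corresponds to a pass in Block or a draw in Draw.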

Scoring:

  • The player who goes out scores the total pip count left in the opponents' hands
  • If the game is blocked, the player with the lowest pip count wins
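
The scoring rules above can be sketched as a single function. Two details are assumptions here, since rule sets differ: the winner of a blocked round is awarded the opponents' pip total, and a tie on the lowest pip count yields no winner. The names `pip_count` and `score_round` are hypothetical.

```python
def pip_count(hand):
    """Total pips on the tiles remaining in a hand."""
    return sum(a + b for a, b in hand)

def score_round(hands):
    """hands maps player -> remaining tiles. Returns (winner, points).

    The winner is the player with the lowest pip count (zero if they went out)
    and scores the sum of the other players' pips. A tie for the lowest count
    returns (None, 0); this tie rule is an assumption of the sketch.
    """
    pips = {player: pip_count(hand) for player, hand in hands.items()}
    lowest = min(pips.values())
    leaders = [player for player, value in pips.items() if value == lowest]
    if len(leaders) > 1:
        return None, 0
    winner = leaders[0]
    return winner, sum(v for p, v in pips.items() if p != winner)

print(score_round({"a": [], "b": [(6, 5), (1, 0)]}))   # ('a', 12)
print(score_round({"a": [(1, 1)], "b": [(3, 4)]}))     # ('a', 7)
```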

Strategy:

  1. Control the Board → Lead toward your suit
  2. Track Tiles → Know what's played, deduce hands
  3. Block Early → Force passes when ahead
  4. End Strong → Save low tiles for finish

🎮 Training Workflow

┌─────────────────────┐
│ Initialize Agents   │ (Random or pretrained weights)
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ Self-Play Episodes  │ (N agents play M games)
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ Store Game History  │ (SQLite database)
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ Sample Experiences  │ (Replay buffer batches)
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│ Update Policy/Value │ (Gradient descent, CUDA)
└──────────┬──────────┘
           │
           ▼
┌───────────────────────┐
│ Evaluate vs Baselines │ (Win rate, checkpoint)
└──────────┬────────────┘
           │
     ┌─────▼──────┐
     │ Converged? │
     └─────┬──────┘
         No│
           │
           └──────────► (Loop back to Self-Play)

🔗 Data Pipeline

Game → Database → Learning

-- Games table schema
CREATE TABLE games (
    game_id INTEGER PRIMARY KEY,
    timestamp DATETIME,
    agents TEXT,           -- "agent_v1 vs agent_v2"
    winner TEXT,           -- "agent_v1"
    total_turns INT,
    final_scores JSON
);

-- States table (game positions)
CREATE TABLE states (
    state_id INTEGER PRIMARY KEY,
    game_id INTEGER REFERENCES games(game_id),  -- Parent game
    turn INT,
    agent_name TEXT,
    board_state BLOB,      -- Encoded board layout
    hand BLOB,             -- Agent's tiles
    action INT,            -- Tile index played
    reward FLOAT,          -- Immediate reward
    next_state BLOB,
    done BOOLEAN
);
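
To show how the `states` table could feed offline learning, here is a minimal sketch using Python's standard `sqlite3` module rather than SQLAlchemy. The `pickle` serialization of BLOB columns and the helper names `store_transition` and `sample_batch` are assumptions for illustration. The repository's `src/utils/database.py` may work differently.

```python
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory here; the project uses data/games.db
conn.execute("""
    CREATE TABLE states (
        state_id INTEGER PRIMARY KEY,
        game_id INTEGER, turn INT, agent_name TEXT,
        board_state BLOB, hand BLOB, action INT,
        reward FLOAT, next_state BLOB, done BOOLEAN
    )
""")

def store_transition(conn, game_id, turn, agent, board, hand,
                     action, reward, next_board, done):
    """Serialize one transition and append it to the states table."""
    conn.execute(
        "INSERT INTO states (game_id, turn, agent_name, board_state, hand,"
        " action, reward, next_state, done) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (game_id, turn, agent, pickle.dumps(board), pickle.dumps(hand),
         action, reward, pickle.dumps(next_board), int(done)),
    )

def sample_batch(conn, batch_size):
    """Draw a uniform random batch of transitions for offline learning."""
    rows = conn.execute(
        "SELECT board_state, hand, action, reward, next_state, done"
        " FROM states ORDER BY RANDOM() LIMIT ?", (batch_size,),
    ).fetchall()
    return [(pickle.loads(b), pickle.loads(h), a, r, pickle.loads(n), bool(d))
            for b, h, a, r, n, d in rows]

for turn in range(5):
    store_transition(conn, 1, turn, "agent_v1", [(6, 6)], [(6, 4)],
                     turn, 0.0, [(6, 6), (6, 4)], turn == 4)
conn.commit()
batch = sample_batch(conn, 3)
print(len(batch))  # 3
```

`ORDER BY RANDOM()` scans the whole table, so for databases approaching a million games, sampling by random `state_id` is likely to be cheaper. Only unpickle data from a database you trust.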

📈 Metrics Tracked

  • Win Rate vs. each baseline
  • Average Score (own remaining pips when losing, opponents' pip total when winning)
  • Move Validity (% of attempted moves that are legal)
  • Training Loss (DQN/Policy gradient)
  • Episode Length (turns per game)
  • GPU Memory Usage (monitoring VRAM)
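
Two of these metrics, win rate per baseline and episode length, can be derived directly from stored game records. The sketch below assumes each record is a dict with `opponent`, `winner`, and `total_turns` keys. That record shape and the function name `summarize` are illustrative assumptions, not the interface of `src/utils/metrics.py`.

```python
from collections import defaultdict

def summarize(games, agent):
    """Compute per-opponent win rate and mean episode length for one agent."""
    played = defaultdict(int)
    won = defaultdict(int)
    total_turns = 0
    n_games = 0
    for game in games:
        played[game["opponent"]] += 1
        won[game["opponent"]] += int(game["winner"] == agent)
        total_turns += game["total_turns"]
        n_games += 1
    return {
        "win_rate": {opp: won[opp] / played[opp] for opp in played},
        "avg_episode_length": total_turns / n_games if n_games else 0.0,
    }

# Hypothetical game records for illustration.
games = [
    {"opponent": "random", "winner": "agent_v2", "total_turns": 20},
    {"opponent": "random", "winner": "agent_v2", "total_turns": 24},
    {"opponent": "agent_v1", "winner": "agent_v2", "total_turns": 30},
    {"opponent": "agent_v1", "winner": "agent_v1", "total_turns": 26},
]
stats = summarize(games, "agent_v2")
print(stats["win_rate"])            # {'random': 1.0, 'agent_v1': 0.5}
print(stats["avg_episode_length"])  # 25.0
```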

🤝 Contributing

Contributions welcome! Areas:

  • New RL algorithms (QMIX, MADDPG, AlphaZero variants)
  • Dominoes variants (Mexican Train, All Fives, etc.)
  • Opponent agents (minimax, Monte Carlo tree search)
  • Web UI for game visualization
  • Performance optimizations

📄 License

MIT License - See LICENSE file

👤 Author

o7adam - @o7adam


Status: 🚧 In Development
Last Updated: May 2026
Stars: ⭐ (Coming soon!)
