# Bluessy Redish Golden Apple

A competitive multi-agent reinforcement learning environment implementing state-of-the-art Deep Q-Learning algorithms.

## Competitive Snake Environment

Real-time training visualization shows the competitive gameplay between the AI agents.
## Network Architecture

- Input: 25-dimensional state vector
- Architecture: 256 → 128 → 4 neurons
- Activation: ReLU (hidden), Linear (output)
- Optimizer: Adam (lr=0.001)
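A minimal PyTorch sketch of this network (the class and layer names are illustrative, not taken from `model.py`):

```python
import torch
import torch.nn as nn

# Illustrative sketch of the 25 -> 256 -> 128 -> 4 Q-network described above
class QNetwork(nn.Module):
    def __init__(self, state_dim=25, n_actions=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, n_actions),  # linear output: raw Q-values
        )

    def forward(self, x):
        return self.net(x)

q = QNetwork()
optimizer = torch.optim.Adam(q.parameters(), lr=0.001)
print(q(torch.zeros(1, 25)).shape)  # torch.Size([1, 4])
```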
## Double DQN

Double DQN reduces overestimation bias by selecting the next action with the online network and evaluating it with the target network:

```python
# Reduces overestimation bias: action selection and evaluation are decoupled
next_actions = online_network(next_states).argmax(dim=1)
target_q_values = target_network(next_states).gather(1, next_actions.unsqueeze(1)).squeeze(1)
targets = rewards + gamma * target_q_values * (1 - dones)
```

## Dueling Architecture

```python
# Separate value and advantage streams
value_stream = self.value_stream(features)
advantage_stream = self.advantage_stream(features)
q_values = value_stream + (advantage_stream - advantage_stream.mean(dim=1, keepdim=True))
```

## Prioritized Experience Replay

- Buffer Size: 50,000 experiences
- Alpha: 0.6 (prioritization strength)
- Beta: 0.4 (importance sampling)
- Sampling: TD-error based priority
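A compact NumPy sketch of proportional, TD-error-based prioritization with these settings (the class and method names are illustrative, not the repo's API):

```python
import numpy as np

# Minimal proportional prioritized replay buffer (alpha/beta as listed above)
class PrioritizedReplayBuffer:
    def __init__(self, capacity=50_000, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.priorities, self.pos = [], [], 0

    def push(self, transition, td_error=1.0):
        # Higher TD error -> higher sampling priority
        priority = (abs(td_error) + 1e-5) ** self.alpha
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(priority)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = priority
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        probs = np.array(self.priorities)
        probs /= probs.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights correct the non-uniform sampling bias
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return [self.data[i] for i in idx], idx, weights

buf = PrioritizedReplayBuffer()
for i in range(100):
    buf.push(("state", 0, 0.0, "next_state", False), td_error=float(i + 1))
batch, idx, weights = buf.sample(batch_size=32)
print(len(batch), weights.max())  # 32 1.0
```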
## State Space

| Index | Feature | Description | Range |
|---|---|---|---|
| 0-3 | Snake 1 direction | One-hot encoding | [0,1] |
| 4-7 | Snake 2 direction | One-hot encoding | [0,1] |
| 8-11 | Food direction | One-hot encoding | [0,1] |
| 12-15 | Wall proximity | One-hot encoding | [0,1] |
| 16-19 | Snake 1 body proximity | One-hot encoding | [0,1] |
| 20-23 | Snake 2 body proximity | One-hot encoding | [0,1] |
| 24 | Snake 1 length | Normalized | [0,1] |
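For illustration, the 25-dimensional vector can be assembled as below (the concrete index values are placeholder examples, not real game queries):

```python
import numpy as np

def one_hot(index, size=4):
    v = np.zeros(size)
    v[index] = 1.0
    return v

# Concatenate the six one-hot blocks plus the normalized length scalar
state = np.concatenate([
    one_hot(0),   # indices 0-3:   snake 1 direction (e.g. up)
    one_hot(2),   # indices 4-7:   snake 2 direction
    one_hot(1),   # indices 8-11:  food direction
    one_hot(3),   # indices 12-15: wall proximity
    one_hot(0),   # indices 16-19: snake 1 body proximity
    one_hot(2),   # indices 20-23: snake 2 body proximity
    [0.25],       # index 24:      snake 1 length, normalized
])
print(state.shape)  # (25,)
```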
## Reward Function

- Food Consumption: `10 + length_bonus + speed_bonus + competitive_bonus`
- Survival: `0.1 * (1 - frame_iteration/1000)`
- Collision: `-10`
- Proximity to Food: `exp(-distance_to_food/10) * 2`
- Competitive Advantage: `2 if closer_to_food_than_opponent else 0`
- Efficiency: `length * 0.1 / max(frame_iteration, 1)`
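Combined into one function, the shaping terms above look roughly like this (the signature and the bonus arguments are illustrative; the real environment computes them internally):

```python
import math

def step_reward(ate_food, died, length, frame_iteration,
                distance_to_food, closer_than_opponent,
                length_bonus=0.0, speed_bonus=0.0, competitive_bonus=0.0):
    if died:
        return -10.0                                     # collision penalty
    reward = 0.1 * (1 - frame_iteration / 1000)          # survival
    reward += math.exp(-distance_to_food / 10) * 2       # proximity to food
    reward += 2.0 if closer_than_opponent else 0.0       # competitive advantage
    reward += length * 0.1 / max(frame_iteration, 1)     # efficiency
    if ate_food:
        reward += 10 + length_bonus + speed_bonus + competitive_bonus
    return reward

print(step_reward(ate_food=True, died=False, length=5, frame_iteration=100,
                  distance_to_food=1.0, closer_than_opponent=True))
```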
## Hyperparameters

```python
LEARNING_RATE = 0.001
GAMMA = 0.99
EPSILON_START = 1.0
EPSILON_END = 0.01
EPSILON_DECAY = 0.995
BATCH_SIZE = 64
TARGET_UPDATE_FREQUENCY = 1000
MEMORY_CAPACITY = 50000
```

## Quick Start

```bash
# Clone repository
git clone <repository-url>
cd Snake-Apple

# Install dependencies
pip install torch pygame numpy matplotlib

# Run training
python train.py

# Run the test suite
python test_enhanced_snake.py
```

## Controls

- SPACE: Pause/Resume
- H: Toggle help
- +/-: Speed control
- R: Reset speed
- ESC: Exit
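As a quick sanity check on the exploration schedule above, the multiplicative decay reaches its floor after roughly 900 games:

```python
import math

EPSILON_START, EPSILON_END, EPSILON_DECAY = 1.0, 0.01, 0.995

# First step n at which EPSILON_START * EPSILON_DECAY**n <= EPSILON_END
steps = math.ceil(math.log(EPSILON_END / EPSILON_START) / math.log(EPSILON_DECAY))
print(steps)  # 919
```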
## Project Structure

```
Snake-Apple/
├── model.py                 # DQN architecture
├── snake_env.py             # Game environment
├── train.py                 # Training loop
├── logger.py                # Metrics tracking
├── config.py                # Configuration
├── game_assets.py           # Asset loading
├── test_enhanced_snake.py   # Test suite
├── requirements.txt         # Dependencies
├── models/                  # Checkpoints
└── logs/                    # Training logs
```
## Training Metrics

- Loss tracking with gradient clipping
- Q-value monitoring per agent
- Epsilon decay visualization
- Win rate analysis
- Game length distribution
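The clipped optimization step mentioned above can be sketched as follows (`max_norm=1.0` and the stand-in model are assumptions for illustration, not the project's actual settings):

```python
import torch
import torch.nn as nn

model = nn.Linear(25, 4)  # stand-in for the Q-network
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

loss = model(torch.randn(64, 25)).pow(2).mean()  # toy loss for demonstration
optimizer.zero_grad()
loss.backward()
# Rescale gradients so their global L2 norm is at most 1.0;
# returns the total norm before clipping, useful for logging
total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```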
## Expected Results

- Convergence: 2,000-5,000 games
- Peak Score: 15-25 average
- Win Rate: 60-70%
- Training Time: 2-4 hours (CPU), 30-60 min (GPU)
## Features

- Simultaneous learning
- Competitive dynamics
- Adaptive opponents
- Checkpoint system
- Resume training
- Version control
- Live metrics
- Performance plots
- Interactive controls
## Troubleshooting

- CUDA OOM: Reduce batch size
- Slow Training: Enable GPU
- Poor Performance: Tune hyperparameters
- Memory Leaks: Check buffer size
## Performance

- GPU acceleration (3-5x speedup)
- Batch processing
- Memory management
- Parallel training
## Research Applications

- Multi-agent RL dynamics
- Algorithm comparison
- Strategic gameplay analysis
- Competitive learning
## License

MIT License

## Citation

```bibtex
@software{snake_rl_2024,
  title={Snake's & The Golden Apple: Advanced Multi-Agent RL},
  author={[Your Name]},
  year={2024}
}
```