Find the minimum set of baseball players needed to cover all possible MLB franchise pair combinations for Immaculate Grid gameplay.
Given 30 current MLB franchises, find the smallest set of players such that for every pair of franchises, at least one player in the set played for both teams.
- Total franchise pairs: C(30,2) = 435 combinations
- Constraint: Each pair must have ≥1 player who played for both teams
- Goal: Minimize the number of players needed
- Approach: This is a Set Cover Problem (NP-complete), solved using:
- Greedy heuristic - Fast approximation (near-optimal)
- Exact optimization - Integer Linear Programming (provably optimal)
Historical teams are mapped to their current franchises:
- Brooklyn Dodgers → Los Angeles Dodgers
- Montreal Expos → Washington Nationals
- Seattle Pilots → Milwaukee Brewers
- etc.
- Python 3.11+ with pandas, numpy
- PuLP - Integer Linear Programming for exact solver
- SQLite - Persistent storage for results
- FastAPI + Uvicorn - Async web API with automatic OpenAPI docs
- Bootstrap 5 + Chart.js - Frontend visualization
# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Install dependencies
uv pip install -r requirements.txt
# Install dev dependencies (for testing)
uv pip install -r requirements-dev.txtpython scripts/download_data.pyDownloads the Sean Lahman Baseball Database (2025 edition) into the data/ directory.
# Populate database with all players, pairs, and solutions
python scripts/populate_database.py
# Skip the exact solver (faster, only runs greedy)
python scripts/populate_database.py --skip-exact
# Custom database path
python scripts/populate_database.py --db-path custom.dbThis script:
- Loads all 19,947 players from the Lahman database
- Creates 435 franchise pairs
- Populates player coverage data (58,163 entries)
- Runs greedy solver (~0.1 seconds)
- Runs exact ILP solver (~9 minutes, can be skipped)
- Saves results to SQLite database
# Compare greedy vs exact solver performance
python scripts/compare_solvers.pyShows:
- Solution sizes (greedy: 21 players, exact: 19 players)
- Runtime comparison
- Approximation ratio (1.1053)
- Player overlap analysis
Find the minimum players who ALL played for a specific franchise:
# Find MIN-constrained solution (includes Washington Senators history)
python scripts/solve_for_franchise.py MIN
# Run only greedy solver (faster)
python scripts/solve_for_franchise.py NYY --greedy
# Save results to markdown file
python scripts/solve_for_franchise.py MIN --output results/min_solution.md
# Custom time limit for exact solver (default: 600s)
python scripts/solve_for_franchise.py LAD --time-limit 300This finds players who played for the target franchise AND enough other teams to cover all 435 pairs. Some pairs may be uncoverable if no player from that franchise played for both teams in the pair.
Example: MIN-constrained solution requires 34 players (optimal) covering 434/435 pairs. The FLA-STL pair is uncoverable because no Twins/Senators player ever played for both Florida and St. Louis.
# Start FastAPI server
uvicorn web.api:app --reload --port 8000
# Access the app at:
# - Frontend: http://localhost:8000
# - API docs: http://localhost:8000/docs (automatic Swagger UI)Web Interface Features:
- Solutions comparison table (greedy vs exact)
- Interactive player list for each solution
- 30×30 coverage heatmap showing player counts per franchise pair
- Player search and lookup
- Franchise pair lookup (find all players who played for two teams)
# Run all tests
pytest
# Run with coverage
pytest --cov=src --cov-report=html
# Run specific test file
pytest tests/test_api.py -vminmaculate-grid/
├── src/ # Core Python modules
├── tests/ # Unit and integration tests
├── web/ # FastAPI application and frontend
├── scripts/ # CLI scripts
├── data/ # Lahman database CSVs (not in git)
└── results/ # Solution outputs
Players needed: 19 ✅ PROVEN OPTIMAL Runtime: ~9.3 minutes (557 seconds) Coverage: 435/435 pairs (100%)
The exact Integer Linear Programming solver found the provably optimal minimum solution:
Complete Optimal Player List:
- Chase Anderson
- Ken Brett
- Lew Burdette
- Bruce Chen
- Dennis Cook
- Jose Cruz
- Octavio Dotel
- Jose Guillen
- Billy Hamilton
- LaTroy Hawkins
- Rich Hill
- Edwin Jackson
- Mike Morgan
- Lance Parrish
- Edgar Renteria
- Fernando Rodney
- Matt Stairs
- Anthony Swarzak
- Ron Villone
Memorize these 19 players and you'll be able to answer every possible two-team combination in Immaculate Grid!
Players needed: 21 Runtime: 0.10 seconds Coverage: 435/435 pairs (100%)
The greedy algorithm provides a fast approximation requiring 21 players:
Top Players in Greedy Solution:
- Edwin Jackson - 14 franchises, 91 pairs (21% of all pairs!)
- Rich Hill - 13 franchises, 78 pairs
- Ron Villone - 12 franchises, 66 pairs
- Bruce Chen - 11 franchises, 55 pairs
- Octavio Dotel - 13 franchises, 78 pairs
- Royce Clayton - 11 franchises, 55 pairs
- Terry Mulholland - 11 franchises, 55 pairs
- LaTroy Hawkins - 11 franchises, 55 pairs
- Matt Stairs - 12 franchises, 66 pairs
- Jose Guillen - 10 franchises, 45 pairs
(See answers.md for complete lists with franchise details)
| Metric | Greedy | Exact (Optimal) |
|---|---|---|
| Solution Size | 21 players | 19 players 🏆 |
| Runtime | 0.10 seconds | 557.82 seconds (~9.3 min) |
| Approximation Ratio | 1.1053 (11% over optimal) | 1.0000 (proven optimal) |
| Coverage | 435/435 pairs (100%) | 435/435 pairs (100%) |
Key Insights:
- The exact solver found a solution 2 players smaller than greedy (9.5% improvement)
- Greedy is 5,578x faster but slightly suboptimal
- 10 players appear in both solutions (core journeymen like Edwin Jackson, Rich Hill, etc.)
- For memorization purposes, the difference between 19 and 21 players is minimal
- Mathematical certainty: 19 is the absolute minimum - no solution with 18 or fewer players exists
This project uses the Sean Lahman Baseball Database.
Copyright: 1996-2025 by Sean Lahman (sean@lahman.com) License: Creative Commons Attribution-ShareAlike 3.0 Unported License Source: https://sabr.org/lahman-database/
We are deeply grateful to Sean Lahman for making this comprehensive baseball dataset freely available to the public.
MIT License (for our code only; data is CC BY-SA 3.0 as noted above)