Data-driven wrestling analytics platform — match history, career stats, and ML-powered match outcome predictions covering 40+ years of professional wrestling.
- Docker & Docker Compose
- Python 3.11+ (for local development)
```bash
docker compose up -d db redis
```

This starts PostgreSQL 16 and Redis 7. The schema is auto-applied on first run via `schema.sql`.
```bash
docker compose run --rm seed
```

This loads `wrestlers_roster_2026.csv` (354 wrestlers) into the `promotions` and `wrestlers` tables.
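The seed step's core job can be sketched as reading the roster CSV, collecting the distinct promotions, and producing one wrestler row per line. This is a minimal, stdlib-only sketch: the column names `name` and `promotion` and the function `split_roster` are hypothetical (the CSV header is not documented here), and the real `seed.py` writes to PostgreSQL rather than returning Python lists. Promotions are collected first because, presumably, each wrestler row references its promotion.

```python
import csv
import io
from typing import TextIO


def split_roster(fh: TextIO) -> tuple[list[str], list[dict[str, str]]]:
    """Split a roster CSV into distinct promotions and wrestler rows.

    Assumes hypothetical `name` and `promotion` columns; the real
    wrestlers_roster_2026.csv header may differ.
    """
    promotions: list[str] = []
    wrestlers: list[dict[str, str]] = []
    for row in csv.DictReader(fh):
        promo = row["promotion"].strip()
        if promo not in promotions:  # keep first-seen order, no duplicates
            promotions.append(promo)
        wrestlers.append({"name": row["name"].strip(), "promotion": promo})
    return promotions, wrestlers


sample = io.StringIO("name,promotion\nWrestler A,WWE\nWrestler B,AEW\nWrestler C,WWE\n")
promos, rows = split_roster(sample)
print(promos)     # ['WWE', 'AEW']
print(len(rows))  # 3
```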
```bash
# Scrape WWE + AEW data for 2020-2026
docker compose run --rm scraper

# Or run locally with options
python -m scraper --promotions WWE AEW --year-start 2000 --year-end 2026
```

Fetched HTML is cached locally (`./cache` by default) and the scraped data is written as JSON to `./output/`.
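The scraper's `http_client.py` is described as a rate-limited HTTP client with caching. A minimal sketch of that pattern follows, assuming an on-disk cache keyed by a SHA-256 hash of the URL and a minimum delay between live requests. The class `CachedRateLimitedClient`, its injectable `fetch` callable, and the file layout are hypothetical, not the module's actual API. Cache hits skip both the network and the delay, which is what makes re-runs cheap.

```python
import hashlib
import tempfile
import time
import urllib.request
from collections.abc import Callable
from pathlib import Path


def urlopen_text(url: str) -> str:
    # Default fetcher: a plain stdlib GET, decoded as UTF-8.
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


class CachedRateLimitedClient:
    """Hypothetical sketch: on-disk HTML cache plus a minimum gap between live fetches."""

    def __init__(self, cache_dir: str, rate_limit: float = 1.0,
                 fetch: Callable[[str], str] = urlopen_text) -> None:
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.rate_limit = rate_limit      # seconds between live HTTP requests
        self.fetch = fetch
        self._last_fetch = float("-inf")  # monotonic time of the previous live fetch

    def _cache_path(self, url: str) -> Path:
        # Hash the URL so any URL maps to a safe, fixed-length file name.
        return self.cache_dir / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html")

    def get(self, url: str) -> str:
        path = self._cache_path(url)
        if path.exists():  # cache hit: no network, no delay
            return path.read_text(encoding="utf-8")
        wait = self.rate_limit - (time.monotonic() - self._last_fetch)
        if wait > 0:
            time.sleep(wait)
        body = self.fetch(url)
        self._last_fetch = time.monotonic()
        path.write_text(body, encoding="utf-8")
        return body


# Usage with a stub fetcher, so no network access is needed.
calls: list[str] = []


def fake_fetch(url: str) -> str:
    calls.append(url)
    return f"<html>{url}</html>"


with tempfile.TemporaryDirectory() as tmp:
    client = CachedRateLimitedClient(tmp, rate_limit=0.0, fetch=fake_fetch)
    first = client.get("https://example.com/event/1")
    second = client.get("https://example.com/event/1")  # served from the cache

print(len(calls))  # 1
```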
```bash
# Load scraped JSON into the database
docker compose run --rm etl

# Or run locally
python -m etl --input-dir ./output
```

After loading, rolling stats are automatically recomputed for all affected wrestlers.
```
├── schema.sql                # PostgreSQL DDL (10 tables, ENUMs, indexes)
├── seed.py                   # CSV → database loader
├── docker-compose.yml        # PostgreSQL 16, Redis 7, job containers
├── Dockerfile.python         # Python 3.11 image for scraper/ETL
├── requirements.txt          # Python dependencies
├── scraper/                  # Cagematch.net scraper
│   ├── cagematch.py          # Main scraper orchestrator
│   ├── parser.py             # HTML parsing and data extraction
│   ├── http_client.py        # Rate-limited HTTP client with caching
│   ├── config.py             # Scraper configuration
│   └── cli.py                # CLI entry point
├── etl/                      # Extract-Transform-Load pipeline
│   ├── load.py               # Database loader with upserts
│   ├── entity_resolution.py  # Fuzzy wrestler name matching
│   ├── stats.py              # Rolling stats computation
│   └── cli.py                # CLI entry point
├── wrestlers_roster_2026.csv
├── PLAN.md                   # Full engineering plan (4 phases)
└── .env.example              # Environment variable template
```
PostgreSQL 16 with 10 tables:
| Table | Description |
|---|---|
| `promotions` | WWE, AEW, WCW, ECW, TNA, NXT |
| `wrestlers` | Master registry with full-text search |
| `wrestler_aliases` | Name variations for entity resolution |
| `events` | PPVs, TV episodes, house shows |
| `matches` | Match details with type, duration, rating |
| `match_participants` | Who was in each match and their result |
| `titles` | Championship registry |
| `title_reigns` | Championship reign history |
| `wrestler_stats_rolling` | Pre-computed ML features |
| `predictions` | Cached ML prediction results |
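`etl/load.py` loads data with upserts, so re-running the ETL over the same JSON updates existing rows instead of duplicating them. A minimal sketch of the `INSERT ... ON CONFLICT ... DO UPDATE` pattern follows, using the standard library's `sqlite3` as a stand-in for PostgreSQL (both accept this clause; SQLite since 3.24). The simplified `matches` columns and the `source_id` natural key are hypothetical, not taken from `schema.sql`, and with psycopg the named placeholders would be written `%(name)s` instead of `:name`.

```python
import sqlite3

UPSERT_MATCH = """
INSERT INTO matches (source_id, match_type, duration_s, rating)
VALUES (:source_id, :match_type, :duration_s, :rating)
ON CONFLICT (source_id) DO UPDATE SET
    match_type = excluded.match_type,
    duration_s = excluded.duration_s,
    rating     = excluded.rating
"""


def upsert_matches(conn: sqlite3.Connection, rows: list[dict]) -> None:
    # One transaction per batch; the connection context manager commits on success.
    with conn:
        conn.executemany(UPSERT_MATCH, rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE matches (id INTEGER PRIMARY KEY, source_id TEXT UNIQUE NOT NULL,"
    " match_type TEXT, duration_s INTEGER, rating REAL)"
)
upsert_matches(conn, [{"source_id": "m1", "match_type": "singles", "duration_s": 900, "rating": None}])
# Re-loading the same match with a rating fills it in rather than adding a row.
upsert_matches(conn, [{"source_id": "m1", "match_type": "singles", "duration_s": 900, "rating": 4.25}])
print(conn.execute("SELECT COUNT(*), MAX(rating) FROM matches").fetchone())  # (1, 4.25)
```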
Environment variables (template in `.env.example`):

| Variable | Default | Description |
|---|---|---|
| `DATABASE_URL` | `postgresql://ringside:ringside@localhost:5432/ringside` | PostgreSQL connection URL |
| `REDIS_URL` | `redis://localhost:6379` | Redis connection URL |
| `SCRAPER_RATE_LIMIT` | `1.0` | Seconds between HTTP requests |
| `SCRAPER_CACHE_DIR` | `./cache` | Local HTML cache directory |
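These variables can be read with their documented defaults in a few lines. The sketch below is hypothetical: the `Settings` dataclass and `load_settings` function are illustrative, not the project's actual config module. It also rejects a negative `SCRAPER_RATE_LIMIT`, and it accepts any mapping so it can be exercised without touching the real environment.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    scraper_rate_limit: float  # seconds between HTTP requests
    scraper_cache_dir: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # Defaults mirror the table above; float() raises on a malformed value.
    rate = float(env.get("SCRAPER_RATE_LIMIT", "1.0"))
    if rate < 0:
        raise ValueError("SCRAPER_RATE_LIMIT must be >= 0")
    return Settings(
        database_url=env.get(
            "DATABASE_URL", "postgresql://ringside:ringside@localhost:5432/ringside"
        ),
        redis_url=env.get("REDIS_URL", "redis://localhost:6379"),
        scraper_rate_limit=rate,
        scraper_cache_dir=env.get("SCRAPER_CACHE_DIR", "./cache"),
    )


settings = load_settings({"SCRAPER_RATE_LIMIT": "2.5"})
print(settings.scraper_rate_limit, settings.redis_url)  # 2.5 redis://localhost:6379
```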
A full scraper run across all six promotions, 1980-2026:

```bash
python -m scraper \
  --promotions WWE AEW WCW ECW TNA NXT \
  --year-start 1980 \
  --year-end 2026 \
  --rate-limit 1.0 \
  --output-dir ./output \
  --cache-dir ./cache
```

ETL modes:

```bash
# Load all JSON files from output directory
python -m etl --input-dir ./output

# Load a single file
python -m etl --file ./output/wwe_2024.json

# Recompute stats only (no data load)
python -m etl --stats-only
```

See `PLAN.md` for the full 4-phase engineering plan:
- Data Foundation (current) — Schema, scraper, ETL, Docker
- API & Search — Express REST API, Next.js frontend
- ML Predictions — XGBoost model, FastAPI service
- Visualizations — Recharts components, nightly refresh, monitoring