An end-to-end football analytics platform: tactical engine, machine-learning Expected Goals (xG), Expected Threat (xT) possession value, VAEP-style action valuation, pitch control from 360 freeze frames, packing, data-driven player role discovery, AI-generated tactical commentary, and a production-grade dark dashboard.
PressIQ ingests football event data and produces the kind of intelligence elite analytics departments rely on: every action valued in expected-goal units, players clustered by what they actually do rather than where they line up, and — when 360 freeze frames are available — pitch control and packing computed from real player positions.
It runs on real StatsBomb Open Data out of the box (Premier League 2015/16 by default) and supports a 360-enabled mode (UEFA Euro 2020/2024, FIFA World Cup 2022) that unlocks the full spatial-intelligence stack. A deterministic possession-chain simulator ships as a fully offline fallback so the entire stack is runnable with zero API keys, zero database setup.
| | Module | What it adds |
|---|---|---|
| ➕ | Expected Threat (xT) | Markov grid possession-value surface, trained on the dataset; values every pass and carry by the goal probability it lifts. |
| ➕ | Trained xG model | Gradient-boosted Expected Goals model trained on real StatsBomb shots (geometry + context + freeze-frame pressure), with calibration curve and StatsBomb-xG agreement. |
| ➕ | Possession Value (PV) | VAEP-style action valuation — every action scored in expected-goal units (progression / finishing / defending). |
| ➕ | StatsBomb 360 adapter | Optional freeze-frame layer: every visible player's position at every event, normalised into the attacking frame. |
| ➕ | Pitch Control | Soft-Voronoi team-control surface from 360 freeze frames, per-event local control and per-match aggregates. |
| ➕ | Packing | Opponents bypassed by each completed pass, including defenders-bypassed and per-player top packers. |
| ➕ | Role Galaxy | Unsupervised KMeans clustering of every outfield player into behavioural roles, projected onto a 2-D PCA scatter. |
- Key capabilities
- Architecture
- Tech stack
- Project structure
- Quick start
- Configuration
- Spatial intelligence — enabling 360 data
- Machine-learning models
- API reference
- How the engine works
- AI tactical commentary
- Testing
- Deployment
- Future work
| # | Module | What it does |
|---|---|---|
| 1 | Match Analysis Engine | Orchestrates every analysis module into one structured tactical report with auto-generated key findings. |
| 2 | Formation Detection | Infers average formation (4-3-3, 4-2-3-1, 3-5-2, 5-4-1, …) via positional K-means clustering with a confidence score. |
| 3 | Pressing Analysis | PPDA, block height, press classification (high press / mid block / low block), counter-pressing recoveries, defensive activity zones. |
| 4 | Build-up Analysis | Directness index, verticality, progressive passes/carries, pass-length mix, possession-chain profiling, build-up channel preference. |
| 5 | Half-space & Zonal Analysis | 18-zone (3 thirds × 6 channels) occupation grid, explicit left/right half-space metrics, wide vs central balance. |
| 6 | Passing Networks | Directed passing graph, involvement-scaled nodes, degree & betweenness centrality, key distributors. |
| 7 | Heatmaps & Event Visuals | Touch heatmaps, pressure maps, shot maps (xG-sized), progressive-action maps, formation maps — all server-rendered. |
| 8 | Player Analytics | Per-player progression, creativity, defensive output, composite influence score and inferred tactical role. |
| 9 | AI Tactical Commentary | Professional match reports and team identity profiles via a modular LLM pipeline (Groq / Anthropic / OpenAI) with an offline template fallback. |
| 10 | Tactical DNA System | Reduces a team's behaviour to a 6-axis style vector and classifies it into an archetype. |
| 11 | Match Momentum | Minute-by-minute momentum curve, cumulative xG timelines, key-moment detection. |
| 12 | Tactical Similarity | Pairwise similarity scoring between teams' DNA vectors. |
| 13 | Expected Threat (xT) | A Markov grid possession-value surface trained by value iteration; every pass and carry valued by the threat it adds. |
| 14 | Trained xG model | Gradient-boosted Expected Goals classifier with feature importance, calibration curve and a StatsBomb-xG agreement check. |
| 15 | Possession Value (PV) | VAEP-style action valuation: progression + finishing + defending, in expected-goal units, per player. |
| 16 | StatsBomb 360 integration | Freeze-frame layer powering the spatial-intelligence stack. |
| 17 | Pitch Control | Soft-Voronoi control surface, local control around the ball, aggregated per team. |
| 18 | Packing | Outfield opponents and defenders bypassed by each completed pass. |
| 19 | Player Role Galaxy | Unsupervised clustering of every outfield player into behavioural roles, with a PCA scatter. |
┌──────────────────────────────────────────────────────────────────────┐
│ React Frontend (Vite) │
│ Dashboard · Match · Threat · Value · Spatial · xG Model · Roles · DNA │
└───────────────────────────────┬──────────────────────────────────────┘
│ REST / JSON + PNG
┌───────────────────────────────▼──────────────────────────────────────┐
│ FastAPI Application │
│ routes · cached service layer · OpenAPI documentation │
├───────────┬────────────┬────────────┬───────────┬──────────┬─────────┤
│ Tactical │ Spatial │ Machine │ Visualiz. │ Data │ AI │
│ Engine │ Layer │ Learning │ Engine │ Repo. │ Pipeline│
│ │ │ │ │ │ │
│ formation │ pitch ctrl │ xG model │mplsoccer │statsbomb │groq / │
│ pressing │ packing │ roles │heatmaps │ + 360 │anthropic│
│ buildup │ 360 data │ KMeans/PCA │networks │generated │ /openai │
│ zones │ │ HistGBM │xT plots │database │template │
│ xT/PV/etc.│ │ │spatial │ │ │
└───────────┴────────────┴────────────┴───────────┴──────────┴─────────┘
The tactical engine and ML layer are pure: every function takes a Pandas DataFrame of events (and optionally a 360 freeze-frame dict or a trained model artefact) and returns a JSON-serialisable dictionary. These functions have no knowledge of HTTP, the database or the data source, which makes the whole stack easy to test and reuse.
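As an illustration of that events-in, dictionary-out contract, here is a minimal sketch of a PPDA calculation written as a pure function. It is not PressIQ's actual code: the event keys (`team`, `type`, `x`), the set of defensive action types and the 72-unit build-up limit are assumptions, and a plain list of dicts stands in for the DataFrame to keep the sketch dependency-free.

```python
from typing import Any, Dict, List, Optional

# Hypothetical event schema (not PressIQ's real column names): each event has
# "team", "type" and "x", where x runs 0-120 in the acting team's attacking frame.
DEFENSIVE_TYPES = {"tackle", "interception", "challenge", "foul"}


def ppda(events: List[Dict[str, Any]], pressing_team: str,
         build_up_limit: float = 72.0) -> Dict[str, Any]:
    """Opponent build-up passes per defensive action by the pressing team.

    Pure function: it only reads the events and returns a JSON-serialisable
    dict, with no HTTP, database or data-source access.
    """
    opponent_passes = 0
    defensive_actions = 0
    for event in events:
        if event["team"] != pressing_team:
            # Opponent passes inside their own build-up area (assumed x <= 72).
            if event["type"] == "pass" and event["x"] <= build_up_limit:
                opponent_passes += 1
        elif event["type"] in DEFENSIVE_TYPES:
            # The same zone seen from the pressing team's frame: x >= 120 - 72.
            if event["x"] >= 120.0 - build_up_limit:
                defensive_actions += 1
    value: Optional[float] = (
        opponent_passes / defensive_actions if defensive_actions else None
    )
    return {
        "team": pressing_team,
        "opponent_passes": opponent_passes,
        "defensive_actions": defensive_actions,
        "ppda": value,
    }
```

Because the function depends only on its arguments, a test can pass a handful of hand-written events and assert on the returned dictionary without starting the API or touching a database.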
Backend — Python · FastAPI · Pandas · NumPy · scikit-learn (HistGBM,
KMeans, PCA, calibration) · NetworkX · SQLAlchemy · Pydantic
Data — StatsBomb Open Data (events + 360 freeze frames) · synthetic
possession simulator
Visualization — mplsoccer · Matplotlib (server-rendered PNG)
AI — modular LLM integration (Groq gpt-oss-120b / Anthropic Claude /
OpenAI) with prompt engineering and a deterministic offline template fallback
Frontend — React · Vite · React Router · hand-built SVG charts (radar,
calibration, xT timeline, role galaxy)
Database — PostgreSQL (SQLite fallback) · Alembic-ready
Infra — Docker · docker-compose · GitHub Actions CI
PressIQ/
├── backend/
│ ├── app/
│ │ ├── core/ # pitch geometry, tactical constants, logging
│ │ ├── data/ # StatsBomb (+360) adapter, synthetic generator,
│ │ │ # repository, store/ cache
│ │ ├── db/ # SQLAlchemy models, session
│ │ ├── engine/ # tactical engine: formation, pressing, build-up,
│ │ │ # zones, networks, players, momentum, DNA,
│ │ │ # similarity, xT, possession value, packing, roles
│ │ ├── spatial/ # pitch control, 360-driven spatial models
│ │ ├── models/ # trained xG model — features, training, inference,
│ │ │ # artifacts/
│ │ ├── viz/ # mplsoccer pitch plots, heatmaps, xT plots,
│ │ │ # pitch-control snapshots
│ │ ├── ai/ # commentary providers + template engine
│ │ ├── api/ # routes + cached service layer (deps.py)
│ │ ├── config.py # strongly-typed settings
│ │ └── main.py # FastAPI app
│ ├── tests/ # pytest suite
│ └── requirements.txt
├── frontend/
│ └── src/
│ ├── components/ # Layout, RadarChart, MomentumChart, ZoneGrid,
│ │ # PitchImage, XTSurface, XTTimeline,
│ │ # CalibrationChart, ui primitives
│ ├── pages/ # Dashboard, MatchAnalysis (10 tabs),
│ │ # TeamProfile, Comparison, XGModel, PlayerRoles
│ └── api/ # API client
├── docs/ # ARCHITECTURE.md, DEPLOYMENT.md
├── docker-compose.yml
└── README.md
Requirements: Python 3.11+ and Node.js 18+.

    cd backend
    python -m pip install -r requirements.txt
    uvicorn app.main:app --reload

API on http://localhost:8000, with interactive docs at /docs. Copy .env.example to .env to run on real StatsBomb data (the first launch downloads and caches the matches; later launches are instant). Without a .env file PressIQ uses its fully offline synthetic season.

    cd frontend
    npm install
    npm run dev

Open http://localhost:5173. The Vite dev server proxies /api to the backend.

    cd backend
    python -m app.models.train_xg

Fetches ~80 matches of raw StatsBomb shots, engineers features, trains a HistGradientBoostingClassifier and writes the model artefact to app/models/artifacts/. The xG-model page then displays metrics, calibration and feature importance.

    docker compose up --build

Brings up PostgreSQL, seeds it and starts the API (:8000) and frontend (:5173).
Copy .env.example to .env and adjust as needed. Key settings:
| Variable | Default | Description |
|---|---|---|
| DATA_SOURCE | generated | statsbomb (real data), generated (synthetic), database. |
| STATSBOMB_COMPETITION_ID | 2 | StatsBomb competition (2 = Premier League). |
| STATSBOMB_SEASON_ID | 27 | StatsBomb season (27 = 2015/16). |
| STATSBOMB_MAX_MATCHES | 20 | Number of matches to load. |
| STATSBOMB_USE_360 | false | Fetch the 360 freeze-frame dataset. Requires a competition that ships 360 data. |
| DATABASE_URL | sqlite:///./pressiq.db | Used when DATA_SOURCE=database. |
| AI_PROVIDER | template | groq, anthropic, openai, template. |
| GROQ_API_KEY | (blank) | Enables gpt-oss-120b commentary via Groq. |
| ANTHROPIC_API_KEY / OPENAI_API_KEY | (blank) | Alternative LLM providers. |
DATA_SOURCE=statsbomb needs internet on the first run; matches are then
cached locally for instant reloads.
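The way these variables could map onto typed settings is sketched below. This is an assumption-laden illustration, not the contents of config.py: the `Settings` dataclass, the `load_settings` helper and the accepted boolean spellings are hypothetical, and only the variable names and defaults come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_SOURCES = {"statsbomb", "generated", "database"}


@dataclass(frozen=True)
class Settings:
    """Hypothetical typed view of the variables listed in the table."""
    data_source: str = "generated"
    statsbomb_competition_id: int = 2
    statsbomb_season_id: int = 27
    statsbomb_max_matches: int = 20
    statsbomb_use_360: bool = False
    database_url: str = "sqlite:///./pressiq.db"
    ai_provider: str = "template"


def _as_bool(raw: str) -> bool:
    # Assumed spellings for "true"; the real parser may accept others.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from an environment mapping, falling back to defaults."""
    env = os.environ if env is None else env
    d = Settings()
    settings = Settings(
        data_source=env.get("DATA_SOURCE", d.data_source),
        statsbomb_competition_id=int(
            env.get("STATSBOMB_COMPETITION_ID", d.statsbomb_competition_id)),
        statsbomb_season_id=int(
            env.get("STATSBOMB_SEASON_ID", d.statsbomb_season_id)),
        statsbomb_max_matches=int(
            env.get("STATSBOMB_MAX_MATCHES", d.statsbomb_max_matches)),
        statsbomb_use_360=_as_bool(
            env.get("STATSBOMB_USE_360", str(d.statsbomb_use_360))),
        database_url=env.get("DATABASE_URL", d.database_url),
        ai_provider=env.get("AI_PROVIDER", d.ai_provider),
    )
    if settings.data_source not in VALID_SOURCES:
        raise ValueError(f"Unknown DATA_SOURCE: {settings.data_source!r}")
    return settings
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the loader easy to exercise in tests.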
The spatial-intelligence stack (pitch control, packing, the Spatial tab, shooting-lane pressure on the xG model) consumes StatsBomb 360 freeze frames — the locations of every visible player at the moment of every event. Only some competitions in the open dataset ship 360 frames:
| Competition | COMPETITION_ID | SEASON_ID |
|---|---|---|
| UEFA Euro 2020 (men) | 55 | 43 |
| UEFA Euro 2024 (men) | 55 | 282 |
| FIFA Men's World Cup 2022 | 43 | 106 |
Enable 360 in backend/.env:

    DATA_SOURCE=statsbomb
    STATSBOMB_COMPETITION_ID=55
    STATSBOMB_SEASON_ID=43
    STATSBOMB_USE_360=true

On the next launch PressIQ will fetch the 360 stream alongside each match, normalise frames into the actor's attacking frame, and cache them locally. The Spatial tab inside Match Analysis lights up: packing per team and per player, a pitch-control snapshot at the match's highest-xT pass, and per-team control summaries.
When 360 data is unavailable for a match, every spatial feature degrades gracefully: the Spatial tab shows an "enable 360" message and the rest of the platform is unaffected.
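To make the packing figure concrete, the sketch below counts bypassed opponents for a single completed pass from one freeze frame. It is a simplified illustration under stated assumptions, not the project's implementation: positions are taken to be already normalised so the passing team attacks towards x = 120, the opponents' defensive third is assumed to start at x = 80, and the `packing` function and its tuple layout are hypothetical.

```python
from typing import Dict, List, Tuple

# Assumed boundary of the opponents' own defensive third on a 120-unit pitch,
# in a frame where the passing team attacks towards x = 120.
DEFENSIVE_THIRD_X = 80.0

# Hypothetical freeze-frame entry: (x, y, is_goalkeeper) for one opponent.
Opponent = Tuple[float, float, bool]


def packing(start_x: float, end_x: float,
            opponents: List[Opponent]) -> Dict[str, int]:
    """Count outfield opponents a completed pass moves the ball past."""
    if end_x <= start_x:
        # Backward or square passes bypass nobody along the attacking direction.
        return {"bypassed": 0, "defenders_bypassed": 0}
    # An opponent is bypassed if they stood goal-side of the passer
    # but end up behind the receiver.
    bypassed = [
        x for x, _y, is_keeper in opponents
        if not is_keeper and start_x < x < end_x
    ]
    defenders = [x for x in bypassed if x >= DEFENSIVE_THIRD_X]
    return {"bypassed": len(bypassed), "defenders_bypassed": len(defenders)}
```

When a match has no freeze frame for a pass there is no opponent list to count, which is why the metric is only reported for 360-enabled data.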
| Model | Method | Where it lives | What it gives you |
|---|---|---|---|
| xG (Expected Goals) | HistGradientBoostingClassifier on shot geometry + context + 360 freeze-frame pressure | app/models/ | Trained model card with calibration curve, ROC-AUC, log-loss vs baseline, permutation feature importance, agreement with StatsBomb's xG. |
| Player roles | KMeans clustering on standardised per-match averages, with PCA-2D projection | app/engine/roles.py | The Role Galaxy scatter and a cluster centroid table: discovered behavioural roles instead of nominal positions. |
| Expected Threat (xT) surface | Markov value-iteration on the move/shot transition matrix | app/engine/xt.py | A 192-zone possession-value grid; per-action xT delta; team and player xT generated. |
The xG artefact is built by python -m app.models.train_xg and lives in
backend/app/models/artifacts/. Without an artefact PressIQ falls back to
an analytic distance/angle logistic so the application always has a usable
xG value.
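A distance/angle logistic of the kind described for the fallback might look like the sketch below. The coefficients are illustrative placeholders, not PressIQ's fitted values, and the function names are hypothetical. The geometry uses StatsBomb pitch units, where the goal is centred at (120, 40) and is 8 units wide.

```python
import math

# StatsBomb pitch units: goal centre at (120, 40), posts 8 units apart.
GOAL_X, GOAL_Y, GOAL_WIDTH = 120.0, 40.0, 8.0

# Illustrative coefficients only; they are NOT PressIQ's fitted values.
INTERCEPT, DISTANCE_COEF, ANGLE_COEF = -1.0, -0.10, 1.2


def shot_geometry(x: float, y: float) -> tuple:
    """Return (distance to goal centre, angle subtended by the goal mouth)."""
    dx = GOAL_X - x
    dy = y - GOAL_Y
    distance = math.hypot(dx, dy)
    # Angle between the lines to the two posts; atan2 keeps it in [0, pi]
    # and avoids a division by zero on the goal line.
    angle = math.atan2(GOAL_WIDTH * dx,
                       dx * dx + dy * dy - (GOAL_WIDTH / 2.0) ** 2)
    return distance, angle


def analytic_xg(x: float, y: float) -> float:
    """Logistic xG estimate from shot distance and goal-mouth angle."""
    distance, angle = shot_geometry(x, y)
    logit = INTERCEPT + DISTANCE_COEF * distance + ANGLE_COEF * angle
    return 1.0 / (1.0 + math.exp(-logit))
```

With these placeholder coefficients a central shot from 12 units out scores higher than one from 30 units or from a wide angle, which is the qualitative behaviour a fallback needs.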
| Method & path | Description |
|---|---|
| GET /api/health | Service health and configuration. |
| GET /api/matches | All matches. |
| GET /api/matches/{id}/analysis | Full tactical analysis. |
| GET /api/matches/{id}/players | Player analytics. |
| GET /api/matches/{id}/passing-network | Passing networks. |
| GET /api/matches/{id}/momentum | Momentum timeline. |
| GET /api/matches/{id}/commentary | AI tactical match report. |
| GET /api/matches/{id}/possession-value | Possession Value (PV) breakdown. |
| GET /api/matches/{id}/xt | Match Expected Threat analysis. |
| GET /api/matches/{id}/packing | Opponents bypassed per pass (requires 360). |
| GET /api/matches/{id}/pitch-control | Per-team pitch-control summary (requires 360). |
| GET /api/matches/{id}/pitch-control/snapshot | PNG of pitch control at the match's highest-xT pass. |
| GET /api/matches/{id}/visuals/{kind} | Server-rendered PNG visualization. |
| GET /api/teams | Team directory. |
| GET /api/teams/{name} | Team season profile + DNA. |
| GET /api/teams/{name}/commentary | AI team identity summary. |
| GET /api/teams/compare | Compare two teams. |
| GET /api/teams/similarity-matrix | Pairwise tactical similarity. |
| GET /api/xt/surface | Trained Expected Threat surface (JSON). |
| GET /api/xt/surface/plot | Trained xT surface PNG. |
| GET /api/xg/model | Trained xG model card. |
| GET /api/players/roles | Discovered behavioural roles. |
Visual kinds: formation, shotmap, progressive, touch-heatmap,
pressure-map, defensive-map, passing-network, xt-flow.
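A small client for these routes could be sketched as follows, using only the standard library. The helper names (`visual_url`, `team_profile_url`, `fetch_json`) are hypothetical and the base URL assumes the local development server from the quick start; only the paths and visual kinds come from the table and list above.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # assumed local dev server

VISUAL_KINDS = {
    "formation", "shotmap", "progressive", "touch-heatmap",
    "pressure-map", "defensive-map", "passing-network", "xt-flow",
}


def visual_url(match_id: int, kind: str, base_url: str = BASE_URL) -> str:
    """URL of a server-rendered PNG for one match."""
    if kind not in VISUAL_KINDS:
        raise ValueError(f"Unknown visual kind: {kind!r}")
    return f"{base_url}/api/matches/{match_id}/visuals/{kind}"


def team_profile_url(name: str, base_url: str = BASE_URL) -> str:
    """URL of a team's season profile; the name is percent-encoded."""
    return f"{base_url}/api/teams/{quote(name, safe='')}"


def fetch_json(url: str, timeout: float = 10.0):
    """GET a JSON endpoint and decode the body (needs a running API)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With the backend running, `fetch_json(f"{BASE_URL}/api/health")` would return the health payload; the exact response fields are not documented here, so inspect them at /docs.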
- Formation detection computes each outfield player's average on-ball position, clusters the longitudinal coordinates into 3–4 vertical bands with K-means, and matches the band signature against a known formation library.
- PPDA divides opponent passes in their build-up area by the pressing team's defensive actions in that same zone — the standard pressing metric.
- Progressive actions use distance-to-goal thresholds (30 / 15 / 10 units by pitch zone) to flag line-breaking passes and carries.
- Tactical DNA derives six normalised axes (Possession, Directness, Pressing, Width, Progression, Transition) directly from the event data and classifies the team into an archetype.
- Expected Threat (xT) trains a 16×12 grid surface by value iteration over the dataset's move and shot tendencies, then values each completed pass and carry by the lift it adds.
- Possession Value sums the xT progression, the shot xG (finishing) and a defending term — every team's defending value is the opponent's threat distributed across its ball-winning actions, weighted by zone danger.
- Pitch Control treats each visible player as a Gaussian source of influence; a pitch cell's actor-team control share is the fraction of total influence held by the actor's team.
- Packing counts the outfield opponents between passer and receiver along the attacking direction in each pass's freeze frame, with a separate defenders-bypassed count for opponents sitting in their own defensive third.
- Role Galaxy standardises eight per-player features, runs KMeans on them and projects the result onto PCA-2D for the role-galaxy scatter.
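The value-iteration step behind the xT bullet above can be sketched on a toy grid. This follows the commonly published xT recurrence (zone value = shot probability × goal probability + move probability × expected value of the destination zone); the function names and the three-zone example are illustrative assumptions, and the real surface uses a 16×12 grid estimated from event data.

```python
from typing import List


def solve_xt(shot_prob: List[float], goal_prob: List[float],
             move_prob: List[float], transition: List[List[float]],
             max_iter: int = 500, tol: float = 1e-9) -> List[float]:
    """Iterate xT(z) = s_z * g_z + m_z * sum_k T[z][k] * xT(k) to a fixed point."""
    n = len(shot_prob)
    xt = [0.0] * n
    for _ in range(max_iter):
        updated = [
            shot_prob[z] * goal_prob[z]
            + move_prob[z] * sum(transition[z][k] * xt[k] for k in range(n))
            for z in range(n)
        ]
        change = max(abs(a - b) for a, b in zip(updated, xt))
        xt = updated
        if change < tol:
            break
    return xt


def action_value(xt: List[float], start_zone: int, end_zone: int) -> float:
    """Threat added by a completed pass or carry between two zones."""
    return xt[end_zone] - xt[start_zone]


# Toy three-zone pitch (defensive, midfield, attacking); made-up probabilities.
toy_xt = solve_xt(
    shot_prob=[0.0, 0.05, 0.3],
    goal_prob=[0.0, 0.03, 0.15],
    move_prob=[0.9, 0.85, 0.6],
    transition=[[0.6, 0.4, 0.0], [0.2, 0.5, 0.3], [0.0, 0.4, 0.6]],
)
```

In the toy surface the attacking zone ends up with the highest value, so a pass from the defensive zone into it has a positive `action_value`, while a backward pass has a negative one.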
See docs/ARCHITECTURE.md for the full breakdown.
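As a companion to the pitch-control description in the list above, here is a minimal sketch of a Gaussian-influence control share. The isotropic Gaussian, the `sigma` value of 8 pitch units and the 12×8 demo grid are assumptions made for illustration; the project's own parameters may differ.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def control_share(cell: Point, teammates: List[Point], opponents: List[Point],
                  sigma: float = 8.0) -> float:
    """Fraction of total Gaussian influence at `cell` held by the actor's team."""
    def influence(players: List[Point]) -> float:
        return sum(
            math.exp(-((cell[0] - px) ** 2 + (cell[1] - py) ** 2)
                     / (2.0 * sigma ** 2))
            for px, py in players
        )

    own = influence(teammates)
    total = own + influence(opponents)
    # With no visible players (or negligible influence) call the cell contested.
    return 0.5 if total == 0.0 else own / total


def control_surface(teammates: List[Point], opponents: List[Point],
                    nx: int = 12, ny: int = 8,
                    sigma: float = 8.0) -> List[List[float]]:
    """Control share at the centre of each cell of an nx-by-ny grid (120x80 pitch)."""
    cell_w, cell_h = 120.0 / nx, 80.0 / ny
    return [
        [
            control_share(((i + 0.5) * cell_w, (j + 0.5) * cell_h),
                          teammates, opponents, sigma)
            for i in range(nx)
        ]
        for j in range(ny)
    ]
```

A cell midway between one teammate and one opponent gets a share of 0.5, and the share rises towards 1 as the cell moves closer to the teammate.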
Fully modular:
- groq: runs openai/gpt-oss-120b on Groq's fast inference (the default provider; free API key at console.groq.com).
- anthropic: Claude with a tuned analyst system prompt and prompt caching.
- openai: same pipeline via OpenAI.
- template: deterministic engine producing genuine professional prose, with zero configuration.
If no API key is set or an LLM call fails, the service degrades to the template engine — PressIQ always produces meaningful commentary.
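That fallback behaviour can be sketched as a small dispatcher. Everything here is hypothetical scaffolding (the `Provider` type, `generate_commentary`, the template wording and the `team` and `ppda` keys); it shows only the control flow of trying the configured provider and dropping back to the template engine.

```python
from typing import Callable, Dict, Mapping, Tuple

# A provider takes the analysis dict and returns commentary text.
Provider = Callable[[Dict], str]


def template_commentary(analysis: Dict) -> str:
    """Deterministic offline commentary (toy wording for this sketch)."""
    return f"{analysis['team']} pressed with a PPDA of {analysis['ppda']:.1f}."


def generate_commentary(analysis: Dict, provider_name: str,
                        providers: Mapping[str, Provider],
                        api_keys: Mapping[str, str]) -> Tuple[str, str]:
    """Return (provider actually used, commentary text)."""
    provider = providers.get(provider_name)
    # Only call an LLM provider when it exists and has a non-empty API key.
    if provider is not None and api_keys.get(provider_name):
        try:
            return provider_name, provider(analysis)
        except Exception:
            # Any provider failure falls through to the template engine.
            pass
    return "template", template_commentary(analysis)
```

Returning the name of the provider that actually produced the text lets a caller show whether the commentary came from an LLM or from the template.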

    cd backend
    python -m pip install -r requirements-dev.txt
    pytest

The suite covers pitch geometry, every engine module, the spatial models, the xG-model fallback, the 360 freeze-frame normaliser and the HTTP API.
See docs/DEPLOYMENT.md for container, database and production guidance.
- Metrica tracking integration: the open Metrica Sports sample dataset (3 matches of 25-fps tracking) would unlock true tracking-derived metrics (player velocity, run-detection, dynamic pitch control). A MetricaRepository and a frontend tracking-animation viewer are sketched in the architecture document.
- Per-shot PressIQ xG application: extend the StatsBomb adapter to store the freeze-frame-derived shot features so the trained xG model can re-score every match shot alongside StatsBomb's own xG.
- Set-piece routine clustering: corner and free-kick patterns clustered into routines, then valued.
Real match data is provided by StatsBomb Open Data — free, detailed football event data (and 360 freeze frames for select competitions) published by StatsBomb. PressIQ is an independent project and is not affiliated with StatsBomb.
The xT model follows the formulation popularised by Karun Singh; the possession-value approach follows VAEP (KU Leuven); the packing metric was popularised by Impect / Stefan Reinartz. Pitch control uses the soft-Voronoi formulation.
MIT — see LICENSE.