AtaSakik/PressIQ

PressIQ — Football Spatial Intelligence Platform

An end-to-end football analytics platform: tactical engine, machine-learning Expected Goals (xG), Expected Threat (xT) possession value, VAEP-style action valuation, pitch control from 360 freeze frames, packing, data-driven player role discovery, AI-generated tactical commentary, and a production-grade dark dashboard.

PressIQ ingests football event data and produces the kind of intelligence elite analytics departments rely on: every action valued in expected-goal units, players clustered by what they actually do rather than where they line up, and — when 360 freeze frames are available — pitch control and packing computed from real player positions.

With the shipped .env.example it runs on real StatsBomb Open Data (Premier League 2015/16 by default) and supports a 360-enabled mode (UEFA Euro 2020/2024, FIFA World Cup 2022) that unlocks the full spatial-intelligence stack. A deterministic possession-chain simulator ships as a fully offline fallback, so the entire stack runs with zero API keys and zero database setup.


Spatial intelligence layer

| Module | What it adds |
|---|---|
| Expected Threat (xT) | Markov grid possession-value surface, trained on the dataset; values every pass and carry by the goal probability it lifts. |
| Trained xG model | Gradient-boosted Expected Goals model trained on real StatsBomb shots (geometry + context + freeze-frame pressure), with calibration curve and StatsBomb-xG agreement. |
| Possession Value (PV) | VAEP-style action valuation: every action scored in expected-goal units (progression / finishing / defending). |
| StatsBomb 360 adapter | Optional freeze-frame layer: every visible player's position at every event, normalised into the attacking frame. |
| Pitch Control | Soft-Voronoi team-control surface from 360 freeze frames, per-event local control and per-match aggregates. |
| Packing | Opponents bypassed by each completed pass, including defenders-bypassed and per-player top packers. |
| Role Galaxy | Unsupervised KMeans clustering of every outfield player into behavioural roles, projected onto a 2-D PCA scatter. |

Key capabilities

| # | Module | What it does |
|---|---|---|
| 1 | Match Analysis Engine | Orchestrates every analysis module into one structured tactical report with auto-generated key findings. |
| 2 | Formation Detection | Infers average formation (4-3-3, 4-2-3-1, 3-5-2, 5-4-1, …) via positional K-means clustering with a confidence score. |
| 3 | Pressing Analysis | PPDA, block height, press classification (high press / mid block / low block), counter-pressing recoveries, defensive activity zones. |
| 4 | Build-up Analysis | Directness index, verticality, progressive passes/carries, pass-length mix, possession-chain profiling, build-up channel preference. |
| 5 | Half-space & Zonal Analysis | 18-zone (3 thirds × 6 channels) occupation grid, explicit left/right half-space metrics, wide vs central balance. |
| 6 | Passing Networks | Directed passing graph, involvement-scaled nodes, degree & betweenness centrality, key distributors. |
| 7 | Heatmaps & Event Visuals | Touch heatmaps, pressure maps, shot maps (xG-sized), progressive-action maps, formation maps, all server-rendered. |
| 8 | Player Analytics | Per-player progression, creativity, defensive output, composite influence score and inferred tactical role. |
| 9 | AI Tactical Commentary | Professional match reports and team identity profiles via a modular LLM pipeline (Groq / Anthropic / OpenAI) with an offline template fallback. |
| 10 | Tactical DNA System | Reduces a team's behaviour to a 6-axis style vector and classifies it into an archetype. |
| 11 | Match Momentum | Minute-by-minute momentum curve, cumulative xG timelines, key-moment detection. |
| 12 | Tactical Similarity | Pairwise similarity scoring between teams' DNA vectors. |
| 13 | Expected Threat (xT) | A Markov grid possession-value surface trained by value iteration; every pass and carry valued by the threat it adds. |
| 14 | Trained xG model | Gradient-boosted Expected Goals classifier with feature importance, calibration curve and a StatsBomb-xG agreement check. |
| 15 | Possession Value (PV) | VAEP-style action valuation: progression + finishing + defending, in expected-goal units, per player. |
| 16 | StatsBomb 360 integration | Freeze-frame layer powering the spatial-intelligence stack. |
| 17 | Pitch Control | Soft-Voronoi control surface, local control around the ball, aggregated per team. |
| 18 | Packing | Outfield opponents and defenders bypassed by each completed pass. |
| 19 | Player Role Galaxy | Unsupervised clustering of every outfield player into behavioural roles, with a PCA scatter. |

Architecture

┌──────────────────────────────────────────────────────────────────────┐
│                        React Frontend (Vite)                          │
│  Dashboard · Match · Threat · Value · Spatial · xG Model · Roles · DNA │
└───────────────────────────────┬──────────────────────────────────────┘
                                 │  REST / JSON + PNG
┌───────────────────────────────▼──────────────────────────────────────┐
│                          FastAPI Application                          │
│        routes · cached service layer · OpenAPI documentation           │
├───────────┬────────────┬────────────┬───────────┬──────────┬─────────┤
│ Tactical  │  Spatial   │  Machine   │ Visualiz. │   Data   │   AI    │
│  Engine   │   Layer    │  Learning  │  Engine   │ Repo.    │ Pipeline│
│           │            │            │           │          │         │
│ formation │ pitch ctrl │ xG model   │mplsoccer  │statsbomb │groq /   │
│ pressing  │ packing    │ roles      │heatmaps   │ + 360    │anthropic│
│ buildup   │ 360 data   │ KMeans/PCA │networks   │generated │ /openai │
│ zones     │            │ HistGBM    │xT plots   │database  │template │
│ xT/PV/etc.│            │            │spatial    │          │         │
└───────────┴────────────┴────────────┴───────────┴──────────┴─────────┘

The tactical engine and ML layer are pure: every function takes a Pandas DataFrame of events (and optionally a 360 freeze-frame dict or a trained model artefact) and returns a JSON-serialisable dictionary. These functions know nothing about HTTP, the database or the data source, which makes the whole stack easy to test and reuse.
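To illustrate that contract, here is a minimal sketch of a pure engine function computing PPDA (opponent passes in their build-up area divided by the pressing team's defensive actions in the same zone). It is not PressIQ's actual code: the function name `ppda`, the event keys (`team`, `type`, `x`), the set of defensive action types and the 60% build-up boundary are all assumptions, and a plain list of dicts stands in for the DataFrame so the example runs without dependencies.

```python
from typing import Any, Dict, Iterable, Mapping

PITCH_LENGTH = 120.0  # StatsBomb pitch length in pitch units

# Assumed set of event types that count as defensive actions.
DEFENSIVE_ACTIONS = {"Tackle", "Interception", "Foul Committed", "Duel"}


def ppda(events: Iterable[Mapping[str, Any]], pressing_team: str,
         zone_max_x: float = 72.0) -> Dict[str, Any]:
    """Passes allowed per defensive action, as a JSON-serialisable dict.

    Assumes every event's x is expressed in the acting team's attacking
    frame, so the opponent's build-up area is x <= zone_max_x for them
    and x >= PITCH_LENGTH - zone_max_x for the pressing team.
    """
    opponent_passes = 0
    defensive_actions = 0
    for ev in events:
        if ev["team"] != pressing_team:
            if ev["type"] == "Pass" and ev["x"] <= zone_max_x:
                opponent_passes += 1
        elif (ev["type"] in DEFENSIVE_ACTIONS
              and ev["x"] >= PITCH_LENGTH - zone_max_x):
            defensive_actions += 1
    value = opponent_passes / defensive_actions if defensive_actions else None
    return {
        "team": pressing_team,
        "opponent_passes": opponent_passes,
        "defensive_actions": defensive_actions,
        "ppda": value,
    }
```

Because the function touches no global state, a unit test only needs a handful of hand-written events.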


Tech stack

  • Backend: Python · FastAPI · Pandas · NumPy · scikit-learn (HistGBM, KMeans, PCA, calibration) · NetworkX · SQLAlchemy · Pydantic
  • Data: StatsBomb Open Data (events + 360 freeze frames) · synthetic possession simulator
  • Visualization: mplsoccer · Matplotlib (server-rendered PNG)
  • AI: modular LLM integration (Groq gpt-oss-120b / Anthropic Claude / OpenAI) with prompt engineering and a deterministic offline template fallback
  • Frontend: React · Vite · React Router · hand-built SVG charts (radar, calibration, xT timeline, role galaxy)
  • Database: PostgreSQL (SQLite fallback) · Alembic-ready
  • Infra: Docker · docker-compose · GitHub Actions CI


Project structure

PressIQ/
├── backend/
│   ├── app/
│   │   ├── core/          # pitch geometry, tactical constants, logging
│   │   ├── data/          # StatsBomb (+360) adapter, synthetic generator,
│   │   │                  # repository, store/ cache
│   │   ├── db/            # SQLAlchemy models, session
│   │   ├── engine/        # tactical engine: formation, pressing, build-up,
│   │   │                  # zones, networks, players, momentum, DNA,
│   │   │                  # similarity, xT, possession value, packing, roles
│   │   ├── spatial/       # pitch control, 360-driven spatial models
│   │   ├── models/        # trained xG model — features, training, inference,
│   │   │                  # artifacts/
│   │   ├── viz/           # mplsoccer pitch plots, heatmaps, xT plots,
│   │   │                  # pitch-control snapshots
│   │   ├── ai/            # commentary providers + template engine
│   │   ├── api/           # routes + cached service layer (deps.py)
│   │   ├── config.py      # strongly-typed settings
│   │   └── main.py        # FastAPI app
│   ├── tests/             # pytest suite
│   └── requirements.txt
├── frontend/
│   └── src/
│       ├── components/    # Layout, RadarChart, MomentumChart, ZoneGrid,
│       │                  # PitchImage, XTSurface, XTTimeline,
│       │                  # CalibrationChart, ui primitives
│       ├── pages/         # Dashboard, MatchAnalysis (10 tabs),
│       │                  # TeamProfile, Comparison, XGModel, PlayerRoles
│       └── api/           # API client
├── docs/                  # ARCHITECTURE.md, DEPLOYMENT.md
├── docker-compose.yml
└── README.md

Quick start

Requirements: Python 3.11+ and Node.js 18+.

1. Backend

cd backend
python -m pip install -r requirements.txt
uvicorn app.main:app --reload

API on http://localhost:8000 — interactive docs at /docs. Copy .env.example to .env to run on real StatsBomb data (the first launch downloads and caches the matches; later launches are instant). Without a .env file PressIQ uses its fully offline synthetic season.

2. Frontend

cd frontend
npm install
npm run dev

Open http://localhost:5173. The Vite dev server proxies /api to the backend.

3. (Optional) train the xG model

cd backend
python -m app.models.train_xg

Fetches ~80 matches of raw StatsBomb shots, engineers features, trains a HistGradientBoostingClassifier and writes the artefact to app/models/artifacts/. The xG-model page then displays metrics, calibration and feature importance.

4. Everything via Docker

docker compose up --build

Brings up PostgreSQL, seeds it and starts the API (:8000) and frontend (:5173).


Configuration

Copy .env.example to .env and adjust as needed. Key settings:

| Variable | Default | Description |
|---|---|---|
| DATA_SOURCE | generated | statsbomb (real data), generated (synthetic), database. |
| STATSBOMB_COMPETITION_ID | 2 | StatsBomb competition (2 = Premier League). |
| STATSBOMB_SEASON_ID | 27 | StatsBomb season (27 = 2015/16). |
| STATSBOMB_MAX_MATCHES | 20 | Number of matches to load. |
| STATSBOMB_USE_360 | false | Fetch the 360 freeze-frame dataset. Requires a competition that ships 360 data. |
| DATABASE_URL | sqlite:///./pressiq.db | Used when DATA_SOURCE=database. |
| AI_PROVIDER | template | groq, anthropic, openai, template. |
| GROQ_API_KEY | (blank) | Enables gpt-oss-120b commentary via Groq. |
| ANTHROPIC_API_KEY / OPENAI_API_KEY | (blank) | Alternative LLM providers. |

DATA_SOURCE=statsbomb needs internet on the first run; matches are then cached locally for instant reloads.
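As a rough illustration of how these variables could map onto typed settings, the sketch below reads them from an environment mapping with the defaults from the table. It is a stdlib stand-in, not the project's config.py: the `Settings` dataclass, `load_settings` and the accepted boolean spellings are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

VALID_DATA_SOURCES = {"statsbomb", "generated", "database"}


def _as_bool(raw: str) -> bool:
    # Assumed truthy spellings; the real parser may accept others.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    data_source: str
    statsbomb_competition_id: int
    statsbomb_season_id: int
    statsbomb_max_matches: int
    statsbomb_use_360: bool
    database_url: str
    ai_provider: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from an environment mapping, using the documented defaults."""
    env = os.environ if env is None else env
    data_source = env.get("DATA_SOURCE", "generated")
    if data_source not in VALID_DATA_SOURCES:
        raise ValueError(f"Unsupported DATA_SOURCE: {data_source!r}")
    return Settings(
        data_source=data_source,
        statsbomb_competition_id=int(env.get("STATSBOMB_COMPETITION_ID", "2")),
        statsbomb_season_id=int(env.get("STATSBOMB_SEASON_ID", "27")),
        statsbomb_max_matches=int(env.get("STATSBOMB_MAX_MATCHES", "20")),
        statsbomb_use_360=_as_bool(env.get("STATSBOMB_USE_360", "false")),
        database_url=env.get("DATABASE_URL", "sqlite:///./pressiq.db"),
        ai_provider=env.get("AI_PROVIDER", "template"),
    )
```

Passing an explicit mapping instead of relying on `os.environ` keeps the loader easy to test.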


Spatial intelligence — enabling 360 data

The spatial-intelligence stack (pitch control, packing, the Spatial tab, shooting-lane pressure on the xG model) consumes StatsBomb 360 freeze frames — the locations of every visible player at the moment of every event. Only some competitions in the open dataset ship 360 frames:

| Competition | COMPETITION_ID | SEASON_ID |
|---|---|---|
| UEFA Euro 2020 (men) | 55 | 43 |
| UEFA Euro 2024 (men) | 55 | 282 |
| FIFA Men's World Cup 2022 | 43 | 106 |

Enable 360 in backend/.env:

DATA_SOURCE=statsbomb
STATSBOMB_COMPETITION_ID=55
STATSBOMB_SEASON_ID=43
STATSBOMB_USE_360=true

On the next launch PressIQ will fetch the 360 stream alongside each match, normalise frames into the actor's attacking frame, and cache them locally. The Spatial tab inside Match Analysis lights up: packing per team and per player, a pitch-control snapshot at the match's highest-xT pass, and per-team control summaries.
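The packing figures shown in that tab can be pictured with a small sketch. It assumes a freeze frame already normalised into the passer's attacking frame and represented as a list of dicts with hypothetical keys `x`, `teammate` and `keeper`; the function name `packing` and the x >= 80 boundary for the opponents' defensive third are likewise assumptions, not PressIQ's actual implementation.

```python
from typing import Any, Dict, List, Mapping


def packing(frame: List[Mapping[str, Any]], pass_start_x: float,
            pass_end_x: float, defensive_third_x: float = 80.0) -> Dict[str, int]:
    """Count outfield opponents bypassed by a completed forward pass.

    An opponent is bypassed when their x coordinate lies strictly between
    the passer and the receiver along the attacking direction.
    """
    if pass_end_x <= pass_start_x:
        # Backward and square passes bypass nobody in this simplified model.
        return {"bypassed": 0, "defenders_bypassed": 0}
    bypassed = [
        p for p in frame
        if not p["teammate"] and not p["keeper"]
        and pass_start_x < p["x"] < pass_end_x
    ]
    # Opponents sitting in their own defensive third (assumed to be x >= 80).
    defenders = [p for p in bypassed if p["x"] >= defensive_third_x]
    return {"bypassed": len(bypassed), "defenders_bypassed": len(defenders)}
```

Per-team and per-player totals would then be simple sums of these per-pass counts.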

When 360 data is unavailable for a match, every spatial feature degrades gracefully: the Spatial tab shows a prompt to enable 360, and the rest of the platform is unaffected.


Machine-learning models

| Model | Method | Where it lives | What it gives you |
|---|---|---|---|
| xG (Expected Goals) | HistGradientBoostingClassifier on shot geometry + context + 360 freeze-frame pressure | app/models/ | Trained model card with calibration curve, ROC-AUC, log-loss vs baseline, permutation feature importance, agreement with StatsBomb's xG. |
| Player roles | KMeans clustering on standardised per-match averages, with PCA-2D projection | app/engine/roles.py | The Role Galaxy scatter and a cluster centroid table: discovered behavioural roles instead of nominal positions. |
| Expected Threat (xT) surface | Markov value-iteration on the move/shot transition matrix | app/engine/xt.py | A 192-zone possession-value grid; per-action xT delta; team and player xT generated. |

The xG artefact is built by python -m app.models.train_xg and lives in backend/app/models/artifacts/. Without an artefact PressIQ falls back to an analytic distance/angle logistic so the application always has a usable xG value.
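A distance/angle logistic of that kind could look like the sketch below. The coefficients are illustrative placeholders, not fitted values and not the ones PressIQ uses; the function name `analytic_xg` is also an assumption. The geometry uses StatsBomb coordinates (120 × 80 pitch, goal posts at y = 36 and y = 44 on the x = 120 line).

```python
import math

GOAL_X = 120.0
POST_LOW_Y, POST_HIGH_Y = 36.0, 44.0
GOAL_CENTRE_Y = 40.0

# Placeholder coefficients chosen for illustration only (not fitted).
INTERCEPT = -1.0
DISTANCE_COEF = -0.10
ANGLE_COEF = 1.5


def analytic_xg(x: float, y: float) -> float:
    """Logistic xG estimate from shot distance and goal-mouth opening angle."""
    distance = math.hypot(GOAL_X - x, GOAL_CENTRE_Y - y)
    # Angle subtended by the two posts as seen from the shot location.
    angle = abs(
        math.atan2(POST_HIGH_Y - y, GOAL_X - x)
        - math.atan2(POST_LOW_Y - y, GOAL_X - x)
    )
    logit = INTERCEPT + DISTANCE_COEF * distance + ANGLE_COEF * angle
    return 1.0 / (1.0 + math.exp(-logit))
```

With any sensible coefficients the estimate rises as the shot gets closer and more central, which is the behaviour a fallback needs.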


API reference

| Method & path | Description |
|---|---|
| GET /api/health | Service health and configuration. |
| GET /api/matches | All matches. |
| GET /api/matches/{id}/analysis | Full tactical analysis. |
| GET /api/matches/{id}/players | Player analytics. |
| GET /api/matches/{id}/passing-network | Passing networks. |
| GET /api/matches/{id}/momentum | Momentum timeline. |
| GET /api/matches/{id}/commentary | AI tactical match report. |
| GET /api/matches/{id}/possession-value | Possession Value (PV) breakdown. |
| GET /api/matches/{id}/xt | Match Expected Threat analysis. |
| GET /api/matches/{id}/packing | Opponents bypassed per pass (requires 360). |
| GET /api/matches/{id}/pitch-control | Per-team pitch-control summary (requires 360). |
| GET /api/matches/{id}/pitch-control/snapshot | PNG of pitch control at the match's highest-xT pass. |
| GET /api/matches/{id}/visuals/{kind} | Server-rendered PNG visualization. |
| GET /api/teams | Team directory. |
| GET /api/teams/{name} | Team season profile + DNA. |
| GET /api/teams/{name}/commentary | AI team identity summary. |
| GET /api/teams/compare | Compare two teams. |
| GET /api/teams/similarity-matrix | Pairwise tactical similarity. |
| GET /api/xt/surface | Trained Expected Threat surface (JSON). |
| GET /api/xt/surface/plot | Trained xT surface PNG. |
| GET /api/xg/model | Trained xG model card. |
| GET /api/players/roles | Discovered behavioural roles. |

Visual kinds: formation, shotmap, progressive, touch-heatmap, pressure-map, defensive-map, passing-network, xt-flow.
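For scripted access, a small stdlib client along these lines may be enough. The helper names `visual_url` and `fetch_json` are hypothetical, the base URL assumes the local dev server from the quick start, and the type of a match id is not specified here, so it is simply stringified.

```python
import json
from typing import Any
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # local dev server from the quick start

VISUAL_KINDS = {
    "formation", "shotmap", "progressive", "touch-heatmap",
    "pressure-map", "defensive-map", "passing-network", "xt-flow",
}


def visual_url(match_id: Any, kind: str, base_url: str = BASE_URL) -> str:
    """Build the URL of a server-rendered PNG, rejecting unknown kinds early."""
    if kind not in VISUAL_KINDS:
        raise ValueError(f"Unknown visual kind: {kind!r}")
    return f"{base_url}/api/matches/{quote(str(match_id))}/visuals/{kind}"


def fetch_json(path: str, base_url: str = BASE_URL) -> Any:
    """GET a JSON endpoint such as /api/health (needs the backend running)."""
    with urlopen(f"{base_url}{path}", timeout=10) as response:
        return json.load(response)
```

For example, `fetch_json("/api/matches")` would return the match list once the backend is up.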


How the engine works

  • Formation detection computes each outfield player's average on-ball position, clusters the longitudinal coordinates into 3–4 vertical bands with K-means, and matches the band signature against a known formation library.
  • PPDA divides opponent passes in their build-up area by the pressing team's defensive actions in that same zone — the standard pressing metric.
  • Progressive actions use distance-to-goal thresholds (30 / 15 / 10 units by pitch zone) to flag line-breaking passes and carries.
  • Tactical DNA derives six normalised axes (Possession, Directness, Pressing, Width, Progression, Transition) directly from the event data and classifies the team into an archetype.
  • Expected Threat (xT) trains a 16×12 grid surface by value iteration over the dataset's move and shot tendencies, then values each completed pass and carry by the lift it adds.
  • Possession Value sums the xT progression, the shot xG (finishing) and a defending term — every team's defending value is the opponent's threat distributed across its ball-winning actions, weighted by zone danger.
  • Pitch Control treats each visible player as a Gaussian source of influence; a pitch cell's actor-team control share is the fraction of total influence held by the actor's team.
  • Packing counts the outfield opponents between passer and receiver along the attacking direction in each pass's freeze frame, with a separate defenders-bypassed count for opponents sitting in their own defensive third.
  • Role Galaxy standardises eight per-player features, runs KMeans on them and projects the result onto PCA-2D for the role-galaxy scatter.
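
The xT step above can be sketched as a fixed-point iteration of xT(z) = s(z)·g(z) + m(z)·Σ T(z→z′)·xT(z′), where s, g and m are the shot, goal-given-shot and move probabilities of zone z. This is a minimal dependency-free sketch under those assumptions; `zone_of`, `solve_xt` and the row-major zone indexing are hypothetical names and choices, not PressIQ's code.

```python
from typing import List, Sequence

GRID_COLS, GRID_ROWS = 16, 12          # 16 x 12 = 192 zones
PITCH_LENGTH, PITCH_WIDTH = 120.0, 80.0


def zone_of(x: float, y: float) -> int:
    """Map a pitch location to a row-major zone index (assumed layout)."""
    col = min(int(x / PITCH_LENGTH * GRID_COLS), GRID_COLS - 1)
    row = min(int(y / PITCH_WIDTH * GRID_ROWS), GRID_ROWS - 1)
    return row * GRID_COLS + col


def solve_xt(shot_prob: Sequence[float], goal_prob: Sequence[float],
             move_prob: Sequence[float],
             transition: Sequence[Sequence[float]],
             iterations: int = 200, tol: float = 1e-12) -> List[float]:
    """Iterate the xT equation until the surface stops changing."""
    n = len(shot_prob)
    xt = [0.0] * n
    for _ in range(iterations):
        updated = [
            shot_prob[z] * goal_prob[z]
            + move_prob[z] * sum(transition[z][k] * xt[k] for k in range(n))
            for z in range(n)
        ]
        change = max(abs(a - b) for a, b in zip(updated, xt))
        xt = updated
        if change < tol:
            break
    return xt


def action_value(xt: Sequence[float], start_zone: int, end_zone: int) -> float:
    """Threat added by a completed pass or carry between two zones."""
    return xt[end_zone] - xt[start_zone]
```

In practice the four inputs would be estimated by counting shots, goals and moves per zone in the loaded matches.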

See docs/ARCHITECTURE.md for the full breakdown.
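
As a companion to the pitch-control description above, the following sketch computes the actor team's control share at a single pitch cell from Gaussian player influence. The isotropic kernel, the `sigma` value, the function names and the `x` / `y` / `teammate` keys are assumptions made for illustration.

```python
import math
from typing import Any, List, Mapping


def influence(px: float, py: float, cx: float, cy: float,
              sigma: float = 8.0) -> float:
    """Isotropic Gaussian influence of a player at (px, py) on cell (cx, cy)."""
    dist_sq = (px - cx) ** 2 + (py - cy) ** 2
    return math.exp(-dist_sq / (2.0 * sigma ** 2))


def control_share(players: List[Mapping[str, Any]], cx: float, cy: float,
                  sigma: float = 8.0) -> float:
    """Fraction of total influence at a cell held by the actor's team."""
    team = 0.0
    total = 0.0
    for p in players:
        value = influence(p["x"], p["y"], cx, cy, sigma)
        total += value
        if p["teammate"]:
            team += value
    # With no measurable influence at all, treat the cell as contested.
    return team / total if total > 0.0 else 0.5
```

Evaluating `control_share` over every cell of a grid yields the control surface; averaging it near the ball gives a local-control figure.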


AI tactical commentary

Fully modular:

  • groq — runs openai/gpt-oss-120b on Groq's fast inference (the default provider; free API key at console.groq.com).
  • anthropic — Claude with a tuned analyst system prompt and prompt caching.
  • openai — same pipeline via OpenAI.
  • template — deterministic engine producing genuine professional prose, with zero configuration.

If no API key is set or an LLM call fails, the service degrades to the template engine — PressIQ always produces meaningful commentary.
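
That fallback behaviour can be sketched as follows. The names `generate_commentary` and `template_commentary`, the report keys and the template wording are hypothetical; a provider is modelled as any callable that turns a report dict into text.

```python
from typing import Any, Callable, Mapping, Optional

Provider = Callable[[Mapping[str, Any]], str]


def template_commentary(report: Mapping[str, Any]) -> str:
    """Deterministic offline commentary built from the report alone."""
    return f"{report['team']} pressed with a PPDA of {report['ppda']:.1f}."


def generate_commentary(report: Mapping[str, Any],
                        provider: Optional[Provider] = None,
                        api_key: Optional[str] = None) -> str:
    """Use the LLM provider when possible, otherwise fall back to the template."""
    if provider is not None and api_key:
        try:
            return provider(report)
        except Exception:
            # Any provider failure (network, quota, bad response) falls through.
            pass
    return template_commentary(report)
```

Catching broadly is deliberate in this sketch: whatever goes wrong upstream, the caller still receives commentary.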


Testing

cd backend
python -m pip install -r requirements-dev.txt
pytest

The suite covers pitch geometry, every engine module, the spatial models, the xG-model fallback, the 360 freeze-frame normaliser and the HTTP API.


Deployment

See docs/DEPLOYMENT.md for container, database and production guidance.


Future work

  • Metrica tracking integration: the open Metrica Sports sample dataset (3 matches of 25-fps tracking) would unlock true tracking-derived metrics (player velocity, run detection, dynamic pitch control). A MetricaRepository and a frontend tracking-animation viewer are sketched in the architecture document.
  • Per-shot PressIQ xG application — extend the StatsBomb adapter to store the freeze-frame-derived shot features so the trained xG model can re-score every match shot alongside StatsBomb's own xG.
  • Set-piece routine clustering — corner and free-kick patterns clustered into routines, then valued.

Acknowledgements

Real match data is provided by StatsBomb Open Data — free, detailed football event data (and 360 freeze frames for select competitions) published by StatsBomb. PressIQ is an independent project and is not affiliated with StatsBomb.

The xT model follows the formulation popularised by Karun Singh; the possession-value approach follows VAEP (KU Leuven); the packing metric was popularised by Impect / Stefan Reinartz. Pitch control uses the soft-Voronoi formulation.


License

MIT — see LICENSE.
