sharifhsn/nbaquant

nbaquant

Quantitative NBA analysis built on play-by-play data, box scores, and game logs sourced from the official NBA API. This project combines data engineering pipelines with statistical modeling to surface insights that go beyond standard box score metrics.

Data Pipeline

Each analysis ingests data through a shared ETL pipeline:

  1. Extract - Game logs, play-by-play events, and box scores are pulled through the nba_api package across multiple seasons, covering regular season and playoff games. Multiple box score variants are supported (player tracking, defensive, traditional, advanced, four factors, hustle, matchups, miscellaneous, scoring, usage).

  2. Transform - Raw data is cleaned, filtered by season recency, converted to memory-efficient categorical types, and normalized (e.g., ISO 8601 duration strings parsed to integer seconds). Play-by-play action types are classified using a comprehensive taxonomy of NBA event types.

  3. Load - Processed DataFrames are cached locally as CSV and can be exported to a SQL database (SQL Server, PostgreSQL, MySQL via SQLAlchemy) or to Excel for Power BI reporting.
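
As an illustration of the duration normalization in the Transform step, here is a minimal sketch that converts ISO 8601 duration strings to integer seconds. It assumes the strings look like PT12M34.00S; the function name iso_duration_to_seconds and the regular expression are hypothetical and not taken from the project code.

```python
import re

# Assumed input shape: "PT12M34.00S", optionally with an hours component.
_DURATION = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?$")


def iso_duration_to_seconds(value: str) -> int:
    """Parse an ISO 8601 time duration into whole seconds (fractions truncated)."""
    match = _DURATION.match(value.strip())
    # Reject strings that do not match or that carry no component at all ("PT").
    if match is None or not any(match.groups()):
        raise ValueError(f"unsupported duration: {value!r}")
    hours, minutes, seconds = match.groups()
    total = int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds or 0)
    return int(total)


print(iso_duration_to_seconds("PT12M34.00S"))  # 754
```

Truncating rather than rounding fractional seconds is a choice made for this sketch only; the pipeline may handle them differently.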

Analyses

Defensive Plus/Minus in the Paint

A novel metric for evaluating paint protection that goes beyond the standard "defended at rim" stat. Instead of only counting shots a player was specifically identified as defending, this metric compares all opposing field goal attempts in the paint (layups, dunks, hooks) when a player is on the floor versus off. This captures a defender's broader paint presence and deterrence effect.

The analysis tracks substitution events in play-by-play data to determine on/off status per game, then computes per-36-minute shooting rates and field goal percentages for each state. Players are ranked by the delta between opponent FG% with them on court versus off. The metric is evaluated across a range of player archetypes, from elite rim protectors (Gobert, Kessler, Wembanyama) to poor defenders and point guards, which serve as control cases.
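
The on/off split described above can be sketched roughly as follows. This is a simplified illustration, not the notebook's implementation: the Event record, its field names, and the single-game input are assumptions, and the sample events are made up.

```python
from dataclasses import dataclass


@dataclass
class Event:
    t: int               # seconds elapsed since tip-off (hypothetical field)
    kind: str            # "sub_in", "sub_out", or "paint_shot"
    player: str = ""     # player involved in a substitution
    made: bool = False   # result of an opposing paint shot


def on_off_paint_split(events, player, starts_on, game_seconds=2880):
    """Accumulate opposing paint attempts, makes, and seconds for on/off states."""
    stats = {s: {"fga": 0, "fgm": 0, "sec": 0} for s in ("on", "off")}
    on, last_t = starts_on, 0
    for ev in sorted(events, key=lambda e: e.t):
        state = "on" if on else "off"
        stats[state]["sec"] += ev.t - last_t  # time spent in the current state
        last_t = ev.t
        if ev.kind == "paint_shot":
            stats[state]["fga"] += 1
            stats[state]["fgm"] += int(ev.made)
        elif ev.kind in ("sub_in", "sub_out") and ev.player == player:
            on = ev.kind == "sub_in"
    stats["on" if on else "off"]["sec"] += game_seconds - last_t
    return stats


def summarize(split):
    """Opponent paint FG% and attempts per 36 minutes for each state."""
    return {
        state: {
            "fg_pct": s["fgm"] / s["fga"] if s["fga"] else None,
            "fga_per36": s["fga"] * 36 * 60 / s["sec"] if s["sec"] else None,
        }
        for state, s in split.items()
    }


# Made-up events for one game; "Center A" is a placeholder name.
events = [
    Event(100, "paint_shot", made=True),
    Event(200, "paint_shot", made=False),
    Event(720, "sub_out", player="Center A"),
    Event(800, "paint_shot", made=True),
    Event(900, "paint_shot", made=True),
    Event(1440, "sub_in", player="Center A"),
    Event(1500, "paint_shot", made=False),
]
summary = summarize(on_off_paint_split(events, "Center A", starts_on=True))
delta = summary["on"]["fg_pct"] - summary["off"]["fg_pct"]
print(summary, round(delta, 3))
```

A negative delta means opponents shot worse in the paint with the player on the floor. Real play-by-play data would also need period boundaries and overtime handled, which this sketch ignores.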

Foul Baiting via Poisson Modeling

Investigates the statistical distribution of team fouls per quarter to understand foul-drawing behavior around the penalty threshold (5 team fouls). The analysis:

  • Constructs empirical probability mass functions for pre-penalty, post-penalty, and total quarter fouls
  • Fits a truncated Poisson distribution (MLE via bounded optimization) to pre-penalty fouls, accounting for the natural ceiling at 5
  • Fits a standard Poisson distribution to post-penalty fouls, both unconditionally and conditioned on reaching the penalty
  • Compares empirical vs. theoretical distributions to identify deviations that may indicate strategic foul-drawing
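
To make the truncated fit concrete, here is a minimal sketch of a right-truncated Poisson MLE. It assumes the truncation means conditioning on X <= upper, which is one reading of "natural ceiling at 5". It uses a plain golden-section search from the standard library in place of SciPy's bounded optimizer, and the function names and sample counts are illustrative only.

```python
import math


def truncated_poisson_logpmf(k, lam, upper):
    """log P(X = k | X <= upper) for X ~ Poisson(lam)."""
    log_norm = math.log(
        sum(math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))
            for j in range(upper + 1))
    )
    return -lam + k * math.log(lam) - math.lgamma(k + 1) - log_norm


def fit_truncated_poisson(counts, upper, lo=1e-6, hi=50.0, tol=1e-9):
    """Bounded MLE for lam via golden-section search on the negative log-likelihood."""
    def nll(lam):
        return -sum(truncated_poisson_logpmf(k, lam, upper) for k in counts)

    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if nll(c) < nll(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2


# Made-up pre-penalty foul counts per quarter, capped at 5 (not real data).
sample_counts = [2, 3, 4, 5, 5, 1, 3, 4, 2, 5]
lam_hat = fit_truncated_poisson(sample_counts, upper=5)
print(round(lam_hat, 3))
```

Because the truncated Poisson is an exponential family, the fitted rate should make the truncated mean equal the sample mean, which gives a quick sanity check on any optimizer used here.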

Clutch Performance

  • Pace vs. Net Rating regression - Tests whether teams that play faster in clutch situations perform better, using simple linear regression with per-team visualization
  • PCA on clutch stats - Applies principal component analysis to 30-team clutch performance data (shooting splits, pace, win%) to identify the latent factors that best explain variance in clutch outcomes

Tech Stack

Layer            Tools
Language         Python 3.11+
Package Manager  uv
Data             pandas, NumPy, nba_api
Statistics       SciPy (Poisson, MLE optimization), scikit-learn (PCA, linear regression)
Visualization    Matplotlib, Seaborn
Database         SQLAlchemy, pyodbc, psycopg2, mysqlclient
Code Quality     Ruff (linter + formatter), pre-commit, nbstripout

Getting Started

Prerequisites

  • Python 3.11+
  • uv

Setup

# Clone and install
git clone <repo-url> && cd nbaquant
./setup.sh        # installs uv and ruff if needed
uv sync --dev     # installs all dependencies

# Set up dev hooks
uv run nbstripout --install
uv run pre-commit install

Optional: Database Export

Create a .env file to enable database export for Power BI reporting:

DB_TYPE=sqlserver
DB_USER=sqladmin
DB_PASSWORD=your_password
DB_HOST=localhost
DB_PORT=1433
DB_NAME=dataframes
DB_DRIVER=ODBC Driver 18 for SQL Server

Without a database configured, results export to Excel files in data/.
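
For reference, here is one way the .env values above could map onto a SQLAlchemy-style connection URL. This is a sketch under assumptions: the helper build_db_url is hypothetical, and the DB_TYPE values other than sqlserver are guesses based on the drivers listed in the tech stack.

```python
from urllib.parse import quote_plus

# SQLAlchemy dialect+driver names; the DB_TYPE keys besides "sqlserver" are assumed.
DIALECTS = {
    "sqlserver": "mssql+pyodbc",
    "postgresql": "postgresql+psycopg2",
    "mysql": "mysql+mysqldb",
}


def build_db_url(env):
    """Assemble a database URL string from .env-style key/value pairs."""
    scheme = DIALECTS[env["DB_TYPE"]]
    # Escape credentials so characters like "@" or "/" do not break the URL.
    user = quote_plus(env["DB_USER"])
    password = quote_plus(env["DB_PASSWORD"])
    url = f"{scheme}://{user}:{password}@{env['DB_HOST']}:{env['DB_PORT']}/{env['DB_NAME']}"
    if env["DB_TYPE"] == "sqlserver":
        # pyodbc needs the ODBC driver name passed as a query parameter.
        url += f"?driver={quote_plus(env['DB_DRIVER'])}"
    return url


example_env = {
    "DB_TYPE": "sqlserver",
    "DB_USER": "sqladmin",
    "DB_PASSWORD": "your_password",
    "DB_HOST": "localhost",
    "DB_PORT": "1433",
    "DB_NAME": "dataframes",
    "DB_DRIVER": "ODBC Driver 18 for SQL Server",
}
print(build_db_url(example_env))
```

The resulting string is the kind of URL that sqlalchemy.create_engine accepts; whether the project builds it this way is not shown in this README.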

Project Structure

nbaquant/
  src/nbaquant/
    default.ipynb       # Defensive plus/minus in the paint + rim defense
    rim_defense.ipynb   # Extended rim defense with additional box score types
    pace.ipynb          # Clutch pace regression + PCA
    penalty.ipynb       # Foul distribution Poisson modeling
    extra.py            # NBA action type taxonomy (reference)
  data/                 # Cached CSV data (gitignored)
  pyproject.toml        # Project config and dependencies
