Personal running analytics platform that ingests Garmin/Strava .fit files, engineers features, trains explainable ML models (Random Forest + SHAP), and serves an interactive Streamlit dashboard backed by PostgreSQL.

🏃 Running Agent — Data Science & Explainable ML Project

CI · Python 3.11+ · Streamlit · PostgreSQL · scikit-learn · SHAP · Ruff

Running Agent analyzes Garmin/Strava running data to extract insights, track training load, and build predictive, explainable models. It serves both as a personal training analytics tool and a data-science portfolio project showcasing reproducible pipelines, interpretable ML, and SQL-backed dashboards.

Dashboard Screenshot


📊 Overview

Purpose

  • Understand and visualize individual running patterns
  • Track key performance indicators (distance, pace, cadence, load)
  • Cluster runs into natural categories (easy, tempo, hilly, intervals)
  • Predict pace and fatigue using Random Forest models
  • Prototype a Tamagotchi-style running agent that suggests training intensity
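The pace-prediction idea above can be sketched with scikit-learn. This is a minimal illustration on synthetic data, not the repo's actual pipeline; the feature names (distance, cadence, elevation gain) are illustrative stand-ins for whatever the real feature set contains.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for per-run features; the real project derives
# these from parsed .fit files (column names are illustrative).
rng = np.random.default_rng(42)
n = 200
X = np.column_stack([
    rng.uniform(3, 20, n),      # distance_km
    rng.uniform(160, 185, n),   # avg_cadence_spm
    rng.uniform(0, 300, n),     # elevation_gain_m
])
# Toy target: pace (min/km) loosely driven by distance and elevation
y = 5.0 + 0.05 * X[:, 0] + 0.003 * X[:, 2] + rng.normal(0, 0.2, n)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print(f"R^2 on held-out runs: {model.score(X_test, y_test):.2f}")
```

Because Random Forests expose per-tree structure, the same fitted model can later be passed to SHAP for the explainability step.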

Core Concepts

  • End-to-end ML workflow: raw Garmin .fit → cleaned dataset → ML models
  • Explainable AI (SHAP) for transparent model behaviour
  • Interactive dashboard powered by Streamlit
  • PostgreSQL + SQLAlchemy for structured, persistent data storage
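As a sketch of the "raw .fit → cleaned dataset" step, the snippet below derives a few per-run features from per-sample records. The sample fields and feature definitions here are illustrative assumptions, not the repo's actual schema.

```python
import pandas as pd

# Toy per-minute samples for one run, standing in for parsed .fit
# records (field names mirror common Garmin fields but are illustrative).
samples = pd.DataFrame({
    "elapsed_s":   range(0, 600, 60),
    "distance_m":  [0, 180, 355, 540, 700, 880, 1060, 1230, 1410, 1580],
    "cadence_spm": [0, 168, 172, 174, 175, 174, 173, 171, 170, 168],
    "heart_rate":  [90, 120, 135, 142, 148, 151, 153, 155, 156, 158],
})

duration_min = samples["elapsed_s"].iloc[-1] / 60
distance_km = samples["distance_m"].iloc[-1] / 1000
features = {
    "distance_km": distance_km,
    "pace_min_per_km": duration_min / distance_km,
    # Cadence drift: late-run cadence minus early-run cadence
    "cadence_drift": samples["cadence_spm"].tail(3).mean()
                     - samples["cadence_spm"].iloc[1:4].mean(),
    # Simple TRIMP-style load: duration x mean heart rate
    "training_load": duration_min * samples["heart_rate"].mean(),
}
print(features)
```

One row of such features per run is what a table like runs_summary would accumulate.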

📓 Notebook Workflow

| Notebook | Focus | Key Outputs |
|---|---|---|
| 01_explore_data | Load & inspect Garmin/Strava data | Basic stats & visualizations |
| 02_feature_engineering | Compute derived metrics (load, variability, cadence drift) | runs_summary.csv |
| 03_clustering_runs | Unsupervised learning for run grouping | Cluster labels |
| 04_predictive_models | Random Forest regression + classification | Pace & run-type models |
| 05_model_interpretation | SHAP explainability | Global & local feature attributions |
| 06_interactive_dashboard | Streamlit app | Interactive UI |
| 07_postgresql_storage | Save data + SHAP results to PostgreSQL | Tables: runs_summary, shap_importance_global, data_lineage |
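The clustering step (notebook 03) can be sketched as below. The features and cluster shapes are synthetic assumptions chosen so that easy, tempo, and hilly runs separate cleanly; the real notebook works on the engineered run summaries.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic per-run summaries: easy, tempo, and hilly runs should
# separate on pace and elevation gain (values are illustrative).
rng = np.random.default_rng(0)
easy  = np.column_stack([rng.normal(6.2, 0.2, 30), rng.normal(40, 15, 30)])
tempo = np.column_stack([rng.normal(4.6, 0.2, 30), rng.normal(30, 10, 30)])
hilly = np.column_stack([rng.normal(6.0, 0.3, 30), rng.normal(350, 50, 30)])
runs = np.vstack([easy, tempo, hilly])  # [pace_min_per_km, elevation_gain_m]

# Standardize first so elevation (hundreds of meters) does not
# dominate pace (single-digit min/km) in the distance metric.
X = StandardScaler().fit_transform(runs)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(labels))  # roughly 30 runs per cluster
```

Scaling before k-means is the important design choice here; without it, the elevation column alone would decide the clusters.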

🗄️ Database Integration

  • PostgreSQL 16 for structured, durable storage
  • SQLAlchemy for engine creation and ORM-style interactions

Core tables

  • runs_summary — per-run feature set
  • shap_importance_global — mean SHAP values across features
  • data_lineage — timestamps, dataset versions, transformation logs
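A minimal sketch of writing these tables with pandas: the project targets PostgreSQL via SQLAlchemy, but SQLite stands in here so the snippet is self-contained, and the column names are illustrative rather than the repo's actual schema.

```python
import sqlite3
import pandas as pd

# In-memory SQLite as a stand-in for the PostgreSQL database.
conn = sqlite3.connect(":memory:")

runs_summary = pd.DataFrame({
    "run_id": [1, 2],
    "date": ["2024-05-01", "2024-05-03"],
    "distance_km": [8.2, 12.5],
    "pace_min_per_km": [5.4, 5.9],
})
shap_importance_global = pd.DataFrame({
    "feature": ["distance_km", "cadence_drift"],
    "mean_abs_shap": [0.42, 0.17],
})

# to_sql creates the table and inserts the rows in one call.
runs_summary.to_sql("runs_summary", conn, index=False)
shap_importance_global.to_sql("shap_importance_global", conn, index=False)

print(pd.read_sql("SELECT * FROM runs_summary", conn))
```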

Example SQL use cases:

  • Weekly summaries & training load trends
  • Top SHAP features per model
  • Reproducibility checks through lineage
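The weekly-summary use case might look like the query below. SQLite is used to keep the sketch runnable (its strftime replaces PostgreSQL's date_trunc), and the table contents are made up for illustration.

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(":memory:")
pd.DataFrame({
    "date": ["2024-05-01", "2024-05-03", "2024-05-08"],
    "distance_km": [8.0, 12.0, 10.0],
    "training_load": [310.0, 520.0, 400.0],
}).to_sql("runs_summary", conn, index=False)

# Aggregate per ISO-style week; in PostgreSQL this would be
# date_trunc('week', date) instead of strftime.
weekly = pd.read_sql(
    """
    SELECT strftime('%Y-%W', date) AS week,
           SUM(distance_km)   AS total_km,
           SUM(training_load) AS weekly_load
    FROM runs_summary
    GROUP BY week
    ORDER BY week
    """,
    conn,
)
print(weekly)
```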

🧱 Folder Structure

```
running-agent/
│
├── data/
│   ├── raw/                  # raw Garmin/Strava exports (ignored in Git)
│   ├── interim/              # temporary intermediate outputs
│   ├── processed/            # derived CSV/Parquet files
│   └── sql/                  # database init scripts (ignored in Git)
│
├── notebooks/
│   ├── 01_explore_data.ipynb
│   ├── 02_feature_engineering.ipynb
│   ├── 03_clustering_runs.ipynb
│   ├── 04_predictive_models.ipynb
│   ├── 05_model_interpretation.ipynb
│   ├── 07_postgresql_storage.ipynb
│   └── archive/
│
├── src/
│   ├── __init__.py
│   ├── db_utils.py           # PostgreSQL utilities
│   ├── xai_utils.py          # SHAP helper functions
│   ├── features/
│   │   └── engineering.py    # feature engineering pipeline
│   └── ingestion/
│       └── parse_fit.py      # .fit file parsing
│
├── models/                   # trained models (ignored in Git)
│   ├── model_rf_clf.joblib
│   └── shap_explainer_clf.pkl
│
├── 06_interactive_dashboard_humanized.py  # Streamlit dashboard
├── requirements.txt
├── requirements-dev.txt
├── .gitignore
└── README.md
```

---

## ⚙️ Environment Setup

```bash
# 1. Clone repository
git clone https://github.com/gommezen/running-agent.git
cd running-agent

# 2. Create and activate environment
python -m venv .venv
source .venv/bin/activate        # Linux / macOS
# .\.venv\Scripts\activate       # Windows

# 3. Install dependencies
pip install -r requirements.txt
pip install -r requirements-dev.txt   # linting, testing, pre-commit

# 4. Set up PostgreSQL
# Create a database, then create a .env file in the project root:
#   DATABASE_URL=postgresql://user:password@localhost:5432/running_agent

# 5. Test database connection
python -m src.db_utils

# 6. Run the dashboard
streamlit run 06_interactive_dashboard_humanized.py
```

🧩 Next Steps for this Project

1. UX Update — Streamlit Dashboard (🔄 In Progress): refine layout, tabs, and visual hierarchy for a smoother user experience; add filters, metric cards, and consistent color/label styling.

2. Notebook 7 → PostgreSQL Storage (✅ Completed): data is now stored persistently in PostgreSQL and queried live via SQLAlchemy.

3. Notebook 8 → Monitoring & Automated Logging: implement lineage tracking, model-version logging, and automated SHAP summaries.

4. Dockerize the App: containerize the Streamlit + PostgreSQL setup for portable, reproducible deployment.

5. CI/CD Integration (GitHub Actions): automate testing, style checks, and build verification on every commit.

6. API Integration (Garmin / Strava): enable automatic ingestion of new running data through connected APIs.

7. Agent Iteration v0.3+: extend toward an adaptive “Running Agent” that provides personalized training insights and recommendations.

