Customer Churn Intelligence System

End-to-end ML pipeline for predicting and preventing SaaS customer churn — with survival analysis, SHAP explainability, and an executive-level Plotly dashboard.

Business Context

Customer acquisition costs 5–7× more than retention. A 5% reduction in churn can increase profits by 25–95% (Harvard Business Review). This project builds a production-grade churn prediction system that not only identifies at-risk customers but quantifies the revenue at risk and provides actionable retention windows using survival analysis.

What makes this project different

Standard Churn Project	This Project
Binary classification only	Binary classification + survival analysis (WHEN will they churn?)
Accuracy as metric	Revenue-weighted F-beta score, business cost matrix
Feature importance bar chart	SHAP beeswarm + interaction plots
Static report	Interactive Plotly executive dashboard
Single model	LightGBM + CatBoost + Cox PH ensemble

Project Structure

churn-analytics/
├── configs/
│   └── config.yaml              # All hyperparameters & paths
├── data/
│   ├── raw/                     # Original, immutable data
│   └── processed/               # Cleaned, feature-engineered data
├── notebooks/
│   ├── 01_EDA.ipynb             # Exploratory Data Analysis
│   ├── 02_Feature_Engineering.ipynb
│   └── 03_Modeling_and_Evaluation.ipynb
├── src/
│   ├── data/
│   │   ├── loader.py            # Data ingestion & validation
│   │   └── preprocessor.py     # Cleaning pipeline
│   ├── features/
│   │   └── engineer.py         # Feature engineering (5 advanced features)
│   ├── models/
│   │   ├── churn_model.py      # LightGBM + CatBoost pipeline
│   │   ├── survival_model.py   # Cox Proportional Hazards model
│   │   └── evaluator.py        # Business-aware evaluation metrics
│   ├── visualization/
│   │   └── dashboard.py        # Executive Plotly dashboard
│   └── utils/
│       ├── logger.py           # Structured logging
│       └── helpers.py          # Utility functions
├── tests/
│   ├── test_features.py
│   └── test_models.py
├── reports/
│   └── figures/                # Auto-generated plots
├── requirements.txt
├── setup.py
├── .gitignore
└── README.md

Quickstart

1. Clone & install

git clone https://github.com/thed700/churn-analytics.git
cd churn-analytics
pip install -r requirements.txt

2. Download dataset

Download the Telco Customer Churn dataset from Kaggle and place it in data/raw/telco_churn.csv.

# Or use Kaggle CLI
kaggle datasets download -d blastchar/telco-customer-churn -p data/raw/ --unzip

3. Run the full pipeline

python -m src.main

4. Launch the dashboard

python -m src.visualization.dashboard
# Open http://localhost:8050 in your browser

Methodology

Advanced EDA Insights

Survival cliffs — Kaplan-Meier curves reveal churn spikes at months 12, 24, 36 (contract renewal windows)
SHAP interaction effects — monthly_charges × contract_type interaction dominates over either variable alone
Charge volatility — customers with billing amount fluctuations >15% churn at 2.3× the base rate
Service adoption desert — customers using fewer than 2 services have 68% higher churn probability

Feature Engineering (5 Advanced Features)

Feature	Formula	Business Intuition
`charge_volatility_ratio`	`std(charges_3m) / mean_charges`	Billing shock = churn trigger
`service_adoption_density`	`active_services / max_services`	Low adoption = disengaged customer
`tenure_contract_interaction`	`tenure × contract_months`	Non-linear loyalty curve
`support_recency_decay`	`days_since_last_contact`	Recent friction = leading churn signal
`cohort_clv_percentile`	`percentile_rank(clv, within_tenure_cohort)`	Relative value, not absolute

Modeling Strategy

Primary model: LightGBM (GBDT) — fast, SHAP-native, handles mixed types
Challenger model: CatBoost — native categorical encoding
Survival model: Cox Proportional Hazards (lifelines) — predicts WHEN, not just IF
Threshold: 0.40 (recall-optimized, not default 0.5) — justified by cost matrix
HPO: Optuna with Bayesian search (150 trials)
CV: Stratified 5-fold with time-aware splitting

Evaluation Philosophy

We don't optimize for accuracy. We optimize for revenue.

The evaluation uses a cost-sensitive confusion matrix:

False Negative cost = avg_customer_CLV (missed churn = lost revenue)
False Positive cost = retention_offer_cost (unnecessary discount)

Key Results

Metric	Value
F2-Score (recall-weighted)	0.847
AUC-ROC	0.912
Revenue at Risk Identified	~$2.4M (simulated)
High-Risk Accounts Flagged	340 customers
Survival Model C-Index	0.78

Executive Dashboard KPIs

Revenue at Risk (30-day) — total CLV of customers with churn probability > threshold
Model Recall @ Threshold — what % of actual churners we catch
High-Risk Account Count — actionable list for the retention team

Tech Stack

Layer	Tool
Data wrangling	pandas, numpy
ML modeling	lightgbm, catboost, scikit-learn
Survival analysis	lifelines
HPO	optuna
Explainability	shap
Visualization	plotly, plotly-dash
Testing	pytest
Logging	loguru
Config	pyyaml

Dataset

IBM Telco Customer Churn — 7,043 customers × 21 features including contract type, tenure, monthly charges, and 15 service-level features.

Source: Kaggle — blastchar/telco-customer-churn

Author

Akmal — Senior Data Analyst
GitHub: @thed700

License

MIT License — see LICENSE for details.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Customer Churn Intelligence System

Business Context

What makes this project different

Project Structure

Quickstart

1. Clone & install

2. Download dataset

3. Run the full pipeline

4. Launch the dashboard

Methodology

Advanced EDA Insights

Feature Engineering (5 Advanced Features)

Modeling Strategy

Evaluation Philosophy

Key Results

Executive Dashboard KPIs

Tech Stack

Dataset

Author

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
configs		configs
notebooks		notebooks
reports		reports
src		src
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt
setup.py		setup.py

Folders and files

Latest commit

History

Repository files navigation

Customer Churn Intelligence System

Business Context

What makes this project different

Project Structure

Quickstart

1. Clone & install

2. Download dataset

3. Run the full pipeline

4. Launch the dashboard

Methodology

Advanced EDA Insights

Feature Engineering (5 Advanced Features)

Modeling Strategy

Evaluation Philosophy

Key Results

Executive Dashboard KPIs

Tech Stack

Dataset

Author

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages