Customer churn is one of the most critical problems in subscription businesses.
Losing customers directly impacts revenue, growth, and acquisition costs.
CustomerChurnPredictor is an end-to-end churn risk system that combines:
- ✅ Machine Learning prediction (probability scoring)
- ✅ Risk segmentation (buckets + deciles)
- ✅ ROI-based decisioning (threshold → business value)
- ✅ FastAPI inference service (single + batch prediction)
- ✅ Streamlit modern SaaS dashboard (interactive demo + analytics)
- ✅ Monitoring with Evidently (data drift report)
- ✅ Tableau dashboards for executive stakeholders
The solution is designed to answer:
"Who is most likely to churn next, and what should we do about it?"
Most churn projects stop at accuracy. This project goes further:
- Probability → Decision Policy: Intervene only above a chosen threshold
- Threshold is ROI-driven: We simulate expected value across thresholds (not an arbitrary 0.50 default)
- Explainability built-in: SHAP + permutation importance
- Deployed system: API + UI + logs + monitoring
- Executive dashboards: Tableau-ready exports + dashboard suite
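The ROI-driven threshold choice can be sketched as a simple expected-value scan. This is an illustration only: the cost and value parameters (`retention_cost`, `saved_value`, `save_rate`) are hypothetical placeholders, not the project's actual defaults in `churn/config.py`.

```python
import numpy as np

def expected_value_scan(probs, thresholds, retention_cost=10.0,
                        saved_value=200.0, save_rate=0.3):
    """Pick the threshold that maximizes the expected value of
    intervening on every customer whose churn probability meets it."""
    best = None
    for t in thresholds:
        targeted = probs >= t
        # Cost: we pay the retention offer for everyone targeted.
        cost = targeted.sum() * retention_cost
        # Benefit: a fraction of targeted likely-churners are saved.
        benefit = (probs[targeted] * save_rate * saved_value).sum()
        ev = benefit - cost
        if best is None or ev > best[1]:
            best = (t, ev)
    return best  # (best_threshold, expected_value)
```

Scanning a grid of thresholds this way is why the project's decision boundary can differ sharply from the naive 0.50 cutoff.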
```
CustomerChurnPredictor/
├── churn/                    # Core ML + API + monitoring modules
│   ├── data.py               # Download + clean dataset
│   ├── modeling.py           # Preprocess + candidate models
│   ├── train.py              # Train + save best model
│   ├── evaluate.py           # Metrics + confusion matrix + threshold scan
│   ├── explain.py            # Permutation + SHAP explainability
│   ├── business.py           # ROI simulation + best threshold
│   ├── tableau_export.py     # Exports final Tableau-ready CSVs
│   ├── api.py                # FastAPI inference service + logging
│   ├── monitor.py            # Evidently drift report
│   └── config.py             # Paths, columns, business defaults
│
├── app/
│   └── streamlit_app.py      # Modern SaaS Streamlit UI
│
├── data/                     # Local-only data (ignored in git except placeholder)
│   ├── raw/
│   ├── processed/
│   ├── tableau/              # Exports for Tableau dashboards
│   └── logs/                 # API prediction logs
│
├── models/                   # Saved model artifact (model.joblib) + metadata
├── reports/
│   ├── figures/              # Explainability + ROI plots (PNG/CSV)
│   ├── metrics/              # Model metrics, threshold scan, best threshold
│   └── monitoring/           # Drift report HTML
│
├── tableau/                  # Tableau workbook (.twbx) + screenshots
│   └── screenshots/
│
├── tests/                    # Basic CI tests
├── requirements.txt
├── requirements-dev.txt
├── Makefile
└── README.md
```
```
Raw Telco Dataset
   ↓
Data Cleaning & Feature Engineering (Pandas)
   ↓
Preprocessing Pipeline (Impute + Scale + OneHotEncode)
   ↓
Classification Models (LogReg, RF)
   ↓
Churn Probability Scores (0–1)
   ↓
Risk Segmentation (Buckets + Deciles)
   ↓
ROI-Based Threshold Decisioning
   ↓
FastAPI + Streamlit + Monitoring + Tableau Dashboards
```
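The preprocessing and modeling stages above can be sketched with scikit-learn. The column lists here are illustrative Telco features, not the project's actual configuration (which lives in `churn/config.py` and `churn/modeling.py`):

```python
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Illustrative column split for the Telco dataset.
numeric_cols = ["tenure", "MonthlyCharges", "TotalCharges"]
categorical_cols = ["Contract", "InternetService", "PaymentMethod"]

preprocess = ColumnTransformer([
    # Numeric features: impute missing values, then scale.
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    # Categorical features: impute, then one-hot encode.
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ]), categorical_cols),
])

# One candidate model; a RandomForestClassifier slots in the same way.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])
```

Bundling preprocessing and the classifier into one `Pipeline` is what allows a single `model.joblib` artifact to score raw customer rows at inference time.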
The Tableau layer follows a two-dashboard workflow:
- Dashboard 1: Churn Overview (Executive / Business Analysis)
- Dashboard 2: ML Risk Intelligence (Predictive + Decision Layer)
Dashboard 1 focus: historical churn patterns and segmentation insights.
KPIs
- Total Customers
- Churn Rate
- Avg Monthly Charges
- Avg Tenure
Visuals
- Churn by Contract Type
- Churn by Internet Service
- Churn by Payment Method
- Churn by Tenure Bucket
- Interactive filters (Contract, InternetService, PaymentMethod, SeniorCitizen)
Dashboard 2 focus: who will churn next and what to do about it.
KPIs
- Avg Churn Probability
- High Risk Count (≥ threshold)
- Targeted Customers (decision policy)
Visuals
- Churn probability distribution
- Risk decile breakdown (Top 10% = Decile 10)
- High-risk customers table
- ROI threshold curve (Total Expected Value vs Threshold)
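The risk-decile breakdown above can be computed directly from the scored probabilities with pandas. This helper is an illustrative sketch; the column name `churn_probability` is an assumption, not necessarily the name used in the project's exports:

```python
import pandas as pd

def add_risk_deciles(scored: pd.DataFrame,
                     prob_col: str = "churn_probability") -> pd.DataFrame:
    """Assign deciles 1-10 by churn probability; decile 10 = riskiest 10%."""
    out = scored.copy()
    # Ranking first makes qcut robust to tied probability values.
    ranks = out[prob_col].rank(method="first")
    out["risk_decile"] = pd.qcut(ranks, q=10, labels=list(range(1, 11))).astype(int)
    return out
```

With this column in the scored export, the "Top 10% = Decile 10" view is a simple filter in Tableau.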
After running:

```bash
python -m churn.tableau_export
```

Tableau-ready exports are generated in:

- data/tableau/telco_cleaned.csv
- data/tableau/telco_scored.csv
- data/tableau/roi_thresholds.csv
- data/tableau/feature_importance.csv
- data/tableau/threshold_scan.csv
The Streamlit app includes:
- KPI cards (model performance + threshold)
- Predict tab (decision + EV per customer)
- Analytics dashboard tab (stacked distributions + drivers + threshold tradeoffs)
- Explainability tab (Permutation + SHAP global/local)
- Business tab (ROI curve + threshold strategy)
- Logs & batch scoring tab (CSV upload + /predict_batch)
Endpoints:

- GET /health → service status
- POST /predict → score one customer
- POST /predict_batch → score many rows (batch scoring)
All predictions are logged to:
data/logs/predictions_log.csv
This log is used for monitoring drift.
In this project, FastAPI acts as the bridge between the trained machine learning model and the user interface.
During development, the system runs in two parts:

1. FastAPI Backend (Model Server)
   - Loads the trained model (model.joblib)
   - Exposes prediction endpoints:
     - /predict → single customer
     - /predict_batch → multiple customers
   - Handles inference logic and logging

2. Streamlit Frontend (Dashboard UI)
   - Collects user input (customer data)
   - Sends requests to FastAPI
   - Displays:
     - churn probability
     - decision (intervene or not)
     - expected business value
🔄 Flow:

```
User Input (Streamlit)
   ↓
HTTP Request → FastAPI (/predict)
   ↓
Model (joblib) → Prediction
   ↓
Response → Streamlit UI
```
This setup mimics a real production ML system, where:
- the UI and the model are separate components
- communication happens via APIs
FastAPI is chosen because it is:
- ⚡ Fast and lightweight (high-performance inference)
- 📦 Production-ready (used in real ML systems)
- 🔌 Easy to integrate with frontends (Streamlit, React, etc.)
- 📈 Supports batch inference and scalability
In a real production environment, this system would be deployed as:
- FastAPI → deployed on cloud (AWS / GCP / Azure)
- Model → stored in object storage (S3 / GCS)
- Load balancer → handles traffic
- Database → stores prediction logs
- Frontend → separate app (React / dashboard)
```
User → Web App
   ↓
API Gateway / Load Balancer
   ↓
FastAPI Service (Docker container)
   ↓
Model Inference
   ↓
Response + Logging (Database)
```
Supporting tooling in that environment would typically include:

- Docker (containerization)
- Kubernetes (scaling)
- AWS ECS / Lambda / EC2
- CI/CD pipelines (GitHub Actions)
Since this is an academic + portfolio project, we use a simplified setup:
- FastAPI runs locally (http://localhost:8000)
- Streamlit connects directly to it
- No cloud infrastructure required
- No cost involved
This allows:
- ✅ Fast development
- ✅ Easy debugging
- ✅ Zero deployment cost
- ✅ Demonstrates full ML system design
The project also supports a fallback mode. If FastAPI is offline, Streamlit:

- loads model.joblib directly
- performs predictions locally
This ensures:
- 🚫 No dependency on backend uptime
- ☁️ Works on Streamlit Cloud
- 💼 Demonstrates resilient system design
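The fallback can be sketched as a try-API-then-local helper. The URL and model path match the local defaults described above; the `score` function itself is an illustration, not the project's Streamlit code:

```python
import joblib
import pandas as pd
import requests

API_URL = "http://localhost:8000/predict"
MODEL_PATH = "models/model.joblib"

def score(customer: dict, timeout: float = 2.0) -> float:
    """Churn probability via the FastAPI backend, falling back to the
    saved local pipeline when the backend is unreachable."""
    try:
        resp = requests.post(API_URL, json=customer, timeout=timeout)
        resp.raise_for_status()
        return float(resp.json()["churn_probability"])
    except requests.RequestException:
        # Backend offline: load the same artifact and predict in-process.
        model = joblib.load(MODEL_PATH)
        row = pd.DataFrame([customer])  # single-row frame for the pipeline
        return float(model.predict_proba(row)[0, 1])
```

Because both paths score the same saved pipeline, the UI behaves identically whether or not the API is up.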
This architecture shows that the project is not just:

❌ "a machine learning model"

It is:

✅ a complete ML system with deployment, APIs, UI, and monitoring

"I deployed my churn model behind a FastAPI service, which the Streamlit dashboard calls in real time. I also implemented a local fallback so the system works even without a backend, making it both production-ready and deployable for free."
Run:

```bash
python -m churn.monitor
```

Output:

- reports/monitoring/data_drift_report.html
This compares:
- reference sample (saved during training)
- current inference logs (from API)
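Evidently produces the full HTML report, but the idea it measures can be sketched directly: compare each feature's distribution in the training reference sample against recent inference logs, for example with a population stability index (PSI). This helper is an illustration of the concept, not the project's monitor code:

```python
import numpy as np

def psi(reference, current, bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index between two numeric samples.
    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 drifted."""
    # Bin edges come from the reference sample so both histograms align.
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference) + eps
    cur_pct = np.histogram(current, bins=edges)[0] / len(current) + eps
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))
```

A rising PSI on key features (tenure, charges) is the signal that the model may need retraining on fresher data.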
```bash
python -m venv .venv
# Windows:
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
pip install -r requirements-dev.txt

python -m churn.data --download
python -m churn.train
python -m churn.evaluate
python -m churn.explain
python -m churn.business
python -m churn.tableau_export
```

Terminal A:

```bash
python -m churn.api
```

Terminal B:

```bash
python -m streamlit run app/streamlit_app.py
```

- Month-to-month contracts are consistently the highest churn risk
- Long-term contracts (1–2 year) strongly reduce churn likelihood
- Churn risk is concentrated: a smaller segment can represent a large share of revenue exposure
- Probability-based segmentation enables targeted retention strategies instead of broad campaigns
Companies can use this system to:
- identify high-risk customers early
- deploy targeted retention campaigns
- improve contract conversion strategies
- protect recurring revenue with ROI-optimized decisions
Team:

- Mitra Boga
- Yashweer Potelu
- Datla Akshith Varma
- Pranav Surya