Production-ready customer segmentation system using unsupervised machine learning to identify behavioral clusters in credit card data.
Banks and fintech companies need to:
- Identify high-value customers
- Detect risky spending behavior
- Personalize credit offers
- Optimize marketing spend
- Improve customer retention

This project builds a scalable ML segmentation pipeline that addresses these needs.

Preprocessing:
- Missing value imputation
- Outlier detection
- Log transformation for skewed features
- Standard scaling
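
The preprocessing steps above (imputation, outlier detection, log transformation, scaling) could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `preprocess`, the median imputation strategy, the IQR-based outlier rule, and the `iqr_factor` parameter are all assumptions, and the input is assumed to hold non-negative monetary or count features.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler


def preprocess(X, iqr_factor=3.0):
    """Impute, clip outliers, log-transform, and scale a non-negative feature matrix.

    Hypothetical helper: the strategy choices here are illustrative assumptions.
    """
    X = np.asarray(X, dtype=float)

    # 1. Missing value imputation: fill NaNs with the column median.
    X = SimpleImputer(strategy="median").fit_transform(X)

    # 2. Outlier detection: flag values outside the IQR fences, then clip them.
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    lower, upper = q1 - iqr_factor * iqr, q3 + iqr_factor * iqr
    outlier_mask = (X < lower) | (X > upper)
    X = np.clip(X, lower, upper)

    # 3. Log transformation: log1p compresses right-skewed features and maps 0 to 0.
    X = np.log1p(X)

    # 4. Standard scaling: zero mean and unit variance per column.
    X = StandardScaler().fit_transform(X)
    return X, outlier_mask


# Usage on synthetic skewed data with one missing value and one extreme value.
rng = np.random.default_rng(0)
X_raw = rng.lognormal(size=(200, 3))
X_raw[0, 0] = np.nan
X_raw[1, 1] = 1e6
X_clean, outliers = preprocess(X_raw)
```

In a deployed pipeline the imputer and scaler would be fitted on training data only and reused at scoring time; the sketch fits them in place for brevity.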

Feature engineering:
- Credit utilization ratio
- Purchase frequency metrics
- Cash advance ratio
- Payment consistency index
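
The engineered features listed above might be derived along these lines. The column names (`BALANCE`, `CREDIT_LIMIT`, `PURCHASES`, `PURCHASES_TRX`, `CASH_ADVANCE`, `PAYMENTS`, `MINIMUM_PAYMENTS`, `TENURE`) and every formula are assumptions for illustration, since the exact definitions are not given here. In particular, the payment consistency index shown is a hypothetical stand-in.

```python
import pandas as pd


def safe_ratio(num, den):
    """Element-wise num / den, returning 0 where the denominator is 0 or missing."""
    den = den.where(den != 0)  # zeros become NaN so the division cannot blow up
    return (num / den).fillna(0.0)


def add_features(df):
    """Return a copy of df with four derived behavioral features (assumed definitions)."""
    out = df.copy()
    # Share of the credit limit currently in use.
    out["credit_utilization"] = safe_ratio(df["BALANCE"], df["CREDIT_LIMIT"])
    # Purchase frequency: transactions per month of tenure.
    out["purchases_per_month"] = safe_ratio(df["PURCHASES_TRX"], df["TENURE"])
    # Share of total spending that came from cash advances.
    total_spend = df["CASH_ADVANCE"] + df["PURCHASES"]
    out["cash_advance_ratio"] = safe_ratio(df["CASH_ADVANCE"], total_spend)
    # Hypothetical consistency index: payments relative to minimum due, capped at 1.
    out["payment_consistency"] = safe_ratio(
        df["PAYMENTS"], df["MINIMUM_PAYMENTS"]
    ).clip(upper=1.0)
    return out


# Usage with two made-up customers; the second has all-zero activity.
customers = pd.DataFrame({
    "BALANCE": [500.0, 0.0],
    "CREDIT_LIMIT": [1000.0, 0.0],
    "PURCHASES": [300.0, 0.0],
    "PURCHASES_TRX": [12, 0],
    "CASH_ADVANCE": [100.0, 0.0],
    "PAYMENTS": [200.0, 50.0],
    "MINIMUM_PAYMENTS": [100.0, 0.0],
    "TENURE": [12, 12],
})
features = add_features(customers)
```

Guarding against zero denominators matters because inactive accounts would otherwise produce infinities that distort scaling and clustering.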
| Model | Silhouette Score | Stability |
|---|---|---|
| K-Means | 0.42 | High |
| Hierarchical | 0.39 | Medium |
| DBSCAN | 0.21 | Low |

Final model: K-Means (k=4). The number of clusters was chosen using:
- Elbow Method
- Silhouette Analysis
- Business interpretability validation
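
The elbow and silhouette checks can be illustrated by scanning a range of `k` values and recording both metrics. This is a sketch on synthetic data with four well-separated blobs, not the project's credit card data; the helper name `scan_k` and the candidate range are assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def scan_k(X, k_values, seed=0):
    """Fit K-Means for each k and return {k: (inertia, silhouette)}."""
    results = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        results[k] = (model.inertia_, silhouette_score(X, model.labels_))
    return results


# Synthetic stand-in data: four clearly separated groups in two dimensions.
X_demo, _ = make_blobs(
    n_samples=400,
    centers=[[0, 0], [10, 0], [0, 10], [10, 10]],
    cluster_std=1.0,
    random_state=0,
)
scores = scan_k(X_demo, range(2, 8))

# Silhouette analysis: pick the k with the highest average silhouette.
best_k = max(scores, key=lambda k: scores[k][1])

# Elbow method: inspect how much inertia drops with each extra cluster.
for k, (inertia, sil) in scores.items():
    print(f"k={k}  inertia={inertia:.1f}  silhouette={sil:.3f}")
```

On real data the two criteria can disagree, which is where the business interpretability check comes in as a tie-breaker.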

The four resulting segments:
- High balance, high purchases, strong credit limits.
- Moderate spending, consistent payments.
- High cash withdrawals, potential financial risk.
- Low activity, low revenue contribution.

Evaluation:
- Silhouette Score
- Inertia
- PCA visualization
- Cluster stability tests
- Feature importance per cluster
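
One common way to run a cluster stability test is to refit the model on bootstrap resamples and compare the resulting assignments with a reference fit using the adjusted Rand index (ARI). The sketch below assumes that approach; the source does not say which stability test was used, and `bootstrap_stability` is a hypothetical helper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score


def bootstrap_stability(X, k=4, n_rounds=10, seed=0):
    """Mean ARI between a reference clustering and clusterings fit on bootstrap samples."""
    rng = np.random.default_rng(seed)
    reference = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    scores = []
    for i in range(n_rounds):
        # Resample rows with replacement and refit from scratch.
        idx = rng.choice(len(X), size=len(X), replace=True)
        boot = KMeans(n_clusters=k, n_init=10, random_state=seed + i + 1).fit(X[idx])
        # Score both models on the full data set so labels line up row by row.
        # ARI ignores label permutations, so cluster IDs need not match.
        scores.append(adjusted_rand_score(reference.labels_, boot.predict(X)))
    return float(np.mean(scores))


# Synthetic stand-in data with four well-separated groups.
X_stab, _ = make_blobs(
    n_samples=400,
    centers=[[0, 0], [10, 0], [0, 10], [10, 10]],
    cluster_std=1.0,
    random_state=1,
)
stability = bootstrap_stability(X_stab, k=4)
```

A mean ARI near 1 indicates that the segments are reproducible under resampling, while values well below 1 suggest the cluster boundaries depend heavily on the particular sample.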
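
Implemented using sklearn Pipeline:

    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    # Scale features, keep components explaining 90% of the variance, then cluster.
    pipeline = Pipeline([
        ('scaler', StandardScaler()),
        ('pca', PCA(n_components=0.90)),
        ('cluster', KMeans(n_clusters=4)),
    ])

Pipeline supports: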
- Re-training
- Batch scoring
- Real-time inference
- Reproducibility
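
To illustrate how a single fitted pipeline object can cover re-training, batch scoring, single-record inference, and reproducibility, here is a minimal sketch. It assumes persistence with `joblib` and a fixed `random_state`; the file name, the `build_pipeline` helper, and the random stand-in data are hypothetical.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_pipeline(seed=42):
    """Same three steps as the pipeline above, with a fixed seed for reproducibility."""
    return Pipeline([
        ("scaler", StandardScaler()),
        ("pca", PCA(n_components=0.90)),
        ("cluster", KMeans(n_clusters=4, n_init=10, random_state=seed)),
    ])


# Random stand-in data: 8 numeric features per customer.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 8))
X_new = rng.normal(size=(20, 8))

# Re-training: fitting a fresh pipeline replaces scaler, PCA, and centroids together.
model = build_pipeline().fit(X_train)

# Persist and reload, as a scoring service would do at start-up.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "segmentation.joblib"  # hypothetical artifact name
    joblib.dump(model, path)
    restored = joblib.load(path)

# Batch scoring: one call assigns a cluster to every row.
batch_labels = restored.predict(X_new)

# Real-time inference: the same call on a single row.
single_label = restored.predict(X_new[:1])[0]
```

Because preprocessing and clustering live in one object, new records always pass through the same scaler and PCA projection that the centroids were fitted on.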

Endpoints:
- `/predict` → Assign cluster
- `/health` → Check service
- `/retrain` → Trigger retraining
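
The logic behind the three endpoints might look like the following framework-neutral sketch. The web framework, HTTP methods, and payload shapes are not stated here, so the `SegmentationService` class, its `handle` dispatcher, and the `{"features": [...]}` payload are all assumptions; a real service would wrap these handlers in its chosen HTTP layer.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class SegmentationService:
    """Hypothetical request handlers for /predict, /health, and /retrain."""

    def __init__(self, training_data):
        self.training_data = np.asarray(training_data, dtype=float)
        self.model = None
        self.retrain()

    def retrain(self, payload=None):
        # Refit the whole pipeline on the currently stored training data.
        self.model = Pipeline([
            ("scaler", StandardScaler()),
            ("cluster", KMeans(n_clusters=4, n_init=10, random_state=0)),
        ]).fit(self.training_data)
        return {"status": "retrained", "n_samples": int(len(self.training_data))}

    def health(self, payload=None):
        return {"status": "ok", "model_loaded": self.model is not None}

    def predict(self, payload):
        # Assumed payload shape: {"features": [f1, f2, ...]} for one customer.
        features = np.asarray(payload["features"], dtype=float).reshape(1, -1)
        return {"cluster": int(self.model.predict(features)[0])}

    def handle(self, path, payload=None):
        routes = {
            "/predict": self.predict,
            "/health": self.health,
            "/retrain": self.retrain,
        }
        if path not in routes:
            return {"error": f"unknown endpoint: {path}"}
        return routes[path](payload)


# Usage with random stand-in data (5 features per customer).
rng = np.random.default_rng(0)
service = SegmentationService(rng.normal(size=(200, 5)))
health = service.handle("/health")
prediction = service.handle("/predict", {"features": [0.1, -0.2, 0.3, 0.0, 1.5]})
retrained = service.handle("/retrain")
```

Keeping the handlers independent of the HTTP layer makes them easy to unit test without starting a server.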

Build and run with Docker:

    docker build -t segmentation-api .
    docker run -p 8000:8000 segmentation-api

Monitoring and model management:
- Cluster drift detection
- Distribution monitoring
- Re-clustering trigger logic
- Versioned models
- Experiment tracking (optional MLflow)
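
Cluster drift detection and the re-clustering trigger could be approximated by comparing how customers are distributed across clusters at training time and in a recent batch. The sketch below uses the Population Stability Index (PSI) for that comparison. The choice of PSI, the `0.2` threshold (a common rule of thumb), and all function names are assumptions rather than the project's documented logic.

```python
import numpy as np


def cluster_shares(labels, n_clusters):
    """Fraction of records assigned to each cluster."""
    counts = np.bincount(np.asarray(labels), minlength=n_clusters)
    return counts / counts.sum()


def population_stability_index(expected, actual, eps=1e-6):
    """PSI between two discrete distributions; 0 means identical."""
    expected = np.clip(expected, eps, None)  # avoid log(0) for empty clusters
    actual = np.clip(actual, eps, None)
    return float(np.sum((actual - expected) * np.log(actual / expected)))


def should_recluster(reference_labels, new_labels, n_clusters=4, threshold=0.2):
    """Return (trigger, psi); trigger is True when drift exceeds the threshold."""
    psi = population_stability_index(
        cluster_shares(reference_labels, n_clusters),
        cluster_shares(new_labels, n_clusters),
    )
    return psi > threshold, psi


# Usage: a balanced reference period versus a batch dominated by cluster 0.
reference = [0] * 25 + [1] * 25 + [2] * 25 + [3] * 25
shifted = [0] * 70 + [1] * 10 + [2] * 10 + [3] * 10

stable_trigger, stable_psi = should_recluster(reference, reference)
drift_trigger, drift_psi = should_recluster(reference, shifted)
```

The same PSI function can be applied to binned feature values for distribution monitoring, so one metric covers both input drift and cluster drift.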