
eaboulila/CustomerSegmentation

💳 Customer Segmentation Platform – End-to-End ML System

Tech stack: Python, Scikit-Learn (clustering), FastAPI, Docker, CI

Production-ready customer segmentation system using unsupervised machine learning to identify behavioral clusters in credit card data.

📌 Business Context

Banks and fintech companies need to:

  • Identify high-value customers
  • Detect risky spending behavior
  • Personalize credit offers
  • Optimize marketing spend
  • Improve customer retention

This project builds a scalable ML pipeline that segments credit card customers by behavior to support these goals.

🧠 System Architecture

[Figure: system architecture (Mermaid diagram)]

🔎 1️⃣ Data Engineering

Preprocessing

  • Missing value imputation
  • Outlier detection
  • Log transformation for skewed features
  • Standard scaling
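The four preprocessing steps can be sketched in NumPy as below. This is a minimal sketch, not the project's code: the function name `preprocess` and its `skewed_cols` parameter are hypothetical, it assumes median imputation, and it handles outliers by capping at the 1.5×IQR fences (a swapped-in choice, since the README does not say which outlier method is used). The log transform assumes the skewed columns are non-negative monetary amounts.

```python
import numpy as np


def preprocess(X: np.ndarray, skewed_cols: list[int]) -> np.ndarray:
    """Impute, cap outliers, log-transform skewed columns, then standardize."""
    X = np.array(X, dtype=float)  # copy so the caller's array is not modified

    # 1. Missing value imputation: replace each NaN with its column median.
    medians = np.nanmedian(X, axis=0)
    nan_rows, nan_cols = np.where(np.isnan(X))
    X[nan_rows, nan_cols] = medians[nan_cols]

    # 2. Outlier handling (assumed IQR capping): clip to [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    X = np.clip(X, q1 - 1.5 * iqr, q3 + 1.5 * iqr)

    # 3. Log transform for skewed, non-negative features; log1p maps 0 to 0.
    X[:, skewed_cols] = np.log1p(X[:, skewed_cols])

    # 4. Standard scaling: zero mean and unit variance per column
    #    (constant columns get std 1 to avoid division by zero).
    std = X.std(axis=0)
    std[std == 0] = 1.0
    return (X - X.mean(axis=0)) / std
```

This version computes medians, percentiles and scaling statistics on the batch it receives. For scoring new customers, those statistics would normally be learned once on training data and reused, for example with scikit-learn transformers inside the pipeline.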

Feature Engineering

  • Credit utilization ratio
  • Purchase frequency metrics
  • Cash advance ratio
  • Payment consistency index
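The README names these engineered features but not their formulas. The sketch below shows one plausible definition of each, assuming column names from the public "CC General" credit card dataset (`BALANCE`, `CREDIT_LIMIT`, `PURCHASES`, `TENURE`, `CASH_ADVANCE`, `PAYMENTS`, `MINIMUM_PAYMENTS`). Both the column names and the ratio definitions are assumptions, as are the helper names `safe_ratio` and `engineer_features`.

```python
from typing import Dict, Optional


def safe_ratio(numerator: float, denominator: Optional[float]) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is 0 or None."""
    if denominator is None or denominator == 0:
        return 0.0
    return numerator / denominator


def engineer_features(row: Dict[str, float]) -> Dict[str, float]:
    """Derive hypothetical behavioral ratios from one customer's raw columns."""
    return {
        # Share of the credit limit currently used.
        "credit_utilization": safe_ratio(row["BALANCE"], row["CREDIT_LIMIT"]),
        # Purchase amount per month of account tenure.
        "purchases_per_month": safe_ratio(row["PURCHASES"], row["TENURE"]),
        # Fraction of total card usage that is cash advances.
        "cash_advance_ratio": safe_ratio(
            row["CASH_ADVANCE"], row["PURCHASES"] + row["CASH_ADVANCE"]
        ),
        # How much the customer pays relative to the required minimum.
        "payment_consistency": safe_ratio(row["PAYMENTS"], row["MINIMUM_PAYMENTS"]),
    }
```

For a customer with `BALANCE` 1500 and `CREDIT_LIMIT` 5000, `credit_utilization` is 0.3. Returning 0.0 for a zero denominator is a design choice; marking the value as missing and imputing it is an equally valid alternative.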

🤖 2️⃣ Model Development

Algorithms Evaluated

| Model | Silhouette Score | Stability |
|---|---|---|
| K-Means | 0.42 | High |
| Hierarchical | 0.39 | Medium |
| DBSCAN | 0.21 | Low |

Final model: K-Means (k=4)

Cluster Selection Strategy

  • Elbow Method
  • Silhouette Analysis
  • Business interpretability validation
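The elbow and silhouette checks can be run as a sweep over candidate values of k. This is a minimal scikit-learn sketch: the helper names `sweep_k` and `best_k_by_silhouette`, the k range and the seed are illustrative assumptions. The elbow method reads the `inertia_` curve for the point where gains flatten, while silhouette analysis prefers the k with the highest mean silhouette coefficient.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def sweep_k(X: np.ndarray, k_values=range(2, 9), seed: int = 42) -> dict[int, tuple[float, float]]:
    """Fit K-Means for each k and record (inertia, silhouette) per k."""
    results = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        # inertia_: within-cluster sum of squared distances, plotted for the elbow method.
        # silhouette_score: mean silhouette coefficient in [-1, 1]; higher is better.
        results[k] = (model.inertia_, silhouette_score(X, model.labels_))
    return results


def best_k_by_silhouette(results: dict[int, tuple[float, float]]) -> int:
    """Return the k with the highest mean silhouette coefficient."""
    return max(results, key=lambda k: results[k][1])
```

Metrics only narrow the choice. The business interpretability check in the list above is a manual review of whether the resulting clusters describe actionable customer groups.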

🏷 3️⃣ Identified Customer Segments

🥇 Premium High Spenders

High balance, high purchases, high credit limits.

🥈 Stable Customers

Moderate spending, consistent payments.

🥉 Cash Advance Heavy Users

High withdrawals, potential financial risk.

💤 Low Engagement Users

Low activity, low revenue contribution.
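K-Means only returns numeric cluster ids, so these names have to be attached by inspecting each cluster's profile. The sketch below shows one hypothetical rule-based mapping, assuming exactly four clusters and per-cluster mean `purchases` and `cash_advance` values on the original (unscaled) feature scale. The rules and the helper names `cluster_profiles` and `name_segments` are illustrative assumptions, not the project's labeling logic.

```python
import numpy as np


def cluster_profiles(X: np.ndarray, labels: np.ndarray, feature_names: list[str]) -> dict[int, dict[str, float]]:
    """Mean of each feature per cluster id."""
    return {
        int(c): dict(zip(feature_names, X[labels == c].mean(axis=0).tolist()))
        for c in np.unique(labels)
    }


def name_segments(profiles: dict[int, dict[str, float]]) -> dict[int, str]:
    """Map 4 cluster ids to segment names using simple, hypothetical ranking rules."""
    if len(profiles) != 4:
        raise ValueError("this heuristic assumes exactly 4 clusters")
    remaining = set(profiles)

    def take(key, largest=True):
        """Remove and return the remaining cluster id with the largest (or smallest) key value."""
        chosen = max(remaining, key=key) if largest else min(remaining, key=key)
        remaining.remove(chosen)
        return chosen

    names = {}
    names[take(lambda c: profiles[c]["cash_advance"])] = "Cash Advance Heavy Users"
    names[take(lambda c: profiles[c]["purchases"])] = "Premium High Spenders"
    names[take(lambda c: profiles[c]["purchases"] + profiles[c]["cash_advance"],
               largest=False)] = "Low Engagement Users"
    names[remaining.pop()] = "Stable Customers"  # whatever cluster is left
    return names
```

Ranking rules like these keep segment names stable across retrains, where K-Means may renumber its clusters arbitrarily.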

📊 4️⃣ Evaluation Strategy

  • Silhouette Score
  • Inertia
  • PCA visualization
  • Cluster stability tests
  • Feature importance per cluster
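The README does not specify how cluster stability is tested. One common approach, sketched here as an assumption, is to refit K-Means on bootstrap resamples and compare each refit's labels on the full dataset with a reference clustering using the adjusted Rand index (ARI), which ignores how cluster ids are numbered. The function name `bootstrap_stability` and its defaults are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score


def bootstrap_stability(X: np.ndarray, n_clusters: int = 4, n_rounds: int = 20, seed: int = 0) -> float:
    """Mean ARI between a reference clustering and models refit on bootstrap samples."""
    rng = np.random.default_rng(seed)
    reference = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(X)
    scores = []
    for i in range(n_rounds):
        idx = rng.integers(0, len(X), size=len(X))  # resample rows with replacement
        model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed + i + 1).fit(X[idx])
        # Label the full dataset so both labelings cover the same points.
        scores.append(adjusted_rand_score(reference, model.predict(X)))
    return float(np.mean(scores))
```

A mean ARI close to 1 means the segments survive resampling. Values well below that suggest k is too high or the clusters are not well separated.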

🏗 5️⃣ End-to-End ML Pipeline

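Implemented as a scikit-learn Pipeline (standard scaling, PCA, then K-Means):

from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
segmentation_pipeline = Pipeline([
    ('scaler', StandardScaler()),     # zero mean, unit variance per feature
    ('pca', PCA(n_components=0.90)),  # keep enough components to explain 90% of the variance
    ('cluster', KMeans(n_clusters=4, n_init=10, random_state=42)),  # fixed seed for reproducible clusters
])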

Pipeline supports:

  • Re-training
  • Batch scoring
  • Real-time inference
  • Reproducibility
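Re-training, batch scoring and single-row inference can all go through the same fitted pipeline object. This is a minimal sketch assuming the model is persisted with `joblib` (installed with scikit-learn). The helper names `build_pipeline`, `train_and_save` and `score_batch` and the file path are hypothetical. With `n_components=0.90`, PCA keeps the smallest number of components whose cumulative explained variance exceeds 90%.

```python
import joblib
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_pipeline(seed: int = 42) -> Pipeline:
    """Scaler -> PCA -> K-Means, with a fixed seed so retraining is reproducible."""
    return Pipeline([
        ("scaler", StandardScaler()),
        ("pca", PCA(n_components=0.90)),
        ("cluster", KMeans(n_clusters=4, n_init=10, random_state=seed)),
    ])


def train_and_save(X_train: np.ndarray, path: str) -> Pipeline:
    """Re-training: fit a fresh pipeline and persist it as a single artifact."""
    pipeline = build_pipeline().fit(X_train)
    joblib.dump(pipeline, path)
    return pipeline


def score_batch(path: str, X_new: np.ndarray) -> np.ndarray:
    """Batch (or single-row) scoring: load the artifact and assign cluster ids."""
    pipeline = joblib.load(path)
    return pipeline.predict(X_new)
```

Persisting the whole pipeline, rather than only the K-Means model, guarantees that new data is scaled and projected with exactly the statistics learned at training time.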

🌐 6️⃣ Deployment Architecture

🔹 REST API (FastAPI)

Endpoints:

  • /predict → Assign cluster
  • /health → Check service
  • /retrain → Trigger retraining

🔹 Dockerized Service

docker build -t segmentation-api .
docker run -p 8000:8000 segmentation-api

📊 7️⃣ Monitoring & MLOps

  • Cluster drift detection
  • Distribution monitoring
  • Re-clustering trigger logic
  • Versioned models
  • Experiment tracking (optionally with MLflow)
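Cluster drift detection and re-clustering trigger logic can be sketched with the population stability index (PSI), applied here to the share of customers in each cluster at training time versus in live traffic. PSI is a swapped-in technique, since the README does not name its drift metric. The 0.25 threshold follows a common rule of thumb (below 0.1 stable, above 0.25 significant shift) and is an assumption, as are the helper names.

```python
import numpy as np


def population_stability_index(expected, actual, eps: float = 1e-6) -> float:
    """PSI between two distributions given as counts over the same bins."""
    p = np.asarray(expected, dtype=float)
    q = np.asarray(actual, dtype=float)
    p = np.clip(p / p.sum(), eps, None)  # normalize; avoid log(0) on empty bins
    q = np.clip(q / q.sum(), eps, None)
    return float(np.sum((q - p) * np.log(q / p)))


def cluster_share_drift(train_labels, live_labels, n_clusters: int = 4, threshold: float = 0.25):
    """Compare cluster membership shares; flag re-clustering when PSI exceeds the threshold."""
    expected = np.bincount(train_labels, minlength=n_clusters)
    actual = np.bincount(live_labels, minlength=n_clusters)
    psi = population_stability_index(expected, actual)
    return psi, psi > threshold
```

The same `population_stability_index` function can be applied per input feature, using histogram counts over shared bin edges, for the distribution monitoring item above.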

About

Production-ready customer segmentation system using K-Means clustering with PCA-based dimensionality reduction. Includes a full ML pipeline, FastAPI deployment, Docker support, and a monitoring strategy for scalable fintech applications.
