Customer Satisfaction (CSAT) is a critical indicator of service quality and customer loyalty in e-commerce. This project analyzes large-scale customer support interaction data to identify the key drivers of customer satisfaction and to build a machine learning model that predicts whether a customer is Satisfied or Unsatisfied after an interaction.
The project follows a complete end‑to‑end Data Science pipeline including:
- Data cleaning & wrangling
- Exploratory Data Analysis (EDA)
- Statistical hypothesis testing
- Feature engineering
- Handling class imbalance (SMOTE)
- Machine learning model training & evaluation
- Hyperparameter tuning
- Business insights & conclusions
Dataset overview:
- Records: 85,907
- Features: 20 (categorical, numerical, temporal)
- Target Variable: CSAT Score (1–5), later binarized into Satisfied vs. Unsatisfied for modeling
Key columns include:
- Channel Type (Inbound, Outcall, Email)
- Issue Category & Sub‑Category
- Agent, Supervisor, Manager
- Tenure Bucket & Agent Shift
- Issue Reported Time & Response Time
After cleaning and feature selection, the final modeling dataset contained 82,779 rows and 15 features.
Tech stack:
- Language: Python 3.12
- Libraries:
  - NumPy, Pandas
  - Matplotlib, Seaborn
  - Scikit‑learn
  - SciPy
  - Imbalanced‑learn (SMOTE)
- Model Persistence: joblib
- Environment: Jupyter Notebook, Anaconda
Data cleaning & feature engineering steps:
- Dropped high‑missing and low‑relevance columns
- Converted datetime columns into proper formats
- Engineered a new feature: response_time_minutes
- Extracted time‑based features (hour, day of week, survey day)
- Median imputation for skewed numeric values
- Outlier treatment using 99th percentile capping
- Label encoding for categorical features
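The preprocessing steps above can be sketched as follows. This is a minimal illustration on toy data; the column names (`issue_reported_time`, `issue_responded_time`, `channel`) are placeholders and should be adjusted to the actual CSV schema.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy rows standing in for the support-interaction data (hypothetical column names)
df = pd.DataFrame({
    "issue_reported_time": ["2023-08-01 09:15", "2023-08-01 10:00"],
    "issue_responded_time": ["2023-08-01 09:45", "2023-08-01 12:30"],
    "channel": ["Inbound", "Email"],
})

# Convert datetime columns into proper formats
for col in ["issue_reported_time", "issue_responded_time"]:
    df[col] = pd.to_datetime(df[col])

# Engineered feature: response_time_minutes
df["response_time_minutes"] = (
    df["issue_responded_time"] - df["issue_reported_time"]
).dt.total_seconds() / 60

# Time-based features
df["report_hour"] = df["issue_reported_time"].dt.hour
df["report_dayofweek"] = df["issue_reported_time"].dt.dayofweek

# Median imputation for skewed numerics, then 99th-percentile capping
median = df["response_time_minutes"].median()
df["response_time_minutes"] = df["response_time_minutes"].fillna(median)
cap = df["response_time_minutes"].quantile(0.99)
df["response_time_minutes"] = df["response_time_minutes"].clip(upper=cap)

# Label encoding for categorical features
df["channel_encoded"] = LabelEncoder().fit_transform(df["channel"])
```

Capping is applied after imputation here so the median is not distorted by extreme values already removed; the notebook may order these steps differently.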
Key insights from EDA:
- 69% of customers rated CSAT = 5, indicating strong overall satisfaction
- Faster response times strongly correlate with higher CSAT
- Morning and split shifts perform slightly better than night shifts
- Experienced agents consistently achieve higher CSAT
- Refunds & returns show lower satisfaction and higher response times
Multiple univariate, bivariate, and multivariate visualizations were created to validate these patterns.
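A typical bivariate check behind these insights can be sketched with a simple groupby; the values below are illustrative toy data, not figures from the project.

```python
import pandas as pd

# Toy interaction data (hypothetical values, for illustration only)
df = pd.DataFrame({
    "csat_score": [5, 5, 4, 2, 1, 5],
    "response_time_minutes": [12, 20, 35, 90, 150, 15],
    "agent_shift": ["Morning", "Night", "Morning", "Night", "Night", "Split"],
})

# Average response time per CSAT score (faster responses -> higher CSAT)
rt_by_csat = df.groupby("csat_score")["response_time_minutes"].mean()

# Share of top (CSAT = 5) ratings per agent shift
top_share = (df["csat_score"] == 5).groupby(df["agent_shift"]).mean()

print(rt_by_csat)
print(top_share)
```

The same aggregations feed directly into bar plots with Matplotlib or Seaborn.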
Three statistical tests were performed:
- ANOVA: Response Time vs CSAT → Significant relationship (p < 0.001)
- Chi‑Square: Channel Type vs CSAT → CSAT depends on channel (p < 0.001)
- T‑Test: New vs Experienced Agents → Mixed but informative results
These tests ensured that insights from EDA were statistically valid.
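The three tests map onto standard SciPy calls. The sketch below uses synthetic samples (randomly generated, not the project's data) purely to show the API shape.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# ANOVA: response-time samples for each of the five CSAT groups (synthetic)
groups = [rng.normal(loc=mu, scale=10, size=200) for mu in (60, 50, 40, 30, 20)]
f_stat, p_anova = stats.f_oneway(*groups)

# Chi-square: toy contingency table, rows = channel types, cols = Satisfied/Unsatisfied
contingency = np.array([[120, 30], [200, 90], [400, 60]])
chi2, p_chi2, dof, expected = stats.chi2_contingency(contingency)

# Welch's t-test: new vs experienced agents' CSAT (synthetic scores)
new_agents = rng.normal(4.0, 0.8, 300)
experienced = rng.normal(4.2, 0.7, 300)
t_stat, p_ttest = stats.ttest_ind(new_agents, experienced, equal_var=False)
```

With clearly separated group means, the ANOVA p-value comes out far below 0.001, mirroring the reported result.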
Class imbalance handling:
- The dataset was highly imbalanced (~82% Satisfied vs. 18% Unsatisfied)
- Applied SMOTE (Synthetic Minority Oversampling Technique) on training data
- Balanced class distribution to 50/50 before model training
Three classification models were trained and evaluated:
| Model | Accuracy | F1‑Score |
|---|---|---|
| Logistic Regression | 58% | 0.71 |
| Random Forest | 67% | 0.78 |
| Gradient Boosting (Best) | 74% | 0.84 |
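A minimal sketch of the three-model comparison, using synthetic data in place of the prepared dataset (default hyperparameters here, so the scores will not match the table):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    results[name] = (accuracy_score(y_te, pred), f1_score(y_te, pred))
```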
Hyperparameter tuning:
- Tuned using GridSearchCV (5‑fold CV)
- Final parameters: learning_rate = 0.2, max_depth = 7, n_estimators = 300
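The search setup can be sketched as follows. A smaller synthetic dataset and a reduced grid keep the example fast; the full run used the grid that produced learning_rate = 0.2, max_depth = 7, n_estimators = 300.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, random_state=0)

# Reduced grid for illustration; the real search covered a wider range
param_grid = {
    "learning_rate": [0.1, 0.2],
    "max_depth": [3, 7],
    "n_estimators": [50, 100],
}

search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    cv=5,            # 5-fold cross-validation, as in the project
    scoring="f1",
    n_jobs=-1,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```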
Final tuned performance:
- Accuracy: ~74%
- F1‑Score: ~0.84
- Recall (Satisfied class): ~0.84
The most influential features in the final model:
- Response Time
- Agent Name
- Supervisor
- Issue Category & Sub‑Category
- Tenure Bucket
- Agent Shift
These features provide direct operational levers for business improvement.
This project provides actionable insights for business teams:
- Reduce response time to directly improve satisfaction
- Assign complex issues to experienced agents
- Improve refund & return workflows
- Optimize staffing by shift performance
The trained model can be integrated into a live dashboard to flag high‑risk (unsatisfied) interactions in real time.
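Such an integration boils down to loading the pickled model with joblib and thresholding the predicted probability of the Unsatisfied class. The sketch below trains a small stand-in model rather than loading the repo's `best_csat_model_GradientBoosting.pkl`; the 15-feature shape matches the final dataset, but the class encoding (0 = Unsatisfied, 1 = Satisfied) is an assumption.

```python
import joblib
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Stand-in model on random features (assumed encoding: 0 = Unsatisfied, 1 = Satisfied)
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 15))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
joblib.dump(GradientBoostingClassifier(random_state=0).fit(X, y), "csat_model_demo.pkl")

# Dashboard side: reload the model and flag interactions at risk of dissatisfaction
loaded = joblib.load("csat_model_demo.pkl")
unsat_proba = loaded.predict_proba(X[:5])[:, 0]  # P(class 0 = Unsatisfied)
high_risk = unsat_proba > 0.5                    # flag for follow-up
```

The 0.5 threshold is illustrative; in production it would be tuned against the cost of missed unsatisfied customers versus follow-up workload.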
Repository structure:

    ├── CSAT_Prediction.ipynb
    ├── best_csat_model_GradientBoosting.pkl
    ├── Customer_support_data.csv
    └── README.md
To run the project:

1. Clone the repository
2. Install dependencies: `pip install -r requirements.txt`
3. Open and run the notebook: `jupyter notebook CSAT_Prediction.ipynb`
This project successfully demonstrates how data‑driven analytics and machine learning can be used to understand and predict customer satisfaction at scale. By combining statistical validation with predictive modeling, the study bridges the gap between business insight and AI‑driven decision making.
Developed as part of an individual data science project for portfolio and interview preparation.