omprakash0702/Customer-Satisfaction-CSAT-Prediction-using-Machine-Learning

Customer Satisfaction (CSAT) Prediction using Machine Learning

📌 Project Overview

Customer Satisfaction (CSAT) is a critical indicator of service quality and customer loyalty in e-commerce. This project analyzes large-scale customer support interaction data to identify the key drivers of customer satisfaction and to build a machine learning model that predicts whether a customer is Satisfied or Unsatisfied after an interaction.

The project follows a complete end‑to‑end Data Science pipeline including:

  • Data cleaning & wrangling
  • Exploratory Data Analysis (EDA)
  • Statistical hypothesis testing
  • Feature engineering
  • Handling class imbalance (SMOTE)
  • Machine learning model training & evaluation
  • Hyperparameter tuning
  • Business insights & conclusions

🗂 Dataset Summary

  • Records: 85,907
  • Features: 20 (categorical, numerical, temporal)
  • Target Variable: CSAT Score (1–5), converted to a binary Satisfied / Unsatisfied label for modeling

Key columns include:

  • Channel Type (Inbound, Outcall, Email)
  • Issue Category & Sub‑Category
  • Agent, Supervisor, Manager
  • Tenure Bucket & Agent Shift
  • Issue Reported Time & Response Time

After cleaning and feature selection, the final modeling dataset contained 82,779 rows and 15 features.


🛠 Tools & Technologies Used

  • Language: Python 3.12

  • Libraries:

    • NumPy, Pandas
    • Matplotlib, Seaborn
    • Scikit‑learn
    • SciPy
    • Imbalanced‑learn (SMOTE)
  • Model Persistence: joblib

  • Environment: Jupyter Notebook, Anaconda


🧹 Data Cleaning & Feature Engineering

  • Dropped high‑missing and low‑relevance columns
  • Converted datetime columns into proper formats
  • Engineered a new feature: response_time_minutes
  • Extracted time‑based features (hour, day of week, survey day)
  • Median imputation for skewed numeric values
  • Outlier treatment using 99th percentile capping
  • Label encoding for categorical features
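
The cleaning steps listed above can be sketched roughly as follows. This is a minimal illustration on a tiny synthetic frame, not the notebook's actual code: the column names (`issue_reported_at`, `issue_responded_at`, `channel_name`) and the derived feature names other than `response_time_minutes` are assumptions.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Tiny synthetic stand-in for the support data; column names are assumed.
df = pd.DataFrame({
    "issue_reported_at": ["2023-08-01 09:00", "2023-08-01 22:15",
                          "2023-08-02 13:30", "2023-08-03 07:45"],
    "issue_responded_at": ["2023-08-01 09:12", "2023-08-02 01:15",
                           "2023-08-02 13:35", None],
    "channel_name": ["Inbound", "Outcall", "Email", "Inbound"],
})

# Convert text timestamps to datetimes (unparseable or missing values become NaT).
for col in ["issue_reported_at", "issue_responded_at"]:
    df[col] = pd.to_datetime(df[col], errors="coerce")

# Engineered feature: response time in minutes.
df["response_time_minutes"] = (
    df["issue_responded_at"] - df["issue_reported_at"]
).dt.total_seconds() / 60

# Time-based features taken from the report timestamp (hypothetical names).
df["report_hour"] = df["issue_reported_at"].dt.hour
df["report_dayofweek"] = df["issue_reported_at"].dt.dayofweek

# Median imputation: the median is robust to the long right tail of response times.
median_rt = df["response_time_minutes"].median()
df["response_time_minutes"] = df["response_time_minutes"].fillna(median_rt)

# Outlier treatment: cap values above the 99th percentile.
cap = df["response_time_minutes"].quantile(0.99)
df["response_time_minutes"] = df["response_time_minutes"].clip(upper=cap)

# Label encoding: map each category to an integer code.
df["channel_name_enc"] = LabelEncoder().fit_transform(df["channel_name"])

print(df[["response_time_minutes", "report_hour", "channel_name_enc"]])
```

In a real pipeline the median and the percentile cap would normally be computed on the training split only and then reused on the test split, so that no test information leaks into preprocessing.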

📊 Exploratory Data Analysis (EDA)

Key insights from EDA:

  • 69% of customers rated CSAT = 5, indicating strong overall satisfaction
  • Faster response times strongly correlate with higher CSAT
  • Morning and split shifts perform slightly better than night shifts
  • Experienced agents consistently achieve higher CSAT
  • Refunds & returns show lower satisfaction and higher response times

Multiple univariate, bivariate, and multivariate visualizations were created to validate these patterns.


🧪 Hypothesis Testing

Three statistical tests were performed:

  1. ANOVA: Response Time vs CSAT → Significant relationship (p < 0.001)
  2. Chi‑Square: Channel Type vs CSAT → CSAT depends on channel (p < 0.001)
  3. T‑Test: New vs Experienced Agents → Mixed but informative results

These tests checked whether the patterns observed in EDA were statistically significant rather than artifacts of sampling.
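
For readers unfamiliar with these tests, the sketch below shows how each could be run with SciPy. The data is synthetic and the column names are assumptions; the use of Welch's variant for the t-test is also an assumption, since the README does not say which form was used.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic data only: response time is constructed to shrink as CSAT rises.
rng = np.random.default_rng(0)
n = 600
csat = rng.integers(1, 6, size=n)  # scores 1..5
df = pd.DataFrame({
    "csat_score": csat,
    "response_time_minutes": rng.exponential(scale=60, size=n) + (5 - csat) * 20,
    "channel": rng.choice(["Inbound", "Outcall", "Email"], size=n),
    "experienced": rng.random(n) < 0.5,
})

# 1. One-way ANOVA: does mean response time differ across CSAT scores?
groups = [g["response_time_minutes"].to_numpy() for _, g in df.groupby("csat_score")]
f_stat, p_anova = stats.f_oneway(*groups)

# 2. Chi-square test of independence: is CSAT associated with channel type?
table = pd.crosstab(df["channel"], df["csat_score"])
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)

# 3. Two-sample t-test (Welch's, unequal variances): new vs experienced agents.
t_stat, p_t = stats.ttest_ind(
    df.loc[df["experienced"], "csat_score"],
    df.loc[~df["experienced"], "csat_score"],
    equal_var=False,
)

print(f"ANOVA p={p_anova:.3g}, chi-square p={p_chi2:.3g} (dof={dof}), t-test p={p_t:.3g}")
```

With tens of thousands of rows, even small effects reach p < 0.001, so effect sizes are worth reporting alongside the p-values.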


⚖️ Handling Imbalanced Data

  • The dataset was highly imbalanced (~82% satisfied vs 18% unsatisfied)
  • Applied SMOTE (Synthetic Minority Oversampling Technique) on training data
  • Balanced class distribution to 50/50 before model training

🤖 Machine Learning Models Used

Three classification models were trained and evaluated:

Model                      Accuracy   F1‑Score
Logistic Regression        58%        0.71
Random Forest              67%        0.78
Gradient Boosting (Best)   74%        0.84

Best Model: Gradient Boosting Classifier

  • Tuned using GridSearchCV (5‑fold CV)

  • Final Parameters:

    • learning_rate = 0.2
    • max_depth = 7
    • n_estimators = 300

Final tuned performance:

  • Accuracy: ~74%
  • F1‑Score: ~0.84
  • Recall (Satisfied class): ~0.84
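
The tuning and evaluation described above could look roughly like the sketch below. It runs on synthetic data with a deliberately tiny grid so that it finishes quickly; the grid values, the `scoring="f1"` choice, and the split are assumptions, and the printed scores will not match the figures reported in this README.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the encoded CSAT features and binary target.
X, y = make_classification(n_samples=600, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Small illustrative grid; a real search would cover more values
# (for example learning_rate, max_depth, and n_estimators ranges).
param_grid = {
    "learning_rate": [0.1, 0.2],
    "max_depth": [3],
    "n_estimators": [50],
}

# 5-fold cross-validated grid search; the scoring metric is an assumption.
search = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="f1",
)
search.fit(X_train, y_train)

# Evaluate the refitted best estimator on the held-out test split.
y_pred = search.best_estimator_.predict(X_test)
metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "f1": f1_score(y_test, y_pred),
    "recall_positive": recall_score(y_test, y_pred),
}

print(search.best_params_)
print(metrics)
```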

⭐ Key Features Influencing CSAT

  • Response Time
  • Agent Name
  • Supervisor
  • Issue Category & Sub‑Category
  • Tenure Bucket
  • Agent Shift

These features provide direct operational levers for business improvement.
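
One common way to obtain a ranking like this is the impurity-based `feature_importances_` attribute of a fitted tree ensemble. Whether the notebook used this method is not stated, so the sketch below is only an illustration; the feature names are hypothetical labels attached to synthetic columns, and the resulting order carries no meaning.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical feature names attached to synthetic columns.
feature_names = [
    "response_time_minutes", "agent_name", "supervisor", "category",
    "sub_category", "tenure_bucket", "agent_shift",
]
X, y = make_classification(
    n_samples=500, n_features=7, n_informative=4, random_state=1
)

model = GradientBoostingClassifier(n_estimators=50, random_state=1).fit(X, y)

# Impurity-based importances are normalized to sum to 1 across features.
importances = pd.Series(
    model.feature_importances_, index=feature_names
).sort_values(ascending=False)

print(importances)
```

Impurity-based importances can overstate high-cardinality, label-encoded columns such as agent names; `sklearn.inspection.permutation_importance` on held-out data is a useful cross-check.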


💼 Business Impact

This project provides actionable insights for business teams:

  • Reduce response time to directly improve satisfaction
  • Assign complex issues to experienced agents
  • Improve refund & return workflows
  • Optimize staffing by shift performance

The trained model can be integrated into a live dashboard to flag high‑risk (likely unsatisfied) interactions in real time.
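
As a rough sketch of that integration, a persisted model can be loaded with joblib and used to flag interactions whose predicted probability of dissatisfaction crosses a threshold. Everything here is illustrative: the model is a small stand-in trained on synthetic data, `flag_high_risk` is a hypothetical helper, the 0.5 threshold is arbitrary, and treating class `0` as Unsatisfied is an assumption about the label encoding.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Train a small stand-in model on synthetic data, then persist and reload it
# the same way a saved .pkl model would be loaded in a dashboard backend.
X, y = make_classification(n_samples=400, n_features=6, random_state=7)
model = GradientBoostingClassifier(n_estimators=30, random_state=7).fit(X, y)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo_model.pkl")  # hypothetical file name
    joblib.dump(model, path)
    loaded = joblib.load(path)


def flag_high_risk(clf, X_new, threshold=0.5):
    """Return a boolean mask of rows whose P(unsatisfied) >= threshold.

    Assumes class label 0 means "Unsatisfied" (an assumption, not from the README).
    """
    unsat_col = list(clf.classes_).index(0)
    proba_unsat = clf.predict_proba(X_new)[:, unsat_col]
    return proba_unsat >= threshold


flags = flag_high_risk(loaded, X[:20])
print(f"{int(flags.sum())} of {len(flags)} interactions flagged")
```

New interactions must pass through the same preprocessing (encoders, imputation values, caps) that was fitted during training before they reach `predict_proba`.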


📁 Project Structure

├── CSAT_Prediction.ipynb
├── best_csat_model_GradientBoosting.pkl
├── Customer_support_data.csv
└── README.md

🚀 How to Run the Project

  1. Clone the repository

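  2. Install dependencies (requirements.txt is not listed in the project structure above; if it is missing, install the libraries named under Tools & Technologies directly):

    pip install -r requirements.txt

  3. Open and run the notebook: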

    jupyter notebook CSAT_Prediction.ipynb

✅ Conclusion

This project demonstrates how exploratory analysis, statistical testing, and machine learning can be combined to understand and predict customer satisfaction at scale. Pairing statistical validation with predictive modeling connects the business insights to a model that can act on them.


👤 Author

Developed as part of an individual data science project for portfolio and interview preparation.
