Customer Segmentation with K-Means Clustering

Overview

This project applies K-Means clustering to segment customers based on their age, annual income, and spending score using the Mall Customers dataset. The goal is to identify distinct customer groups to help the mall better understand its clientele and tailor marketing strategies.

Dataset

Source: Mall Customer Segmentation Data by vjchoudhary7 on Kaggle
Filename: Mall_Customers.csv
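
A minimal sketch of loading the file and checking the three clustering features for missing values. The `CustomerID` and `Gender` column names and the sample rows are assumptions made for illustration. The helper `load_features` is hypothetical and not taken from the notebook. To use the real data, pass "Mall_Customers.csv" in place of the in-memory sample.

```python
import io

import pandas as pd

# Feature columns named in the Project Steps section
FEATURES = ["Age", "Annual Income (k$)", "Spending Score (1-100)"]


def load_features(source):
    """Read the CSV and return only the three clustering features."""
    df = pd.read_csv(source)
    absent = [col for col in FEATURES if col not in df.columns]
    if absent:
        raise KeyError(f"missing expected columns: {absent}")
    return df[FEATURES]


# Made-up rows standing in for Mall_Customers.csv so the sketch runs anywhere.
# The second row deliberately lacks a spending score.
sample_csv = io.StringIO(
    "CustomerID,Gender,Age,Annual Income (k$),Spending Score (1-100)\n"
    "1,Male,19,15,39\n"
    "2,Female,35,60,\n"
    "3,Female,52,88,17\n"
)

features = load_features(sample_csv)
missing_counts = features.isnull().sum()  # missing values per feature
print(missing_counts)
```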

How to Run

1. Open the project folder in VS Code.
2. Open the Jupyter notebook file: customer_segmentation.ipynb.
3. Run all cells in order. Make sure you have the required libraries installed:
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn

You can install any missing libraries using: pip install pandas numpy matplotlib seaborn scikit-learn
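
Before running the notebook, it can help to confirm which of these libraries are already present. The sketch below uses the standard library's `importlib.metadata`. The helper `installed_versions` is a hypothetical name, not part of the project.

```python
from importlib import metadata

# Distribution names as they appear in the pip command
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn", "scikit-learn"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


status = installed_versions(REQUIRED)
for name, version in status.items():
    print(f"{name}: {version or 'NOT INSTALLED'}")
```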

Project Steps

Data Loading & Exploration: Loaded and explored the Mall Customers dataset.
Data Cleaning: Checked for missing values and selected relevant features (Age, Annual Income (k$), Spending Score (1-100)).
Visualization: Used pairplots and scatterplots to visualize relationships between features.
Clustering: Applied K-Means clustering and used the Elbow Method to determine the optimal number of clusters.
Interpretation: Analyzed and interpreted the resulting customer segments.
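
The clustering step above can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's actual code. Synthetic data from `make_blobs` stands in for the three features, the features are left unscaled (the README does not say whether the notebook scales them), and `k=5` is used for the final fit only because five clusters are reported under Cluster Interpretation. The helper `elbow_inertias` is a hypothetical name.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in for (Age, Annual Income, Spending Score); not real data
X, _ = make_blobs(
    n_samples=200, centers=5, n_features=3, cluster_std=1.0, random_state=42
)


def elbow_inertias(X, k_values, random_state=42):
    """Fit K-Means once per k and collect the within-cluster sum of squares."""
    inertias = []
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        model.fit(X)
        inertias.append(model.inertia_)
    return inertias


k_values = list(range(1, 11))
inertias = elbow_inertias(X, k_values)
for k, inertia in zip(k_values, inertias):
    print(f"k={k:2d}  inertia={inertia:.1f}")

# Final fit with the k suggested by the elbow (5 is assumed here)
final_model = KMeans(n_clusters=5, n_init=10, random_state=42)
labels = final_model.fit_predict(X)
```

Plotting `inertias` against `k_values` with matplotlib gives the usual elbow curve. The "elbow" is the point where adding more clusters stops reducing inertia by much.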

Cluster Interpretation

Cluster 0:
- Average Age: ~46
- Average Annual Income: ~$48,000
- Average Spending Score: ~42
Older customers with moderate income and average spending behavior. Likely represent mature consumers with balanced purchasing habits.

Cluster 1:
- Average Age: ~32
- Average Annual Income: ~$108,000
- Average Spending Score: ~83
Young, high-income, high-spending customers. Likely the most valuable segment for the mall.

Cluster 2:
- Average Age: ~25
- Average Annual Income: ~$30,000
- Average Spending Score: ~74
Very young customers with low income but high spending. Possibly students or young professionals who spend significantly relative to their earnings.

Cluster 3:
- Average Age: ~40
- Average Annual Income: ~$87,000
- Average Spending Score: ~19
Middle-aged, high-income customers with very low spending. Likely conservative or selective shoppers.

Cluster 4:
- Average Age: ~32
- Average Annual Income: ~$76,000
- Average Spending Score: ~78
Young adults with moderately high income and high spending. A financially stable and active consumer segment.
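
Per-cluster averages like the ones listed here are typically obtained by grouping the labelled data and taking the mean of each feature. The sketch below shows one way to do that. The `Cluster` column name, the helper `cluster_profile`, and the four rows are all made up for illustration and are not the project's data.

```python
import pandas as pd


def cluster_profile(df, label_col="Cluster"):
    """Return the mean of every feature per cluster, rounded to one decimal."""
    return df.groupby(label_col).mean().round(1)


# Made-up rows; in the notebook the labels would come from the fitted K-Means model
df = pd.DataFrame(
    {
        "Age": [45, 47, 31, 33],
        "Annual Income (k$)": [48, 50, 105, 111],
        "Spending Score (1-100)": [40, 44, 80, 86],
        "Cluster": [0, 0, 1, 1],
    }
)

profile = cluster_profile(df)
print(profile)
```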

Dependencies

- Python 3.x
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn

Author

Nitin Nandan
Internship Project for CodeClause
