vnscka/Customer-Segmentation-Analysis

Customer Segmentation Analysis Project

Python 3.8+ | License: MIT

Overview

This project segments customers using the Online Retail Dataset from the UCI Machine Learning Repository. The analysis combines SQL queries, K-means clustering, and statistical analysis to derive customer segments and insights.

Dataset

We use the Online Retail Dataset from the UCI Machine Learning Repository. This dataset contains:

  • 541,909 transactions
  • 8 attributes
  • Transactions from 01/12/2010 to 09/12/2011 (DD/MM/YYYY, i.e. 1 December 2010 to 9 December 2011)
  • Multi-country e-commerce sales data

Data Dictionary

Column         Description                          Type
InvoiceNo      Invoice number (6-digit unique)      Nominal
StockCode      Product code (5-digit unique)        Nominal
Description    Product name                         Nominal
Quantity       Quantity per transaction             Numeric
InvoiceDate    Invoice date and time                Numeric
UnitPrice      Unit price in sterling               Numeric
CustomerID     Customer number (5-digit unique)     Nominal
Country        Country of customer                  Nominal
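
As a rough illustration of how the columns above might be prepared for analysis, here is a minimal pandas sketch. It is not the contents of src/data_processing.py. The function name `clean_transactions`, the derived `TotalPrice` column, and the sample rows are assumptions made for this example. The filtering rules (dropping rows without a CustomerID, cancelled invoices whose InvoiceNo starts with "C", and non-positive quantities or prices) are one common choice for this dataset, not necessarily the one used here.

```python
import pandas as pd

# Tiny hand-made sample that mimics the dataset's columns (not real data).
raw = pd.DataFrame({
    "InvoiceNo": ["536365", "536365", "C536379", "536380"],
    "StockCode": ["85123A", "71053", "D", "22961"],
    "Quantity": [6, 6, -1, 24],
    "InvoiceDate": ["2010-12-01 08:26:00", "2010-12-01 08:26:00",
                    "2010-12-01 09:41:00", "2010-12-01 09:41:00"],
    "UnitPrice": [2.55, 3.39, 27.50, 1.45],
    "CustomerID": [17850.0, 17850.0, 14527.0, None],
})


def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning step: keep only usable purchase rows."""
    mask = (
        df["CustomerID"].notna()                              # customer must be known
        & ~df["InvoiceNo"].astype(str).str.startswith("C")    # drop cancellations
        & (df["Quantity"] > 0)
        & (df["UnitPrice"] > 0)
    )
    out = df.loc[mask].copy()
    out["InvoiceDate"] = pd.to_datetime(out["InvoiceDate"])
    # Store the ID as text, since it is a nominal label rather than a quantity.
    out["CustomerID"] = out["CustomerID"].astype(int).astype(str)
    # Assumed derived column: revenue per line item.
    out["TotalPrice"] = out["Quantity"] * out["UnitPrice"]
    return out.reset_index(drop=True)


clean = clean_transactions(raw)
print(clean[["InvoiceNo", "CustomerID", "InvoiceDate", "TotalPrice"]])
```

Only the first two sample rows survive: the third is a cancellation and the fourth has no CustomerID.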

Project Structure

customer-segmentation/
│
├── data/                      # Data files
│   ├── raw/                  # Raw data
│   └── processed/            # Processed data
│
├── notebooks/                # Jupyter notebooks
│   ├── 01_data_preparation.ipynb
│   ├── 02_exploratory_analysis.ipynb
│   └── 03_segmentation_analysis.ipynb
│
├── src/                      # Source code
│   ├── __init__.py
│   ├── data_processing.py
│   ├── feature_engineering.py
│   └── visualization.py
│
├── tests/                    # Unit tests
│   └── test_data_processing.py
│
├── requirements.txt          # Project dependencies
├── setup.py                  # Package setup file
└── README.md                # Project documentation

Installation

  1. Clone the repository:
git clone https://github.com/yourusername/customer-segmentation.git
cd customer-segmentation
  2. Create a virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:
pip install -r requirements.txt

Usage

  1. Data Preparation:
python src/data_processing.py
  2. Run Analysis:
python src/feature_engineering.py
  3. Or use Jupyter notebooks:
jupyter notebook notebooks/01_data_preparation.ipynb

Features

  • RFM (Recency, Frequency, Monetary) Analysis
  • K-means Clustering
  • SQL-based Customer Analytics
  • Interactive Visualizations
  • Statistical Analysis
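
To make the RFM and SQL features above more concrete, the following sketch computes Recency, Frequency, and Monetary values per customer with a single SQL query, run through Python's built-in sqlite3 module. It is an illustration under assumptions, not the project's actual query: the table name `transactions`, the snapshot date, and the sample rows are all hypothetical. Recency is taken as whole days between the snapshot date and the customer's last invoice, frequency as the number of distinct invoices, and monetary as the sum of Quantity × UnitPrice.

```python
import sqlite3

# Hypothetical sample rows: (InvoiceNo, CustomerID, InvoiceDate, Quantity, UnitPrice)
rows = [
    ("536365", "17850", "2010-12-01 08:26:00", 6, 2.55),
    ("536365", "17850", "2010-12-01 08:26:00", 6, 3.39),
    ("536400", "17850", "2010-12-05 10:00:00", 2, 5.00),
    ("536367", "13047", "2010-12-08 09:00:00", 10, 1.25),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "InvoiceNo TEXT, CustomerID TEXT, InvoiceDate TEXT, "
    "Quantity INTEGER, UnitPrice REAL)"
)
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?, ?)", rows)

# One row per customer: days since last purchase, invoice count, total spend.
RFM_QUERY = """
SELECT CustomerID,
       CAST(julianday(:snapshot) - julianday(MAX(InvoiceDate)) AS INTEGER) AS recency_days,
       COUNT(DISTINCT InvoiceNo) AS frequency,
       ROUND(SUM(Quantity * UnitPrice), 2) AS monetary
FROM transactions
GROUP BY CustomerID
ORDER BY CustomerID
"""

# The snapshot date is an assumption; a common choice is the day after
# the last transaction in the data.
rfm = conn.execute(RFM_QUERY, {"snapshot": "2010-12-10"}).fetchall()
for customer_id, recency, frequency, monetary in rfm:
    print(customer_id, recency, frequency, monetary)
conn.close()
```

Counting distinct invoices rather than rows matters here because one invoice usually spans several line items.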

Results

The analysis identifies distinct customer segments based on:

  • Purchase behavior
  • Transaction frequency
  • Monetary value
  • Product preferences
  • Geographic distribution

Detailed results and visualizations are available in the notebooks.
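
For readers who want a feel for the clustering step before opening the notebooks, here is a minimal scikit-learn sketch of K-means on RFM-style features. The numbers are invented for illustration and are not results from this project. The log transform, the standard scaling, and the choice of two clusters are assumptions that suit this toy input; the notebooks may preprocess the data and pick the number of clusters differently.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Invented RFM rows: [recency_days, frequency, monetary]
rfm = np.array([
    [5, 40, 9000.0],    # recent, frequent, high spend
    [8, 35, 8500.0],
    [3, 45, 9500.0],
    [200, 2, 150.0],    # lapsed, infrequent, low spend
    [250, 1, 90.0],
    [220, 3, 200.0],
])

# RFM values are typically right-skewed, so compress them with log1p,
# then standardize so no single feature dominates the distance metric.
scaled = StandardScaler().fit_transform(np.log1p(rfm))

# n_clusters=2 is chosen only because the toy data has two obvious groups.
model = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = model.fit_predict(scaled)

for row, label in zip(rfm, labels):
    print(f"segment {label}: recency={row[0]:.0f}, "
          f"frequency={row[1]:.0f}, monetary={row[2]:.2f}")
```

On real data the number of clusters is usually chosen with a diagnostic such as the elbow method or silhouette score rather than fixed in advance.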

Contributing

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • UCI Machine Learning Repository for the dataset
  • scikit-learn documentation and community
  • Python data science community
