This project performs customer segmentation analysis using the Online Retail Dataset from the UCI Machine Learning Repository. The analysis combines SQL operations, K-means clustering, and statistical analysis to derive meaningful customer segments and insights.
The Online Retail Dataset contains:
- 541,909 transactions
- 8 attributes
- Transactions from 01/12/2010 to 09/12/2011 (DD/MM/YYYY, i.e. 1 December 2010 to 9 December 2011)
- Multi-country e-commerce sales data
| Column | Description | Type |
|---|---|---|
| InvoiceNo | Invoice number (6-digit code unique to each transaction; a leading "C" marks a cancellation) | Nominal |
| StockCode | Product code (5-digit code unique to each product) | Nominal |
| Description | Product name | Nominal |
| Quantity | Quantity of the product on the invoice line | Numeric |
| InvoiceDate | Invoice date and time | Numeric |
| UnitPrice | Price per unit in sterling (£) | Numeric |
| CustomerID | Customer number (5-digit code unique to each customer) | Nominal |
| Country | Country of customer | Nominal |
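As a rough illustration of how a table with these columns might be cleaned before segmentation, here is a minimal pandas sketch. The function name `clean_transactions`, the derived `TotalPrice` column, and the sample rows are all hypothetical; the filtering rules (drop rows without a `CustomerID`, drop cancellations, keep positive quantities and prices) are common choices for this dataset, not necessarily what `src/data_processing.py` does.

```python
import pandas as pd


def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of a raw transaction table (illustrative rules)."""
    mask = (
        df["CustomerID"].notna()                             # need a customer to segment
        & ~df["InvoiceNo"].astype(str).str.startswith("C")   # drop cancellations
        & (df["Quantity"] > 0)
        & (df["UnitPrice"] > 0)
    )
    out = df.loc[mask].copy()
    # Assumption: dates arrive as day-first strings such as "01/12/2010 08:26".
    out["InvoiceDate"] = pd.to_datetime(out["InvoiceDate"], format="%d/%m/%Y %H:%M")
    # Hypothetical helper column: line total in sterling.
    out["TotalPrice"] = out["Quantity"] * out["UnitPrice"]
    return out.reset_index(drop=True)


# Made-up rows that only mimic the schema; they are not taken from the dataset.
raw = pd.DataFrame({
    "InvoiceNo": ["100001", "C100002", "100003", "100004"],
    "StockCode": ["10001", "10002", "10003", "10004"],
    "Description": ["ITEM A", "ITEM B", "ITEM C", "ITEM D"],
    "Quantity": [6, -1, 2, 3],
    "InvoiceDate": ["01/12/2010 08:26", "01/12/2010 09:41",
                    "01/12/2010 10:03", "02/12/2010 11:15"],
    "UnitPrice": [2.5, 4.0, 3.0, 1.0],
    "CustomerID": [10001.0, 10002.0, None, 10003.0],
    "Country": ["United Kingdom", "France", "United Kingdom", "Germany"],
})

clean = clean_transactions(raw)
print(clean[["InvoiceNo", "CustomerID", "InvoiceDate", "TotalPrice"]])
```

Only the first and last sample rows survive: the second is a cancellation and the third has no customer number.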
customer-segmentation/
│
├── data/                        # Data files
│   ├── raw/                     # Raw data
│   └── processed/               # Processed data
│
├── notebooks/                   # Jupyter notebooks
│   ├── 01_data_preparation.ipynb
│   ├── 02_exploratory_analysis.ipynb
│   └── 03_segmentation_analysis.ipynb
│
├── src/                         # Source code
│   ├── __init__.py
│   ├── data_processing.py
│   ├── feature_engineering.py
│   └── visualization.py
│
├── tests/                       # Unit tests
│   └── test_data_processing.py
│
├── requirements.txt             # Project dependencies
├── setup.py                     # Package setup file
└── README.md                    # Project documentation
- Clone the repository:

      git clone https://github.com/yourusername/customer-segmentation.git
      cd customer-segmentation

- Create a virtual environment:

      python -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate

- Install dependencies:

      pip install -r requirements.txt

- Data Preparation:

      python src/data_processing.py

- Run Analysis:

      python src/feature_engineering.py

- Or use Jupyter notebooks:

      jupyter notebook notebooks/01_data_preparation.ipynb

- RFM (Recency, Frequency, Monetary) Analysis
- K-means Clustering
- SQL-based Customer Analytics
- Interactive Visualizations
- Statistical Analysis
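
To make the RFM and SQL-based items above more concrete, the following sketch computes recency, frequency, and monetary value per customer with a single SQL query, run from Python through the standard-library `sqlite3` module. The table name `transactions`, the snapshot date, and the sample rows are assumptions for illustration; the project's own queries may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE transactions (
        InvoiceNo   TEXT,
        CustomerID  INTEGER,
        InvoiceDate TEXT,     -- ISO dates so MAX() and julianday() work on text
        Quantity    INTEGER,
        UnitPrice   REAL
    )
    """
)

# Made-up rows; customer 1 has two invoices, customer 2 has one.
rows = [
    ("100001", 1, "2011-11-01", 2, 5.0),
    ("100001", 1, "2011-11-01", 1, 10.0),
    ("100002", 1, "2011-12-01", 4, 2.5),
    ("100003", 2, "2011-06-15", 10, 1.0),
]
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?, ?)", rows)

# Assumption: recency is measured from the day after the last date in the data.
SNAPSHOT = "2011-12-10"

RFM_QUERY = """
    SELECT CustomerID,
           CAST(julianday(?) - julianday(MAX(InvoiceDate)) AS INTEGER) AS recency_days,
           COUNT(DISTINCT InvoiceNo)                                   AS frequency,
           SUM(Quantity * UnitPrice)                                   AS monetary
    FROM transactions
    GROUP BY CustomerID
    ORDER BY CustomerID
"""

rfm = conn.execute(RFM_QUERY, (SNAPSHOT,)).fetchall()
for customer_id, recency_days, frequency, monetary in rfm:
    print(customer_id, recency_days, frequency, monetary)
```

Frequency counts distinct invoices rather than rows, so an invoice with several line items is counted once.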
The analysis identifies distinct customer segments based on:
- Purchase behavior
- Transaction frequency
- Monetary value
- Product preferences
- Geographic distribution
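
A minimal sketch of how segments could be derived from RFM-style features with scikit-learn's `KMeans` is shown below. The feature values are made up, and the log transform, standardisation, and `n_clusters=2` are illustrative assumptions; in practice the number of clusters would usually be chosen with a diagnostic such as the elbow method or silhouette score.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Illustrative rows: [recency_days, frequency, monetary]
rfm = np.array([
    [5, 20, 2000.0],
    [8, 18, 1800.0],
    [10, 22, 2500.0],
    [200, 1, 30.0],
    [250, 2, 45.0],
    [300, 1, 20.0],
])

# RFM features are often right-skewed, so a log transform is one common choice.
features = np.log1p(rfm)

# K-means is distance-based, so put all features on a comparable scale.
scaled = StandardScaler().fit_transform(features)

model = KMeans(n_clusters=2, n_init=10, random_state=42)
labels = model.fit_predict(scaled)

for row, label in zip(rfm, labels):
    print(f"recency={row[0]:.0f} frequency={row[1]:.0f} "
          f"monetary={row[2]:.0f} -> segment {label}")
```

With data this cleanly separated, the recent, frequent, high-spending customers land in one segment and the lapsed, low-spending customers in the other. The cluster numbers themselves are arbitrary labels.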
Detailed results and visualizations are available in the notebooks.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- UCI Machine Learning Repository for the dataset
- scikit-learn documentation and community
- Python data science community