This project analyzes an online retail dataset to understand customer purchasing behavior, product performance, and revenue patterns.
The goal is to demonstrate how SQL and Python can be combined to generate actionable business insights from transactional data.
The analysis includes:
- Data cleaning and preprocessing
- SQL-based revenue analysis
- Customer segmentation using RFM (Recency, Frequency, Monetary)
- Business insights and strategic recommendations
This type of analysis is commonly used in retail analytics, business intelligence, and data analyst roles.
This project aims to answer the following questions:
- Which countries generate the most revenue?
- Which products drive the highest sales?
- How does revenue change over time?
- Which customers are the most valuable?
- How can businesses segment customers for targeted marketing?
Online Retail Dataset
Source:
UCI Machine Learning Repository / Kaggle
The dataset contains transactional data from a UK-based online retailer including:
- Invoice numbers
- Product descriptions
- Quantities purchased
- Unit prices
- Transaction timestamps
- Customer IDs
- Country of purchase
retail_analytics_sql_python/
│
├── data/
│ ├── raw/
│ │ └── online_retail.xlsx
│ └── processed/
│ ├── retail_cleaned.csv
│ └── customer_rfm_segments.csv
│
├── notebooks/
│ ├── 01_data_cleaning.ipynb
│ ├── 02_sql_analysis.ipynb
│ ├── 03_customer_segmentation.ipynb
│ └── 04_business_insights.ipynb
│
├── sql/
│ └── queries.sql
│
├── requirements.txt
├── .gitignore
└── README.md
The dataset required preprocessing to remove invalid records and prepare it for analysis.
Steps included:
- Removing cancelled transactions
- Removing negative quantities and prices
- Handling missing customer IDs
- Creating a revenue feature
- Converting transaction timestamps
SQL queries were used to analyze:
- Revenue by country
- Top selling products
- Monthly revenue trends
- Customer purchasing frequency
This demonstrates how SQL can be integrated with Python workflows.
Customers were segmented using the RFM model:
| Metric | Meaning |
|---|---|
| Recency | Days since last purchase |
| Frequency | Number of purchases |
| Monetary | Total spending |
This allows businesses to identify:
- High-value customers
- Loyal customers
- Customers at risk of churn
The United Kingdom dominates total revenue, suggesting the business is heavily dependent on a single geographic market.
A small number of products generate a large share of total revenue.
Top products include:
- PAPER CRAFT, LITTLE BIRDIE
- REGENCY CAKESTAND 3 TIER
- WHITE HANGING HEART T-LIGHT HOLDER
This suggests strong demand for decorative and gift-related products.
Revenue increases significantly between September and November, indicating strong seasonal demand related to holiday shopping.
Customer segmentation shows that:
- Most customers fall into the Regular Customers group
- A smaller group of Best Customers contribute disproportionately to revenue
These customers represent key opportunities for targeted marketing and loyalty programs.
Based on the analysis:
- Customer Retention
Implement loyalty programs to reward high-value customers.
- Targeted Marketing
Re-engage customers who have not purchased recently.
- Product Strategy
Increase stock and promotion of top-performing products.
- Market Expansion
Explore opportunities to expand sales beyond the UK market to reduce geographic dependency.
This dashboard highlights:
- Revenue trend over time
- Revenue by country
- Top performing products
- Customer segment distribution
- Business-focused KPI summary
Click below to view the interactive dashboard:
Python
Pandas
SQL (SQLite)
Matplotlib
Seaborn
Jupyter Notebook
- Data Cleaning and Preprocessing
- SQL Analytics
- Exploratory Data Analysis
- Customer Segmentation (RFM)
- Business Insight Generation
- Data Visualization
Nathaniel Magit