Skip to content

soumimukherjee22/Customer-Behaviour-Analysis-Python-SQL-Tableau

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

21 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

πŸ›οΈ Customer Behaviour Analysis

End-to-End Data Analytics Portfolio Project β€” Python Β· SQL Β· Tableau

Python MySQL Tableau Pandas Status


πŸ“Œ Business Problem

A retail business operating across the United States had accumulated rich transactional data from 3,900 customers β€” but had never systematically analysed it. The result: marketing budgets spent without ROI, high-value customers not identified, and churning customers not detected until it was too late.

This project builds a complete, data-driven intelligence system to answer five critical business questions:

Pain Point The Problem
πŸ”΄ No Segmentation All 3,900 customers receive identical marketing regardless of loyalty or spend
πŸ”΄ Discount Inefficiency 32% of customers use promo codes β€” but does it actually lift revenue?
πŸ”΄ Silent Churn Low-frequency, disengaged customers leave silently, never flagged in advance
πŸ”΄ Missed Seasons Seasonal demand patterns exist, but inventory and promotions are never aligned
πŸ”΄ Operational Blindspots No benchmark for which shipping method or payment channel drives satisfaction

πŸ“ Repository Structure

customer-behaviour-analysis/
β”‚
β”œβ”€β”€ πŸ““ Customer_Behaviour_Analysis.ipynb   # Python EDA + statistical analysis + churn model
β”œβ”€β”€ πŸ—„οΈ  SQL_Queries.sql                    # 30+ SQL queries (Basic β†’ Advanced) + BI View
β”œβ”€β”€ πŸ“Š Customer_Behaviour_Analysis.twbx   # Tableau workbook (2 dashboards)
β”œβ”€β”€ πŸ“„ README.md                           # You are here
β”‚
└── πŸ“ assets/
    β”œβ”€β”€ dashboard_segmentation.png         # Customer Segmentation Dashboard screenshot
    └── dashboard_product.png              # Product & Category Performance screenshot

πŸ—‚οΈ Dataset

Source: Shopping Trends Dataset
Records: 3,900 customers Β· 19 features Β· US market (50 states)

Feature Group Columns
Demographics customer_id, age, age_group (derived), gender, location
Product item_purchased, category, size, color, season
Transaction purchase_amount_usd, review_rating, previous_purchases
Behaviour frequency_of_purchases, subscription_status, payment_method, shipping_type
Marketing discount_applied, promo_code_used, preferred_payment_method

Key Stats:

Total Revenue     β†’  $233,081
Total Customers   β†’  3,900
Avg Purchase      β†’  $59.76
Avg Rating        β†’  3.75 / 5.0

🐍 Python Analysis

File: Customer_Behaviour_Analysis.ipynb
Libraries: pandas Β· numpy Β· seaborn Β· matplotlib Β· plotly Β· scipy

What's Inside

1. Data Cleaning & EDA

  • Null value check, data type inspection, shape analysis
  • Purchase amount distribution (histogram + KDE)
  • Spending by category (boxplot), category popularity (countplot)
  • Average spend by season, gender distribution, location-based patterns

2. Outlier Detection

# IQR Method β€” identify anomalous high-spend customers
Q1 = df["Purchase Amount (USD)"].quantile(0.25)
Q3 = df["Purchase Amount (USD)"].quantile(0.75)
IQR = Q3 - Q1
outliers = df[df["Purchase Amount (USD)"] > Q3 + 1.5 * IQR]

3. Hypothesis Testing

# Welch's t-test: Does gender significantly affect spending?
_, p_value = stats.ttest_ind(male_spend, female_spend, equal_var=False)
# Result: p > 0.05 β†’ Gender does NOT significantly affect purchase amount

4. Correlation Analysis

# Age vs Purchase Amount β€” Pearson correlation
df[['Age', 'Purchase Amount (USD)']].corr()

5. Confidence Interval Estimation

# 95% CI for true population mean of purchase amount
z = stats.norm.ppf(0.975)
lower = mean - z * (std / np.sqrt(n))
upper = mean + z * (std / np.sqrt(n))

6. Churn Risk Scoring Model

def churn_score(freq, subscription):
    score = 0
    if freq in ['annually', 'every 3 months']:  score += 2
    elif freq == 'quarterly':                    score += 1
    if subscription == 'No':                     score += 2
    return score

df['churn_risk'] = df['churn_score'].apply(lambda x: 1 if x >= 3 else 0)

Customers scoring β‰₯ 3 are flagged as high churn risk for proactive re-engagement.


πŸ—„οΈ SQL Analysis

File: SQL_Queries.sql
Database: MySQL Β· customer_behaviour schema Β· shopping table

Query Tiers

🟒 Basic β€” Customer & Product Overview (Q1–Q20)

  • Total customers, revenue, average purchase amount
  • Sales breakdown by gender, age group, category, item, size, color, season
  • Payment method preferences, subscription vs non-subscription revenue
  • Discount and promo code usage counts

🟑 Intermediate β€” Revenue Intelligence & Marketing Effectiveness (Q1–Q20)

  • Average spend per customer, top 10 high-value customers
  • Revenue cross-tabbed by: location Γ— category, season Γ— category, gender Γ— category, age group Γ— category
  • Discount impact: total revenue, avg purchase, order count β€” with vs. without discount
  • Subscription status vs frequency of purchase cross-analysis
  • Shipping type vs average review rating (customer satisfaction benchmark)

πŸ”΄ Advanced β€” Segmentation & BI (Q1–Q5 + VIEW)

RFM-Style Customer Segmentation:

SELECT
    customer_id,
    CASE
        WHEN SUM(purchase_amount_usd) >= 300  THEN 'High_Value'
        WHEN SUM(purchase_amount_usd) >= 150  THEN 'Medium_Value'
        ELSE 'Low_Value'
    END AS monetary_segment,
    CASE
        WHEN MAX(previous_purchases) >= 15    THEN 'Highly_Engaged'
        WHEN MAX(previous_purchases) >= 5     THEN 'Moderately_Engaged'
        ELSE 'Low_Engagement'
    END AS engagement_segment,
    CASE
        WHEN frequency_of_purchases IN ('Weekly','Fortnightly') THEN 'High_Frequency'
        WHEN frequency_of_purchases = 'Monthly'                 THEN 'Medium_Frequency'
        ELSE 'Low_Frequency'
    END AS frequency_segment
FROM shopping
GROUP BY customer_id, frequency_of_purchases;

High-Value Loyal Customer Identification:

-- Spend β‰₯ $300 AND previous purchases β‰₯ 10 AND Weekly/Fortnightly frequency
SELECT customer_id, SUM(purchase_amount_usd) AS total_spent, ...
FROM shopping
GROUP BY customer_id, frequency_of_purchases
HAVING SUM(purchase_amount_usd) >= 300
   AND MAX(previous_purchases) >= 10
   AND frequency_of_purchases IN ('Weekly', 'Fortnightly');

Churn-Risk Customer Isolation:

-- Spend < $100 AND previous purchases < 3 AND Annual/Quarterly frequency
HAVING SUM(purchase_amount_usd) < 100
   AND MAX(previous_purchases) < 3
   AND frequency_of_purchases IN ('Annually','Quarterly');

Business Intelligence View (Tableau Data Source):

CREATE VIEW customer_behavior_intelligence AS
SELECT customer_id, gender, age, location, category, season,
       subscription_status, frequency_of_purchases,
       discount_applied, promo_code_used,
       SUM(purchase_amount_usd)  AS total_spent,
       AVG(purchase_amount_usd)  AS avg_spent,
       MAX(previous_purchases)   AS previous_purchases
FROM shopping
GROUP BY customer_id, gender, age, location, category, season,
         subscription_status, frequency_of_purchases,
         discount_applied, promo_code_used;

πŸ“Š Tableau Dashboards

File: Customer_Behaviour_Analysis.twbx

Dashboard 1 β€” Customer Segmentation

Customer Segmentation Dashboard

Visual Insight
Customer Segmentation Treemap Purchase frequency distribution across 6 tiers
Age-wise Purchase Rate Treemap Adults 37.97% Β· Young Adults 31.82% Β· Seniors 30.21%
Loyal Customers Table All repeat buyers (previous purchases = 50)
Top 10 High-Value Customers Ranked by composite score
High-Value Customer Bar Chart Previous purchases vs purchase amount (26 customers)

Filters: Location Β· Frequency of Purchases


Dashboard 2 β€” Product & Category Performance

Product & Category Dashboard

Visual Insight
Operations Heatmap Payment method Γ— Shipping type transaction counts
Category-Season Matrix Revenue cross-tab: Clothing leads at $104,264
Size Trend Bar Chart M (44,410) > L (27,071) > S (16,429) > XL (10,961)
Colour Preference Chart Top colours: Cyan (4,519), Silver (4,550), Gray (4,499)
Seasonal Product Demand Table Item-level demand split across Fall/Spring/Summer/Winter
Item-wise Performance Chart Previous purchases by individual product
Marketing Effectiveness Promo code impact: 32.14% Yes vs 67.86% No
High-level Category Bar Accessories $74,200 Β· Footwear $36,093 Β· Outerwear $18,524

Filters: Location Β· Category Β· Item Purchased Β· Season


πŸ“ˆ Key Findings

Customer Segmentation

  • Purchase frequency is evenly distributed across all 6 tiers (539–584 customers each) β€” no dominant buying pattern, signalling an opportunity to shift customers to higher-frequency tiers
  • Adults (35–54) drive 37.97% of purchases, the largest segment, followed closely by Young Adults (31.82%) and Seniors (30.21%)

Revenue & Products

  • Clothing dominates at $104,264 β€” 40% above Accessories ($74,200) and nearly 3Γ— Outerwear ($18,524)
  • Clothing revenue is stable across all four seasons, making it a reliable year-round inventory anchor
  • Medium (M) is the dominant size at 44,410 units β€” assortment should reflect this distribution

Marketing Effectiveness

  • Discount and promo code usage does not significantly increase average purchase value β€” tools attract existing buyers rather than driving incremental spend
  • Subscribed customers purchase more frequently, making subscription conversion the priority retention strategy

Statistical Results

Test Finding
Welch's t-test (Gender vs Spend) p > 0.05 β€” gender has no statistically significant impact on purchase amount
95% Confidence Interval True population mean of purchase amount falls within a narrow, stable range
IQR Outlier Detection No extreme outliers found β€” purchase distribution is clean

πŸ’‘ Recommendations

# Recommendation Rationale
01 Tiered Loyalty Programme Use RFM segments to create Gold/Silver/Bronze tiers; prioritise High Value + Highly Engaged customers
02 Convert Discount Users to Subscribers Redirect 32% promo-code budget toward subscription conversion β€” subscribed customers spend more consistently
03 Proactive Churn Intervention Run churn score model monthly; flag score β‰₯ 3 customers for personalised re-engagement within 30 days
04 Seasonal Stock Alignment Prioritise Clothing year-round, Outerwear in Fall/Winter; align promotional spend to Spring Accessories peaks
05 Senior Customer Strategy Seniors are 30% of purchases but under-targeted β€” design dedicated product bundles and communication
06 Operational Shipping Review Benchmark shipping type vs satisfaction ratings; route customers to highest-rated shipping options

πŸš€ How to Run

Python Notebook

# 1. Clone the repository
git clone https://github.com/soumimukherjee22/customer-behaviour-analysis.git
cd customer-behaviour-analysis

# 2. Install dependencies
pip install pandas numpy seaborn matplotlib plotly scipy jupyter

# 3. Launch the notebook
jupyter notebook Customer_Behaviour_Analysis.ipynb

SQL Setup

-- 1. Create the database
CREATE DATABASE customer_behaviour;
USE customer_behaviour;

-- 2. Import the dataset as the 'shopping' table
-- (Use MySQL Workbench import wizard or LOAD DATA INFILE)

-- 3. Run SQL_Queries.sql top-to-bottom
-- It will: clean columns β†’ add age_group β†’ run all 30+ queries β†’ create the BI view
SOURCE SQL_Queries.sql;

Tableau

1. Open Tableau Desktop (2023+ recommended)
2. Open Customer_Behaviour_Analysis.twbx
3. The workbook connects to embedded data β€” no additional setup needed
4. Use the tab bar to navigate between dashboards
5. Interact with Location, Category, and Season filters to explore the data

πŸ› οΈ Tech Stack

Tool Version Purpose
Python 3.10+ EDA, statistical analysis, churn modelling
pandas 2.0+ Data manipulation and aggregation
NumPy 1.24+ Numerical computation
seaborn / matplotlib Latest Static visualisations
Plotly 5.0+ Interactive charts
SciPy 1.10+ Hypothesis testing, statistical tests
MySQL 8.0 Database, querying, BI view creation
Tableau Desktop 2023+ Interactive dashboards

πŸ“‚ Deliverables Summary

File Description
Customer_Behaviour_Analysis.ipynb Full EDA, hypothesis testing, confidence interval, churn scoring
SQL_Queries.sql 30+ queries across 3 tiers + RFM segmentation + BI VIEW
Customer_Behaviour_Analysis.twbx 2 interactive Tableau dashboards
README.md This file

πŸ‘€ Author

Soumi Mukherjee
Data Analyst Β· Python Β· SQL Β· Tableau

LinkedIn GitHub email


πŸ“„ License

This project is licensed under the MIT License β€” see the LICENSE file for details.


Built with ❀️ as a portfolio project · Dataset: Shopping Trends (Synthetic Retail Data)

About

End-to-end customer behavior analytics project using Python, SQL, and Tableau to build a business intelligence system for customer segmentation, churn risk detection, marketing effectiveness analysis, and revenue optimization through interactive dashboards.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors