Skip to content

This project applies **RFM (Recency -- Frequency -- Monetary)** analysis to segment customers of SuperStore.\ The purpose is to identify valuable customer groups and provide actionable insights for targeted marketing and retention campaigns.

Notifications You must be signed in to change notification settings

havanlocbk/Customer-Segmentation-with-RFM-Analysis---SuperStore---Retail-by-using-Python

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

3 Commits
Β 
Β 

Repository files navigation

πŸ›’ Customer Segmentation with RFM Analysis - SuperStore - Retail by using Python

image

Author: Loc Ha
Date: 2025-04-25

πŸ›  Tools Used

Python Pandas Matplotlib Seaborn


πŸ“‘ Table of Contents

I. πŸ“Œ Background & Overview
II. πŸ“‚ Dataset Description & Data Structure
III. βš’οΈ Main Process
IV. πŸ“Š Key Insights & Recommendations


πŸ“Œ Background & Overview

🏒 Business Context

SuperStore is a fictional retail company based in the United States, specializing in all-occasion gifts.
During the Christmas and New Year season, the Marketing team plans to launch loyalty and promotional campaigns. Manual customer segmentation with Excel is no longer feasible at scale.

🎯 Objective

This project applies RFM (Recency – Frequency – Monetary) analysis to segment customers of SuperStore, aiming to identify valuable customer groups and provide actionable insights for targeted marketing and retention campaigns.

❓ Business Questions:

  • Which customer groups are most valuable to retain?
  • What segments show high churn risk?
  • How can the company increase purchase frequency and customer lifetime value?

πŸ“‚ Dataset Description & Data Structure

πŸ—‚ Dataset Description

  • Source: Online Retail Transaction Dataset (UK-based non-store online retail)
  • Period: 01/12/2010 – 09/12/2011
  • Size: ~500K transactions
  • Format: .csv

🧩 Data Structure

πŸ“Š Table Schema (Simplified)

Column Name Description Data Type
InvoiceNo Transaction ID. If starts with 'C' β†’ cancel String/Int
StockCode Product code, unique String/Int
Description Product name String
Quantity Quantity per transaction Int
InvoiceDate Date and time of transaction DateTime
UnitPrice Price per unit (GBP) Float
CustomerID Unique customer ID Int

βš’οΈ Main Process

πŸ”„ Main Process

  1. Data Preparation: Handle nulls, clean invalid records, convert types.
πŸ“Œ View Python code for Data Preparation
# EXPLORE DATA
print(main_data.info())
print(main_data.describe())
print(main_data.isnull().sum())

# CORRECT DATA TYPES
main_data['CustomerID'] = main_data['CustomerID'].astype('Int64')
print(main_data['CustomerID'].dtype)
main_data['InvoiceDate'] = pd.to_datetime(main_data['InvoiceDate'], errors='coerce')

# HANDLE NULLS
main_data = main_data[main_data['CustomerID'].notnull()]

# FILTER DATA
main_data = main_data[(main_data['Quantity'] > 0) & (main_data['UnitPrice'] > 0)]
main_data = main_data[
    main_data['StockCode'].str.match(r'^[A-Za-z0-9]{4,7}$') &
    main_data['Description'].str.contains(
        'POSTAGE|Bank Charges|CARRIAGE|Next Day Carriage|Manual',
        case=False, na=False
    )
]
  1. Feature Engineering: Create TotalPrice = Quantity Γ— UnitPrice.
main_data['TotalPrice'] = main_data['Quantity'] * main_data['UnitPrice']
  1. RFM Calculation: Compute Recency, Frequency, Monetary per customer.
πŸ“Œ View Python code for RFM Calculation
# Reference date for recency
snapshot_date = pd.to_datetime('2011-12-31')

# Recency (days since last purchase)
recency = main_data.groupby('CustomerID')['InvoiceDate'].max().reset_index()
recency['Recency'] = (snapshot_date - recency['InvoiceDate']).dt.days

# Frequency (unique invoices)
frequency = main_data.groupby('CustomerID')['InvoiceNo'].nunique().reset_index()
frequency.columns = ['CustomerID', 'Frequency']

# Monetary (total spend)
monetary = main_data.groupby('CustomerID')['TotalPrice'].sum().reset_index()
monetary.columns = ['CustomerID', 'Monetary']

# Combine
rfm = recency[['CustomerID', 'Recency']].merge(frequency, on='CustomerID').merge(monetary, on='CustomerID')
print(rfm.head())
  1. Scoring: Assign quintiles (1–5) for R, F, M and build RFM Score.
rfm['R_Score'] = pd.qcut(rfm['Recency'], 5, labels=[5,4,3,2,1])             # lower recency β†’ higher score
rfm['F_Score'] = pd.qcut(rfm['Frequency'].rank(method='first'), 5, labels=[1,2,3,4,5])
rfm['M_Score'] = pd.qcut(rfm['Monetary'], 5, labels=[1,2,3,4,5])
rfm['RFM_Score'] = rfm['R_Score'].astype(str) + rfm['F_Score'].astype(str) + rfm['M_Score'].astype(str)
  1. Segmentation: Map scores to segments (Champions, Loyal, At Risk, etc.).
πŸ“Œ View Python code for RFM Segmentation
segment_map = {
    'Champions': ['555','554','544','545','454','455','445'],
    'Loyal': ['543','444','435','355','354','345','344','335'],
    'Potential Loyalist': ['553','551','552','541','542','533','532','531','452','451','442','441','431','453','433','432','423','353','352','351','342','341','333','323'],
    'New Customers': ['512','511','422','421','412','411','311'],
    'Promising': ['525','524','523','522','521','515','514','513','425','424','413','414','415','315','314','313'],
    'Need Attention': ['535','534','443','434','343','334','325','324'],
    'About To Sleep': ['331','321','312','221','213','231','241','251'],
    'At Risk': ['255','254','245','244','253','252','243','242','235','234','225','224','153','152','145','143','142','135','134','133','125','124'],
    'Cannot Lose Them': ['155','154','144','214','215','115','114','113'],
    'Hibernating customers': ['332','322','233','232','223','222','132','123','122','212','211'],
}
def assign_segment(rfm_score):
    for segment, scores in segment_map.items():
        if rfm_score in scores:
            return segment
    return 'Lost customers'

rfm['Segment'] = rfm['RFM_Score'].apply(assign_segment)
print(rfm.head())
  1. Visualization: Histograms, treemap, bar charts, boxplots.
πŸ“Œ View Python code for Visualization
# Histograms
fig, ax = plt.subplots(figsize=(12, 3)); sns.histplot(rfm['Recency'], kde=True); ax.set_title('Distribution of Recency'); plt.show()
fig, ax = plt.subplots(figsize=(12, 3)); sns.histplot(rfm['Frequency'], kde=True); ax.set_title('Distribution of Frequency'); plt.show()
fig, ax = plt.subplots(figsize=(12, 3)); sns.histplot(rfm['Monetary'], kde=True); ax.set_title('Distribution of Monetary'); plt.show()

# Treemap of segment share
temp_rfm = rfm.groupby('Segment')['CustomerID'].count().reset_index()
temp_rfm.columns = ['Segment','Cust_count']
temp_rfm['Count_share'] = temp_rfm['Cust_count'] / temp_rfm['Cust_count'].sum()

import squarify
colors = ['#FF0000','#00FFFF','#FFFFAA','#A52A2A','#800080','#00FF00','#808000','#FF0CB3','#FFA500','#FF00FF','#736F6E']
fig, ax = plt.subplots(1, figsize=(15,8))
squarify.plot(
    sizes=temp_rfm['Cust_count'],
    label=temp_rfm['Segment'],
    value=[f'{x*100:.2f}%' for x in temp_rfm['Count_share']],
    alpha=.8, color=colors,
    bar_kwargs=dict(linewidth=1.5, edgecolor="White")
)
plt.title('RFM Segments of Customer Count', fontsize=16); plt.axis('off'); plt.show()

# Countplot by segment
fig, ax = plt.subplots(figsize=(12, 6))
sns.countplot(data=rfm, x='Segment', order=rfm['Segment'].value_counts().index, palette='viridis')
plt.title('Number of Customers in Each Segment'); plt.xticks(rotation=45); plt.xlabel('Segment'); plt.ylabel('Number of Customers'); plt.show()

# Boxplots R/F/M by segment
fig, axes = plt.subplots(3, 1, figsize=(15, 18))
sns.boxplot(x='Segment', y='Recency',   data=rfm.sort_values('Segment'), palette='viridis', ax=axes[0]); axes[0].set_title('Recency Distribution by Customer Segment');   axes[0].set_xlabel('Customer Segment'); axes[0].set_ylabel('Recency (Days)'); axes[0].tick_params(axis='x', rotation=45)
sns.boxplot(x='Segment', y='Frequency', data=rfm.sort_values('Segment'), palette='viridis', ax=axes[1]); axes[1].set_title('Frequency Distribution by Customer Segment'); axes[1].set_xlabel('Customer Segment'); axes[1].set_ylabel('Frequency (Number of Orders)'); axes[1].tick_params(axis='x', rotation=45)
sns.boxplot(x='Segment', y='Monetary',  data=rfm.sort_values('Segment'), palette='viridis', ax=axes[2]); axes[2].set_title('Monetary Distribution by Customer Segment');  axes[2].set_xlabel('Customer Segment'); axes[2].set_ylabel('Monetary (Total Price)'); axes[2].tick_params(axis='x', rotation=45)
plt.tight_layout(); plt.show()

πŸ“ˆ Key insight from each chart

Distribution of Recency

image
  • Most customers made a purchase within the last 120 days (~4 months), and the number of customers steadily decreases as the recency period increases.
  • This is a positive signal indicating that the majority of customers are still active and engaged.
  • πŸ’‘Recommendation: Focus customer appreciation campaigns on those with low Recency to strengthen loyalty and encourage repeat purchases.

Distribution of Frequency

image
  • During the dataset period, purchase frequency is concentrated at 0–5 times, with very few customers buying more than 5 times and only a handful exceeding 10 times.
  • This indicates that most customers are low-frequency buyers, and repeat purchases are relatively rare.
  • πŸ’‘ Recommendation: Encourage customers to increase purchase frequency through loyalty programs such as points accumulation, free shipping for low-value orders, or vouchers for subsequent purchases.

Distribution of Monetary

image
  • Most customers have a Customer Lifetime Value (CLV) below $1000, with the majority concentrated under $500.

  • Beyond $1000, the number of customers drops sharply, and very few exceed higher spending thresholds.

  • πŸ’‘ Recommendations:

    • $1–$500: Build shopping habits through discount vouchers for next orders or welcome combos for new customers.
    • $500–$1000: Encourage higher spending with tiered loyalty levels (Silver, Gold, Platinum) and offer bonus points with expiration dates to drive usage.
    • >$1000: Promote premium products with exclusive perks and focus on retaining these customers, nurturing them toward VIP level >$4000.
    • >$4000 (VIPs): Provide VIP/black cards, exclusive personal benefits (dedicated account manager, personalized gifts on birthdays/holidays).
    • One $8000 CLV customer: Requires special care and attention with personalized services and unique rewards.

Customer Segments Analysis (Recency – Frequency – Monetary)

image
Segment Current Status (R/F/M) πŸ’‘ Recommendation
Champions Recency <50 days, high Frequency & Monetary (some $1000–3000+) VIP care: loyalty cards, exclusive offers, or personal services.
Loyal Recency ~100 days, medium Frequency & Monetary Maintain benefits, nurture to increase CLV.
Potential Loyalists Recency ~50–100 days, medium Frequency & Monetary, some with high intent Reward consistent purchases with points, aim to convert to Loyal.
Promising Recency ~50–100 days, moderate Frequency & Monetary Upselling with combos and loyalty points to boost CLV.
New Customers Recency ~50–100 days, first-time buyers, small spend Onboarding guides, bundles, discount voucher for next purchase.
Need Attention Recency ~100 days, lower engagement than Loyal Special offers and programs to convert them into Loyal customers.
About To Sleep Avg. Recency ~100 days, very low Frequency & Monetary Send reminder emails with discounts to reactivate purchases.
At Risk Recency ~150–250 days, few transactions, low spend Free gifts with repurchase within 30 days; recommend products from past history.
Hibernating Recency ~100–350 days, declining Frequency & Monetary Reactivation campaigns with strong vouchers and personalized suggestions.
Cannot Lose Them Recency ~200–300 days, low Frequency & low Monetary Provide vouchers/free gifts for purchases within 30 days to win them back.
Lost Customers Recency ~250–350 days, very low Frequency & Monetary Final attempt: 25–50% discount or free products to bring them back.

πŸ“Š Company Status & Strategic Priorities

image image
Segment Insights % of Customers Approx. Count πŸ’‘ Strategic Recommendation
Champions – Strong loyalty, high frequency & spend 19.7% ~65 🟒 Priority 1: Maintain & grow with VIP care, exclusive offers, loyalty programs
Hibernating – Once active, now disengaged 11.0% ~36 🟑 Priority 2: Reactivation campaigns with 20–30% vouchers, personalized offers
Lost Customers – Long inactivity, low frequency & spend 9.3% ~31 🟑 Priority 2: Attempt win-back with strong discounts or free product trials
Potential Loyalists – Decent spend, medium frequency, growth opportunity 11.4% ~38 🟑 Priority 3: Encourage repeat purchases with points & loyalty tiers
Promising – Recently active, moderate spend 5.1% ~17 🟑 Priority 3: Upselling with bundles, reward points to increase CLV
About To Sleep – Avg. recency ~100 days, very low activity 10.5% ~35 πŸ”΄ Priority 4: Send reminders & discounts to prevent churn
At Risk – High recency (150–250 days), low activity 10.1% ~34 πŸ”΄ Priority 4: Free gifts or special offers within 30 days to reactivate
New Customers – First-time buyers, low spend 5.1% ~17 🟑 Priority 5: Onboarding guides, discounts to build habits
Loyal – Stable customers, medium frequency & spend 8.1% ~27 🟒 Ongoing: Maintain benefits, upsell premium products
Cannot Lose Them – Very few but critical customers 3.9% ~13 πŸ”΄ Immediate care: Exclusive vouchers/gifts to avoid churn

πŸš€ Strategic Priority Order

  1. 🟒 Maintain & expand Champions
  2. 🟑 Reactivate Hibernating & Lost Customers
  3. 🟑 Develop Potential Loyalists & Promising
  4. πŸ”΄ Prevent churn in About To Sleep & At Risk
  5. 🟑 Support New Customers to form purchase habits

πŸ“Š Key Insights & Recommendations

πŸ’‘ Key Insights

  • Recency: Most customers purchased recently (<120 days), but ~30% fall into At Risk, About To Sleep, Hibernating, Lost.
  • Frequency: Majority purchase ≀5 times; very few are frequent buyers.
  • Monetary: CLV mostly < Β£500, only a handful > Β£1000.
  • Segments: Champions (19.7%) and Loyal customers are strong, but churn-prone groups are sizable.

πŸ“ Recommendations

  1. Retain Champions: Exclusive rewards and personal care.
  2. Reactivation Campaigns: Discounts for Hibernating & Lost.
  3. Grow Potential Loyalists: Incentives for repeat purchases.
  4. Address At Risk: Reminder emails, promotions, gifts.
  5. Nurture New Customers: Onboarding and vouchers for retention.

About

This project applies **RFM (Recency -- Frequency -- Monetary)** analysis to segment customers of SuperStore.\ The purpose is to identify valuable customer groups and provide actionable insights for targeted marketing and retention campaigns.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published