📄 Download Full Project Report (PDF)
- Project Overview
- Business Context and Problem Statement
- Data Overview
- Project Structure
- Tools and Technologies
- Methods
- Key Insights
- Results and Conclusion
- Dashboard Preview
- How to Run the Project
- Author
An end-to-end digital collections analytics project for DebtStream, a B2B SaaS fintech platform that moves customer repayment journeys into a digital self-service flow. The project analyses 10,000 customer accounts uploaded between June 2022 and May 2023 across the full analytical pipeline: data engineering and analytical view construction in BigQuery SQL, exploratory data analysis and statistical testing in Python, and a compact three-page interactive stakeholder dashboard built in Power BI.
DebtStream helps lenders, utility providers, telecoms companies, debt purchasers, debt collection agencies and legal collections firms move customers into a digital self-service repayment journey. The platform's commercial value depends not on delivering reminders alone, but on whether customers open the journey, verify their identity, view their balance, select a repayment option, make a payment, and sustain a repayment plan over time.
This project analyses the full collections journey to answer:
- Where do customers drop off in the digital collections funnel, and which stage creates the biggest customer loss?
- Which primary communication channels drive customers into the self-service journey most effectively?
- Does supportive SMS repayment messaging outperform a direct payment reminder, and for which customer segments?
- Which balance bands and debt age bands recover better or worse through the digital collections process?
- Does completing the affordability assessment improve repayment plan sustainability and reduce broken-plan rates?
- Which customer segments combine weak recovery rates with high outstanding balance exposure?
- What should DebtStream prioritise to improve payment conversion, plan sustainability, and overall portfolio recovery?
- 6 raw source tables:
clients,customers,communication_events,journey_events,payments,repayment_plans - 7 analytical views built in BigQuery:
vw_customer_summary,vw_ab_test_customer_outcomes,vw_funnel_summary,vw_channel_performance,vw_ab_test_summary,vw_recovery_summary,vw_plan_performance - 3 views imported into Power BI as the primary reporting layer
- Analysis period: June 2022 to May 2023
- Scale: 10,000 customers · 19,244 communication events · 50,699 journey events · 2,362 payments · 743 repayment plans · 1,936 A/B test customers
DebtStream_Digital_Collections_Project/
│
├── README.md
├── .gitignore
│
├── assets/
│ ├── banner.svg # Project banner
│ ├── Overview & Funnel Performance.png # Power BI Page 1 screenshot
│ ├── SMS AB Test.png # Power BI Page 2 screenshot
│ └── Risk & Recovery.png # Power BI Page 3 screenshot
│
├── data/
│ └── raw/
│ ├── clients.csv
│ ├── customers.csv
│ ├── communication_events.csv
│ ├── journey_events.csv
│ ├── payments.csv
│ └── repayment_plans.csv
│
├── docs/
│ ├── DebtStream Analysis Report.pdf # Full project analytics report
│ ├── debtstream_data_dictionary # Field definitions and data types
│ └── debtstream_validation_summary # SQL validation results summary
│
├── notebooks/
│ ├── 01_Data_Validation_and_EDA.ipynb # Data validation, EDA, funnel and channel analysis
│ ├── 02_SMS_AB_Testing_Analysis.ipynb # A/B test comparison and statistical significance testing
│ └── 03_Repayment_and_Recovery_Insights.ipynb # Recovery analysis, segmentation and affordability impact
│
├── outputs/
│ ├── charts/ # Exported Python chart PNGs
│ └── summary_tables/ # Exported SQL summary CSVs
│
├── powerbi/
│ └── DebtStream_Analytics.pbix # Interactive three-page Power BI dashboard
│
└── sql/
├── 01_Setup_and_Base_Tables.sql # Table validation, row counts, integrity checks
├── 02_Funnel_and_Channel_Analysis.sql # Funnel, drop-off, channel and monthly trend
├── 03_SMS_AB_Testing.sql # A/B test population, outcomes and lift summary
├── 04_Repayment_and_Recovery.sql # Recovery by segment, plan sustainability, affordability
└── 05_PowerBI_Views.sql # All 7 BigQuery reporting views
| Tool | Purpose |
|---|---|
| BigQuery SQL | Raw table ingestion and validation, funnel analysis, SMS A/B test outputs, recovery analysis and Power BI reporting views |
| Python (Pandas, Matplotlib, Seaborn) | Data validation, EDA, statistical significance testing (z-tests and Welch t-test), recovery insights and chart exports |
| Power BI Desktop | Three-page interactive dashboard with DAX measures, dynamic SWITCH visuals and conditional formatting |
-
Customer-Level Funnel Analysis: digital collections journey measured across nine stages using distinct
customer_idcounts to prevent inflation from repeated communications or journey events. Stage-to-stage drop-off rates calculated at each transition to identify the highest-friction points in the customer journey. -
SMS A/B Test Analysis: two-group comparison of a direct payment reminder (Control, 929 customers) against a supportive repayment-options message (Variant, 1,007 customers). Analysis conducted at customer level to avoid double-counting. Two-proportion z-tests applied to all five binary outcome metrics and a Welch t-test used for average amount collected to confirm statistical significance.
-
Recovery and Segment Analysis: recovery rate calculated at individual customer level as successful amount collected divided by outstanding balance. Segmented by balance band, debt age band, client type and portfolio type. A balance band by debt age band recovery matrix produced to identify highest-risk segment combinations.
-
Affordability Impact Analysis: repayment plan quality compared between customers who completed and did not complete the affordability assessment, measuring first instalment paid rate and broken-plan rate as the two primary plan sustainability indicators.
-
Power BI Data Modelling: three BigQuery views imported as Fact Customer (10,000 rows), Fact AB Test (1,936 rows) and Monthly Funnel Trend (4,360 rows). Three disconnected DAX DATATABLE helper tables created for dynamic funnel and A/B test visuals via SWITCH measures. 60 DAX measures built across seven groups and validated against SQL benchmark outputs.
-
62.19% of customers who received a digital invite never clicked through to the journey. The Invite Delivered to Link Opened transition is the single largest drop-off in the entire funnel, representing a far greater opportunity than optimising any later stage.
-
SMS drives the highest digital engagement at scale with a 41.11% journey start rate, ahead of Email (35.14%) and Letter (26.70%). SMS also leads on payment conversion rate, making it the strongest end-to-end digital collections channel.
-
Variant (supportive) SMS outperformed Control (direct reminder) on four of five outcome metrics. Plan setup rate: 10.53% vs 6.14% (+4.39pp). Average amount collected: £36.35 vs £29.95 (+£6.40 per customer). Statistically significant on plan setup and first instalment paid rate (p < 0.05).
-
One-off payment rate was the only metric where Control outperformed Variant (6.78% vs 4.77%), confirming that supportive messaging deliberately shifts customers toward structured repayment plans rather than single payments. This is a commercially superior outcome for long-term recovery.
-
Overall recovery rate is 3.31% on £9,996,602 of outstanding balance. Recovery declines sharply by balance band: Low (10.57%), Medium (5.00%), High (3.22%), Very High (0.89%). Very High balance accounts carry £3,192,271 of outstanding debt and a 72.22% broken-plan rate, the highest combined commercial risk in the portfolio.
-
Recovery declines 4x as debt ages. Accounts aged 0-30 days recover at 6.10%, while accounts older than 365 days recover at just 1.48%. Early intervention is the single most effective lever for improving portfolio-level recovery.
-
Telecoms clients recover at 6.00%, Legal Collections at 0.84%. Client type has a material impact on recovery outcomes. A single portfolio-wide benchmark masks significant performance differences across client types.
-
Only 2.03% of customers completed the affordability assessment, yet those who did paid their first instalment at 86.21% vs 75.19% and broke their plans at 31.53% vs 43.89%. The vast majority of the available plan sustainability benefit is currently unrealised.
The project confirms that DebtStream's digital collections performance is structurally constrained by a single dominant problem: the transition from delivered invite to digital journey start. Fixing this one step, through better message content, timing and channel sequencing, creates more opportunity than any other intervention available.
The SMS A/B test provides statistically validated evidence that supportive repayment-options messaging outperforms the direct payment reminder for plan setup and long-term recovery quality. This is not a marginal finding. Variant customers generated £6.40 more per customer on average with a plan setup rate nearly double that of Control. The recommended approach is a segmented messaging strategy: supportive messaging for Medium and higher-balance customers who need flexibility, and direct reminders retained for customers with higher immediate payment readiness.
Recovery analysis confirms that balance band and debt age must be treated as the two core segmentation variables for collections strategy. Low-balance, newly uploaded accounts are well-suited to direct payment journeys. Medium-balance accounts respond strongly to plan setup nudges. High and Very High balance accounts require affordability-led support, closer plan monitoring, and likely specialist intervention given their 48-72% broken-plan rates.
Affordability completion stands out as the most underutilised lever in the dataset. An 11pp improvement in first instalment paid rate and a 12pp reduction in broken-plan rate from completing a single assessment step represents substantial value currently left unrealised for 97.97% of customers.
Page 1: Overview and Funnel Performance
Page 2: SMS A/B Test and Messaging Strategy
Page 3: Segment Recovery and Risk
- Create a BigQuery project and dataset (e.g.
debtstream-analytics-project.debtstream_analytics) - Upload the six raw CSV files from
data/raw/as tables using the following names:clients,customers,communication_events,journey_events,payments,repayment_plans - Run SQL files in order from
sql/:01_Setup_and_Base_Tables.sqlvalidates ingestion, checks row counts, duplicates, date ranges and referential integrity02_Funnel_and_Channel_Analysis.sqlbuilds the customer-level funnel and channel performance analysis03_SMS_AB_Testing.sqlcreatesvw_ab_test_customer_outcomesand compares Control vs Variant outcomes04_Repayment_and_Recovery.sqlanalyses recovery performance by segment and plan sustainability05_PowerBI_Views.sqlcreates all 7 BigQuery reporting views
- Export summary outputs as CSVs to
outputs/summary_tables/as specified in each file's NEXT STEPS comment
- Clone the repository
- Install dependencies:
pip install pandas numpy matplotlib seaborn scipy google-cloud-bigquery jupyter - Open Jupyter Notebook or JupyterLab
- Run notebooks in order from
notebooks/:01_Data_Validation_and_EDA.ipynbcovers data validation, EDA, funnel visualisation and channel analysis02_SMS_AB_Testing_Analysis.ipynbcovers A/B test comparison and statistical significance testing03_Repayment_and_Recovery_Insights.ipynbcovers recovery analysis, segmentation and affordability impact
- Charts are saved automatically to
outputs/charts/
- Install Power BI Desktop (free from Microsoft)
- Open
powerbi/DebtStream_Analytics.pbixin Power BI Desktop - Sign in with the Google account that has access to the BigQuery project
- In Home, click Transform Data to open Power Query, then Close and Apply
- Click Refresh to reload all three tables from BigQuery
- All 60 DAX measures and three helper tables are pre-built and ready to use
Utkarsh Pandey Data Analyst


