An end-to-end fraud detection pipeline that turns raw transaction data into actionable risk intelligence — built with SQL and Power BI.
Financial institutions deal with millions of transactions daily, and manual fraud review doesn't scale. This project builds a rule-based risk scoring system on top of the BankSim synthetic dataset to:
- Flag suspicious transactions using behavioral signals
- Segment them into risk tiers (High / Medium / Low)
- Measure how well the model performs
- Quantify the financial value of catching fraud early
The end result is a 4-page interactive Power BI dashboard covering model performance, operational prioritization, and ROI.
| Executive Overview | Model Performance |
|---|---|
![]() |
![]() |
| Operational Queue | Financial Impact |
|---|---|
![]() |
![]() |
| Metric | Value |
|---|---|
| Total Transactions | 21,051 |
| Fraud Transactions | 226 |
| Fraud Rate | 1.1% |
| Estimated Fraud Loss | $137.84K |
| Fraud Capture Rate | 55.8% |
| Net Benefit (after review cost) | $135.57K |
| ROI | 5959% |
- Database: MySQL
- Feature Engineering & Scoring: SQL
- Visualization: Power BI
- Dataset: BankSim (synthetic transaction data)
CSV Dataset
↓
MySQL (banksim_raw)
↓
Feature Engineering
├── customer_velocity
├── customer_spend_baseline
└── category_risk
↓
Risk Scoring Table (transaction_risk_score)
↓
Power BI Dashboard
Each transaction gets scored across three dimensions:
Velocity Risk — how often is this customer transacting?
- 5+ tx/day → 30 pts
- 3+ tx/day → 20 pts
- Otherwise → 5 pts
Amount Deviation Risk — is this spend unusual for this customer?
- ≥ 2× their average → 40 pts
- ≥ 1.5× their average → 25 pts
- Otherwise → 5 pts
Category Risk — is this a high-fraud category?
- Fraud rate ≥ 10% → 30 pts
- Fraud rate ≥ 5% → 20 pts
- Otherwise → 5 pts
Final Score = velocity + amount + category
Risk Tiers:
- 🔴 High → Score ≥ 85
- 🟡 Medium → Score ≥ 55
- 🟢 Low → Everything else
Evaluated entirely in SQL:
- Fraud Capture Rate: 55.8%
- Precision (High Risk): 27.69%
- False Positive Rate: 72.3%
The High Risk tier accounts for a small share of total transactions but drives 86% of preventable fraud loss.
├── Images/
│ ├── Executive Overview.png
│ ├── Financial Impact.png
│ ├── Model Performance.png
│ └── Operational Queue.png
├── PowerBI/
├── SQL_script/
└── README.md
BankSim is a synthetic payments dataset generated to simulate realistic fraud patterns. No real customer data is used.



