A comprehensive comparison of computational efficiency and predictive performance for credit card fraud detection.
This project benchmarks two popular machine learning algorithms—Random Forest and XGBoost—on the challenging task of credit card fraud detection. The dataset is highly imbalanced (~0.17% fraud), making it an excellent test case for evaluating model performance beyond simple accuracy.
- Comprehensive Benchmarking: Training time, inference speed, memory usage, and model size
- Fraud-Focused Metrics: AUPRC, F1-Score, Recall, Precision (not just accuracy!)
- Visual Comparisons: ROC curves, Precision-Recall curves, confusion matrices
- Imbalance Handling: Proper use of `class_weight` and `scale_pos_weight`
- GPU Support: Automatic GPU detection for XGBoost acceleration
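Both imbalance knobs can be derived from the label vector alone. A minimal sketch with toy labels (not the project's code) mirroring the dataset's ~0.17% fraud rate:

```python
import numpy as np

# Toy labels mimicking the dataset's ~0.17% fraud rate (1 = fraud)
y = np.zeros(10_000, dtype=int)
y[:17] = 1

# scikit-learn: class_weight='balanced' weights each class as
# n_samples / (n_classes * class_count)
n, counts = len(y), np.bincount(y)
balanced_weights = {c: n / (2 * counts[c]) for c in (0, 1)}

# XGBoost: scale_pos_weight = n_negative / n_positive
scale_pos_weight = (y == 0).sum() / (y == 1).sum()
```

With the real dataset (284,315 normal / 492 fraud) the same ratio gives the `scale_pos_weight=577` used below.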
```bash
git clone https://github.com/cauegrassi7/fraud-detection-benchmark.git
cd fraud-detection-benchmark
pip install -r requirements.txt
```

Download the Credit Card Fraud Detection dataset from Kaggle:
👉 https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud
Place the creditcard.csv file in the data/ directory:
```
fraud-detection-benchmark/
└── data/
    └── creditcard.csv
```
```bash
python main.py
```

Or with a custom data path:

```bash
python main.py --data-path /path/to/your/creditcard.csv
```

```
╔══════════════════════════════════════════════════════════════════╗
║       FRAUD DETECTION BENCHMARK: Random Forest vs XGBoost        ║
╚══════════════════════════════════════════════════════════════════╝

┌──────────────────────────────────────────────────────────────────┐
│              PREDICTIVE PERFORMANCE (Higher is Better)           │
├────────────────────┬────────────────────┬────────────────────────┤
│ Metric             │ Random Forest      │ XGBoost                │
├────────────────────┼────────────────────┼────────────────────────┤
│ AUPRC              │ 0.8523             │ 0.8701 ★               │
│ F1-Score           │ 0.8234             │ 0.8456 ★               │
│ Recall             │ 0.7891             │ 0.8123 ★               │
│ Precision          │ 0.8612             │ 0.8823 ★               │
└────────────────────┴────────────────────┴────────────────────────┘

┌──────────────────────────────────────────────────────────────────┐
│             COMPUTATIONAL EFFICIENCY (Lower is Better)           │
├────────────────────┬────────────────────┬────────────────────────┤
│ Metric             │ Random Forest      │ XGBoost                │
├────────────────────┼────────────────────┼────────────────────────┤
│ Training Time      │ 45.23 s            │ 12.87 s ★              │
│ Inference/1k       │ 0.023 s            │ 0.008 s ★              │
│ Peak Memory        │ 1,234 MB           │ 567 MB ★               │
│ Model Size         │ 89.5 MB            │ 12.3 MB ★              │
└────────────────────┴────────────────────┴────────────────────────┘
```
```
fraud-detection-benchmark/
├── src/
│   ├── __init__.py            # Package initialization
│   ├── data_processing.py     # Data loading, scaling, splitting
│   ├── models.py              # Model factory functions
│   ├── benchmark.py           # Timing and memory measurement
│   └── visualization.py       # Plot generation
├── data/
│   └── creditcard.csv         # Dataset (user provides)
├── outputs/                   # Generated plots and saved models
├── main.py                    # Main orchestrator script
├── requirements.txt           # Python dependencies
└── README.md
```
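The kind of measurement `benchmark.py` performs can be sketched in a few lines; the `benchmark_fit` helper below is hypothetical (not the project's actual API) and approximates model size via pickle serialization:

```python
import io
import pickle
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def benchmark_fit(model, X, y):
    """Return (fitted model, training seconds, pickled size in MB)."""
    start = time.perf_counter()
    model.fit(X, y)
    elapsed = time.perf_counter() - start
    buf = io.BytesIO()
    pickle.dump(model, buf)  # model size ~ serialized footprint
    return model, elapsed, buf.getbuffer().nbytes / 1e6

# Small imbalanced toy problem (~99% negative class)
X, y = make_classification(n_samples=2_000, weights=[0.99], random_state=42)
model, secs, size_mb = benchmark_fit(
    RandomForestClassifier(n_estimators=10, random_state=42), X, y
)
```

Peak memory is trickier to capture portably; the stdlib `tracemalloc` module or an external sampler are common choices.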
| Property | Value |
|---|---|
| Total Transactions | 284,807 |
| Fraudulent | 492 (0.17%) |
| Normal | 284,315 (99.83%) |
| Features | 30 (V1-V28 PCA, Time, Amount) |
- StandardScaler applied to `Amount` and `Time` only
- V1-V28 already normalized via PCA transformation
- Stratified split (80/20) to maintain class distribution
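The preprocessing steps above can be sketched as follows; a synthetic DataFrame stands in for `creditcard.csv` here (the real file adds the V1-V28 PCA columns):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for creditcard.csv (real data adds V1-V28)
rng = np.random.default_rng(42)
n = 10_000
df = pd.DataFrame({
    "Time": rng.uniform(0, 172_000, n),
    "Amount": rng.exponential(88.0, n),
    "Class": (rng.random(n) < 0.0017).astype(int),  # ~0.17% fraud
})

X, y = df.drop(columns=["Class"]), df["Class"]

# Scale only Amount and Time; V1-V28 come pre-normalized from PCA
X[["Amount", "Time"]] = StandardScaler().fit_transform(X[["Amount", "Time"]])

# Stratified 80/20 split keeps the fraud rate equal across train and test
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
```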
Random Forest:

```python
RandomForestClassifier(
    n_estimators=100,
    n_jobs=-1,                # All CPU cores
    class_weight='balanced',  # Auto-balance classes
    random_state=42
)
```

XGBoost:

```python
XGBClassifier(
    n_estimators=100,
    scale_pos_weight=577,     # n_negative / n_positive
    device='cuda',            # GPU if available
    eval_metric='aucpr',
    random_state=42
)
```

| Metric | Why It Matters for Fraud Detection |
|---|---|
| AUPRC | Most informative for imbalanced data; focuses on minority class |
| Recall | Catching fraud is critical—we want to minimize false negatives |
| Precision | High precision reduces false positives (annoying legitimate users) |
| F1-Score | Harmonic mean balancing precision and recall |
⚠️ Accuracy is misleading for this dataset! A model predicting "Normal" for everything would achieve 99.83% accuracy but catch zero fraud.
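All four metrics come straight from scikit-learn; a small sketch with toy labels and scores (note AUPRC uses the raw scores, while precision/recall/F1 need a threshold):

```python
import numpy as np
from sklearn.metrics import average_precision_score, precision_recall_fscore_support

# Toy labels (1 = fraud) and model scores
y_true  = np.array([0, 0, 0, 0, 1, 0, 1, 0, 0, 1])
y_score = np.array([0.10, 0.20, 0.05, 0.30, 0.90, 0.15, 0.25, 0.20, 0.10, 0.80])
y_pred  = (y_score >= 0.5).astype(int)  # threshold at 0.5

auprc = average_precision_score(y_true, y_score)  # threshold-free, PR-based
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="binary", zero_division=0
)
```

Here the 0.5 threshold misses the fraud scored 0.25, so recall drops even though precision is perfect — exactly the trade-off the table describes.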
After running the benchmark, check the outputs/ folder for:
| File | Description |
|---|---|
| `roc_curves.png` | ROC curves with AUC scores |
| `pr_curves.png` | Precision-Recall curves with AUPRC |
| `time_comparison.png` | Training and inference time barplots |
| `memory_comparison.png` | Peak memory and model size barplots |
| `confusion_matrices.png` | Side-by-side confusion matrices |
| `metrics_comparison.png` | Grouped barplot of all metrics |
- Python 3.10+
- pandas >= 2.0.0
- numpy >= 1.24.0
- scikit-learn >= 1.3.0
- xgboost >= 2.0.0
- matplotlib >= 3.7.0
- seaborn >= 0.12.0
- joblib >= 1.3.0
This project is open source and available under the MIT License.
Contributions are welcome! Feel free to open issues or submit pull requests.