Fraud Detection and Risk Project

This project demonstrates a complete workflow for detecting and managing multiple types of fraud risk within a synthetic transaction dataset. It includes:

- A synthetic dataset generator that creates a realistic collection of financial transactions with multiple correlated target variables.
- An exploratory data analysis (EDA) script to understand the data distribution and relationships between features and targets.
- A model training script that fits machine‑learning models to predict several risk indicators at once and logs experiments using MLflow.
- A simple project structure that can be uploaded to GitHub or used as a starting point for more advanced fraud‑analytics work.

Project structure

fraud_detection_project/
├── .github/workflows
│   ├── cd.yml                    # automated containerization, infrastructure and deployment
│   └── ci.yml                    # automated unit and integration testing with linting and formatting checks
├── .venv
├── Dockerfile                    # everything you need for image creation
├── Makefile                      # simplify your command line
├── README.md                     # project overview and instructions
├── app
│   ├── __pycache__
│   └── main.py                   # FastAPI webapp  
├── data
│   ├── generate_data.py          # script to create the synthetic dataset
│   └── transactions.csv          # generated dataset (not committed by default)
├── lambda
│   ├── __pycache__
│   └── handler.py
├── lambda_function_payload.zip
├── mlflow.db
├── mlruns
├── monitoring
│   ├── __pycache__
│   └── monitor.py                # data and concept drift monitoring
├── notebooks
│   ├── eda.ipynb                 # exploratory data analysis notebook
│   └── unsupervised.ipynb        # dimensionality reduction techniques
├── orchestration
│   ├── __pycache__
│   └── flow.py                   # ML orchestration with Prefect
├── plots                         # directory with plot output for eda and dimensionality reduction
├── pyproject.toml                # uv dependencies
├── requirements.txt    
├── src
│   ├── __pycache__
│   └── train.py                  # automated training script for fraud detection using XGBoost and ensemble methods
├── terraform                     # infrastructure as code
├── tests                         # unit and integration tests
│   ├── __pycache__
│   ├── test_app.py
│   ├── test_integration.py
│   └── test_train.py
└── uv.lock

Dataset description

Realistic, publicly available datasets that simultaneously measure fraud likelihood, chargeback risk, account takeover probability, and anomaly score are difficult to find due to privacy concerns. To illustrate multivariate fraud‑analysis techniques, we therefore generate a synthetic dataset with the following characteristics:

Features: 20 numerical variables labelled feature_0 … feature_19 that simulate transaction attributes (amount, balance history, customer metadata, etc.). Some features carry stronger signal for fraud and other risks, while others are redundant or noisy.
Targets:
- fraud_label – a binary indicator (0/1) describing whether the transaction is fraudulent.
- chargeback_label – a binary indicator (0/1) describing whether the transaction results in a chargeback. It is correlated with, but not identical to, the fraud label.
- takeover_label – a binary indicator (0/1) flagging whether the transaction is associated with an account takeover event.
- anomaly_score – a continuous value between 0 and 1 representing the degree to which a transaction is unusual. This score is derived from an unsupervised anomaly‑detection algorithm.

The dataset is created using data/generate_data.py. The script leverages scikit‑learn utilities (make_classification, IsolationForest) and basic random sampling to produce correlated risk outcomes. Running the script writes a CSV file (transactions.csv) containing 10 000 synthetic transactions.
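The generation approach described above can be sketched as follows. This is a minimal illustration, not the contents of data/generate_data.py: the function name generate_transactions, the class weights, the counts of informative and redundant features, and the conditional probabilities used to correlate the labels are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest


def generate_transactions(n_samples: int = 10_000, seed: int = 42) -> pd.DataFrame:
    """Build a synthetic transaction table with four correlated targets."""
    rng = np.random.default_rng(seed)

    # 20 numeric features: a mix of informative, redundant, and noise columns.
    X, fraud = make_classification(
        n_samples=n_samples,
        n_features=20,
        n_informative=8,       # assumption: not taken from the project
        n_redundant=4,         # assumption
        weights=[0.95, 0.05],  # assumption: roughly 5% fraudulent rows
        random_state=seed,
    )
    df = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(20)])
    df["fraud_label"] = fraud

    # Correlated binary labels: the event probability depends on the fraud label.
    p_chargeback = np.where(fraud == 1, 0.60, 0.03)  # assumed probabilities
    p_takeover = np.where(fraud == 1, 0.30, 0.01)    # assumed probabilities
    df["chargeback_label"] = (rng.random(n_samples) < p_chargeback).astype(int)
    df["takeover_label"] = (rng.random(n_samples) < p_takeover).astype(int)

    # score_samples is higher for normal points, so negate it and
    # min-max scale so that 1.0 marks the most unusual transaction.
    forest = IsolationForest(random_state=seed).fit(X)
    raw = -forest.score_samples(X)
    df["anomaly_score"] = (raw - raw.min()) / (raw.max() - raw.min())
    return df
```

Calling generate_transactions(10_000).to_csv("data/transactions.csv", index=False) would write a file with the same column layout as the one described in this section.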

Getting started

Set up a Python environment. From the project root, create a virtual environment and install dependencies. We recommend using uv for faster dependency installation. If uv is not installed, install it first with pip install uv.
```
python3 -m venv .venv
source .venv/bin/activate
# Install dependencies using uv (a drop‑in replacement for pip).
uv pip install -r requirements.txt
```
Generate the dataset. Create the data/transactions.csv file with 10 000 rows (you can adjust --n_samples):
```
python data/generate_data.py --n_samples 10000 --output data/transactions.csv
```
Explore the data. Run the EDA script to view basic statistics and plots (figures will be saved to the plots/ directory):
```
python src/eda.py --data data/transactions.csv
```
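As a rough idea of the statistics such an EDA step might compute, the sketch below reports the positive rate of each binary target, the correlations among the targets, and the features ranked by their correlation with fraud_label. The function name summarize is hypothetical and the project's own EDA code may compute different summaries.

```python
import pandas as pd

TARGETS = ["fraud_label", "chargeback_label", "takeover_label", "anomaly_score"]


def summarize(df: pd.DataFrame) -> dict:
    """Return class balance and correlation summaries for the transaction table."""
    features = [c for c in df.columns if c.startswith("feature_")]
    binary_targets = [t for t in TARGETS if t != "anomaly_score"]
    return {
        # Share of positive rows per binary target (shows class imbalance).
        "positive_rate": df[binary_targets].mean(),
        # Pairwise Pearson correlation among the four targets.
        "target_corr": df[TARGETS].corr(),
        # Features ordered by absolute correlation with the fraud label.
        "feature_fraud_corr": df[features]
        .corrwith(df["fraud_label"])
        .sort_values(key=abs, ascending=False),
    }
```

For example, summarize(pd.read_csv("data/transactions.csv"))["positive_rate"] gives a quick view of how imbalanced each label is before any model is trained.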
Train the models. Fit predictive models for each risk indicator and log the experiments to an MLflow tracking server (running locally by default). MLflow runs will be stored under the mlruns directory:
```
python src/train.py --data data/transactions.csv
```
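To make the idea of predicting several risk indicators at once concrete, here is a minimal sketch that fits one classifier per binary target and logs the resulting metrics to MLflow. It uses scikit-learn's MultiOutputClassifier around a random forest as a stand-in; the project's src/train.py is described as using XGBoost and ensemble methods, so treat the model choice, the function names train_and_evaluate and log_run, and the hyper-parameters as assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier


def train_and_evaluate(X, Y, target_names, seed=0):
    """Fit one classifier per column of Y and return the model and per-target AUC."""
    X_tr, X_te, Y_tr, Y_te = train_test_split(
        X, Y, test_size=0.25, random_state=seed
    )
    model = MultiOutputClassifier(
        RandomForestClassifier(n_estimators=50, random_state=seed)  # stand-in model
    )
    model.fit(X_tr, Y_tr)

    # predict_proba returns a list with one (n_samples, 2) array per target.
    probas = model.predict_proba(X_te)
    metrics = {
        f"auc_{name}": float(roc_auc_score(Y_te[:, i], probas[i][:, 1]))
        for i, name in enumerate(target_names)
    }
    return model, metrics


def log_run(params: dict, metrics: dict) -> None:
    """Record parameters and metrics in a single MLflow run."""
    import mlflow  # imported here so the training function works without MLflow

    with mlflow.start_run():
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)
```

The continuous anomaly_score target would need a regressor rather than a classifier, so it is left out of this sketch. After training, log_run({"n_estimators": 50}, metrics) records the run so that it appears in the MLflow UI.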
View the MLflow UI (optional). To inspect experiment metrics and compare models, launch the MLflow tracking UI:
```
mlflow ui
```
Then visit http://127.0.0.1:5000 in your browser.
Explore unsupervised techniques (optional but great for interviews). Unsupervised learning reveals hidden structure in the data and makes for compelling visualisations. Run the unsupervised script to produce two‑dimensional embeddings of the transactions using PCA, MDS, Isomap, UMAP, t‑SNE, and Isolation Forest:
```
python src/unsupervised.py --data data/transactions.csv --sample_size 3000
```
The script will create a plots/unsupervised/ directory containing a scatter plot for each technique. These visuals can enrich your portfolio and illustrate your ability to apply manifold learning and anomaly detection.
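The embedding step can be illustrated with two of the listed techniques. The sketch below subsamples the feature matrix, standardises it, and computes two-dimensional PCA and Isomap embeddings with scikit-learn. The function name embed_2d and the n_neighbors value are assumptions, and the other techniques named above (MDS, UMAP, t‑SNE, Isolation Forest) are omitted for brevity.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap
from sklearn.preprocessing import StandardScaler


def embed_2d(X: np.ndarray, sample_size: int = 3000, seed: int = 0):
    """Return the sampled row indices and a dict of 2-D embeddings."""
    rng = np.random.default_rng(seed)
    n = min(sample_size, len(X))
    # Subsample without replacement to keep manifold methods tractable.
    idx = rng.choice(len(X), size=n, replace=False)
    X_scaled = StandardScaler().fit_transform(X[idx])

    embeddings = {
        "pca": PCA(n_components=2, random_state=seed).fit_transform(X_scaled),
        "isomap": Isomap(n_components=2, n_neighbors=10).fit_transform(X_scaled),
    }
    return idx, embeddings
```

Each returned array has shape (n, 2), so its two columns can be passed directly to a matplotlib scatter plot, with the fraud label of the sampled rows (selected through idx) used to distinguish the points.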

Lambda Function

The project includes an AWS Lambda function for background processing of fraud detection tasks.

Building the Lambda Package

# Package Lambda function into deployment zip
make lambda-package

# Clean Lambda package
make lambda-clean

Lambda Structure

lambda/
├── __init__.py         # Package marker
└── handler.py          # Lambda function entry point

The Lambda function is packaged as lambda_function_payload.zip and deployed via Terraform.

Deploying Lambda

The Lambda function is automatically deployed when running:

cd terraform
terraform apply

The function is configured to:

- Process messages from an SQS queue (when configured)
- Run on the Python 3.12 runtime
- Time out after 30 seconds
- Use an IAM role with S3 and SQS permissions
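For orientation, an SQS-triggered handler generally follows the shape sketched below: it iterates over event["Records"], parses each message body as JSON, and reports failed message IDs. This is not the contents of lambda/handler.py. The process_transaction helper and the transaction_id field are hypothetical, and the batchItemFailures response is only honoured by Lambda when partial batch responses are enabled on the event source mapping.

```python
import json


def process_transaction(transaction: dict) -> str:
    """Hypothetical placeholder for the real fraud-processing step."""
    # Raises KeyError when the assumed transaction_id field is missing.
    return str(transaction["transaction_id"])


def handler(event, context):
    """Process a batch of SQS messages and report the ones that failed."""
    failures = []
    for record in event.get("Records", []):
        try:
            transaction = json.loads(record["body"])
            process_transaction(transaction)
        except (KeyError, ValueError):
            # json.JSONDecodeError is a ValueError subclass; failed messages
            # are returned so that SQS can redeliver only those.
            failures.append({"itemIdentifier": record.get("messageId")})
    return {"batchItemFailures": failures}
```

Returning an empty batchItemFailures list signals that every message in the batch was processed successfully.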

Notes

The dataset generator and scripts are self‑contained—no external accounts or proprietary data are required.
The project uses only open‑source libraries (pandas, numpy, scikit‑learn, matplotlib, mlflow, among others) that can be installed from PyPI.
All plots are created with matplotlib using its default color cycle; no colors are hard‑coded.
Feel free to extend this skeleton by adding feature engineering, hyper‑parameter tuning, additional models, or a notebook for interactive analysis.
