This project is a starting point for exploring credit‑card transaction data and building a simple fraud‑detection pipeline.
The repository contains an exploratory data analysis script (project.py) and a Random Forest fraud detection model (model.py). The EDA script surfaces basic descriptive statistics, while the model trains a balanced Random Forest classifier that achieves 96% precision and 74% recall on fraud detection.
The project uses the public ULB credit‑card fraud dataset, which contains 284,807 transactions from European cardholders over two days and a Class column indicating whether a transaction is fraudulent. Because the dataset (~150 MB) is too large to include in the repository, please download it from Kaggle and place the creditcard.csv file in the root of this project.
- Clone this repository or download the source code.
- Create and activate a virtual environment (Python 3.14):
    python -m venv fintechproject
    source fintechproject/Scripts/activate   # Windows (Git Bash)
    source fintechproject/bin/activate       # macOS/Linux
- Install dependencies:
    pip install pandas scikit-learn
- Download creditcard.csv from Kaggle and save it in the project root.
- Run the EDA script:
    python project.py
  This will print the dataset shape, null counts, class distribution, and basic statistics.
- Run the fraud detection model:
    python model.py
  This will train a Random Forest classifier and print the confusion matrix and classification report (precision, recall, F1-score).
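For orientation, the two run steps above could look roughly like the sketch below. It is not the repository's project.py or model.py: the function names `summarize` and `train_and_evaluate` are hypothetical, `class_weight="balanced"` is only one assumed reading of "balanced Random Forest", and the data is a synthetic stand-in from `make_classification` so the example runs without creditcard.csv. Its scores will not match the figures quoted for the real dataset.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split


def summarize(df: pd.DataFrame, target: str = "Class") -> dict:
    """Collect basic EDA facts: shape, total null count, class distribution."""
    return {
        "shape": df.shape,
        "nulls": int(df.isnull().sum().sum()),
        "class_counts": df[target].value_counts().to_dict(),
    }


def train_and_evaluate(df: pd.DataFrame, target: str = "Class", seed: int = 42):
    """Train a class-weighted Random Forest; return (confusion matrix, report)."""
    X = df.drop(columns=[target])
    y = df[target]
    # Stratify so the rare positive class appears in both splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed
    )
    # Assumption: weighting classes inversely to their frequency stands in
    # for whatever balancing model.py actually applies.
    clf = RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=seed
    )
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    matrix = confusion_matrix(y_test, y_pred, labels=[0, 1])
    report = classification_report(y_test, y_pred, digits=3, zero_division=0)
    return matrix, report


# Synthetic, imbalanced stand-in for creditcard.csv (roughly 5% positives).
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)
demo = pd.DataFrame(X_demo, columns=[f"V{i}" for i in range(1, 11)])
demo["Class"] = y_demo

summary = summarize(demo)
matrix, report = train_and_evaluate(demo)
print(summary)
print(matrix)
print(report)
```

To try the same functions on the real data, replace the synthetic frame with `pd.read_csv("creditcard.csv")`; training on all rows will take noticeably longer.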
- Feature engineering – extract useful features such as time‑based aggregates, rolling averages, or customer behaviour metrics.
- Additional models – compare performance with logistic regression, gradient boosting, or neural networks using metrics such as AUC and F1-score.
- Thresholding and evaluation – explore how different classification thresholds affect false positives/negatives and overall risk.
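As a starting point for the thresholding idea above, here is a small dependency-free sketch of how moving the decision threshold trades false positives against false negatives. The labels and scores are made up for illustration, and `confusion_at_threshold` and `precision_recall` are hypothetical helpers; with a fitted scikit-learn classifier, the scores would typically come from `predict_proba(X)[:, 1]`.

```python
def confusion_at_threshold(y_true, scores, threshold):
    """Return (tp, fp, fn, tn) when scores >= threshold are flagged as fraud."""
    tp = fp = fn = tn = 0
    for label, score in zip(y_true, scores):
        flagged = score >= threshold
        if flagged and label == 1:
            tp += 1
        elif flagged and label == 0:
            fp += 1
        elif not flagged and label == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall(tp, fp, fn):
    """Compute precision and recall, returning 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up example: 1 = fraud, 0 = legitimate, with hypothetical model scores.
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.55, 0.60, 0.80, 0.15, 0.90]

for threshold in (0.3, 0.5, 0.7):
    tp, fp, fn, tn = confusion_at_threshold(y_true, scores, threshold)
    precision, recall = precision_recall(tp, fp, fn)
    print(f"threshold={threshold:.1f} tp={tp} fp={fp} fn={fn} tn={tn} "
          f"precision={precision:.2f} recall={recall:.2f}")
```

On this toy data, raising the threshold from 0.3 to 0.7 removes both false positives but misses two of the four fraud cases, which is the trade-off to weigh against the cost of each kind of error.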
Feel free to fork this repository and open pull requests with improvements. Suggestions for feature engineering or model architectures are always welcome.
This project is licensed under the MIT License. See the LICENSE file for details.