anmolsharma152/Home-Credit-Risk-API

🛡️ Real-Time Credit Risk Scoring Engine

A Fintech API that predicts loan default probability using Alternative Data, LightGBM, and FastAPI.

💼 The Business Problem

In emerging markets, many creditworthy individuals are "unbanked" or lack a traditional credit history. Traditional logistic regression models reject these applicants, causing:

  1. Lost Revenue: Good customers are turned away.
  2. Hidden Risk: Traditional metrics miss behavioral red flags.

The Goal: Build a machine learning engine that uses alternative data (telco usage, family status, external sources) to score applicants more accurately and serve decisions via a real-time REST API.


🏗️ Solution Architecture

The pipeline consists of three stages:

  1. ETL & Preprocessing: Handling outliers (e.g., the placeholder value 365243 in the days-employed field, roughly 1,000 years) and engineering financial ratios (Debt-to-Income, Annuity-to-Credit).
  2. Model Training: A LightGBM Classifier optimized for imbalanced data (8% default rate) using weighted loss functions.
  3. Deployment: A FastAPI microservice that accepts JSON payloads and returns a Risk Score (0-1) and a Credit Score (300-850).
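
The preprocessing stage can be sketched roughly as follows. This is a minimal illustration, not the contents of risk_preprocessing.py: the column names (DAYS_EMPLOYED, AMT_INCOME_TOTAL, AMT_CREDIT, AMT_ANNUITY) are assumptions borrowed from the public Home Credit dataset, and the exact ratio definitions and output column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Value of the 365243 days-employed anomaly mentioned in stage 1.
DAYS_EMPLOYED_SENTINEL = 365243


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Clean the employment anomaly and add two ratio features.

    All column names here are assumptions, not confirmed by the project.
    """
    out = df.copy()

    # Keep a flag so the model can still learn from "value was a placeholder".
    is_anomalous = out["DAYS_EMPLOYED"] == DAYS_EMPLOYED_SENTINEL
    out["DAYS_EMPLOYED_ANOM"] = is_anomalous
    out["DAYS_EMPLOYED"] = out["DAYS_EMPLOYED"].mask(is_anomalous)

    # Treat zero denominators as missing to avoid infinite ratios.
    income = out["AMT_INCOME_TOTAL"].mask(out["AMT_INCOME_TOTAL"] == 0)
    credit = out["AMT_CREDIT"].mask(out["AMT_CREDIT"] == 0)

    out["DEBT_TO_INCOME"] = out["AMT_CREDIT"] / income
    out["ANNUITY_TO_CREDIT"] = out["AMT_ANNUITY"] / credit
    return out


# Toy rows for illustration only; the second one carries the anomaly.
sample = pd.DataFrame({
    "DAYS_EMPLOYED": [-1200, 365243],
    "AMT_INCOME_TOTAL": [100000.0, 50000.0],
    "AMT_CREDIT": [400000.0, 200000.0],
    "AMT_ANNUITY": [20000.0, 10000.0],
})
features = preprocess(sample)
```

Replacing the placeholder with a missing value (rather than dropping the row) keeps the applicant in the training set, and LightGBM can handle missing values natively.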

📊 Key Results

| Metric | Score | Context |
|---|---|---|
| ROC-AUC | 0.767 | Exceeds the 0.70 industry baseline. |
| Recall (Defaulters) | 62% | Captures the majority of bad loans to protect capital. |
| Inference Time | <50ms | Suitable for real-time mobile app integration. |

🛠️ Tech Stack

  • Machine Learning: LightGBM, Scikit-Learn
  • Explainability: SHAP (Shapley Additive exPlanations)
  • API Framework: FastAPI, Uvicorn, Pydantic
  • Data Processing: Pandas, NumPy

🚀 How to Run

1. Setup Environment

pip install lightgbm fastapi uvicorn shap pandas scikit-learn

2. Preprocessing & Training

Generate the features, then train the model. The training script handles the class imbalance automatically.

python risk_preprocessing.py
python train_risk_model.py

Output: Saves credit_risk_model.pkl and generates risk_drivers.png (SHAP).
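
One common way to handle an imbalance like the 8% default rate is to up-weight the positive class by the ratio of negatives to positives. The sketch below only computes that weight; whether train_risk_model.py uses `scale_pos_weight`, `class_weight`, or another mechanism is not stated here, so treat the choice and the helper name `imbalance_params` as assumptions.

```python
from collections import Counter


def imbalance_params(labels):
    """Return LightGBM-style parameters that up-weight the minority class.

    Assumes binary labels where 1 = default and 0 = repaid.
    """
    counts = Counter(labels)
    n_pos, n_neg = counts[1], counts[0]
    if n_pos == 0:
        raise ValueError("no positive (default) examples in labels")
    return {
        "objective": "binary",
        # Each defaulter counts as much as n_neg / n_pos non-defaulters.
        "scale_pos_weight": n_neg / n_pos,
    }


# Toy labels with an 8% default rate, matching the rate quoted above.
labels = [1] * 8 + [0] * 92
params = imbalance_params(labels)
print(params["scale_pos_weight"])  # 11.5
```

The resulting dictionary could be passed to `lightgbm.LGBMClassifier(**params)`. Note that re-weighting shifts the predicted probabilities upward, so scores from a weighted model are usually better read as a ranking than as calibrated default rates.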

3. Start the API

Launch the REST server locally.

python risk_api.py

Server runs at http://localhost:8000

4. Test a Prediction

Send a sample applicant payload to the endpoint.

curl -X POST "http://localhost:8000/predict" \
     -H "Content-Type: application/json" \
     -d @applicant_payload.json

Response:

{
  "approved": false,
  "risk_score": 0.5012,
  "credit_score": 574,
  "message": "High Default Risk Detected"
}
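
The sample response above is consistent with a simple linear mapping from the 0-1 risk score onto the 300-850 range (850 - 550 × 0.5012 ≈ 574) and with a 0.5 approval cutoff. The sketch below assumes exactly that; the actual formula and threshold in risk_api.py may differ, and the function names are hypothetical.

```python
def to_credit_score(risk_score: float, low: int = 300, high: int = 850) -> int:
    """Map a default probability in [0, 1] to a score in [low, high].

    Assumed linear mapping: risk 0.0 -> high, risk 1.0 -> low.
    """
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be between 0 and 1")
    return round(high - risk_score * (high - low))


def decide(risk_score: float, threshold: float = 0.5) -> dict:
    """Build a decision payload; the 0.5 threshold is an assumption."""
    return {
        "approved": risk_score < threshold,
        "risk_score": risk_score,
        "credit_score": to_credit_score(risk_score),
    }


print(decide(0.5012))
# {'approved': False, 'risk_score': 0.5012, 'credit_score': 574}
```

In practice the cutoff would be tuned against the cost of a missed defaulter versus a rejected good customer, rather than fixed at 0.5.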

🕵️ Model Explainability

Using SHAP values, we identified the top drivers of default risk:

  1. EXT_SOURCE_2 / 3: External normalized credit scores.
  2. DAYS_BIRTH: Younger applicants showed statistically higher default rates.
  3. CREDIT_TERM: Longer loan terms correlated with higher risk.
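
A ranking like the one above is typically obtained by averaging the absolute SHAP value of each feature across applicants. The sketch below shows only that aggregation step on a toy matrix with made-up numbers; in a real run the matrix would come from something like `shap.TreeExplainer(model).shap_values(X)`, and the helper name `rank_drivers` is hypothetical.

```python
import numpy as np


def rank_drivers(shap_values, feature_names):
    """Rank features by mean absolute SHAP value, largest first."""
    shap_values = np.asarray(shap_values, dtype=float)
    importance = np.abs(shap_values).mean(axis=0)
    order = np.argsort(importance)[::-1]
    return [(feature_names[i], float(importance[i])) for i in order]


# Toy SHAP matrix: rows are applicants, columns are features.
# The numbers are illustrative only, not results from this project.
toy_shap = [
    [-0.40, 0.10, 0.05],
    [0.30, -0.20, 0.10],
    [-0.50, 0.30, -0.15],
]
names = ["EXT_SOURCE_2", "DAYS_BIRTH", "CREDIT_TERM"]
ranking = rank_drivers(toy_shap, names)
```

Mean absolute SHAP measures how much a feature moves predictions, not in which direction. Directional statements, such as younger applicants defaulting more often, come from the sign of the SHAP values plotted against the feature values.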

[Figure: SHAP plot]


🔮 Future Improvements

  • Dockerize: Containerize the API for cloud deployment (AWS ECS).
  • Monitoring: Add Prometheus to track "Data Drift" (e.g., if applicant income levels change over time).
  • A/B Testing: Deploy a "Challenger" model (XGBoost) to run alongside the "Champion" (LightGBM).
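
For the monitoring item, one widely used drift statistic is the Population Stability Index (PSI), which compares the binned distribution of a feature at training time with its distribution in live traffic. The sketch below is one possible implementation under stated assumptions: quantile bins taken from the training sample, a small epsilon to avoid log(0), and synthetic income data. It is not part of the current project.

```python
import numpy as np


def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a reference sample (expected) and a live sample (actual)."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)

    # Interior bin edges come from quantiles of the reference sample, so the
    # outer bins are open-ended and catch live values outside the training range.
    interior = np.quantile(expected, np.linspace(0.0, 1.0, bins + 1))[1:-1]

    exp_idx = np.searchsorted(interior, expected, side="right")
    act_idx = np.searchsorted(interior, actual, side="right")
    exp_frac = np.bincount(exp_idx, minlength=bins) / expected.size
    act_frac = np.bincount(act_idx, minlength=bins) / actual.size

    # Clip empty bins so the logarithm stays finite.
    exp_frac = np.clip(exp_frac, eps, None)
    act_frac = np.clip(act_frac, eps, None)
    return float(np.sum((act_frac - exp_frac) * np.log(act_frac / exp_frac)))


# Synthetic incomes for illustration: live traffic drifts about 30% higher.
rng = np.random.default_rng(0)
train_income = rng.lognormal(mean=11.0, sigma=0.5, size=5000)
live_income = train_income * 1.3

psi_same = population_stability_index(train_income, train_income)
psi_drift = population_stability_index(train_income, live_income)
```

A value like `psi_drift` could then be exported as a gauge metric for Prometheus to scrape. Rules of thumb often treat PSI below 0.1 as stable and above 0.25 as a significant shift, but these cutoffs are conventions rather than guarantees.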
