ClearScore Credit Scoring System

A production-grade credit risk assessment system built with FastAPI, Streamlit, and calibrated machine learning models. Scores loan applicants, generates probability-based risk classifications, and explains every decision with SHAP-based feature importance analysis.

Features

- Calibrated ML Models: LightGBM with sigmoid calibration for accurate default probability estimates
- Risk Tier Classification: Four-tier risk framework (Low, Medium, High, Very High) with optimal threshold at 0.1636
- SHAP Explainability: Plain-English explanations of credit decisions with top 3 decision factors
- FastAPI Backend: RESTful API for programmatic scoring and integration
- Streamlit Web UI: Interactive demo with calendar date pickers and real-time risk assessment
- Comprehensive Testing: 21+ pytest tests covering low-risk, high-risk, edge cases, and end-to-end integration

Project Structure

```
ClearScore/
├── app.py                      # FastAPI backend service
├── streamlit_demo.py           # Streamlit web UI
├── explain.py                  # SHAP-based explanation generation
├── test_credit_scoring.py      # pytest test suite (21 tests, all passing)
├── requirements.txt            # Python dependencies
├── calibrated_model.pkl        # Trained & calibrated LightGBM model
├── lgbm_model.pkl              # LightGBM base model for SHAP
├── encoders.pkl                # Feature encoders (label encoders)
├── feature_names.parquet       # Required feature set
└── README.md                   # This file
```

Installation

Prerequisites

- Python 3.10+
- pip or conda

Setup

Navigate to the repository
```
cd ClearScore
```

Create and activate a virtual environment

```
# Windows
python -m venv .venv
.venv\Scripts\Activate.ps1

# macOS/Linux
python3 -m venv .venv
source .venv/bin/activate
```

Install dependencies
```
pip install -r requirements.txt
```

Running the System

Option 1: Streamlit Web UI (Recommended for Demo)

```
streamlit run streamlit_demo.py
```

Opens a local web app at http://localhost:8501 where you can:

- Enter applicant information (age, income, employment history, credit metrics, etc.)
- Get instant credit risk prediction and classification
- View SHAP-based explanation of the decision
- Explore risk tier boundaries

Features:

- Calendar date pickers for employment date and birthday
- Real-time risk tier updates
- Detailed explanations with top decision factors

Option 2: FastAPI Backend (For Production/Integration)

```
uvicorn app:app --reload --port 8000
```

Starts the API server at http://localhost:8000. Interactive docs available at /docs.

Available Endpoints:

`POST /predict`

Scores an applicant and returns risk assessment.

Request Body:

```
{
  "age": 35,
  "employment_years": 5,
  "income": 75000,
  "credit_score": 720,
  "existing_debt": 15000,
  "monthly_payment": 2500,
  "employment_date": "2019-06-15",
  "birthday": "1989-02-20"
}
```

Response:

```
{
  "default_probability": 0.12,
  "risk_tier": "Medium Risk",
  "recommendation": "Review",
  "shap_explanation": "This applicant presents moderate credit risk. Primary factors driving this assessment are: low credit score relative to income (40% impact), moderate existing debt (35% impact), recent employment (25% impact)."
}
```

Credit Risk Classification

The system uses a calibrated probability model with these risk tiers:

| Default Probability | Risk Tier | Recommendation |
|---|---|---|
| < 0.08 | Low Risk | ✅ APPROVE |
| 0.08 - 0.1636 | Medium Risk | 🔍 REVIEW |
| 0.1636 - 0.25 | High Risk | ⚠️ LIKELY REJECT |
| ≥ 0.25 | Very High Risk | ❌ REJECT |

Optimal Threshold: 0.1636. This value was derived during model training as the best binary decision point for default prediction, and it marks the boundary between the Medium and High tiers.
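
The tier table can be read as a simple threshold lookup. The sketch below is illustrative rather than the code in app.py: the function name `classify_risk` is hypothetical, and treating each lower bound as inclusive (so exactly 0.1636 lands in High Risk) is an assumption, since the table does not say which tier owns the boundary values.

```python
# Threshold values come from the tier table above.
THRESHOLD_LOW = 0.08
THRESHOLD_MEDIUM = 0.1636
THRESHOLD_HIGH = 0.25


def classify_risk(default_probability: float) -> tuple[str, str]:
    """Map a calibrated default probability to (risk tier, recommendation).

    Assumption: lower bounds are inclusive, upper bounds exclusive.
    """
    if not 0.0 <= default_probability <= 1.0:
        raise ValueError("default_probability must be between 0 and 1")
    if default_probability < THRESHOLD_LOW:
        return ("Low Risk", "APPROVE")
    if default_probability < THRESHOLD_MEDIUM:
        return ("Medium Risk", "REVIEW")
    if default_probability < THRESHOLD_HIGH:
        return ("High Risk", "LIKELY REJECT")
    return ("Very High Risk", "REJECT")


if __name__ == "__main__":
    for p in (0.05, 0.12, 0.20, 0.40):
        print(p, classify_risk(p))
```

With these assumptions, the sample response shown earlier (probability 0.12) falls in the Medium Risk tier, matching the table.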

API Usage Examples

Using Python Requests

```
import requests

applicant = {
    "age": 45,
    "employment_years": 15,
    "income": 120000,
    "credit_score": 750,
    "existing_debt": 25000,
    "monthly_payment": 3000,
    "employment_date": "2009-03-10",
    "birthday": "1979-01-25"
}

# Requires the FastAPI server from Option 2 to be running on port 8000
response = requests.post("http://localhost:8000/predict", json=applicant, timeout=10)
response.raise_for_status()  # Surface validation (4xx) and server (5xx) errors
result = response.json()

print(f"Default Probability: {result['default_probability']:.2%}")
print(f"Risk Tier: {result['risk_tier']}")
print(f"Explanation: {result['shap_explanation']}")
```

Using cURL

```
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "age": 35,
    "employment_years": 5,
    "income": 75000,
    "credit_score": 720,
    "existing_debt": 15000,
    "monthly_payment": 2500,
    "employment_date": "2019-06-15",
    "birthday": "1989-02-20"
  }'
```

Testing

Run the comprehensive test suite:

```
pytest test_credit_scoring.py -v
```

Test Coverage:

✅ Low-risk applicant scenarios (excellent/good applicants)
✅ High-risk applicant scenarios (poor/very poor applicants)
✅ Risk tier classification boundaries
✅ Feature preprocessing and encoding
✅ SHAP explainability generation
✅ Edge cases (probability ranges, consistent predictions)
✅ End-to-end integration (full prediction pipeline)

Expected Output:

```
21 passed, 1 skipped, 40 warnings in ~8 seconds
```

Model Architecture

Input Features

The model expects these applicant features:

| Feature | Type | Description |
|---|---|---|
| age | int | Applicant age in years |
| employment_years | int | Years at current employer |
| income | int | Annual income in dollars |
| credit_score | int | Credit score (typically 300-850) |
| existing_debt | int | Total outstanding debt in dollars |
| monthly_payment | int | Monthly credit payment in dollars |
| employment_date | str (YYYY-MM-DD) | Start date at current job (can be null) |
| birthday | str (YYYY-MM-DD) | Date of birth |

Processing Pipeline

1. Feature Engineering
   - Calculates days_employed from employment_date
   - Calculates days_since_birth from birthday
   - Computes debt-to-income ratio
   - Encodes categorical features with LabelEncoder
2. Model Prediction
   - LightGBM classifier generates raw probability scores
   - Sigmoid calibration maps scores to real default probabilities (0-1)
3. Risk Classification
   - Compares default probability against tier thresholds
   - Assigns risk tier (Low/Medium/High/Very High)
4. Explainability
   - SHAP TreeExplainer computes feature contribution to prediction
   - Top 3 features formatted into plain-English explanation
   - Risk tier determines explanation tone

SHAP Explanation Format

Each prediction includes a narrative explanation:

Example (Low Risk):

"This applicant presents low credit risk and can be considered favorably. Primary factors supporting this assessment are: high credit score relative to income (55% impact), stable long-term employment (30% impact), low existing debt (15% impact)."

Example (Very High Risk):

"This applicant presents very high credit risk and should not be approved for credit at this time. Primary factors driving this assessment are: low credit score relative to debt (60% impact), high monthly payment obligations (25% impact), limited employment history (15% impact)."

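In the examples above, the three impact percentages sum to 100, which suggests each factor's share is normalized over the top 3 contributions. The sketch below reproduces that format under this assumption. The function name `format_explanation` and the input (a plain dict of per-feature contribution values, such as SHAP values for one prediction) are hypothetical, and explain.py may phrase factors differently. Opening sentences are copied from the README examples; tiers without an example use a generic fallback.

```python
# Opening sentences taken from the examples in this README.
INTROS = {
    "Low Risk": "This applicant presents low credit risk and can be considered "
                "favorably. Primary factors supporting this assessment are: ",
    "Medium Risk": "This applicant presents moderate credit risk. "
                   "Primary factors driving this assessment are: ",
    "Very High Risk": "This applicant presents very high credit risk and should "
                      "not be approved for credit at this time. "
                      "Primary factors driving this assessment are: ",
}


def format_explanation(risk_tier: str, contributions: dict[str, float], top_n: int = 3) -> str:
    """Build a narrative from the top_n contributions by absolute magnitude."""
    # Fallback wording for tiers without an example (an assumption)
    tier_word = risk_tier.removesuffix(" Risk").lower()
    intro = INTROS.get(
        risk_tier,
        f"This applicant presents {tier_word} credit risk. "
        "Primary factors driving this assessment are: ",
    )
    ranked = sorted(contributions.items(), key=lambda item: abs(item[1]), reverse=True)[:top_n]
    total = sum(abs(value) for _, value in ranked)
    if total == 0:
        return intro + "no dominant factors."
    # Each factor's share is its magnitude relative to the top_n total
    parts = [f"{name} ({abs(value) / total:.0%} impact)" for name, value in ranked]
    return intro + ", ".join(parts) + "."
```

Ranking by absolute value means a factor that strongly lowers risk can appear alongside one that raises it; the tier-specific opening sentence sets the overall tone.
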
Configuration

Edit these constants in app.py to adjust credit risk policy:

```
THRESHOLD_LOW    = 0.08      # Approval threshold
THRESHOLD_MEDIUM = 0.1636    # Review threshold (optimal threshold)
THRESHOLD_HIGH   = 0.25      # Rejection threshold
```

Important: The optimal threshold of 0.1636 is calibrated from model training. Changing this value will significantly impact approval rates.

Troubleshooting

Streamlit app won't start

```
# Check port availability
netstat -ano | findstr :8501  # Windows
lsof -i :8501                 # macOS/Linux

# Use different port
streamlit run streamlit_demo.py --server.port 8502
```

FastAPI port in use

```
# Start on different port
uvicorn app:app --port 8001
```

Model loading error

Ensure these files exist in project root:

- calibrated_model.pkl
- lgbm_model.pkl
- encoders.pkl
- feature_names.parquet
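
A quick way to confirm the artifacts are in place before starting either service is a small check script. This is a hypothetical helper, not part of the repository; the file names are the ones listed above.

```python
from pathlib import Path

# Artifact names as listed in this README
REQUIRED_ARTIFACTS = (
    "calibrated_model.pkl",
    "lgbm_model.pkl",
    "encoders.pkl",
    "feature_names.parquet",
)


def missing_artifacts(project_root: str = ".") -> list[str]:
    """Return the required artifact files not found under project_root."""
    root = Path(project_root)
    return [name for name in REQUIRED_ARTIFACTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_artifacts()
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All model files found.")
```

Run it from the project root; any file it reports needs to be restored before app.py or streamlit_demo.py can load the model.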

Validation error when calling API

- Dates must be in YYYY-MM-DD format
- Numeric fields must be positive integers
- employment_date can be null/empty; other required fields cannot
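
To catch these problems before sending a request, the rules above can be checked client-side. The sketch below is an illustration using only the standard library; the function name `validate_applicant` is hypothetical, and the API's own validation (done with pydantic, per the dependency list) may be stricter or report errors in a different shape.

```python
from datetime import datetime

NUMERIC_FIELDS = (
    "age", "employment_years", "income",
    "credit_score", "existing_debt", "monthly_payment",
)


def validate_applicant(payload: dict) -> list[str]:
    """Return a list of human-readable problems; empty means the payload looks valid."""
    errors = []

    # Numeric fields must be positive integers (bool is excluded explicitly,
    # because bool is a subclass of int in Python)
    for field in NUMERIC_FIELDS:
        value = payload.get(field)
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            errors.append(f"{field} must be a positive integer")

    # birthday is required; employment_date may be null or empty
    for field, required in (("birthday", True), ("employment_date", False)):
        value = payload.get(field)
        if value in (None, ""):
            if required:
                errors.append(f"{field} is required")
            continue
        try:
            datetime.strptime(value, "%Y-%m-%d")
        except (TypeError, ValueError):
            errors.append(f"{field} must be in YYYY-MM-DD format")
    return errors
```

For example, a payload with `"credit_score": -1` and `"birthday": "20/02/1989"` yields one error per offending field, while the request body shown under `POST /predict` yields none.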

Dependencies

- fastapi: REST API framework
- uvicorn: ASGI server
- streamlit: Web UI framework
- scikit-learn: ML utilities and preprocessing
- lightgbm: LightGBM model serving
- pydantic: Request/response validation
- shap: Model explainability
- pandas, numpy: Data processing
- joblib: Model serialization

See requirements.txt for complete list with versions.

Performance Metrics

- Prediction latency: <100ms per applicant (includes SHAP explanation)
- Optimal threshold: 0.1636, the binary decision point derived during model training
- Test suite: 21 tests passing, covering the scenarios listed under Testing

Future Enhancements

- Batch prediction endpoint for scoring multiple applicants
- Audit log for compliance and fairness monitoring
- Feature drift detection and model retraining alerts
- Advanced explainability (prototype learning, counterfactuals)
- Database integration for applicant history

Support

For issues or questions, please open a GitHub issue or contact the development team.

Last Updated: April 2026
Model Version: 1.0 (Calibrated LightGBM with Sigmoid Calibration)
Optimal Threshold: 0.1636
Test Suite: 21 passing tests

License

[Specify your license here]

Contact

For questions or issues, please open an issue or contact the development team.
