JannsenRamos/ClearScore

ClearScore Credit Scoring System

A production-grade credit risk assessment system built with FastAPI, Streamlit, and calibrated machine learning models. Scores loan applicants, generates probability-based risk classifications, and explains every decision with SHAP-based feature importance analysis.

Features

  • Calibrated ML Models: LightGBM with sigmoid calibration for accurate default probability estimates
  • Risk Tier Classification: Four-tier risk framework (Low, Medium, High, Very High) with optimal threshold at 0.1636
  • SHAP Explainability: Plain-English explanations of credit decisions with top 3 decision factors
  • FastAPI Backend: RESTful API for programmatic scoring and integration
  • Streamlit Web UI: Interactive demo with calendar date pickers and real-time risk assessment
  • Comprehensive Testing: 21+ pytest tests covering low-risk, high-risk, edge cases, and end-to-end integration

Project Structure

ClearScore/
├── app.py                      # FastAPI backend service
├── streamlit_demo.py           # Streamlit web UI
├── explain.py                  # SHAP-based explanation generation
├── test_credit_scoring.py      # pytest test suite (21 tests, all passing)
├── requirements.txt            # Python dependencies
├── calibrated_model.pkl        # Trained & calibrated LightGBM model
├── lgbm_model.pkl              # LightGBM base model for SHAP
├── encoders.pkl                # Feature encoders (label encoders)
├── feature_names.parquet       # Required feature set
└── README.md                   # This file

Installation

Prerequisites

  • Python 3.10+
  • pip or conda

Setup

  1. Navigate to the repository

    cd ClearScore
  2. Create and activate a virtual environment

    # Windows
    python -m venv .venv
    .venv\Scripts\Activate.ps1
    
    # macOS/Linux
    python3 -m venv .venv
    source .venv/bin/activate
  3. Install dependencies

    pip install -r requirements.txt

Running the System

Option 1: Streamlit Web UI (Recommended for Demo)

streamlit run streamlit_demo.py

Opens a local web app at http://localhost:8501 where you can:

  • Enter applicant information (age, income, employment history, credit metrics, etc.)
  • Get instant credit risk prediction and classification
  • View SHAP-based explanation of the decision
  • Explore risk tier boundaries

Features:

  • Calendar date pickers for employment date and birthday
  • Real-time risk tier updates
  • Detailed explanations with top decision factors

Option 2: FastAPI Backend (For Production/Integration)

uvicorn app:app --reload --port 8000

Starts the API server at http://localhost:8000. Interactive docs available at /docs.

Available Endpoints:

POST /predict

Scores an applicant and returns risk assessment.

Request Body:

{
  "age": 35,
  "employment_years": 5,
  "income": 75000,
  "credit_score": 720,
  "existing_debt": 15000,
  "monthly_payment": 2500,
  "employment_date": "2019-06-15",
  "birthday": "1989-02-20"
}

Response:

{
  "default_probability": 0.12,
  "risk_tier": "Medium Risk",
  "recommendation": "Review",
  "shap_explanation": "This applicant presents moderate credit risk. Primary factors driving this assessment are: low credit score relative to income (40% impact), moderate existing debt (35% impact), recent employment (25% impact)."
}

Credit Risk Classification

The system uses a calibrated probability model with these risk tiers:

Default Probability   Risk Tier        Recommendation
< 0.08                Low Risk         ✅ APPROVE
0.08 - 0.1636         Medium Risk      🔍 REVIEW
0.1636 - 0.25         High Risk        ⚠️ LIKELY REJECT
≥ 0.25                Very High Risk   ❌ REJECT

Optimal Threshold: 0.1636, calibrated from model training as the best binary decision point for default prediction.
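
The tier table can be read as a simple lookup. The following is a minimal sketch, not the code in app.py: the function name classify is hypothetical, and it assumes that a probability exactly equal to a cut-off falls into the higher tier, which the table does not state explicitly for 0.1636.

```python
# Cut-offs taken from the tier table above.
THRESHOLD_LOW = 0.08
THRESHOLD_MEDIUM = 0.1636
THRESHOLD_HIGH = 0.25


def classify(default_probability: float) -> tuple:
    """Map a calibrated default probability to (risk tier, recommendation).

    Assumption: a value equal to a cut-off belongs to the higher tier.
    """
    if not 0.0 <= default_probability <= 1.0:
        raise ValueError("default_probability must be within [0, 1]")
    if default_probability < THRESHOLD_LOW:
        return ("Low Risk", "APPROVE")
    if default_probability < THRESHOLD_MEDIUM:
        return ("Medium Risk", "REVIEW")
    if default_probability < THRESHOLD_HIGH:
        return ("High Risk", "LIKELY REJECT")
    return ("Very High Risk", "REJECT")


print(classify(0.12))
```

With these assumptions, the 0.12 probability from the sample response lands in the Medium Risk tier.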

API Usage Examples

Using Python Requests

import requests

applicant = {
    "age": 45,
    "employment_years": 15,
    "income": 120000,
    "credit_score": 750,
    "existing_debt": 25000,
    "monthly_payment": 3000,
    "employment_date": "2009-03-10",
    "birthday": "1979-01-25"
}

response = requests.post("http://localhost:8000/predict", json=applicant, timeout=10)
response.raise_for_status()  # fail early on validation (4xx) or server (5xx) errors
result = response.json()

print(f"Default Probability: {result['default_probability']:.2%}")
print(f"Risk Tier: {result['risk_tier']}")
print(f"Explanation: {result['shap_explanation']}")

Using cURL

curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "age": 35,
    "employment_years": 5,
    "income": 75000,
    "credit_score": 720,
    "existing_debt": 15000,
    "monthly_payment": 2500,
    "employment_date": "2019-06-15",
    "birthday": "1989-02-20"
  }'

Testing

Run the comprehensive test suite:

pytest test_credit_scoring.py -v

Test Coverage:

  • ✅ Low-risk applicant scenarios (excellent/good applicants)
  • ✅ High-risk applicant scenarios (poor/very poor applicants)
  • ✅ Risk tier classification boundaries
  • ✅ Feature preprocessing and encoding
  • ✅ SHAP explainability generation
  • ✅ Edge cases (probability ranges, consistent predictions)
  • ✅ End-to-end integration (full prediction pipeline)

Expected Output:

21 passed, 1 skipped, 40 warnings in ~8 seconds

Model Architecture

Input Features

The model expects these applicant features:

Feature            Type               Description
age                int                Applicant age in years
employment_years   int                Years at current employer
income             int                Annual income in dollars
credit_score       int                Credit score (typically 300-850)
existing_debt      int                Total outstanding debt in dollars
monthly_payment    int                Monthly credit payment in dollars
employment_date    str (YYYY-MM-DD)   Start date at current job (can be null)
birthday           str (YYYY-MM-DD)   Date of birth

Processing Pipeline

  1. Feature Engineering

    • Calculates days_employed from employment_date
    • Calculates days_since_birth from birthday
    • Computes debt-to-income ratio
    • Encodes categorical features with LabelEncoder
  2. Model Prediction

    • LightGBM classifier generates raw probability scores
    • Sigmoid calibration maps scores to real default probabilities (0-1)
  3. Risk Classification

    • Compares default probability against tier thresholds
    • Assigns risk tier (Low/Medium/High/Very High)
  4. Explainability

    • SHAP TreeExplainer computes feature contribution to prediction
    • Top 3 features formatted into plain-English explanation
    • Risk tier determines explanation tone
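
The feature engineering step can be sketched as follows. This is an illustration under stated assumptions rather than the project's preprocessing code: the function name engineer_features is hypothetical, day counts are assumed to be positive numbers of elapsed days, a null employment_date is assumed to leave days_employed missing, and the debt-to-income ratio is assumed to be total outstanding debt divided by annual income.

```python
from datetime import date
from typing import Optional


def engineer_features(applicant: dict, today: Optional[date] = None) -> dict:
    """Derive the date-based and ratio features named in step 1."""
    today = today or date.today()
    birthday = date.fromisoformat(applicant["birthday"])
    employment_date = applicant.get("employment_date")

    return {
        # Assumption: elapsed days are counted as positive numbers.
        "days_since_birth": (today - birthday).days,
        # employment_date may be null; the feature is left missing in that case.
        "days_employed": (
            (today - date.fromisoformat(employment_date)).days
            if employment_date
            else None
        ),
        # Assumption: total outstanding debt divided by annual income.
        "debt_to_income": applicant["existing_debt"] / applicant["income"],
    }


features = engineer_features(
    {
        "birthday": "1989-02-20",
        "employment_date": "2019-06-15",
        "existing_debt": 15000,
        "income": 75000,
    },
    today=date(2024, 6, 15),  # fixed date so the result is reproducible
)
print(features)
```

Passing today explicitly keeps the derived day counts deterministic, which helps when the same applicant must score identically across test runs.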

SHAP Explanation Format

Each prediction includes a narrative explanation:

Example (Low Risk):

"This applicant presents low credit risk and can be considered favorably. Primary factors supporting this assessment are: high credit score relative to income (55% impact), stable long-term employment (30% impact), low existing debt (15% impact)."

Example (Very High Risk):

"This applicant presents very high credit risk and should not be approved for credit at this time. Primary factors driving this assessment are: low credit score relative to debt (60% impact), high monthly payment obligations (25% impact), limited employment history (15% impact)."
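
One way to arrive at percentages like those in the examples is to rank features by absolute contribution and normalise over the top three. The sketch below assumes exactly that; the function names, the factor labels, and the contribution values are hypothetical, and explain.py may compute impact differently.

```python
def top_factor_shares(contributions: dict, k: int = 3) -> list:
    """Return the k largest contributions by magnitude as (name, percent) pairs.

    Assumption: percentages are normalised over the top k factors only,
    so rounding can make them sum to slightly more or less than 100.
    """
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)[:k]
    total = sum(abs(value) for _, value in ranked)
    if total == 0:
        return [(name, 0) for name, _ in ranked]
    return [(name, round(100 * abs(value) / total)) for name, value in ranked]


def format_explanation(risk_phrase: str, factors: list) -> str:
    """Render the factors in the sentence pattern used by the examples above."""
    parts = ", ".join(f"{name} ({pct}% impact)" for name, pct in factors)
    return (
        f"This applicant presents {risk_phrase}. "
        f"Primary factors driving this assessment are: {parts}."
    )


# Hypothetical per-feature contributions, for illustration only.
contributions = {
    "low credit score": 11,
    "high monthly payment": -6,
    "short employment history": 3,
    "age": 1,
}
factors = top_factor_shares(contributions)
print(format_explanation("very high credit risk", factors))
```

Ranking by absolute value keeps features that lower the risk in the running alongside those that raise it; the sign could be used separately to choose the wording for each factor.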


Configuration

Edit these constants in app.py to adjust credit risk policy:

THRESHOLD_LOW    = 0.08      # Approval threshold
THRESHOLD_MEDIUM = 0.1636    # Review threshold (optimal threshold)
THRESHOLD_HIGH   = 0.25      # Rejection threshold

Important: The optimal threshold of 0.1636 is calibrated from model training. Changing this value will significantly impact approval rates.
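
Before changing a threshold, it can help to estimate how the tier mix would shift on a set of scored applicants. The sketch below is one way to do that; tier_shares is a hypothetical helper, the sample probabilities are made up for illustration and are not model output, and boundary values are assumed to fall into the higher tier.

```python
from bisect import bisect_right
from collections import Counter

TIERS = ("Low Risk", "Medium Risk", "High Risk", "Very High Risk")


def tier_shares(probabilities, thresholds):
    """Fraction of applicants landing in each tier for the given cut-offs."""
    if not probabilities:
        raise ValueError("probabilities must not be empty")
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in ascending order")
    # bisect_right places a value equal to a cut-off in the higher tier.
    counts = Counter(TIERS[bisect_right(thresholds, p)] for p in probabilities)
    return {tier: counts[tier] / len(probabilities) for tier in TIERS}


# Made-up probabilities for illustration only.
sample = [0.03, 0.07, 0.10, 0.15, 0.17, 0.20, 0.24, 0.30]

current = tier_shares(sample, (0.08, 0.1636, 0.25))
stricter = tier_shares(sample, (0.08, 0.12, 0.25))  # hypothetical lower review cut-off

print(current)
print(stricter)
```

On this toy sample, lowering the middle cut-off from 0.1636 to 0.12 moves one applicant from Medium Risk to High Risk; running the same comparison on real scored applicants would show the actual effect on approval rates.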


Troubleshooting

Streamlit app won't start

# Check port availability
netstat -ano | findstr :8501  # Windows
lsof -i :8501                 # macOS/Linux

# Use different port
streamlit run streamlit_demo.py --server.port 8502

FastAPI port in use

# Start on different port
uvicorn app:app --port 8001

Model loading error

Ensure these files exist in project root:

  • calibrated_model.pkl
  • lgbm_model.pkl
  • encoders.pkl
  • feature_names.parquet

Validation error when calling API

  • Dates must be in YYYY-MM-DD format
  • Numeric fields must be positive integers
  • employment_date can be null/empty; other required fields cannot
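
The rules above can be checked on the client before a request is sent. This is a standard-library sketch of such a pre-check, not the Pydantic schema the API uses; validate_payload is a hypothetical helper, and the server's own validation may be stricter or more lenient.

```python
from datetime import datetime

NUMERIC_FIELDS = (
    "age", "employment_years", "income",
    "credit_score", "existing_debt", "monthly_payment",
)


def validate_payload(payload: dict) -> list:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = []
    for field in NUMERIC_FIELDS:
        value = payload.get(field)
        # bool is a subclass of int, so it is excluded explicitly.
        if not isinstance(value, int) or isinstance(value, bool) or value <= 0:
            problems.append(f"{field} must be a positive integer")
    for field, required in (("birthday", True), ("employment_date", False)):
        value = payload.get(field)
        if value in (None, ""):
            if required:
                problems.append(f"{field} is required")
            continue
        try:
            datetime.strptime(value, "%Y-%m-%d")
        except (TypeError, ValueError):
            problems.append(f"{field} must be in YYYY-MM-DD format")
    return problems


payload = {
    "age": 35, "employment_years": 5, "income": 75000, "credit_score": 720,
    "existing_debt": 15000, "monthly_payment": 2500,
    "employment_date": None, "birthday": "20/02/1989",
}
print(validate_payload(payload))
```

Returning every problem at once, rather than stopping at the first, lets a caller fix a payload in a single pass.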

Dependencies

  • fastapi - REST API framework
  • uvicorn - ASGI server
  • streamlit - Web UI framework
  • scikit-learn - ML utilities and preprocessing
  • lightgbm - LightGBM model serving
  • pydantic - Request/response validation
  • shap - Model explainability
  • pandas, numpy - Data processing
  • joblib - Model serialization

See requirements.txt for complete list with versions.


Performance Metrics

  • Prediction latency: <100ms per applicant (includes SHAP explanation)
  • Threshold optimality: 0.1636 (calibrated binary decision point)
  • Test coverage: 21 tests passing, all scenarios validated

Future Enhancements

  • Batch prediction endpoint for scoring multiple applicants
  • Audit log for compliance and fairness monitoring
  • Feature drift detection and model retraining alerts
  • Advanced explainability (prototype learning, counterfactuals)
  • Database integration for applicant history

Support

For issues or questions, please open a GitHub issue or contact the development team.


Last Updated: April 2026
Model Version: 1.0 (Calibrated LightGBM with Sigmoid Calibration)
Optimal Threshold: 0.1636
Test Suite: 21 passing tests

License

[Specify your license here]

About

A calibrated LightGBM credit scoring model with SHAP-based feature attribution and plain-English explanations, designed to meet EU AI Act high-risk AI requirements for transparency and the right to explanation.
