A production-grade credit risk assessment system built with FastAPI, Streamlit, and calibrated machine learning models. Scores loan applicants, generates probability-based risk classifications, and explains every decision with SHAP-based feature importance analysis.
- Calibrated ML Models: LightGBM with sigmoid calibration for accurate default probability estimates
- Risk Tier Classification: Four-tier risk framework (Low, Medium, High, Very High) with optimal threshold at 0.1636
- SHAP Explainability: Plain-English explanations of credit decisions with top 3 decision factors
- FastAPI Backend: RESTful API for programmatic scoring and integration
- Streamlit Web UI: Interactive demo with calendar date pickers and real-time risk assessment
- Comprehensive Testing: 21+ pytest tests covering low-risk, high-risk, edge cases, and end-to-end integration
ClearScore/
├── app.py                   # FastAPI backend service
├── streamlit_demo.py        # Streamlit web UI
├── explain.py               # SHAP-based explanation generation
├── test_credit_scoring.py   # pytest test suite (21 tests, all passing)
├── requirements.txt         # Python dependencies
├── calibrated_model.pkl     # Trained & calibrated LightGBM model
├── lgbm_model.pkl           # LightGBM base model for SHAP
├── encoders.pkl             # Feature encoders (label encoders)
├── feature_names.parquet    # Required feature set
└── README.md                # This file
- Python 3.10+
- pip or conda
- Navigate to the repository
    cd ClearScore
- Create and activate a virtual environment
    # Windows
    python -m venv .venv
    .venv\Scripts\Activate.ps1
    # macOS/Linux
    python3 -m venv .venv
    source .venv/bin/activate
- Install dependencies
    pip install -r requirements.txt
streamlit run streamlit_demo.py
Opens a local web app at http://localhost:8501 where you can:
- Enter applicant information (age, income, employment history, credit metrics, etc.)
- Get instant credit risk prediction and classification
- View SHAP-based explanation of the decision
- Explore risk tier boundaries
Features:
- Calendar date pickers for employment date and birthday
- Real-time risk tier updates
- Detailed explanations with top decision factors
uvicorn app:app --reload --port 8000
Starts the API server at http://localhost:8000. Interactive docs available at /docs.
Available Endpoints:
POST /predict
Scores an applicant and returns a risk assessment.
Request Body:
{
"age": 35,
"employment_years": 5,
"income": 75000,
"credit_score": 720,
"existing_debt": 15000,
"monthly_payment": 2500,
"employment_date": "2019-06-15",
"birthday": "1989-02-20"
}
Response:
{
"default_probability": 0.12,
"risk_tier": "Medium Risk",
"recommendation": "Review",
"shap_explanation": "This applicant presents moderate credit risk. Primary factors driving this assessment are: low credit score relative to income (40% impact), moderate existing debt (35% impact), recent employment (25% impact)."
}
The system uses a calibrated probability model with these risk tiers:
| Default Probability | Risk Tier | Recommendation |
|---|---|---|
| < 0.08 | Low Risk | APPROVE |
| 0.08 - 0.1636 | Medium Risk | REVIEW |
| 0.1636 - 0.25 | High Risk | |
| ≥ 0.25 | Very High Risk | REJECT |
Optimal Threshold: 0.1636 (calibrated from model training as the best binary decision point for default prediction).
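The tier table can be read as a simple chain of comparisons. The sketch below is illustrative only: the function name `classify_risk` is hypothetical, and it assumes each lower bound is inclusive (so a probability of exactly 0.1636 lands in High Risk), which the table does not state. The table also gives no recommendation for High Risk, so the sketch returns `None` there.

```python
from typing import Optional, Tuple

# Thresholds taken from the risk tier table.
THRESHOLD_LOW = 0.08
THRESHOLD_MEDIUM = 0.1636
THRESHOLD_HIGH = 0.25


def classify_risk(default_probability: float) -> Tuple[str, Optional[str]]:
    """Map a calibrated default probability to (risk tier, recommendation).

    Assumption: lower bounds are inclusive; the real app.py may differ.
    """
    if not 0.0 <= default_probability <= 1.0:
        raise ValueError("default_probability must be between 0 and 1")
    if default_probability < THRESHOLD_LOW:
        return "Low Risk", "APPROVE"
    if default_probability < THRESHOLD_MEDIUM:
        return "Medium Risk", "REVIEW"
    if default_probability < THRESHOLD_HIGH:
        # The table does not list a recommendation for this tier.
        return "High Risk", None
    return "Very High Risk", "REJECT"
```

With these assumptions, `classify_risk(0.12)` falls in the Medium Risk band, matching the sample response shown earlier.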
import requests
applicant = {
"age": 45,
"employment_years": 15,
"income": 120000,
"credit_score": 750,
"existing_debt": 25000,
"monthly_payment": 3000,
"employment_date": "2009-03-10",
"birthday": "1979-01-25"
}
response = requests.post("http://localhost:8000/predict", json=applicant)
result = response.json()
print(f"Default Probability: {result['default_probability']:.2%}")
print(f"Risk Tier: {result['risk_tier']}")
print(f"Explanation: {result['shap_explanation']}")
curl -X POST "http://localhost:8000/predict" \
-H "Content-Type: application/json" \
-d '{
"age": 35,
"employment_years": 5,
"income": 75000,
"credit_score": 720,
"existing_debt": 15000,
"monthly_payment": 2500,
"employment_date": "2019-06-15",
"birthday": "1989-02-20"
}'
Run the comprehensive test suite:
pytest test_credit_scoring.py -v
Test Coverage:
- Low-risk applicant scenarios (excellent/good applicants)
- High-risk applicant scenarios (poor/very poor applicants)
- Risk tier classification boundaries
- Feature preprocessing and encoding
- SHAP explainability generation
- Edge cases (probability ranges, consistent predictions)
- End-to-end integration (full prediction pipeline)
Expected Output:
21 passed, 1 skipped, 40 warnings in ~8 seconds
The model expects these applicant features:
| Feature | Type | Description |
|---|---|---|
| age | int | Applicant age in years |
| employment_years | int | Years at current employer |
| income | int | Annual income in dollars |
| credit_score | int | Credit score (typically 300-850) |
| existing_debt | int | Total outstanding debt in dollars |
| monthly_payment | int | Monthly credit payment in dollars |
| employment_date | str (YYYY-MM-DD) | Start date at current job (can be null) |
| birthday | str (YYYY-MM-DD) | Date of birth |
- Feature Engineering
  - Calculates days_employed from employment_date
  - Calculates days_since_birth from birthday
  - Computes debt-to-income ratio
  - Encodes categorical features with LabelEncoder
- Model Prediction
  - LightGBM classifier generates raw probability scores
  - Sigmoid calibration maps scores to real default probabilities (0-1)
- Risk Classification
  - Compares default probability against tier thresholds
  - Assigns risk tier (Low/Medium/High/Very High)
- Explainability
  - SHAP TreeExplainer computes feature contribution to prediction
  - Top 3 features formatted into plain-English explanation
  - Risk tier determines explanation tone
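The first and last steps above can be sketched in plain Python. This is a rough illustration, not the code in app.py or explain.py: the function names are hypothetical, day counts are assumed to be positive, the debt-to-income ratio is assumed to be total debt over annual income, and the percentages are assumed to be each factor's share of the top three absolute contributions. The `contributions` argument stands in for per-feature SHAP values computed elsewhere.

```python
from datetime import date
from typing import Optional


def engineer_features(applicant: dict, today: Optional[date] = None) -> dict:
    """Derive date-based and ratio features from a raw applicant payload."""
    today = today or date.today()
    birthday = date.fromisoformat(applicant["birthday"])
    employment_date = applicant.get("employment_date")
    return {
        "days_since_birth": (today - birthday).days,
        # employment_date may be null; the feature is left missing in that case.
        "days_employed": (
            (today - date.fromisoformat(employment_date)).days
            if employment_date
            else None
        ),
        # Assumption: total outstanding debt divided by annual income.
        "debt_to_income": applicant["existing_debt"] / applicant["income"],
    }


def top_factors(contributions: dict, k: int = 3) -> list:
    """Return the k largest contributions by magnitude as (feature, percent) pairs."""
    ranked = sorted(
        contributions.items(), key=lambda item: abs(item[1]), reverse=True
    )[:k]
    total = sum(abs(value) for _, value in ranked)
    if total == 0:
        return [(name, 0) for name, _ in ranked]
    # Percentages are shares of the top-k total and are rounded independently.
    return [(name, round(100 * abs(value) / total)) for name, value in ranked]
```

Because each share is rounded on its own, the percentages may occasionally sum to 99 or 101 rather than exactly 100.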
Each prediction includes a narrative explanation:
Example (Low Risk):
"This applicant presents low credit risk and can be considered favorably. Primary factors supporting this assessment are: high credit score relative to income (55% impact), stable long-term employment (30% impact), low existing debt (15% impact)."
Example (Very High Risk):
"This applicant presents very high credit risk and should not be approved for credit at this time. Primary factors driving this assessment are: low credit score relative to debt (60% impact), high monthly payment obligations (25% impact), limited employment history (15% impact)."
Edit these constants in app.py to adjust credit risk policy:
THRESHOLD_LOW = 0.08 # Approval threshold
THRESHOLD_MEDIUM = 0.1636 # Review threshold (optimal threshold)
THRESHOLD_HIGH = 0.25 # Rejection threshold
Important: The optimal threshold of 0.1636 is calibrated from model training. Changing this value will significantly impact approval rates.
# Check port availability
netstat -ano | findstr :8501 # Windows
lsof -i :8501 # macOS/Linux
# Use different port
streamlit run streamlit_demo.py --server.port 8502
# Start on different port
uvicorn app:app --port 8001
Ensure these files exist in project root:
- calibrated_model.pkl
- lgbm_model.pkl
- encoders.pkl
- feature_names.parquet
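A quick way to confirm the model files are in place is a small check like the one below. It is a convenience sketch, not part of the repository: the helper name `missing_artifacts` is hypothetical, and it only checks that the files exist, not that they load correctly.

```python
from pathlib import Path

# File names taken from the list of required model files.
REQUIRED_FILES = (
    "calibrated_model.pkl",
    "lgbm_model.pkl",
    "encoders.pkl",
    "feature_names.parquet",
)


def missing_artifacts(project_root: str = ".") -> list:
    """Return the required files that are not present under project_root."""
    root = Path(project_root)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Calling `missing_artifacts()` from the project root returns an empty list when everything is present; any names it returns are the files to restore before starting the API or the Streamlit demo.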
- Dates must be in YYYY-MM-DD format
- Numeric fields must be positive integers
- employment_date can be null/empty; other required fields cannot
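These rules can be checked on the client side before a request is sent. The sketch below uses only the standard library and is an assumption about how the rules might be enforced; the actual service validates requests with pydantic, and the helper name `validate_applicant` and its error messages are hypothetical.

```python
import re
from datetime import date

NUMERIC_FIELDS = (
    "age", "employment_years", "income",
    "credit_score", "existing_debt", "monthly_payment",
)
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def validate_applicant(payload: dict) -> list:
    """Return a list of problems found in payload; an empty list means valid."""
    errors = []
    for field in NUMERIC_FIELDS:
        value = payload.get(field)
        # bool is a subclass of int, so it is excluded explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            errors.append(f"{field} must be a positive integer")
    # birthday is required; employment_date may be null or empty.
    for field, required in (("birthday", True), ("employment_date", False)):
        value = payload.get(field)
        if value in (None, ""):
            if required:
                errors.append(f"{field} is required")
            continue
        try:
            if not isinstance(value, str) or not DATE_PATTERN.fullmatch(value):
                raise ValueError
            date.fromisoformat(value)  # rejects impossible dates such as 02-30
        except ValueError:
            errors.append(f"{field} must be a valid date in YYYY-MM-DD format")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a caller report every invalid field at once.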
- fastapi - REST API framework
- uvicorn - ASGI server
- streamlit - Web UI framework
- scikit-learn - ML utilities and preprocessing
- lightgbm - LightGBM model serving
- pydantic - Request/response validation
- shap - Model explainability
- pandas, numpy - Data processing
- joblib - Model serialization
See requirements.txt for complete list with versions.
- Prediction latency: <100ms per applicant (includes SHAP explanation)
- Threshold optimality: 0.1636 (calibrated binary decision point)
- Test coverage: 21 tests passing, all scenarios validated
- Batch prediction endpoint for scoring multiple applicants
- Audit log for compliance and fairness monitoring
- Feature drift detection and model retraining alerts
- Advanced explainability (prototype learning, counterfactuals)
- Database integration for applicant history
For issues or questions, please open a GitHub issue or contact the development team.
Last Updated: April 2026
Model Version: 1.0 (Calibrated LightGBM with Sigmoid Calibration)
Optimal Threshold: 0.1636
Test Suite: 21 passing tests
[Specify your license here]