A production-grade FinTech portfolio piece fusing quantitative finance, survival analysis, and ML explainability into a single unified intelligence system.
```
[GBM Market Data] + [Transaction Logs]
          ↓
Engine 1 — The Quant: 40+ technical indicators on spending behaviour
          ↓
Engine 2 — The Strategist: RFM segmentation + Cox PH survival
          ↓
Engine 3 — The Brain: XGBoost churn prediction + SHAP retention strategy
          ↓
Dashboard: Market Correlation · Revenue-at-Risk · LTV-Prioritised Retention Budget
```
| Layer | Technology |
|---|---|
| Market data | GBM + Cholesky-correlated shocks (NumPy) |
| Transaction logs | Synthetic FinTech ecosystem (Pandas) |
| Technical indicators | 40+ custom primitives (RSI, MACD, Bollinger, ADX, CCI, …) |
| RFM segmentation | 5-bin quantile scoring (scikit-learn KBinsDiscretizer) |
| Survival analysis | Cox Proportional Hazards (Lifelines) |
| Churn model | XGBoost + OOF CV + F-β(2) threshold optimisation |
| Explainability | SHAP TreeExplainer |
| Python dashboard | Plotly Dash |
| React dashboard | Vite + React 18 + Recharts |
| Testing | pytest + pytest-cov |
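The "GBM + Cholesky-correlated shocks" row can be sketched in a few lines of NumPy. This is an illustrative toy, not the project's `market_generator.py`: the tickers, drifts, volatilities, and correlation matrix below are placeholder values.

```python
import numpy as np

rng = np.random.default_rng(42)
n_days, n_assets = 252, 4                                 # one trading year, 4 assets
mu = np.array([0.08, 0.10, 0.30, 0.25]) / 252             # daily drift per asset
sigma = np.array([0.20, 0.22, 0.70, 0.80]) / np.sqrt(252)  # daily volatility

# Target correlation between the assets' daily shocks (placeholder values)
corr = np.array([[1.0, 0.6, 0.3, 0.3],
                 [0.6, 1.0, 0.3, 0.3],
                 [0.3, 0.3, 1.0, 0.8],
                 [0.3, 0.3, 0.8, 1.0]])
L = np.linalg.cholesky(corr)

# Correlate i.i.d. standard-normal shocks via the Cholesky factor: cov(z) = L L^T = corr
z = rng.standard_normal((n_days, n_assets)) @ L.T

# GBM closed form: S_t = S_0 * exp(cumsum((mu - sigma^2/2) + sigma * z))
log_returns = (mu - 0.5 * sigma**2) + sigma * z
prices = 100.0 * np.exp(np.cumsum(log_returns, axis=0))
```

The key property is that multiplying i.i.d. shocks by the Cholesky factor of the target correlation matrix yields shocks with exactly that correlation in expectation.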
```
omni_engine/
├── configs/
│   ├── config.yaml                  # Master configuration
│   └── model_params.yaml            # Hyperparameter registry
├── src/
│   ├── data/
│   │   ├── market_generator.py      # GBM multi-asset price generator
│   │   └── transaction_generator.py # Synthetic FinTech transaction logs
│   ├── engines/
│   │   ├── quant_engine.py          # 40+ technical indicators on customer spend
│   │   ├── strategist_engine.py     # RFM scoring + Cox PH survival
│   │   └── brain_engine.py          # XGBoost + SHAP + retention strategies
│   ├── dashboard/
│   │   ├── app.py                   # Plotly Dash server + REST API
│   │   └── layouts.py               # Dash layout builder
│   └── utils/
│       ├── metrics.py               # LTV, Revenue-at-Risk, F-beta
│       └── logger.py                # Logging factory
├── tests/
│   ├── test_quant_engine.py
│   ├── test_strategist_engine.py
│   └── test_brain_engine.py
├── frontend/                        # React executive dashboard
│   ├── src/
│   │   ├── App.jsx
│   │   ├── components/
│   │   │   ├── MarketCorrelation.jsx
│   │   │   ├── SegmentRisk.jsx
│   │   │   └── RetentionBudget.jsx
│   │   └── api/client.js
│   ├── index.html
│   ├── vite.config.js
│   └── package.json
├── outputs/                         # Generated by pipeline (gitignored)
├── logs/                            # Log files (gitignored)
├── main.py                          # Pipeline entry point
├── requirements.txt
└── README.md
```
```bash
pip install -r requirements.txt
python main.py
```

Fast dev mode (1 000 customers, 1 year of data):

```bash
python main.py --fast
```

Run the pipeline, then launch the Dash server:

```bash
python main.py --dash
# → http://localhost:8050
```

| File | Description |
|---|---|
| `market_data.parquet` | GBM OHLCV for AAPL, MSFT, BTC, ETH |
| `transactions.parquet` | Synthetic customer transaction log |
| `customers.parquet` | Customer profiles with latent segments |
| `quant_features.parquet` | 40+ spending indicators per customer |
| `survival_rfm.parquet` | RFM scores + Cox PH survival predictions |
| `revenue_at_risk.csv` | Segment-level RAR with F-β weighting |
| `retention_strategies.parquet` | Per-customer action plan + budget |
| `dashboard_data.json` | React dashboard API feed |
```bash
cd frontend
npm install
npm run dev
# → http://localhost:3000
```

Make sure the Python backend is running (`python main.py --dash`) so the API at `http://localhost:8050/api/dashboard` is available.
```bash
# All tests
pytest tests/ -v

# With coverage report
pytest tests/ -v --cov=src --cov-report=term-missing

# Single module
pytest tests/test_quant_engine.py -v
```

Treats customer spending as a financial time series:
| Category | Indicators |
|---|---|
| Trend | SMA, EMA (7/21/50), DEMA, TEMA, EMA cross |
| Momentum | ROC (10/30), Momentum (10/30), TRIX, MACD (line/signal/hist) |
| Volatility | Std Dev (7d/30d), ATR, Max Drawdown, Sharpe Proxy, Bollinger Bands |
| Oscillators | RSI(14), Stochastic K/D, CCI(20), Williams %R, ADX/+DI/-DI |
| Volume equiv. | OBV, VWAP proxy, Z-Score(20), Active days ratio |
| Behavioural | Amount CV, Premium ratio, Refund ratio, Subscription ratio |
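To illustrate how one of these primitives applies to spend data rather than prices, here is a minimal RSI over a daily spend series. This is a rolling-mean (Cutler-style) variant for brevity, not the engine's actual implementation, and the spend series is synthetic:

```python
import numpy as np
import pandas as pd

def rsi(series: pd.Series, window: int = 14) -> pd.Series:
    """RSI via simple rolling means of gains and losses (Cutler's variant)."""
    delta = series.diff()
    gain = delta.clip(lower=0).rolling(window).mean()
    loss = (-delta.clip(upper=0)).rolling(window).mean()
    rs = gain / loss.replace(0, np.nan)   # avoid division by zero
    return 100 - 100 / (1 + rs)

# Synthetic daily spend for one customer (60 days)
rng = np.random.default_rng(0)
spend = pd.Series(50 + rng.normal(0, 5, 60).cumsum())
spend_rsi = rsi(spend)
```

The same treat-spend-as-price trick applies to every indicator in the table above: the daily spend total plays the role of the close price.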
- RFM: 5-bin quantile scoring → 11 named segments (Champion → Lost)
- Cox PH: Covariates = RFM scores + Quant features
- Outputs: partial hazard, median survival, P(active) at 30/60/90/180 days
- Revenue-at-Risk: `LTV_proxy × P(churn_90d)`, aggregated by segment
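The Revenue-at-Risk roll-up reduces to a one-line groupby. The frame below is synthetic and its column and segment names are assumptions for illustration, not the pipeline's actual schema:

```python
import pandas as pd

# Hypothetical per-customer outputs of the survival stage
customers = pd.DataFrame({
    "segment":     ["Champion", "Champion", "At Risk", "At Risk", "Lost"],
    "ltv_proxy":   [1200.0, 900.0, 400.0, 350.0, 50.0],
    "p_churn_90d": [0.02, 0.05, 0.60, 0.75, 0.98],
})

# RAR = LTV_proxy × P(churn within 90 days), summed per segment
customers["rar"] = customers["ltv_proxy"] * customers["p_churn_90d"]
revenue_at_risk = (customers.groupby("segment")["rar"]
                   .sum()
                   .sort_values(ascending=False))
```

Note that a mid-value segment with high churn probability can carry more dollar risk than the highest-LTV segment, which is exactly why the retention budget is prioritised by RAR rather than LTV alone.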
- XGBoost: 5-fold stratified OOF training, F-β(2) threshold optimisation
- SHAP: TreeExplainer → top-2 risk drivers per customer
- LTV: Discounted cash flow over 24-month horizon
- Retention Strategies: Rule-based + SHAP-informed playbook per segment with budget proportional to Revenue-at-Risk
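The F-β(2) threshold optimisation amounts to a sweep over candidate cut-offs on out-of-fold probabilities. A minimal sketch with made-up probabilities (the real pipeline runs this on XGBoost's OOF predictions):

```python
import numpy as np

def fbeta(y_true, y_pred, beta=2.0):
    """F-beta from scratch: beta > 1 weights recall more heavily than precision."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    denom = (1 + beta**2) * tp + beta**2 * fn + fp
    return (1 + beta**2) * tp / denom if denom else 0.0

# Toy labels and predicted churn probabilities
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_prob = np.array([0.1, 0.4, 0.35, 0.8, 0.6, 0.2, 0.55, 0.3])

# Sweep thresholds and keep the one maximising F-beta(2)
thresholds = np.linspace(0.05, 0.95, 91)
scores = [fbeta(y_true, (y_prob >= t).astype(int)) for t in thresholds]
best_threshold = thresholds[int(np.argmax(scores))]
```

Because β = 2 squares to 4 in the formula, each false negative costs four times as much as a false positive, pushing the chosen threshold lower than the F1-optimal one.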
| Metric | Description |
|---|---|
| ROC-AUC | OOF cross-validated discrimination |
| F-β(2) | Recall-weighted (missing a churner costs 4× more than a false alarm) |
| Revenue-at-Risk | LTV × P(churn) — the dollar value at stake per segment |
| Concordance index | Cox PH ranking quality (higher = better survival ordering) |
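The 24-month discounted-cash-flow LTV behind these metrics can be sketched as below. The monthly revenue, retention probability, and discount rate are placeholder values, and the geometric retention assumption is a simplification of what the survival model provides:

```python
import numpy as np

def ltv_dcf(monthly_revenue: float,
            p_active_monthly: float,
            annual_discount_rate: float = 0.10,
            horizon_months: int = 24) -> float:
    """Expected discounted cash flow: revenue × survival × discount, summed monthly."""
    r = (1 + annual_discount_rate) ** (1 / 12) - 1   # convert annual rate to monthly
    months = np.arange(1, horizon_months + 1)
    survival = p_active_monthly ** months            # geometric retention (simplification)
    discount = 1 / (1 + r) ** months
    return float(np.sum(monthly_revenue * survival * discount))

ltv = ltv_dcf(monthly_revenue=40.0, p_active_monthly=0.95)
```

Discounting and churn compound each month, so the proxy is always below the undiscounted ceiling of `monthly_revenue × horizon_months`.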
All parameters are in `configs/config.yaml`. Key knobs:

```yaml
data:
  n_customers: 5000        # scale up/down
  n_days: 730              # observation window

xgboost:
  n_estimators: 400
  scale_pos_weight: 3.0    # class imbalance weight

revenue_at_risk:
  fbeta_beta: 2.0          # β for F-beta metric
  churn_window_days: 90    # prediction horizon
```

- Modular engines: each engine is independently testable and replaceable
- No data leakage: quant features computed before survival fit; survival predictions fed into XGBoost as covariates only
- F-β(2) over F-1: in churn prevention, false negatives (missed churners) are ~4× more costly than false positives
- Cox PH over logistic regression for time-to-event: survival analysis respects censoring and gives median churn window, not just a binary label
- SHAP over feature importance: SHAP gives directional, per-customer attribution rather than global impurity rankings
MIT