thed700/omni_engine


Omni-Financial & Customer Intelligence Engine

A production-grade FinTech portfolio piece fusing quantitative finance, survival analysis, and ML explainability into a single unified intelligence system.

[GBM Market Data] + [Transaction Logs]
         ↓
   Engine 1 — The Quant:      40+ technical indicators on spending behaviour
         ↓
   Engine 2 — The Strategist: RFM segmentation + Cox PH survival
         ↓
   Engine 3 — The Brain:      XGBoost churn prediction + SHAP retention strategy
         ↓
   Dashboard:  Market Correlation · Revenue-at-Risk · LTV-Prioritised Retention Budget

Stack

| Layer | Technology |
| --- | --- |
| Market data | GBM + Cholesky-correlated shocks (NumPy) |
| Transaction logs | Synthetic FinTech ecosystem (Pandas) |
| Technical indicators | 40+ custom primitives (RSI, MACD, Bollinger, ADX, CCI, …) |
| RFM segmentation | 5-bin quantile scoring (scikit-learn KBinsDiscretizer) |
| Survival analysis | Cox Proportional Hazards (Lifelines) |
| Churn model | XGBoost + OOF CV + F-β(2) threshold optimisation |
| Explainability | SHAP TreeExplainer |
| Python dashboard | Plotly Dash |
| React dashboard | Vite + React 18 + Recharts |
| Testing | pytest + pytest-cov |
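
To make the "GBM + Cholesky-correlated shocks" row concrete, here is a minimal NumPy sketch of correlated geometric Brownian motion. It is not taken from `market_generator.py`: the function name `simulate_correlated_gbm`, the drift, volatility and correlation values, and the 252-day year are all assumptions chosen for illustration.

```python
import numpy as np


def simulate_correlated_gbm(s0, mu, sigma, corr, n_days, dt=1 / 252, seed=0):
    """Simulate correlated GBM paths; returns an array of shape (n_days + 1, n_assets)."""
    rng = np.random.default_rng(seed)
    s0, mu, sigma = (np.asarray(x, dtype=float) for x in (s0, mu, sigma))
    # The Cholesky factor C satisfies C @ C.T == corr, so z @ C.T has correlation `corr`.
    chol = np.linalg.cholesky(np.asarray(corr, dtype=float))
    z = rng.standard_normal((n_days, len(s0))) @ chol.T
    # Exact one-step GBM log-return: (mu - sigma^2 / 2) * dt + sigma * sqrt(dt) * z
    log_ret = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
    log_paths = np.vstack([np.zeros(len(s0)), np.cumsum(log_ret, axis=0)])
    return s0 * np.exp(log_paths)


# Hypothetical two-asset example with a target correlation of 0.6.
corr = [[1.0, 0.6], [0.6, 1.0]]
prices = simulate_correlated_gbm([100.0, 50.0], [0.08, 0.15], [0.2, 0.6], corr, n_days=500)
print(prices.shape)
```

Working in log space keeps every simulated price strictly positive, which is one common reason to prefer the exact log-return step over an Euler update of the price itself.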

Project Structure

omni_engine/
├── configs/
│   ├── config.yaml           # Master configuration
│   └── model_params.yaml     # Hyperparameter registry
├── src/
│   ├── data/
│   │   ├── market_generator.py       # GBM multi-asset price generator
│   │   └── transaction_generator.py  # Synthetic FinTech transaction logs
│   ├── engines/
│   │   ├── quant_engine.py           # 40+ technical indicators on customer spend
│   │   ├── strategist_engine.py      # RFM scoring + Cox PH survival
│   │   └── brain_engine.py           # XGBoost + SHAP + retention strategies
│   ├── dashboard/
│   │   ├── app.py                    # Plotly Dash server + REST API
│   │   └── layouts.py                # Dash layout builder
│   └── utils/
│       ├── metrics.py                # LTV, Revenue-at-Risk, F-beta
│       └── logger.py                 # Logging factory
├── tests/
│   ├── test_quant_engine.py
│   ├── test_strategist_engine.py
│   └── test_brain_engine.py
├── frontend/                         # React executive dashboard
│   ├── src/
│   │   ├── App.jsx
│   │   ├── components/
│   │   │   ├── MarketCorrelation.jsx
│   │   │   ├── SegmentRisk.jsx
│   │   │   └── RetentionBudget.jsx
│   │   └── api/client.js
│   ├── index.html
│   ├── vite.config.js
│   └── package.json
├── outputs/                          # Generated by pipeline (gitignored)
├── logs/                             # Log files (gitignored)
├── main.py                           # Pipeline entry point
├── requirements.txt
└── README.md

Quick Start

1. Install Python dependencies

pip install -r requirements.txt

2. Run the full pipeline

python main.py

Fast dev mode (1 000 customers, 1 year of data):

python main.py --fast

Run pipeline then launch Dash server:

python main.py --dash
# → http://localhost:8050

3. Outputs written to ./outputs/

| File | Description |
| --- | --- |
| market_data.parquet | GBM OHLCV for AAPL, MSFT, BTC, ETH |
| transactions.parquet | Synthetic customer transaction log |
| customers.parquet | Customer profiles with latent segments |
| quant_features.parquet | 40+ spending indicators per customer |
| survival_rfm.parquet | RFM scores + Cox PH survival predictions |
| revenue_at_risk.csv | Segment-level RAR with F-β weighting |
| retention_strategies.parquet | Per-customer action plan + budget |
| dashboard_data.json | React dashboard API feed |

4. React frontend

cd frontend
npm install
npm run dev
# → http://localhost:3000

Make sure the Python backend is running (python main.py --dash) so the API at http://localhost:8050/api/dashboard is available.


Running Tests

# All tests
pytest tests/ -v

# With coverage report
pytest tests/ -v --cov=src --cov-report=term-missing

# Single module
pytest tests/test_quant_engine.py -v

Engine Details

Engine 1 — The Quant (40+ Indicators)

Treats customer spending as a financial time series:

| Category | Indicators |
| --- | --- |
| Trend | SMA, EMA (7/21/50), DEMA, TEMA, EMA cross |
| Momentum | ROC (10/30), Momentum (10/30), TRIX, MACD (line/signal/hist) |
| Volatility | Std Dev (7d/30d), ATR, Max Drawdown, Sharpe Proxy, Bollinger Bands |
| Oscillators | RSI(14), Stochastic K/D, CCI(20), Williams %R, ADX/+DI/-DI |
| Volume equiv. | OBV, VWAP proxy, Z-Score(20), Active days ratio |
| Behavioural | Amount CV, Premium ratio, Refund ratio, Subscription ratio |
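
As an illustration of treating spend as a price-like series, the sketch below computes a handful of the listed indicators with pandas. It is not the code in `quant_engine.py`: the function name `spend_indicators` and the column names are hypothetical, and the RSI here uses simple rolling means (Cutler's variant) rather than Wilder's smoothing, which the real engine may or may not use.

```python
import numpy as np
import pandas as pd


def spend_indicators(daily_spend: pd.Series, rsi_window: int = 14) -> pd.DataFrame:
    """Compute a few trend/oscillator features on one customer's daily spend."""
    out = pd.DataFrame(index=daily_spend.index)
    # Trend: fast and slow exponential moving averages plus a cross flag.
    out["ema_7"] = daily_spend.ewm(span=7, adjust=False).mean()
    out["ema_21"] = daily_spend.ewm(span=21, adjust=False).mean()
    out["ema_cross"] = (out["ema_7"] > out["ema_21"]).astype(int)
    # Oscillator: RSI from average gains and losses over the window.
    delta = daily_spend.diff()
    gain = delta.clip(lower=0).rolling(rsi_window).mean()
    loss = (-delta).clip(lower=0).rolling(rsi_window).mean()
    out["rsi_14"] = 100 - 100 / (1 + gain / loss)
    # Z-score of today's spend against its own 20-day history.
    roll = daily_spend.rolling(20)
    out["zscore_20"] = (daily_spend - roll.mean()) / roll.std()
    return out


# Hypothetical 90 days of spend for a single customer.
rng = np.random.default_rng(1)
spend = pd.Series(
    rng.gamma(2.0, 20.0, size=90),
    index=pd.date_range("2024-01-01", periods=90, freq="D"),
)
feats = spend_indicators(spend)
print(feats.tail(3).round(2))
```

The leading rows are NaN until each rolling window fills, so a per-customer feature vector would typically take the last row or an aggregate over the valid rows.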

Engine 2 — The Strategist

  • RFM: 5-bin quantile scoring → 11 named segments (Champion → Lost)
  • Cox PH: Covariates = RFM scores + Quant features
    • Outputs: partial hazard, median survival, P(active) at 30/60/90/180 days
  • Revenue-at-Risk: LTV_proxy × P(churn_90d) aggregated by segment
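
The 5-bin quantile RFM scoring can be sketched as follows. This example swaps in `pandas.qcut` for the scikit-learn `KBinsDiscretizer` named in the Stack table, purely to keep the sketch short; the function name `rfm_scores` and the column names `customer_id`, `timestamp` and `amount` are assumptions, and the mapping from scores to the 11 named segments is not reproduced here.

```python
import pandas as pd


def rfm_scores(tx: pd.DataFrame, as_of: pd.Timestamp, n_bins: int = 5) -> pd.DataFrame:
    """Score each customer 1..n_bins on recency, frequency and monetary value."""
    rfm = tx.groupby("customer_id").agg(
        recency_days=("timestamp", lambda s: (as_of - s.max()).days),
        frequency=("timestamp", "count"),
        monetary=("amount", "sum"),
    )
    labels = list(range(1, n_bins + 1))

    def score(col: pd.Series, higher_is_better: bool) -> pd.Series:
        # Ranking with method="first" breaks ties, so qcut always finds distinct bin edges.
        ranks = col.rank(method="first", ascending=higher_is_better)
        return pd.qcut(ranks, n_bins, labels=labels).astype(int)

    rfm["R"] = score(rfm["recency_days"], higher_is_better=False)  # fewer days = better
    rfm["F"] = score(rfm["frequency"], higher_is_better=True)
    rfm["M"] = score(rfm["monetary"], higher_is_better=True)
    return rfm


# Hypothetical log: customer c makes c + 1 purchases of 10 * (c + 1), the last on day 10 * c.
rows = [
    {
        "customer_id": c,
        "timestamp": pd.Timestamp("2024-01-01") + pd.Timedelta(days=10 * c - k),
        "amount": 10.0 * (c + 1),
    }
    for c in range(10)
    for k in range(c + 1)
]
scores = rfm_scores(pd.DataFrame(rows), as_of=pd.Timestamp("2024-04-10"))
print(scores[["R", "F", "M"]])
```

In this toy data the most recent, most frequent, highest-spending customer scores 5 on all three axes, and the least active customer scores 1 on all three.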

Engine 3 — The Brain

  • XGBoost: 5-fold stratified OOF training, F-β(2) threshold optimisation
  • SHAP: TreeExplainer → top-2 risk drivers per customer
  • LTV: Discounted cash flow over 24-month horizon
  • Retention Strategies: Rule-based + SHAP-informed playbook per segment with budget proportional to Revenue-at-Risk
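
The F-β(2) threshold optimisation step can be sketched with NumPy alone. This is an illustration under stated assumptions, not the code in `brain_engine.py`: `proba` stands for out-of-fold churn probabilities, and the function names and the 0.05-step threshold grid are hypothetical.

```python
import numpy as np


def fbeta(y_true, y_pred, beta=2.0):
    """F-beta from confusion counts: (1+b^2)TP / ((1+b^2)TP + b^2*FN + FP)."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    b2 = beta**2
    denom = (1 + b2) * tp + b2 * fn + fp
    return float((1 + b2) * tp / denom) if denom else 0.0


def best_threshold(y_true, proba, beta=2.0, grid=None):
    """Return the (threshold, score) pair that maximises F-beta over a grid."""
    grid = np.linspace(0.05, 0.95, 19) if grid is None else np.asarray(grid)
    proba = np.asarray(proba)
    scores = [fbeta(y_true, proba >= t, beta) for t in grid]
    i = int(np.argmax(scores))
    return float(grid[i]), scores[i]


# Hypothetical labels (1 = churned) and out-of-fold probabilities.
y = [0, 0, 0, 0, 1, 0, 1, 1]
p = [0.11, 0.22, 0.31, 0.37, 0.42, 0.61, 0.72, 0.91]
threshold, score = best_threshold(y, p)
print(round(threshold, 2), round(score, 4))
```

Because β = 2 penalises missed churners more than false alarms, the selected threshold tends to sit below the 0.5 default: in this toy data it accepts one false alarm in order to catch the churner scored at 0.42.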

Key Metrics

| Metric | Description |
| --- | --- |
| ROC-AUC | OOF cross-validated discrimination |
| F-β(2) | Recall-weighted: a false negative (missed churner) carries β² = 4× the weight of a false positive (false alarm) |
| Revenue-at-Risk | LTV × P(churn): the dollar value at stake per segment |
| Concordance index | Cox PH ranking quality (higher = better survival ordering) |
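
The Revenue-at-Risk row combines two pieces described under Engine Details: a discounted-cash-flow LTV over a 24-month horizon and a per-customer churn probability, summed by segment. The sketch below is one plausible reading of that, not the code in `metrics.py`. The function names, the column names `ltv`, `p_churn_90d` and `segment`, the 10% annual discount rate, and the constant monthly retention are all assumptions.

```python
import pandas as pd


def ltv_dcf(monthly_margin, monthly_retention, annual_discount_rate=0.10, horizon_months=24):
    """Discounted expected margin over the horizon, assuming constant monthly retention."""
    d = (1 + annual_discount_rate) ** (1 / 12) - 1  # equivalent monthly discount rate
    return sum(
        monthly_margin * monthly_retention**t / (1 + d) ** t
        for t in range(1, horizon_months + 1)
    )


def revenue_at_risk(customers: pd.DataFrame) -> pd.Series:
    """Sum LTV x P(churn within 90 days) per segment, largest exposure first."""
    rar = customers["ltv"] * customers["p_churn_90d"]
    return rar.groupby(customers["segment"]).sum().sort_values(ascending=False)


# Hypothetical customers; segment names follow the Champion -> Lost naming above.
customers = pd.DataFrame(
    {
        "segment": ["Champion", "Champion", "Lost"],
        "ltv": [1000.0, 500.0, 200.0],
        "p_churn_90d": [0.1, 0.2, 0.9],
    }
)
print(revenue_at_risk(customers))
print(round(ltv_dcf(monthly_margin=40.0, monthly_retention=0.95), 2))
```

In this toy data the Champion segment has the larger exposure (200 versus 180) despite its low churn probabilities, because its customers are worth more. That is the effect an LTV-prioritised retention budget is meant to capture.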

Configuration

All parameters are in configs/config.yaml. Key knobs:

data:
  n_customers: 5000   # scale up/down
  n_days: 730         # observation window

xgboost:
  n_estimators: 400
  scale_pos_weight: 3.0   # class imbalance weight

revenue_at_risk:
  fbeta_beta: 2.0       # β for F-beta metric
  churn_window_days: 90 # prediction horizon
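
For readers who want to inspect or override these knobs programmatically, here is a minimal sketch using PyYAML. It parses an inline copy of the fragment above rather than the real file. The `load_config` helper and the way it applies the `--fast` values (1 000 customers, 1 year) are assumptions for illustration; `main.py` may handle this differently.

```python
import yaml

# Inline copy of the key knobs shown above; the real file is configs/config.yaml.
CONFIG_TEXT = """
data:
  n_customers: 5000
  n_days: 730
xgboost:
  n_estimators: 400
  scale_pos_weight: 3.0
revenue_at_risk:
  fbeta_beta: 2.0
  churn_window_days: 90
"""


def load_config(text: str, fast: bool = False) -> dict:
    """Parse YAML config text; optionally shrink the dataset as a fast dev mode would."""
    cfg = yaml.safe_load(text)
    if fast:
        # Assumed override matching the documented fast mode: 1 000 customers, 1 year.
        cfg["data"].update(n_customers=1000, n_days=365)
    return cfg


cfg = load_config(CONFIG_TEXT)
fast_cfg = load_config(CONFIG_TEXT, fast=True)
print(cfg["data"], fast_cfg["data"])
```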

Architecture Decisions

  • Modular engines: each engine is independently testable and replaceable
  • No data leakage: quant features are computed before the survival fit, and survival predictions enter XGBoost only as covariates
  • F-β(2) over F-1: in churn prevention, a false negative (missed churner) is treated as ~4× as costly as a false positive, which matches the β² = 4 weight that F-β(2) places on false negatives
  • Cox PH over logistic regression for time-to-event: survival analysis respects censoring and gives a median churn window, not just a binary label
  • SHAP over feature importance: SHAP gives directional, per-customer attribution rather than global impurity rankings
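
The "~4×" figure in the F-β(2) decision follows from the standard definition of the F-β score, written here in terms of precision $P$, recall $R$ and the confusion counts $TP$, $FP$, $FN$ (standard notation, not symbols used elsewhere in this README):

```latex
F_\beta = \frac{(1+\beta^2)\,P\,R}{\beta^2 P + R}
        = \frac{(1+\beta^2)\,TP}{(1+\beta^2)\,TP + \beta^2\,FN + FP},
\qquad
F_2 = \frac{5\,TP}{5\,TP + 4\,FN + FP}.
```

With β = 2, each false negative adds 4 to the denominator where a false positive adds 1, so a missed churner lowers the score as much as four false alarms do.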

License

MIT
