# Model Performance
Comprehensive evaluation of all four architectures on the out-of-time (OOT) test set (train ≤ 2022, test 2023–2025).
| Metric | Lead 1 | Lead 2 | Lead 3 | Lead 4 | Lead 5 | Lead 6 |
|---|---|---|---|---|---|---|
| F1 Score | 0.96 | 0.95 | 0.93 | 0.91 | 0.88 | 0.86 |
| Precision | 0.96 | 0.94 | 0.92 | 0.89 | 0.84 | 0.80 |
| Recall | 0.96 | 0.95 | 0.94 | 0.93 | 0.92 | 0.92 |
| RMSE | 277–298 | ~310 | 346–351 | ~370 | ~390 | 401–424 |
Critical achievement: Recall ≥ 92% across all horizons. The system catches more than nine of every ten genuine surges while keeping the false-alarm rate at or below 20%.
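The three classification metrics above follow the standard definitions. A minimal sketch of how they are computed from binary surge labels (the function name and toy data are illustrative, not the project's actual evaluation code):

```python
import numpy as np

def surge_metrics(y_true, y_pred):
    """Precision, recall, and F1 for binary surge labels (1 = surge month)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))   # surges correctly flagged
    fp = int(np.sum(~y_true & y_pred))  # false alarms
    fn = int(np.sum(y_true & ~y_pred))  # missed surges
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy example: 10 test months, 4 genuine surges, one false alarm
y_true = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]
p, r, f1 = surge_metrics(y_true, y_pred)
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.8 1.0 0.89
```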
F1 score by model and horizon:

| Model | Lead 1 | Lead 3 | Lead 6 | Best At |
|---|---|---|---|---|
| RF | 0.97 | 0.91 | 0.82 | Short horizons |
| LSTM | 0.95 | 0.92 | 0.84 | Mid horizons |
| Transformer | 0.93 | 0.92 | 0.87 | Long horizons |
| Ensemble | 0.96 | 0.93 | 0.86 | All horizons |
Precision and recall by model at the shortest and longest horizons:

| Model | Lead 1 Precision | Lead 1 Recall | Lead 6 Precision | Lead 6 Recall |
|---|---|---|---|---|
| RF | 0.96 | 0.97 | 0.78 | 0.87 |
| LSTM | 0.97 | 0.85 | 0.82 | 0.79 |
| Transformer | 0.91 | 0.94 | 0.82 | 0.90 |
| Ensemble | 0.96 | 0.96 | 0.80 | 0.92 |
- RF excels at not missing surges (highest recall at Lead 1), but its precision drops at long horizons
- LSTM has the fewest false alarms (highest precision at Lead 1) thanks to SurgeJointLoss
- Transformer maintains recall at long horizons where other models degrade
- Ensemble blends strengths: borrows RF's short-range recall and Transformer's long-range stability
RMSE by model and horizon:

| Model | Lead 1 | Lead 3 | Lead 6 |
|---|---|---|---|
| RF | 277 | 346 | 424 |
| LSTM | 280 | 348 | 410 |
| Transformer | 298 | 351 | 401 |
| Ensemble | 277 | 346 | 401 |
RMSE naturally increases with horizon distance (predicting farther ahead is harder).
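For reference, RMSE here is the root-mean-square error between predicted and actual monthly counts. A minimal sketch (the toy values are illustrative):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error of monthly forecasts."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

print(rmse([1000, 1200], [900, 1300]))  # → 100.0
```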
The Horizon-Aware Ensemble exploits a key insight: no single model dominates across all horizons.
- Lead 1–2: RF dominates → short-term local patterns
- Lead 3–4: balanced weighting → transition zone
- Lead 5–6: Transformer dominates → long-range geopolitical patterns
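The weighting scheme above can be sketched as a per-horizon weighted average. The weight values below are placeholders for illustration, not the ensemble's actual learned weights (see the Horizon-Aware Ensemble page for those):

```python
# Illustrative per-horizon weights (RF, LSTM, Transformer).
# These specific numbers are assumptions for the sketch.
HORIZON_WEIGHTS = {
    1: (0.60, 0.25, 0.15),  # Lead 1: RF-heavy
    3: (0.34, 0.33, 0.33),  # Lead 3: balanced
    6: (0.15, 0.25, 0.60),  # Lead 6: Transformer-heavy
}

def ensemble_predict(lead, rf_pred, lstm_pred, tf_pred):
    """Blend per-model forecasts with horizon-specific weights."""
    w_rf, w_lstm, w_tf = HORIZON_WEIGHTS[lead]
    return w_rf * rf_pred + w_lstm * lstm_pred + w_tf * tf_pred

print(ensemble_predict(1, 1000.0, 1100.0, 1200.0))  # → 1055.0
```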
Two plots are saved in data/plots/model_performance/:
- F1 score by horizon (all models)
- Precision/recall comparison chart
| Scenario | Lead 1 | Lead 6 |
|---|---|---|
| If the model predicts "surge" | 96% chance it's genuine | 80% chance it's genuine |
| If a real surge happens | 96% chance model catches it | 92% chance model catches it |
| Misses per 10 genuine surges | < 1 | < 1 |
| False alarms per 10 alerts | < 1 | 2 |
The system is operationally useful at all horizons, with a false alarm rate that stays manageable even at 6-month forecasts.
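The "per 10" figures in the table follow directly from precision and recall. A minimal arithmetic sketch (the function name is hypothetical):

```python
def operational_rates(precision, recall):
    """Translate precision/recall into approximate per-10 operational counts."""
    false_alarms_per_10_alerts = round(10 * (1 - precision))
    misses_per_10_surges = round(10 * (1 - recall))
    return false_alarms_per_10_alerts, misses_per_10_surges

print(operational_rates(0.96, 0.96))  # Lead 1
print(operational_rates(0.80, 0.92))  # Lead 6
```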
Related pages:
- Horizon-Aware Ensemble — Ensemble architecture and weights
- Random Forest, LSTM, Transformer — Individual model details
- Surge Detection — How surges are defined
- Training Pipeline — Training methodology
- Project Overview — Research questions answered
- Project Overview — Goals, research questions, methodology, and team
- Glossary — Key terms used throughout this wiki
Raw inputs that feed the prediction system.
| Page | Description |
|---|---|
| Visa Data | US Department of State visa issuance statistics (108 monthly PDFs) |
| Encounter Data | CBP Southwest border encounter statistics (FY2019–2026) |
| Google News | 170K+ news articles across 15 countries × 8 topics |
| Google Trends | Monthly search-interest time series (15 countries × 8 keywords) |
| Exchange Rates | IMF Real Effective Exchange Rate for 6 countries |
The end-to-end flow from raw data to production forecasts.
| Page | Description |
|---|---|
| Data Collection | Ingestion layer: async scraping, bounded concurrency, retry logic |
| Data Processing | PDF parsing, JSON→Parquet, encounter merging |
| NLP Enrichment | Embedding → Clustering → Labeling → Sentiment |
| Panel Construction | Feature engineering: 18 lag features, 6 lead targets |
| Training Pipeline | Out-of-time train/test split, 4 architectures |
| Inference Pipeline | Horizon-aware ensemble, production prediction flow |
Machine learning architectures and their roles in the ensemble.
| Page | Description |
|---|---|
| Random Forest | cuML GPU Random Forest — best at short horizons (Lead 1–2) |
| LSTM | MigrationLSTM — country-aware with SurgeJointLoss |
| Transformer | MigrationTransformer — best at long horizons (Lead 5–6) |
| Horizon-Aware Ensemble | Dynamic weighting: RF→short, Transformer→long |
| SurgeJointLoss | Dual-objective loss: Huber + BCE for crisis detection |
| Jina v5 Embeddings | TensorRT INT8 news article embeddings (768-dim) |
| Flan-T5 Summarization | TensorRT INT8 cluster labeling engine |
Statistical techniques driving the lead-lag and surge analysis.
| Page | Description |
|---|---|
| Lead-Lag Analysis | Pearson correlation at 0–6 month offsets |
| Surge Detection | Quantile-based and σ-threshold spike identification |
| Sentiment Analysis | Rule-based lexicon scoring for migration-relevant news |
| Event Clustering | HDBSCAN GPU clustering + LED label generation |
| Cross-Correlation Analysis | CCF analysis, VAR benchmarking, ADF stationarity tests |
| Multiple Comparison Correction | Benjamini-Hochberg FDR for 58 significant signals |
What the system discovered about migration predictability.
| Page | Description |
|---|---|
| Event-Visa Findings | News events as leading indicators (r=0.617 at 3-month lag) |
| Exchange Rate Findings | Exchange rate signals (DR r=0.498 at 2-month lag) |
| Model Performance | Ensemble results: F1=0.96 at Lead 1, F1=0.86 at Lead 6 |
Reference documentation for every src/ subpackage and key files.
| Page | Description |
|---|---|
| Main Entry Point | src/main.py CLI: bootstrap, collect-live, sync-data |
| Collection Module | src/collection/* — visa, encounter, news, trends, HF sync |
| Processing Module | src/processing/* — parse, merge, build_panel, summarize |
| Analysis Module | src/analysis/* — events, exchange_rate, trends_analysis, plots |
| Models Module | src/models/* — surge_model, train_and_evaluate, inference |
| News Scraper | Deep dive: batch decoding, checkpoint recovery, throttling |
| PDF Parser | Deep dive: PyMuPDF table extraction, VISA_MAP normalization |
| TensorRT Engines | Deep dive: Jina-v5, Flan-T5, LED TensorRT engines |
| Build Panel Detail | Deep dive: lag/lead construction, forward-fill strategies |
| HF Sync | Deep dive: bidirectional Hugging Face Hub sync |
Compute, reproducibility, and operational details.
| Page | Description |
|---|---|
| GPU Acceleration | TensorRT INT8, cuML, CUDA streams, NVML profiling |
| Reproducibility | HF bootstrap, run.sh pipeline, dependency checking |