
Syed Ibrahim Omer edited this page Apr 12, 2026 · 5 revisions

# Model Performance

Comprehensive evaluation of all four architectures on the out-of-time (OOT) test set (train ≤ 2022, test 2023–2025).

## Ensemble Results (Production Model)

| Metric | Lead 1 | Lead 2 | Lead 3 | Lead 4 | Lead 5 | Lead 6 |
|---|---|---|---|---|---|---|
| F1 Score | 0.96 | 0.95 | 0.93 | 0.91 | 0.88 | 0.86 |
| Precision | 0.96 | 0.94 | 0.92 | 0.89 | 0.84 | 0.80 |
| Recall | 0.96 | 0.95 | 0.94 | 0.93 | 0.92 | 0.92 |
| RMSE | 277–298 | ~310 | 346–351 | ~370 | ~390 | 401–424 |

**Critical achievement:** recall stays at or above 0.92 across all six horizons — the system catches more than 9 of every 10 genuine crises while keeping the false-alarm rate at or below 20% (precision ≥ 0.80).
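As a sanity check, each F1 entry in the table is the harmonic mean of the precision and recall above it. A quick illustration (values taken directly from the table, function is the standard definition):

```python
# F1 is the harmonic mean of precision and recall.
def f1(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.96, 0.96), 2))  # Lead 1 → 0.96
print(round(f1(0.80, 0.92), 2))  # Lead 6 → 0.86
```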

## Individual Model Comparison

### F1 Score by Horizon

| Model | Lead 1 | Lead 3 | Lead 6 | Best At |
|---|---|---|---|---|
| RF | 0.97 | 0.91 | 0.82 | Short horizons |
| LSTM | 0.95 | 0.92 | 0.84 | Mid horizons |
| Transformer | 0.93 | 0.92 | 0.87 | Long horizons |
| Ensemble | 0.96 | 0.93 | 0.86 | All horizons |

### Precision vs. Recall Tradeoff

| Model | Lead 1 Precision | Lead 1 Recall | Lead 6 Precision | Lead 6 Recall |
|---|---|---|---|---|
| RF | 0.96 | 0.97 | 0.78 | 0.87 |
| LSTM | 0.97 | 0.85 | 0.82 | 0.79 |
| Transformer | 0.91 | 0.94 | 0.82 | 0.90 |
| Ensemble | 0.96 | 0.96 | 0.80 | 0.92 |

### Interpretation

- **RF** excels at not missing surges (highest recall at Lead 1), but its precision drops at long horizons
- **LSTM** has the fewest false alarms (highest precision at Lead 1), thanks to SurgeJointLoss
- **Transformer** maintains recall at long horizons where the other models degrade
- **Ensemble** blends these strengths: it borrows RF's short-range recall and the Transformer's long-range stability

## Volume RMSE

| Model | Lead 1 | Lead 3 | Lead 6 |
|---|---|---|---|
| RF | 277 | 346 | 424 |
| LSTM | 280 | 348 | 410 |
| Transformer | 298 | 351 | 401 |
| Ensemble | 277 | 346 | 401 |

RMSE naturally increases with horizon distance (predicting farther ahead is harder).
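For readers new to the metric, RMSE is the root of the mean squared error between predicted and actual monthly volumes. A minimal sketch with made-up numbers (the helper and sample volumes are illustrative, not taken from the pipeline):

```python
import math

def rmse(y_true, y_pred):
    # Root mean squared error: penalizes large misses more than small ones.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

actual    = [1000, 1200, 900]   # hypothetical monthly visa volumes
predicted = [1100, 1150, 950]
print(round(rmse(actual, predicted), 1))  # → 70.7
```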

## Why Ensemble Wins

The Horizon-Aware Ensemble exploits a key insight: no single model dominates across all horizons.

```
Lead 1-2: RF dominance  → Short-term local patterns
Lead 3-4: Balanced      → Transition zone
Lead 5-6: TF dominance  → Long-range geopolitical patterns
```
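The scheme above can be sketched as a horizon-indexed weighted average of the three base models. The weights below are invented for illustration only — the production values live on the Horizon-Aware Ensemble page, not here:

```python
# Hypothetical horizon → (rf, lstm, transformer) weights; RF dominates short
# horizons, the Transformer dominates long ones, mirroring the table above.
WEIGHTS = {
    1: (0.60, 0.25, 0.15),
    2: (0.55, 0.25, 0.20),
    3: (0.35, 0.30, 0.35),
    4: (0.30, 0.30, 0.40),
    5: (0.15, 0.25, 0.60),
    6: (0.10, 0.25, 0.65),
}

def blend(horizon: int, rf: float, lstm: float, tf: float) -> float:
    # Weighted average of the three per-model volume predictions.
    w_rf, w_lstm, w_tf = WEIGHTS[horizon]
    return w_rf * rf + w_lstm * lstm + w_tf * tf
```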

## Visualizations

Two plots in `data/plots/model_performance/`:

- F1 score by horizon (all models)
- Precision/recall comparison chart

## Operational Interpretation

| Scenario | Lead 1 | Lead 6 |
|---|---|---|
| The model predicts "surge" | 96% chance it's genuine | 80% chance it's genuine |
| A real surge happens | 96% chance the model catches it | 92% chance the model catches it |
| Misses per 10 genuine surges | < 1 | < 1 |
| False alarms per 10 alerts | < 1 | 2 |

The system is operationally useful at all horizons, with a false alarm rate that stays manageable even at 6-month forecasts.
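The "per 10" rows above follow from simple arithmetic on the ensemble's precision and recall (false-alarm rate = 1 − precision, miss rate = 1 − recall), using the Lead 6 values from the tables:

```python
# Convert a precision/recall value into an "events per 10" operational rate.
def per_ten(rate: float) -> float:
    return round(10 * (1 - rate), 1)

print(per_ten(0.80))  # Lead 6 precision → 2.0 false alarms per 10 alerts
print(per_ten(0.92))  # Lead 6 recall → 0.8 misses per 10 genuine surges
```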

## See Also

Wiki navigation:

### Quick Start

- **Project Overview** — Goals, research questions, methodology, and team
- **Glossary** — Key terms used throughout this wiki

### Data Sources

Raw inputs that feed the prediction system.

| Page | Description |
|---|---|
| Visa Data | US Department of State visa issuance statistics (108 monthly PDFs) |
| Encounter Data | CBP Southwest border encounter statistics (FY2019–2026) |
| Google News | 170K+ news articles across 15 countries × 8 topics |
| Google Trends | Monthly search-interest time series (15 countries × 8 keywords) |
| Exchange Rates | IMF Real Effective Exchange Rate for 6 countries |

### Pipeline

The end-to-end flow from raw data to production forecasts.

| Page | Description |
|---|---|
| Data Collection | Ingestion layer: async scraping, bounded concurrency, retry logic |
| Data Processing | PDF parsing, JSON→Parquet, encounter merging |
| NLP Enrichment | Embedding → Clustering → Labeling → Sentiment |
| Panel Construction | Feature engineering: 18 lag features, 6 lead targets |
| Training Pipeline | Out-of-time train/test split, 4 architectures |
| Inference Pipeline | Horizon-aware ensemble, production prediction flow |

### Models

Machine learning architectures and their roles in the ensemble.

| Page | Description |
|---|---|
| Random Forest | cuML GPU Random Forest — best at short horizons (Lead 1–2) |
| LSTM | MigrationLSTM — country-aware with SurgeJointLoss |
| Transformer | MigrationTransformer — best at long horizons (Lead 5–6) |
| Horizon-Aware Ensemble | Dynamic weighting: RF→short, Transformer→long |
| SurgeJointLoss | Dual-objective loss: Huber + BCE for crisis detection |
| Jina v5 Embeddings | TensorRT INT8 news article embeddings (768-dim) |
| Flan-T5 Summarization | TensorRT INT8 cluster labeling engine |

### Analysis Methods

Statistical techniques driving the lead-lag and surge analysis.

| Page | Description |
|---|---|
| Lead-Lag Analysis | Pearson correlation at 0–6 month offsets |
| Surge Detection | Quantile-based and σ-threshold spike identification |
| Sentiment Analysis | Rule-based lexicon scoring for migration-relevant news |
| Event Clustering | HDBSCAN GPU clustering + LED label generation |
| Cross-Correlation Analysis | CCF analysis, VAR benchmarking, ADF stationarity tests |
| Multiple Comparison Correction | Benjamini–Hochberg FDR for 58 significant signals |

### Key Findings

What the system discovered about migration predictability.

| Page | Description |
|---|---|
| Event-Visa Findings | News events as leading indicators (r = 0.617 at 3-month lag) |
| Exchange Rate Findings | Exchange rate signals (DR r = 0.498 at 2-month lag) |
| Model Performance | Ensemble results: F1 = 0.96 at Lead 1, F1 = 0.86 at Lead 6 |

### Source Modules

Reference documentation for each `src/` subpackage and key file.

| Page | Description |
|---|---|
| Main Entry Point | `src/main.py` CLI: bootstrap, collect-live, sync-data |
| Collection Module | `src/collection/*` — visa, encounter, news, trends, HF sync |
| Processing Module | `src/processing/*` — parse, merge, build_panel, summarize |
| Analysis Module | `src/analysis/*` — events, exchange_rate, trends_analysis, plots |
| Models Module | `src/models/*` — surge_model, train_and_evaluate, inference |
| News Scraper | Deep dive: batch decoding, checkpoint recovery, throttling |
| PDF Parser | Deep dive: PyMuPDF table extraction, VISA_MAP normalization |
| TensorRT Engines | Deep dive: Jina-v5, Flan-T5, LED TensorRT engines |
| Build Panel Detail | Deep dive: lag/lead construction, forward-fill strategies |
| HF Sync | Deep dive: bidirectional Hugging Face Hub sync |

### Infrastructure

Compute, reproducibility, and operational details.

| Page | Description |
|---|---|
| GPU Acceleration | TensorRT INT8, cuML, CUDA streams, NVML profiling |
| Reproducibility | HF bootstrap, run.sh pipeline, dependency checking |
