Research notebooks and optimization tools for the EstWarden Baltic Security Monitor. All analysis runs against the public dataset.
```shell
git clone https://github.com/Estwarden/research.git
git clone https://github.com/Estwarden/dataset.git data
cd research
pip install -r requirements.txt
jupyter notebook notebooks/
```

Run the notebooks in order: notebook 01 produces `daily_matrix.parquet`, which all the others load.
| # | Notebook | Question | Method |
|---|---|---|---|
| 01 | Data Profile | What shape is the data? | Align 20K signals with 497 indicator labels, build daily matrix |
| 02 | Lead Indicators | Which sources spike before YELLOW? | Point-biserial correlation at lag 0-3, Random Forest importance |
| 03 | Anomaly Thresholds | What z-score = real anomaly? | Per-source ROC, Youden's J, bootstrap stability |
| 04 | Narrative Velocity | How fast do narratives spread? | Velocity, amplification ratio, campaign predictor |
| 05 | Source Independence | Are sources redundant? | Correlation matrix, mutual information, PCA decomposition |
| 06 | CTI Rebuild | Can we beat hand-tuned weights? | Logistic regression vs gradient boosting, time-series CV |
From notebook 03 — per-source anomaly score:
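The snippet itself is not reproduced here; below is a minimal sketch of the kind of rolling z-score notebook 03 thresholds. The window size, the example data, and the function name are illustrative, and the actual per-source cut-off comes from the notebook's ROC / Youden's J analysis, not from anything shown here:

```python
import numpy as np

def anomaly_score(counts, window=30):
    """Rolling z-score for one source: how far today's signal count
    sits from its trailing baseline. Notebook 03 picks the per-source
    alert threshold via ROC curves and Youden's J; nothing here is a
    fitted value."""
    counts = np.asarray(counts, dtype=float)
    baseline = counts[-(window + 1):-1]   # trailing window, excluding today
    mu, sigma = baseline.mean(), baseline.std(ddof=1)
    return (counts[-1] - mu) / (sigma if sigma > 0 else 1.0)

history = [4, 5, 3, 6, 4, 5, 4, 3, 5, 21]   # spike on the last day
z = anomaly_score(history, window=9)         # z is large and positive
```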
From notebook 04 — campaign prediction:
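Again a hedged sketch rather than the notebook's fitted model: velocity as the first difference of daily mention counts, amplification as reposts per original mention, and a toy two-threshold rule standing in for the actual campaign predictor. All names, thresholds, and data below are illustrative:

```python
import numpy as np

def narrative_velocity(mentions):
    """Day-over-day change in mention counts (mentions/day)."""
    return np.diff(np.asarray(mentions, dtype=float))

def likely_campaign(mentions, amplifier_posts,
                    velocity_thresh=10.0, amp_thresh=3.0):
    """Toy predictor: flag a narrative when peak velocity AND the
    amplification ratio (amplifier reposts per original mention)
    both exceed thresholds. The rule and thresholds are illustrative,
    not notebook 04's fitted predictor."""
    peak_velocity = narrative_velocity(mentions).max()
    amp_ratio = sum(amplifier_posts) / max(sum(mentions), 1)
    return peak_velocity >= velocity_thresh and amp_ratio >= amp_thresh

mentions = [2, 3, 5, 30, 80]        # organic mentions per day
amplifier = [1, 2, 10, 150, 400]    # amplifier reposts per day
flag = likely_campaign(mentions, amplifier)
```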
From notebook 06 — logistic CTI (deployable):
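A minimal stand-in for the deployable model: a logistic regression whose positive-class probability is rescaled to a 0-100 composite index. The features and labels below are synthetic; notebook 06 trains on the daily matrix with time-series cross-validation, which this sketch does not reproduce:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the daily matrix: 200 days x 5 source features,
# with a label loosely driven by the first two features.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)

# Deployable score: probability of the alert class, scaled to 0-100.
cti = 100 * model.predict_proba(X)[:, 1]
```

A logistic model is easy to deploy because the score reduces to a weighted sum plus a sigmoid, with no model runtime needed in production.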
Automated CTI optimization (Karpathy pattern):
```shell
cd autoresearch
python3 optimize.py   # Phase 1: 85K trials, 3-fold CV
./run.sh              # Phase 2: LLM structural improvements
```

- Composite Threat Index — formula, weights, thresholds
- Findings — what changed in production
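Phase 1's trial loop can be sketched as a random search over candidate index weights scored by 3-fold cross-validated accuracy. Everything below (the data, the linear scoring rule, the fold layout) is illustrative; the real `optimize.py` runs 85K trials against the actual daily matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(90, 4))                       # toy daily features
y = (X @ np.array([1.0, -0.5, 0.25, 0.0]) > 0).astype(int)

def cv_accuracy(w, X, y, folds=3):
    """3-fold accuracy of a fixed weight vector: the weights ARE the
    candidate, so each fold only evaluates, nothing is trained."""
    idx = np.arange(len(y))
    scores = []
    for k in range(folds):
        test = idx[k::folds]                       # every folds-th row held out
        pred = (X[test] @ w > 0).astype(int)
        scores.append((pred == y[test]).mean())
    return float(np.mean(scores))

best_w, best_acc = None, -1.0
for _ in range(500):                               # 85_000 in the real run
    w = rng.normal(size=4)
    acc = cv_accuracy(w, X, y)
    if acc > best_acc:
        best_w, best_acc = w, acc
```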
| Repo | What |
|---|---|
| Dataset | 27K signals, 20 sources, indicators, campaigns |
| Collectors | Data collection pipelines (Dagu DAGs) |
| Integrations | MCP server, Home Assistant, CLI |
| estwarden.eu | Live dashboard |
MIT