Notebook 06 Robustification by jonathanhhb · Pull Request #188 · laser-base/laser-generic

jonathanhhb · 2026-06-01T20:34:13Z

Note on the branch name

This branch is named notebook-06-only but is no longer accurate to its scope — it now also touches the CI workflow, the docs Makefile, and docs/execute_notebooks.py. The original nb06 fix turned out not to be sufficient on its own: the Build Combined Doc workflow had three independent failure points stacked on top of each other, and only fixing all three got the pipeline to green. Renaming the branch mid-review felt riskier than just noting it here, but a more honest name would be something like unblock-build-combined-doc.

Summary

Gets the Build Combined Doc workflow to a green end-to-end run with a published combined_mkdocs.md artifact. Three independent fixes, each addressing a separate cause of the failure observed on main:

1. nb06: `calc_capacity` overflow (root cause)

Notebook 06's BirthsByCBR + KaplanMeierEstimator setup at CBR=90 over 100 years asks laser-core's calc_capacity for ~146 billion agents — well past UINT32_MAX. The 1.0.3 clamping fix (#413 in laser-core) makes the overflow honest (returns UINT32_MAX instead of a sign-wrapped value) but doesn't reduce the underlying demand; you still get MemoryError: Unable to allocate 16.0 GiB for an array with shape (4294967295,).

The notebook-side fix is to swap to ConstantPopVitalDynamics, which recycles agents and never calls calc_capacity, so capacity stays at the initial population. The biological dynamics the notebook is demonstrating are unchanged; only the implementation of the demographic equilibrium is different.

(Existing commits on this branch: revert to high CBR, swap component, regime-robustness sweep, programmatic validation cell, plot redo, etc.)

2. `NB_TIMEOUT` per-cell default (build-combined-doc workflow)

Two tutorials have heavy parameter-sweep cells that exceed the Makefile's NB_TIMEOUT ?= 600 default:

04_SIR_nobirths_outbreak_size: 115 sims (23 R0 × 5 S0) in one cell
05_SIR_wbirths_age_distribution: 25 sims in one cell

Both were observed to time out at 600 s in CI before this PR (and would have hidden the nb06 failure too — the workflow died at nb04 before reaching nb06 in two consecutive runs).

Added nb_timeout as a workflow_dispatch input on .github/workflows/build-combined-doc.yml with default '1800', forwarded to make docs-jenner NB_TIMEOUT=.... The user does not have to remember to set this on each dispatch — the 600 s Makefile default is overridden automatically. The input exists so a future build with an even heavier notebook can crank it higher without a PR.

3. `NB_EXCLUDE` for research-only notebooks

EW_analysis.ipynb has cell errors unrelated to anything in this branch. Per direction, it's research-only and meant for manual execution, not the automated doc build. Rather than papering over with --allow-errors=true, this PR adds a proper exclude mechanism:

docs/execute_notebooks.py accepts --exclude SUBSTR[,SUBSTR...] (substring match against docs/-relative path)
Makefile defaults NB_EXCLUDE ?= EW_analysis and threads it through to docs-executed-nbs

Excluded notebooks still ship in the rendered mkdocs HTML site (mkdocs-jupyter reads source .ipynb files directly with execute: false in the standard configs), so they're only removed from the automated execute-and-check pipeline, not from the docs. Override on the command line to include them in a one-off run: make docs-jenner NB_EXCLUDE=.

Verification

✅ Local make docs-jenner ALLOW_NB_ERRORS=1 with nb04/05 hidden produced exec'd outputs for all remaining notebooks; only nb06 (then-unfixed) and EW_analysis errored — confirming the diagnosis.
✅ Build Combined Doc workflow run 27154284233 on the latest commit of this branch: success, 1938 s for the build step, all 20 notebooks (21 − EW_analysis) executed cleanly, combined_mkdocs.md artifact uploaded (0.30 MB).

Not in this PR

EW_analysis fix. Errors are unrelated and should be addressed separately if/when someone wants the notebook in the automated pipeline.
Trimming nb04/05 sweep cells. A future cleanup could reduce the 23×5 / 25-sim sweeps to a smaller representative set to bring per-notebook execution time down. NB_TIMEOUT=1800 gives ~3× headroom over the observed runtimes, so this isn't blocking anything; it's a "would be nice" if anyone cares about total CI minutes.
Bumping the Makefile's own NB_TIMEOUT default from 600 to 1800. Kept at 600 deliberately so PR-time builds (mkdocs-pr-notebooks.yml via mkdocs.ci.yml) stay honest about catching unexpectedly slow notebooks. Only the workflow_dispatch path raises it.

The single-sim cell used cbr=90 and the sweep sampled cbrs in [70,100], chosen to shorten the analytic period T ≈ 2π√(A·G) (since A ∝ 1/μ ∝ 1000/CBR) into the 2-4 year range. The realism cost was knowingly accepted in the markdown above, but it pushed the laser-core capacity pre-allocation through the roof: pop · exp(CBR/1000 · years) at 200_000 · e^9 = ~1.6B agents per sim, which compounds across the 10-sim sweep loop. Lower CBR to a realistic global range (single sim: 25; sweep: 15-35). Capacity per sim drops to ~200_000 · e^2.5 ≈ 2.4M — roughly 650× less. T_exp shifts from ~2-4 y to ~5-8 y, still well within the existing cutoff=18250 (50-year burn-in) and distance=T_exp*365/2 peak-finder window — 12-20 cycles fit in the remaining 50 years, plenty for both peak-finding and the autocorrelation FFT. All other params left alone: pop=2e5, nticks=365*100, the beta / inf_mean / R0 sweep ranges, and the cutoff/peakfinder distances. NotebookEdit also re-serialized the two edited cells' source from multi-line array to single string (both valid nbformat) and cleared their cached execution_count. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The notebook ended with a scatter plot that a human had to eyeball to decide whether the demonstration worked. Replace the trust-the-eye step with a numerical check that asserts on regression: - Per-method Pearson correlation of each T_obs estimator against T_exp across the 10 sims (catches an estimator that's reporting noise). - Per-sim best-of-three relative error: the peak-finder / valley-finder / autocorrelation FFT estimators are individually noisy on a stochastic SIR oscillator, but at least one usually agrees, so the per-sim minimum is the right consensus signal. - Median and max of best-of-three across sims with explicit thresholds. - `assert` at the end — a future regression breaks the cell, not just the plot. Thresholds (corr > 0.5, median < 30%, max < 60%) chosen with headroom for the new lower-CBR / weaker-damping regime. The original CBR=70-100 data showed max best-of-three ~21% and per-method correlations >0.8. On the user's first CBR=15-35 run the best-of-three is ~2-24% on the rows visible, well within bounds. Tunable up top via three named constants so a future model change can adjust without re-reading the body. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Two fixes in one commit because they're both the same idea: don't let one known-noisy estimator dominate the verdict. Plot (cell 14): - Identity line was hardcoded [1, 4]; at realistic CBR T_exp now sits in 4-7y so most points fell beyond the reference. Auto-scale to the actual data range with a small pad. - pf1 and pf2 both drew as default "o" with no color spec — they visually merged. Give each a distinct color and marker shape (o / s / x) with a legend. - Add axis labels, title, equal aspect ratio so y=x is actually 45°, grid at low alpha. Test (cell 16): - The original required all 3 methods to correlate with T_exp at r > 0.5. The autocorrelation FFT is known to land on T_exp/2 when the signal carries a second harmonic, which collapses its correlation across a 10-sim ensemble even when the demonstration is working fine (the user's run: pf1 r=0.79, pf2 r=0.81, autocorr r=0.07 → median best-of-three was still only 9.6%). All-3 was inconsistent with the per-sim "best-of-three" philosophy used for the rel-err thresholds. - Relax to MIN_PASSING_METHODS=2 of 3. Still catches an "all methods broken" regression but tolerates one known-noisy estimator. - Report which method(s) fell short in the assert message so the reason for any future failure is immediately legible. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The previous commit reduced CBR to 25 (single sim) / 15-35 (sweep) to control memory, but that broke the demographic equilibrium the notebook quietly relied on. With CBR=25 the headless run showed total population collapsing from 200K → ~57K over 100 years: the KaplanMeierEstimator + 89-year-truncated age pyramid only balances when CBR is high enough that the age-cap truncation is negligible. So the periodicity analysis was running on a non-stationary system. The test "passed" only because period is a timescale and the cutoff=18250 skipped the worst transient — the dynamics underneath were sick. Per author feedback, the high CBR (90 single sim, 70-100 sweep) is a deliberate methodological choice for three reasons: faster equilibration (characteristic timescale ~1/CBR years), shorter analytic period, more endemically stable disease. The right fix is to leave CBR alone and change the demographic component to one that doesn't pre-allocate slots for every future birth. `ConstantPopVitalDynamics` (already used in nb 02 for the same reason) recycles existing agents on death rather than appending new ones, so the LaserFrame stays at the initial 200K regardless of CBR — no capacity blowup, and N is stable by construction with no reliance on a survival estimator tuned to a specific CBR regime. Changes: - Imports: drop BirthsByCBR, MortalityByEstimator, AliasedDistribution, KaplanMeierEstimator; add ConstantPopVitalDynamics. - Cell 4: cbr back to 90; drop pyramid/survival/rate_const/stable_age_dist; build recycle_rates instead of birthrate_map. - Cell 5: pass zero birthrates to Model(...) so no birth slots are pre-allocated; replace BirthsByCBR + MortalityByEstimator with ConstantPopVitalDynamics(model, recycle_rates). - Cell 9: sweep cbrs back to 70 + 30*rand. - Cell 11: same component swap inside the sweep loop. - Cell 16: comment refresh — strike the stale "lower-CBR / weaker-damping" language now that we're back in the high-CBR regime. Bigger-picture follow-up (separate work, not this commit): the author suggests laser-core's calc_capacity should raise when projected size exceeds some threshold, directing users to recycling or squashing rather than silently pre-allocating billions of slots. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The headline test runs at CBR=70-100 because that's where the system equilibrates fast enough and the period is short enough to fit many cycles in a reasonable simulation. The world average is ~18 and Niger is ~45 — so the test directly verifies the analytic formula T ≈ 2π√(A·G) only in a deliberately non-physical regime. A careful reader would correctly conclude the formula was tested *in the demonstrated regime*; a less-careful reader might conclude the formula is trustworthy at realistic CBRs without that having been shown. Two changes to bridge that gap: 1) A markdown caveat between the "Larger test suite" header and the sweep setup. Names the CBR range, explains why the author chose it (faster equilibration, shorter period, endemic stability), and flags that the regime-robustness section near the end is where the formula is checked at realistic CBRs. 2) A "Regime robustness" section after the validation: holds R0=10 and inf_mean=12 fixed, sweeps CBR across [15, 25, 40, 60, 80, 100] (covering both realistic and test-convenience ranges), and plots T_obs/T_exp vs CBR with the realistic range shaded. Prints per-CBR results plus a comparison of mean ratio in the realistic (≤40) vs testing (≥60) ranges, and flags drift > 0.20 between them. Uses the same ID_freq_peakfinder + ID_freq_autocorr helpers defined earlier in the notebook, and the same ConstantPopVitalDynamics demographics so every point on the sweep is also demographically stable. Adds ~3 min of runtime (6 sims × ~30s). Soft check rather than assert: a regime-dependent ratio is informative but not necessarily a failure mode of the notebook — it might just reveal a real limit on the formula's accuracy. Worth knowing, not worth aborting on. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot

Pull request overview

This PR updates tutorial Notebook 06 to use constant-population “recycling” vital dynamics (ConstantPopVitalDynamics) rather than explicit births + mortality, primarily to avoid pre-allocating population capacity under intentionally high CBR values used to shorten equilibration/oscillation periods in the SIR examples.

Changes:

Replace BirthsByCBR + MortalityByEstimator (and associated demographic setup) with ConstantPopVitalDynamics, and pass zero birthrates to Model(...) to prevent capacity pre-allocation.
Update the observed-vs-analytic period plot to use a data-driven y=x reference range and a square aspect.
Add “Programmatic validation” (assert-based) and a “Regime robustness” CBR sweep section.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Two related issues raised in PR review: 1. mkdocs.ci.yml runs the notebook with allow_errors=false, so a hard `assert` on stochastic thresholds could break the docs build on a bad seed even when the tutorial is otherwise fine. Downgrade the default to a soft warning and gate the AssertionError behind STRICT_VALIDATION=1 — local dev / a dedicated gating CI lane can set the env var, the docs build leaves it unset. 2. NaN-safety in the best-of-three picker. Both validation and regime sweep now guard against non-finite estimator output: - Validation cell: np.corrcoef returns NaN for a constant series (would have been printed as "r = +nan [fail]" and counted as a pass before the relax-to-2-of-3 commit). Now explicitly tagged as "nan" in the per-method print, counted as failed, and surfaced in the failed-methods list. NaN in median_best / max_best aggregates now produces an explicit failure line instead of silently passing the threshold comparison. - Regime-sweep cell: `min(candidates, key=lambda t: abs(t-T_exp))` was not NaN-safe — Python's min with NaN can pick the NaN as "best", propagating it into ratio and plots. Filter to finite candidates first; fall back to NaN with an explicit warning if all three estimators are non-finite. Drift comparison switched to np.nanmean to tolerate the rare NaN fallback. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

NotebookEdit writes .ipynb files without a trailing newline, which trips the end-of-file-fixer pre-commit hook on the CI lane. Locally this gets fixed by the user's pre-commit, but commits made in this sandbox don't run the hook so it slipped through. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

So this branch can add NB_TIMEOUT plumbing in the same PR.

The Makefile defaults NB_TIMEOUT?=600 (seconds, per cell). Two tutorial notebooks have heavy parameter-sweep cells that exceed that: - nb04 (SIR_nobirths_outbreak_size): 115 sims (23 R0s × 5 S0s) - nb05 (SIR_wbirths_age_distribution): 25 sims Both cells were observed to hit the 600 s cell timeout in CI, causing the workflow to fail before nb06 (which has its own separate fix on this branch) could be evaluated. Add `nb_timeout` as a workflow_dispatch input (string, default '1800') alongside the existing `allow_notebook_errors`, and forward it as `NB_TIMEOUT=...` to `make docs-jenner`. 1800 s gives ~3× headroom over the observed cell runtimes; per-cell rather than per-notebook, so the many short cells in the rest of the suite are unaffected. Default 1800 was chosen deliberately to make the dispatch path "just work" without anyone having to remember to override the input. PR-time builds keep the original 600 s default (mkdocs-pr-notebooks.yml uses mkdocs.ci.yml + mkdocs-jupyter directly, not this workflow), so PR runs stay honest about catching unexpectedly slow notebooks. Note that this PR is named `notebook-06-only` but now also carries this CI plumbing change — the two are tied: nb06 won't be reached in CI until nb04/05 stop hitting the cell timeout, so shipping the nb06 fix without the timeout change wouldn't actually unblock the doc build. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The Build Combined Doc CI run on this branch executed all 21 notebooks cleanly — including nb 06 with the ConstantPopVitalDynamics swap and nb 04 / nb 05 with the bumped NB_TIMEOUT — but the check step still failed because `EW_analysis.ipynb` has unrelated cell errors. Per guidance, that notebook is research-only and is meant to be executed manually, not as part of the automated doc build. Add a comma-separated `--exclude` argument to docs/execute_notebooks.py that substring-matches against the docs/-relative path. The Makefile defaults `NB_EXCLUDE ?= EW_analysis`, so excluded notebooks fall out of execution by default without anyone having to remember the override. Override on the command line (or pass the empty string) to include research notebooks in a one-off build: make docs-jenner NB_EXCLUDE= # execute everything make docs-jenner NB_EXCLUDE=other_research # different override Excluded notebooks still ship in the mkdocs HTML site (mkdocs-jupyter reads source .ipynb files directly with execute:false in the standard configs), so removing them from automated execution does not remove them from the rendered docs — only from the pipeline that produces combined_mkdocs.md. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

+   "cell_type": "code",
+   "source": "import os\n\n# Tolerances picked with run-to-run variation in mind. With CBR=70-100 and\n# stable population (via ConstantPopVitalDynamics recycling), max\n# best-of-three relative error tends to sit around 20% and per-method\n# correlations are typically >0.7. Tighten if the model gets cleaner; loosen\n# only if a legitimate parameter range starts hitting the wall.\n#\n# We require *at least* MIN_PASSING_METHODS of the three estimators to correlate\n# with T_exp. The autocorrelation FFT is known to land on T_exp/2 when the\n# signal carries a second harmonic, and the peak-finder can drift on noisy\n# signals — but the three together are robust, which mirrors the per-sim\n# \"best-of-three\" relative error philosophy used below.\n#\n# By default a failure prints a soft warning and continues, so a stochastic\n# outlier doesn't kill the docs build (the notebook is executed in CI under\n# allow_errors=false via mkdocs.ci.yml). Set STRICT_VALIDATION=1 in the env\n# to convert failures into a hard AssertionError for local validation or a\n# dedicated gating CI lane.\nCORR_THRESHOLD = 0.5\nMIN_PASSING_METHODS = 2\nMEDIAN_REL_ERR_THRESHOLD = 0.30\nMAX_REL_ERR_THRESHOLD = 0.60\nSTRICT_VALIDATION = os.environ.get(\"STRICT_VALIDATION\", \"0\") == \"1\"\n\nmethods = [\"T_obs_peakfinder1\", \"T_obs_peakfinder2\", \"T_obs_autocorr\"]\nT_exp_arr = params_df[\"T_exp\"].to_numpy()\nrel_err_per_method = {\n    m: np.abs(params_df[m].to_numpy() - T_exp_arr) / T_exp_arr for m in methods\n}\nbest_per_sim = np.minimum.reduce(list(rel_err_per_method.values()))\ncorrelations = {m: float(np.corrcoef(T_exp_arr, params_df[m].to_numpy())[0, 1]) for m in methods}\n\nprint(\"=\" * 64)\nprint(\"Periodicity validation: T_obs vs T_exp\")\nprint(\"=\" * 64)\n\nprint(\"\\nPer-method correlation with T_exp:\")\npassing_methods = 0\nfor m, r in correlations.items():\n    # NaN guard: np.corrcoef returns NaN for a constant series; treat as fail.\n    if math.isnan(r):\n        ok = False\n        status = \"nan\"\n    else:\n        ok = r > CORR_THRESHOLD\n        status = \"PASS\" if ok else \"fail\"\n    print(f\"  {m:24s}  r = {r:+.3f}  [{status}]\")\n    if ok:\n        passing_methods += 1\n\nprint(f\"\\nMethods passing correlation threshold: {passing_methods}/{len(methods)}  (need ≥ {MIN_PASSING_METHODS})\")\n\nprint(\"\\nPer-sim best-of-three relative error:\")\nfor i, err in enumerate(best_per_sim):\n    print(f\"  sim {i}: {err * 100:5.1f}%\")\n\nmedian_best = float(np.median(best_per_sim))\nmax_best = float(np.max(best_per_sim))\nprint(f\"\\nMedian best-of-three rel error: {median_best * 100:5.1f}%  (threshold {MEDIAN_REL_ERR_THRESHOLD * 100:.0f}%)\")\nprint(f\"Max best-of-three rel error:    {max_best * 100:5.1f}%  (threshold {MAX_REL_ERR_THRESHOLD * 100:.0f}%)\")\n\nfailures = []\nif passing_methods < MIN_PASSING_METHODS:\n    failed_methods = [m for m, r in correlations.items() if math.isnan(r) or r <= CORR_THRESHOLD]\n    failures.append(\n        f\"only {passing_methods}/{len(methods)} methods correlated with T_exp \"\n        f\"(failed: {', '.join(failed_methods)})\"\n    )\n# NaN aggregates would otherwise silently pass the threshold comparisons —\n# count them as failures explicitly.\nif math.isnan(median_best):\n    failures.append(\"median best-of-three is NaN (an estimator returned non-finite values)\")\nelif median_best > MEDIAN_REL_ERR_THRESHOLD:\n    failures.append(f\"median best-of-three {median_best:.3f} > {MEDIAN_REL_ERR_THRESHOLD}\")\nif math.isnan(max_best):\n    failures.append(\"max best-of-three is NaN (an estimator returned non-finite values)\")\nelif max_best > MAX_REL_ERR_THRESHOLD:\n    failures.append(f\"max best-of-three {max_best:.3f} > {MAX_REL_ERR_THRESHOLD}\")\n\nprint(\"\\n\" + \"=\" * 64)\nif failures:\n    msg = \"Periodicity validation FAILED:\\n  - \" + \"\\n  - \".join(failures)\n    if STRICT_VALIDATION:\n        raise AssertionError(msg)\n    print(f\"⚠️  SOFT-WARN: {msg}\")\n    print(\"   (export STRICT_VALIDATION=1 to convert this into a hard failure.)\")\nelse:\n    print(f\"ALL CHECKS PASSED — T_obs tracks T_exp across {len(params_df)} sims\")\nprint(\"=\" * 64)",


Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

… review) The validation cell's soft-warn rationale comment named mkdocs.ci.yml and allow_errors=false as the reason for not raising hard, but that's the PR-time pipeline. The Build Combined Doc workflow this PR exercises runs notebooks via docs/execute_notebooks.py (which always passes nbconvert --allow-errors so cells can raise without failing nbconvert) and gates on the result via docs/check_executed_nbs.py, governed by ALLOW_NB_ERRORS (default 0 in this workflow). Reword to name the real gate. The rationale is unchanged: a hard AssertionError on a stochastic outlier would land as a cell error and fail the build, so soft-warn by default + STRICT_VALIDATION=1 opt-in remains the right design. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

KevinMcCarthyAtIDM

Looks good to me. At some point, we should come back to this and try to understand why the match between expectation and observation isn't better than it is, but that appears to have been happening in all versions of the notebook and this PR doesn't change that.

Jonathan Bloedow and others added 7 commits June 1, 2026 13:30

Removed comment in notebook which referred to previous configuration.

3ade19e

ruff

44a1f8f

jonathanhhb requested a review from Copilot June 1, 2026 20:34

Copilot started reviewing on behalf of jonathanhhb June 1, 2026 20:34 View session

Copilot AI reviewed Jun 1, 2026

View reviewed changes

Comment thread docs/tutorials/notebooks/06_SIR_wbirths_natural_periodicity.ipynb Outdated

Comment thread docs/tutorials/notebooks/06_SIR_wbirths_natural_periodicity.ipynb Outdated

Jonathan Bloedow and others added 2 commits June 1, 2026 13:52

jonathanhhb requested a review from KevinMcCarthyAtIDM June 1, 2026 21:11

jonathanhhb changed the title ~~Notebook 06 only~~ Notebook 06 Robustification Jun 1, 2026

jonathanhhb mentioned this pull request Jun 2, 2026

ci: replace mkdocs-pr.yml with path-gated notebook-executing variant #189

Open

jonathanhhb assigned KevinMcCarthyAtIDM Jun 2, 2026

Jonathan Bloedow and others added 3 commits June 8, 2026 08:27

Merge origin/main: pick up build-combined-doc workflow + Makefile

9d90183

So this branch can add NB_TIMEOUT plumbing in the same PR.

jonathanhhb requested a review from Copilot June 8, 2026 18:24

Copilot started reviewing on behalf of jonathanhhb June 8, 2026 18:24 View session

Copilot AI reviewed Jun 8, 2026

View reviewed changes

jonathanhhb and others added 2 commits June 8, 2026 11:44

Potential fix for pull request finding

5dcbc40

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

KevinMcCarthyAtIDM approved these changes Jun 17, 2026

View reviewed changes

jonathanhhb merged commit 76897ba into main Jun 17, 2026
9 checks passed

jonathanhhb deleted the notebook-06-only branch June 17, 2026 18:12

YeChen-IDM mentioned this pull request Jun 17, 2026

docs: run notebook 06 at reduced scale in CI to avoid combined-doc build OOM #194

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Notebook 06 Robustification#188

Notebook 06 Robustification#188
jonathanhhb merged 14 commits into
mainfrom
notebook-06-only

jonathanhhb commented Jun 1, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

KevinMcCarthyAtIDM left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		"cell_type": "code",
		"source": "import os\n\n# Tolerances picked with run-to-run variation in mind. With CBR=70-100 and\n# stable population (via ConstantPopVitalDynamics recycling), max\n# best-of-three relative error tends to sit around 20% and per-method\n# correlations are typically >0.7. Tighten if the model gets cleaner; loosen\n# only if a legitimate parameter range starts hitting the wall.\n#\n# We require at least MIN_PASSING_METHODS of the three estimators to correlate\n# with T_exp. The autocorrelation FFT is known to land on T_exp/2 when the\n# signal carries a second harmonic, and the peak-finder can drift on noisy\n# signals — but the three together are robust, which mirrors the per-sim\n# \"best-of-three\" relative error philosophy used below.\n#\n# By default a failure prints a soft warning and continues, so a stochastic\n# outlier doesn't kill the docs build (the notebook is executed in CI under\n# allow_errors=false via mkdocs.ci.yml). Set STRICT_VALIDATION=1 in the env\n# to convert failures into a hard AssertionError for local validation or a\n# dedicated gating CI lane.\nCORR_THRESHOLD = 0.5\nMIN_PASSING_METHODS = 2\nMEDIAN_REL_ERR_THRESHOLD = 0.30\nMAX_REL_ERR_THRESHOLD = 0.60\nSTRICT_VALIDATION = os.environ.get(\"STRICT_VALIDATION\", \"0\") == \"1\"\n\nmethods = [\"T_obs_peakfinder1\", \"T_obs_peakfinder2\", \"T_obs_autocorr\"]\nT_exp_arr = params_df[\"T_exp\"].to_numpy()\nrel_err_per_method = {\n m: np.abs(params_df[m].to_numpy() - T_exp_arr) / T_exp_arr for m in methods\n}\nbest_per_sim = np.minimum.reduce(list(rel_err_per_method.values()))\ncorrelations = {m: float(np.corrcoef(T_exp_arr, params_df[m].to_numpy())[0, 1]) for m in methods}\n\nprint(\"=\" * 64)\nprint(\"Periodicity validation: T_obs vs T_exp\")\nprint(\"=\" * 64)\n\nprint(\"\\nPer-method correlation with T_exp:\")\npassing_methods = 0\nfor m, r in correlations.items():\n # NaN guard: np.corrcoef returns NaN for a constant series; treat as fail.\n if math.isnan(r):\n ok = False\n status = \"nan\"\n else:\n ok = r > CORR_THRESHOLD\n status = \"PASS\" if ok else \"fail\"\n print(f\" {m:24s} r = {r:+.3f} [{status}]\")\n if ok:\n passing_methods += 1\n\nprint(f\"\\nMethods passing correlation threshold: {passing_methods}/{len(methods)} (need ≥ {MIN_PASSING_METHODS})\")\n\nprint(\"\\nPer-sim best-of-three relative error:\")\nfor i, err in enumerate(best_per_sim):\n print(f\" sim {i}: {err * 100:5.1f}%\")\n\nmedian_best = float(np.median(best_per_sim))\nmax_best = float(np.max(best_per_sim))\nprint(f\"\\nMedian best-of-three rel error: {median_best * 100:5.1f}% (threshold {MEDIAN_REL_ERR_THRESHOLD * 100:.0f}%)\")\nprint(f\"Max best-of-three rel error: {max_best * 100:5.1f}% (threshold {MAX_REL_ERR_THRESHOLD * 100:.0f}%)\")\n\nfailures = []\nif passing_methods < MIN_PASSING_METHODS:\n failed_methods = [m for m, r in correlations.items() if math.isnan(r) or r <= CORR_THRESHOLD]\n failures.append(\n f\"only {passing_methods}/{len(methods)} methods correlated with T_exp \"\n f\"(failed: {', '.join(failed_methods)})\"\n )\n# NaN aggregates would otherwise silently pass the threshold comparisons —\n# count them as failures explicitly.\nif math.isnan(median_best):\n failures.append(\"median best-of-three is NaN (an estimator returned non-finite values)\")\nelif median_best > MEDIAN_REL_ERR_THRESHOLD:\n failures.append(f\"median best-of-three {median_best:.3f} > {MEDIAN_REL_ERR_THRESHOLD}\")\nif math.isnan(max_best):\n failures.append(\"max best-of-three is NaN (an estimator returned non-finite values)\")\nelif max_best > MAX_REL_ERR_THRESHOLD:\n failures.append(f\"max best-of-three {max_best:.3f} > {MAX_REL_ERR_THRESHOLD}\")\n\nprint(\"\\n\" + \"=\" * 64)\nif failures:\n msg = \"Periodicity validation FAILED:\\n - \" + \"\\n - \".join(failures)\n if STRICT_VALIDATION:\n raise AssertionError(msg)\n print(f\"⚠️ SOFT-WARN: {msg}\")\n print(\" (export STRICT_VALIDATION=1 to convert this into a hard failure.)\")\nelse:\n print(f\"ALL CHECKS PASSED — T_obs tracks T_exp across {len(params_df)} sims\")\nprint(\"=\" * 64)",

Conversation

jonathanhhb commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Note on the branch name

Summary

1. nb06: calc_capacity overflow (root cause)

2. NB_TIMEOUT per-cell default (build-combined-doc workflow)

3. NB_EXCLUDE for research-only notebooks

Verification

Not in this PR

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

KevinMcCarthyAtIDM left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jonathanhhb commented Jun 1, 2026 •

edited

Loading

1. nb06: `calc_capacity` overflow (root cause)

2. `NB_TIMEOUT` per-cell default (build-combined-doc workflow)

3. `NB_EXCLUDE` for research-only notebooks