Add snapshot fixes from polioland#360
Conversation
Multi-segment simulations (save snapshot, reload, continue) had three bugs in the core snapshot infrastructure that caused visible discontinuities: 1. date_of_death offset: absolute-timestep death schedules were not adjusted when reloading into a new segment starting at t=0, causing agents to live too long (their deaths were scheduled far in the future relative to the new timeline). save_snapshot now accepts t= to record t_snap; load_snapshot automatically subtracts t_snap from date_of_death and clamps to >= 1. 2. pop_final: results.pop[0] was reset to init_pop on reload, ignoring all births and deaths from the first segment and causing a population jump at the boundary. save_snapshot now accepts pop_final= (per-node population at snapshot time); load_snapshot returns it via pars["pop_final"] so callers can restore results.pop[0, :]. 3. keep_mask: save_snapshot now accepts an optional keep_mask boolean array to squash terminal-state agents (e.g. fully-recovered) before writing, reducing snapshot size without requiring callers to mutate the frame first. All three params are optional and backwards-compatible. t_snap and pop_final are surfaced in the returned pars dict for model-specific use. Also adds: - 8 unit tests in test_laserframe.py covering round-trips, offset math, clamping, and absent-field behaviour - tests/test_snapshot_continuity.py: a model-agnostic continuity test built on a minimal deterministic ABM (fixed lifespan + constant birth rate) that directly exercises all three bugs and doubles as a runnable visual tool Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR updates LaserFrame snapshot save/load to support multi-segment simulations without discontinuities, and adds regression tests to catch the previously observed boundary issues.
Changes:
- Extend
LaserFrame.save_snapshot()to optionally persistt_snap,pop_final, and apply an optionalkeep_mask. - Extend
LaserFrame.load_snapshot()to surfacet_snap/pop_finalviaparsand offset/clampdate_of_deathwhent_snapis present. - Add unit and integration-style tests to validate snapshot round-trips and continuity across a save/reload boundary.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
src/laser/core/laserframe.py |
Adds optional snapshot metadata (t_snap, pop_final) and continuity adjustment logic on load; introduces keep_mask handling. |
tests/test_laserframe.py |
Adds unit tests for t_snap, pop_final, keep_mask, and date_of_death offset/clamping behavior. |
tests/test_snapshot_continuity.py |
Adds a deterministic staged-vs-flat continuity test to detect boundary discontinuities. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…ad time. We can fix at save time.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| to >= 1). The live frame is not mutated. | ||
| pop_final (np.ndarray, optional): 1-D array of final population counts per node | ||
| at the snapshot boundary. Returned via pars["pop_final"] on load so the | ||
| caller can restore results.pop[0, :] for a continuous population time-series. | ||
| keep_mask (np.ndarray, optional): Boolean array of shape (count,). If provided, | ||
| squash(keep_mask) is called before writing to compact terminal-state agents | ||
| out of the snapshot (e.g. fully-recovered agents that no longer affect dynamics). |
There was a problem hiding this comment.
The docstring says the live frame is not mutated, but when keep_mask is provided this method calls self.squash(keep_mask), which mutates the frame (shrinks count and compacts scalar arrays). Please document this side effect explicitly (or clarify that only the date_of_death offset is non-mutating) so callers don’t assume save_snapshot() is safe to call mid-simulation without changing in-memory state.
| to >= 1). The live frame is not mutated. | |
| pop_final (np.ndarray, optional): 1-D array of final population counts per node | |
| at the snapshot boundary. Returned via pars["pop_final"] on load so the | |
| caller can restore results.pop[0, :] for a continuous population time-series. | |
| keep_mask (np.ndarray, optional): Boolean array of shape (count,). If provided, | |
| squash(keep_mask) is called before writing to compact terminal-state agents | |
| out of the snapshot (e.g. fully-recovered agents that no longer affect dynamics). | |
| to >= 1). This offsetting is applied only to the data written to disk; | |
| the in-memory 'date_of_death' property on the live frame is not mutated. | |
| pop_final (np.ndarray, optional): 1-D array of final population counts per node | |
| at the snapshot boundary. Returned via pars["pop_final"] on load so the | |
| caller can restore results.pop[0, :] for a continuous population time-series. | |
| keep_mask (np.ndarray, optional): Boolean array of shape (count,). If provided, | |
| ``squash(keep_mask)`` is called on this LaserFrame before writing, which | |
| mutates the live frame in-place by shrinking ``_count`` and compacting | |
| 1-D scalar numpy arrays. Callers should not assume that the frame remains | |
| unchanged when a ``keep_mask`` is supplied. |
| self._save(f, "people") | ||
|
|
||
| # Write date_of_death already offset to the new segment's timeline so | ||
| # load_snapshot receives clean, ready-to-use values with no post-load fixup. | ||
| # We overwrite the dataset written by _save without touching the live frame. | ||
| if t is not None and "date_of_death" in self._properties: | ||
| dod = self._properties["date_of_death"] | ||
| adjusted = np.maximum(dod[: self._count].astype(np.int64) - t, 1).astype(dod.dtype) | ||
| del f["people"]["date_of_death"] | ||
| f["people"].create_dataset("date_of_death", data=adjusted) |
There was a problem hiding this comment.
save_snapshot() writes the full date_of_death dataset via _save(), then immediately deletes and recreates it when t is provided. For large frames this doubles I/O and memory for that column. Consider avoiding the initial write for date_of_death (e.g., allow _save() to skip/override specific properties) so the adjusted values are written once.
| # Restore pop[0] of seg2 from pop_final so the stitched series is continuous | ||
| if "pop_final" in pars: | ||
| pop_seg2[0] = int(pars["pop_final"].sum()) | ||
|
|
There was a problem hiding this comment.
test_pop_final_continuity() currently doesn’t actually validate the pop_final plumbing: pop_seg2[0] is initialized from frame2.count, which already equals frame1.count at the boundary, so the assertion would still pass even if pop_final were never saved/loaded/used. To exercise the bug described in the PR, simulate a caller that resets the segment-2 population baseline incorrectly (e.g., set pop_seg2[0] to INIT_POP/0 before applying pop_final), and assert that continuity is only restored when pars["pop_final"] is present and applied.
Several improvements following code review: - save_snapshot: eliminate double-write of date_of_death by computing the t-offset array before opening the HDF5 file and passing it as an override to _save(), which writes each field exactly once. Previously the raw value was written by _save() then immediately deleted and rewritten, doubling I/O for that column on large frames. - save_snapshot docstring: clarify that the t= offset touches only the data written to disk (in-memory frame unchanged), and explicitly document that keep_mask mutates the frame via squash() so callers are not surprised. - test_snapshot_continuity: replace vacuous test_pop_final_continuity with one that actually exercises the plumbing — simulates a model incorrectly resetting its pop baseline to INIT_POP on reload, asserts that is wrong, then shows pars["pop_final"] corrects it. - test_snapshot_continuity: add test_keep_mask_staged_run to cover the third save_snapshot feature that was missing from continuity-level testing. - test_snapshot_continuity: expand module docstring with background on the three bugs being guarded against (previously referenced an external file not present in the repo), update coverage summary, fix stale comment that attributed the date_of_death offset to load_snapshot. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
clorton
left a comment
There was a problem hiding this comment.
I would really like us to implement a base/core save/load to HDF5 which we then wrap with all the special cases (date_of_death offset, patches restore along with people, etc.).
Fix snapshot continuity bugs in LaserFrame and add continuity tests
Multi-segment simulations (save snapshot, reload, continue) had three bugs
in the core snapshot infrastructure that caused visible discontinuities:
date_of_death offset: absolute-timestep death schedules were not adjusted
when reloading into a new segment starting at t=0, causing agents to live
too long (their deaths were scheduled far in the future relative to the
new timeline). save_snapshot now accepts t= to
automatically subtracts time steps elapsed from date_of_death.
pop_final: results.pop[0] was reset to init_pop on reload, ignoring all
births and deaths from the first segment and causing a population jump at
the boundary. save_snapshot now accepts pop_final= (per-node population at
snapshot time); load_snapshot returns it via pars["pop_final"] so callers
can restore results.pop[0, :].
keep_mask: save_snapshot now accepts an optional keep_mask boolean array
to squash terminal-state agents (e.g. fully-recovered) before writing,
reducing snapshot size without requiring callers to mutate the frame first.
All three params are optional and backwards-compatible. t_snap and pop_final
are surfaced in the returned pars dict for model-specific use.
Also adds:
clamping, and absent-field behaviour
on a minimal deterministic ABM (fixed lifespan + constant birth rate) that
directly exercises all three bugs and doubles as a runnable visual tool