
Fix StatisticsPanel pickling: serialize metrics by registry name, atomic writes #1

Open
zacsims wants to merge 1 commit into main from fix/panel-pickle-registry

Conversation


@zacsims commented Apr 22, 2026

Summary

Fixes the long-standing Bug A in the panel/summary infrastructure where summary_create_panel(..., preset=...) and summary_add_metric(...) fail with:

PicklingError: Can't pickle <function cell_counts at 0x...>:
it's not the same object as spatialtissuepy.summary.population.cell_counts

and, under parallel tool calls, a follow-on:

UnpicklingError: Ran out of input

Two symptoms, two independent root causes in the same subsystem.

Root cause 1: MetricInfo.func is not the module attribute of the same qualname

registry.register() stores the raw func on the MetricInfo object but installs a functools.wraps-wrapped function as the module attribute. Pickle serializes functions by reference (__module__ plus __qualname__) and checks at dump time that looking the name up returns the very same object; the lookup finds the wrapper at e.g. spatialtissuepy.summary.population.cell_counts, the identity check against the stored raw func fails, and pickle refuses to serialize.
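A minimal, self-contained reproduction of the mismatch (the function names here are illustrative stand-ins, not the actual registry code): a raw function is kept around while the module attribute of the same `__qualname__` is rebound to a `functools.wraps` wrapper, and pickling the raw function fails.

```python
import functools
import pickle

def cell_counts():
    """Stand-in for a registered metric function."""
    return 42

raw = cell_counts  # the reference a MetricInfo-style object would store

@functools.wraps(cell_counts)  # wrapper inherits __qualname__ == "cell_counts"
def wrapper(*args, **kwargs):
    return raw(*args, **kwargs)

cell_counts = wrapper  # the module attribute now points at the wrapper

# Pickling the raw function looks up "cell_counts" by qualname, finds the
# wrapper instead, and the identity check fails at dump time:
try:
    pickle.dumps(raw)
except pickle.PicklingError as exc:
    print(exc)
```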

Fix: MetricInfo.__reduce__ now pickles by name and re-resolves from the live registry via _resolve_metric_for_pickle(name) on load. Built-in metrics register at import, so panels round-trip without user action. Registered custom metrics round-trip as long as the consumer re-registers them before loading; if they don't, unpickle raises a descriptive RuntimeError naming the missing metric instead of a cryptic identity-mismatch error. Inline metrics added via StatisticsPanel.add_custom_function() are per-panel only and correctly refuse to pickle, raising a TypeError that points users at register_custom_metric.

Root cause 2: non-atomic writes produce truncated pickles under concurrency

SessionManager.store_panel / store_data used a plain open(path, 'wb'); pickle.dump(...). Two parallel MCP tool calls can open the same file simultaneously; open('wb') truncates on open, so the second writer wipes the bytes the first had already written, and a reader that arrives before the rewrite finishes (or after a shorter payload lands) sees "Ran out of input".
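The failure mode can be simulated deterministically in one process (an illustrative stand-in, not the actual SessionManager code): the second `open(path, 'wb')` truncates what the first writer produced, and the half-written pickle fails to load.

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "panel.pkl")
payload = pickle.dumps(list(range(10_000)))

with open(path, "wb") as f:
    f.write(payload)                       # writer A: complete pickle on disk

with open(path, "wb") as f:                # writer B: open("wb") truncates...
    f.write(payload[: len(payload) // 2])  # ...and stops halfway through

try:
    with open(path, "rb") as f:
        pickle.load(f)                     # torn file: load cannot succeed
except (EOFError, pickle.UnpicklingError) as exc:
    print(type(exc).__name__, exc)
```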

Fix: _atomic_pickle_dump writes to a per-call temp file (named with pid + thread id + uuid slug, so in-process threads don't collide either) and os.replace()s it into place. Readers always see either the previous file or a complete new one, never a torn write. Failed writes clean up the temp file so no partial bytes leak.

Tests

New tests/test_summary_panel_pickle.py, 11 cases:

  • All four preset panels (basic, spatial, neighborhood, comprehensive) round-trip through pickle.
  • Registered custom metric round-trips.
  • Inline custom function refuses to pickle with a clear TypeError.
  • Custom metric removed from the registry between dump and load raises RuntimeError naming the missing metric.
  • Eight concurrent threads writing different payloads to the same path via _atomic_pickle_dump always leave a valid complete pickle on disk.
  • Failed pickle attempt cleans up the temp file.
  • End-to-end: SessionManager.store_panel + load_panel round-trips a basic built-in panel.

Test plan

  • pytest tests/test_summary_panel_pickle.py — 11/11 pass.
  • Full test suite: 273 passed, 2 skipped, 0 regressions.
  • Maintainer review of the __reduce__ semantics for MetricInfo and the atomic-write helper.

Scope

No public API changes. MetricInfo's call / repr / field surface is unchanged, and SessionManager.store_panel / store_data keep the same signatures. The only new module-level callable is _resolve_metric_for_pickle, which is underscore-prefixed and used only as the pickle reconstruction target.

…mic writes

Resolves the long-standing Bug A in the spatialtissuepy MCP panel
infrastructure: summary_create_panel(..., preset=...) and
summary_add_metric(...) failed with

    PicklingError: Can't pickle <function cell_counts at 0x...>:
    it's not the same object as spatialtissuepy.summary.population.cell_counts

and, under parallel tool calls, a follow-on

    UnpicklingError: Ran out of input

The two symptoms share one subsystem but have independent root causes.

## 1. MetricInfo.func is not its own module attribute

register() in registry.py stores the raw `func` passed to the decorator
on the MetricInfo object, but installs a `functools.wraps`-wrapped
function as the module attribute. Pickle serializes functions by
reference (`__module__` plus `__qualname__`) and checks at dump time
that looking the name up returns the very same object; the lookup finds
the wrapper at e.g. `spatialtissuepy.summary.population.cell_counts`,
the identity check against the stored raw func fails, and pickle
refuses to serialize.

Fix: teach MetricInfo to pickle by *name* via __reduce__. On dump we
store only the metric name; on load we call
_resolve_metric_for_pickle(name) which fetches the live registry entry.
That is the durable object across processes, not the captured function
reference at registration time. Built-in metrics register at import, so
all built-in panels round-trip without user action. Registered custom
metrics round-trip as long as the consumer re-registers them before
loading; if they don't, unpickle raises a descriptive RuntimeError
naming the missing metric rather than a cryptic identity-mismatch.
Inline metrics added via StatisticsPanel.add_custom_function() are
per-panel only and correctly refuse to pickle with a TypeError pointing
users at register_custom_metric.

## 2. Non-atomic writes produce truncated pickles under concurrency

SessionManager.store_panel and store_data used plain
`open(path, 'wb'); pickle.dump(...)`. Two concurrent MCP tool calls
(typical when a client issues parallel summary_add_metric calls) could
open the same file simultaneously; `open('wb')` truncates on open, so
the second writer wiped the bytes the first had already written, and a
reader arriving before the rewrite finished -- or after a shorter
payload landed -- saw "Ran out of input".

Fix: _atomic_pickle_dump writes to a per-call tmp file
(pid + thread id + uuid slug, so in-process threading doesn't collide
either) and then os.replace-s it into place. os.replace is atomic on
POSIX and performs an atomic replacement on Windows as well, so readers
always see either the previous file or a complete new one -- never a
torn write. On write failure the temp file is cleaned up so we don't
leak partial bytes.

## Tests

Adds tests/test_summary_panel_pickle.py with 11 cases covering:

* All four preset panels (basic / spatial / neighborhood / comprehensive)
  round-trip through pickle.
* A registered custom metric round-trips.
* An inline custom function refuses to pickle with a clear TypeError.
* A custom metric removed from the registry between dump and load
  raises RuntimeError naming the missing metric.
* Eight concurrent threads writing different payloads to the same path
  via _atomic_pickle_dump always leave a valid complete pickle on disk
  (no truncation).
* A failed pickle attempt cleans up the temp file and leaves no partial
  state behind.
* End-to-end: SessionManager.store_panel + load_panel round-trips a
  basic built-in panel through a temp session directory.

Full test suite: 273 passed, 2 skipped, 0 regressions.
