deepclone_from_array(): hydrate closure-bearing nodes as native lazy ghosts #25

Merged
nicolas-grekas merged 1 commit into
symfony:main from
nicolas-grekas:lazy-ghosts
Jun 11, 2026
nicolas-grekas (Member) commented Jun 11, 2026

What

On PHP 8.4+, deepclone_from_array() now creates the object nodes that are expensive to hydrate as native lazy ghosts: nodes whose payload slots or replayed __unserialize state carry a named-closure or (PHP 8.5) const-expr-closure marker. Every object identity still exists when the call returns (back-references, shared &-references and === behave exactly as for eager nodes), but a ghost's property hydration, closure resolution included, runs on first engine access of that node. All other nodes hydrate eagerly, and PHP 8.2/8.3 keep hydrating everything eagerly. The signature is unchanged.
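For readers unfamiliar with the engine feature this builds on, here is a minimal userland sketch of native lazy ghosts using the PHP 8.4 reflection API. It does not call the extension: the Node class and the initializer body are hypothetical stand-ins, with a first-class callable to strlen playing the role of an expensive closure resolution.

```php
<?php
// Requires PHP 8.4+. Node is a hypothetical class used only for this illustration.
final class Node
{
    public ?Closure $callback = null;
    public ?Node $next = null;
}

$hydrated = []; // object ids whose initializer has run

$reflector = new ReflectionClass(Node::class);
$initializer = static function (Node $node) use (&$hydrated): void {
    // Stand-in for the expensive part (resolving a named closure).
    $node->callback = strlen(...);
    $hydrated[] = spl_object_id($node);
};

// Both objects exist immediately, but neither has been hydrated yet.
$a = $reflector->newLazyGhost($initializer);
$b = $reflector->newLazyGhost($initializer);

$distinctIdentities = ($a !== $b);                        // identity checks do not hydrate
$lazyBefore = $reflector->isUninitializedLazyObject($a);

// The first property read runs the initializer, for $a only.
$length = ($a->callback)('ghost');
$lazyAfterA = $reflector->isUninitializedLazyObject($a);
$lazyAfterB = $reflector->isUninitializedLazyObject($b);
```

In this sketch only $a pays the hydration cost; $b stays an empty shell until something reads or writes one of its properties, which is the per-node granularity described above.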

Why

Resolving closure markers is where hydration time goes: fake-closure creation plus class and function lookups for named closures, attribute-args re-evaluation for const-expr closures. Measured against v0.7.2 (20k-node graphs, PHP 8.4 release build):

  • 3 named closures per node: create 22.4 -> 5.3 ms, create + touch 1% of nodes 29.5 -> 4.8 ms, create + drop untouched 29.7 -> 12.2 ms, resident untouched 25.2 -> 9.4 MiB (shells plus slot index weigh less than materialized closures);
  • __unserialize with a closure in the state: create 16.4 -> 4.8 ms; full traversal goes 17.8 -> 19.9 ms, the worst measured regression;
  • plain scalar DTO graphs: identical within noise; without closure markers the call takes the eager path bit for bit.

Plain value slots are deliberately not deferred: copy-on-write makes them refcount bumps, so ghost bookkeeping would cost more than the hydration it defers.

How

  • Eligibility is decided upfront by scanning the payload's resolve table and the per-entry states masks for closure markers. Eligible nodes must additionally be user classes without internal ancestors (stdClass descendants excepted) and with declared properties. Closure-bearing __wakeup/__unserialize nodes defer too: their hook runs at the end of their own initialization instead of in the global, children-first phase-9 sequence. Per-entry validation stays inside the call, and the deferred state is recorded before any user code can run, only from entries the eager path would actually call, so mid-call touches on malformed payloads cannot run hooks the eager path would not have run.
  • Shared state lives in a new internal-only DeepClone\HydrationContext object; the ghost initializer is a Closure over its C-implemented private hydrate() method, created once per call, so ReflectionClass::getLazyInitializer() returns a plain Closure (identical across all ghosts of one call). The context retains the payload (the slot index points into it), the object table (back-reference targets must outlive the call), the shared refs table and a copy of the allow-list, and reports everything to the GC so abandoned half-hydrated graphs are collectable.
  • A per-id slot index, built eagerly with the same structural validation and error messages as the eager path, lets each ghost replay only its own slots, with per-column scope and property-info resolution hoisted like the eager walk does.
  • hydrate() refuses objects realized behind the context's back (markLazyObjectAsInitialized(), raw-value draining) and carries a per-id re-entrancy guard, so a hostile callback during hydration can neither double-apply markers nor leave a reverted ghost permanently un-hydratable.
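The shared-initializer and re-entrancy-guard ideas above can be approximated in userland PHP 8.4. This is only a sketch under stated assumptions: SketchContext, Item and the slot layout are hypothetical, and the real DeepClone\HydrationContext is implemented in C with a different payload format.

```php
<?php
// Requires PHP 8.4+. SketchContext and Item are hypothetical names for this illustration.
final class Item
{
    public int $id = 0;
    public string $label = '';
}

final class SketchContext
{
    /** Per-object slot index: each ghost maps to the values it must replay. */
    private WeakMap $slots;
    /** @var array<int, true> object ids currently being hydrated (re-entrancy guard) */
    private array $busy = [];
    private Closure $initializer;

    public function __construct()
    {
        $this->slots = new WeakMap();
        // Created once and shared by every ghost this context produces.
        $this->initializer = $this->hydrate(...);
    }

    public function ghost(array $slots): Item
    {
        $ghost = (new ReflectionClass(Item::class))->newLazyGhost($this->initializer);
        $this->slots[$ghost] = $slots;

        return $ghost;
    }

    private function hydrate(Item $item): void
    {
        $id = spl_object_id($item);
        if (isset($this->busy[$id])) {
            throw new LogicException('Re-entrant hydration of the same node.');
        }
        $this->busy[$id] = true;
        try {
            // Replay only this node's own slots.
            foreach ($this->slots[$item] as $name => $value) {
                $item->$name = $value;
            }
        } finally {
            unset($this->busy[$id]);
        }
    }
}

$context = new SketchContext();
$first = $context->ghost(['id' => 1, 'label' => 'one']);
$second = $context->ghost(['id' => 2, 'label' => 'two']);

$reflector = new ReflectionClass(Item::class);
$sharedInitializer = $reflector->getLazyInitializer($first) === $reflector->getLazyInitializer($second);
$label = $second->label; // hydrates $second only
$firstStillLazy = $reflector->isUninitializedLazyObject($first);
```

Each ghost keeps the context alive through the initializer it holds, which mirrors why the payload and tables stay pinned until the last ghost initializes or dies.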

Behavior notes

  • Structural payload errors and $allowed_classes enforcement, including the const-expr-closure gate, still throw inside deepclone_from_array().
  • Value-level resolution errors (a stale const-expr closure line, a named-closure target that no longer exists) surface at first access of the node instead; the engine reverts the ghost, which stays uninitialized and rethrows on every retry. Markers in the root prepared/mask tree and in refMasks keep resolving eagerly, which is why the existing closure test suite passes unchanged.
  • A never-initialized ghost's destructor is not called, and the payload plus the graph stay pinned until the last ghost initializes or dies (the cycle collector reclaims abandoned graphs).
  • Type sources for shared &-references bound to typed properties register per node as it hydrates; a write through such a reference is only checked against the already-hydrated holders (documented in the README).
  • The VarExporter polyfill keeps resolving closures eagerly for now; payloads remain interchangeable in both directions.
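The deferred-error and destructor notes rely on how the engine treats native lazy ghosts. The following userland sketch (PHP 8.4+) shows that behavior with a hypothetical Handle class and an initializer that always fails; it is an illustration of the engine semantics, not of the extension's own error messages.

```php
<?php
// Requires PHP 8.4+. Handle is a hypothetical class used only for this illustration.
final class Handle
{
    public string $target = '';
    public static int $destroyed = 0;

    public function __destruct()
    {
        self::$destroyed++;
    }
}

$attempts = 0;
$reflector = new ReflectionClass(Handle::class);
$ghost = $reflector->newLazyGhost(static function (Handle $handle) use (&$attempts): void {
    $attempts++;
    // Stand-in for a value-level resolution error found at first access.
    throw new RuntimeException('target no longer exists');
});

$messages = [];
for ($i = 0; $i < 2; $i++) {
    try {
        $unused = $ghost->target; // triggers the initializer
    } catch (RuntimeException $e) {
        $messages[] = $e->getMessage();
    }
}

// The engine reverted the ghost after each failure, so it is still lazy.
$stillLazy = $reflector->isUninitializedLazyObject($ghost);

// Drop the ghost, and the exception whose trace may still reference it.
unset($ghost, $e);
$destructorCalls = Handle::$destroyed;
```

Every access retries the initializer and rethrows, and the never-initialized ghost is released without its destructor running.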

Rider fixes

  • Binding a shared &-reference to a typed declared property aborted debug builds (deref assertion in zend_std_write_property()) and skipped type-source registration on release builds, so later writes through the reference bypassed the property type. deepclone_from_array() and deepclone_hydrate(..., DEEPCLONE_HYDRATE_PRESERVE_REFS) now mirror unserialize().
  • Object-ref markers (true) resolved against a ref slot returned either an alias or a by-value snapshot of the shared value depending on which consumer resolved first. They are now always by-value snapshots, making the result independent of hydration order; such payloads are only ever hand-crafted, deepclone_to_array() never emits them.
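As background for the first fix, this is the engine behavior that type-source registration provides in plain PHP, and that hydrated references are now expected to match. The Counter class is hypothetical and the snippet does not use the extension.

```php
<?php
// Counter is a hypothetical class used only for this illustration.
final class Counter
{
    public int $value = 0;
}

$counter = new Counter();
$ref = &$counter->value; // the reference now carries Counter::$value as a type source

$ref = 41;               // accepted: int
$ref++;

$rejected = false;
try {
    $ref = 'not a number'; // cannot be coerced to the int type of the bound property
} catch (TypeError $e) {
    $rejected = true;
}
```

Without the registration, the last assignment would have gone through and left a string in an int property, which is the bypass described in the first bullet.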

Tests

Seven new .phpt files cover ghost identity and per-node hydration granularity, deferred state replays (including nested ghost-from-hook initialization in both states orders and eager-parity on malformed payloads), shared-reference correctness under both touch orders, eager validation versus deferred-error retry semantics, GC of abandoned graphs and destructor skipping, wakeup-node eagerness with lazy children, const-expr deferral with the eager allow-list gate (8.5+), and the typed-ref binding fix. The full suite (46 tests) passes on a debug build of 8.6-dev (assertions and leak checking active) and the extension builds warning-free against release PHP 8.4 with -O2 and against the 8.2/8.3 -Werror matrix (ghost-only helpers are compiled out below 8.4).

…osts

On PHP 8.4+, object nodes whose payload slots or replayed __unserialize
state carry a named-closure or const-expr-closure marker are created as
uninitialized lazy ghosts: every object identity exists when the call
returns (back-references, shared &-references and === behave as for eager
nodes), but per-node hydration, closure resolution included, runs on first
engine access. Closure-bearing __wakeup/__unserialize nodes replay their
hook at the end of their own initialization; per-entry validation stays
inside the call. Nodes without closure markers hydrate eagerly as before
(copy-on-write makes plain value slots cheaper to hydrate than to ghost),
as does everything on PHP 8.2/8.3.

Shared state lives in the internal-only DeepClone\HydrationContext;
ReflectionClass::getLazyInitializer() returns a Closure bound to it,
shared by all ghosts of one call. Structural validation and
allowed_classes enforcement (including the const-expr gate) stay eager;
only value-level resolution errors surface at first access, where the
engine reverts the ghost and keeps it retryable.

Measured against v0.7.2 (20k-node graphs, release 8.4): closure-rich
graphs hydrate 4-6x faster on creation and partial consumption, occupy
2-3x less memory while untouched, and tear down about 2x faster when
dropped untouched; fully traversed graphs pay a comparable total; scalar
graphs are unchanged.

Also fixes two pre-existing issues lazy hydration would have amplified:
shared &-references bound to typed declared properties now register the
property as a type source instead of tripping the engine's deref
assertion, and object-ref markers resolved against ref slots are
order-independent by-value snapshots.
nicolas-grekas merged commit 66cce22 into symfony:main Jun 11, 2026
11 checks passed
nicolas-grekas deleted the lazy-ghosts branch June 11, 2026 07:48