deepclone_from_array(): hydrate closure-bearing nodes as native lazy ghosts#25
Merged
…osts On PHP 8.4+, object nodes whose payload slots or replayed __unserialize state carry a named-closure or const-expr-closure marker are created as uninitialized lazy ghosts: every object identity exists when the call returns (back-references, shared &-references and === behave as for eager nodes), but per-node hydration, closure resolution included, runs on first engine access. Closure-bearing __wakeup/__unserialize nodes replay their hook at the end of their own initialization; per-entry validation stays inside the call. Nodes without closure markers hydrate eagerly as before (copy-on-write makes plain value slots cheaper to hydrate than to ghost), as does everything on PHP 8.2/8.3. Shared state lives in the internal-only DeepClone\HydrationContext; ReflectionClass::getLazyInitializer() returns a Closure bound to it, shared by all ghosts of one call. Structural validation and allowed_classes enforcement (including the const-expr gate) stay eager; only value-level resolution errors surface at first access, where the engine reverts the ghost and keeps it retryable. Measured against v0.7.2 (20k-node graphs, release 8.4): closure-rich graphs hydrate 4-6x faster on creation and partial consumption, occupy 2-3x less memory while untouched, and tear down about 2x faster when dropped untouched; fully traversed graphs pay a comparable total; scalar graphs are unchanged. Also fixes two pre-existing issues lazy hydration would have amplified: shared &-references bound to typed declared properties now register the property as a type source instead of tripping the engine's deref assertion, and object-ref markers resolved against ref slots are order-independent by-value snapshots.
3a75b3c to
66cce22
What
On PHP 8.4+, deepclone_from_array() now creates the object nodes that are expensive to hydrate as native lazy ghosts: nodes whose payload slots or replayed __unserialize state carry a named-closure or (PHP 8.5) const-expr-closure marker. Every object identity still exists when the call returns (back-references, shared &-references and === behave exactly as for eager nodes), but a ghost's property hydration, closure resolution included, runs on first engine access of that node. All other nodes hydrate eagerly, and PHP 8.2/8.3 keeps hydrating everything eagerly. The signature is unchanged.

Why
Resolving closure markers is where hydration time goes: fake-closure creation plus class and function lookups for named closures, attribute-args re-evaluation for const-expr closures. Measured against v0.7.2 (20k-node graphs, PHP 8.4 release build):
__unserialize with a closure in the state: create 16.4 -> 4.8 ms; full traversal 17.8 -> 19.9 ms is the worst measured regression;

Plain value slots are deliberately not deferred: copy-on-write makes them refcount bumps, so ghost bookkeeping would cost more than the hydration it defers.
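The ghost mechanics this change relies on can be observed with the engine's own Reflection API, without the extension. The following is a minimal sketch, not the extension's implementation: the `Node` class, the `$links` table and the `strlen(...)` stand-in for closure resolution are all hypothetical. It shows that identities exist before any hydration, that only the touched node pays, and that a single initializer `Closure` can serve every ghost.

```php
<?php
// Requires PHP 8.4+ (native lazy objects). Node and $links are hypothetical.
final class Node
{
    public ?Closure $callback = null;
    public ?Node $next = null;
}

$hydrated = 0;   // counts how many nodes were actually hydrated
$links = [];     // stand-in payload: object id => back-reference target

// One initializer shared by every ghost created below.
$initializer = function (Node $node) use (&$hydrated, &$links): void {
    $hydrated++;
    $node->callback = strlen(...);                 // stands in for resolving a closure marker
    $node->next = $links[spl_object_id($node)];    // back-reference to an existing identity
};

$class = new ReflectionClass(Node::class);
$a = $class->newLazyGhost($initializer);
$b = $class->newLazyGhost($initializer);
$links = [spl_object_id($a) => $b, spl_object_id($b) => $a];

// Both identities exist, nothing has been hydrated yet.
$lazyBefore = $class->isUninitializedLazyObject($a);
$sameInitializer = $class->getLazyInitializer($a) === $class->getLazyInitializer($b);
$hydratedBefore = $hydrated;

// The first property read hydrates $a only; $b stays a ghost.
$next = $a->next;
$identityPreserved = $next === $b;
$bStillLazy = $class->isUninitializedLazyObject($b);

var_dump($lazyBefore, $sameInitializer, $hydratedBefore, $identityPreserved, $bStillLazy, $hydrated);
```

Reading `$a->next` hands back `$b` without hydrating it, which is the property that lets a partially consumed graph skip work for the nodes it never touches.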
How
resolve table and the per-entry states masks for closure markers. Eligible nodes must additionally be user classes without internal ancestors (stdClass descendants excepted) and with declared properties. Closure-bearing __wakeup/__unserialize nodes defer too: their hook runs at the end of their own initialization instead of in the global, children-first phase-9 sequence. Per-entry validation stays inside the call, and the deferred state is recorded before any user code can run, only from entries the eager path would actually call, so mid-call touches on malformed payloads cannot run hooks the eager path would not have run.
DeepClone\HydrationContext object; the ghost initializer is a Closure over its C-implemented private hydrate() method, created once per call, so ReflectionClass::getLazyInitializer() returns a plain Closure (identical across all ghosts of one call). The context retains the payload (the slot index points into it), the object table (back-reference targets must outlive the call), the shared refs table and a copy of the allow-list, and reports everything to the GC so abandoned half-hydrated graphs are collectable.
hydrate() refuses objects realized behind the context's back (markLazyObjectAsInitialized(), raw-value draining) and carries a per-id re-entrancy guard, so a hostile callback during hydration can neither double-apply markers nor leave a reverted ghost permanently un-hydratable.

Behavior notes
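One behavior worth seeing in isolation is what the engine does when a ghost's initializer throws: it reverts the object to its lazy state, so a later access runs the initializer again. The sketch below uses only the native PHP 8.4 API; the `Handler` class and the `$available` flag (standing in for "the closure marker can be resolved") are hypothetical and not part of the extension.

```php
<?php
// Requires PHP 8.4+. Handler and $available are hypothetical stand-ins.
final class Handler
{
    public ?Closure $callback = null;
}

$attempts = 0;
$available = false;   // pretend the target function cannot be resolved yet

$class = new ReflectionClass(Handler::class);
$ghost = $class->newLazyGhost(
    function (Handler $h) use (&$attempts, &$available): void {
        $attempts++;
        if (!$available) {
            // A value-level resolution error, raised at first access.
            throw new RuntimeException('cannot resolve closure marker');
        }
        $h->callback = strtoupper(...);
    }
);

$firstError = null;
try {
    $ghost->callback;   // first access triggers the failing initializer
} catch (RuntimeException $e) {
    $firstError = $e->getMessage();
}

// The engine reverted the ghost: it is lazy again and can be retried.
$stillLazy = $class->isUninitializedLazyObject($ghost);

$available = true;
$result = ($ghost->callback)('ok');   // second access re-runs the initializer

var_dump($firstError, $stillLazy, $attempts, $result);
```

The same object identity is reused across both attempts, so anything already holding a reference to the ghost sees the hydrated state once the retry succeeds.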
$allowed_classes enforcement, including the const-expr-closure gate, still throw inside deepclone_from_array().
prepared/mask tree and in refMasks keep resolving eagerly, which is why the existing closure test suite passes unchanged.
&-references bound to typed properties register per node as it hydrates; a write through such a reference is only checked against the already-hydrated holders (documented in the README).

Rider fixes
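For context on the typed-property fix, this is the plain-PHP rule that type-source registration provides: once a reference is bound to a typed property, writes through the reference are checked against that property's type. The sketch below shows the engine's baseline behavior only; the `Counter` class is hypothetical and the extension is not involved.

```php
<?php
// Plain PHP (7.4+): a reference bound to a typed property inherits its type check.
final class Counter
{
    public int $value = 0;
}

$counter = new Counter();
$ref = &$counter->value;   // binding registers Counter::$value as a type source

$ref = 41;                 // fine: int satisfies the property type

$rejected = false;
try {
    $ref = 'not a number'; // violates int, even though $ref is a plain variable
} catch (TypeError $e) {
    $rejected = true;
}

var_dump($rejected, $counter->value);
```

If the registration step is skipped, the second assignment would silently store a string in an `int` property, which is the bypass described in the fix.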
&-reference to a typed declared property aborted debug builds (deref assertion in zend_std_write_property()) and skipped type-source registration on release builds, so later writes through the reference bypassed the property type. deepclone_from_array() and deepclone_hydrate(..., DEEPCLONE_HYDRATE_PRESERVE_REFS) now mirror unserialize().
true) resolved against a ref slot returned either an alias or a by-value snapshot of the shared value depending on which consumer resolved first. They are now always by-value snapshots, making the result independent of hydration order; such payloads are only ever hand-crafted, deepclone_to_array() never emits them.

Tests
Seven new .phpt files cover ghost identity and per-node hydration granularity, deferred state replays (including nested ghost-from-hook initialization in both states orders and eager-parity on malformed payloads), shared-reference correctness under both touch orders, eager validation versus deferred-error retry semantics, GC of abandoned graphs and destructor skipping, wakeup-node eagerness with lazy children, const-expr deferral with the eager allow-list gate (8.5+), and the typed-ref binding fix. The full suite (46 tests) passes on a debug build of 8.6-dev (assertions and leak checking active), and the extension builds warning-free against release PHP 8.4 with -O2 and against the 8.2/8.3 -Werror matrix (ghost-only helpers are compiled out below 8.4).
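The destructor skipping mentioned above is native engine behavior for lazy ghosts: a ghost that was never initialized is released without its `__destruct()` being called. A minimal sketch using only the PHP 8.4 API follows; the `Tracked` class and its static log are hypothetical and not taken from the test suite.

```php
<?php
// Requires PHP 8.4+. Tracked is a hypothetical class used to log destructor calls.
final class Tracked
{
    /** @var list<string> names of objects whose destructor ran */
    public static array $destructed = [];

    public string $name = '';

    public function __destruct()
    {
        self::$destructed[] = $this->name;
    }
}

$class = new ReflectionClass(Tracked::class);

// Helper that builds a ghost whose hydration just assigns a name.
$make = fn (string $name): Tracked => $class->newLazyGhost(
    function (Tracked $t) use ($name): void {
        $t->name = $name;
    }
);

$untouched = $make('untouched');
$touched = $make('touched');

$touched->name;                 // first access hydrates this ghost only

unset($untouched, $touched);    // drop both objects

var_dump(Tracked::$destructed); // only the hydrated object ran its destructor
```

Dropping an untouched ghost therefore costs neither hydration nor destructor work.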