perf(vm): unbox typed-array element reads and writes in the bytecode VM#900
Conversation
…access path Typed-array element access in the bytecode VM fell through to the generic TGocciaObjectValue computed-access branch, allocating an IntToStr index name plus a heap TGocciaNumberLiteralValue on every read, and boxing the scalar via RegisterToValue on every write. Arithmetic and comparisons were already 100% scalar (ADR 0001/0005), so these boundary boxings dominated allocation-heavy typed-array workloads (issue #800). Add unboxed fast paths to ExecGetComputedProperty / ExecSetComputedProperty for TGocciaTypedArrayValue receivers at array-index keys: reads go straight into a register scalar via the new RegisterFromDouble; numeric-scalar writes store directly (ToNumber on a Number is side-effect-free, so the spec's observable conversion is preserved). BigInt kinds, non-index keys, non-scalar values, and out-of-range / detached / immutable cases fall through to the unchanged boxed path, so all value semantics are preserved. Dedup the element store while here: integer NaN/Infinity coercion now lives only in WriteBinaryNumberElement, and the element read/write paths single-validate via ReadElementUnchecked / WriteElementUnchecked. sort_large_countingsort.js (prod, bytecode): allocations 7,471,627 -> 4,719,119 (-36.8%); x86_64 jobs=4 at the 20s deadline 14.1s -> 10.4s. typed-arrays.js element access: read +57%, write +35%, Float64 write +26%. No regressions: full JS suite 11,009/11,009 in both modes; test262 staging, TypedArray, Array, and DataView identical before/after. Closes #800 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Document the TryReadIndexedScalar/TryWriteIndexedScalar fast path on the computed-access cores so the bytecode VM doc stays accurate after the typed-array element unboxing change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub. 1 Skipped Deployment
|
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (4)
💤 Files with no reviewable changes (1)
✅ Files skipped from review due to trivial changes (3)
📝 WalkthroughWalkthroughAdds unboxed scalar fast paths for typed-array indexed reads and writes in the VM, plus a raw double-to-register helper. Typed-array element access gains unchecked helpers and scalar fast-path methods. Tests and docs were updated to cover the new behavior and related cache policy notes. ChangesTypedArray Unboxed Fast Paths
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
Comment |
Suite TimingTest Runner (interpreted: 11,046 passed; bytecode: 11,046 passed)
MemoryGC rows aggregate the main thread plus all worker thread-local GCs. Test runner worker shutdown frees thread-local heaps in bulk; that shutdown reclamation is not counted as GC collections or collected objects.
Benchmarks (interpreted: 437; bytecode: 437)
MemoryGC rows aggregate the main thread plus all worker thread-local GCs. Benchmark runner performs explicit between-file collections, so collection and collected-object counts can be much higher than the test runner.
Measured on ubuntu-latest x64. |
Benchmark Results437 benchmarks · PR vs same-runner Interpreted: 🟢 28 improved · 🔴 57 regressed · 352 unchanged · avg +0.9% Typical per-run noise (median variance): interpreted ±3.3%, bytecode ±2.4%. Deltas within noise overlap and read as unchanged. arraybuffer.js — Interp: 🔴 6, 8 unch. · avg -3.8% · Bytecode: 🟢 2, 12 unch. · avg +1.8%
arrays.js — Interp: 🔴 2, 17 unch. · avg -0.2% · Bytecode: 🟢 1, 18 unch. · avg +0.4%
async-await.js — Interp: 6 unch. · avg -2.7% · Bytecode: 6 unch. · avg +0.5%
async-generators.js — Interp: 2 unch. · avg +7.0% · Bytecode: 2 unch. · avg -2.7%
atomics.js — Interp: 🟢 1, 5 unch. · avg +2.1% · Bytecode: 🔴 1, 5 unch. · avg -1.1%
base64.js — Interp: 10 unch. · avg +0.5% · Bytecode: 🔴 1, 9 unch. · avg -3.0%
classes.js — Interp: 🟢 1, 🔴 2, 28 unch. · avg -0.6% · Bytecode: 🟢 2, 🔴 2, 27 unch. · avg -0.3%
closures.js — Interp: 11 unch. · avg -1.5% · Bytecode: 11 unch. · avg +1.0%
collections.js — Interp: 🔴 1, 11 unch. · avg +0.9% · Bytecode: 🟢 1, 🔴 2, 9 unch. · avg +0.7%
csv.js — Interp: 13 unch. · avg +5.3% · Bytecode: 13 unch. · avg +3.0%
destructuring.js — Interp: 🟢 1, 🔴 2, 19 unch. · avg +1.2% · Bytecode: 🟢 5, 17 unch. · avg +2.3%
fibonacci.js — Interp: 🔴 1, 7 unch. · avg +0.4% · Bytecode: 8 unch. · avg -2.3%
float16array.js — Interp: 🟢 2, 🔴 10, 20 unch. · avg -0.5% · Bytecode: 🟢 11, 🔴 2, 19 unch. · avg +10.6%
for-in/for-in.js — Interp: 🔴 3 · avg -3.4% · Bytecode: 3 unch. · avg +0.1%
for-of.js — Interp: 🔴 4, 3 unch. · avg -8.1% · Bytecode: 🔴 1, 6 unch. · avg -2.1%
generators.js — Interp: 4 unch. · avg +1.8% · Bytecode: 🔴 3, 1 unch. · avg -3.2%
intl.js — Interp: 🔴 1, 5 unch. · avg -3.4% · Bytecode: 🔴 2, 4 unch. · avg -3.8%
iterators.js — Interp: 🔴 12, 30 unch. · avg -4.9% · Bytecode: 🟢 5, 🔴 4, 33 unch. · avg -0.2%
json.js — Interp: 🟢 1, 22 unch. · avg +0.1% · Bytecode: 🔴 1, 22 unch. · avg -1.7%
jsx.jsx — Interp: 🟢 3, 🔴 1, 17 unch. · avg +2.1% · Bytecode: 🟢 8, 🔴 1, 12 unch. · avg +2.6%
modules.js — Interp: 🔴 1, 8 unch. · avg -2.8% · Bytecode: 🟢 1, 8 unch. · avg +2.7%
numbers.js — Interp: 12 unch. · avg +2.7% · Bytecode: 12 unch. · avg -1.5%
objects.js — Interp: 7 unch. · avg -1.4% · Bytecode: 7 unch. · avg +1.2%
promises.js — Interp: 12 unch. · avg -2.0% · Bytecode: 🟢 1, 11 unch. · avg +3.4%
property-access.js — Interp: 5 unch. · avg -3.1% · Bytecode: 5 unch. · avg +2.3%
regexp.js — Interp: 11 unch. · avg -1.2% · Bytecode: 🟢 1, 10 unch. · avg +1.8%
strings.js — Interp: 🟢 1, 🔴 1, 17 unch. · avg +1.0% · Bytecode: 🔴 4, 15 unch. · avg -2.2%
temporal.js — Interp: 🟢 1, 5 unch. · avg +0.8% · Bytecode: 6 unch. · avg -2.0%
tsv.js — Interp: 🟢 4, 5 unch. · avg +2.9% · Bytecode: 9 unch. · avg -2.3%
typed-arrays.js — Interp: 🟢 2, 🔴 1, 19 unch. · avg -1.6% · Bytecode: 🟢 12, 10 unch. · avg +13.3%
uint8array-encoding.js — Interp: 🟢 2, 🔴 8, 8 unch. · avg -7.7% · Bytecode: 🟢 3, 🔴 3, 12 unch. · avg +2.6%
weak-collections.js — Interp: 🟢 9, 🔴 1, 5 unch. · avg +54.2% · Bytecode: 🟢 1, 🔴 3, 11 unch. · avg -7.4%
Deterministic profile diffDeterministic profile diff: no significant changes. Measured on ubuntu-latest x64. Each PR run also builds the |
test262 Conformance
Areas closest to 100%
Per-test deltas (+1 / -0)Newly passing (1):
Steady-state failures are non-blocking; regressions vs the cached main baseline (lower total pass count, or any PASS → non-PASS transition) fail the conformance gate. Measured on ubuntu-latest x64, bytecode mode. Areas grouped by the first two test262 path components; minimum 25 attempted tests, areas already at 100% excluded. Δ vs main compares against the most recent cached |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/bytecode-vm.md`:
- Line 135: Update the computed-property access description to separate read and
write fallback behavior in the VM docs. In the `ClassifyPropertyKey` /
`ExecGetComputedProperty` / `ExecSetComputedProperty` /
`ExecDeleteComputedProperty` description, keep the read path saying out-of-range
or detached typed-array accesses fall back to the boxed path, but revise the
write path so `TryWriteIndexedScalar` is described as handling non-`BigInt`
writes directly even for out-of-range or immutable backing buffers. Make the
`TGocciaComputedAccessOptions` and `TGocciaTypedArrayValue` wording reflect this
split so the semantics match the `TryReadIndexedScalar`/`TryWriteIndexedScalar`
contracts.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: 40a6e302-da60-46b1-82e6-329ea3b725ff
📒 Files selected for processing (5)
docs/bytecode-vm.mdsource/units/Goccia.VM.Registers.passource/units/Goccia.VM.passource/units/Goccia.Values.TypedArrayValue.pastests/built-ins/TypedArray/element-access-unboxed.js
Record ADR 0080 capturing why interning/pooling boxed TGocciaValue instances to reduce allocation count does not improve runtime in this FPC codebase, so the C/C++ "fewer allocations => faster" intuition is not imported a fourth time. Generalizes ADR 0013 (reject string interning) to boxed numbers with the #900 spike data: a small-int + Infinity/NaN cache on the bytecode VM RegisterToValue path cut allocations 25% but moved runtime +2.2% (flat-to-worse, interleaved medians). Notes the narrow exceptions that do pay off (SmallInt 0-255, special -value singletons / ADR 0002) and the interleaved-measurement guardrail, and cross-links core-patterns.md. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The docs (core-patterns.md, garbage-collector.md) described a "SmallInt cache for 0-255" used by RuntimeCopy as an accepted, working optimization. It never existed: there is no array-of-TGocciaNumberLiteralValue anywhere in git history, and the claim was introduced by a docs-only commit (#302, Apr 2026) with no implementation. The real number value-reuse is RuntimeCopy returning the ADR 0002 special-value singletons (0, 1, NaN, +/-Infinity, -0); all other numbers allocate via Create. Correct both docs and ADR 0080 (which had repeated the phantom claim) to describe only the singletons that actually exist, and note that a spiked 0-255 range cache showed no runtime gain. Surfaced while verifying ADR 0080's "narrow exception". Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Measured both sides of the boundary so the "kept exception" is data-backed, not asserted: disabling the singleton reuse costs +786k allocations and ~1.4-1.7% on the allocation-heavy counting-sort test (within noise on typical integer code), while widening it to a small-integer range removed more allocations for no runtime gain (+2.2%). Even the kept cache barely moves runtime — it is retained because it is free, not because it is a meaningful speedup. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…80900 # Conflicts: # docs/adr/README.md
Address PR review (coderabbitai): the computed-access note conflated the read and write fast-path fallbacks. Reads fall through to the boxed path for BigInt kinds, non-index keys, and out-of-range/detached indices; non-BigInt scalar writes are handled in place even for out-of-range or immutable cases (store skipped, success reported) and only fall through for BigInt kinds, non-index keys, or non-scalar values. Wording now matches the TryReadIndexedScalar/TryWriteIndexedScalar contracts. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Summary
TGocciaObjectValuecomputed-access branch, which allocated anIntToStrindex name plus a heapTGocciaNumberLiteralValueon every read, and boxed the scalar viaRegisterToValueon every write. Arithmetic and comparisons were already 100% scalar (ADR 0001 / ADR 0005), so these boundary boxings dominated allocation-heavy typed-array workloads — the root cause behind Reduce bytecode-VM value boxing so allocation-heavy conformance tests meet the per-test deadline #800's motivating teststaging/sm/TypedArray/sort_large_countingsort.js.ExecGetComputedProperty/ExecSetComputedPropertyforTGocciaTypedArrayValuereceivers at array-index keys (newTryReadIndexedScalar/TryWriteIndexedScalar+RegisterFromDouble): reads move the element straight into a register scalar; numeric-scalar writes store it directly. Also dedups the element store (integer NaN/Infinity coercion now lives only inWriteBinaryNumberElement) and single-validates the hot path viaReadElementUnchecked/WriteElementUnchecked.undefined); a non-BigInt scalar write is handled in place even for an out-of-range index or immutable buffer (store skipped, success reported per integer-indexed exotic[[Set]]) and only falls through for BigInt kinds (a Number value still throwsTypeError), non-index keys, or non-scalar values (objects withvalueOf, booleans,null/undefined).ToNumberon a Number is side-effect-free, so the observable conversion ordering of integer-indexed[[Set]](ES2026 §10.4.5) is preserved. Not a JIT; the test and the 20s harness deadline are unchanged.Results (prod, bytecode)
Context: the issue's ~31M-allocation / 36.8s-CI figures were accurate when filed (validated at
#795). Intervening work (#802 argument arena, #818 unused-argumentselision) already cut this test to ~7.47M allocations and back under the deadline; this PR removes the remaining named lever — typed-array element boxing — as a durable, hardware-independent win.Motivating test
sort_large_countingsort.js:main)--jobs=4@--timeout-ms=20000(CI-like)--jobs=1benchmarks/typed-arrays.js"TypedArray element access" (bytecode, vsmain): sequential read +57%, sequential write +35%, Float64 write +26%; native methods (fill/reduce/sort/iteration) flat within run-to-run noise.Testing
Details:
tests/built-ins/TypedArray/element-access-unboxed.jscovers −0 round-trips (float preserves, integer normalizes), unboxed reads feeding>/===, register-resident scalar writes,Float16Arrayindices, non-scalar values routed through the slow path, a Number-into-BigIntTypeError, and large-array (counting-sort-scale) access../format.pas --checkclean; Pascal pre-commit hooks (trunc-guards, format) pass.docs/bytecode-vm.mdnotes the fast path.core-patterns.md/garbage-collector.mddescribed but was never implemented (only the ADR 0002 special-value singletons exist).Closes #800