frostney · Jun 28, 2026
diff --git a/docs/adr/0081-reject-value-caches-for-allocation-reduction.md b/docs/adr/0081-reject-value-caches-for-allocation-reduction.md
@@ -0,0 +1,13 @@
+# Reject shared value caches as a runtime optimization
+
+**Date:** 2026-06-28
+**Area:** `runtime`
+**Pull Request:** [#900](https://github.com/frostney/GocciaScript/pull/900)
+
+Reducing allocation *count* is not, by itself, a runtime lever in this engine, so shared caches of boxed `TGocciaValue` instances — interning or pooling them to avoid allocation — are rejected as a performance optimization. The only value reuse the engine actually has is the handful of special-value singletons returned by `RuntimeCopy` and the register-boxing paths (`0`, `1`, `NaN`, `±Infinity`, `-0`; see [ADR 0002](0002-singleton-special-values.md)). Every attempt to add caching *beyond* that fixed set has been measured and rejected: dictionary-based string interning ([ADR 0013](0013-reject-string-interning.md), −4% across 172 benchmarks) and the boxed-number range cache described below. A `SmallInt` 0–255 cache that earlier docs described as if implemented never actually existed in the source (corrected alongside this ADR) — itself a sign of how readily the C/C++ "fewer allocations ⇒ faster" intuition takes hold. This ADR exists so it is not imported again.
+
+Alongside the [#900](https://github.com/frostney/GocciaScript/pull/900) typed-array element unboxing, a lazy, GC-pinned cache of boxed small integers (range −32768..1024) plus `±Infinity`/`NaN` singleton reuse was spiked into the bytecode VM's `RegisterToValue` — the register→`TGocciaValue` boxing site that feeds call arguments. On the `sm/TypedArray/sort_large_countingsort.js` workload it cut heap allocations 4,719,119 → 3,534,333 (**−25%, deterministic**), yet runtime did not move: interleaved medians 6920 ms → 7072 ms (**+2.2%, flat-to-worse**), a fibonacci benchmark +0.6% (noise), and boot time unchanged. FreePascal's allocator plus the mark-and-sweep GC make these short-lived boxed values cheap to create and reclaim, so the cache's per-box branch (range check + array index + nil check) offsets whatever the avoided allocation saved — the same mechanism that made string interning a regression.
+
+The one form of value reuse worth keeping, and **not** superseded, is the special-value singleton set of [ADR 0002](0002-singleton-special-values.md) (`0`, `1`, `-0`, `NaN`, `±Infinity`, plus `true`/`false`, `null`/`undefined`), reused by `RuntimeCopy` and `RegisterToValue`. It is a tiny, fixed set matched by direct comparison, with a high hit rate on the path it sits on: not an array, not a range, not content-keyed. The boundary was measured on both sides. *Disabling* the singleton reuse (always allocating) costs +786k allocations and only ~1.4–1.7% runtime on the allocation-heavy `sort_large_countingsort.js` test, and is within noise on typical integer code, so keeping it is a small, essentially free win. *Widening* it to a small-integer range (the spike above) removed more allocations (−1.18M) for no runtime gain (+2.2%). So even the kept cache barely moves runtime, and everything past the narrow fixed set is pure cost: the singleton set is the measured sweet spot, kept because it is free rather than because it is a meaningful speedup. If boxed-value allocation ever shows up as a *measured* bottleneck, the lever to evaluate is arena/pool allocation that lowers per-object GC cost without a per-box lookup, not a content- or range-keyed value cache.
+
+Guardrail for any future attempt: measure with **interleaved** before/after binaries (alternate per repetition, compare medians via the runner's `--bare`), never sequential batches. The first, sequential measurement here falsely showed −13% on the `sort_large_countingsort.js` test and +63% on a fibonacci bench purely from machine-load drift, which interleaving erased. Allocation count is deterministic and hardware-independent, but it is not, on its own, evidence of a runtime win. Related: [core-patterns.md § String Interning — Attempted and Rejected](../core-patterns.md#string-interning--attempted-and-rejected) and [garbage-collector.md](../garbage-collector.md).
diff --git a/docs/adr/README.md b/docs/adr/README.md
@@ -90,3 +90,4 @@ Durable architecture and implementation decisions for GocciaScript. New ADRs use
 - [0078 — Thread-local cleanup registry for managed threadvars](0078-thread-local-cleanup-registry.md)
 - [0079 — Keep speculatively-scanned tokens across parenthesized-group probes](0079-keep-speculatively-scanned-tokens.md)
 - [0080 — FormatDouble first-hit precision scan](0080-formatdouble-first-hit-precision-scan.md)
+- [0081 — Reject shared value caches as a runtime optimization](0081-reject-value-caches-for-allocation-reduction.md)
diff --git a/docs/bytecode-vm.md b/docs/bytecode-vm.md
@@ -132,7 +132,7 @@ Hits and fills serve only exact-class `TGocciaObjectValue` / `TGocciaVMLiteralOb
 
 Cached pointers (scope, shape) are compared for identity only and never dereferenced. Scope cache entries carry an entry-version stamp against allocator address reuse; shape entries need none, because shapes are never freed within an engine's lifetime, function templates never outlive their engine, and cross-realm maps stop shape tracking before a foreign realm can cache their owner layout.
 
-Computed property access (`OP_ARRAY_GET`/`OP_ARRAY_SET`, `OP_GET_INDEX`/`OP_SET_INDEX`, `OP_DEL_INDEX`) shares one key-classification and receiver-dispatch implementation (`ClassifyPropertyKey` plus the `ExecGet/ExecSet/ExecDeleteComputedProperty` cores in `Goccia.VM.pas`); per-opcode semantic differences are explicit `TGocciaComputedAccessOptions`, not divergent copies.
+Computed property access (`OP_ARRAY_GET`/`OP_ARRAY_SET`, `OP_GET_INDEX`/`OP_SET_INDEX`, `OP_DEL_INDEX`) shares one key-classification and receiver-dispatch implementation (`ClassifyPropertyKey` plus the `ExecGet/ExecSet/ExecDeleteComputedProperty` cores in `Goccia.VM.pas`); per-opcode semantic differences are explicit `TGocciaComputedAccessOptions`, not divergent copies. A non-BigInt `TGocciaTypedArrayValue` receiver at an array-index key takes an unboxed element fast path (`TryReadIndexedScalar`/`TryWriteIndexedScalar`): reads move the element straight into a register scalar and numeric-scalar writes store it directly, so neither allocates the heap `TGocciaNumberLiteralValue` or index-name string the generic object branch would. BigInt kinds, non-index keys, and non-scalar write values fall through to the boxed path; an out-of-range or detached **read** does too (yielding `undefined`). A non-BigInt scalar **write**, however, keeps its integer-indexed exotic semantics in place even for an out-of-range index or immutable backing buffer — the store is skipped and reported as successful, never boxed. All value semantics are preserved, including the observable `ToNumber` ordering of integer-indexed `[[Set]]`.
 
 The current optimization target is reducing bytecode-mode suite time further without diverging interpreter and bytecode semantics.
 

diff --git a/docs/core-patterns.md b/docs/core-patterns.md
@@ -386,11 +386,13 @@ String interning (caching `TGocciaStringLiteralValue` instances in a `TDictionar
 - **Dictionary lookup cost exceeds allocation cost.** FreePascal's allocator is fast. A `TDictionary.TryGetValue` call involves hashing the string (O(n) in string length) plus a hash-table probe, which is more expensive than simply allocating a short-lived `TGocciaStringLiteralValue` and letting the GC reclaim it later.
 - **Low hit rate on hot paths.** `ToStringLiteral` on numbers produces mostly unique strings (`"42"`, `"3.14"`, etc.) that never hit the cache, paying the hash cost with zero benefit. This path is called frequently in arithmetic-heavy benchmarks.
 - **`RuntimeCopy` is the wrong interception point.** Every string literal evaluation goes through `RuntimeCopy`. Adding a dictionary lookup to this universal hot path penalizes all string operations, including those that create one-off strings (concatenation results, method return values).
-- **GC pressure is not the bottleneck.** The SmallInt cache works for numbers because integer equality is a single comparison. String equality requires content comparison, so the lookup cost scales with string length rather than being O(1).
+- **GC pressure is not the bottleneck.** The number special-value singletons work because the check is a single equality against a fixed set. String equality requires content comparison, so the lookup cost scales with string length rather than being O(1).
 
-**The `SmallInt` cache works because:** integer comparison is a single machine instruction, the cache is a fixed-size array (no hashing), and the hit rate for integers 0–255 is very high in typical code. None of these properties hold for arbitrary strings.
+**The number special-value singletons work because:** they are a tiny fixed set (`0`, `1`, `NaN`, `±Infinity`, `-0`) matched by direct comparison in `RuntimeCopy` — no hashing, no array, no range — with a high hit rate in typical code. There is **no** general small-integer (e.g. 0–255) range cache: earlier revisions of this doc and `garbage-collector.md` described one, but it was never implemented, and a spike that added it (plus `±Infinity`/`NaN` reuse on the VM boxing path) measured **no runtime gain** — see the boxed-numbers note below. None of the singletons' properties hold for arbitrary strings.
 
-**Do not re-attempt** dictionary-based string interning. If string allocation becomes a measurable bottleneck in future profiling, consider instead: (a) pre-allocated singletons for a small fixed set of ultra-common strings (like `SmallInt` but for `"length"`, `"undefined"`, etc.), or (b) arena/pool allocation for `TGocciaStringLiteralValue` objects to reduce per-object GC overhead without per-string hashing.
+**Do not re-attempt** dictionary-based string interning. If string allocation becomes a measurable bottleneck in future profiling, consider instead: (a) pre-allocated singletons for a small fixed set of ultra-common strings (like the number special-value singletons but for `"length"`, `"undefined"`, etc.), or (b) arena/pool allocation for `TGocciaStringLiteralValue` objects to reduce per-object GC overhead without per-string hashing.
+
+The same result holds for **boxed numbers**: adding a small-integer range cache and reusing `±Infinity`/`NaN` singletons in the bytecode VM's `RegisterToValue` boxing path cut allocations ~25% on an allocation-heavy typed-array test but produced **no runtime improvement** (interleaved median +2.2%). Reducing allocation *count* is not, by itself, a runtime lever in this codebase — see [ADR 0081](adr/0081-reject-value-caches-for-allocation-reduction.md) for the data, the narrow exceptions that do pay off, and the interleaved-measurement guardrail.
 
 ## Related documents
 

diff --git a/docs/garbage-collector.md b/docs/garbage-collector.md
@@ -44,7 +44,7 @@ end;
 - **`AfterConstruction` / `BeforeDestruction`** — Every value auto-registers with the thread-local `TGarbageCollector.Instance` upon creation and unregisters before destruction so root sets cannot retain stale object pointers.
 - **`MarkReferences`** — Base implementation sets `FGCMark := GCCurrentMark` (marking the object as alive for the current collection). `AdvanceMark` increments the shared `GCCurrentMark` while the collector lock is held, and `TGarbageCollector.Instance` uses that mark while traversing objects. Subclasses override `MarkReferences` to also mark values they reference (e.g., `TGocciaObjectValue` marks its prototype and property values, `TGocciaFunctionValue` marks its closure scope, `TGocciaArrayValue` marks its elements). The `if GCMarked then Exit;` guard at the top of each override prevents re-visiting objects in cyclic reference graphs.
 - **`TraceWeakReferences` / `SweepWeakReferences`** — Optional hooks for weak containers and weak references. The default implementations do nothing. WeakMap uses `TraceWeakReferences` as an ephemeron pass: if a key is already marked by normal roots, its value is marked, but the key is never marked by the map. WeakMap and WeakSet use `SweepWeakReferences` to remove entries whose keys/values remain unmarked. WeakRef clears an unmarked target, and FinalizationRegistry removes dead cells while enqueueing cleanup jobs for their held values.
-- **`RuntimeCopy`** — Creates a fresh GC-managed copy of the value. Used by the evaluator when evaluating literal expressions: AST-owned literal values are not tracked by the GC, so `RuntimeCopy` produces a runtime value that is. The default implementation returns `Self` (for singletons and complex values). Primitives override this: numbers use the `SmallInt` cache for 0-255, booleans return singletons, strings create new instances (cheap due to copy-on-write).
+- **`RuntimeCopy`** — Creates a fresh GC-managed copy of the value. Used by the evaluator when evaluating literal expressions: AST-owned literal values are not tracked by the GC, so `RuntimeCopy` produces a runtime value that is. The default implementation returns `Self` (for singletons and complex values). Primitives override this: numbers reuse the special-value singletons (`0`, `1`, `NaN`, `±Infinity`, `-0`) and otherwise create a fresh instance, booleans return singletons, strings create new instances (cheap due to copy-on-write).
 
 ## Contributor Rules
 
@@ -151,7 +151,7 @@ The separate `memory.heap` JSON object comes from FreePascal's `GetHeapStatus`,
 
 The parser creates `TGocciaValue` instances (numbers, strings, booleans) and stores them inside `TGocciaLiteralExpression` AST nodes. These values are owned by the AST, not the GC. `TGocciaLiteralExpression.Create` calls `TGarbageCollector.Instance.UnregisterObject` to remove the value from GC tracking, and `TGocciaLiteralExpression.Destroy` frees the value (unless it is a singleton like `UndefinedValue`, `TrueValue`, or `FalseValue`).
 
-When the evaluator encounters a literal expression, it calls `Value.RuntimeCopy` to produce a fresh GC-managed runtime value. This cleanly separates compile-time constants (owned by the AST) from runtime values (managed by the GC). The overhead is minimal: integers 0-255 hit the `SmallInt` cache (zero allocation), booleans return singletons, and strings benefit from FreePascal's copy-on-write semantics.
+When the evaluator encounters a literal expression, it calls `Value.RuntimeCopy` to produce a fresh GC-managed runtime value. This cleanly separates compile-time constants (owned by the AST) from runtime values (managed by the GC). The overhead is minimal: `0`, `1`, and the special values (`NaN`, `±Infinity`, `-0`) reuse singletons (zero allocation), other numbers allocate cheaply, booleans return singletons, and strings benefit from FreePascal's copy-on-write semantics.
 
 ## Related Documents
 

diff --git a/source/units/Goccia.VM.Registers.pas b/source/units/Goccia.VM.Registers.pas
@@ -40,6 +40,7 @@ function RegisterHole: TGocciaRegister; inline;
 function RegisterBoolean(const AValue: Boolean): TGocciaRegister; inline;
 function RegisterInt(const AValue: Int64): TGocciaRegister; inline;
 function RegisterFloat(const AValue: Double): TGocciaRegister; inline;
+function RegisterFromDouble(const AValue: Double): TGocciaRegister; inline;
 function RegisterObject(const AValue: TGocciaValue): TGocciaRegister; inline;
 function ValueToRegister(const AValue: TGocciaValue): TGocciaRegister; inline;
 function RegisterToValue(const ARegister: TGocciaRegister): TGocciaValue; inline;
@@ -83,6 +84,30 @@ function RegisterFloat(const AValue: Double): TGocciaRegister; inline;
   Result.FloatValue := AValue;
 end;
 
+function RegisterFromDouble(const AValue: Double): TGocciaRegister; inline;
+var
+  Bits: Int64 absolute AValue;
+begin
+  // Build a register directly from a raw Double without ever allocating a heap
+  // TGocciaNumberLiteralValue. Mirrors the number branch of VMValueToRegisterFast:
+  // exact integers in LongInt range become grkInt (so downstream scalar opcodes and
+  // the Zero/One singletons engage on later boxing), and -0.0 stays float to keep
+  // its sign bit. NaN/Infinity/non-integers stay float.
+  if AValue = 0.0 then
+  begin
+    if Bits < 0 then
+      Exit(RegisterFloat(AValue)); // -0.0: preserve the sign bit as a float
+    Exit(RegisterInt(0));
+  end;
+  if AValue = 1.0 then
+    Exit(RegisterInt(1));
+  if (not IsNaN(AValue)) and (not IsInfinite(AValue)) and
+     (Frac(AValue) = 0.0) and
+     (AValue >= Low(LongInt)) and (AValue <= High(LongInt)) then
+    Exit(RegisterInt(Trunc(AValue)));
+  Result := RegisterFloat(AValue);
+end;
+
 function RegisterObject(const AValue: TGocciaValue): TGocciaRegister; inline;
 begin
   Result.Kind := grkObject;

diff --git a/source/units/Goccia.VM.pas b/source/units/Goccia.VM.pas
@@ -521,7 +521,8 @@ implementation
   Goccia.Values.ProxyValue,
   Goccia.Values.Shape,
   Goccia.Values.ToObject,
-  Goccia.Values.ToPrimitive;
+  Goccia.Values.ToPrimitive,
+  Goccia.Values.TypedArrayValue;
 
 const
   BYTECODE_PRIVATE_SLOT_PREFIX = '#slot:';
@@ -7656,11 +7657,24 @@ procedure TGocciaVM.ExecGetComputedProperty(const ADest: Integer;
   Key: TGocciaPropertyKey;
   KeyName: string;
   ReceiverArray: TGocciaArrayValue;
+  FastIndex: Integer;
+  FastElement: Double;
 begin
   if (caoThrowOnNullUndefined in AOptions) and
      (AObjReg.Kind in [grkUndefined, grkNull]) then
     ThrowTypeError(SErrorCannotConvertNullOrUndefined,
       SSuggestCheckNullBeforeAccess)
+  else if (AObjReg.Kind = grkObject) and
+          (AObjReg.ObjectValue is TGocciaTypedArrayValue) and
+          TryGetArrayIndexRegister(AKeyReg, FastIndex) and
+          TGocciaTypedArrayValue(AObjReg.ObjectValue)
+            .TryReadIndexedScalar(FastIndex, FastElement) then
+    // Typed-array unboxed element read: the element goes straight into the
+    // destination register as a scalar, with no heap TGocciaNumberLiteralValue and
+    // no IntToStr index name. Non-index keys, BigInt kinds, and out-of-range indices
+    // fall through to the generic object branch below, which handles length, methods,
+    // `undefined` for out-of-range reads, BigInt boxing, and symbol keys unchanged.
+    FRegisters[ADest] := RegisterFromDouble(FastElement)
   else if (AObjReg.Kind = grkObject) and
           (AObjReg.ObjectValue is TGocciaArrayValue) then
   begin
@@ -7752,7 +7766,21 @@ procedure TGocciaVM.ExecSetComputedProperty(const ATargetIndex: Integer;
   Value: TGocciaValue;
   TargetValue: TGocciaValue;
   BoxedTarget: TGocciaObjectValue;
+  FastIndex: Integer;
 begin
+  // Typed-array unboxed element write: a numeric-scalar value going to a valid
+  // integer index stores directly, with no heap TGocciaNumberLiteralValue and no
+  // IntToStr index name. ToNumber on a Number is side-effect-free, so the spec's
+  // observable conversion is preserved. BigInt kinds (a Number value must throw),
+  // non-index keys, and non-scalar values fall through to the boxed path below.
+  if (FRegisters[ATargetIndex].Kind = grkObject) and
+     (FRegisters[ATargetIndex].ObjectValue is TGocciaTypedArrayValue) and
+     RegisterIsNumericScalar(AValueReg) and
+     TryGetArrayIndexRegister(AKeyReg, FastIndex) and
+     TGocciaTypedArrayValue(FRegisters[ATargetIndex].ObjectValue)
+       .TryWriteIndexedScalar(FastIndex, RegisterToDouble(AValueReg)) then
+    Exit;
+
   Value := RegisterToValue(AValueReg);
   if (FRegisters[ATargetIndex].Kind = grkObject) and
      (FRegisters[ATargetIndex].ObjectValue is TGocciaArrayValue) then