Merged
13 changes: 13 additions & 0 deletions docs/adr/0081-reject-value-caches-for-allocation-reduction.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
# Reject shared value caches as a runtime optimization

**Date:** 2026-06-28
**Area:** `runtime`
**Pull Request:** [#900](https://github.com/frostney/GocciaScript/pull/900)

Reducing allocation *count* is not, by itself, a runtime lever in this engine, so shared caches of boxed `TGocciaValue` instances (interning or pooling them to avoid allocation) are rejected as a performance optimization. The only value reuse the engine actually has is the handful of special-value singletons returned by `RuntimeCopy` and the register-boxing paths (`0`, `1`, `NaN`, `±Infinity`, `-0`; see [ADR 0002](0002-singleton-special-values.md)). Every attempt to add caching *beyond* that fixed set has been measured and rejected: dictionary-based string interning ([ADR 0013](0013-reject-string-interning.md), −4% across 172 benchmarks) and the boxed-number range cache described below. A `SmallInt` 0–255 cache that earlier docs described as if implemented never actually existed in the source (corrected alongside this ADR), which itself shows how readily the C/C++ "fewer allocations ⇒ faster" intuition takes hold. This ADR exists so that intuition is not imported again.

Alongside the [#900](https://github.com/frostney/GocciaScript/pull/900) typed-array element unboxing, a lazy, GC-pinned cache of boxed small integers (range −32768..1024) plus `±Infinity`/`NaN` singleton reuse was spiked into the bytecode VM's `RegisterToValue` — the register→`TGocciaValue` boxing site that feeds call arguments. On the `sm/TypedArray/sort_large_countingsort.js` workload it cut heap allocations 4,719,119 → 3,534,333 (**−25%, deterministic**), yet runtime did not move: interleaved medians 6920 ms → 7072 ms (**+2.2%, flat-to-worse**), a fibonacci benchmark +0.6% (noise), and boot time unchanged. FreePascal's allocator plus the mark-and-sweep GC make these short-lived boxed values cheap to create and reclaim, so the cache's per-box branch (range check + array index + nil check) offsets whatever the avoided allocation saved — the same mechanism that made string interning a regression.
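
The per-box branch that sank the range cache is easiest to see in code. The following FreePascal program is a minimal sketch, not the spike itself. `TBoxedNumber`, `BoxIntCached` and the `Allocations` counter are hypothetical stand-ins for `TGocciaNumberLiteralValue` and the `RegisterToValue` boxing site. The cache range matches the spike's −32768..1024, and GC pinning of cached entries is left out.

```pascal
program RangeCacheSketch;

{$mode objfpc}{$H+}
{$ASSERTIONS ON}

type
  // Hypothetical stand-in for a boxed heap number; not TGocciaNumberLiteralValue.
  TBoxedNumber = class
  public
    Value: Double;
    constructor Create(const AValue: Double);
  end;

const
  CACHE_LOW = -32768;  // bounds of the rejected spike's range
  CACHE_HIGH = 1024;

var
  // Lazily filled cache; global arrays start out all nil. The real spike also
  // pinned these entries against the GC, which this sketch omits.
  SmallIntCache: array[CACHE_LOW..CACHE_HIGH] of TBoxedNumber;
  Allocations: Integer = 0;

constructor TBoxedNumber.Create(const AValue: Double);
begin
  inherited Create;
  Value := AValue;
  Inc(Allocations);
end;

function BoxIntCached(const AValue: Int64): TBoxedNumber;
begin
  // Every box pays this branch, hit or miss: range check, array index, nil check.
  if (AValue >= CACHE_LOW) and (AValue <= CACHE_HIGH) then
  begin
    Result := SmallIntCache[LongInt(AValue)];
    if Result = nil then
    begin
      Result := TBoxedNumber.Create(AValue);
      SmallIntCache[LongInt(AValue)] := Result;
    end;
  end
  else
    Result := TBoxedNumber.Create(AValue);  // out of range: ordinary allocation
end;

var
  A, B: TBoxedNumber;
begin
  A := BoxIntCached(42);
  B := BoxIntCached(42);
  WriteLn('42 reused: ', A = B);
  A := BoxIntCached(5000);
  B := BoxIntCached(5000);
  WriteLn('5000 reused: ', A = B);
  A.Free;  // the engine's GC would reclaim these; freed by hand here
  B.Free;
  WriteLn('allocations: ', Allocations);
end.
```

The range check, array index and nil check run on every call. That includes the out-of-range calls that still allocate. This fixed cost is what the ADR weighs against the avoided allocation.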

The one form of value reuse worth keeping, and **not** superseded by this ADR, is the special-value singleton set of [ADR 0002](0002-singleton-special-values.md): `0`, `1`, `NaN`, `-0` and `±Infinity`, plus `true`/`false` and `null`/`undefined`. `RuntimeCopy` and `RegisterToValue` reuse it. It is a tiny, fixed set matched by direct comparison, with a high hit rate on the path it sits on. It is not an array, not a range and not content-keyed.

The boundary was measured on both sides:

- *Disabling* the singleton reuse (always allocating) costs +786k allocations and only ~1.4–1.7% on the allocation-heavy `sort_large_countingsort.js` test. On typical integer code the difference is within noise, so the reuse is a small, essentially free win.
- *Widening* it to a small-integer range (the spike above) removed more allocations (−1.18M) for no runtime gain (+2.2%).

So even the kept cache barely moves runtime, and everything past the narrow fixed set is pure cost. The singleton set is the measured sweet spot, kept because it is free rather than because it is a meaningful speedup. If boxed-value allocation ever shows up as a *measured* bottleneck, the lever to evaluate is arena/pool allocation that lowers per-object GC cost without a per-box lookup. A content- or range-keyed value cache is not that lever.
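
By contrast, the kept singleton reuse is a short chain of direct tests against a fixed set. Below is a minimal FreePascal sketch under the same assumptions: the `TBoxedNumber` class, the `BoxNumber` function and the singleton variables are hypothetical, not the engine's `RuntimeCopy` or `RegisterToValue`. It checks `NaN` and `±Infinity` first so that no ordinary comparison sees a NaN. It also tests the sign bit, because `-0.0 = 0.0` is true.

```pascal
program SingletonSketch;

{$mode objfpc}{$H+}
{$ASSERTIONS ON}

uses
  Math;

type
  // Hypothetical stand-in for a boxed heap number.
  TBoxedNumber = class
  public
    Value: Double;
    constructor Create(const AValue: Double);
  end;

var
  ZeroValue, NegZeroValue, OneValue, NaNValue, PosInfValue, NegInfValue: TBoxedNumber;
  Allocations: Integer = 0;  // counts only non-singleton boxes

constructor TBoxedNumber.Create(const AValue: Double);
begin
  inherited Create;
  Value := AValue;
end;

function NegativeZero: Double;
const
  SIGN_BIT_ONLY: Int64 = Low(Int64);  // bit pattern $8000000000000000
begin
  // Build -0.0 from its bit pattern so constant folding cannot drop the sign.
  Move(SIGN_BIT_ONLY, Result, SizeOf(Result));
end;

procedure CreateSingletons;
begin
  if ZeroValue <> nil then
    Exit;  // already created
  ZeroValue := TBoxedNumber.Create(0.0);
  NegZeroValue := TBoxedNumber.Create(NegativeZero);
  OneValue := TBoxedNumber.Create(1.0);
  NaNValue := TBoxedNumber.Create(NaN);
  PosInfValue := TBoxedNumber.Create(Infinity);
  NegInfValue := TBoxedNumber.Create(NegInfinity);
end;

function BoxNumber(const AValue: Double): TBoxedNumber;
var
  Bits: Int64 absolute AValue;
begin
  // A few direct tests against a fixed set: no hashing, no array, no range.
  if IsNaN(AValue) then
    Exit(NaNValue);
  if IsInfinite(AValue) then
  begin
    if AValue > 0 then
      Exit(PosInfValue);
    Exit(NegInfValue);
  end;
  if AValue = 0.0 then
  begin
    if Bits < 0 then
      Exit(NegZeroValue);  // -0.0 compares equal to 0.0, so check the sign bit
    Exit(ZeroValue);
  end;
  if AValue = 1.0 then
    Exit(OneValue);
  Inc(Allocations);
  Result := TBoxedNumber.Create(AValue);  // every other number gets a fresh box
end;

var
  Fresh: TBoxedNumber;
begin
  CreateSingletons;
  WriteLn('0 reuses a singleton: ', BoxNumber(0.0) = ZeroValue);
  WriteLn('-0 gets its own singleton: ', BoxNumber(NegativeZero) = NegZeroValue);
  Fresh := BoxNumber(2.0);
  WriteLn('fresh allocations so far: ', Allocations);
  Fresh.Free;  // the engine's GC would reclaim this; freed by hand here
end.
```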

Guardrail for any future attempt: measure with **interleaved** before/after binaries (alternate per repetition and compare medians via the runner's `--bare`), never with sequential batches. The first, sequential measurement here falsely showed −13% on the test and +63% on a fibonacci bench purely from machine-load drift, and interleaving erased both. Allocation count is deterministic and hardware-independent, but on its own it is not evidence of a runtime win. See also [core-patterns.md § String Interning — Attempted and Rejected](../core-patterns.md#string-interning--attempted-and-rejected) and [garbage-collector.md](../garbage-collector.md).
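
The interleaving guardrail amounts to a simple loop. This FreePascal sketch makes several assumptions:

- Two in-process placeholder workloads (`RunBefore` and `RunAfter`, both hypothetical) stand in for the before/after binaries.
- The runner's `--bare` mode is not modeled.
- Both workloads are sampled on every repetition, and the comparison uses medians.

```pascal
program InterleavedMedians;

{$mode objfpc}{$H+}
{$ASSERTIONS ON}

uses
  SysUtils;

type
  TRun = procedure;
  TSamples = array of Int64;

var
  Sink: Double = 0;  // keeps the placeholder workload from being optimized away

procedure Workload(const AIterations: Integer);
var
  I: Integer;
begin
  for I := 1 to AIterations do
    Sink := Sink + Sqrt(I);
end;

// Hypothetical placeholders for the "before" and "after" builds. A real
// comparison would launch the two binaries instead.
procedure RunBefore;
begin
  Workload(3000000);
end;

procedure RunAfter;
begin
  Workload(3000000);
end;

function TimeMs(const ARun: TRun): Int64;
var
  Start: QWord;
begin
  Start := GetTickCount64;
  ARun();
  Result := Int64(GetTickCount64 - Start);
end;

function Median(const ASamples: TSamples): Double;
var
  Sorted: TSamples;
  I, J, N: Integer;
  Item: Int64;
begin
  Sorted := Copy(ASamples, 0, Length(ASamples));
  N := Length(Sorted);
  // Insertion sort: fine for a handful of repetitions.
  for I := 1 to N - 1 do
  begin
    Item := Sorted[I];
    J := I - 1;
    while (J >= 0) and (Sorted[J] > Item) do
    begin
      Sorted[J + 1] := Sorted[J];
      Dec(J);
    end;
    Sorted[J + 1] := Item;
  end;
  if Odd(N) then
    Result := Sorted[N div 2]
  else
    Result := (Sorted[N div 2 - 1] + Sorted[N div 2]) / 2;
end;

const
  REPETITIONS = 9;

var
  Before, After: TSamples;
  I: Integer;
  MedianBefore, MedianAfter: Double;
begin
  SetLength(Before, REPETITIONS);
  SetLength(After, REPETITIONS);
  for I := 0 to REPETITIONS - 1 do
    // Interleave: each repetition runs both builds back to back, swapping which
    // goes first, so load drift lands on both sides instead of on one batch.
    if Odd(I) then
    begin
      After[I] := TimeMs(@RunAfter);
      Before[I] := TimeMs(@RunBefore);
    end
    else
    begin
      Before[I] := TimeMs(@RunBefore);
      After[I] := TimeMs(@RunAfter);
    end;
  MedianBefore := Median(Before);
  MedianAfter := Median(After);
  WriteLn('median before: ', MedianBefore:0:1, ' ms');
  WriteLn('median after:  ', MedianAfter:0:1, ' ms');
  if MedianBefore > 0 then
    WriteLn('change: ', (MedianAfter / MedianBefore - 1) * 100:0:1, '%');
end.
```

Swapping the order on odd repetitions is a choice made for this sketch, so that neither build always runs on a warmer machine. The essential part is that both builds are sampled within each repetition rather than in separate batches.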
1 change: 1 addition & 0 deletions docs/adr/README.md
@@ -90,3 +90,4 @@ Durable architecture and implementation decisions for GocciaScript. New ADRs use
- [0078 — Thread-local cleanup registry for managed threadvars](0078-thread-local-cleanup-registry.md)
- [0079 — Keep speculatively-scanned tokens across parenthesized-group probes](0079-keep-speculatively-scanned-tokens.md)
- [0080 — FormatDouble first-hit precision scan](0080-formatdouble-first-hit-precision-scan.md)
- [0081 — Reject shared value caches as a runtime optimization](0081-reject-value-caches-for-allocation-reduction.md)
2 changes: 1 addition & 1 deletion docs/bytecode-vm.md
@@ -132,7 +132,7 @@ Hits and fills serve only exact-class `TGocciaObjectValue` / `TGocciaVMLiteralOb

Cached pointers (scope, shape) are compared for identity only and never dereferenced. Scope cache entries carry an entry-version stamp against allocator address reuse; shape entries need none, because shapes are never freed within an engine's lifetime, function templates never outlive their engine, and cross-realm maps stop shape tracking before a foreign realm can cache their owner layout.

Computed property access (`OP_ARRAY_GET`/`OP_ARRAY_SET`, `OP_GET_INDEX`/`OP_SET_INDEX`, `OP_DEL_INDEX`) shares one key-classification and receiver-dispatch implementation (`ClassifyPropertyKey` plus the `ExecGet/ExecSet/ExecDeleteComputedProperty` cores in `Goccia.VM.pas`); per-opcode semantic differences are explicit `TGocciaComputedAccessOptions`, not divergent copies.
Computed property access (`OP_ARRAY_GET`/`OP_ARRAY_SET`, `OP_GET_INDEX`/`OP_SET_INDEX`, `OP_DEL_INDEX`) shares one key-classification and receiver-dispatch implementation (`ClassifyPropertyKey` plus the `ExecGet/ExecSet/ExecDeleteComputedProperty` cores in `Goccia.VM.pas`); per-opcode semantic differences are explicit `TGocciaComputedAccessOptions`, not divergent copies. A non-BigInt `TGocciaTypedArrayValue` receiver at an array-index key takes an unboxed element fast path (`TryReadIndexedScalar`/`TryWriteIndexedScalar`): reads move the element straight into a register scalar and numeric-scalar writes store it directly, so neither allocates the heap `TGocciaNumberLiteralValue` or index-name string the generic object branch would. BigInt kinds, non-index keys, and non-scalar write values fall through to the boxed path; an out-of-range or detached **read** does too (yielding `undefined`). A non-BigInt scalar **write**, however, keeps its integer-indexed exotic semantics in place even for an out-of-range index or immutable backing buffer — the store is skipped and reported as successful, never boxed. All value semantics are preserved, including the observable `ToNumber` ordering of integer-indexed `[[Set]]`.

The current optimization target is reducing bytecode-mode suite time further without diverging interpreter and bytecode semantics.

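
The typed-array fast path described in the `docs/bytecode-vm.md` change above has two halves. The unboxed read may decline. The unboxed write handles out-of-range indices in place. The FreePascal sketch below models that contract on a hypothetical `TSketchTypedArray`, which has Float64 elements only and treats a detached buffer as length 0. Its method names echo `TryReadIndexedScalar`/`TryWriteIndexedScalar`, but the bodies are assumptions, not the engine's code.

```pascal
program TypedArrayFastPathSketch;

{$mode objfpc}{$H+}
{$ASSERTIONS ON}

type
  // Hypothetical model of a typed array's unboxed element access; it stores
  // Float64 elements only and is not TGocciaTypedArrayValue.
  TSketchTypedArray = class
  public
    Elements: array of Double;
    IsBigIntKind: Boolean;
    Detached: Boolean;
    Immutable: Boolean;
    constructor Create(const ALength: Integer);
    function ElementCount: Integer;
    function TryReadIndexedScalar(const AIndex: Integer; out AValue: Double): Boolean;
    function TryWriteIndexedScalar(const AIndex: Integer; const AValue: Double): Boolean;
  end;

constructor TSketchTypedArray.Create(const ALength: Integer);
begin
  inherited Create;
  SetLength(Elements, ALength);
end;

function TSketchTypedArray.ElementCount: Integer;
begin
  if Detached then
    Result := 0  // assumption: a detached buffer behaves as length 0
  else
    Result := Length(Elements);
end;

function TSketchTypedArray.TryReadIndexedScalar(const AIndex: Integer;
  out AValue: Double): Boolean;
begin
  AValue := 0.0;
  // Decline (False) for BigInt kinds and for out-of-range or detached reads;
  // the caller's boxed path then boxes the BigInt or yields undefined.
  Result := (not IsBigIntKind) and (AIndex >= 0) and (AIndex < ElementCount);
  if Result then
    AValue := Elements[AIndex];
end;

function TSketchTypedArray.TryWriteIndexedScalar(const AIndex: Integer;
  const AValue: Double): Boolean;
begin
  // BigInt kinds decline: storing a Number there must throw on the boxed path.
  if IsBigIntKind then
    Exit(False);
  // Otherwise the write is handled here even when nothing is stored: an
  // out-of-range index or an immutable buffer skips the store and reports success.
  if (not Immutable) and (AIndex >= 0) and (AIndex < ElementCount) then
    Elements[AIndex] := AValue;
  Result := True;
end;

var
  TA: TSketchTypedArray;
  V: Double;
begin
  TA := TSketchTypedArray.Create(4);
  try
    TA.TryWriteIndexedScalar(2, 1.5);
    if TA.TryReadIndexedScalar(2, V) then
      WriteLn('ta[2] = ', V:0:1);
    WriteLn('write ta[10] handled in place: ', TA.TryWriteIndexedScalar(10, 9.0));
    WriteLn('read ta[10] on fast path: ', TA.TryReadIndexedScalar(10, V));
  finally
    TA.Free;
  end;
end.
```

In this sketch, returning `False` means "take the boxed path". That is why the read declines out-of-range indices, leaving the generic branch to produce `undefined`. The write, by contrast, returns `True` after skipping the store.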
8 changes: 5 additions & 3 deletions docs/core-patterns.md
@@ -386,11 +386,13 @@ String interning (caching `TGocciaStringLiteralValue` instances in a `TDictionar
- **Dictionary lookup cost exceeds allocation cost.** FreePascal's allocator is fast. A `TDictionary.TryGetValue` call involves hashing the string (O(n) in string length) plus a hash-table probe, which is more expensive than simply allocating a short-lived `TGocciaStringLiteralValue` and letting the GC reclaim it later.
- **Low hit rate on hot paths.** `ToStringLiteral` on numbers produces mostly unique strings (`"42"`, `"3.14"`, etc.) that never hit the cache, paying the hash cost with zero benefit. This path is called frequently in arithmetic-heavy benchmarks.
- **`RuntimeCopy` is the wrong interception point.** Every string literal evaluation goes through `RuntimeCopy`. Adding a dictionary lookup to this universal hot path penalizes all string operations, including those that create one-off strings (concatenation results, method return values).
- **GC pressure is not the bottleneck.** The SmallInt cache works for numbers because integer equality is a single comparison. String equality requires content comparison, so the lookup cost scales with string length rather than being O(1).
- **GC pressure is not the bottleneck.** The number special-value singletons work because the check is a single equality against a fixed set. String equality requires content comparison, so the lookup cost scales with string length rather than being O(1).

**The `SmallInt` cache works because:** integer comparison is a single machine instruction, the cache is a fixed-size array (no hashing), and the hit rate for integers 0–255 is very high in typical code. None of these properties hold for arbitrary strings.
**The number special-value singletons work because:** they are a tiny fixed set (`0`, `1`, `NaN`, `±Infinity`, `-0`) matched by direct comparison in `RuntimeCopy` — no hashing, no array, no range — with a high hit rate in typical code. There is **no** general small-integer (e.g. 0–255) range cache: earlier revisions of this doc and `garbage-collector.md` described one, but it was never implemented, and a spike that added it (plus `±Infinity`/`NaN` reuse on the VM boxing path) measured **no runtime gain** — see the boxed-numbers note below. None of the singletons' properties hold for arbitrary strings.

**Do not re-attempt** dictionary-based string interning. If string allocation becomes a measurable bottleneck in future profiling, consider instead: (a) pre-allocated singletons for a small fixed set of ultra-common strings (like `SmallInt` but for `"length"`, `"undefined"`, etc.), or (b) arena/pool allocation for `TGocciaStringLiteralValue` objects to reduce per-object GC overhead without per-string hashing.
**Do not re-attempt** dictionary-based string interning. If string allocation becomes a measurable bottleneck in future profiling, consider instead: (a) pre-allocated singletons for a small fixed set of ultra-common strings (like the number special-value singletons but for `"length"`, `"undefined"`, etc.), or (b) arena/pool allocation for `TGocciaStringLiteralValue` objects to reduce per-object GC overhead without per-string hashing.

The same result holds for **boxed numbers**. Adding a small-integer range cache and reusing `±Infinity`/`NaN` singletons in the bytecode VM's `RegisterToValue` boxing path cut allocations ~25% on an allocation-heavy typed-array test, but produced **no runtime improvement** (interleaved median +2.2%). Reducing allocation *count* is not, by itself, a runtime lever in this codebase. [ADR 0081](adr/0081-reject-value-caches-for-allocation-reduction.md) has the data and the interleaved-measurement guardrail. It also explains why the fixed special-value singletons are still kept: they are essentially free, not a meaningful speedup.

## Related documents

4 changes: 2 additions & 2 deletions docs/garbage-collector.md
@@ -44,7 +44,7 @@ end;
- **`AfterConstruction` / `BeforeDestruction`** — Every value auto-registers with the thread-local `TGarbageCollector.Instance` upon creation and unregisters before destruction so root sets cannot retain stale object pointers.
- **`MarkReferences`** — Base implementation sets `FGCMark := GCCurrentMark` (marking the object as alive for the current collection). `AdvanceMark` increments the shared `GCCurrentMark` while the collector lock is held, and `TGarbageCollector.Instance` uses that mark while traversing objects. Subclasses override `MarkReferences` to also mark values they reference (e.g., `TGocciaObjectValue` marks its prototype and property values, `TGocciaFunctionValue` marks its closure scope, `TGocciaArrayValue` marks its elements). The `if GCMarked then Exit;` guard at the top of each override prevents re-visiting objects in cyclic reference graphs.
- **`TraceWeakReferences` / `SweepWeakReferences`** — Optional hooks for weak containers and weak references. The default implementations do nothing. WeakMap uses `TraceWeakReferences` as an ephemeron pass: if a key is already marked by normal roots, its value is marked, but the key is never marked by the map. WeakMap and WeakSet use `SweepWeakReferences` to remove entries whose keys/values remain unmarked. WeakRef clears an unmarked target, and FinalizationRegistry removes dead cells while enqueueing cleanup jobs for their held values.
- **`RuntimeCopy`** — Creates a fresh GC-managed copy of the value. Used by the evaluator when evaluating literal expressions: AST-owned literal values are not tracked by the GC, so `RuntimeCopy` produces a runtime value that is. The default implementation returns `Self` (for singletons and complex values). Primitives override this: numbers use the `SmallInt` cache for 0-255, booleans return singletons, strings create new instances (cheap due to copy-on-write).
- **`RuntimeCopy`** — Creates a fresh GC-managed copy of the value. Used by the evaluator when evaluating literal expressions: AST-owned literal values are not tracked by the GC, so `RuntimeCopy` produces a runtime value that is. The default implementation returns `Self` (for singletons and complex values). Primitives override this: numbers reuse the special-value singletons (`0`, `1`, `NaN`, `±Infinity`, `-0`) and otherwise create a fresh instance, booleans return singletons, strings create new instances (cheap due to copy-on-write).

## Contributor Rules

@@ -151,7 +151,7 @@ The separate `memory.heap` JSON object comes from FreePascal's `GetHeapStatus`,

The parser creates `TGocciaValue` instances (numbers, strings, booleans) and stores them inside `TGocciaLiteralExpression` AST nodes. These values are owned by the AST, not the GC. `TGocciaLiteralExpression.Create` calls `TGarbageCollector.Instance.UnregisterObject` to remove the value from GC tracking, and `TGocciaLiteralExpression.Destroy` frees the value (unless it is a singleton like `UndefinedValue`, `TrueValue`, or `FalseValue`).

When the evaluator encounters a literal expression, it calls `Value.RuntimeCopy` to produce a fresh GC-managed runtime value. This cleanly separates compile-time constants (owned by the AST) from runtime values (managed by the GC). The overhead is minimal: integers 0-255 hit the `SmallInt` cache (zero allocation), booleans return singletons, and strings benefit from FreePascal's copy-on-write semantics.
When the evaluator encounters a literal expression, it calls `Value.RuntimeCopy` to produce a fresh GC-managed runtime value. This cleanly separates compile-time constants (owned by the AST) from runtime values (managed by the GC). The overhead is minimal: `0`, `1`, and the special values (`NaN`, `±Infinity`, `-0`) reuse singletons (zero allocation), other numbers allocate cheaply, booleans return singletons, and strings benefit from FreePascal's copy-on-write semantics.

## Related Documents

25 changes: 25 additions & 0 deletions source/units/Goccia.VM.Registers.pas
@@ -40,6 +40,7 @@ function RegisterHole: TGocciaRegister; inline;
function RegisterBoolean(const AValue: Boolean): TGocciaRegister; inline;
function RegisterInt(const AValue: Int64): TGocciaRegister; inline;
function RegisterFloat(const AValue: Double): TGocciaRegister; inline;
function RegisterFromDouble(const AValue: Double): TGocciaRegister; inline;
function RegisterObject(const AValue: TGocciaValue): TGocciaRegister; inline;
function ValueToRegister(const AValue: TGocciaValue): TGocciaRegister; inline;
function RegisterToValue(const ARegister: TGocciaRegister): TGocciaValue; inline;
@@ -83,6 +84,30 @@ function RegisterFloat(const AValue: Double): TGocciaRegister; inline;
Result.FloatValue := AValue;
end;

function RegisterFromDouble(const AValue: Double): TGocciaRegister; inline;
var
Bits: Int64 absolute AValue;
begin
// Build a register directly from a raw Double without ever allocating a heap
// TGocciaNumberLiteralValue. Mirrors the number branch of VMValueToRegisterFast:
// exact integers in LongInt range become grkInt (so downstream scalar opcodes and
// the Zero/One singletons engage on later boxing), and -0.0 stays float to keep
// its sign bit. NaN/Infinity/non-integers stay float.
if AValue = 0.0 then
begin
if Bits < 0 then
Exit(RegisterFloat(AValue)); // -0.0: preserve the sign bit as a float
Exit(RegisterInt(0));
end;
if AValue = 1.0 then
Exit(RegisterInt(1));
if (not IsNaN(AValue)) and (not IsInfinite(AValue)) and
(Frac(AValue) = 0.0) and
(AValue >= Low(LongInt)) and (AValue <= High(LongInt)) then
Exit(RegisterInt(Trunc(AValue)));
Result := RegisterFloat(AValue);
end;

function RegisterObject(const AValue: TGocciaValue): TGocciaRegister; inline;
begin
Result.Kind := grkObject;
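
The classification that `RegisterFromDouble` performs can be checked in isolation. This FreePascal sketch copies its logic onto a hypothetical two-kind `TSketchRegister` record, not `TGocciaRegister`. There are two deliberate differences:

- `NaN`/`±Infinity` are tested first, so that no comparison sees a NaN.
- The separate `1.0` shortcut is dropped, because the general integer branch yields the same `int` register.

```pascal
program RegisterFromDoubleSketch;

{$mode objfpc}{$H+}
{$ASSERTIONS ON}

uses
  Math;

type
  // Hypothetical two-kind register; the engine's TGocciaRegister has more kinds.
  TSketchRegisterKind = (srkInt, srkFloat);
  TSketchRegister = record
    Kind: TSketchRegisterKind;
    IntValue: Int64;
    FloatValue: Double;
  end;

function SketchInt(const AValue: Int64): TSketchRegister;
begin
  Result.Kind := srkInt;
  Result.IntValue := AValue;
  Result.FloatValue := 0.0;
end;

function SketchFloat(const AValue: Double): TSketchRegister;
begin
  Result.Kind := srkFloat;
  Result.IntValue := 0;
  Result.FloatValue := AValue;
end;

function NegativeZero: Double;
const
  SIGN_BIT_ONLY: Int64 = Low(Int64);  // bit pattern $8000000000000000
begin
  Move(SIGN_BIT_ONLY, Result, SizeOf(Result));
end;

function SketchFromDouble(const AValue: Double): TSketchRegister;
var
  Bits: Int64 absolute AValue;
begin
  if IsNaN(AValue) or IsInfinite(AValue) then
    Exit(SketchFloat(AValue));
  if AValue = 0.0 then
  begin
    if Bits < 0 then
      Exit(SketchFloat(AValue));  // -0.0: keep the sign bit as a float
    Exit(SketchInt(0));
  end;
  // Exact integers in LongInt range become int registers; the rest stay float.
  if (Frac(AValue) = 0.0) and (AValue >= Low(LongInt)) and
     (AValue <= High(LongInt)) then
    Exit(SketchInt(Trunc(AValue)));
  Result := SketchFloat(AValue);
end;

procedure Show(const ALabel: string; const AValue: Double);
var
  Reg: TSketchRegister;
begin
  Reg := SketchFromDouble(AValue);
  if Reg.Kind = srkInt then
    WriteLn(ALabel, ' -> int ', Reg.IntValue)
  else
    WriteLn(ALabel, ' -> float');
end;

begin
  Show('3.0', 3.0);
  Show('2.5', 2.5);
  Show('-0.0', NegativeZero);
  Show('NaN', NaN);
  Show('2^31', 2147483648.0);
end.
```

With this logic, `3.0` lands in an int register. `2.5`, `-0.0`, `NaN` and `2^31` (one past `High(LongInt)`) all stay float.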
30 changes: 29 additions & 1 deletion source/units/Goccia.VM.pas
@@ -521,7 +521,8 @@ implementation
Goccia.Values.ProxyValue,
Goccia.Values.Shape,
Goccia.Values.ToObject,
Goccia.Values.ToPrimitive;
Goccia.Values.ToPrimitive,
Goccia.Values.TypedArrayValue;

const
BYTECODE_PRIVATE_SLOT_PREFIX = '#slot:';
@@ -7656,11 +7657,24 @@ procedure TGocciaVM.ExecGetComputedProperty(const ADest: Integer;
Key: TGocciaPropertyKey;
KeyName: string;
ReceiverArray: TGocciaArrayValue;
FastIndex: Integer;
FastElement: Double;
begin
if (caoThrowOnNullUndefined in AOptions) and
(AObjReg.Kind in [grkUndefined, grkNull]) then
ThrowTypeError(SErrorCannotConvertNullOrUndefined,
SSuggestCheckNullBeforeAccess)
else if (AObjReg.Kind = grkObject) and
(AObjReg.ObjectValue is TGocciaTypedArrayValue) and
TryGetArrayIndexRegister(AKeyReg, FastIndex) and
TGocciaTypedArrayValue(AObjReg.ObjectValue)
.TryReadIndexedScalar(FastIndex, FastElement) then
// Typed-array unboxed element read: the element goes straight into the
// destination register as a scalar, with no heap TGocciaNumberLiteralValue and
// no IntToStr index name. Non-index keys, BigInt kinds, and out-of-range indices
// fall through to the generic object branch below, which handles length, methods,
// `undefined` for out-of-range reads, BigInt boxing, and symbol keys unchanged.
FRegisters[ADest] := RegisterFromDouble(FastElement)
else if (AObjReg.Kind = grkObject) and
(AObjReg.ObjectValue is TGocciaArrayValue) then
begin
@@ -7752,7 +7766,21 @@ procedure TGocciaVM.ExecSetComputedProperty(const ATargetIndex: Integer;
Value: TGocciaValue;
TargetValue: TGocciaValue;
BoxedTarget: TGocciaObjectValue;
FastIndex: Integer;
begin
// Typed-array unboxed element write: a numeric-scalar value going to a valid
// integer index stores directly, with no heap TGocciaNumberLiteralValue and no
// IntToStr index name. ToNumber on a Number is side-effect-free, so the spec's
// observable conversion is preserved. BigInt kinds (a Number value must throw),
// non-index keys, and non-scalar values fall through to the boxed path below.
if (FRegisters[ATargetIndex].Kind = grkObject) and
(FRegisters[ATargetIndex].ObjectValue is TGocciaTypedArrayValue) and
RegisterIsNumericScalar(AValueReg) and
TryGetArrayIndexRegister(AKeyReg, FastIndex) and
TGocciaTypedArrayValue(FRegisters[ATargetIndex].ObjectValue)
.TryWriteIndexedScalar(FastIndex, RegisterToDouble(AValueReg)) then
Exit;

Value := RegisterToValue(AValueReg);
if (FRegisters[ATargetIndex].Kind = grkObject) and
(FRegisters[ATargetIndex].ObjectValue is TGocciaArrayValue) then