perf(engine): trim Date shim slot fan-out and binary-search timezone lookup#897
Conversation
…lookup RCA of the staging/sm/Date/dst-offset-caching timeouts (#829): profiling showed the cost is the JS Date shim's per-call machinery, not DST-offset recomputation (the offset code profiles at 0 samples and short-circuits on UTC hosts). Per maintainer direction, fix the genuinely-ineffective code with Date staying in JS. - Goccia.Shims.pas: the Date [[DateValue]] stays in a WeakMap (only non-observable slot store), but the hot accessors now do a single WeakMap.get with `=== undefined` as the brand check (a slot is always a number, never undefined), plus a numeric fast-path in __GocciaDateFromSingleArgument and a single-read #local/#utc. ~1.52x on `new Date(NaN); setTime; getTimezoneOffset`. - Goccia.Temporal.TimeZone.pas: IsSupportedCanonicalTimeZoneIdentifier scanned the ~446-entry available-zone list linearly on every ZonedDateTime canonicalization; the list is already CompareStr-sorted, so use a binary search. ~1.20x on named-zone construction. Does not clear the 20s deadline (needs ~15x; residual is inherent interpreter overhead). Refs #819, #829. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
The latest updates on your projects. Learn more about Vercel for GitHub. 1 Skipped Deployment
|
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (4)
📝 WalkthroughWalkthroughFixes the embedded JavaScript ChangesDate Shim Fixes, Timezone Binary Search, and Tests
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related issues
Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
Comment |
Suite TimingTest Runner (interpreted: 10,990 passed; bytecode: 10,990 passed)
MemoryGC rows aggregate the main thread plus all worker thread-local GCs. Test runner worker shutdown frees thread-local heaps in bulk; that shutdown reclamation is not counted as GC collections or collected objects.
Benchmarks (interpreted: 436; bytecode: 436)
MemoryGC rows aggregate the main thread plus all worker thread-local GCs. Benchmark runner performs explicit between-file collections, so collection and collected-object counts can be much higher than the test runner.
Measured on ubuntu-latest x64. |
Benchmark Results436 benchmarks · PR vs same-runner Interpreted: 🟢 27 improved · 🔴 28 regressed · 381 unchanged · avg +0.0% Typical per-run noise (median variance): interpreted ±4.3%, bytecode ±2.4%. Deltas within noise overlap and read as unchanged. arraybuffer.js — Interp: 14 unch. · avg +0.7% · Bytecode: 🔴 1, 13 unch. · avg +0.4%
arrays.js — Interp: 🟢 3, 🔴 2, 14 unch. · avg -0.1% · Bytecode: 🔴 4, 15 unch. · avg -0.9%
async-await.js — Interp: 6 unch. · avg -5.4% · Bytecode: 🔴 1, 5 unch. · avg -4.2%
async-generators.js — Interp: 2 unch. · avg -0.9% · Bytecode: 2 unch. · avg +0.0%
atomics.js — Interp: 6 unch. · avg -2.4% · Bytecode: 🟢 1, 5 unch. · avg -1.8%
base64.js — Interp: 10 unch. · avg -0.5% · Bytecode: 🔴 1, 9 unch. · avg -0.5%
classes.js — Interp: 🔴 3, 28 unch. · avg -2.1% · Bytecode: 🟢 2, 🔴 1, 28 unch. · avg -0.7%
closures.js — Interp: 11 unch. · avg -0.5% · Bytecode: 🟢 1, 10 unch. · avg +1.7%
collections.js — Interp: 🔴 1, 11 unch. · avg -0.9% · Bytecode: 12 unch. · avg -1.8%
csv.js — Interp: 13 unch. · avg -2.2% · Bytecode: 🔴 1, 12 unch. · avg +0.8%
destructuring.js — Interp: 🟢 1, 🔴 3, 18 unch. · avg -1.4% · Bytecode: 🟢 1, 🔴 4, 17 unch. · avg -0.4%
fibonacci.js — Interp: 8 unch. · avg -0.0% · Bytecode: 🔴 2, 6 unch. · avg -3.5%
float16array.js — Interp: 🟢 1, 🔴 1, 30 unch. · avg +0.8% · Bytecode: 🟢 1, 🔴 5, 26 unch. · avg -2.7%
for-in/for-in.js — Interp: 🟢 1, 2 unch. · avg +5.7% · Bytecode: 3 unch. · avg -1.9%
for-of.js — Interp: 🔴 2, 5 unch. · avg -2.0% · Bytecode: 🟢 1, 6 unch. · avg +0.1%
generators.js — Interp: 4 unch. · avg +1.2% · Bytecode: 🟢 1, 3 unch. · avg +0.7%
intl.js — Interp: 🟢 2, 🔴 1, 3 unch. · avg +3.3% · Bytecode: 6 unch. · avg +2.3%
iterators.js — Interp: 🟢 6, 36 unch. · avg +4.9% · Bytecode: 🔴 7, 35 unch. · avg -2.1%
json.js — Interp: 🔴 6, 17 unch. · avg -4.0% · Bytecode: 🟢 4, 🔴 1, 18 unch. · avg +1.5%
jsx.jsx — Interp: 🟢 1, 🔴 2, 18 unch. · avg -2.5% · Bytecode: 🔴 4, 17 unch. · avg -2.6%
modules.js — Interp: 🟢 1, 8 unch. · avg +5.1% · Bytecode: 9 unch. · avg -1.4%
numbers.js — Interp: 11 unch. · avg -1.7% · Bytecode: 🔴 1, 10 unch. · avg -1.2%
objects.js — Interp: 🔴 1, 6 unch. · avg -4.5% · Bytecode: 7 unch. · avg +0.0%
promises.js — Interp: 🔴 1, 11 unch. · avg -4.4% · Bytecode: 12 unch. · avg +0.6%
property-access.js — Interp: 5 unch. · avg +2.3% · Bytecode: 5 unch. · avg +0.1%
regexp.js — Interp: 🟢 1, 🔴 1, 9 unch. · avg -1.2% · Bytecode: 11 unch. · avg +2.8%
strings.js — Interp: 🟢 1, 18 unch. · avg -0.8% · Bytecode: 🟢 2, 🔴 1, 16 unch. · avg -0.0%
temporal.js — Interp: 6 unch. · avg -2.0% · Bytecode: 6 unch. · avg +1.2%
tsv.js — Interp: 🟢 1, 8 unch. · avg +6.0% · Bytecode: 9 unch. · avg +0.9%
typed-arrays.js — Interp: 🟢 2, 🔴 2, 18 unch. · avg +2.1% · Bytecode: 🔴 6, 16 unch. · avg -9.1%
uint8array-encoding.js — Interp: 🟢 3, 🔴 1, 14 unch. · avg -0.5% · Bytecode: 🟢 10, 🔴 2, 6 unch. · avg +20.2%
weak-collections.js — Interp: 🟢 3, 🔴 1, 11 unch. · avg +5.6% · Bytecode: 🟢 7, 🔴 2, 6 unch. · avg +3.7%
Deterministic profile diffDeterministic profile diff: no significant changes. Measured on ubuntu-latest x64. Each PR run also builds the |
test262 Conformance
Areas closest to 100%
Per-test deltas (+1 / -0)Newly passing (1):
Steady-state failures are non-blocking; regressions vs the cached main baseline (lower total pass count, or any PASS → non-PASS transition) fail the conformance gate. Measured on ubuntu-latest x64, bytecode mode. Areas grouped by the first two test262 path components; minimum 25 attempted tests, areas already at 100% excluded. Δ vs main compares against the most recent cached |
Summary
staging/sm/Date/dst-offset-caching-*-of-8.jstimeouts came from uncached DST-offset recomputation inGoccia.Temporal.TimeZone.pas. Profiling (macOSsample, prod bytecode build, pinned test262 SHA) refuted that:GetUtcOffsetSeconds/GetSystemTimeZoneId/BinarySearchTransitionsappear in 0 of ~9,800 samples, and on a UTC host (GitHub runners) the offset path short-circuits to 0 entirely. The test is the SpiderMonkey SunSpider DST-cache stress test — a 4-deep nested loop doing ~3Mnew Date(NaN); setTime; getTimezoneOffsetper shard — so the cost is the JSDateshim's per-call machinery, not the offset code. Per maintainer direction: fix the genuinely-ineffective machinery, keepDatein JS (hard line), no plaster.source/units/Goccia.Shims.pas). The[[DateValue]]stays in a WeakMap — the only non-observable slot store (a private field isn't viable: the exportedDateis a function constructor so instances aren't class-private-bearing, and a symbol key would leak viagetOwnPropertySymbols). Collapsed the helper fan-out: hot accessors now do a singleWeakMap.getwith=== undefinedas the brand check (a slot is always a number, neverundefined), plus a numeric fast-path in__GocciaDateFromSingleArgumentand a single-read#local/#utc. ~1.52× on the test's exact inner call (new Date(NaN); setTime; getTimezoneOffset: 114.0 → 74.7 µs/call; rigorousgit stashA/B with an identicalm1normalizer).source/units/Goccia.Temporal.TimeZone.pas).IsSupportedCanonicalTimeZoneIdentifierlinearly scanned the ~446-entry available-zone list on every ZonedDateTime time-zone canonicalization. The list is alreadyCompareStr-sorted (SortTimeZoneIdentifiers), so it now binary-searches (O(n) → O(log n)). ~1.20× on named-zone construction (9.10 → 7.52 µs/call). Matches the project's existing O(1)/O(k) lookup pattern (perf(engine): make for-in key dedup O(k) with a hash set #888 / perf(engine): back strong Map/Set with a SameValueZero-keyed ordered hash #890).Dateis not moved native (maintainer hard line). This does not make the 8 dst shards pass — see below.Closes #829— see the divergence note.)Clearing the 20 s deadline needs ~15×, but the test's intrinsic
new Date; setTimeis already ~67 µs/call before any timezone work, and the residual is diffuse, inherent interpreter overhead (is/InheritsFromtype dispatch ~15%, scope-binding lookups ~8.5%, allocation ~12%, exception-frame setup ~8.5%) — already largely optimised by prior PRs. Closing the gap would need either nativeDatehot-ops (excluded) or a larger interpreter-throughput effort (e.g. a value-kind type tag, a global-binding inline cache). The shards still cleanly time out (Wrapper-infra 0, no crash). Recommend tracking the interpreter-throughput residual as a follow-up under the #819 perf umbrella, and leaving the close decision on #829 to the maintainer.Testing
RangeErrorpaths preserved).Details
./format.pas --check: clean (372 files). Pre-commit hooks (trunc-guards, format) pass.benchmarks/temporal.jsboth modes (incl.ZonedDateTime.from named time zoneandsince across DST, which exercise the binary search): run clean, no regression.tests/built-ins/Date/methods.js— slot-requiring methods reject a plain non-Date receiver ({}andObject.create(Date.prototype), which has the prototype but no slot) →TypeError; constructor reads the time value from aDateargument and clips numbers /NaN.tests/built-ins/Temporal/ZonedDateTime/constructor.js— canonical named zones across the sorted lookup are preserved; a bad zone throwsRangeError.run_test262_suite.ts,--mode=bytecode --timeout-ms=20000), dst shard 1 still classifies as timeout/FAIL with Wrapper-infra 0 (no crash) — expected per the divergence note.🤖 Generated with Claude Code