Summary
benchmarks/suite/09_method_calls.ts (10M calls to a trivial monomorphic counter.increment(), where increment() is this.value = this.value + 1) runs in ~3300 ms vs Node's ~11 ms: still ~290× slower after the two improvements already landed/open. The remaining cost is per-field-access shape-guard calls, and closing it cleanly requires a GC typed-shape-layout change (high risk). This issue records the full analysis so the work can resume from a known state. Putting it in standby for now.
What already landed
- #5084 (typed-feedback site registration made opt-in): the js_typed_feedback_register_site call per property access. Big win for dynamic property access (bench_object_property: 1127 ms → ~310 ms, 3.6×), but only ~2% on method_calls.
- #5092: this.field methods on exact receivers. Two inliner gaps fixed (loop bodies seeded with empty exact-receiver facts; is_inlinable rejecting every method that touches this). increment() is now inlined into the loop. method_calls: 5229 ms → ~3300 ms (1.6×).
Root cause of the remaining ~290×
Per counter.increment() iteration, after #5092 the body is inlined, but each this.value read/write still goes through a typed-feedback class-field shape guard, which is a non-inlined cross-crate call:
- js_typed_feedback_class_field_get_guard(...) (crates/perry-runtime/src/typed_feedback/guards.rs)
- js_typed_feedback_class_field_set_guard(...)
When typed feedback is disabled (the default), each guard reduces to class_field_fast_contract (guards.rs:284): it validates class_id + keys_array + field_count, and for a number-typed (raw-f64) field additionally calls layout_typed_raw_f64_slot_for_user (crates/perry-runtime/src/gc/layout.rs:743), a thread-local PtrHashMap (TYPED_LAYOUTS) lookup. So per iteration: 2 non-inlined guard calls (plus the slot load/store the fast path already emits). ×10M ≈ the 290× gap.
Codegen emits this at crates/perry-codegen/src/expr/property_get.rs:1551 (class-field GET, known slot index) and the property_set.rs counterpart. Both wrap a direct getelementptr + load/store (object header is 24 bytes; ObjectHeader, #[repr(C)], at crates/perry-runtime/src/object/mod.rs:2293) behind the guard call.
Why the cheap/safe shortcuts do NOT work (measured)
- register_site (#5084, "perf(codegen): make typed-feedback site registration opt-in (3.6x on dynamic property access)"): ~2% on method_calls. register_site was not the cost here.
- MRU-caching the raw-f64 layout lookup (1-entry thread-local cache in layout_typed_raw_f64_slot_for_user): measured ~3300 ms → ~7050 ms (2× SLOWER). The PtrHashMap is already fast; adding a second thread-local access costs more TLS overhead than it saves. The dominant cost is the guard CALL itself, not what's inside it.
Conclusion: there is no intermediate win. Either the guard call stays, or the guard is inlined.
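To put the measurements above on a per-iteration footing, the quoted totals can be converted to nanoseconds per loop iteration. This is only a back-of-the-envelope sketch: the helper name nsPerIteration is made up for illustration, the inputs are the rounded figures from this issue, and the per-guard ceiling assumes (pessimistically) that the whole iteration is spent in the two guard calls.

```typescript
// Rounded figures quoted in this issue; nothing here is measured by this snippet.
const ITERATIONS = 10_000_000;
const GUARD_CALLS_PER_ITERATION = 2; // one get guard + one set guard per increment()

// Hypothetical helper: total wall-clock milliseconds -> nanoseconds per iteration.
function nsPerIteration(totalMs: number, iterations: number = ITERATIONS): number {
  return (totalMs * 1e6) / iterations;
}

const currentNs = nsPerIteration(3300);  // state after #5092
const mruCacheNs = nsPerIteration(7050); // with the 1-entry MRU cache experiment
const nodeNs = nsPerIteration(11);       // Node baseline

// Ceiling on the cost of a single guard call, assuming the entire iteration
// is attributed to the guards (an overestimate by construction).
const guardCeilingNs = currentNs / GUARD_CALLS_PER_ITERATION;

console.log({ currentNs, mruCacheNs, nodeNs, guardCeilingNs });
```

With these rounded inputs the current state works out to a few hundred nanoseconds per iteration against roughly one nanosecond for Node, which is the scale a call-free inline guard would have to reach.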
The plan (A2) — inline the class-field shape guard
Replace the js_typed_feedback_class_field_{get,set}_guard call with inline LLVM at the emission sites (property_get.rs:1551, property_set.rs):
- Inline the cheap part of the contract: load class_id (i32 @ obj+4) and keys_array (ptr @ obj+16), compare to the expected compile-time constants; plus a plain (hoistable) load of the process-global descriptors flag. Keep the by-name fallback for the guard-fail edge.
- Guard-pass edge: direct load/fadd/store.
Correctness trap: require_raw_f64
Counter.value: number is a raw-f64 candidate, so the guard passes require_raw_f64 = 1 and the contract additionally calls layout_typed_raw_f64_slot_for_user, which is a thread-local PtrHashMap lookup, not an O(1) header field, so it is NOT cheaply inline-able. A correct inline guard for raw-f64 fields requires the per-object raw-f64 slot mask to live somewhere O(1)-loadable (object header or GC header). Getting this wrong means reading a NaN-boxed value as a raw double, i.e. silent memory/value corruption. This is the crux and the reason this is high-risk:
- The per-object raw-f64 layout can downgrade (a non-number written to a number-typed field via an any alias makes the slot non-raw); the guard's layout check is what catches that. An inline guard must preserve this, or skip the raw-f64 fast path for fields that can downgrade.
- descriptors_in_use() (accessor descriptors) must also gate the fast path.
Suggested approach
- Move the per-object raw-f64 slot mask into an O(1)-loadable location (GC header / object header), updated wherever TYPED_LAYOUTS is mutated today (~20 sites in gc/layout.rs), so the inline guard can bit-test it cheaply.
- Validate against the full local parity suite + cargo test workspace; add a targeted test that stores a non-number into a number-typed field via an any alias and reads it back (the downgrade case).
- Target: method_calls ≤ ~50 ms (≤ ~3× Node).
Risk: HIGH (GC typed-shape layout change; memory-corruption class). Effort: L. Recommend a maintainer-driven change, not autonomous.
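The targeted downgrade test mentioned in the approach above could look roughly like the following. This is a sketch, not an existing test in the repository: the function name runDowngradeCase and the returned record are hypothetical, and the expected values are simply what standard JavaScript semantics give (string concatenation once the field holds a string).

```typescript
class Counter {
  value: number = 0;

  increment(): void {
    this.value = this.value + 1;
  }
}

// Hypothetical test body: drive the number-typed field through the fast path,
// then write a non-number through an `any` alias and read it back.
function runDowngradeCase(): { before: number; stored: unknown; after: unknown; recovered: unknown } {
  const counter = new Counter();
  counter.increment();
  const before = counter.value; // field still holds a number here

  // The `any` alias bypasses the static `number` type of the field.
  const alias: any = counter;
  alias.value = "not a number";
  const stored: unknown = alias.value;

  // With a string in the slot, `this.value + 1` is string concatenation.
  counter.increment();
  const after: unknown = alias.value;

  // Writing a number again and incrementing must yield plain numeric addition.
  alias.value = 41;
  counter.increment();
  const recovered: unknown = alias.value;

  return { before, stored, after, recovered };
}

console.log(runDowngradeCase());
```

A guard that skipped the layout check would be expected to fail this by treating the slot as a raw double after the string store, so comparing the result against Node's output is a reasonable parity check.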
Files
- benchmarks/suite/09_method_calls.ts
- crates/perry-codegen/src/expr/property_get.rs:1551, crates/perry-codegen/src/expr/property_set.rs
- crates/perry-codegen/src/lower_call/property_get.rs:1314
- crates/perry-runtime/src/typed_feedback/guards.rs:284,318
- crates/perry-runtime/src/gc/layout.rs:743
- crates/perry-runtime/src/object/mod.rs:2293
This continues the method_calls line of work from #5084 and #5092.
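For readers without the repository at hand, the benchmark shape described in the Summary can be sketched as below. This is an approximation built from the description in this issue (a monomorphic counter.increment() called 10M times), not the contents of benchmarks/suite/09_method_calls.ts; the function name runMethodCalls and the Date.now() timing are assumptions.

```typescript
class Counter {
  value: number = 0;

  // The trivial method body named in the Summary.
  increment(): void {
    this.value = this.value + 1;
  }
}

// Hypothetical harness: calls increment() `iterations` times on one receiver,
// so the call site stays monomorphic, and reports wall-clock time.
function runMethodCalls(iterations: number): { result: number; elapsedMs: number } {
  const counter = new Counter();
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    counter.increment();
  }
  return { result: counter.value, elapsedMs: Date.now() - start };
}

const run = runMethodCalls(10_000_000);
console.log(`method_calls: ${run.result} increments in ${run.elapsedMs} ms`);
```

Each iteration performs exactly one read and one write of this.value, which is why the two guard calls per iteration dominate once the method body itself is inlined.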