perf(method dispatch): method_calls ~290× Node — remaining cost is per-field-access shape-guard calls (plan + standby)

## Summary

`benchmarks/suite/09_method_calls.ts` (10M calls to a trivial monomorphic `counter.increment()`, where `increment()` is `this.value = this.value + 1`) runs in **~3300 ms** vs Node's **~11 ms**. That is still **~290× slower** after the two improvements already landed or open. The remaining cost is **per-field-access shape-guard calls**, and closing it cleanly requires a GC typed-shape-layout change (high risk). This issue records the full analysis so the work can resume from a known state. It is on standby for now.

## What already landed

- **#5084** — typed-feedback site *registration* made opt-in. Removed a 14-arg no-op `js_typed_feedback_register_site` call per property access. Big win for dynamic property access (`bench_object_property` 1127 ms → ~310 ms, 3.6×), but only ~2% on `method_calls`.
- **#5092** — inline small `this.field` methods on exact receivers. Two inliner gaps fixed (loop bodies seeded with empty exact-receiver facts; `is_inlinable` rejecting every method that touches `this`). `increment()` is now inlined into the loop. **`method_calls`: 5229 ms → ~3300 ms (1.6×).**

## Root cause of the remaining ~290×

Per `counter.increment()` iteration, after #5092 the body is inlined but each `this.value` read/write still goes through a **typed-feedback class-field shape guard** — a non-inlined cross-crate call:

- read  → `js_typed_feedback_class_field_get_guard(...)` (`crates/perry-runtime/src/typed_feedback/guards.rs`)
- write → `js_typed_feedback_class_field_set_guard(...)`

When typed feedback is disabled (the default), each guard reduces to `class_field_fast_contract` (`guards.rs:284`): it validates `class_id` + `keys_array` + `field_count`, and for a `number`-typed (raw-f64) field additionally calls `layout_typed_raw_f64_slot_for_user` (`crates/perry-runtime/src/gc/layout.rs:743`), a thread-local `PtrHashMap` (`TYPED_LAYOUTS`) lookup. So each iteration pays for **2 non-inlined guard calls** (one read, one write) on top of the slot load/store the fast path already emits. Over 10M iterations that is 20M guard calls, or roughly 330 ns per iteration at ~3300 ms total, and that is the remaining ~290× gap.
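
The shape of that check can be sketched roughly as follows. This is an illustrative model, not the code in `guards.rs`: the struct, the `u64` stand-ins for pointers, the bitmask representation of the layout, the 64-slot limit, and every function name are assumptions. Only the three header comparisons and the thread-local map lookup are taken from the description above.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

/// Hypothetical stand-in for the header fields the contract compares.
#[derive(Clone, Copy)]
pub struct HeaderSketch {
    pub class_id: u32,
    pub field_count: u32,
    pub keys_array: u64, // address of the shared keys array, modeled as an integer
}

thread_local! {
    /// Stand-in for `TYPED_LAYOUTS`: object address -> bitmask of raw-f64 slots.
    static TYPED_LAYOUTS_SKETCH: RefCell<HashMap<u64, u64>> = RefCell::new(HashMap::new());
}

/// Records which slots of the object at `obj_addr` currently hold raw f64 values.
pub fn record_layout(obj_addr: u64, raw_f64_mask: u64) {
    TYPED_LAYOUTS_SKETCH.with(|m| {
        m.borrow_mut().insert(obj_addr, raw_f64_mask);
    });
}

/// The expensive part: a thread-local access plus a hash lookup per call.
fn slot_is_raw_f64(obj_addr: u64, slot: u32) -> bool {
    slot < 64
        && TYPED_LAYOUTS_SKETCH.with(|m| {
            m.borrow()
                .get(&obj_addr)
                .map_or(false, |mask| (mask >> slot) & 1 == 1)
        })
}

/// Out-of-line guard: header comparisons, then the layout lookup if required.
#[inline(never)]
pub fn class_field_guard_sketch(
    obj_addr: u64,
    header: &HeaderSketch,
    expected: &HeaderSketch,
    slot: u32,
    require_raw_f64: bool,
) -> bool {
    let shape_ok = header.class_id == expected.class_id
        && header.keys_array == expected.keys_array
        && header.field_count == expected.field_count;
    shape_ok && (!require_raw_f64 || slot_is_raw_f64(obj_addr, slot))
}
```

In this model the header comparisons are a handful of loads and compares, while the raw-f64 check is the only part that leaves the object. That split is what the inlining plan has to deal with.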

Codegen emits this at `crates/perry-codegen/src/expr/property_get.rs:1551` (class-field GET, known slot index) and the `property_set.rs` counterpart — both wrap a direct `getelementptr`+`load`/`store` (object header is 24 bytes; `ObjectHeader` `#[repr(C)]` at `crates/perry-runtime/src/object/mod.rs:2293`) behind the guard call.

## Why the cheap/safe shortcuts do NOT work (measured)

- **Removing `register_site`** (#5084): ~2% on `method_calls` — register_site was not the cost here.
- **MRU-caching the raw-f64 layout lookup** (1-entry thread-local cache in `layout_typed_raw_f64_slot_for_user`): measured **~3300 ms → ~7050 ms (2× SLOWER)**. The `PtrHashMap` is already fast; adding a second thread-local access costs more TLS overhead than it saves. **The dominant cost is the guard CALL itself, not what's inside it.**

Conclusion: there is no intermediate win. Either the guard call stays, or the guard is inlined.

## The plan (A2) — inline the class-field shape guard

Replace the `js_typed_feedback_class_field_{get,set}_guard` **call** with inline LLVM at the emission sites (`property_get.rs:1551`, `property_set.rs`):

1. Inline the cheap part of the contract: load `class_id` (i32 @ obj+4) and `keys_array` (ptr @ obj+16), compare to the expected compile-time constants; plus a plain (hoistable) load of the process-global descriptors flag. Keep the by-name fallback for the guard-fail edge.
2. Once both the method (already, #5092) and the guard are inlined into the caller loop, all guard operands are loop-invariant → LLVM LICM can hoist the shape check out of the 10M-iteration loop, collapsing the body to a tight `load`/`fadd`/`store`.
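
The effect of step 2 can be illustrated by hand in Rust. The sketch below is only a model of what LICM might produce, assuming nothing in the loop body can change the header; `CounterSketch` and both function names are hypothetical, and the raw-f64 layout check discussed in the next section is deliberately left out.

```rust
/// Hypothetical model of a `Counter` instance: two header words plus one raw-f64 slot.
pub struct CounterSketch {
    pub class_id: u32,
    pub keys_array: u64,
    pub value: f64,
}

/// The cheap, inlinable part of the guard: two loads and two compares.
#[inline(always)]
fn shape_matches(obj: &CounterSketch, class_id: u32, keys_array: u64) -> bool {
    obj.class_id == class_id && obj.keys_array == keys_array
}

/// Guard evaluated on every iteration, as inlined-but-unhoisted code would do.
pub fn increment_checked_each_iteration(
    obj: &mut CounterSketch,
    class_id: u32,
    keys_array: u64,
    n: u64,
) -> bool {
    for _ in 0..n {
        if !shape_matches(obj, class_id, keys_array) {
            return false; // guard-fail edge: the caller would take the by-name fallback
        }
        obj.value += 1.0;
    }
    true
}

/// The same loop with the loop-invariant check hoisted by hand.
/// The `n > 0` test keeps the zero-iteration case identical to the version above.
pub fn increment_check_hoisted(
    obj: &mut CounterSketch,
    class_id: u32,
    keys_array: u64,
    n: u64,
) -> bool {
    if n > 0 && !shape_matches(obj, class_id, keys_array) {
        return false;
    }
    for _ in 0..n {
        obj.value += 1.0; // body is now just load / fadd / store
    }
    true
}
```

Both functions return the same result for the same inputs; the second one performs the shape check once instead of `n` times.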

### Correctness trap — `require_raw_f64`

`Counter.value: number` is a raw-f64 candidate, so the guard passes `require_raw_f64 = 1` and the contract additionally calls `layout_typed_raw_f64_slot_for_user`. That call is a **thread-local `PtrHashMap` lookup, not an O(1) header field**, so it cannot be inlined cheaply. A correct inline guard for raw-f64 fields requires the per-object raw-f64 slot mask to live somewhere O(1)-loadable (object header or GC header). Getting this wrong means reading a NaN-boxed value as a raw `double`, which is **silent memory/value corruption**. This is the crux and the reason the change is high-risk:

- The per-object raw-f64 layout can **downgrade** (a non-number written to a `number`-typed field via an `any` alias makes the slot non-raw); the guard's layout check is what catches that. An inline guard must preserve this, or skip the raw-f64 fast path for fields that can downgrade.
- `descriptors_in_use()` (accessor descriptors) must also gate the fast path.
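
A minimal sketch of the failure mode, assuming a NaN-boxing scheme in which non-number values are stored as quiet-NaN bit patterns with a payload. The tag constant and function names below are hypothetical and are not the runtime's real encoding.

```rust
/// Hypothetical NaN-box tag for a non-number value; the runtime's real tag bits may differ.
pub const HYPOTHETICAL_BOX_TAG: u64 = 0x7FFC_0000_0000_0000;

/// Packs a small payload (e.g. a handle) into a NaN-boxed slot value.
pub fn box_payload(payload: u32) -> u64 {
    HYPOTHETICAL_BOX_TAG | payload as u64
}

/// What an unguarded raw-f64 fast path would do: treat the slot bits as a double.
/// Nothing traps here; a boxed value simply turns into a NaN.
pub fn unguarded_increment(slot_bits: u64) -> u64 {
    (f64::from_bits(slot_bits) + 1.0).to_bits()
}

/// Guarded variant: only take the raw path when the layout says the slot is raw f64.
/// `None` stands for "fall back to the slow, by-name path".
pub fn guarded_increment(slot_bits: u64, slot_is_raw_f64: bool) -> Option<u64> {
    if slot_is_raw_f64 {
        Some((f64::from_bits(slot_bits) + 1.0).to_bits())
    } else {
        None
    }
}
```

After a downgrade, the unguarded path overwrites the boxed value with the bits of a NaN and reports no error, which is why the layout check cannot simply be dropped from an inline guard.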

### Suggested approach

- Move the per-object raw-f64 slot mask into an O(1)-loadable location (GC header / object header), updated wherever `TYPED_LAYOUTS` is mutated today (~20 sites in `gc/layout.rs`), so the inline guard can bit-test it cheaply.
- Validate against the full local parity suite + `cargo test` workspace; add a targeted test that stores a non-number into a `number`-typed field via an `any` alias and reads it back (the downgrade case).
- Target: `method_calls` ≤ ~50 ms (≤ ~4.5× Node's ~11 ms).
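
One possible shape for the header-resident mask, sketched under stated assumptions: a single `u64` mask (so at most 64 tracked slots), slots modeled as raw `u64` bit patterns, and hypothetical names throughout. The real header layout and the mutation sites in `gc/layout.rs` are not modeled here.

```rust
/// Hypothetical header extension: one bit per slot, set while the slot holds a raw f64.
pub struct HeaderWithMask {
    pub class_id: u32,
    pub keys_array: u64,
    pub raw_f64_mask: u64,
}

impl HeaderWithMask {
    /// O(1) bit test an inline guard could emit; slots >= 64 always take the slow path.
    #[inline(always)]
    pub fn slot_is_raw_f64(&self, slot: u32) -> bool {
        slot < 64 && (self.raw_f64_mask >> slot) & 1 == 1
    }

    /// Would have to run wherever the layout is downgraded today,
    /// e.g. when a non-number is written through an `any` alias.
    pub fn downgrade_slot(&mut self, slot: u32) {
        if slot < 64 {
            self.raw_f64_mask &= !(1u64 << slot);
        }
    }
}

/// Fast-path read: returns the raw double only while the mask bit is set.
/// `None` stands for "take the guard-fail fallback".
pub fn read_number_field(header: &HeaderWithMask, slots: &[u64], slot: u32) -> Option<f64> {
    if header.slot_is_raw_f64(slot) {
        slots.get(slot as usize).map(|bits| f64::from_bits(*bits))
    } else {
        None
    }
}
```

The downgrade test suggested above corresponds to the sequence: read succeeds, a boxed value is stored and the bit is cleared, and the next read falls back instead of reinterpreting the bits.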

**Risk: HIGH (GC typed-shape layout change; memory-corruption class). Effort: L.** Recommend a maintainer-driven change, not autonomous.

## Files

- Benchmark: `benchmarks/suite/09_method_calls.ts`
- Guard emission: `crates/perry-codegen/src/expr/property_get.rs:1551`, `crates/perry-codegen/src/expr/property_set.rs`
- Method dispatch: `crates/perry-codegen/src/lower_call/property_get.rs:1314`
- Runtime guard + contract: `crates/perry-runtime/src/typed_feedback/guards.rs:284,318`
- raw-f64 layout (the thread-local hashmap): `crates/perry-runtime/src/gc/layout.rs:743`
- ObjectHeader layout: `crates/perry-runtime/src/object/mod.rs:2293`

This continues the `method_calls` line of work from #5084 and #5092.

