
Perf next steps: render path is optimized, JS main thread is now the bottleneck #46

@DFearing


Summary

Empirical comparison of the major perf checkpoints since the upstream merge-base, captured 2026-05-02.

Methodology

Results

Drag bench/results/perf-chart.png into this issue to attach the chart.

| Stack | FPS @ 1× | p95 frame time @ 1× | FPS @ 4× | p95 frame time @ 4× | Long-task time @ 4× | Heap peak |
|---|---|---|---|---|---|---|
| Baseline (59ccf4e, 1 canvas) | 47 | 25 ms | 9.8 | 120 ms | 87.2 s | 23 MB |
| PR #31 (df3bd94) | 61 | 34 ms ⚠️ | 10.4 | 137 ms ⚠️ | 79.9 s | 44 MB |
| PR #36 (680cb12) | 115 | 19 ms | 11.2 | 121 ms | 80.6 s | 44 MB |
| PR #43 (ed97e12) | 112 | 19 ms | 11.6 | 115 ms | 80.9 s | 42 MB |
| main / PR #45 (c20d844) | 112 | 19 ms | 11.5 | 118 ms | 80.9 s | 43 MB |

⚠️ PR #31's partial Pixi migration regressed p95 frame time at both throttles (25 → 34 ms at 1×, 120 → 137 ms at 4×). Half-Pixi/half-DOM was worse than either alone; the win required completing the migration (PR #36).

Where the wins did NOT come from (and why)

The bench shows no meaningful improvement from #43 or #45 (FPS @ 4× goes 11.2 → 11.6 → 11.5 across #36, #43, and #45), but that is a limitation of the measurement design, not a sign the work was wasted.

These optimizations may still be load-bearing; the bench just isn't designed to surface them. See item 4 under "Direction for future perf work" below.

Direction for future perf work

The render path is mostly tapped out for this workload. To meaningfully move 4× throttle FPS or scale beyond 3 concurrent canvases, the next gains have to come from the JS main thread.

In priority order:

1. Profile long tasks at 4× throttle (cheap, high-information)

~80 s of blocking time in a 90 s window means the main thread is inside long tasks roughly 90% of the time (80.9 s / 90 s ≈ 0.9). Without knowing what those tasks contain, further optimization is guesswork. Concrete steps:

  • Run a representative `concurrent` workload at 4× throttle with the Chromium Performance recorder attached
  • Identify the top 3 long-task call stacks
  • Likely suspects: sim event ingestion, React reconciliation, scene-graph diff
  • File a separate sub-issue per hot path with the call-stack screenshot
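As a complement to the recorder trace, here is a minimal TypeScript sketch for quantifying the long-task share of a run in-page, assuming the bench can inject a snippet into the page. It uses the browser Long Tasks API (`PerformanceObserver` with the `"longtask"` entry type), which reports when tasks ran and for how long but not their call stacks, so it measures the budget rather than replacing the call-stack step above. `summarizeLongTasks`, `collectLongTasks`, and the TBT-style `blockingMs` figure are illustrative; how the bench's own long-task column is computed is not stated in this issue.

```typescript
// Minimal long-task sample: the startTime/duration fields that "longtask"
// PerformanceEntry objects carry (both in milliseconds).
interface LongTaskSample {
  startTime: number;
  duration: number;
}

interface LongTaskSummary {
  count: number;
  totalMs: number; // sum of long-task durations
  blockingMs: number; // sum of (duration - 50 ms), Total-Blocking-Time style
  shareOfWindow: number; // totalMs / windowMs
  longest: LongTaskSample[]; // top N by duration, longest first
}

// The Long Tasks API only reports tasks that ran longer than 50 ms.
const LONG_TASK_THRESHOLD_MS = 50;

function summarizeLongTasks(samples: LongTaskSample[], windowMs: number, topN = 3): LongTaskSummary {
  let totalMs = 0;
  let blockingMs = 0;
  for (const s of samples) {
    totalMs += s.duration;
    blockingMs += Math.max(0, s.duration - LONG_TASK_THRESHOLD_MS);
  }
  // Sort a copy so the caller's array keeps its chronological order.
  const longest = [...samples].sort((a, b) => b.duration - a.duration).slice(0, topN);
  return {
    count: samples.length,
    totalMs,
    blockingMs,
    shareOfWindow: windowMs > 0 ? totalMs / windowMs : 0,
    longest,
  };
}

// Browser-only collector. Returns a stop function, or undefined where the
// "longtask" entry type is unsupported (e.g. Node, or browsers without the API).
function collectLongTasks(sink: LongTaskSample[]): (() => void) | undefined {
  const PO = (globalThis as any).PerformanceObserver;
  if (!PO || !PO.supportedEntryTypes?.includes("longtask")) return undefined;
  const observer = new PO((list: any) => {
    for (const entry of list.getEntries()) {
      sink.push({ startTime: entry.startTime, duration: entry.duration });
    }
  });
  observer.observe({ type: "longtask" });
  return () => observer.disconnect();
}
```

In the page, start with `const samples: LongTaskSample[] = []; const stop = collectLongTasks(samples);`, then at the end of the 90 s window call `stop?.()` and `summarizeLongTasks(samples, 90_000)`.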

2. Reduce React reconciliation cost

The bench captures ~3500-3800 React commits per 90 s window with 3 canvases, i.e. roughly 39-42 commits/s overall, or ~13-14 commits/s per canvas. Worth investigating:

  • Are panel state changes (Messages, Files, $Cost, Timeline toggles) triggering full-tree reconciliation that should be scoped to one panel?
  • Is per-event sim ingestion fanning out into multiple React state writes when one batched write would do?
  • Move panel state into separate context providers so canvas-state changes don't reconcile the panel subtrees and vice versa
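For the batching question, a minimal sketch of coalescing per-event writes into one write per frame. `FrameBatcher` and the injected scheduler are hypothetical names, not existing code in this repo; in the browser the scheduler would wrap `requestAnimationFrame`, and `apply` would perform the single React state update (for example one functional `setState` that folds the whole batch). React 18 already batches state updates issued together in the same tick, so a sketch like this only pays off if the profile shows sim events producing separate commits.

```typescript
// Schedules a callback for "the next frame". In the browser this could be
// (cb) => requestAnimationFrame(() => cb()); injected so the batcher is testable.
type FrameScheduler = (flush: () => void) => void;

// Coalesces any number of incoming events into at most one apply() per frame.
class FrameBatcher<E> {
  private pending: E[] = [];
  private scheduled = false;
  private readonly apply: (batch: E[]) => void;
  private readonly schedule: FrameScheduler;

  constructor(apply: (batch: E[]) => void, schedule: FrameScheduler) {
    this.apply = apply;
    this.schedule = schedule;
  }

  push(event: E): void {
    this.pending.push(event);
    if (this.scheduled) return; // a flush is already queued for this frame
    this.scheduled = true;
    this.schedule(() => this.flush());
  }

  private flush(): void {
    this.scheduled = false;
    const batch = this.pending;
    this.pending = [];
    if (batch.length > 0) this.apply(batch);
  }
}
```

The design choice is to keep the batcher ignorant of React: it only guarantees one `apply` call per scheduled frame, and whatever `apply` does (one state write, one store update) happens once per batch instead of once per event.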

3. Move sim event ingestion off the main thread

`SimulationManager`'s shared rAF loop processes events for all sessions in lockstep on the main thread. Moving ingestion into a Web Worker (relay events → diff → `postMessage` minimal updates to React) would free the main-thread budget currently spent on JSON parsing and state diffing.
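A sketch of the worker-side half of this item, under stated assumptions: the relay delivers JSON arrays of per-entity updates shaped like the hypothetical `RelayEvent` below, and the worker keeps its own copy of session state so only entities that actually changed are posted back. `ingest`, `RelayEvent`, and `Patch` are illustrative names; the real `SimulationManager` event format isn't shown in this issue.

```typescript
// Hypothetical relay event: one entity update (or deletion) within one session.
interface RelayEvent {
  sessionId: string;
  entityId: string;
  value: number;
  deleted?: boolean;
}

// Minimal patch posted back to the main thread; null means "entity removed".
interface Patch {
  sessionId: string;
  changes: Array<[string, number | null]>;
}

type SessionState = Map<string, Map<string, number>>; // sessionId -> entityId -> value

// Runs in the worker: parse the raw JSON, apply it to worker-owned state, and
// return only entities whose value changed, grouped by session.
function ingest(state: SessionState, rawJson: string): Patch[] {
  const events = JSON.parse(rawJson) as RelayEvent[];
  const changed = new Map<string, Map<string, number | null>>();
  for (const ev of events) {
    let entities = state.get(ev.sessionId);
    if (!entities) {
      entities = new Map();
      state.set(ev.sessionId, entities);
    }
    let after: number | null;
    if (ev.deleted) {
      if (!entities.delete(ev.entityId)) continue; // deleting an unknown entity is a no-op
      after = null;
    } else {
      if (entities.get(ev.entityId) === ev.value) continue; // unchanged: nothing to send
      entities.set(ev.entityId, ev.value);
      after = ev.value;
    }
    let sessionChanges = changed.get(ev.sessionId);
    if (!sessionChanges) {
      sessionChanges = new Map();
      changed.set(ev.sessionId, sessionChanges);
    }
    sessionChanges.set(ev.entityId, after); // last write per entity wins
  }
  return Array.from(changed, ([sessionId, changes]) => ({ sessionId, changes: Array.from(changes) }));
}

// Worker wiring (hypothetical, for a module worker):
//   const state: SessionState = new Map();
//   self.onmessage = (e: MessageEvent<string>) => {
//     const patches = ingest(state, e.data);
//     if (patches.length > 0) self.postMessage(patches);
//   };
```

If the worker also owns the relay connection, the main thread never sees the raw JSON at all and only receives the already-diffed `Patch[]`.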

4. Add bench scenarios that exercise the optimizations this run missed

The current bench understates #43 and #45 because the scenario is "everything visible, all the time". Suggest adding:

5. Measure Pixi's residual per-frame blit

Per #45's caveat, the WebGL backend's `GlRenderTargetAdaptor.postrender` still calls `canvasSource.context2D.drawImage(contextCanvas, ...)` once per render target, per frame. The cost is unknown but bounded. A focused micro-bench (mock scene, vary canvas count, measure `render()` wall time) would tell us whether it's worth chasing WebGPU.
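One way the micro-bench could be structured, as a sketch: a generic harness that times a render callback at several canvas counts and reports median and p95 wall time. `makeRender` is a hypothetical factory that would build the mock Pixi scene with one render target per canvas and return a function that renders one frame; the Pixi setup itself is deliberately left out rather than guessed. `performance.now()` captures CPU-side wall time only, so GPU work the browser does not synchronously wait on won't show up.

```typescript
interface BenchResult {
  canvases: number;
  medianMs: number;
  p95Ms: number;
}

// Nearest-rank percentile over an ascending-sorted array.
function percentile(sorted: number[], p: number): number {
  if (sorted.length === 0) return NaN;
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(sorted.length, Math.max(1, rank)) - 1];
}

// makeRender (hypothetical): given a canvas count, build a mock scene and
// return a function that renders exactly one frame.
function benchRender(
  makeRender: (canvases: number) => () => void,
  canvasCounts: number[],
  frames = 200,
  warmup = 20,
): BenchResult[] {
  const results: BenchResult[] = [];
  for (const canvases of canvasCounts) {
    const renderFrame = makeRender(canvases);
    for (let i = 0; i < warmup; i++) renderFrame(); // let JIT and caches settle
    const samples: number[] = [];
    for (let i = 0; i < frames; i++) {
      const t0 = performance.now();
      renderFrame();
      samples.push(performance.now() - t0);
    }
    samples.sort((a, b) => a - b);
    results.push({ canvases, medianMs: percentile(samples, 50), p95Ms: percentile(samples, 95) });
  }
  return results;
}
```

Running it with, say, `[1, 2, 3, 6]` canvases and checking whether median time grows roughly linearly with canvas count would indicate whether the per-target blit dominates.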

Tracking

  • This bench rerun supersedes the first-pass numbers in #37, "Re-run benchmark matrix (track 2nd-pass measurements)", for the recent stacks.
  • Files left in tree (uncommitted): `bench/perf-chart.mjs`, `bench/results/perf-comparison.jsonl`, `bench/results/perf-chart.png`, `bench/results/perf-chart.html`. Worth committing the chart script to a follow-up branch if we want it as standing infrastructure.

🤖 Generated with Claude Code
