Empirical comparison of the major perf checkpoints since the upstream merge-base, captured 2026-05-02.
Render-path work is essentially complete. Baseline → main: 47 → 112 FPS at 1× CPU (+138%) while simultaneously rendering 3× as many canvases. Aggregate throughput (FPS × canvas count) went up ~7×.
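The +138% and ~7× figures can be reproduced from the numbers quoted above. This is a small worked check, assuming the reported FPS is the page frame rate and that every frame draws all visible canvases (1 for baseline, 3 for main); `canvasFramesPerSecond` is an illustrative name, not something from the bench.

```javascript
// Figures quoted in the summary; baseline is the 1-canvas datapoint.
const baseline = { fps: 47, canvases: 1 };
const main = { fps: 112, canvases: 3 };

// Canvas-frames per second: frames delivered across all visible canvases.
// Assumption: each page frame renders every visible canvas once.
function canvasFramesPerSecond({ fps, canvases }) {
  return fps * canvases;
}

const fpsGainPercent = (main.fps / baseline.fps - 1) * 100;
const throughputRatio =
  canvasFramesPerSecond(main) / canvasFramesPerSecond(baseline);

console.log(fpsGainPercent.toFixed(0)); // "138"
console.log(throughputRatio.toFixed(1)); // "7.1"
```

The FPS ratio alone is about 2.4×; the ~7× only appears once the 3× canvas count is multiplied in.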
Under high CPU load (4× throttle), the perf work has barely moved the needle (10 → 12 FPS). Long-task blocking time stays at ~80s out of every 90s measurement window. The bottleneck has clearly moved from the render path to the JS main thread.
Methodology
Bench: bench/run-bench.mjs (Playwright + CDP, headless Chromium with --disable-frame-rate-limit --enable-precise-memory-info)
⚠️ PR #31's partial Pixi migration regressed p95 frame time at both throttles — half-Pixi/half-DOM was worse than either alone. The win required completing the migration (PR #36).
Where the wins did NOT come from (and why)
The bench shows zero improvement from #43 and #45 — but that's a measurement-design issue, not a sign the work was wasted:
#45 multiView: drops a readPixels GPU↔CPU sync and the per-viewport RenderTexture. Wins are in GPU memory + GPU time, neither of which the bench captures. The heap delta we do see is dominated by canvas count, not RenderTexture.
These optimizations may still be load-bearing; the bench just isn't designed to surface them. See "Direction" item 4 below.
Direction for future perf work
The render path is mostly tapped out for this workload. To meaningfully move 4× throttle FPS or scale beyond 3 concurrent canvases, the next gains have to come from the JS main thread.
In priority order:
1. Profile long tasks at 4× throttle (cheap, high-information)
80s of blocking time over a 90s window means the main thread spends nearly the whole run inside long tasks. Without knowing what's in those tasks, further optimization is guessing. Concrete steps:
Run a representative `concurrent` workload at 4× throttle with the Chromium Performance recorder attached
File a separate sub-issue per hot path with the call-stack screenshot
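Before opening the Performance recorder, it can help to know how many long tasks there are and how large the worst ones get. The sketch below summarizes Long Tasks API entries; it assumes "blocking time" means the portion of each task beyond the 50 ms long-task threshold, which may not match how bench/run-bench.mjs computes its figure. `summarizeLongTasks` is a hypothetical helper.

```javascript
// Long Tasks API threshold: a task counts as "long" once it exceeds 50 ms.
const LONG_TASK_THRESHOLD_MS = 50;

// entries: array of { startTime, duration } in milliseconds, e.g. collected
// in the page with:
//   new PerformanceObserver((list) => entries.push(...list.getEntries()))
//     .observe({ type: "longtask", buffered: true });
function summarizeLongTasks(entries, windowMs) {
  let count = 0;
  let blockingMs = 0;
  let longestMs = 0;
  for (const { duration } of entries) {
    if (duration <= LONG_TASK_THRESHOLD_MS) continue;
    count += 1;
    // Assumption: only the time past the threshold counts as blocking.
    blockingMs += duration - LONG_TASK_THRESHOLD_MS;
    longestMs = Math.max(longestMs, duration);
  }
  return { count, blockingMs, longestMs, blockedFraction: blockingMs / windowMs };
}

// Example with made-up entries over a 2 s window.
const longTaskSummary = summarizeLongTasks(
  [
    { startTime: 0, duration: 30 },
    { startTime: 100, duration: 250 },
    { startTime: 400, duration: 1050 },
  ],
  2000,
);
console.log(longTaskSummary);
```

A few very long tasks and many medium ones call for different fixes, so the count and the longest duration are worth recording alongside the total.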
2. Reduce React reconciliation cost
The bench captures ~3500-3800 React commits per 90s window with 3 canvases — that's ~13 commits/sec/canvas. Worth investigating:
Are panel state changes (Messages, Files, $Cost, Timeline toggles) triggering full-tree reconciliation that should be scoped to one panel?
Is per-event sim ingestion fanning out into multiple React state writes when one batched write would do?
Move panel state into separate context providers so canvas-state changes don't reconcile the panel subtrees and vice versa
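If per-event ingestion does turn out to fan out into many state writes, one option is to coalesce events and commit them once per frame. This is a minimal sketch of that idea, not the project's code: `createEventBatcher`, `commit`, and `schedule` are hypothetical names, and in the app `schedule` would likely be `(cb) => requestAnimationFrame(cb)` with `commit` wrapping a single state setter.

```javascript
// Hypothetical helper: collects events and hands them to `commit` once per
// scheduled tick instead of once per event.
function createEventBatcher(commit, schedule) {
  let pending = [];
  let scheduled = false;
  return function enqueue(event) {
    pending.push(event);
    if (scheduled) return; // a flush is already queued for this tick
    scheduled = true;
    schedule(() => {
      const batch = pending;
      pending = [];
      scheduled = false;
      commit(batch); // one state write for the whole batch
    });
  };
}

// Demo with a manual scheduler standing in for requestAnimationFrame.
const queuedTicks = [];
const batchCommits = [];
const enqueueEvent = createEventBatcher(
  (batch) => batchCommits.push(batch),
  (cb) => queuedTicks.push(cb),
);
enqueueEvent("a");
enqueueEvent("b");
enqueueEvent("c");
queuedTicks.forEach((run) => run()); // simulate one frame
console.log(batchCommits.length); // 1
```

Three events produce one commit here; whether the same ratio holds in the app depends on how events cluster within a frame, which the long-task profile should show.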
3. Move sim event ingestion off the main thread
SimulationManager's shared rAF processes events for all sessions in lockstep on the main thread. Moving ingestion into a Web Worker (relay events → diff → postMessage minimal updates to React) would free the main-thread budget that's currently spent on JSON parsing + state diffing.
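The worker-side half of that pipeline could look roughly like the sketch below: parse, diff against the last snapshot, and post only what changed. The message shape `{ sessionId, state }`, the shallow diff, and the names `diffSnapshot` and `createIngestor` are all assumptions for illustration; the real relay payload and SimulationManager state are not shown in this issue.

```javascript
// Hypothetical shallow diff: returns only the keys whose values changed
// between two snapshots, or null when nothing changed.
// Note: keys removed in `next` are not reported by this sketch.
function diffSnapshot(prev, next) {
  const patch = {};
  let changed = false;
  for (const key of Object.keys(next)) {
    if (!Object.is(prev[key], next[key])) {
      patch[key] = next[key];
      changed = true;
    }
  }
  return changed ? patch : null;
}

// Worker-side handler sketch. `post` stands in for the worker's postMessage.
function createIngestor(post) {
  const snapshots = new Map(); // sessionId -> last seen state
  return function ingest(rawJson) {
    const { sessionId, state } = JSON.parse(rawJson); // parsing happens off-thread
    const patch = diffSnapshot(snapshots.get(sessionId) ?? {}, state);
    snapshots.set(sessionId, state);
    if (patch) post({ sessionId, patch }); // skip no-op updates entirely
  };
}

// Demo: three events, the last one identical to the second.
const postedPatches = [];
const ingestEvent = createIngestor((msg) => postedPatches.push(msg));
ingestEvent('{"sessionId":"s1","state":{"tick":1,"agents":3}}');
ingestEvent('{"sessionId":"s1","state":{"tick":2,"agents":3}}');
ingestEvent('{"sessionId":"s1","state":{"tick":2,"agents":3}}');
console.log(postedPatches.length); // 2
```

In a real worker, `ingest` would be called from `self.onmessage` and `post` would be `self.postMessage`; the main thread then only applies small patches.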
4. Add bench scenarios that exercise the optimizations this run missed
The current bench understates #43 and #45 because the scenario is "everything visible, all the time". Suggest adding:
Apples-to-apples + workload-matched in one run — currently we only have one or the other; baseline's 1-canvas datapoint is the only apples-to-apples cell
5. Measure Pixi's residual per-frame blit
Per #45's caveat: WebGL backend's GlRenderTargetAdaptor.postrender still does canvasSource.context2D.drawImage(contextCanvas, ...) per render target, per frame. Cost is unknown but bounded. A focused micro-bench (mock scene, vary canvas count, measure render() wall time) would tell us whether it's worth chasing WebGPU.
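A harness for that micro-bench might be as small as the sketch below. It times a caller-supplied `render(canvasCount)` function and sweeps the canvas count; `measureRender` and `sweepCanvasCounts` are hypothetical names, and the stand-in `fakeRender` only burns CPU so the example runs anywhere. With Pixi, `render` would wrap the real per-frame render of a mock scene. One caveat to keep in mind: wall time around `render()` captures CPU-side work such as the `drawImage` call, while GPU work may complete asynchronously.

```javascript
// Times `render(canvasCount)` over `frames` iterations and returns the mean
// wall time per frame in milliseconds. No warm-up pass is included.
function measureRender(render, canvasCount, frames = 200) {
  const start = performance.now();
  for (let i = 0; i < frames; i += 1) render(canvasCount);
  return (performance.now() - start) / frames;
}

// Runs measureRender for each canvas count and returns one row per count.
function sweepCanvasCounts(render, counts, frames) {
  return counts.map((canvasCount) => ({
    canvasCount,
    meanFrameMs: measureRender(render, canvasCount, frames),
  }));
}

// Stand-in workload: CPU cost grows with the canvas count.
const fakeRender = (canvasCount) => {
  let acc = 0;
  for (let i = 0; i < canvasCount * 1000; i += 1) acc += Math.sqrt(i);
  return acc;
};

const renderSweep = sweepCanvasCounts(fakeRender, [1, 3, 6], 50);
console.log(renderSweep);
```

If mean frame time grows roughly linearly with canvas count and the slope is significant next to the frame budget, the per-target blit is a plausible suspect; a flat line would suggest it is not worth chasing.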
Files left in tree (uncommitted): `bench/perf-chart.mjs`, `bench/results/perf-comparison.jsonl`, `bench/results/perf-chart.png`, `bench/results/perf-chart.html`. Worth committing the chart script to a follow-up branch if we want it as standing infrastructure.
Methodology
Checkpoints: baseline (59ccf4e); PR #31 (df3bd94, Pixi agents layer + glyph atlas); PR #36 (680cb12, complete Pixi migration + sim-manager polish); PR #43 (ed97e12, Pixi follow-ups: visibility hook + per-pixel hit-test); main / PR #45 (c20d844, shared renderer migrates to multiView)
Scenario: `concurrent` sim scenario, 3 sessions, workload-matched (3 canvases visible simultaneously)
Results data: bench/results/perf-comparison.jsonl (10 rows)
Chart script: bench/perf-chart.mjs
Suggested bench addition: `Performance.getMetrics` GPU process memory delta per measurement window, to measure #45's multiView heap savings.
🤖 Generated with Claude Code