Skip to content

perf(rendering): VSOut shrink, invisibility cull, volume step cut#170

Merged
rulkens merged 1 commit into
mainfrom
perf-points-and-volume
May 19, 2026
Merged

perf(rendering): VSOut shrink, invisibility cull, volume step cut#170
rulkens merged 1 commit into
mainfrom
perf-points-and-volume

Conversation

@rulkens
Copy link
Copy Markdown
Owner

@rulkens rulkens commented May 19, 2026

Summary

Three focused perf changes targeting the large-tier point-sprite cost plus a one-line volume optimization. Measured on M-series, large tier, fully zoomed out:

Before After
point-sprites 24 ms 17.5 ms
scalar-volume 8 ms 3.5 ms
Total 32 ms 21 ms

~11 ms / ~35% saved.

Points pipeline

  • Drop unused dMpc and camDistMpc VSOut fields (both literally unused by either fragment per their own docstrings).
  • Mark tint, intensity, axisRatio as @interpolate(flat). All three are per-instance constants — smooth interpolation was wasted rasterizer work for an identical result at every fragment.
  • Fold depthFade × schechterRatio × angularDensityWeight into intensity at the vertex stage. Removes 3 varyings and 3 per-pixel select+multiply chains; mathematically identical because all four factors are per-instance constants.
  • Cull invisible billboards (folded intensity < 0.005) via degenerate clip-space output, same pattern as the existing Malmquist gate. The selected galaxy bypasses the cull so the selection halo never vanishes on a faint pick. Pick fragment shares the vertex stage, so culled galaxies become non-pickable — acceptable since they're visually undetectable.

Volume raymarch

  • STEP_COUNT 192 → 128. Texture-cache locality at higher step counts means the per-step cost curve flattens above this knee, so dropping 1/3 of the steps recovers ~half the cost. 128 is the sweet spot before banding becomes perceptible in sparse fields.

GPU adapter

  • Request powerPreference: 'high-performance' so multi-GPU systems (older MacBook Pros with dGPU, desktops with iGPU + dGPU) pick the discrete card. No-op on single-GPU machines including Apple Silicon — shipping for cross-device benefit.

DebugPanel

  • Add scalar-volume to DISPLAY_SLOT_ORDER. The slot was instrumented and timed correctly but missing from the panel's display list because it isn't part of HDR_PASSES (it runs in encodeVolumes before the HDR loop).
  • Use the 60-frame rolling average for the header total instead of the single-frame sum. Makes the readout internally consistent with the per-row values and stops the header from jumping frame-to-frame at sub-ms scale.

Test plan

  • Reload, large tier, fully zoomed out — verify point-sprites lands around 17–18 ms (was ~24 ms).
  • Toggle Rhizome (cosmic web) on — verify scalar-volume lands around 3–4 ms (was ~8 ms) and the volume looks visually unchanged (no banding in sparse filaments).
  • Click on a faint distant galaxy — verify selection ring still appears even if the galaxy would otherwise be culled by the invisibility threshold.
  • Toggle through bias modes (None / VolumeLimited / VMax / Schechter / AngularReweight) — verify each renders identically to before (math has been hoisted into vertex stage but is unchanged).
  • Toggle Real-only mode and Highlight fallback — verify both still behave correctly (they read isFallback which is unchanged).
  • Toggle Depth fade off — verify distant galaxies stop fading (depthFade gate moved into vertex but logic preserved).
  • DebugPanel: verify scalar-volume row now appears, header total reads as 60f average, all rows still update at 60 Hz.

At large tier (M-series, fully zoomed out): 24ms point-sprites + 8ms
volume → 17.5ms point-sprites + 3.5ms volume. ~11ms total saved.

Points pipeline
- Drop unused dMpc and camDistMpc VSOut fields.
- Mark tint, intensity, axisRatio as @interpolate(flat). They're
  per-instance constants — smooth interpolation was wasted rasterizer
  work.
- Fold depthFade × schechterRatio × angularDensityWeight into intensity
  at the vertex stage. Removes three varyings and three per-pixel
  multiplies; mathematically identical because all four factors are
  per-instance constants.
- Cull invisible billboards (folded intensity < 0.005) via degenerate
  clip-space output, same pattern as the existing Malmquist gate. The
  selected galaxy bypasses the cull so the selection halo never
  vanishes on a faint pick. Pick fragment shares the vertex stage,
  so culled galaxies become non-pickable — acceptable since they're
  visually undetectable.

Volume raymarch
- STEP_COUNT 192 → 128. Texture-cache locality at higher counts gave
  sub-linear scaling, so the per-step cost curve flattens above this
  knee; 128 is the sweet spot before banding becomes perceptible in
  sparse fields.

GPU adapter
- Request powerPreference: 'high-performance' so multi-GPU systems
  pick the discrete card. No-op on single-GPU machines.

DebugPanel
- Add scalar-volume to DISPLAY_SLOT_ORDER (was instrumented but
  missing from the panel's display list).
- Use the 60-frame rolling average for the header total instead of
  the single-frame sum, so the readout is internally consistent with
  the per-row values and stops jumping frame-to-frame.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@cloudflare-workers-and-pages
Copy link
Copy Markdown

Deploying with  Cloudflare Workers  Cloudflare Workers

The latest updates on your project. Learn more about integrating Git with Workers.

Status Name Latest Commit Preview URL Updated (UTC)
✅ Deployment successful!
View logs
skymap 2846aca Commit Preview URL

Branch Preview URL
May 19 2026, 11:31 PM

@rulkens rulkens merged commit cc421a7 into main May 19, 2026
2 checks passed
@rulkens rulkens deleted the perf-points-and-volume branch May 19, 2026 23:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant