perf(rendering): VSOut shrink, invisibility cull, volume step cut#170
Merged
Conversation
At large tier (M-series, fully zoomed out): 24ms point-sprites + 8ms volume → 17.5ms point-sprites + 3.5ms volume. ~11ms total saved. Points pipeline - Drop unused dMpc and camDistMpc VSOut fields. - Mark tint, intensity, axisRatio as @interpolate(flat). They're per-instance constants — smooth interpolation was wasted rasterizer work. - Fold depthFade × schechterRatio × angularDensityWeight into intensity at the vertex stage. Removes three varyings and three per-pixel multiplies; mathematically identical because all four factors are per-instance constants. - Cull invisible billboards (folded intensity < 0.005) via degenerate clip-space output, same pattern as the existing Malmquist gate. The selected galaxy bypasses the cull so the selection halo never vanishes on a faint pick. Pick fragment shares the vertex stage, so culled galaxies become non-pickable — acceptable since they're visually undetectable. Volume raymarch - STEP_COUNT 192 → 128. Texture-cache locality at higher counts gave sub-linear scaling, so the per-step cost curve flattens above this knee; 128 is the sweet spot before banding becomes perceptible in sparse fields. GPU adapter - Request powerPreference: 'high-performance' so multi-GPU systems pick the discrete card. No-op on single-GPU machines. DebugPanel - Add scalar-volume to DISPLAY_SLOT_ORDER (was instrumented but missing from the panel's display list). - Use the 60-frame rolling average for the header total instead of the single-frame sum, so the readout is internally consistent with the per-row values and stops jumping frame-to-frame. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Deploying with
|
| Status | Name | Latest Commit | Preview URL | Updated (UTC) |
|---|---|---|---|---|
| ✅ Deployment successful! View logs |
skymap | 2846aca | Commit Preview URL Branch Preview URL |
May 19 2026, 11:31 PM |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Three focused perf changes targeting the large-tier point-sprite cost plus a one-line volume optimization. Measured on M-series, large tier, fully zoomed out:
~11 ms / ~35% saved.
Points pipeline
dMpcandcamDistMpcVSOut fields (both literally unused by either fragment per their own docstrings).tint,intensity,axisRatioas@interpolate(flat). All three are per-instance constants — smooth interpolation was wasted rasterizer work for an identical result at every fragment.depthFade × schechterRatio × angularDensityWeightintointensityat the vertex stage. Removes 3 varyings and 3 per-pixelselect+multiplychains; mathematically identical because all four factors are per-instance constants.Volume raymarch
STEP_COUNT 192 → 128. Texture-cache locality at higher step counts means the per-step cost curve flattens above this knee, so dropping 1/3 of the steps recovers ~half the cost. 128 is the sweet spot before banding becomes perceptible in sparse fields.GPU adapter
powerPreference: 'high-performance'so multi-GPU systems (older MacBook Pros with dGPU, desktops with iGPU + dGPU) pick the discrete card. No-op on single-GPU machines including Apple Silicon — shipping for cross-device benefit.DebugPanel
scalar-volumetoDISPLAY_SLOT_ORDER. The slot was instrumented and timed correctly but missing from the panel's display list because it isn't part ofHDR_PASSES(it runs inencodeVolumesbefore the HDR loop).Test plan
point-spriteslands around 17–18 ms (was ~24 ms).scalar-volumelands around 3–4 ms (was ~8 ms) and the volume looks visually unchanged (no banding in sparse filaments).Real-only modeandHighlight fallback— verify both still behave correctly (they readisFallbackwhich is unchanged).Depth fadeoff — verify distant galaxies stop fading (depthFade gate moved into vertex but logic preserved).scalar-volumerow now appears, header total reads as 60f average, all rows still update at 60 Hz.