perf(points): shrink VSOut + per-instance buffer #177

Merged
rulkens merged 8 commits into main from perf/vsout-drop-sizepx on May 20, 2026
@rulkens rulkens commented May 20, 2026

Summary

Two passes of points-pipeline cleanup, riding on the selection-ring extraction that just landed.

VSOut shrinks (varying bandwidth)

  • Drop `sizePx` (location 13). Fragment used it only for the procedural-disk crossfade. Moved smoothstep into `vs`, folded into `out.intensity`.
  • Drop `isFallback` (location 7). Used by realOnlyMode discard + magenta highlight. realOnly now culls at the vertex stage (same trick as Malmquist mode 1); magenta multiplier bakes into `out.tint`.
  • Pack `paCs + paSn` (locations 6, 15) into one `vec2` at location 6.

VSOut: 9 locations → 6, 64 B → 56 B. Fragment loses ~5 ALU ops per pixel.

Per-instance buffer shrink

  • Move `kPerZ` from per-row to per-survey `SourceUniforms`. The K-correction coefficient is a single constant per survey (SDSS=3.0, GLADE=1.0, 2MRS=0.0, …) baked into 2.5M vertex-buffer slots. Repurposed the existing `SourceUniforms._pad0` slot — no buffer-size or alignment churn.

PerVertex: 12 slots → 11, 48 B → 44 B per instance. ~10 MB saved on the GPU at the large tier.

Side benefit

realOnly-gated galaxies are now also non-pickable. Previously they were invisible but the pick fragment still wrote their identity — a pre-existing inconsistency.

Test plan

  • All 1638 tests pass; typecheck + build clean
  • Visually verified: realOnlyMode toggle, highlightFallback toggle, procedural-disk crossfade band, K-correction colours

🤖 Generated with Claude Code

Three independent VSOut cleanups in the same file pair:

- sizePx (location 13): drop. The fragment used it only to compute the
  procedural-disk crossfade alpha multiplier. All inputs are per-instance
  constants, so the smoothstep moves to the vertex stage and folds into
  out.intensity. Fragment loses a smoothstep + saturate.

- isFallback (location 7): drop. Used by realOnlyMode discard and by
  the magenta highlight tint. realOnlyMode now culls at the vertex stage
  (same trick as Malmquist mode 1), and the magenta multiplier bakes
  into out.tint. Fragment loses a per-pixel discard branch and a select.
  Side benefit: realOnly-gated galaxies are now also non-pickable, which
  fixes a pre-existing inconsistency where they were invisible but the
  pick fragment still wrote their identity.

- paCs + paSn (locations 6, 15): pack into one vec2<f32> paRotation at
  location 6. Same wire bytes, frees location 15.

Net: VSOut 9 locations -> 6, 64 B -> 56 B. Fragment loses ~5 ALU ops
per pixel; vertex picks up cheap per-instance work.
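
The hoisting argument behind the sizePx drop can be checked on the CPU. The sketch below is a TypeScript model, not the project's shader: the band limits `FADE_START_PX` and `FADE_END_PX` and the fade direction are assumptions made for illustration, since the commit does not state them. It suggests that when the smoothstep input is constant across a sprite, evaluating it once per instance and folding the result into intensity gives the same value each fragment used to compute for itself.

```typescript
// WGSL-style smoothstep: 0 at or below edge0, 1 at or above edge1, Hermite blend between.
function smoothstep(edge0: number, edge1: number, x: number): number {
  const t = Math.min(1, Math.max(0, (x - edge0) / (edge1 - edge0)));
  return t * t * (3 - 2 * t);
}

// Hypothetical crossfade band in pixels (not taken from the PR).
const FADE_START_PX = 8;
const FADE_END_PX = 16;

// Assumed direction: the point sprite fades out as the procedural disk takes over.
function crossfade(sizePx: number): number {
  return 1 - smoothstep(FADE_START_PX, FADE_END_PX, sizePx);
}

// Before: sizePx travels as a varying and every fragment repeats the same smoothstep.
function fragmentAlphaBefore(intensity: number, sizePx: number): number {
  return intensity * crossfade(sizePx);
}

// After: the vertex stage folds the crossfade into intensity once per instance...
function vertexFoldedIntensity(intensity: number, sizePx: number): number {
  return intensity * crossfade(sizePx);
}

// ...and the fragment stage reads the folded value with no extra math.
function fragmentAlphaAfter(foldedIntensity: number): number {
  return foldedIntensity;
}
```

The two paths agree because nothing in `crossfade` depends on the pixel position; a term that varied across the sprite could not be hoisted this way.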

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

cloudflare-workers-and-pages Bot commented May 20, 2026

Deploying with Cloudflare Workers

Status: ✅ Deployment successful
Name: skymap
Latest commit: 85ba2eb
Updated (UTC): May 20 2026, 03:06 AM

kPerZ is the K-correction coefficient — a single linear factor per survey
(SDSS=3.0, GLADE=1.0, 2MRS=0.0, Famous=0.0, Synthetic=3.0) baked into
every row of the per-instance vertex buffer. Per-row storage paid for 2.5M
copies of the same handful of constants.

Move kPerZ into SourceUniforms (the existing @group(2) per-survey uniform
that already carried sourceCode + 12 B padding). Free pad slot at
offset 4 absorbs the f32; no buffer-size or alignment churn. The vertex
shader reads source.kPerZ instead of p.kPerZ.
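
As a rough illustration of the layout described above, the sketch below packs a 16-byte per-survey uniform on the CPU. It assumes `sourceCode` is a `u32` at offset 0 and that the numeric survey code passed in is arbitrary; `packSourceUniforms` is a hypothetical helper, not a function from the repository. Only the offset-4 `f32` and the per-survey constants come from the commit text.

```typescript
// Assumed layout, following the commit text:
//   offset 0      sourceCode (assumed u32)
//   offset 4      kPerZ (f32, reusing the former pad slot)
//   offset 8..15  remaining padding
const SOURCE_UNIFORMS_BYTES = 16;

// Per-survey K-correction coefficients as listed in the commit message.
const K_PER_Z: Record<string, number> = {
  SDSS: 3.0,
  GLADE: 1.0,
  "2MRS": 0.0,
  Famous: 0.0,
  Synthetic: 3.0,
};

// Hypothetical helper: builds the bytes that would be uploaded to the uniform buffer.
function packSourceUniforms(sourceCode: number, kPerZ: number): ArrayBuffer {
  const buffer = new ArrayBuffer(SOURCE_UNIFORMS_BYTES);
  const view = new DataView(buffer);
  view.setUint32(0, sourceCode, true); // little-endian, as WebGPU expects
  view.setFloat32(4, kPerZ, true);     // lands in the old pad slot; size unchanged
  return buffer;
}

// Example: survey code 1 is a placeholder value for SDSS.
const sdssUniforms = packSourceUniforms(1, K_PER_Z["SDSS"]);
```

Because the float reuses existing padding, the buffer stays at 16 bytes and any bind-group layout sized for it would not need to change.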

Verified consumer graph: kPerZ as a value is only consumed by the points
pipeline. pickColourIndex's secondary caller (proceduralDiskSubsystem)
already discards the kPerZ field — only the bake site used it, and that
write now goes away.

Sentinel-colour rows (colorIndex >= 100) previously wrote kPerZ = 0;
the shader's select gates the K-correction off via the sentinel check,
so the per-row value never mattered. After the move, all rows use the
survey constant; sentinel rows still get the 1.05 substitution.

Net:
- Vertex buffer: 12 slots -> 11, 48 B -> 44 B per instance.
  At 2.5M galaxies that's ~10 MB saved on the GPU.
- One fewer per-vertex attribute fetch.
- Slot indices for axisRatio (6 -> 5), positionAngleDeg (7 -> 6),
  diameterKpc (8 -> 7), vMaxWeight (9 -> 8), schechterRatio (10 -> 9),
  angularDensityWeight (11 -> 10).
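
The slot arithmetic above can be reproduced with a small layout table. In this sketch the names of slots 0 to 4 are placeholders (the commit does not list them) and every slot is assumed to be a single 4-byte `f32`; the helpers are illustrative, not project code.

```typescript
const BYTES_PER_SLOT = 4; // assumption: one f32 per slot
const GALAXY_COUNT = 2500000;

// Placeholder names for the leading slots, which the commit does not enumerate.
const leadingSlots = ["slot0", "slot1", "slot2", "slot3", "slot4"];
const trailingSlots = [
  "axisRatio",
  "positionAngleDeg",
  "diameterKpc",
  "vMaxWeight",
  "schechterRatio",
  "angularDensityWeight",
];

const layoutBefore = [...leadingSlots, "kPerZ", ...trailingSlots];
const layoutAfter = [...leadingSlots, ...trailingSlots];

// Stride of one instance in bytes.
function strideBytes(layout: readonly string[]): number {
  return layout.length * BYTES_PER_SLOT;
}

// Slot index of a named field; throws if the field is not in the layout.
function slotOf(layout: readonly string[], name: string): number {
  const index = layout.indexOf(name);
  if (index < 0) throw new Error("unknown field: " + name);
  return index;
}

const savedBytes = (strideBytes(layoutBefore) - strideBytes(layoutAfter)) * GALAXY_COUNT;
```

With these inputs the stride drops from 48 B to 44 B and `savedBytes` comes to 10,000,000, which matches the "~10 MB" figure quoted above.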

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@rulkens rulkens changed the title from "perf(points): shrink VSOut by folding sizePx + isFallback + paCs/paSn" to "perf(points): shrink VSOut + per-instance buffer" May 20, 2026
rulkens and others added 6 commits May 20, 2026 03:53
Two more VSOut tightenings:

- Pre-compute the elliptical-mask coefficient (safeAB) at the vertex
  stage and pack into the unused alpha channel of tint. Fragment reads
  in.tint.w directly with zero per-pixel axis-ratio work — saves a
  select + max + sign-check per pixel. The axisRatio location goes
  away entirely.
- Renumber paRotation from location 6 to location 4 so the VSOut
  locations are contiguous (0..4 with no gaps).

Net: VSOut 6 locations -> 5, 56 B -> 52 B. Fragment shader keeps shrinking.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Drop the standalone intensity varying by pre-multiplying it into the
rgb channels of the per-instance colour vec4 at the vertex stage.
Fragment reads in.shaded.rgb directly with no per-pixel mul.

Renamed tint -> shaded since the field no longer carries a 'tint'
(modifier) but a fully-lit RGB premultiplied with intensity, plus the
safeAB ellipse-mask coefficient packed into .w (unchanged by this
commit).

The invisibility cull now reads the local intensity scalar; behaviour
unchanged.

Net: VSOut 5 locations -> 4, 52 B -> 48 B.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The same galaxy was rendering different hues in the points pass vs the
procedural-disk impostor pass — two divergences:

1. Points applied K-correction in the vertex shader; procDisks didn't
   apply it at all. At non-trivial redshift, hues drifted apart.

2. The unknown-band fallback was 1.05 in the points shader and 1.0 in
   the procDisk subsystem. Different ramp positions = different hue.

Move K-correction into pickColourIndex() so both consumers get the same
rest-frame value with the shared UNKNOWN_COLOUR_RAMP_POSITION fallback.
The shader drops its K-correction block (HUBBLE_DISTANCE_MPC + zRedshift
+ sentinel check + select) entirely.

The return type collapses from { colourIndex, kPerZ } | null to
number. Neither caller distinguished null from "got data": both
substituted the same fallback, so the nullable return was paying for an
option nobody exercised. Both call sites now read identically:

  const colourIndex = pickColourIndex(source, magU..magZ, dMpc);

Side effects:
- SourceUniforms.kPerZ slot reverts to padding (no longer read by GPU).
- pointRenderer.ts drops the per-survey kPerZ write.
- NO_COLOUR_SENTINEL constant goes away (1.05 is baked directly).
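
A minimal sketch of the collapsed return type, under several assumptions: the real `pickColourIndex` takes the survey and the magnitudes, whereas this simplified `pickColourIndexSketch` takes an already-derived observed colour; the value of `HUBBLE_DISTANCE_MPC`, the linear redshift estimate, and the sign of the correction are all guesses made for illustration. Only the 1.05 fallback and the idea of a linear per-survey coefficient come from the commit text.

```typescript
// Fallback ramp position; 1.05 is the value the commit says is baked in.
const UNKNOWN_COLOUR_RAMP_POSITION = 1.05;

// Assumption: c / H0 with H0 = 70 km/s/Mpc. The project's constant may differ.
const HUBBLE_DISTANCE_MPC = 4283;

// Simplified stand-in for pickColourIndex: always returns a number, never null.
function pickColourIndexSketch(
  kPerZ: number,
  observedColour: number | null,
  dMpc: number,
): number {
  if (observedColour === null) {
    // No usable band data: both consumers get the same shared fallback.
    return UNKNOWN_COLOUR_RAMP_POSITION;
  }
  // Low-redshift linear estimate of z from distance (assumption).
  const zRedshift = dMpc / HUBBLE_DISTANCE_MPC;
  // Linear K-correction toward the rest frame; the sign is an assumption.
  return observedColour - kPerZ * zRedshift;
}

// Both call sites can consume the result without a null check.
const pointsColour = pickColourIndexSketch(3.0, 0.7, 200);
const diskColour = pickColourIndexSketch(3.0, 0.7, 200);
```

Routing both passes through one function is what keeps the hues in step: the points pass and the procedural-disk pass can no longer apply different corrections or different fallbacks.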

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The 4× thumbnail-footprint padding + 30-kpc synthetic-fallback floor +
kpc->Mpc unit conversion was inlined at three sites:

- buildPointInterleavedBuffer (points bake, was kpc — shader converted)
- proceduralDiskSubsystem (full-extent in Mpc)
- texturedImpostorSubsystem (full-extent in Mpc)

A change to any of those constants had to land in all three in lockstep.
Centralise into src/utils/galaxySize.ts as paddedRadiusMpc(diameterKpc).
The two subsystems multiply by 2 at the call site for their full-quad-
extent convention (vertex shader halves at corner expansion); the points
bake uses the helper output directly as half-extent.
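
One way the centralised helper might look is sketched below. The 4× padding, the 30-kpc fallback, and the kpc-to-Mpc conversion come from the commit text; treating the "floor" as a substitution for missing, zero, or non-finite diameters is an assumption (a `Math.max` clamp is the other plausible reading), and the constant names are invented for the example.

```typescript
const FOOTPRINT_PADDING = 4;      // 4x thumbnail-footprint padding
const FALLBACK_DIAMETER_KPC = 30; // 30-kpc synthetic-fallback floor
const KPC_PER_MPC = 1000;

// Padded half-extent in Mpc for a catalog diameter given in kpc.
function paddedRadiusMpc(diameterKpc: number): number {
  // Assumption: unusable diameters fall back to 30 kpc.
  const safeDiameterKpc =
    isFinite(diameterKpc) && diameterKpc > 0 ? diameterKpc : FALLBACK_DIAMETER_KPC;
  const paddedDiameterKpc = safeDiameterKpc * FOOTPRINT_PADDING;
  return paddedDiameterKpc / 2 / KPC_PER_MPC;
}

// Points bake: the helper output is used directly as the half-extent.
const pointHalfExtentMpc = paddedRadiusMpc(50);

// Disk subsystems: doubled at the call site for the full-quad-extent convention.
const diskFullExtentMpc = 2 * paddedRadiusMpc(50);
```

For a 50-kpc galaxy this gives a half-extent of about 0.1 Mpc, the same value as the `* 2 / 1000` conversion the points shader previously applied to the raw diameter.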

While in the neighbourhood, switch the points pipeline to Mpc units to
match every other shader:

- Vertex buffer slot 7 was raw diameterKpc; shader applied
  '* 2 / 1000' to convert. Now pre-baked as padded radius in Mpc.
- PerVertex field renamed diameterKpc -> radiusMpc.
- Shader drops the safeDiameterKpc select + GALAXY_RADIUS_MPC compute
  and reads p.radiusMpc directly.

Raw cloud.diameterKpc (the catalog's source-of-truth in kpc) is
unchanged — only the GPU interleaved buffer's slot semantics shifted.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The textured-galaxy-thumbnail pipeline had inconsistent names along its
chain: shaders/disks/ (folder), texturedDiskRenderer.ts (GPU consumer),
texturedImpostorSubsystem.ts (engine driver). 'Impostor' is legitimate
graphics jargon for a billboard-as-3D-approximation, but the three-way
naming mismatch obscured the relationship between the layers.

Rename:
- shaders/disks/ -> shaders/texturedDisks/
  Parallels the existing shaders/proceduralDisks/ sibling.
- texturedImpostorSubsystem -> texturedDiskSubsystem
  Aligns with texturedDiskRenderer.ts and texturedDisks/ shaders.

All identifiers (PascalCase + camelCase + plural field name) renamed
across 32 files. WESL imports updated. Stale 'disks.wesl' cross-
references in lib/* shader comments cleaned up.

No behaviour change.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sweep of the leftovers the sed didn't catch (compound terms like
'textured-impostor' separator-style, plus historical narrative
comments referring to the pre-rename layout). Generic uses of
'impostor' as a graphics term (e.g. proceduralDisks docblock
describing what texturedDisks IS) are left intact — those are
legitimate jargon, not subsystem references.

While in the neighbourhood, trim a handful of historical comments
('post-split', 'Task 11/12', 'legacy textured-impostors slot',
'2026-05-18 quad-removal') per the project's comment-style
convention against history notes in code.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@rulkens rulkens merged commit b2feafb into main May 20, 2026
2 checks passed
@rulkens rulkens deleted the perf/vsout-drop-sizepx branch May 20, 2026 03:07