Skip to content

renderer/directx12: chunk image uploads into 8 MiB row-bands#386

Merged
deblasis merged 6 commits into
windowsfrom
kitten-into-wintty/dx12-chunked-uploads
May 20, 2026
Merged

renderer/directx12: chunk image uploads into 8 MiB row-bands#386
deblasis merged 6 commits into
windowsfrom
kitten-into-wintty/dx12-chunked-uploads

Conversation

@deblasis
Copy link
Copy Markdown
Owner

Summary

  • Split Texture.uploadRegion into ~8 MiB row-bands so each CopyTextureRegion allocates a smaller UPLOAD-heap staging buffer instead of one exact-fit allocation for the whole image.
  • pending_staging becomes a std.ArrayListUnmanaged(*ID3D12Resource) backed by std.heap.c_allocator; all bands released together on the next replaceRegion or in deinit. The deinit value-receiver signature stays compatible with existing call sites (Image switch captures, front_texture/back_texture field deinits).
  • Adds rowsPerBand + Bands pure helpers with 9 unit tests (including the exact-fit boundary case).
  • Adds a 5-line comment near prepImage (image.zig:566) referencing the 65 MiB multipart cap.
  • Documents the partial-write semantics on replaceRegion: if a later band's staging/Map fails, earlier bands' CopyTextureRegion calls have already been recorded and will execute. Texture.init's errdefer Releases the destination so partial writes are harmless; replaceRegion's destination survives, so a multi-band atlas update could leave a mix of new and previous rows until the next call (acceptable for atlases).

Followups #2 + #5 from project_dx12_image_upload_audit.md. Followup #3 (per-frame upload-budget guard) layers on this and is the natural next PR.

Originally opened as #385 (stacked on #384); reopened against windows after #384 squash-merged and auto-closed #385.

ghostty-reviewer feedback folded in (last commit)

  • Trimmed deinit's value-receiver comment to lead with the cross-backend signature reason.
  • Documented partial-write semantics on replaceRegion.
  • Added a comment near the band loop noting the log-level asymmetry (null guards = warn; createStagingBuffer/Map failures = err).
  • Flagged the y + band.start_row arithmetic at CopyTextureRegion as the iterator→DX12 integration point not covered by pure unit tests.
  • Added the rowsPerBand exact-fit boundary test (row == budget).

Test plan

  • zig build test passes on Windows
  • All rowsPerBand + Bands tests pass (9 total)
  • Manual smoke: large kitty/iTerm2 image transmit via existing probes, confirm rendering unchanged

🤖 Generated with Claude Code

deblasis added 6 commits May 20, 2026 07:45
Pure helper that returns how many source rows fit in a single upload
band, with a min-1 floor so a single very-wide row still makes progress.
Used by the upcoming chunked uploadRegion to size each staging buffer
at ~8 MiB instead of one buffer for the full image.

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>
Yields (start_row, row_count) tuples for splitting an upload region into
row-bands. Last band is truncated. Pure data, unit-tested, used by the
upcoming uploadRegion refactor.

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>
Replace the single ?*ID3D12Resource with ArrayListUnmanaged backed by
c_allocator so a chunked upload can keep all of its band staging buffers
alive until the GPU finishes the CopyTextureRegion calls. deinit, the
replaceRegion prelude, and the existing single-buffer uploadRegion line
all migrate to the list-based shape; behavior is unchanged for now
because uploadRegion still does a single append per call. The band-loop
lands in the next commit.

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>
Large image uploads previously allocated a single exact-fit UPLOAD-heap
staging buffer for the whole region (e.g. ~48 MiB for a 4096x3072 RGBA
image after the iTerm2 multipart cap was raised to 65 MiB). Split into
~8 MiB row-bands so each CopyTextureRegion uses a smaller staging
allocation, reducing per-call heap pressure under fragmentation.

Bands are computed via the rowsPerBand+Bands helpers. y + start_row
shifts each band's destination. All bands' staging buffers stay alive
until the next replaceRegion or deinit -- the frame fence still gates
release in beginFrame. No behavior change for sub-band uploads (atlas
updates, render targets, small images); they still allocate a single
band-sized staging buffer.

Followup to #382/#383. Documented in project_dx12_image_upload_audit.md.

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>
Tie the alloc.dupe to the kitty graphics / iTerm2 multipart 65 MiB
payload cap and point at the DX12 row-band chunking that handles the
staging-heap pressure. Closes audit followup #5 from
project_dx12_image_upload_audit.md.

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>
Addresses ghostty-reviewer feedback on #385 (will be squash-merged):

* Trim deinit's value-receiver comment to lead with the cross-backend
  signature reason; drop the lecture on Zig value-parameter mechanics.
* Document the partial-write semantics on replaceRegion: if a later
  band's staging/Map fails, earlier bands' CopyTextureRegion calls have
  already been recorded and will execute. Texture.init's errdefer
  Releases the destination so the partial writes are harmless;
  replaceRegion's destination survives, so a multi-band atlas update
  could leave a mix of new+previous rows until the next call.
* Note the log-level asymmetry near the band loop: null guards stay at
  warn (defensive checks under correct callers), createStagingBuffer
  and Map failures stay at err (real GPU/driver failures with context).
* Flag the y + band.start_row arithmetic at the CopyTextureRegion call
  as the integration point between the pure Bands iterator and DX12
  that no unit test exercises -- a bug there shows up as vertically
  shifted/overlapping bands in a multi-band image.
* Add an exact-fit boundary test for rowsPerBand (row == budget).

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>
@deblasis deblasis merged commit 1cc11cf into windows May 20, 2026
99 checks passed
@deblasis deblasis deleted the kitten-into-wintty/dx12-chunked-uploads branch May 20, 2026 06:20
deblasis added a commit that referenced this pull request May 20, 2026
* renderer/directx12: track running pending_staging_bytes

Mirror the pending_staging list contents with a u64 running sum so
callers can cheaply query a texture's current UPLOAD-heap pressure
without COM-querying each ID3D12Resource. Wires the increment in
uploadRegion's band loop (per band_size) and the reset in
replaceRegion's release loop.

No behavior change yet; the budget gate that consumes this lands
in the next commit.

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>

* renderer/image: add imageStagingBytes helper

Pure helper that computes the UPLOAD-heap staging-buffer size for an
RGBA image of given dimensions. Mirrors Texture.uploadRegion's row-
pitch math (post-swizzle bpp=4, 256-byte alignment) so State.upload
can size-check images before delegating to the DX12 layer.

Also introduces the budget constant TEXTURE_UPLOAD_BUDGET_DEFAULT_BYTES
that the upcoming State.upload guard will use as its default ceiling.

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>

* renderer/image: add wouldExceedBudget boundary check

Trivial pure helper extracted for unit-testing the budget-gate
boundary semantics: strict greater-than so an exact-fit (in_flight +
est == budget) is allowed. Used by the upcoming State.upload guard.

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>

* renderer/image: add Image.pendingStagingBytes + estimate

Two Image union methods that surface the staging-heap accounting State
will need to make budget decisions:

* pendingStagingBytes reads texture.pending_staging_bytes for variants
  that hold a texture; returns 0 for .pending/.unload_pending.
* estimatedUploadStagingBytes uses imageStagingBytes(w, h) for variants
  that will trigger a new upload; returns 0 for already-uploaded or
  unload-marked variants.

State.upload will sum the former across all images to get current
in-flight bytes, and add the latter to budget-check pending uploads.

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>

* renderer/image: gate State.upload with per-frame budget

Cap peak DX12 UPLOAD-heap staging pressure regardless of how many
kitty/iTerm2 images arrive at once. State.upload now computes the
current in-flight staging bytes across all already-uploaded textures
at the start of the call, then walks the image map gating each
pending image against the budget:

* Image whose estimatedUploadStagingBytes would push the running
  in-flight total past upload_budget_bytes is skipped (logged at
  debug); it stays .pending and retries on the next frame.
* Successful upload increments the running total so subsequent
  images in the same frame see the updated pressure.
* Failed upload (markForUnload path) is unchanged in semantics; only
  added a continue to skip the increment.
* Unloaded image decrements the running total via saturating sub.

The deferral path is NOT counted as failure -- the bool return still
only tracks markForUnload failures. The caller (generic.zig:1667)
discards the value either way.

Closes audit followup #3 (per-frame upload-budget guard) from
project_dx12_image_upload_audit.md, layering on PR #386's chunked
upload infrastructure.

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>

* renderer: fold review feedback into upload-budget guard

Addresses ghostty-reviewer findings on #388 before squash-merge:

* Critical: image.zig is renderer-generic. The previous version
  unconditionally read Texture.pending_staging_bytes (DX12-only),
  breaking Metal/OpenGL builds at compile time. Now both Image
  staging helpers and State.upload comptime-gate the budget logic
  on @Hasfield(Texture, "pending_staging_bytes"); non-DX12 backends
  see the original unguarded upload loop with zero overhead.
* Verified by `zig build test -Drenderer=opengl` (passes) in
  addition to the default DX12 build.

* imageStagingBytes() now does u64 arithmetic throughout. The u32
  intermediate could in theory wrap on absurd widths (width >= 2^30);
  not reachable today but cheap to harden. Doc-comment clarified that
  the bpp=4 assumption is the DX12 post-swizzle destination.

* Cross-module drift assert: directx12/Texture.zig
  TEXTURE_DATA_PITCH_ALIGNMENT promoted to pub, and image.zig adds a
  comptime block that asserts the two constants match on the DX12
  build (gracefully skipped via @hasDecl on other renderers).

* State.upload restructured to single-pass-per-concern: pass 1
  sweeps unloads + accumulates in-flight from survivors; pass 2
  uploads pending images with optional budget gating. Eliminates the
  saturating-subtract on unload (reviewer flagged: it could mask a
  real bookkeeping bug) because unloaded images are gone before the
  in-flight total is finalized. Dropped the `est > 0` short-circuit
  because wouldExceedBudget(0, _, _) handles it identically. The
  pre-pass comment trimmed.

* DX12-only tests gated on @Hasfield via the if/else-pattern so
  Metal/OpenGL builds analyze only the SkipZigTest branch and never
  see the pending_staging_bytes field access or the `Texture{}`
  default-init (OpenGL's Texture has no field defaults).

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>
deblasis added a commit that referenced this pull request May 21, 2026
* renderer/directx12: add rowsPerBand helper for chunked uploads

Pure helper that returns how many source rows fit in a single upload
band, with a min-1 floor so a single very-wide row still makes progress.
Used by the upcoming chunked uploadRegion to size each staging buffer
at ~8 MiB instead of one buffer for the full image.

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>

* renderer/directx12: add Bands iterator for chunked uploads

Yields (start_row, row_count) tuples for splitting an upload region into
row-bands. Last band is truncated. Pure data, unit-tested, used by the
upcoming uploadRegion refactor.

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>

* renderer/directx12: pending_staging holds a list of buffers

Replace the single ?*ID3D12Resource with ArrayListUnmanaged backed by
c_allocator so a chunked upload can keep all of its band staging buffers
alive until the GPU finishes the CopyTextureRegion calls. deinit, the
replaceRegion prelude, and the existing single-buffer uploadRegion line
all migrate to the list-based shape; behavior is unchanged for now
because uploadRegion still does a single append per call. The band-loop
lands in the next commit.

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>

* renderer/directx12: chunk uploadRegion into 8 MiB row-bands

Large image uploads previously allocated a single exact-fit UPLOAD-heap
staging buffer for the whole region (e.g. ~48 MiB for a 4096x3072 RGBA
image after the iTerm2 multipart cap was raised to 65 MiB). Split into
~8 MiB row-bands so each CopyTextureRegion uses a smaller staging
allocation, reducing per-call heap pressure under fragmentation.

Bands are computed via the rowsPerBand+Bands helpers. y + start_row
shifts each band's destination. All bands' staging buffers stay alive
until the next replaceRegion or deinit -- the frame fence still gates
release in beginFrame. No behavior change for sub-band uploads (atlas
updates, render targets, small images); they still allocate a single
band-sized staging buffer.

Followup to #382/#383. Documented in project_dx12_image_upload_audit.md.

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>

* renderer/image: note the image-size ceiling at prepImage

Tie the alloc.dupe to the kitty graphics / iTerm2 multipart 65 MiB
payload cap and point at the DX12 row-band chunking that handles the
staging-heap pressure. Closes audit followup #5 from
project_dx12_image_upload_audit.md.

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>

* renderer/directx12: fold review feedback into chunked uploads

Addresses ghostty-reviewer feedback on #385 (will be squash-merged):

* Trim deinit's value-receiver comment to lead with the cross-backend
  signature reason; drop the lecture on Zig value-parameter mechanics.
* Document the partial-write semantics on replaceRegion: if a later
  band's staging/Map fails, earlier bands' CopyTextureRegion calls have
  already been recorded and will execute. Texture.init's errdefer
  Releases the destination so the partial writes are harmless;
  replaceRegion's destination survives, so a multi-band atlas update
  could leave a mix of new+previous rows until the next call.
* Note the log-level asymmetry near the band loop: null guards stay at
  warn (defensive checks under correct callers), createStagingBuffer
  and Map failures stay at err (real GPU/driver failures with context).
* Flag the y + band.start_row arithmetic at the CopyTextureRegion call
  as the integration point between the pure Bands iterator and DX12
  that no unit test exercises -- a bug there shows up as vertically
  shifted/overlapping bands in a multi-band image.
* Add an exact-fit boundary test for rowsPerBand (row == budget).

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>
deblasis added a commit that referenced this pull request May 21, 2026
* renderer/directx12: track running pending_staging_bytes

Mirror the pending_staging list contents with a u64 running sum so
callers can cheaply query a texture's current UPLOAD-heap pressure
without COM-querying each ID3D12Resource. Wires the increment in
uploadRegion's band loop (per band_size) and the reset in
replaceRegion's release loop.

No behavior change yet; the budget gate that consumes this lands
in the next commit.

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>

* renderer/image: add imageStagingBytes helper

Pure helper that computes the UPLOAD-heap staging-buffer size for an
RGBA image of given dimensions. Mirrors Texture.uploadRegion's row-
pitch math (post-swizzle bpp=4, 256-byte alignment) so State.upload
can size-check images before delegating to the DX12 layer.

Also introduces the budget constant TEXTURE_UPLOAD_BUDGET_DEFAULT_BYTES
that the upcoming State.upload guard will use as its default ceiling.

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>

* renderer/image: add wouldExceedBudget boundary check

Trivial pure helper extracted for unit-testing the budget-gate
boundary semantics: strict greater-than so an exact-fit (in_flight +
est == budget) is allowed. Used by the upcoming State.upload guard.

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>

* renderer/image: add Image.pendingStagingBytes + estimate

Two Image union methods that surface the staging-heap accounting State
will need to make budget decisions:

* pendingStagingBytes reads texture.pending_staging_bytes for variants
  that hold a texture; returns 0 for .pending/.unload_pending.
* estimatedUploadStagingBytes uses imageStagingBytes(w, h) for variants
  that will trigger a new upload; returns 0 for already-uploaded or
  unload-marked variants.

State.upload will sum the former across all images to get current
in-flight bytes, and add the latter to budget-check pending uploads.

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>

* renderer/image: gate State.upload with per-frame budget

Cap peak DX12 UPLOAD-heap staging pressure regardless of how many
kitty/iTerm2 images arrive at once. State.upload now computes the
current in-flight staging bytes across all already-uploaded textures
at the start of the call, then walks the image map gating each
pending image against the budget:

* Image whose estimatedUploadStagingBytes would push the running
  in-flight total past upload_budget_bytes is skipped (logged at
  debug); it stays .pending and retries on the next frame.
* Successful upload increments the running total so subsequent
  images in the same frame see the updated pressure.
* Failed upload (markForUnload path) is unchanged in semantics; only
  added a continue to skip the increment.
* Unloaded image decrements the running total via saturating sub.

The deferral path is NOT counted as failure -- the bool return still
only tracks markForUnload failures. The caller (generic.zig:1667)
discards the value either way.

Closes audit followup #3 (per-frame upload-budget guard) from
project_dx12_image_upload_audit.md, layering on PR #386's chunked
upload infrastructure.

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>

* renderer: fold review feedback into upload-budget guard

Addresses ghostty-reviewer findings on #388 before squash-merge:

* Critical: image.zig is renderer-generic. The previous version
  unconditionally read Texture.pending_staging_bytes (DX12-only),
  breaking Metal/OpenGL builds at compile time. Now both Image
  staging helpers and State.upload comptime-gate the budget logic
  on @Hasfield(Texture, "pending_staging_bytes"); non-DX12 backends
  see the original unguarded upload loop with zero overhead.
* Verified by `zig build test -Drenderer=opengl` (passes) in
  addition to the default DX12 build.

* imageStagingBytes() now does u64 arithmetic throughout. The u32
  intermediate could in theory wrap on absurd widths (width >= 2^30);
  not reachable today but cheap to harden. Doc-comment clarified that
  the bpp=4 assumption is the DX12 post-swizzle destination.

* Cross-module drift assert: directx12/Texture.zig
  TEXTURE_DATA_PITCH_ALIGNMENT promoted to pub, and image.zig adds a
  comptime block that asserts the two constants match on the DX12
  build (gracefully skipped via @hasDecl on other renderers).

* State.upload restructured to single-pass-per-concern: pass 1
  sweeps unloads + accumulates in-flight from survivors; pass 2
  uploads pending images with optional budget gating. Eliminates the
  saturating-subtract on unload (reviewer flagged: it could mask a
  real bookkeeping bug) because unloaded images are gone before the
  in-flight total is finalized. Dropped the `est > 0` short-circuit
  because wouldExceedBudget(0, _, _) handles it identically. The
  pre-pass comment trimmed.

* DX12-only tests gated on @Hasfield via the if/else-pattern so
  Metal/OpenGL builds analyze only the SkipZigTest branch and never
  see the pending_staging_bytes field access or the `Texture{}`
  default-init (OpenGL's Texture has no field defaults).

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant