Skip to content

renderer/directx12: chunk image uploads into 8 MiB row-bands#385

Closed
deblasis wants to merge 6 commits into
kitten-into-wintty/dx12-sentinel-alignfrom
kitten-into-wintty/dx12-chunked-uploads
Closed

renderer/directx12: chunk image uploads into 8 MiB row-bands#385
deblasis wants to merge 6 commits into
kitten-into-wintty/dx12-sentinel-alignfrom
kitten-into-wintty/dx12-chunked-uploads

Conversation

@deblasis
Copy link
Copy Markdown
Owner

Summary

  • Split Texture.uploadRegion into ~8 MiB row-bands so each CopyTextureRegion allocates a smaller UPLOAD-heap staging buffer instead of one exact-fit allocation for the whole image.
  • pending_staging becomes a std.ArrayListUnmanaged(*ID3D12Resource) backed by std.heap.c_allocator; all bands released together on the next replaceRegion or in deinit. The deinit value-receiver signature stays compatible with existing call sites (Image switch captures, front_texture/back_texture field deinits).
  • Adds rowsPerBand + Bands pure helpers with 8 unit tests.
  • Adds a one-line comment near prepImage (image.zig:566) referencing the 65 MiB multipart cap.

Followups #2 + #5 from project_dx12_image_upload_audit.md. Followup #3 (per-frame upload-budget guard) layers on this and is the natural next PR.

Stacked on #384 — that PR fixes the pre-existing pointer-alignment + log.err-in-tests issues in #382's null-guard tests so the Texture test suite can actually compile and run on Windows. Merge order: #384 first, then this.

Test plan

  • zig build test passes on Windows
  • zig build (the libghostty binary) fails with src\lib\union.zig:90:27: error: extern unions cannot contain fields of type 'terminal.osc.Command.Iterm2ImageTransmit'. Reproduced clean on origin/windows with no local changes — this is a pre-existing extern-union compatibility issue introduced by terminal/osc: honor iTerm2 OSC 1337 File= geometry hints #377/terminal/osc: support iTerm2 OSC 1337 multipart File= transfers #378's iTerm2 transmit struct that surfaces only when zig build (not zig build test) reaches the combine_archives step. Tracked separately; not introduced or worsened by this PR.
  • Manual smoke: large kitty/iTerm2 image transmit via the existing probes, confirm rendering unchanged.

🤖 Generated with Claude Code

@deblasis deblasis force-pushed the kitten-into-wintty/dx12-sentinel-align branch from 911060e to 7bce6b3 Compare May 20, 2026 05:28
deblasis added 6 commits May 20, 2026 07:28
Pure helper that returns how many source rows fit in a single upload
band, with a min-1 floor so a single very-wide row still makes progress.
Used by the upcoming chunked uploadRegion to size each staging buffer
at ~8 MiB instead of one buffer for the full image.

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>
Yields (start_row, row_count) tuples for splitting an upload region into
row-bands. Last band is truncated. Pure data, unit-tested, used by the
upcoming uploadRegion refactor.

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>
Replace the single ?*ID3D12Resource with ArrayListUnmanaged backed by
c_allocator so a chunked upload can keep all of its band staging buffers
alive until the GPU finishes the CopyTextureRegion calls. deinit, the
replaceRegion prelude, and the existing single-buffer uploadRegion line
all migrate to the list-based shape; behavior is unchanged for now
because uploadRegion still does a single append per call. The band-loop
lands in the next commit.

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>
Large image uploads previously allocated a single exact-fit UPLOAD-heap
staging buffer for the whole region (e.g. ~48 MiB for a 4096x3072 RGBA
image after the iTerm2 multipart cap was raised to 65 MiB). Split into
~8 MiB row-bands so each CopyTextureRegion uses a smaller staging
allocation, reducing per-call heap pressure under fragmentation.

Bands are computed via the rowsPerBand+Bands helpers. y + start_row
shifts each band's destination. All bands' staging buffers stay alive
until the next replaceRegion or deinit -- the frame fence still gates
release in beginFrame. No behavior change for sub-band uploads (atlas
updates, render targets, small images); they still allocate a single
band-sized staging buffer.

Followup to #382/#383. Documented in project_dx12_image_upload_audit.md.

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>
Tie the alloc.dupe to the kitty graphics / iTerm2 multipart 65 MiB
payload cap and point at the DX12 row-band chunking that handles the
staging-heap pressure. Closes audit followup #5 from
project_dx12_image_upload_audit.md.

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>
Addresses ghostty-reviewer feedback on #385 (will be squash-merged):

* Trim deinit's value-receiver comment to lead with the cross-backend
  signature reason; drop the lecture on Zig value-parameter mechanics.
* Document the partial-write semantics on replaceRegion: if a later
  band's staging/Map fails, earlier bands' CopyTextureRegion calls have
  already been recorded and will execute. Texture.init's errdefer
  Releases the destination so the partial writes are harmless;
  replaceRegion's destination survives, so a multi-band atlas update
  could leave a mix of new+previous rows until the next call.
* Note the log-level asymmetry near the band loop: null guards stay at
  warn (defensive checks under correct callers), createStagingBuffer
  and Map failures stay at err (real GPU/driver failures with context).
* Flag the y + band.start_row arithmetic at the CopyTextureRegion call
  as the integration point between the pure Bands iterator and DX12
  that no unit test exercises -- a bug there shows up as vertically
  shifted/overlapping bands in a multi-band image.
* Add an exact-fit boundary test for rowsPerBand (row == budget).

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>
@deblasis deblasis force-pushed the kitten-into-wintty/dx12-chunked-uploads branch from 2fb0330 to d4ec33c Compare May 20, 2026 05:44
@deblasis deblasis deleted the branch kitten-into-wintty/dx12-sentinel-align May 20, 2026 05:44
@deblasis deblasis closed this May 20, 2026
deblasis added a commit that referenced this pull request May 20, 2026
Addresses ghostty-reviewer feedback on #385 (will be squash-merged):

* Trim deinit's value-receiver comment to lead with the cross-backend
  signature reason; drop the lecture on Zig value-parameter mechanics.
* Document the partial-write semantics on replaceRegion: if a later
  band's staging/Map fails, earlier bands' CopyTextureRegion calls have
  already been recorded and will execute. Texture.init's errdefer
  Releases the destination so the partial writes are harmless;
  replaceRegion's destination survives, so a multi-band atlas update
  could leave a mix of new+previous rows until the next call.
* Note the log-level asymmetry near the band loop: null guards stay at
  warn (defensive checks under correct callers), createStagingBuffer
  and Map failures stay at err (real GPU/driver failures with context).
* Flag the y + band.start_row arithmetic at the CopyTextureRegion call
  as the integration point between the pure Bands iterator and DX12
  that no unit test exercises -- a bug there shows up as vertically
  shifted/overlapping bands in a multi-band image.
* Add an exact-fit boundary test for rowsPerBand (row == budget).

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>
deblasis added a commit that referenced this pull request May 20, 2026
* renderer/directx12: add rowsPerBand helper for chunked uploads

Pure helper that returns how many source rows fit in a single upload
band, with a min-1 floor so a single very-wide row still makes progress.
Used by the upcoming chunked uploadRegion to size each staging buffer
at ~8 MiB instead of one buffer for the full image.

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>

* renderer/directx12: add Bands iterator for chunked uploads

Yields (start_row, row_count) tuples for splitting an upload region into
row-bands. Last band is truncated. Pure data, unit-tested, used by the
upcoming uploadRegion refactor.

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>

* renderer/directx12: pending_staging holds a list of buffers

Replace the single ?*ID3D12Resource with ArrayListUnmanaged backed by
c_allocator so a chunked upload can keep all of its band staging buffers
alive until the GPU finishes the CopyTextureRegion calls. deinit, the
replaceRegion prelude, and the existing single-buffer uploadRegion line
all migrate to the list-based shape; behavior is unchanged for now
because uploadRegion still does a single append per call. The band-loop
lands in the next commit.

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>

* renderer/directx12: chunk uploadRegion into 8 MiB row-bands

Large image uploads previously allocated a single exact-fit UPLOAD-heap
staging buffer for the whole region (e.g. ~48 MiB for a 4096x3072 RGBA
image after the iTerm2 multipart cap was raised to 65 MiB). Split into
~8 MiB row-bands so each CopyTextureRegion uses a smaller staging
allocation, reducing per-call heap pressure under fragmentation.

Bands are computed via the rowsPerBand+Bands helpers. y + start_row
shifts each band's destination. All bands' staging buffers stay alive
until the next replaceRegion or deinit -- the frame fence still gates
release in beginFrame. No behavior change for sub-band uploads (atlas
updates, render targets, small images); they still allocate a single
band-sized staging buffer.

Followup to #382/#383. Documented in project_dx12_image_upload_audit.md.

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>

* renderer/image: note the image-size ceiling at prepImage

Tie the alloc.dupe to the kitty graphics / iTerm2 multipart 65 MiB
payload cap and point at the DX12 row-band chunking that handles the
staging-heap pressure. Closes audit followup #5 from
project_dx12_image_upload_audit.md.

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>

* renderer/directx12: fold review feedback into chunked uploads

Addresses ghostty-reviewer feedback on #385 (will be squash-merged):

* Trim deinit's value-receiver comment to lead with the cross-backend
  signature reason; drop the lecture on Zig value-parameter mechanics.
* Document the partial-write semantics on replaceRegion: if a later
  band's staging/Map fails, earlier bands' CopyTextureRegion calls have
  already been recorded and will execute. Texture.init's errdefer
  Releases the destination so the partial writes are harmless;
  replaceRegion's destination survives, so a multi-band atlas update
  could leave a mix of new+previous rows until the next call.
* Note the log-level asymmetry near the band loop: null guards stay at
  warn (defensive checks under correct callers), createStagingBuffer
  and Map failures stay at err (real GPU/driver failures with context).
* Flag the y + band.start_row arithmetic at the CopyTextureRegion call
  as the integration point between the pure Bands iterator and DX12
  that no unit test exercises -- a bug there shows up as vertically
  shifted/overlapping bands in a multi-band image.
* Add an exact-fit boundary test for rowsPerBand (row == budget).

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>
deblasis added a commit that referenced this pull request May 21, 2026
* renderer/directx12: add rowsPerBand helper for chunked uploads

Pure helper that returns how many source rows fit in a single upload
band, with a min-1 floor so a single very-wide row still makes progress.
Used by the upcoming chunked uploadRegion to size each staging buffer
at ~8 MiB instead of one buffer for the full image.

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>

* renderer/directx12: add Bands iterator for chunked uploads

Yields (start_row, row_count) tuples for splitting an upload region into
row-bands. Last band is truncated. Pure data, unit-tested, used by the
upcoming uploadRegion refactor.

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>

* renderer/directx12: pending_staging holds a list of buffers

Replace the single ?*ID3D12Resource with ArrayListUnmanaged backed by
c_allocator so a chunked upload can keep all of its band staging buffers
alive until the GPU finishes the CopyTextureRegion calls. deinit, the
replaceRegion prelude, and the existing single-buffer uploadRegion line
all migrate to the list-based shape; behavior is unchanged for now
because uploadRegion still does a single append per call. The band-loop
lands in the next commit.

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>

* renderer/directx12: chunk uploadRegion into 8 MiB row-bands

Large image uploads previously allocated a single exact-fit UPLOAD-heap
staging buffer for the whole region (e.g. ~48 MiB for a 4096x3072 RGBA
image after the iTerm2 multipart cap was raised to 65 MiB). Split into
~8 MiB row-bands so each CopyTextureRegion uses a smaller staging
allocation, reducing per-call heap pressure under fragmentation.

Bands are computed via the rowsPerBand+Bands helpers. y + start_row
shifts each band's destination. All bands' staging buffers stay alive
until the next replaceRegion or deinit -- the frame fence still gates
release in beginFrame. No behavior change for sub-band uploads (atlas
updates, render targets, small images); they still allocate a single
band-sized staging buffer.

Followup to #382/#383. Documented in project_dx12_image_upload_audit.md.

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>

* renderer/image: note the image-size ceiling at prepImage

Tie the alloc.dupe to the kitty graphics / iTerm2 multipart 65 MiB
payload cap and point at the DX12 row-band chunking that handles the
staging-heap pressure. Closes audit followup #5 from
project_dx12_image_upload_audit.md.

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>

* renderer/directx12: fold review feedback into chunked uploads

Addresses ghostty-reviewer feedback on #385 (will be squash-merged):

* Trim deinit's value-receiver comment to lead with the cross-backend
  signature reason; drop the lecture on Zig value-parameter mechanics.
* Document the partial-write semantics on replaceRegion: if a later
  band's staging/Map fails, earlier bands' CopyTextureRegion calls have
  already been recorded and will execute. Texture.init's errdefer
  Releases the destination so the partial writes are harmless;
  replaceRegion's destination survives, so a multi-band atlas update
  could leave a mix of new+previous rows until the next call.
* Note the log-level asymmetry near the band loop: null guards stay at
  warn (defensive checks under correct callers), createStagingBuffer
  and Map failures stay at err (real GPU/driver failures with context).
* Flag the y + band.start_row arithmetic at the CopyTextureRegion call
  as the integration point between the pure Bands iterator and DX12
  that no unit test exercises -- a bug there shows up as vertically
  shifted/overlapping bands in a multi-band image.
* Add an exact-fit boundary test for rowsPerBand (row == budget).

Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant