renderer/directx12: chunk image uploads into 8 MiB row-bands#385
Closed
deblasis wants to merge 6 commits into
Closed
renderer/directx12: chunk image uploads into 8 MiB row-bands#385deblasis wants to merge 6 commits into
deblasis wants to merge 6 commits into
Conversation
911060e to
7bce6b3
Compare
Pure helper that returns how many source rows fit in a single upload band, with a min-1 floor so a single very-wide row still makes progress. Used by the upcoming chunked uploadRegion to size each staging buffer at ~8 MiB instead of one buffer for the full image. Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>
Yields (start_row, row_count) tuples for splitting an upload region into row-bands. Last band is truncated. Pure data, unit-tested, used by the upcoming uploadRegion refactor. Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>
Replace the single ?*ID3D12Resource with ArrayListUnmanaged backed by c_allocator so a chunked upload can keep all of its band staging buffers alive until the GPU finishes the CopyTextureRegion calls. deinit, the replaceRegion prelude, and the existing single-buffer uploadRegion line all migrate to the list-based shape; behavior is unchanged for now because uploadRegion still does a single append per call. The band-loop lands in the next commit. Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>
Large image uploads previously allocated a single exact-fit UPLOAD-heap staging buffer for the whole region (e.g. ~48 MiB for a 4096x3072 RGBA image after the iTerm2 multipart cap was raised to 65 MiB). Split into ~8 MiB row-bands so each CopyTextureRegion uses a smaller staging allocation, reducing per-call heap pressure under fragmentation. Bands are computed via the rowsPerBand+Bands helpers. y + start_row shifts each band's destination. All bands' staging buffers stay alive until the next replaceRegion or deinit -- the frame fence still gates release in beginFrame. No behavior change for sub-band uploads (atlas updates, render targets, small images); they still allocate a single band-sized staging buffer. Followup to #382/#383. Documented in project_dx12_image_upload_audit.md. Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>
Tie the alloc.dupe to the kitty graphics / iTerm2 multipart 65 MiB payload cap and point at the DX12 row-band chunking that handles the staging-heap pressure. Closes audit followup #5 from project_dx12_image_upload_audit.md. Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>
Addresses ghostty-reviewer feedback on #385 (will be squash-merged): * Trim deinit's value-receiver comment to lead with the cross-backend signature reason; drop the lecture on Zig value-parameter mechanics. * Document the partial-write semantics on replaceRegion: if a later band's staging/Map fails, earlier bands' CopyTextureRegion calls have already been recorded and will execute. Texture.init's errdefer Releases the destination so the partial writes are harmless; replaceRegion's destination survives, so a multi-band atlas update could leave a mix of new+previous rows until the next call. * Note the log-level asymmetry near the band loop: null guards stay at warn (defensive checks under correct callers), createStagingBuffer and Map failures stay at err (real GPU/driver failures with context). * Flag the y + band.start_row arithmetic at the CopyTextureRegion call as the integration point between the pure Bands iterator and DX12 that no unit test exercises -- a bug there shows up as vertically shifted/overlapping bands in a multi-band image. * Add an exact-fit boundary test for rowsPerBand (row == budget). Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>
2fb0330 to
d4ec33c
Compare
deblasis
added a commit
that referenced
this pull request
May 20, 2026
Addresses ghostty-reviewer feedback on #385 (will be squash-merged): * Trim deinit's value-receiver comment to lead with the cross-backend signature reason; drop the lecture on Zig value-parameter mechanics. * Document the partial-write semantics on replaceRegion: if a later band's staging/Map fails, earlier bands' CopyTextureRegion calls have already been recorded and will execute. Texture.init's errdefer Releases the destination so the partial writes are harmless; replaceRegion's destination survives, so a multi-band atlas update could leave a mix of new+previous rows until the next call. * Note the log-level asymmetry near the band loop: null guards stay at warn (defensive checks under correct callers), createStagingBuffer and Map failures stay at err (real GPU/driver failures with context). * Flag the y + band.start_row arithmetic at the CopyTextureRegion call as the integration point between the pure Bands iterator and DX12 that no unit test exercises -- a bug there shows up as vertically shifted/overlapping bands in a multi-band image. * Add an exact-fit boundary test for rowsPerBand (row == budget). Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>
3 tasks
deblasis
added a commit
that referenced
this pull request
May 20, 2026
* renderer/directx12: add rowsPerBand helper for chunked uploads Pure helper that returns how many source rows fit in a single upload band, with a min-1 floor so a single very-wide row still makes progress. Used by the upcoming chunked uploadRegion to size each staging buffer at ~8 MiB instead of one buffer for the full image. Co-Authored-By: Alessandro De Blasis <alex@deblasis.net> * renderer/directx12: add Bands iterator for chunked uploads Yields (start_row, row_count) tuples for splitting an upload region into row-bands. Last band is truncated. Pure data, unit-tested, used by the upcoming uploadRegion refactor. Co-Authored-By: Alessandro De Blasis <alex@deblasis.net> * renderer/directx12: pending_staging holds a list of buffers Replace the single ?*ID3D12Resource with ArrayListUnmanaged backed by c_allocator so a chunked upload can keep all of its band staging buffers alive until the GPU finishes the CopyTextureRegion calls. deinit, the replaceRegion prelude, and the existing single-buffer uploadRegion line all migrate to the list-based shape; behavior is unchanged for now because uploadRegion still does a single append per call. The band-loop lands in the next commit. Co-Authored-By: Alessandro De Blasis <alex@deblasis.net> * renderer/directx12: chunk uploadRegion into 8 MiB row-bands Large image uploads previously allocated a single exact-fit UPLOAD-heap staging buffer for the whole region (e.g. ~48 MiB for a 4096x3072 RGBA image after the iTerm2 multipart cap was raised to 65 MiB). Split into ~8 MiB row-bands so each CopyTextureRegion uses a smaller staging allocation, reducing per-call heap pressure under fragmentation. Bands are computed via the rowsPerBand+Bands helpers. y + start_row shifts each band's destination. All bands' staging buffers stay alive until the next replaceRegion or deinit -- the frame fence still gates release in beginFrame. No behavior change for sub-band uploads (atlas updates, render targets, small images); they still allocate a single band-sized staging buffer. Followup to #382/#383. Documented in project_dx12_image_upload_audit.md. Co-Authored-By: Alessandro De Blasis <alex@deblasis.net> * renderer/image: note the image-size ceiling at prepImage Tie the alloc.dupe to the kitty graphics / iTerm2 multipart 65 MiB payload cap and point at the DX12 row-band chunking that handles the staging-heap pressure. Closes audit followup #5 from project_dx12_image_upload_audit.md. Co-Authored-By: Alessandro De Blasis <alex@deblasis.net> * renderer/directx12: fold review feedback into chunked uploads Addresses ghostty-reviewer feedback on #385 (will be squash-merged): * Trim deinit's value-receiver comment to lead with the cross-backend signature reason; drop the lecture on Zig value-parameter mechanics. * Document the partial-write semantics on replaceRegion: if a later band's staging/Map fails, earlier bands' CopyTextureRegion calls have already been recorded and will execute. Texture.init's errdefer Releases the destination so the partial writes are harmless; replaceRegion's destination survives, so a multi-band atlas update could leave a mix of new+previous rows until the next call. * Note the log-level asymmetry near the band loop: null guards stay at warn (defensive checks under correct callers), createStagingBuffer and Map failures stay at err (real GPU/driver failures with context). * Flag the y + band.start_row arithmetic at the CopyTextureRegion call as the integration point between the pure Bands iterator and DX12 that no unit test exercises -- a bug there shows up as vertically shifted/overlapping bands in a multi-band image. * Add an exact-fit boundary test for rowsPerBand (row == budget). Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>
deblasis
added a commit
that referenced
this pull request
May 21, 2026
* renderer/directx12: add rowsPerBand helper for chunked uploads Pure helper that returns how many source rows fit in a single upload band, with a min-1 floor so a single very-wide row still makes progress. Used by the upcoming chunked uploadRegion to size each staging buffer at ~8 MiB instead of one buffer for the full image. Co-Authored-By: Alessandro De Blasis <alex@deblasis.net> * renderer/directx12: add Bands iterator for chunked uploads Yields (start_row, row_count) tuples for splitting an upload region into row-bands. Last band is truncated. Pure data, unit-tested, used by the upcoming uploadRegion refactor. Co-Authored-By: Alessandro De Blasis <alex@deblasis.net> * renderer/directx12: pending_staging holds a list of buffers Replace the single ?*ID3D12Resource with ArrayListUnmanaged backed by c_allocator so a chunked upload can keep all of its band staging buffers alive until the GPU finishes the CopyTextureRegion calls. deinit, the replaceRegion prelude, and the existing single-buffer uploadRegion line all migrate to the list-based shape; behavior is unchanged for now because uploadRegion still does a single append per call. The band-loop lands in the next commit. Co-Authored-By: Alessandro De Blasis <alex@deblasis.net> * renderer/directx12: chunk uploadRegion into 8 MiB row-bands Large image uploads previously allocated a single exact-fit UPLOAD-heap staging buffer for the whole region (e.g. ~48 MiB for a 4096x3072 RGBA image after the iTerm2 multipart cap was raised to 65 MiB). Split into ~8 MiB row-bands so each CopyTextureRegion uses a smaller staging allocation, reducing per-call heap pressure under fragmentation. Bands are computed via the rowsPerBand+Bands helpers. y + start_row shifts each band's destination. All bands' staging buffers stay alive until the next replaceRegion or deinit -- the frame fence still gates release in beginFrame. No behavior change for sub-band uploads (atlas updates, render targets, small images); they still allocate a single band-sized staging buffer. Followup to #382/#383. Documented in project_dx12_image_upload_audit.md. Co-Authored-By: Alessandro De Blasis <alex@deblasis.net> * renderer/image: note the image-size ceiling at prepImage Tie the alloc.dupe to the kitty graphics / iTerm2 multipart 65 MiB payload cap and point at the DX12 row-band chunking that handles the staging-heap pressure. Closes audit followup #5 from project_dx12_image_upload_audit.md. Co-Authored-By: Alessandro De Blasis <alex@deblasis.net> * renderer/directx12: fold review feedback into chunked uploads Addresses ghostty-reviewer feedback on #385 (will be squash-merged): * Trim deinit's value-receiver comment to lead with the cross-backend signature reason; drop the lecture on Zig value-parameter mechanics. * Document the partial-write semantics on replaceRegion: if a later band's staging/Map fails, earlier bands' CopyTextureRegion calls have already been recorded and will execute. Texture.init's errdefer Releases the destination so the partial writes are harmless; replaceRegion's destination survives, so a multi-band atlas update could leave a mix of new+previous rows until the next call. * Note the log-level asymmetry near the band loop: null guards stay at warn (defensive checks under correct callers), createStagingBuffer and Map failures stay at err (real GPU/driver failures with context). * Flag the y + band.start_row arithmetic at the CopyTextureRegion call as the integration point between the pure Bands iterator and DX12 that no unit test exercises -- a bug there shows up as vertically shifted/overlapping bands in a multi-band image. * Add an exact-fit boundary test for rowsPerBand (row == budget). Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Texture.uploadRegioninto ~8 MiB row-bands so eachCopyTextureRegionallocates a smaller UPLOAD-heap staging buffer instead of one exact-fit allocation for the whole image.pending_stagingbecomes astd.ArrayListUnmanaged(*ID3D12Resource)backed bystd.heap.c_allocator; all bands released together on the nextreplaceRegionor indeinit. Thedeinitvalue-receiver signature stays compatible with existing call sites (Imageswitch captures,front_texture/back_texturefield deinits).rowsPerBand+Bandspure helpers with 8 unit tests.prepImage(image.zig:566) referencing the 65 MiB multipart cap.Followups #2 + #5 from
project_dx12_image_upload_audit.md. Followup #3 (per-frame upload-budget guard) layers on this and is the natural next PR.Stacked on #384 — that PR fixes the pre-existing pointer-alignment + log.err-in-tests issues in #382's null-guard tests so the Texture test suite can actually compile and run on Windows. Merge order: #384 first, then this.
Test plan
zig build testpasses on Windowszig build(the libghostty binary) fails withsrc\lib\union.zig:90:27: error: extern unions cannot contain fields of type 'terminal.osc.Command.Iterm2ImageTransmit'. Reproduced clean onorigin/windowswith no local changes — this is a pre-existing extern-union compatibility issue introduced by terminal/osc: honor iTerm2 OSC 1337 File= geometry hints #377/terminal/osc: support iTerm2 OSC 1337 multipart File= transfers #378's iTerm2 transmit struct that surfaces only whenzig build(notzig build test) reaches thecombine_archivesstep. Tracked separately; not introduced or worsened by this PR.🤖 Generated with Claude Code