renderer/directx12: chunk image uploads into 8 MiB row-bands#386
Merged
Conversation
Pure helper that returns how many source rows fit in a single upload band, with a min-1 floor so a single very-wide row still makes progress. Used by the upcoming chunked uploadRegion to size each staging buffer at ~8 MiB instead of one buffer for the full image. Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>
Yields (start_row, row_count) tuples for splitting an upload region into row-bands. Last band is truncated. Pure data, unit-tested, used by the upcoming uploadRegion refactor. Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>
Replace the single ?*ID3D12Resource with ArrayListUnmanaged backed by c_allocator so a chunked upload can keep all of its band staging buffers alive until the GPU finishes the CopyTextureRegion calls. deinit, the replaceRegion prelude, and the existing single-buffer uploadRegion line all migrate to the list-based shape; behavior is unchanged for now because uploadRegion still does a single append per call. The band-loop lands in the next commit. Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>
Large image uploads previously allocated a single exact-fit UPLOAD-heap staging buffer for the whole region (e.g. ~48 MiB for a 4096x3072 RGBA image after the iTerm2 multipart cap was raised to 65 MiB). Split into ~8 MiB row-bands so each CopyTextureRegion uses a smaller staging allocation, reducing per-call heap pressure under fragmentation. Bands are computed via the rowsPerBand+Bands helpers. y + start_row shifts each band's destination. All bands' staging buffers stay alive until the next replaceRegion or deinit -- the frame fence still gates release in beginFrame. No behavior change for sub-band uploads (atlas updates, render targets, small images); they still allocate a single band-sized staging buffer. Followup to #382/#383. Documented in project_dx12_image_upload_audit.md. Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>
Tie the alloc.dupe to the kitty graphics / iTerm2 multipart 65 MiB payload cap and point at the DX12 row-band chunking that handles the staging-heap pressure. Closes audit followup #5 from project_dx12_image_upload_audit.md. Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>
Addresses ghostty-reviewer feedback on #385 (will be squash-merged): * Trim deinit's value-receiver comment to lead with the cross-backend signature reason; drop the lecture on Zig value-parameter mechanics. * Document the partial-write semantics on replaceRegion: if a later band's staging/Map fails, earlier bands' CopyTextureRegion calls have already been recorded and will execute. Texture.init's errdefer Releases the destination so the partial writes are harmless; replaceRegion's destination survives, so a multi-band atlas update could leave a mix of new+previous rows until the next call. * Note the log-level asymmetry near the band loop: null guards stay at warn (defensive checks under correct callers), createStagingBuffer and Map failures stay at err (real GPU/driver failures with context). * Flag the y + band.start_row arithmetic at the CopyTextureRegion call as the integration point between the pure Bands iterator and DX12 that no unit test exercises -- a bug there shows up as vertically shifted/overlapping bands in a multi-band image. * Add an exact-fit boundary test for rowsPerBand (row == budget). Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>
3 tasks
deblasis
added a commit
that referenced
this pull request
May 20, 2026
* renderer/directx12: track running pending_staging_bytes Mirror the pending_staging list contents with a u64 running sum so callers can cheaply query a texture's current UPLOAD-heap pressure without COM-querying each ID3D12Resource. Wires the increment in uploadRegion's band loop (per band_size) and the reset in replaceRegion's release loop. No behavior change yet; the budget gate that consumes this lands in the next commit. Co-Authored-By: Alessandro De Blasis <alex@deblasis.net> * renderer/image: add imageStagingBytes helper Pure helper that computes the UPLOAD-heap staging-buffer size for an RGBA image of given dimensions. Mirrors Texture.uploadRegion's row- pitch math (post-swizzle bpp=4, 256-byte alignment) so State.upload can size-check images before delegating to the DX12 layer. Also introduces the budget constant TEXTURE_UPLOAD_BUDGET_DEFAULT_BYTES that the upcoming State.upload guard will use as its default ceiling. Co-Authored-By: Alessandro De Blasis <alex@deblasis.net> * renderer/image: add wouldExceedBudget boundary check Trivial pure helper extracted for unit-testing the budget-gate boundary semantics: strict greater-than so an exact-fit (in_flight + est == budget) is allowed. Used by the upcoming State.upload guard. Co-Authored-By: Alessandro De Blasis <alex@deblasis.net> * renderer/image: add Image.pendingStagingBytes + estimate Two Image union methods that surface the staging-heap accounting State will need to make budget decisions: * pendingStagingBytes reads texture.pending_staging_bytes for variants that hold a texture; returns 0 for .pending/.unload_pending. * estimatedUploadStagingBytes uses imageStagingBytes(w, h) for variants that will trigger a new upload; returns 0 for already-uploaded or unload-marked variants. State.upload will sum the former across all images to get current in-flight bytes, and add the latter to budget-check pending uploads. Co-Authored-By: Alessandro De Blasis <alex@deblasis.net> * renderer/image: gate State.upload with per-frame budget Cap peak DX12 UPLOAD-heap staging pressure regardless of how many kitty/iTerm2 images arrive at once. State.upload now computes the current in-flight staging bytes across all already-uploaded textures at the start of the call, then walks the image map gating each pending image against the budget: * Image whose estimatedUploadStagingBytes would push the running in-flight total past upload_budget_bytes is skipped (logged at debug); it stays .pending and retries on the next frame. * Successful upload increments the running total so subsequent images in the same frame see the updated pressure. * Failed upload (markForUnload path) is unchanged in semantics; only added a continue to skip the increment. * Unloaded image decrements the running total via saturating sub. The deferral path is NOT counted as failure -- the bool return still only tracks markForUnload failures. The caller (generic.zig:1667) discards the value either way. Closes audit followup #3 (per-frame upload-budget guard) from project_dx12_image_upload_audit.md, layering on PR #386's chunked upload infrastructure. Co-Authored-By: Alessandro De Blasis <alex@deblasis.net> * renderer: fold review feedback into upload-budget guard Addresses ghostty-reviewer findings on #388 before squash-merge: * Critical: image.zig is renderer-generic. The previous version unconditionally read Texture.pending_staging_bytes (DX12-only), breaking Metal/OpenGL builds at compile time. Now both Image staging helpers and State.upload comptime-gate the budget logic on @Hasfield(Texture, "pending_staging_bytes"); non-DX12 backends see the original unguarded upload loop with zero overhead. * Verified by `zig build test -Drenderer=opengl` (passes) in addition to the default DX12 build. * imageStagingBytes() now does u64 arithmetic throughout. The u32 intermediate could in theory wrap on absurd widths (width >= 2^30); not reachable today but cheap to harden. Doc-comment clarified that the bpp=4 assumption is the DX12 post-swizzle destination. * Cross-module drift assert: directx12/Texture.zig TEXTURE_DATA_PITCH_ALIGNMENT promoted to pub, and image.zig adds a comptime block that asserts the two constants match on the DX12 build (gracefully skipped via @hasDecl on other renderers). * State.upload restructured to single-pass-per-concern: pass 1 sweeps unloads + accumulates in-flight from survivors; pass 2 uploads pending images with optional budget gating. Eliminates the saturating-subtract on unload (reviewer flagged: it could mask a real bookkeeping bug) because unloaded images are gone before the in-flight total is finalized. Dropped the `est > 0` short-circuit because wouldExceedBudget(0, _, _) handles it identically. The pre-pass comment trimmed. * DX12-only tests gated on @Hasfield via the if/else-pattern so Metal/OpenGL builds analyze only the SkipZigTest branch and never see the pending_staging_bytes field access or the `Texture{}` default-init (OpenGL's Texture has no field defaults). Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>
deblasis
added a commit
that referenced
this pull request
May 21, 2026
* renderer/directx12: add rowsPerBand helper for chunked uploads Pure helper that returns how many source rows fit in a single upload band, with a min-1 floor so a single very-wide row still makes progress. Used by the upcoming chunked uploadRegion to size each staging buffer at ~8 MiB instead of one buffer for the full image. Co-Authored-By: Alessandro De Blasis <alex@deblasis.net> * renderer/directx12: add Bands iterator for chunked uploads Yields (start_row, row_count) tuples for splitting an upload region into row-bands. Last band is truncated. Pure data, unit-tested, used by the upcoming uploadRegion refactor. Co-Authored-By: Alessandro De Blasis <alex@deblasis.net> * renderer/directx12: pending_staging holds a list of buffers Replace the single ?*ID3D12Resource with ArrayListUnmanaged backed by c_allocator so a chunked upload can keep all of its band staging buffers alive until the GPU finishes the CopyTextureRegion calls. deinit, the replaceRegion prelude, and the existing single-buffer uploadRegion line all migrate to the list-based shape; behavior is unchanged for now because uploadRegion still does a single append per call. The band-loop lands in the next commit. Co-Authored-By: Alessandro De Blasis <alex@deblasis.net> * renderer/directx12: chunk uploadRegion into 8 MiB row-bands Large image uploads previously allocated a single exact-fit UPLOAD-heap staging buffer for the whole region (e.g. ~48 MiB for a 4096x3072 RGBA image after the iTerm2 multipart cap was raised to 65 MiB). Split into ~8 MiB row-bands so each CopyTextureRegion uses a smaller staging allocation, reducing per-call heap pressure under fragmentation. Bands are computed via the rowsPerBand+Bands helpers. y + start_row shifts each band's destination. All bands' staging buffers stay alive until the next replaceRegion or deinit -- the frame fence still gates release in beginFrame. No behavior change for sub-band uploads (atlas updates, render targets, small images); they still allocate a single band-sized staging buffer. Followup to #382/#383. Documented in project_dx12_image_upload_audit.md. Co-Authored-By: Alessandro De Blasis <alex@deblasis.net> * renderer/image: note the image-size ceiling at prepImage Tie the alloc.dupe to the kitty graphics / iTerm2 multipart 65 MiB payload cap and point at the DX12 row-band chunking that handles the staging-heap pressure. Closes audit followup #5 from project_dx12_image_upload_audit.md. Co-Authored-By: Alessandro De Blasis <alex@deblasis.net> * renderer/directx12: fold review feedback into chunked uploads Addresses ghostty-reviewer feedback on #385 (will be squash-merged): * Trim deinit's value-receiver comment to lead with the cross-backend signature reason; drop the lecture on Zig value-parameter mechanics. * Document the partial-write semantics on replaceRegion: if a later band's staging/Map fails, earlier bands' CopyTextureRegion calls have already been recorded and will execute. Texture.init's errdefer Releases the destination so the partial writes are harmless; replaceRegion's destination survives, so a multi-band atlas update could leave a mix of new+previous rows until the next call. * Note the log-level asymmetry near the band loop: null guards stay at warn (defensive checks under correct callers), createStagingBuffer and Map failures stay at err (real GPU/driver failures with context). * Flag the y + band.start_row arithmetic at the CopyTextureRegion call as the integration point between the pure Bands iterator and DX12 that no unit test exercises -- a bug there shows up as vertically shifted/overlapping bands in a multi-band image. * Add an exact-fit boundary test for rowsPerBand (row == budget). Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>
deblasis
added a commit
that referenced
this pull request
May 21, 2026
* renderer/directx12: track running pending_staging_bytes Mirror the pending_staging list contents with a u64 running sum so callers can cheaply query a texture's current UPLOAD-heap pressure without COM-querying each ID3D12Resource. Wires the increment in uploadRegion's band loop (per band_size) and the reset in replaceRegion's release loop. No behavior change yet; the budget gate that consumes this lands in the next commit. Co-Authored-By: Alessandro De Blasis <alex@deblasis.net> * renderer/image: add imageStagingBytes helper Pure helper that computes the UPLOAD-heap staging-buffer size for an RGBA image of given dimensions. Mirrors Texture.uploadRegion's row- pitch math (post-swizzle bpp=4, 256-byte alignment) so State.upload can size-check images before delegating to the DX12 layer. Also introduces the budget constant TEXTURE_UPLOAD_BUDGET_DEFAULT_BYTES that the upcoming State.upload guard will use as its default ceiling. Co-Authored-By: Alessandro De Blasis <alex@deblasis.net> * renderer/image: add wouldExceedBudget boundary check Trivial pure helper extracted for unit-testing the budget-gate boundary semantics: strict greater-than so an exact-fit (in_flight + est == budget) is allowed. Used by the upcoming State.upload guard. Co-Authored-By: Alessandro De Blasis <alex@deblasis.net> * renderer/image: add Image.pendingStagingBytes + estimate Two Image union methods that surface the staging-heap accounting State will need to make budget decisions: * pendingStagingBytes reads texture.pending_staging_bytes for variants that hold a texture; returns 0 for .pending/.unload_pending. * estimatedUploadStagingBytes uses imageStagingBytes(w, h) for variants that will trigger a new upload; returns 0 for already-uploaded or unload-marked variants. State.upload will sum the former across all images to get current in-flight bytes, and add the latter to budget-check pending uploads. Co-Authored-By: Alessandro De Blasis <alex@deblasis.net> * renderer/image: gate State.upload with per-frame budget Cap peak DX12 UPLOAD-heap staging pressure regardless of how many kitty/iTerm2 images arrive at once. State.upload now computes the current in-flight staging bytes across all already-uploaded textures at the start of the call, then walks the image map gating each pending image against the budget: * Image whose estimatedUploadStagingBytes would push the running in-flight total past upload_budget_bytes is skipped (logged at debug); it stays .pending and retries on the next frame. * Successful upload increments the running total so subsequent images in the same frame see the updated pressure. * Failed upload (markForUnload path) is unchanged in semantics; only added a continue to skip the increment. * Unloaded image decrements the running total via saturating sub. The deferral path is NOT counted as failure -- the bool return still only tracks markForUnload failures. The caller (generic.zig:1667) discards the value either way. Closes audit followup #3 (per-frame upload-budget guard) from project_dx12_image_upload_audit.md, layering on PR #386's chunked upload infrastructure. Co-Authored-By: Alessandro De Blasis <alex@deblasis.net> * renderer: fold review feedback into upload-budget guard Addresses ghostty-reviewer findings on #388 before squash-merge: * Critical: image.zig is renderer-generic. The previous version unconditionally read Texture.pending_staging_bytes (DX12-only), breaking Metal/OpenGL builds at compile time. Now both Image staging helpers and State.upload comptime-gate the budget logic on @Hasfield(Texture, "pending_staging_bytes"); non-DX12 backends see the original unguarded upload loop with zero overhead. * Verified by `zig build test -Drenderer=opengl` (passes) in addition to the default DX12 build. * imageStagingBytes() now does u64 arithmetic throughout. The u32 intermediate could in theory wrap on absurd widths (width >= 2^30); not reachable today but cheap to harden. Doc-comment clarified that the bpp=4 assumption is the DX12 post-swizzle destination. * Cross-module drift assert: directx12/Texture.zig TEXTURE_DATA_PITCH_ALIGNMENT promoted to pub, and image.zig adds a comptime block that asserts the two constants match on the DX12 build (gracefully skipped via @hasDecl on other renderers). * State.upload restructured to single-pass-per-concern: pass 1 sweeps unloads + accumulates in-flight from survivors; pass 2 uploads pending images with optional budget gating. Eliminates the saturating-subtract on unload (reviewer flagged: it could mask a real bookkeeping bug) because unloaded images are gone before the in-flight total is finalized. Dropped the `est > 0` short-circuit because wouldExceedBudget(0, _, _) handles it identically. The pre-pass comment trimmed. * DX12-only tests gated on @Hasfield via the if/else-pattern so Metal/OpenGL builds analyze only the SkipZigTest branch and never see the pending_staging_bytes field access or the `Texture{}` default-init (OpenGL's Texture has no field defaults). Co-Authored-By: Alessandro De Blasis <alex@deblasis.net>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Texture.uploadRegioninto ~8 MiB row-bands so eachCopyTextureRegionallocates a smaller UPLOAD-heap staging buffer instead of one exact-fit allocation for the whole image.pending_stagingbecomes astd.ArrayListUnmanaged(*ID3D12Resource)backed bystd.heap.c_allocator; all bands released together on the nextreplaceRegionor indeinit. Thedeinitvalue-receiver signature stays compatible with existing call sites (Imageswitch captures,front_texture/back_texturefield deinits).rowsPerBand+Bandspure helpers with 9 unit tests (including the exact-fit boundary case).prepImage(image.zig:566) referencing the 65 MiB multipart cap.replaceRegion: if a later band's staging/Map fails, earlier bands'CopyTextureRegioncalls have already been recorded and will execute.Texture.init's errdefer Releases the destination so partial writes are harmless;replaceRegion's destination survives, so a multi-band atlas update could leave a mix of new and previous rows until the next call (acceptable for atlases).Followups #2 + #5 from
project_dx12_image_upload_audit.md. Followup #3 (per-frame upload-budget guard) layers on this and is the natural next PR.Originally opened as #385 (stacked on #384); reopened against
windowsafter #384 squash-merged and auto-closed #385.ghostty-reviewer feedback folded in (last commit)
deinit's value-receiver comment to lead with the cross-backend signature reason.replaceRegion.y + band.start_rowarithmetic atCopyTextureRegionas the iterator→DX12 integration point not covered by pure unit tests.rowsPerBandexact-fit boundary test (row == budget).Test plan
zig build testpasses on WindowsrowsPerBand+Bandstests pass (9 total)🤖 Generated with Claude Code