Skip to content

vello_hybrid: Avoid unnecessarily allocating Uint32Arrays#1498

Merged
laurenz-canva merged 2 commits into
mainfrom
laurenz/copy
Mar 18, 2026
Merged

vello_hybrid: Avoid unnecessarily allocating Uint32Arrays#1498
laurenz-canva merged 2 commits into
mainfrom
laurenz/copy

Conversation

@laurenz-canva
Copy link
Copy Markdown
Contributor

For the GhostScript tiger scene:

Before: texImage2D as well as UInt32Array::copy_from_slice are visible in the profile.
image

After: Only texImage2D is visibile in the profile.
image

@laurenz-canva laurenz-canva requested review from grebmeg and taj-p and removed request for taj-p March 10, 2026 14:24
Comment on lines +1087 to +1096
// See also: https://wikis.khronos.org/opengl/Synchronization
// >> There are several OpenGL functions that can pull data directly from client-side memory,
// >> or push data directly into client-side memory. Functions like `glTexSubImage2D`,
// >> `glReadPixels`, `glBufferSubData` and so forth.
//
// >> Because OpenGL is defined to be synchronous, when any of these functions have
// >> returned, they must have finished with the client memory. When `glReadPixels` returns,
// >> the pixel data is in your client memory (unless you are reading into a buffer object).
// >> When `glBufferSubData` returns, you can immediately modify or delete whatever memory
// >> pointer you gave it, as OpenGL has already read as much as it wants.
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume this applies to WebGL too, then, right?

Copy link
Copy Markdown
Contributor

@taj-p taj-p left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@LaurenzV
Copy link
Copy Markdown
Collaborator

Would you like to have this soon or shall I wait for Alex's review as well?

@taj-p
Copy link
Copy Markdown
Contributor

taj-p commented Mar 10, 2026

Would you like to have this soon or shall I wait for Alex's review as well?

Thanks for asking 🙏 . I'll see how we go today building our feature branch and either cherry-pick this in or ask @grebmeg for a review. I believe we might cherry-pick #1493 in too (those gains look so good!)

@nicoburns nicoburns added the C-hybrid Applies to the vello_hybrid crate label Mar 11, 2026
Comment on lines +2219 to +2257
fn upload_data_to_rgba32_texture(
gl: &WebGl2RenderingContext,
data: &[u32],
texture_width: u32,
texture_height: u32,
) {
// Safety: This calling `Uint32Array::view` is unsafe because it provides a view into
// WASM linear memory, and any additional allocations might invalidate that view.
// In our case, this is not an issue because we only use this view once for uploading
// data to the GPU below, and no allocations happen between that.
// The `tex_image_2d` method is synchronous in the sense that once it returns, it is guaranteed
// that all necessary data has already been read, so any allocations that happen
// after this block don't affect this anymore.
//
// See also: https://wikis.khronos.org/opengl/Synchronization
// >> There are several OpenGL functions that can pull data directly from client-side memory,
// >> or push data directly into client-side memory. Functions like `glTexSubImage2D`,
// >> `glReadPixels`, `glBufferSubData` and so forth.
//
// >> Because OpenGL is defined to be synchronous, when any of these functions have
// >> returned, they must have finished with the client memory. When `glReadPixels` returns,
// >> the pixel data is in your client memory (unless you are reading into a buffer object).
// >> When `glBufferSubData` returns, you can immediately modify or delete whatever memory
// >> pointer you gave it, as OpenGL has already read as much as it wants.
let packed_array = unsafe { js_sys::Uint32Array::view(data) };

gl.tex_image_2d_with_i32_and_i32_and_i32_and_format_and_type_and_opt_array_buffer_view(
WebGl2RenderingContext::TEXTURE_2D,
0,
WebGl2RenderingContext::RGBA32UI as i32,
texture_width as i32,
texture_height as i32,
0,
WebGl2RenderingContext::RGBA_INTEGER,
WebGl2RenderingContext::UNSIGNED_INT,
Some(&packed_array),
)
.unwrap();
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thought: I think maybe you can avoid the unsafe entirely by using the _opt_u8_array overload instead. As I understand it when web-sys receives a &[u8] parameter, wasm-bindgen passes it as a zero-copy Uint8Array view into wasm linear memory, same perf as the manual Uint32Array::view(). WebGL interprets the raw bytes according to the type parameter (UNSIGNED_INT), not the JS typed array type, so a Uint8Array view works correctly for RGBA32UI textures.

This would simplify the helper to:

fn upload_data_to_rgba32_texture(
    gl: &WebGl2RenderingContext,
    data: &[u8],
    texture_width: u32,
    texture_height: u32,
) {
    gl.tex_image_2d_with_i32_and_i32_and_i32_and_format_and_type_and_opt_u8_array(
        WebGl2RenderingContext::TEXTURE_2D,
        0,
        WebGl2RenderingContext::RGBA32UI as i32,
        texture_width as i32,
        texture_height as i32,
        0,
        WebGl2RenderingContext::RGBA_INTEGER,
        WebGl2RenderingContext::UNSIGNED_INT,
        Some(data),
    )
    .unwrap();
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! Lemme try this out!

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, that doesn't seem to work correctly due to the fact that we are using RGBA32 textures.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks for checking! I think the next option might work, but I’m not sure that making two calls is better than the current approach.

fn upload_data_to_rgba32_texture(
    gl: &WebGl2RenderingContext,
    data: &[u8],
    texture_width: u32,
    texture_height: u32,
    pixel_unpack_buffer: &WebGlBuffer,
) {
    gl.bind_buffer(
        WebGl2RenderingContext::PIXEL_UNPACK_BUFFER,
        Some(pixel_unpack_buffer),
    );
    gl.buffer_data_with_u8_array(
        WebGl2RenderingContext::PIXEL_UNPACK_BUFFER,
        data,
        WebGl2RenderingContext::STREAM_DRAW,
    );

    gl.tex_image_2d_with_i32_and_i32_and_i32_and_format_and_type_and_f64(
        WebGl2RenderingContext::TEXTURE_2D,
        0,
        WebGl2RenderingContext::RGBA32UI as i32,
        texture_width as i32,
        texture_height as i32,
        0,
        WebGl2RenderingContext::RGBA_INTEGER,
        WebGl2RenderingContext::UNSIGNED_INT,
        0.0,
    )
    .unwrap();

    gl.bind_buffer(WebGl2RenderingContext::PIXEL_UNPACK_BUFFER, None);
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, but wouldn't this still invoke another copy to the unpack buffer first before copying it to the GPU, whch is what we've been trying to avoid?

Since you both approved I think I'll merge it in its current state, but let me know if you would like to see adjustments.

Comment on lines +2219 to +2257
fn upload_data_to_rgba32_texture(
gl: &WebGl2RenderingContext,
data: &[u32],
texture_width: u32,
texture_height: u32,
) {
// Safety: This calling `Uint32Array::view` is unsafe because it provides a view into
// WASM linear memory, and any additional allocations might invalidate that view.
// In our case, this is not an issue because we only use this view once for uploading
// data to the GPU below, and no allocations happen between that.
// The `tex_image_2d` method is synchronous in the sense that once it returns, it is guaranteed
// that all necessary data has already been read, so any allocations that happen
// after this block don't affect this anymore.
//
// See also: https://wikis.khronos.org/opengl/Synchronization
// >> There are several OpenGL functions that can pull data directly from client-side memory,
// >> or push data directly into client-side memory. Functions like `glTexSubImage2D`,
// >> `glReadPixels`, `glBufferSubData` and so forth.
//
// >> Because OpenGL is defined to be synchronous, when any of these functions have
// >> returned, they must have finished with the client memory. When `glReadPixels` returns,
// >> the pixel data is in your client memory (unless you are reading into a buffer object).
// >> When `glBufferSubData` returns, you can immediately modify or delete whatever memory
// >> pointer you gave it, as OpenGL has already read as much as it wants.
let packed_array = unsafe { js_sys::Uint32Array::view(data) };

gl.tex_image_2d_with_i32_and_i32_and_i32_and_format_and_type_and_opt_array_buffer_view(
WebGl2RenderingContext::TEXTURE_2D,
0,
WebGl2RenderingContext::RGBA32UI as i32,
texture_width as i32,
texture_height as i32,
0,
WebGl2RenderingContext::RGBA_INTEGER,
WebGl2RenderingContext::UNSIGNED_INT,
Some(&packed_array),
)
.unwrap();
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks for checking! I think the next option might work, but I’m not sure that making two calls is better than the current approach.

fn upload_data_to_rgba32_texture(
    gl: &WebGl2RenderingContext,
    data: &[u8],
    texture_width: u32,
    texture_height: u32,
    pixel_unpack_buffer: &WebGlBuffer,
) {
    gl.bind_buffer(
        WebGl2RenderingContext::PIXEL_UNPACK_BUFFER,
        Some(pixel_unpack_buffer),
    );
    gl.buffer_data_with_u8_array(
        WebGl2RenderingContext::PIXEL_UNPACK_BUFFER,
        data,
        WebGl2RenderingContext::STREAM_DRAW,
    );

    gl.tex_image_2d_with_i32_and_i32_and_i32_and_format_and_type_and_f64(
        WebGl2RenderingContext::TEXTURE_2D,
        0,
        WebGl2RenderingContext::RGBA32UI as i32,
        texture_width as i32,
        texture_height as i32,
        0,
        WebGl2RenderingContext::RGBA_INTEGER,
        WebGl2RenderingContext::UNSIGNED_INT,
        0.0,
    )
    .unwrap();

    gl.bind_buffer(WebGl2RenderingContext::PIXEL_UNPACK_BUFFER, None);
}

@laurenz-canva laurenz-canva added this pull request to the merge queue Mar 18, 2026
Merged via the queue into main with commit 8f2782f Mar 18, 2026
17 checks passed
@laurenz-canva laurenz-canva deleted the laurenz/copy branch March 18, 2026 07:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C-hybrid Applies to the vello_hybrid crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants