Skip to content

shader: implement bicubic image sampling#1557

Merged
waywardmonkeys merged 1 commit into
linebender:mainfrom
waywardmonkeys:vello-classic-bicubic-sampling
Apr 14, 2026
Merged

shader: implement bicubic image sampling#1557
waywardmonkeys merged 1 commit into
linebender:mainfrom
waywardmonkeys:vello-classic-bicubic-sampling

Conversation

@waywardmonkeys
Copy link
Copy Markdown
Collaborator

Adapt the sparse-strips bicubic kernel to main Vello's single 2D atlas and premultiplied image path, and add a dedicated snapshot scene that distinguishes Low, Medium, and High image sampling.

This addresses part of #1545.

@raphlinus
Copy link
Copy Markdown
Contributor

I've skimmed it and it looks pretty good overall; I hope to do a full review shortly. One question I have: is there a particular reason there isn't a corresponding change to the CPU shaders?

@waywardmonkeys
Copy link
Copy Markdown
Collaborator Author

I've skimmed it and it looks pretty good overall; I hope to do a full review shortly. One question I have: is there a particular reason there isn't a corresponding change to the CPU shaders?

Don't let this nerd snipe us into implementing this ... but:

fine_main is noted with:

#[expect(unused, reason = "Draft code as textures not wired up")]

And there's no support for CMD_IMAGE at all in there.

@nicoburns nicoburns added the C-classic Applies to the classic `vello` crate label Apr 7, 2026
Copy link
Copy Markdown
Contributor

@raphlinus raphlinus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good. Inside see a potential optimisation (not huge in the grand scheme of things) and some thoughts how I'd want to see handling of premultiplied alpha evolve.

Comment thread vello_shaders/shader/fine.wgsl Outdated
// Interpolate in y direction
let result = cy.x * row0 + cy.y * row1 + cy.z * row2 + cy.w * row3;

// Clamp each component to [0,1] and ensure color components don't exceed alpha
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to compute a = clamp(result.a, 0.0, 1.0) then clamp each of r, g, b to a.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also a correctness fix, so the new version is:

// Clamp alpha first, then clamp premultiplied color channels against it.
let a = clamp(result.a, 0.0, 1.0);
return vec4<f32>(clamp(result.rgb, vec3(0.0), vec3(a)), a);

let cy = cubic_weights(frac_coords.y);

// Sample 4x4 grid around coords
let s00 = maybe_premul_alpha(textureLoad(image_atlas, vec2<i32>(clamp(coords + vec2(-1.5, -1.5), atlas_offset, atlas_max)), 0), alpha_type);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outside the scope of this PR, but looking at this, I think I would handle "maybe premul" a bit differently. The atlas would always be premultiplied, so the upload of the image into the atlas would do premultiplication when needed. That would get rid of the branch here, which is especially significant for bicubic because there are so many of them. That said, I see the complication – uploads are currently done with queue.write_texture, which doesn't have any mechanism to perform the premultiplication, so there's an additional pass that would be required in the separate alpha case. I'm thinking about ways to reduce the additional memory traffic, and what comes to mind is uploading the data from the CPU into a staging texture (or buffer; I'm not sure it matters much), then using either a compute shader or a draw call to read from the staging texture and write the premultiplied pixels to the atlas.

More important than the branch, storing the atlas as always-premultiplied opens the door to using textureSample rather than textureLoad, as is being explored in #1547 (and likely other related work). That's a clear win for bilinear filtering, but less so for bicubic as it doesn't work with the negative weights. There are various tricks (Bicubic Filtering in Fewer Taps is one), but not clear it's worth the lift in the bicubic case.

I should probably capture this in an issue, especially because similar considerations apply across hybrid and classic.

Adapt the sparse-strips bicubic kernel to main Vello's single 2D atlas and premultiplied image path, and add a dedicated snapshot scene that distinguishes Low, Medium, and High image sampling.

This addresses part of linebender#1545.
@waywardmonkeys waywardmonkeys force-pushed the vello-classic-bicubic-sampling branch from b300160 to ca9157d Compare April 14, 2026 08:21
@waywardmonkeys
Copy link
Copy Markdown
Collaborator Author

Agree that we should improve the premul behavior later.

Updated for the clamp change, to fix the merge conflict, and to use the new example pattern to avoid image atlas churn.

@waywardmonkeys waywardmonkeys added this pull request to the merge queue Apr 14, 2026
Merged via the queue into linebender:main with commit 1ba4668 Apr 14, 2026
17 checks passed
@waywardmonkeys waywardmonkeys deleted the vello-classic-bicubic-sampling branch April 14, 2026 08:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C-classic Applies to the classic `vello` crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants