shader: implement bicubic image sampling#1557
Conversation
|
I've skimmed it and it looks pretty good overall; I hope to do a full review shortly. One question I have: is there a particular reason there isn't a corresponding change to the CPU shaders? |
Don't let this nerd snipe us into implementing this ... but:
#[expect(unused, reason = "Draft code as textures not wired up")]And there's no support for |
raphlinus
left a comment
There was a problem hiding this comment.
Overall looks good. Inside see a potential optimisation (not huge in the grand scheme of things) and some thoughts how I'd want to see handling of premultiplied alpha evolve.
| // Interpolate in y direction | ||
| let result = cy.x * row0 + cy.y * row1 + cy.z * row2 + cy.w * row3; | ||
|
|
||
| // Clamp each component to [0,1] and ensure color components don't exceed alpha |
There was a problem hiding this comment.
It would be better to compute a = clamp(result.a, 0.0, 1.0) then clamp each of r, g, b to a.
There was a problem hiding this comment.
This is also a correctness fix, so the new version is:
// Clamp alpha first, then clamp premultiplied color channels against it.
let a = clamp(result.a, 0.0, 1.0);
return vec4<f32>(clamp(result.rgb, vec3(0.0), vec3(a)), a);| let cy = cubic_weights(frac_coords.y); | ||
|
|
||
| // Sample 4x4 grid around coords | ||
| let s00 = maybe_premul_alpha(textureLoad(image_atlas, vec2<i32>(clamp(coords + vec2(-1.5, -1.5), atlas_offset, atlas_max)), 0), alpha_type); |
There was a problem hiding this comment.
Outside the scope of this PR, but looking at this, I think I would handle "maybe premul" a bit differently. The atlas would always be premultiplied, so the upload of the image into the atlas would do premultiplication when needed. That would get rid of the branch here, which is especially significant for bicubic because there are so many of them. That said, I see the complication – uploads are currently done with queue.write_texture, which doesn't have any mechanism to perform the premultiplication, so there's an additional pass that would be required in the separate alpha case. I'm thinking about ways to reduce the additional memory traffic, and what comes to mind is uploading the data from the CPU into a staging texture (or buffer; I'm not sure it matters much), then using either a compute shader or a draw call to read from the staging texture and write the premultiplied pixels to the atlas.
More important than the branch, storing the atlas as always-premultiplied opens the door to using textureSample rather than textureLoad, as is being explored in #1547 (and likely other related work). That's a clear win for bilinear filtering, but less so for bicubic as it doesn't work with the negative weights. There are various tricks (Bicubic Filtering in Fewer Taps is one), but not clear it's worth the lift in the bicubic case.
I should probably capture this in an issue, especially because similar considerations apply across hybrid and classic.
Adapt the sparse-strips bicubic kernel to main Vello's single 2D atlas and premultiplied image path, and add a dedicated snapshot scene that distinguishes Low, Medium, and High image sampling. This addresses part of linebender#1545.
b300160 to
ca9157d
Compare
|
Agree that we should improve the premul behavior later. Updated for the clamp change, to fix the merge conflict, and to use the new example pattern to avoid image atlas churn. |
Adapt the sparse-strips bicubic kernel to main Vello's single 2D atlas and premultiplied image path, and add a dedicated snapshot scene that distinguishes Low, Medium, and High image sampling.
This addresses part of #1545.