sparse_strips: Add `ColorMatrix` filter by waywardmonkeys · Pull Request #1623 · linebender/vello

waywardmonkeys · 2026-05-06T04:17:18Z

This adds ColorMatrix filter support to the sparse-strip renderers.

Adds CPU fast paths for:
- identity matrix no-op
- tracking may_have_transparency during the color-matrix pass
- opaque premultiplied input pixels
- / 255.0 normalization via reciprocal multiply

Performance Notes

I profiled a temporary release-mode harness applying matrices::SEPIA repeatedly over a 1024x1024 pixmap for 20 seconds.

After the reciprocal multiply change:

Opaque input: 3610 full-pixmap passes
Translucent input: 3233 full-pixmap passes

waywardmonkeys · 2026-05-06T04:34:19Z

This was done with the assistance of Codex (GPT 5.5, xhigh).

It did some of the boring stuff but also wrote benchmarks, ran them under a profiler, disassembled code to verify certain things were happening, and carried out a number of experiments to see if some stuff would result in appreciable performance improvements or not.

waywardmonkeys · 2026-05-06T04:40:59Z

After talking to @raphlinus, Codex and I did another optimization that was amended into the 3rd commit in this series to avoid a lot of the un-premultiply / re-premultiply work with compatible matrices.

The fast path applies matrices directly in premultiplied space when they:

preserve alpha exactly
only mix RGB into RGB
have no RGB offsets

That covers GRAYSCALE and SEPIA, but not things like ALPHA_TO_BLACK.

Benchmark using the same 20s 1024x1024 SEPIA release harness:

Path	Before	After
Opaque	3610 passes	5705 passes
Translucent	3233 passes	5706 passes

waywardmonkeys · 2026-05-07T03:51:51Z

Some of this code didn't really belong in Vello, so I've gone and created a new crate in the color repo, color_operations: linebender/color#221

I've got a version of this set of changes that builds on that instead, and I'll update this PR with it later today.

grebmeg · 2026-05-20T03:32:19Z

Some of this code didn't really belong in Vello, so I've gone and created a new crate in the color repo, color_operations: linebender/color#221

I've got a version of this set of changes that builds on that instead, and I'll update this PR with it later today.

Hey @waywardmonkeys, sorry for the slow review. Now that I’ve got review capacity back, I’m ready to take a look. Before I start, just wanted to confirm whether this PR is ready for review or if you’re still planning to make more updates to it?

waywardmonkeys · 2026-05-20T07:26:05Z

@grebmeg I think most of this apart from the color math is useful to take a look at and make sure there's no objections to it overall.

The color math here is fine, but really belongs elsewhere, hence linebender/color#221. Unfortunately, Tom brought up some good points over there and maybe some people who care about this topic should have a meeting to discuss things.

If you'd like to set up such a meeting with me, please feel free to do so.

I won't need much notice about having it, as long as I'm available.

waywardmonkeys · 2026-05-20T07:27:42Z

FYI: @tomcur (See above)

grebmeg

I think this PR is in pretty good shape overall. It might be worth adding the optimizations I mentioned here so we can avoid the additional memory overhead for each filter.

There are also a few other things we haven’t wired up yet, but hopefully we’ll be able to tackle those over time as well.

Thanks for introducing the color filter support! 🎉

grebmeg · 2026-05-21T03:52:20Z

+    let t3 = textureLoad(filter_data, vec2((texel_offset + 3u) % w, (texel_offset + 3u) / w), 0);
+    let t4 = textureLoad(filter_data, vec2((texel_offset + 4u) % w, (texel_offset + 4u) / w), 0);
+    let t5 = textureLoad(filter_data, vec2((texel_offset + 5u) % w, (texel_offset + 5u) / w), 0);


With adding three more texture loads, I think it’s time to address this comment and introduce a variable-stride index for filter data. This would be similar to how sparse strips are handled in encoded_paints, where paint_idxs stores the texel offset for where the current filter starts in the texture. Would it be possible to do this, if not in this PR, then at least as a follow-up?

grebmeg · 2026-05-21T04:03:38Z

+    if is_premul_compatible_matrix(matrix) {
+        // For RGB-only, alpha-preserving matrices, applying the RGB matrix to
+        // premultiplied channels is equivalent to unpremultiply -> matrix ->
+        // premultiply. This avoids the per-pixel alpha division.
+        for pixel in pixmap.data_mut() {
+            let transformed = apply_premul_color_matrix_to_pixel(*pixel, matrix);
+            may_have_transparency |= transformed.a != 255;
+            *pixel = transformed;
+        }
+    } else {
+        for pixel in pixmap.data_mut() {
+            let transformed = apply_color_matrix_to_pixel(*pixel, matrix);
+            may_have_transparency |= transformed.a != 255;
+            *pixel = transformed;
+        }
+    }


Color matrix is a great candidate for SIMD here 😅 We haven’t wired filters up with SIMD yet, but it’d be nice to do at some point since it could bring significant performance improvements. Just sharing my thoughts, it might be worth adding a TODO somewhere so we can revisit this later.

Actually, I just realized we also haven’t wired up the faster non-spatial filter path at all, things like color matrix, opacity, flood, basically filters that only require per-pixel work.

I don’t think that blocks this PR since we already have the same situation with Flood, but I wanted to highlight it in case you’re aiming for significantly better performance. Right now those filters still go through the spatial filter path, which allocates a pixmap and does some additional work that could be avoided entirely. This could likely be handled much more efficiently here in vello_cpu fine.

grebmeg · 2026-05-21T04:24:07Z

+    )
+}
+
+fn is_premul_compatible_matrix(matrix: &[f32; 20]) -> bool {


Would it be worth inlining this and the functions above as well?

grebmeg · 2026-05-21T04:30:57Z

+#[vello_test(skip_multithreaded, width = 120, height = 80, hybrid_tolerance = 1)]
+fn filter_color_matrix_full_sepia(ctx: &mut impl Renderer) {
+    let filter = Filter::from_primitive(FilterPrimitive::ColorMatrix {
+        matrix: matrices::SEPIA,


Could you add at least one snapshot test for an alpha-affecting matrix (e.g. matrices::ALPHA_TO_BLACK) or a matrix with non-zero offsets (the 5th column in a row)?

grebmeg · 2026-05-21T04:35:27Z

+fn is_premul_compatible_matrix(matrix: &[f32; 20]) -> bool {
+    matrix[3] == 0.0
+        && matrix[4] == 0.0
+        && matrix[8] == 0.0
+        && matrix[9] == 0.0
+        && matrix[13] == 0.0
+        && matrix[14] == 0.0
+        && matrix[15] == 0.0
+        && matrix[16] == 0.0
+        && matrix[17] == 0.0
+        && matrix[18] == 1.0
+        && matrix[19] == 0.0
+}


Suggested change

fn is_premul_compatible_matrix(matrix: &[f32; 20]) -> bool {

matrix[3] == 0.0

&& matrix[4] == 0.0

&& matrix[8] == 0.0

&& matrix[9] == 0.0

&& matrix[13] == 0.0

&& matrix[14] == 0.0

&& matrix[15] == 0.0

&& matrix[16] == 0.0

&& matrix[17] == 0.0

&& matrix[18] == 1.0

&& matrix[19] == 0.0

}

fn is_premul_compatible_matrix(matrix: &[f32; 20]) -> bool {

let row_r = &matrix[0..5];

let row_g = &matrix[5..10];

let row_b = &matrix[10..15];

let row_a = &matrix[15..20];

// RGB rows must not depend on alpha (col 3) or carry a constant offset (col 4):

// applying the matrix directly to premultiplied channels then yields the same

// result as unpremultiply -> matrix -> premultiply.

let rgb_alpha_independent = row_r[3] == 0.0

&& row_r[4] == 0.0

&& row_g[3] == 0.0

&& row_g[4] == 0.0

&& row_b[3] == 0.0

&& row_b[4] == 0.0;

// And alpha must be preserved exactly.

let alpha_preserved = row_a[0] == 0.0

&& row_a[1] == 0.0

&& row_a[2] == 0.0

&& row_a[3] == 1.0

&& row_a[4] == 0.0;

rgb_alpha_independent && alpha_preserved

}

grebmeg · 2026-05-21T04:44:44Z

    return color;
 }

+fn color_matrix_row(data: GpuFilterData, base: u32, color: vec4<f32>) -> f32 {


Just wanted to pay you attention that unfortunately we had to change the way how we write wgsl shaders and avoid using structs (PRs #1619 #1620), so this's likly why you have merge conflicts.

grebmeg · 2026-05-21T05:01:14Z

The color math here is fine, but really belongs elsewhere, hence linebender/color#221. Unfortunately, Tom brought up some good points over there and maybe some people who care about this topic should have a meeting to discuss things.

If you'd like to set up such a meeting with me, please feel free to do so.

I won't need much notice about having it, as long as I'm available.

I have a feeling it might be useful to start the discussion on Zulip first, so we can all better explore the problem space together?

I think this is really important, especially since in the future we’ll likely need to consider not just sRGB. I’m not even sure how much of the discussion so far has touched on that. I also honestly don’t know how actively the color crate is being developed or what direction it’s heading in, so I’m definitely keen to hear more thoughts on it.

waywardmonkeys requested review from LaurenzV, grebmeg and laurenz-canva May 6, 2026 04:17

waywardmonkeys added 2 commits May 6, 2026 11:21

vello_cpu: support color matrix filters

9b69011

vello_hybrid: support color matrix filters

157e2d8

waywardmonkeys force-pushed the filter/colormatrix branch from a20a91a to e72325b Compare May 6, 2026 04:25

waywardmonkeys force-pushed the filter/colormatrix branch from e72325b to b01cf5f Compare May 6, 2026 04:39

vello_cpu: fast-path color matrix filters

fc26e75

waywardmonkeys force-pushed the filter/colormatrix branch from b01cf5f to fc26e75 Compare May 6, 2026 04:47

This was referenced May 6, 2026

spase_strips: Implement some FilterFunction variants with ColorMatrix #1624

Draft

vello_hybrid: fast-path premultiplied color matrices #1625

Draft

grebmeg approved these changes May 21, 2026

View reviewed changes

-fn is_premul_compatible_matrix(matrix: &[f32; 20]) -> bool {
-    matrix[3] == 0.0
-        && matrix[4] == 0.0
-        && matrix[8] == 0.0
-        && matrix[9] == 0.0
-        && matrix[13] == 0.0
-        && matrix[14] == 0.0
-        && matrix[15] == 0.0
-        && matrix[16] == 0.0
-        && matrix[17] == 0.0
-        && matrix[18] == 1.0
-        && matrix[19] == 0.0
-}
+fn is_premul_compatible_matrix(matrix: &[f32; 20]) -> bool {
+    let row_r = &matrix[0..5];
+    let row_g = &matrix[5..10];
+    let row_b = &matrix[10..15];
+    let row_a = &matrix[15..20];
+    // RGB rows must not depend on alpha (col 3) or carry a constant offset (col 4):
+    // applying the matrix directly to premultiplied channels then yields the same
+    // result as unpremultiply -> matrix -> premultiply.
+    let rgb_alpha_independent = row_r[3] == 0.0
+        && row_r[4] == 0.0
+        && row_g[3] == 0.0
+        && row_g[4] == 0.0
+        && row_b[3] == 0.0
+        && row_b[4] == 0.0;
+    // And alpha must be preserved exactly.
+    let alpha_preserved = row_a[0] == 0.0
+        && row_a[1] == 0.0
+        && row_a[2] == 0.0
+        && row_a[3] == 1.0
+        && row_a[4] == 0.0;
+    rgb_alpha_independent && alpha_preserved
+}

Conversation

waywardmonkeys commented May 6, 2026

Performance Notes

Uh oh!

waywardmonkeys commented May 6, 2026

Uh oh!

waywardmonkeys commented May 6, 2026

Uh oh!

waywardmonkeys commented May 7, 2026

Uh oh!

grebmeg commented May 20, 2026

Uh oh!

waywardmonkeys commented May 20, 2026

Uh oh!

waywardmonkeys commented May 20, 2026

Uh oh!

grebmeg left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

grebmeg May 21, 2026

Choose a reason for hiding this comment

Uh oh!

grebmeg May 21, 2026

Choose a reason for hiding this comment

Uh oh!

grebmeg May 21, 2026

Choose a reason for hiding this comment

Uh oh!

grebmeg May 21, 2026

Choose a reason for hiding this comment

Uh oh!

grebmeg May 21, 2026

Choose a reason for hiding this comment

Uh oh!

grebmeg May 21, 2026

Choose a reason for hiding this comment

Uh oh!

grebmeg May 21, 2026

Choose a reason for hiding this comment

Uh oh!

grebmeg commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

grebmeg left a comment •

edited

Loading