Skip to content

sparse_strips: Add ColorMatrix filter#1623

Open
waywardmonkeys wants to merge 3 commits into
linebender:mainfrom
waywardmonkeys:filter/colormatrix
Open

sparse_strips: Add ColorMatrix filter#1623
waywardmonkeys wants to merge 3 commits into
linebender:mainfrom
waywardmonkeys:filter/colormatrix

Conversation

@waywardmonkeys
Copy link
Copy Markdown
Collaborator

This adds ColorMatrix filter support to the sparse-strip renderers.

  • Adds CPU fast paths for:
    • identity matrix no-op
    • tracking may_have_transparency during the color-matrix pass
    • opaque premultiplied input pixels
    • / 255.0 normalization via reciprocal multiply

Performance Notes

I profiled a temporary release-mode harness applying matrices::SEPIA repeatedly over a 1024x1024 pixmap for 20 seconds.

After the reciprocal multiply change:

  • Opaque input: 3610 full-pixmap passes
  • Translucent input: 3233 full-pixmap passes

@waywardmonkeys
Copy link
Copy Markdown
Collaborator Author

This was done with the assistance of Codex (GPT 5.5, xhigh).

It did some of the boring stuff but also wrote benchmarks, ran them under a profiler, disassembled code to verify certain things were happening, and carried out a number of experiments to see if some stuff would result in appreciable performance improvements or not.

@waywardmonkeys
Copy link
Copy Markdown
Collaborator Author

After talking to @raphlinus, Codex and I did another optimization that was amended into the 3rd commit in this series to avoid a lot of the un-premultiply / re-premultiply work with compatible matrices.

The fast path applies matrices directly in premultiplied space when they:

  • preserve alpha exactly
  • only mix RGB into RGB
  • have no RGB offsets

That covers GRAYSCALE and SEPIA, but not things like ALPHA_TO_BLACK.

Benchmark using the same 20s 1024x1024 SEPIA release harness:

Path Before After
Opaque 3610 passes 5705 passes
Translucent 3233 passes 5706 passes

@waywardmonkeys
Copy link
Copy Markdown
Collaborator Author

Some of this code didn't really belong in Vello, so I've gone and created a new crate in the color repo, color_operations: linebender/color#221

I've got a version of this set of changes that builds on that instead, and I'll update this PR with it later today.

@grebmeg
Copy link
Copy Markdown
Collaborator

grebmeg commented May 20, 2026

Some of this code didn't really belong in Vello, so I've gone and created a new crate in the color repo, color_operations: linebender/color#221

I've got a version of this set of changes that builds on that instead, and I'll update this PR with it later today.

Hey @waywardmonkeys, sorry for the slow review. Now that I’ve got review capacity back, I’m ready to take a look. Before I start, just wanted to confirm whether this PR is ready for review or if you’re still planning to make more updates to it?

@waywardmonkeys
Copy link
Copy Markdown
Collaborator Author

@grebmeg I think most of this apart from the color math is useful to take a look at and make sure there's no objections to it overall.

The color math here is fine, but really belongs elsewhere, hence linebender/color#221. Unfortunately, Tom brought up some good points over there and maybe some people who care about this topic should have a meeting to discuss things.

If you'd like to set up such a meeting with me, please feel free to do so.

I won't need much notice about having it, as long as I'm available.

@waywardmonkeys
Copy link
Copy Markdown
Collaborator Author

FYI: @tomcur (See above)

Copy link
Copy Markdown
Collaborator

@grebmeg grebmeg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR is in pretty good shape overall. It might be worth adding the optimizations I mentioned here so we can avoid the additional memory overhead for each filter.

There are also a few other things we haven’t wired up yet, but hopefully we’ll be able to tackle those over time as well.

Thanks for introducing the color filter support! 🎉

Comment on lines +119 to +121
let t3 = textureLoad(filter_data, vec2((texel_offset + 3u) % w, (texel_offset + 3u) / w), 0);
let t4 = textureLoad(filter_data, vec2((texel_offset + 4u) % w, (texel_offset + 4u) / w), 0);
let t5 = textureLoad(filter_data, vec2((texel_offset + 5u) % w, (texel_offset + 5u) / w), 0);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With adding three more texture loads, I think it’s time to address this comment and introduce a variable-stride index for filter data. This would be similar to how sparse strips are handled in encoded_paints, where paint_idxs stores the texel offset for where the current filter starts in the texture. Would it be possible to do this, if not in this PR, then at least as a follow-up?

Comment on lines +32 to +47
if is_premul_compatible_matrix(matrix) {
// For RGB-only, alpha-preserving matrices, applying the RGB matrix to
// premultiplied channels is equivalent to unpremultiply -> matrix ->
// premultiply. This avoids the per-pixel alpha division.
for pixel in pixmap.data_mut() {
let transformed = apply_premul_color_matrix_to_pixel(*pixel, matrix);
may_have_transparency |= transformed.a != 255;
*pixel = transformed;
}
} else {
for pixel in pixmap.data_mut() {
let transformed = apply_color_matrix_to_pixel(*pixel, matrix);
may_have_transparency |= transformed.a != 255;
*pixel = transformed;
}
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Color matrix is a great candidate for SIMD here 😅 We haven’t wired filters up with SIMD yet, but it’d be nice to do at some point since it could bring significant performance improvements. Just sharing my thoughts, it might be worth adding a TODO somewhere so we can revisit this later.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I just realized we also haven’t wired up the faster non-spatial filter path at all, things like color matrix, opacity, flood, basically filters that only require per-pixel work.

I don’t think that blocks this PR since we already have the same situation with Flood, but I wanted to highlight it in case you’re aiming for significantly better performance. Right now those filters still go through the spatial filter path, which allocates a pixmap and does some additional work that could be avoided entirely. This could likely be handled much more efficiently here in vello_cpu fine.

)
}

fn is_premul_compatible_matrix(matrix: &[f32; 20]) -> bool {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be worth inlining this and the functions above as well?

#[vello_test(skip_multithreaded, width = 120, height = 80, hybrid_tolerance = 1)]
fn filter_color_matrix_full_sepia(ctx: &mut impl Renderer) {
let filter = Filter::from_primitive(FilterPrimitive::ColorMatrix {
matrix: matrices::SEPIA,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add at least one snapshot test for an alpha-affecting matrix (e.g. matrices::ALPHA_TO_BLACK) or a matrix with non-zero offsets (the 5th column in a row)?

Comment on lines +93 to +105
fn is_premul_compatible_matrix(matrix: &[f32; 20]) -> bool {
matrix[3] == 0.0
&& matrix[4] == 0.0
&& matrix[8] == 0.0
&& matrix[9] == 0.0
&& matrix[13] == 0.0
&& matrix[14] == 0.0
&& matrix[15] == 0.0
&& matrix[16] == 0.0
&& matrix[17] == 0.0
&& matrix[18] == 1.0
&& matrix[19] == 0.0
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fn is_premul_compatible_matrix(matrix: &[f32; 20]) -> bool {
matrix[3] == 0.0
&& matrix[4] == 0.0
&& matrix[8] == 0.0
&& matrix[9] == 0.0
&& matrix[13] == 0.0
&& matrix[14] == 0.0
&& matrix[15] == 0.0
&& matrix[16] == 0.0
&& matrix[17] == 0.0
&& matrix[18] == 1.0
&& matrix[19] == 0.0
}
fn is_premul_compatible_matrix(matrix: &[f32; 20]) -> bool {
let row_r = &matrix[0..5];
let row_g = &matrix[5..10];
let row_b = &matrix[10..15];
let row_a = &matrix[15..20];
// RGB rows must not depend on alpha (col 3) or carry a constant offset (col 4):
// applying the matrix directly to premultiplied channels then yields the same
// result as unpremultiply -> matrix -> premultiply.
let rgb_alpha_independent = row_r[3] == 0.0
&& row_r[4] == 0.0
&& row_g[3] == 0.0
&& row_g[4] == 0.0
&& row_b[3] == 0.0
&& row_b[4] == 0.0;
// And alpha must be preserved exactly.
let alpha_preserved = row_a[0] == 0.0
&& row_a[1] == 0.0
&& row_a[2] == 0.0
&& row_a[3] == 1.0
&& row_a[4] == 0.0;
rgb_alpha_independent && alpha_preserved
}

return color;
}

fn color_matrix_row(data: GpuFilterData, base: u32, color: vec4<f32>) -> f32 {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to pay you attention that unfortunately we had to change the way how we write wgsl shaders and avoid using structs (PRs #1619 #1620), so this's likly why you have merge conflicts.

@grebmeg
Copy link
Copy Markdown
Collaborator

grebmeg commented May 21, 2026

The color math here is fine, but really belongs elsewhere, hence linebender/color#221. Unfortunately, Tom brought up some good points over there and maybe some people who care about this topic should have a meeting to discuss things.

If you'd like to set up such a meeting with me, please feel free to do so.

I won't need much notice about having it, as long as I'm available.

I have a feeling it might be useful to start the discussion on Zulip first, so we can all better explore the problem space together?

I think this is really important, especially since in the future we’ll likely need to consider not just sRGB. I’m not even sure how much of the discussion so far has touched on that. I also honestly don’t know how actively the color crate is being developed or what direction it’s heading in, so I’m definitely keen to hear more thoughts on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants