vello_cpu: Add u8 fast path for some blend modes#1653
Conversation
37cf4ab to
d46b2c0
Compare
| } | ||
|
|
||
| impl MixExt for BlendMode { | ||
| fn mix<S: Simd>(&self, src_c: u8x32<S>, bg_c: u8x32<S>) -> Option<u8x32<S>> { |
There was a problem hiding this comment.
Nit: the trait pattern from highp/blend.rs doesn't earn its keep here — it's private, has a single impl, a single call site, and its method name shadows the BlendMode::mix field, so blend_mode.mix(src_c, bg_c) (method) and blend_mode.mix (field) look identical at a glance. A free function reads more straightforwardly:
fn try_u8_mix<S: Simd>(
blend_mode: BlendMode,
src_c: u8x32<S>,
bg_c: u8x32<S>,
) -> Option<u8x32<S>> {
// We implement the u8 fast path for blend modes that
// 1) are separable.
// 2) don't have too many divisions, since integer normalization is
// relatively expensive.
// In the future, it's possible to do further experimentation to see whether
// some more blend modes are worth doing in integer space.
Some(match blend_mode.mix {
Mix::Normal => src_c,
Mix::Multiply => Multiply::mix(src_c, bg_c),
Mix::Screen => Screen::mix(src_c, bg_c),
Mix::Overlay => Overlay::mix(src_c, bg_c),
Mix::Darken => Darken::mix(src_c, bg_c),
Mix::Lighten => Lighten::mix(src_c, bg_c),
Mix::HardLight => HardLight::mix(src_c, bg_c),
Mix::Difference => Difference::mix(src_c, bg_c),
Mix::Exclusion => Exclusion::mix(src_c, bg_c),
Mix::ColorDodge
| Mix::ColorBurn
| Mix::SoftLight
| Mix::Luminosity
| Mix::Color
| Mix::Hue
| Mix::Saturation => return None,
})
}| use vello_common::fearless_simd::*; | ||
| use vello_common::util::{Div255Ext, f32_to_u8, normalized_mul_u8x32}; | ||
|
|
||
| pub(crate) fn mix<S: Simd>(src_c: u8x32<S>, bg_c: u8x32<S>, blend_mode: BlendMode) -> u8x32<S> { |
There was a problem hiding this comment.
Would it make sense to add #[inline(always)] here?
There was a problem hiding this comment.
Yes, but I will do this in a follow-up since we need to fix this up in a couple of places anyway (see #1579).
| next_src | ||
| } else { | ||
| mix(next_src, bg_v, blend_mode) | ||
| blend::mix(next_src, bg_v, blend_mode) |
There was a problem hiding this comment.
Just thinking from a performance perspective, would it make sense to combine mix and compose into a single fused implementation? Could that improve performance even further?
One downside I can see is that you’d need implementations for every Mix * Compose combination. Still, it might make sense for a few commonly used subsets.
There was a problem hiding this comment.
It might help a bit (especially for u8), but I don't think it carries it's weight due to the large number of combinations you get (as you mentioned, not good for code size). And blending by itself is already pretty slow. I also don't think it's common at all to have a non-default blend mode + composition mode set.
d46b2c0 to
847b894
Compare
Mostly generated with codex, but I did look at it myself and make some adjustments, so I hope it's good now. Since we do have quite a few tests for blending (both manual ones as well as via COLR), not too concerned about correctness issues here.
Note that this does not address #1579 yet so it's possible this won't have much effect on AVX2. However, on NEON I'm seeing 4x-5x speedups for blending now: