Add vectorisation to some suspicious blend related missing parts by DJMcNab · Pull Request #1579 · linebender/vello

DJMcNab · 2026-04-15T06:42:32Z

While writing #1578, I stumbled upon these places where vectorisation was missing.

I'm aware of the comment at `vello/sparse_strips/vello_cpu/src/fine/highp/mod.rs` (lines 339 to 342 in 6d23bf5):

    // IMPORTANT: The inlining attributes (#[inline(always)], #[inline(never)]) in this
    // module have been carefully tuned through benchmarking. Changing them can cause
    // significant performance regressions.

As a result, I haven't been as aggressive as I might otherwise have been.

I probably won't have time to drive this further (i.e. do the benchmarking myself), so feel free to push to this branch as appropriate.
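To illustrate what a "vector context" means for the inlining attributes quoted above, here is a minimal sketch in plain `std` Rust rather than vello's actual SIMD abstraction; every name in it (`mul_div_255`, `multiply_row`, and so on) is hypothetical. A function marked `#[target_feature(enable = "avx2")]` is compiled with AVX2 enabled, and an `#[inline(always)]` helper inlined into it is compiled with AVX2 too. The same helper marked `#[inline(never)]` would instead be compiled once for the baseline target and called out of line, which is presumably the kind of missed optimisation on x86 being described here.

```rust
/// Hypothetical helper: multiply of two premultiplied u8 values,
/// returning round(a * b / 255) exactly for all 8-bit inputs.
#[inline(always)] // inlined into callers, so it inherits their target features
fn mul_div_255(a: u8, b: u8) -> u8 {
    let p = a as u16 * b as u16 + 128;
    ((p + (p >> 8)) >> 8) as u8
}

/// Hypothetical "vector context": AVX2 is enabled for this body and for every
/// `#[inline(always)]` helper that gets inlined into it.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn multiply_row_avx2(src: &[u8], dst: &mut [u8]) {
    for (d, s) in dst.iter_mut().zip(src) {
        *d = mul_div_255(*s, *d);
    }
}

/// Baseline body, compiled with the default target features
/// (e.g. only SSE2 on a stock x86_64 build).
fn multiply_row_baseline(src: &[u8], dst: &mut [u8]) {
    for (d, s) in dst.iter_mut().zip(src) {
        *d = mul_div_255(*s, *d);
    }
}

/// Safe entry point: uses the AVX2 body when the CPU supports it at runtime.
pub fn multiply_row(src: &[u8], dst: &mut [u8]) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just checked at runtime.
            unsafe { multiply_row_avx2(src, dst) };
            return;
        }
    }
    multiply_row_baseline(src, dst);
}
```

For example, blending `src = [255, 128, 100, 0]` onto `dst = [255, 255, 200, 77]` leaves `dst` as `[255, 128, 78, 0]` on either code path.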

LaurenzV · 2026-04-15T07:03:15Z

Back then, IIRC, I purposefully decided not to inline blending-related methods to reduce potential code bloat, and because I assumed that blending itself is slow enough that inlining wouldn't give us much benefit. I'm definitely open to changing this, but we should 1) check the impact on binary size and 2) see whether it actually yields any performance improvements.

That said, the `vectorize` calls definitely make sense! I presume they should yield improvements, especially on x86.

DJMcNab · 2026-04-15T21:31:34Z

In theory, none of the methods I changed would be optimised properly (or at all) on x86.
So each change needs addressing, but a choice has to be made between `vectorize` and `#[inline(always)]`. I made mine based on whether the method was already in a vector context.
It would be reasonable not to inline these and instead insert a `vectorize` call; I'm happy to make that change, although I still won't have time for benchmarking.
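The alternative mentioned here, keeping a method out of line and adding a `vectorize` call inside it, can be sketched as follows. The `vectorize` function below is a hypothetical stand-in written against `std` only, not the signature of vello's or fearless_simd's API: after a runtime check, it runs a closure inside a function compiled with AVX2 enabled, so the closure body, once inlined there, can use AVX2. `screen_row` is likewise a hypothetical blend routine kept `#[inline(never)]`, so only one copy of it ends up in the binary.

```rust
/// Hypothetical `vectorize`-style helper: runs `f` inside a function compiled
/// with AVX2 enabled when the CPU supports it, otherwise runs it directly.
/// Inlining the closure into `with_avx2` is expected but not guaranteed.
#[inline(always)]
fn vectorize<R>(f: impl FnOnce() -> R) -> R {
    #[cfg(target_arch = "x86_64")]
    {
        #[target_feature(enable = "avx2")]
        unsafe fn with_avx2<T>(f: impl FnOnce() -> T) -> T {
            f()
        }
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just checked at runtime.
            return unsafe { with_avx2(f) };
        }
    }
    f()
}

/// Hypothetical screen blend on premultiplied u8 values: s + d - round(s * d / 255).
/// Kept out of line to avoid code bloat; `vectorize` re-enters a vector context.
#[inline(never)]
pub fn screen_row(src: &[u8], dst: &mut [u8]) {
    vectorize(|| {
        for (d, s) in dst.iter_mut().zip(src) {
            let p = *s as u16 * *d as u16 + 128;
            let sd = (p + (p >> 8)) >> 8; // round(s * d / 255)
            // Computed in u16; the result always fits in 0..=255.
            *d = (*s as u16 + *d as u16 - sd) as u8;
        }
    })
}
```

This trades one call and one runtime feature check per invocation for a single copy of the body, which is the binary-size versus speed question raised above; which choice wins for the blend methods would still need benchmarking.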

LaurenzV · 2026-04-16T10:53:05Z

Will take a look when I find the time, thanks!

Mostly generated with Codex, but I did look at it myself and make some adjustments, so I hope it's good now. Since we have quite a few tests for blending (both manual ones and via COLR), I'm not too concerned about correctness issues here. Note that this does not address linebender#1579 yet, so it may not have much effect on AVX2. However, on NEON I'm now seeing roughly 4x-5x speedups for multiply, screen, overlay, darken, lighten, hard light, difference and exclusion; normal is unchanged, and the remaining modes are about 1-2% slower (color and luminosity within the noise threshold):

| Benchmark | Time [low, estimate, high] | Change [low, estimate, high] | p | Result | Outliers (of 100 measurements) |
|---|---|---|---|---|---|
| `fine/blend/normal_u8_neon` | 40.096 ns, 40.297 ns, 40.521 ns | -1.1748%, +0.8265%, +3.1392% | 0.45 | No change in performance detected | 8 (3 high mild, 5 high severe) |
| `fine/blend/multiply_u8_neon` | 976.39 ns, 978.78 ns, 981.34 ns | -80.790%, -80.728%, -80.664% | 0.00 | Performance has improved | none reported |
| `fine/blend/screen_u8_neon` | 1.0140 µs, 1.0167 µs, 1.0199 µs | -80.496%, -80.407%, -80.320% | 0.00 | Performance has improved | 1 (1 high mild) |
| `fine/blend/overlay_u8_neon` | 1.3631 µs, 1.3667 µs, 1.3701 µs | -74.682%, -74.592%, -74.499% | 0.00 | Performance has improved | 3 (2 high mild, 1 high severe) |
| `fine/blend/darken_u8_neon` | 1.1359 µs, 1.1385 µs, 1.1412 µs | -77.273%, -77.197%, -77.125% | 0.00 | Performance has improved | 3 (3 high mild) |
| `fine/blend/lighten_u8_neon` | 1.1535 µs, 1.1557 µs, 1.1582 µs | -77.013%, -76.936%, -76.857% | 0.00 | Performance has improved | 3 (2 high mild, 1 high severe) |
| `fine/blend/color_dodge_u8_neon` | 5.6951 µs, 5.7070 µs, 5.7195 µs | +1.6232%, +1.9789%, +2.3529% | 0.00 | Performance has regressed | 4 (3 high mild, 1 high severe) |
| `fine/blend/color_burn_u8_neon` | 5.6208 µs, 5.6334 µs, 5.6466 µs | +1.5668%, +1.9000%, +2.2646% | 0.00 | Performance has regressed | 1 (1 high mild) |
| `fine/blend/hard_light_u8_neon` | 1.3581 µs, 1.3602 µs, 1.3626 µs | -75.426%, -75.345%, -75.265% | 0.00 | Performance has improved | 5 (4 high mild, 1 high severe) |
| `fine/blend/soft_light_u8_neon` | 6.0497 µs, 6.0630 µs, 6.0768 µs | +1.2844%, +1.7849%, +2.2700% | 0.00 | Performance has regressed | 5 (3 high mild, 2 high severe) |
| `fine/blend/difference_u8_neon` | 1.2694 µs, 1.2720 µs, 1.2747 µs | -75.605%, -75.514%, -75.423% | 0.00 | Performance has improved | 3 (1 high mild, 2 high severe) |
| `fine/blend/exclusion_u8_neon` | 1.0596 µs, 1.0614 µs, 1.0634 µs | -80.316%, -80.250%, -80.184% | 0.00 | Performance has improved | 5 (2 high mild, 3 high severe) |
| `fine/blend/hue_u8_neon` | 8.5128 µs, 8.5387 µs, 8.5659 µs | +1.5041%, +1.9534%, +2.4143% | 0.00 | Performance has regressed | 5 (5 high mild) |
| `fine/blend/saturation_u8_neon` | 8.5693 µs, 8.6052 µs, 8.6431 µs | +1.7844%, +2.2460%, +2.6872% | 0.00 | Performance has regressed | 3 (2 high mild, 1 high severe) |
| `fine/blend/color_u8_neon` | 7.5338 µs, 7.5591 µs, 7.5869 µs | +0.7948%, +1.2293%, +1.6653% | 0.00 | Change within noise threshold | 3 (3 high mild) |
| `fine/blend/luminosity_u8_neon` | 7.5325 µs, 7.5531 µs, 7.5772 µs | +0.8659%, +1.3864%, +1.8684% | 0.00 | Change within noise threshold | 6 (3 high mild, 3 high severe) |
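The "4x-5x" figure follows from the change estimates reported above: a change of c percent means the new time is (1 + c/100) times the old one, so the speedup is 100 / (100 + c). A small helper makes the conversion explicit; the function name is hypothetical and not part of the benchmark tooling. Multiply's -80.728% corresponds to about 5.19x, and overlay's -74.592% to about 3.94x.

```rust
/// Hypothetical helper: converts a criterion-style "change" percentage
/// (new time relative to old time) into a speedup factor old / new.
fn speedup_from_change(change_pct: f64) -> f64 {
    // new = old * (1 + c / 100)  =>  old / new = 100 / (100 + c)
    100.0 / (100.0 + change_pct)
}
```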

Add vectorisation to some suspicious blend related missing parts

6d23bf5

DJMcNab requested a review from LaurenzV April 15, 2026 06:42

LaurenzV mentioned this pull request May 16, 2026

vello_cpu: Add u8 fast path for some blend modes #1653

Merged

DJMcNab wants to merge 1 commit into `linebender:main` from `DJMcNab:fix-blend-vectorisation`
