Skip-ahead for UTF8 encoding #289

Merged
nickva merged 1 commit into master from encode-utf8-scan-through on Apr 24, 2026

@nickva commented Apr 23, 2026

Collaborator

Just like we already do for ASCII-only input, skip ahead when encoding UTF-8 input. This should help Unicode-heavy input.

Turned out better than expected:

===== With input Encode Twitter =====
Name                                       ips        average  deviation         median         99th %
jiffy (encode-utf8-scan-through)        306.51        3.26 ms    +/-15.30%        3.21 ms        4.31 ms
jiffy (master)                          276.38        3.62 ms    +/-13.79%        3.54 ms        4.58 ms
Comparison:
jiffy (encode-utf8-scan-through)        306.51
jiffy (master)                          276.38 - 1.11x slower +0.36 ms
===== With input Encode UTF-8 escaped =====
Name                                       ips        average  deviation         median         99th %
jiffy (encode-utf8-scan-through)       22.41 K       44.62 us     +/-1.54%       44.49 us       46.63 us
jiffy (master)                         10.12 K       98.78 us     +/-0.75%       98.68 us      100.95 us
Comparison:
jiffy (encode-utf8-scan-through)       22.41 K
jiffy (master)                         10.12 K - 2.21x slower +54.16 us
===== With input Encode UTF-8 unescaped =====
Name                                       ips        average  deviation         median         99th %
jiffy (encode-utf8-scan-through)       23.35 K       42.83 us     +/-1.70%       42.68 us       45.00 us
jiffy (master)                         10.59 K       94.42 us     +/-0.72%       94.30 us       96.42 us
Comparison:
jiffy (encode-utf8-scan-through)       23.35 K
jiffy (master)                         10.59 K - 2.20x slower +51.60 us
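
As a rough illustration of the skip-ahead idea (a minimal sketch, not the code from this PR): instead of handling one byte at a time, the encoder can scan forward over a run of bytes that need no escaping, including well-formed multi-byte UTF-8 sequences, and copy the whole run in one go. The names `utf8_seq_len` and `scan_through` are hypothetical, and the set of bytes treated as "needs escaping" (control characters, `"` and `\`) is an assumption; jiffy's actual rules depend on its encoder options.

```c
#include <stddef.h>

/* Hypothetical helper: byte length of the well-formed UTF-8 sequence that
 * starts at p, or 0 if the sequence is truncated or malformed. */
static size_t utf8_seq_len(const unsigned char *p, size_t avail)
{
    unsigned char b0 = p[0];

    if (b0 < 0x80) {
        return 1;
    }
    if (b0 >= 0xC2 && b0 <= 0xDF) {
        if (avail < 2 || (p[1] & 0xC0) != 0x80) return 0;
        return 2;
    }
    if (b0 >= 0xE0 && b0 <= 0xEF) {
        if (avail < 3) return 0;
        if ((p[1] & 0xC0) != 0x80 || (p[2] & 0xC0) != 0x80) return 0;
        if (b0 == 0xE0 && p[1] < 0xA0) return 0; /* overlong encoding */
        if (b0 == 0xED && p[1] > 0x9F) return 0; /* UTF-16 surrogates */
        return 3;
    }
    if (b0 >= 0xF0 && b0 <= 0xF4) {
        if (avail < 4) return 0;
        if ((p[1] & 0xC0) != 0x80 || (p[2] & 0xC0) != 0x80 ||
            (p[3] & 0xC0) != 0x80) return 0;
        if (b0 == 0xF0 && p[1] < 0x90) return 0; /* overlong encoding */
        if (b0 == 0xF4 && p[1] > 0x8F) return 0; /* above U+10FFFF */
        return 4;
    }
    return 0; /* stray continuation byte, 0xC0, 0xC1, or 0xF5..0xFF */
}

/* Hypothetical scan: number of leading bytes of buf that can be copied to
 * the output verbatim. Stops at the first byte that needs escaping (an
 * assumed set: control characters, '"' and '\\') or at invalid UTF-8. */
size_t scan_through(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t n;

        if (c < 0x80) {
            if (c < 0x20 || c == '"' || c == '\\') break;
            i++;
            continue;
        }
        n = utf8_seq_len(buf + i, len - i);
        if (n == 0) break;
        i += n;
    }
    return i;
}
```

The caller would copy `scan_through(buf, len)` bytes with a single `memcpy`, then fall back to the slower per-character path for whatever byte stopped the scan.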

@nickva force-pushed the encode-utf8-scan-through branch from b0a0be1 to d82dedb on April 24, 2026 03:45

@davisp left a comment

Owner

+1 Nice find as always!

Comment thread c_src/encoder.c Outdated
Comment thread c_src/jiffy_simd.h Outdated
@nickva force-pushed the encode-utf8-scan-through branch from d82dedb to 57b24a3 on April 24, 2026 18:32
@nickva merged commit 81f1918 into master on Apr 24, 2026
22 checks passed
@nickva deleted the encode-utf8-scan-through branch on April 24, 2026 18:39