opt: SWAR number parsing, pointer-based DOM, peek_u16 structural matching by liuq19 · Pull Request #224 · cloudwego/sonic-rs

liuq19 · 2026-04-09T06:32:28Z

Summary

SWAR integer parsing: Parse 8 digits at a time using pure u64 arithmetic (simdjson-style parse_eight_digits), replacing scalar digit loops
Pointer-based DOM push: Replace Vec::push with raw pointer writes, eliminating store-to-load forwarding stalls (IPC 3.6 → 4.4-5.3)
Inline SIMD memcpy: AVX2 unrolled copy for visit_container_end, replacing libc memmove overhead
peek_u16 structural matching: Single u16 comparison for ,\" / } / ] / : patterns, reducing branch count in parser main loop
Scalar key parsing fast path: Byte-by-byte scan for short object keys (≤24 chars), avoiding SIMD StringBlock setup overhead
Unchecked number parsing: parse_number_unchecked() with get_unchecked() for padded buffers, eliminating per-digit bounds checks
SWAR fraction parsing: Replace SSE simd_str2int with integer-pipeline SWAR for fraction digits, avoiding FP port contention on AMD Zen
Tolerant SWAR tail: Branchless 1-8 digit parsing via parse_digits_tolerant for fraction remainders
parse_float_fast threshold fix: Correct off-by-one (sig >> 52 → sig < 2^53)

Benchmark (compact JSON, core-pinned, vs baseline)

Benchmark	Before	After	Improvement
citm_catalog	511 µs	~230 µs	+55%
golang_source	1416 µs	~780 µs	+45%
lottie	495 µs	~220 µs	+55%

Test plan

cargo test — 146 tests pass
cargo test -p sonic-number (from sonic-number dir) — 7 tests pass
Number parsing edge cases: 8+ integer digits with fraction, SWAR boundary cases, tolerant tail truncation
Reviewed by Opus + Codex cross-model review

🤖 Generated with Claude Code

codecov · 2026-04-09T06:33:25Z

Codecov Report

❌ Patch coverage is 95.00000% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 71.06%. Comparing base (6e707d8) to head (80f6c22).
⚠️ Report is 5 commits behind head on main.

Files with missing lines	Patch %	Lines
src/reader.rs	71.42%	4 Missing ⚠️
src/value/tls_buffer.rs	92.15%	4 Missing ⚠️
src/parser.rs	95.58%	3 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #224      +/-   ##
==========================================
+ Coverage   70.64%   71.06%   +0.41%     
==========================================
  Files          42       42              
  Lines        9556     9724     +168     
==========================================
+ Hits         6751     6910     +159     
- Misses       2805     2814       +9

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copilot

Pull request overview

Performance-focused parser/DOM optimizations across sonic-rs and sonic-number, introducing SWAR-based number parsing, pointer-based DOM node staging, and lower-branch structural matching to reduce per-byte overhead in hot loops.

Changes:

Add SWAR (u64 arithmetic) fast paths for integer/fraction parsing and introduce an unchecked number parsing variant for padded inputs.
Rework DOM construction to use raw pointer writes into a preallocated buffer and add an AVX2 inline copy routine for container finalization.
Update parser inner loops with peek_u16 structural matching and add benchmarks/testdata to measure throughput improvements.

Reviewed changes

Copilot reviewed 17 out of 27 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
src/value/shared.rs	Adds `Shared::with_capacity` to preallocate the bump allocator for DOM construction.
src/value/node.rs	Adds AVX2 inline copy and switches DOM node buffering to pointer-based writes.
src/reader.rs	Adds `padded_slice` and 2-byte peek helpers for faster parsing decisions.
src/parser.rs	Adds unchecked number parsing + scalar key fast path + `peek_u16` structural matching in array/object parsing.
sonic-number/src/swar.rs	New SWAR helpers for digit detection and 1–16 digit parsing.
sonic-number/src/lib.rs	Integrates SWAR parsing into number parsing and adds an unsafe `parse_number_unchecked`.
sonic-number/Cargo.toml	Bumps crate version and adds Criterion benches.
sonic-number/benches/number.rs	Adds Criterion benchmarks for SWAR vs SIMD and end-to-end parse_number.
Cargo.toml	Bumps `sonic-rs` version.
benchmarks/Cargo.toml	Removes old number bench wiring; adds `perf_parse` example.
benchmarks/benches/deserialize_value.rs	Expands benchmark set to include `golang_source` and `lottie`.
benchmarks/benches/perf_parse.rs	Adds a core-pinned parsing throughput example.
benchmarks/bench_quick.sh	Adds a quick benchmark helper script.
benchmarks/benches/testdata/*	Adds/updates compact JSON fixtures and documentation for benchmark datasets.

Comments suppressed due to low confidence (1)

src/parser.rs:513

In the object parsing slow path, the final _ => branch returns ExpectedArrayCommaOrEnd, which seems like the wrong error code for an object context (should be something like ExpectedObjectCommaOrEnd). This will produce misleading error messages/diagnostics when encountering invalid JSON inside objects.

            match self.skip_space() {
                Some(b'}') => return check_visit!(self, vis.visit_object_end(count)),
                Some(b',') => match self.skip_space() {
                    Some(b'"') => continue,
                    _ => return perr!(self, ExpectObjectKeyOrEnd),
                },
                _ => return perr!(self, ExpectedArrayCommaOrEnd),
            }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-09T06:40:46Z

+        unsafe { &*(self.peek_n(2).unwrap_unchecked().as_ptr() as *const [u8; 2]) }
+    }
+
+    /// Peek next 2 bytes as a little-endian u16 for single-instruction matching.
+    /// E.g., `peek_u16() == u16::from_le_bytes([b',', b'"'])` matches `,"` in one cmp.


Reader::peek2() is a safe method but uses unwrap_unchecked() on peek_n(2). On non-padded readers (e.g. Read<'_>), calling this near EOF can trigger immediate UB. Either make peek2 an unsafe fn with a clear safety contract, or implement it safely (return an Option<[u8;2]> / u16) without unwrap_unchecked and keep unchecked access confined to PaddedSliceRead-specific helpers.

Suggested change

unsafe { &*(self.peek_n(2).unwrap_unchecked().as_ptr() as *const [u8; 2]) }

}

/// Peek next 2 bytes as a little-endian u16 for single-instruction matching.

/// E.g., `peek_u16() == u16::from_le_bytes([b',', b'"'])` matches `,"` in one cmp.

match self.peek_n(2) {

Some(s) => unsafe { &*(s.as_ptr() as *const [u8; 2]) },

None => panic!("Reader::peek2() requires at least 2 bytes remaining"),

}

}

/// Peek next 2 bytes as a little-endian u16 for single-instruction matching.

/// E.g., `peek_u16() == u16::from_le_bytes([b',', b'"'])` matches `,\"` in one cmp.

Copilot · 2026-04-09T06:40:46Z

+    #[inline(always)]
+    fn peek_u16(&self) -> u16 {
+        match self.peek_n(2) {
+            Some(s) => unsafe { (s.as_ptr() as *const u16).read_unaligned() },


peek_u16() is documented/used as a little-endian 2-byte view, but the implementation returns a native-endian u16 via read_unaligned(). On big-endian targets this breaks all the structural matching logic that compares against u16::from_le_bytes([...]) and relies on (sep & 0xFF) being the first byte. Consider returning u16::from_le_bytes([s[0], s[1]]) (or u16::from_le(read_unaligned(...))) and/or gating these fast paths behind cfg(target_endian = "little").

Suggested change

Some(s) => unsafe { (s.as_ptr() as *const u16).read_unaligned() },

Some(s) => u16::from_le(unsafe { (s.as_ptr() as *const u16).read_unaligned() }),

Copilot · 2026-04-09T06:40:47Z

+    /// Unchecked number parsing — requires padded buffer (PaddedSliceRead).
+    /// Uses the full padded buffer (including padding) so unchecked access is safe.
+    #[inline(always)]
+    pub fn parse_number_unchecked(&mut self, first: u8) -> Result<ParserNumber> {
+        let reader = &mut self.read;
+        let neg = first == b'-';
+        let mut now = reader.index() - (!neg as usize);
+        // Use padded_slice which includes the padding bytes, so unchecked reads are safe.
+        let data = reader.padded_slice();
+        let ret = unsafe { sonic_number::parse_number_unchecked(data, &mut now, neg) };
+        reader.set_index(now);
+        ret.map_err(|err| self.error(err.into()))
+    }


parse_number_unchecked() is exposed as a safe pub fn but it calls sonic_number::parse_number_unchecked() which requires extra padding for all unchecked reads. Since Reader::padded_slice() defaults to as_u8_slice() for non-padded readers, external callers can hit UB by invoking this on a Parser<Read<'_>>. Make this API unsafe, restrict it with a type bound to PaddedSliceRead, or otherwise enforce/preserve the padding invariant at the type level.

Copilot · 2026-04-09T06:40:47Z

+    /// Simple while loop, let LLVM optimize the unrolling.
+    ///
+    /// # Safety
+    /// Only called when strbuf=None (padded reader path). PaddedSliceRead has 64 bytes
+    /// of zero-padding beyond valid JSON, so scanning 24 bytes ahead is always safe.
+    /// Padding bytes (0x00) are < 0x20 and will bail to parse_string_visit.


This scalar key fast-path is described as for “short ASCII keys”, but it accepts any non-control, non-escape UTF-8 bytes (including multi-byte UTF-8) and then uses from_utf8_unchecked. Either tighten the predicate to actual ASCII if that’s the intent, or update the doc comment to reflect the real preconditions (e.g., “no escapes/control chars, length ≤ 24 bytes”).

Suggested change

/// Simple while loop, let LLVM optimize the unrolling.

///

/// # Safety

/// Only called when strbuf=None (padded reader path). PaddedSliceRead has 64 bytes

/// of zero-padding beyond valid JSON, so scanning 24 bytes ahead is always safe.

/// Padding bytes (0x00) are < 0x20 and will bail to parse_string_visit.

/// Fast path for keys that terminate within 24 bytes and contain no escapes (`\\`)

/// or control characters (`< 0x20`).

///

/// This path is not restricted to ASCII: it accepts any bytes `>= 0x20` until the

/// closing `"` is found, so multi-byte UTF-8 key bytes are handled here as well.

/// If an escape, control byte, or no closing quote within 24 bytes is encountered,

/// parsing falls back to `parse_string_visit`.

///

/// # Safety

/// Only called when strbuf=None (padded reader path). PaddedSliceRead has 64 bytes

/// of zero-padding beyond valid JSON, so scanning 24 bytes ahead is always safe.

/// Padding bytes (0x00) are < 0x20 and will bail to parse_string_visit.

///

/// The `from_utf8_unchecked` below therefore relies on the parser input being valid

/// UTF-8 JSON when this fast path succeeds.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 39b507ec71

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-09T06:41:14Z

+    pub fn parse_number_unchecked(&mut self, first: u8) -> Result<ParserNumber> {
+        let reader = &mut self.read;
+        let neg = first == b'-';
+        let mut now = reader.index() - (!neg as usize);
+        // Use padded_slice which includes the padding bytes, so unchecked reads are safe.


Mark parse_number_unchecked as unsafe or enforce padding

Parser::parse_number_unchecked is a safe public method, but it unconditionally calls sonic_number::parse_number_unchecked, which performs get_unchecked reads that require at least 64 bytes of trailing padding. With a normal Parser<Read> (no padding), calling this API on inputs like "1" can read past the slice boundary and trigger undefined behavior. Because this is externally callable through the public parser module, the precondition must be enforced in code (or the method made unsafe).

Useful? React with 👍 / 👎.

Performance optimizations for JSON parsing: 1. sonic-number: SWAR integer parsing (simdjson-style parse_eight_digits) - Uses pure u64 arithmetic to parse 8 ASCII digits at a time - Single-digit fast path for common small integers - 32-54% improvement on parse_number microbenchmark 2. parser: compact JSON 2-byte peek fast path - peek2() reads 2 bytes at once to match `,"` / `":` patterns - Eliminates redundant skip_space calls for compact JSON 3. DOM construction: pointer-based push_node - Replace (ptr, len, cap) with (base, end, cap_end) pointers - Eliminates store-to-load forwarding stalls (28% of loads → ~20%) - Eliminates integer register pressure (30.7% cycles stalled → 0.1%) - IPC improved from 3.6 to 4.4-5.3 4. DOM construction: inline SIMD memcpy (sonic-cpp style Xmemcpy) - AVX2 unrolled copy for visit_container_end - Eliminates libc memmove function call overhead - memmove dropped from 61% to 1.9% of cycles (citm_catalog) End-to-end deserialize_value improvement (A/B tested, core-pinned): citm_catalog: 511 µs → 314 µs (+38.6%) golang_source: 1416 µs → 1099 µs (+22.4%) lottie: 495 µs → 351 µs (+29.1%)

- Replace peek2() slice matching with peek_u16() u16 comparison for structural characters, reducing branch count in parser main loop - Add scalar fast path for object key parsing (up to 24 chars), avoiding SIMD StringBlock setup overhead for short keys

- Add parse_number_unchecked() using get_unchecked() for digit reads, eliminating per-digit bounds checks from the hot integer parsing loop - Scalar fraction accumulation for short integer part (<8 digits), avoiding SIMD setup + POW10 table multiplication overhead - Fix parse_float_fast threshold off-by-one (sig >> 52 vs sig < 2^53) - Replace SSE simd_str2int with SWAR fraction parsing + tolerant tail, keeping integer pipeline instead of competing with FP ports

- Compact all JSON test data (minified format, no whitespace) citm_catalog: 1.7MB → 489KB (-71%) - Add new test files: golang_source (51k ints), lottie (18k ints), synthea_fhir, gsoc-2018, poet, string_unicode, string_escaped, twitterescaped - Add testdata/README.md with value type statistics and selection guide - Add golang_source and lottie to deserialize_value benchmark - Add perf_parse example binary for profiling - Expand number.rs benchmark with SWAR comparison and per-length tests

Copilot AI review requested due to automatic review settings April 9, 2026 06:32

Copilot started reviewing on behalf of liuq19 April 9, 2026 06:33 View session

Copilot AI reviewed Apr 9, 2026

View reviewed changes

chatgpt-codex-connector bot reviewed Apr 9, 2026

View reviewed changes

liuq19 force-pushed the opt/perf-v2 branch 2 times, most recently from 5c5853e to aa5be83 Compare April 9, 2026 11:20

liuq19 added 5 commits April 9, 2026 19:31

chore: bump sonic-rs to 0.5.9, sonic-number to 0.1.3

80f6c22

liuq19 force-pushed the opt/perf-v2 branch from aa5be83 to 80f6c22 Compare April 9, 2026 11:38

shenyj3 approved these changes Apr 9, 2026

View reviewed changes

liuq19 merged commit ae081a5 into cloudwego:main Apr 9, 2026
21 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

opt: SWAR number parsing, pointer-based DOM, peek_u16 structural matching#224

opt: SWAR number parsing, pointer-based DOM, peek_u16 structural matching#224
liuq19 merged 5 commits intocloudwego:mainfrom
liuq19:opt/perf-v2

liuq19 commented Apr 9, 2026

Uh oh!

codecov bot commented Apr 9, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 9, 2026

Uh oh!

Copilot AI Apr 9, 2026

Uh oh!

Copilot AI Apr 9, 2026

Uh oh!

Copilot AI Apr 9, 2026

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Apr 9, 2026

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants

	Some(s) => unsafe { (s.as_ptr() as *const u16).read_unaligned() },
	Some(s) => u16::from_le(unsafe { (s.as_ptr() as *const u16).read_unaligned() }),

Conversation

liuq19 commented Apr 9, 2026

Summary

Benchmark (compact JSON, core-pinned, vs baseline)

Test plan

Uh oh!

codecov bot commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Apr 9, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants

codecov bot commented Apr 9, 2026 •

edited

Loading