opt: SWAR number parsing, pointer-based DOM, peek_u16 structural matching#224
opt: SWAR number parsing, pointer-based DOM, peek_u16 structural matching#224liuq19 merged 5 commits intocloudwego:mainfrom
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #224 +/- ##
==========================================
+ Coverage 70.64% 71.06% +0.41%
==========================================
Files 42 42
Lines 9556 9724 +168
==========================================
+ Hits 6751 6910 +159
- Misses 2805 2814 +9 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
Performance-focused parser/DOM optimizations across sonic-rs and sonic-number, introducing SWAR-based number parsing, pointer-based DOM node staging, and lower-branch structural matching to reduce per-byte overhead in hot loops.
Changes:
- Add SWAR (
u64arithmetic) fast paths for integer/fraction parsing and introduce an unchecked number parsing variant for padded inputs. - Rework DOM construction to use raw pointer writes into a preallocated buffer and add an AVX2 inline copy routine for container finalization.
- Update parser inner loops with
peek_u16structural matching and add benchmarks/testdata to measure throughput improvements.
Reviewed changes
Copilot reviewed 17 out of 27 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/value/shared.rs | Adds Shared::with_capacity to preallocate the bump allocator for DOM construction. |
| src/value/node.rs | Adds AVX2 inline copy and switches DOM node buffering to pointer-based writes. |
| src/reader.rs | Adds padded_slice and 2-byte peek helpers for faster parsing decisions. |
| src/parser.rs | Adds unchecked number parsing + scalar key fast path + peek_u16 structural matching in array/object parsing. |
| sonic-number/src/swar.rs | New SWAR helpers for digit detection and 1–16 digit parsing. |
| sonic-number/src/lib.rs | Integrates SWAR parsing into number parsing and adds an unsafe parse_number_unchecked. |
| sonic-number/Cargo.toml | Bumps crate version and adds Criterion benches. |
| sonic-number/benches/number.rs | Adds Criterion benchmarks for SWAR vs SIMD and end-to-end parse_number. |
| Cargo.toml | Bumps sonic-rs version. |
| benchmarks/Cargo.toml | Removes old number bench wiring; adds perf_parse example. |
| benchmarks/benches/deserialize_value.rs | Expands benchmark set to include golang_source and lottie. |
| benchmarks/benches/perf_parse.rs | Adds a core-pinned parsing throughput example. |
| benchmarks/bench_quick.sh | Adds a quick benchmark helper script. |
| benchmarks/benches/testdata/* | Adds/updates compact JSON fixtures and documentation for benchmark datasets. |
Comments suppressed due to low confidence (1)
src/parser.rs:513
- In the object parsing slow path, the final
_ =>branch returnsExpectedArrayCommaOrEnd, which seems like the wrong error code for an object context (should be something likeExpectedObjectCommaOrEnd). This will produce misleading error messages/diagnostics when encountering invalid JSON inside objects.
match self.skip_space() {
Some(b'}') => return check_visit!(self, vis.visit_object_end(count)),
Some(b',') => match self.skip_space() {
Some(b'"') => continue,
_ => return perr!(self, ExpectObjectKeyOrEnd),
},
_ => return perr!(self, ExpectedArrayCommaOrEnd),
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| unsafe { &*(self.peek_n(2).unwrap_unchecked().as_ptr() as *const [u8; 2]) } | ||
| } | ||
|
|
||
| /// Peek next 2 bytes as a little-endian u16 for single-instruction matching. | ||
| /// E.g., `peek_u16() == u16::from_le_bytes([b',', b'"'])` matches `,"` in one cmp. |
There was a problem hiding this comment.
Reader::peek2() is a safe method but uses unwrap_unchecked() on peek_n(2). On non-padded readers (e.g. Read<'_>), calling this near EOF can trigger immediate UB. Either make peek2 an unsafe fn with a clear safety contract, or implement it safely (return an Option<[u8;2]> / u16) without unwrap_unchecked and keep unchecked access confined to PaddedSliceRead-specific helpers.
| unsafe { &*(self.peek_n(2).unwrap_unchecked().as_ptr() as *const [u8; 2]) } | |
| } | |
| /// Peek next 2 bytes as a little-endian u16 for single-instruction matching. | |
| /// E.g., `peek_u16() == u16::from_le_bytes([b',', b'"'])` matches `,"` in one cmp. | |
| match self.peek_n(2) { | |
| Some(s) => unsafe { &*(s.as_ptr() as *const [u8; 2]) }, | |
| None => panic!("Reader::peek2() requires at least 2 bytes remaining"), | |
| } | |
| } | |
| /// Peek next 2 bytes as a little-endian u16 for single-instruction matching. | |
| /// E.g., `peek_u16() == u16::from_le_bytes([b',', b'"'])` matches `,\"` in one cmp. |
| #[inline(always)] | ||
| fn peek_u16(&self) -> u16 { | ||
| match self.peek_n(2) { | ||
| Some(s) => unsafe { (s.as_ptr() as *const u16).read_unaligned() }, |
There was a problem hiding this comment.
peek_u16() is documented/used as a little-endian 2-byte view, but the implementation returns a native-endian u16 via read_unaligned(). On big-endian targets this breaks all the structural matching logic that compares against u16::from_le_bytes([...]) and relies on (sep & 0xFF) being the first byte. Consider returning u16::from_le_bytes([s[0], s[1]]) (or u16::from_le(read_unaligned(...))) and/or gating these fast paths behind cfg(target_endian = "little").
| Some(s) => unsafe { (s.as_ptr() as *const u16).read_unaligned() }, | |
| Some(s) => u16::from_le(unsafe { (s.as_ptr() as *const u16).read_unaligned() }), |
| /// Unchecked number parsing — requires padded buffer (PaddedSliceRead). | ||
| /// Uses the full padded buffer (including padding) so unchecked access is safe. | ||
| #[inline(always)] | ||
| pub fn parse_number_unchecked(&mut self, first: u8) -> Result<ParserNumber> { | ||
| let reader = &mut self.read; | ||
| let neg = first == b'-'; | ||
| let mut now = reader.index() - (!neg as usize); | ||
| // Use padded_slice which includes the padding bytes, so unchecked reads are safe. | ||
| let data = reader.padded_slice(); | ||
| let ret = unsafe { sonic_number::parse_number_unchecked(data, &mut now, neg) }; | ||
| reader.set_index(now); | ||
| ret.map_err(|err| self.error(err.into())) | ||
| } |
There was a problem hiding this comment.
parse_number_unchecked() is exposed as a safe pub fn but it calls sonic_number::parse_number_unchecked() which requires extra padding for all unchecked reads. Since Reader::padded_slice() defaults to as_u8_slice() for non-padded readers, external callers can hit UB by invoking this on a Parser<Read<'_>>. Make this API unsafe, restrict it with a type bound to PaddedSliceRead, or otherwise enforce/preserve the padding invariant at the type level.
| /// Simple while loop, let LLVM optimize the unrolling. | ||
| /// | ||
| /// # Safety | ||
| /// Only called when strbuf=None (padded reader path). PaddedSliceRead has 64 bytes | ||
| /// of zero-padding beyond valid JSON, so scanning 24 bytes ahead is always safe. | ||
| /// Padding bytes (0x00) are < 0x20 and will bail to parse_string_visit. |
There was a problem hiding this comment.
This scalar key fast-path is described as for “short ASCII keys”, but it accepts any non-control, non-escape UTF-8 bytes (including multi-byte UTF-8) and then uses from_utf8_unchecked. Either tighten the predicate to actual ASCII if that’s the intent, or update the doc comment to reflect the real preconditions (e.g., “no escapes/control chars, length ≤ 24 bytes”).
| /// Simple while loop, let LLVM optimize the unrolling. | |
| /// | |
| /// # Safety | |
| /// Only called when strbuf=None (padded reader path). PaddedSliceRead has 64 bytes | |
| /// of zero-padding beyond valid JSON, so scanning 24 bytes ahead is always safe. | |
| /// Padding bytes (0x00) are < 0x20 and will bail to parse_string_visit. | |
| /// Fast path for keys that terminate within 24 bytes and contain no escapes (`\\`) | |
| /// or control characters (`< 0x20`). | |
| /// | |
| /// This path is not restricted to ASCII: it accepts any bytes `>= 0x20` until the | |
| /// closing `"` is found, so multi-byte UTF-8 key bytes are handled here as well. | |
| /// If an escape, control byte, or no closing quote within 24 bytes is encountered, | |
| /// parsing falls back to `parse_string_visit`. | |
| /// | |
| /// # Safety | |
| /// Only called when strbuf=None (padded reader path). PaddedSliceRead has 64 bytes | |
| /// of zero-padding beyond valid JSON, so scanning 24 bytes ahead is always safe. | |
| /// Padding bytes (0x00) are < 0x20 and will bail to parse_string_visit. | |
| /// | |
| /// The `from_utf8_unchecked` below therefore relies on the parser input being valid | |
| /// UTF-8 JSON when this fast path succeeds. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 39b507ec71
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| pub fn parse_number_unchecked(&mut self, first: u8) -> Result<ParserNumber> { | ||
| let reader = &mut self.read; | ||
| let neg = first == b'-'; | ||
| let mut now = reader.index() - (!neg as usize); | ||
| // Use padded_slice which includes the padding bytes, so unchecked reads are safe. |
There was a problem hiding this comment.
Mark parse_number_unchecked as unsafe or enforce padding
Parser::parse_number_unchecked is a safe public method, but it unconditionally calls sonic_number::parse_number_unchecked, which performs get_unchecked reads that require at least 64 bytes of trailing padding. With a normal Parser<Read> (no padding), calling this API on inputs like "1" can read past the slice boundary and trigger undefined behavior. Because this is externally callable through the public parser module, the precondition must be enforced in code (or the method made unsafe).
Useful? React with 👍 / 👎.
5c5853e to
aa5be83
Compare
Performance optimizations for JSON parsing: 1. sonic-number: SWAR integer parsing (simdjson-style parse_eight_digits) - Uses pure u64 arithmetic to parse 8 ASCII digits at a time - Single-digit fast path for common small integers - 32-54% improvement on parse_number microbenchmark 2. parser: compact JSON 2-byte peek fast path - peek2() reads 2 bytes at once to match `,"` / `":` patterns - Eliminates redundant skip_space calls for compact JSON 3. DOM construction: pointer-based push_node - Replace (ptr, len, cap) with (base, end, cap_end) pointers - Eliminates store-to-load forwarding stalls (28% of loads → ~20%) - Eliminates integer register pressure (30.7% cycles stalled → 0.1%) - IPC improved from 3.6 to 4.4-5.3 4. DOM construction: inline SIMD memcpy (sonic-cpp style Xmemcpy) - AVX2 unrolled copy for visit_container_end - Eliminates libc memmove function call overhead - memmove dropped from 61% to 1.9% of cycles (citm_catalog) End-to-end deserialize_value improvement (A/B tested, core-pinned): citm_catalog: 511 µs → 314 µs (+38.6%) golang_source: 1416 µs → 1099 µs (+22.4%) lottie: 495 µs → 351 µs (+29.1%)
- Replace peek2() slice matching with peek_u16() u16 comparison for structural characters, reducing branch count in parser main loop - Add scalar fast path for object key parsing (up to 24 chars), avoiding SIMD StringBlock setup overhead for short keys
- Add parse_number_unchecked() using get_unchecked() for digit reads, eliminating per-digit bounds checks from the hot integer parsing loop - Scalar fraction accumulation for short integer part (<8 digits), avoiding SIMD setup + POW10 table multiplication overhead - Fix parse_float_fast threshold off-by-one (sig >> 52 vs sig < 2^53) - Replace SSE simd_str2int with SWAR fraction parsing + tolerant tail, keeping integer pipeline instead of competing with FP ports
- Compact all JSON test data (minified format, no whitespace) citm_catalog: 1.7MB → 489KB (-71%) - Add new test files: golang_source (51k ints), lottie (18k ints), synthea_fhir, gsoc-2018, poet, string_unicode, string_escaped, twitterescaped - Add testdata/README.md with value type statistics and selection guide - Add golang_source and lottie to deserialize_value benchmark - Add perf_parse example binary for profiling - Expand number.rs benchmark with SWAR comparison and per-length tests
Summary
parse_eight_digits), replacing scalar digit loopsVec::pushwith raw pointer writes, eliminating store-to-load forwarding stalls (IPC 3.6 → 4.4-5.3)visit_container_end, replacing libc memmove overhead,\"/}/]/:patterns, reducing branch count in parser main loopparse_number_unchecked()withget_unchecked()for padded buffers, eliminating per-digit bounds checkssimd_str2intwith integer-pipeline SWAR for fraction digits, avoiding FP port contention on AMD Zenparse_digits_tolerantfor fraction remainderssig >> 52→sig < 2^53)Benchmark (compact JSON, core-pinned, vs baseline)
Test plan
cargo test— 146 tests passcargo test -p sonic-number(from sonic-number dir) — 7 tests pass🤖 Generated with Claude Code