opt: f32 fix + fuzz infrastructure#225
Conversation
The previous approach parsed all numbers as f64, then relied on serde's visitor to cast f64→f32. This produces off-by-one ULP errors at tie-breaking boundaries (e.g., "17005001.000000000000130" → 17005000 instead of 17005002). Now deserialize_f32 captures the raw number bytes and uses str::parse::<f32>() for correct rounding, matching std and serde_json. Also adds safety comments for unchecked slice in parse_number.
New fuzz targets for focused unsafe code coverage: - fuzz_number: SWAR parsing, u64 overflow, f32/f64 precision boundaries - fuzz_string: escape sequences, unicode, SIMD chunk boundaries, UTF-8 lossy - fuzz_get_path: get/get_unchecked with structured paths, lazy iterators - fuzz_deep_nesting: NodeBuf capacity, deep arrays/objects, tls_buffer stress - fuzz_serde_roundtrip: complex struct/enum cross-deserialization consistency Adds fuzz/src/gen.rs with Arbitrary-derived generators for structured JSON (JsonValue, NumberInput, DeepNestInput) to reach deeper parsing paths. Updates scripts/fuzz.sh to run all 6 targets with configurable FUZZ_TIME and FUZZ_TARGETS env vars. CI job updated with 3m per-target budget.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #225 +/- ##
==========================================
+ Coverage 71.06% 71.08% +0.02%
==========================================
Files 42 42
Lines 9724 9744 +20
==========================================
+ Hits 6910 6927 +17
- Misses 2814 2817 +3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
💡 Codex ReviewLines 316 to 320 in 40bcfed
ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
There was a problem hiding this comment.
Pull request overview
This PR focuses on improving JSON parsing/deserialization throughput (number parsing, structural matching, and DOM construction), fixes f32 deserialization precision at tie-breaking boundaries, and adds structured fuzzing + CI fuzz execution to harden the new fast paths.
Changes:
- Introduces new fast paths in the parser/DOM builder (unchecked number parsing with padding,
peek_u16structural matching, pointer-based node staging, AVX2 inline copy). - Fixes
f32deserialization to parse directly from the raw number string to avoid f64→f32 cast tie-breaking errors. - Adds structured fuzz generators/targets and updates CI/scripts to run multiple fuzz targets.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| src/value/tls_buffer.rs | Reworks TLS node staging into TlsBuf + new NodeBuf abstraction with pointer fast path. |
| src/value/node.rs | Switches DOM visitor to NodeBuf; adds AVX2 inline_copy_values; test tweak for file-size gating. |
| src/serde/de.rs | Implements custom deserialize_f32 using raw-number reparse; updates numeric-key macro usage. |
| src/reader.rs | Adds padded_slice() and peek_u16() APIs; overrides them for PaddedSliceRead. |
| src/parser.rs | Adds unchecked number parsing path, scalar key fast-path, and u16-based structural matching in arrays/objects. |
| sonic-number/src/swar.rs | Adds SWAR digit parsing utilities + tests. |
| sonic-number/src/lib.rs | Integrates SWAR into integer/fraction parsing; adds unchecked parse entrypoint; expands tests. |
| sonic-number/Cargo.toml | Bumps crate version; adds Criterion benches configuration. |
| sonic-number/benches/number.rs | Adds number parsing benchmark suite. |
| scripts/fuzz.sh | Runs multiple fuzz targets with configurable per-target time. |
| fuzz/src/lib.rs | Exposes structured generator module. |
| fuzz/src/gen.rs | Adds Arbitrary-based structured JSON/number/deep-nesting generators. |
| fuzz/fuzz_targets/string_escape.rs | New fuzz target for string parsing/escaping and round-trips. |
| fuzz/fuzz_targets/serde_roundtrip.rs | New fuzz target for serde round-trip vs serde_json. |
| fuzz/fuzz_targets/number_parse.rs | New fuzz target for numeric parsing consistency vs serde_json. |
| fuzz/fuzz_targets/get_path.rs | New fuzz target for get()/iterators/path access APIs. |
| fuzz/fuzz_targets/deep_nesting.rs | New fuzz target for deep nesting + DOM build paths. |
| fuzz/Cargo.toml | Adds arbitrary deps and registers new fuzz binaries; adds sonic-number dependency. |
| Cargo.toml | Bumps sonic-rs version. |
| benchmarks/Cargo.toml | Removes number bench and adds a perf parsing example target. |
| benchmarks/benches/testdata/string_unicode.json | Adds unicode-heavy benchmark input (raw UTF-8). |
| benchmarks/benches/testdata/string_escaped.json | Adds escape-heavy benchmark input (\\uXXXX). |
| benchmarks/benches/testdata/README.md | Documents benchmark testdata characteristics and recommendations. |
| benchmarks/benches/perf_parse.rs | Adds example program for stable perf measurement runs. |
| benchmarks/benches/number.rs | Removes prior atoi_simd comparison benchmark. |
| benchmarks/benches/deserialize_value.rs | Expands bench coverage to additional datasets. |
| benchmarks/bench_quick.sh | Adds quick-run helper script for iterative benchmarking. |
| .github/workflows/ci.yml | Adds fuzz job timeout and FUZZ_TIME env for CI fuzz runs. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Summary
str::parse::<f32>()instead of f64→f32 cast, fixing off-by-one ULP errors at tie-breaking boundariesArbitrary-based JSON generators covering SWAR, escape, NodeBuf, and cross-deserialization pathsTest plan
cargo test)cargo miri test --all-features --lib)🤖 Generated with Claude Code