8 changes: 4 additions & 4 deletions README.md
@@ -103,10 +103,10 @@ ARM64 (Apple M4, NEON/PMULL scanner, same workload):

| Size | cjson | `qd.parse` | `qd.decode + t.f x3` | speedup vs. cjson |
|---:|---:|---:|---:|---:|
-| 2 KB | 254,738 | 654,108 | 392,711 | 2.6× / 1.5× |
-| 100 KB | 15,281 | 108,932 | 99,701 | 7.1× / 6.5× |
-| 1 MB | 1,523 | 11,905 | 11,876 | 7.8× / 7.8× |
-| 10 MB | 153 | 1,218 | 1,222 | 8.0× / 8.0× |
+| 2 KB | 237,124 | 705,000 | 390,000 | 3.0× / 1.6× |
+| 100 KB | 14,667 | 232,000 | 208,000 | 15.8× / 14.2× |
+| 1 MB | 1,494 | 33,700 | 33,000 | 22.6× / 22.1× |
+| 10 MB | 150 | 3,376 | 3,454 | 22.5× / 23.0× |

See [`docs/benchmarks.md`](docs/benchmarks.md) for the full size ladder,
memory numbers, an "encode round-trip" row (passthrough emit via
16 changes: 16 additions & 0 deletions src/scan/avx2.rs
@@ -38,6 +38,22 @@ unsafe fn scan_avx2_impl(buf: &[u8], out: &mut Vec<u32>) -> Result<(), usize> {
     if interesting == 0 {
         bs_carry = 0;
         i += 64;
+        // Cross-chunk jump: no quote/backslash means in_string polarity
+        // cannot flip and no escape can start, so jump straight to the
+        // 64B-aligned chunk containing the next interesting byte.
+        // The 4 KB remaining-buffer threshold suppresses the memchr2
+        // call entirely on small payloads (≤4 KB total), where the per-
+        // call libc overhead exceeds the in-string probe loop it would
+        // replace. On larger payloads only the last 4 KB forgoes the
+        // jump, which is negligible against MB-scale gains.
+        if i + 4096 <= buf.len() {
+            let scan_end = buf.len() - (buf.len() % 64);
+            let jump = match memchr::memchr2(b'"', b'\\', &buf[i..scan_end]) {
+                Some(rel) => rel & !63,
+                None => scan_end - i,
+            };
+            i += jump;
+        }
         continue;
     }
 }
18 changes: 18 additions & 0 deletions src/scan/neon.rs
@@ -176,6 +176,24 @@ unsafe fn scan_neon_impl(buf: &[u8], out: &mut Vec<u32>) -> Result<(), usize> {
     if (quote_probe | backslash_probe) == 0 {
         bs_carry = 0;
         i += 64;
+        // Cross-chunk jump: with no quote/backslash in the chunk we just
+        // skipped, in_string polarity cannot flip and no escape can start,
+        // so we can use memchr2 to skip ahead to the 64B-aligned chunk
+        // containing the next interesting byte. Bounded by the last full
+        // 64B chunk; the <64B tail is handled by the scalar resume path.
+        // The 4 KB remaining-buffer threshold suppresses the memchr2
+        // call entirely on small payloads (≤4 KB total), where the per-
+        // call libc overhead exceeds the in-string probe loop it would
+        // replace. On larger payloads only the last 4 KB forgoes the
+        // jump, which is negligible against MB-scale gains.
+        if i + 4096 <= buf.len() {
+            let scan_end = buf.len() - (buf.len() % 64);
+            let jump = match memchr::memchr2(b'"', b'\\', &buf[i..scan_end]) {
+                Some(rel) => rel & !63,
+                None => scan_end - i,
+            };
+            i += jump;
+        }
         continue;
     }
 }
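
The jump arithmetic shared by both hunks can be checked in isolation. The following is a minimal sketch, not the PR's code: `cross_chunk_jump`, `CHUNK`, and `MIN_REMAINING` are hypothetical names introduced here, and a plain std iterator search stands in for `memchr::memchr2` so the snippet has no external dependency. It assumes `i` is already a multiple of 64, as it is in the scanner loop.

```rust
/// Hypothetical helper mirroring the jump computation from the hunks above.
/// Returns how many bytes the scanner may skip from chunk-aligned offset `i`.
/// Assumes `i` is a multiple of 64.
fn cross_chunk_jump(buf: &[u8], i: usize) -> usize {
    const CHUNK: usize = 64;
    const MIN_REMAINING: usize = 4096;

    // Fewer than 4 KB remain: skip the search and take no jump.
    if i + MIN_REMAINING > buf.len() {
        return 0;
    }

    // End of the last full 64-byte chunk; the tail is left to the caller.
    let scan_end = buf.len() - (buf.len() % CHUNK);

    // std stand-in for memchr::memchr2(b'"', b'\\', &buf[i..scan_end]).
    let next = buf[i..scan_end]
        .iter()
        .position(|&b| b == b'"' || b == b'\\');

    match next {
        // Round down to the start of the chunk holding the interesting byte.
        Some(rel) => rel & !(CHUNK - 1),
        // Nothing interesting left: jump to the last full-chunk boundary.
        None => scan_end - i,
    }
}
```

For an 8192-byte buffer whose only quote sits at offset 1000, calling the helper with `i = 64` yields a jump of 896, so scanning resumes at offset 960, the start of the chunk covering bytes 960..1024. Because `rel` is rounded down rather than up, the chunk containing the quote is still processed by the SIMD path instead of being skipped.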