perf(scan): fuse validate_brackets into SIMD emit loops#18
Closed
membphis wants to merge 1 commit into
Closed
Conversation
Previously the AVX2 and NEON scanners ran a two-pass design: 1. SIMD chunk loop emits all structural offsets via emit_bits 2. validate_brackets walks the emitted indices, tracking in_string and verifying bracket pairing Now both scanners maintain a depth stack inline and fuse bracket validation into the emit step via emit_bits_validate. The tail handler (scan_emit_resume) still emits without validation; its output is folded in via validate_tail_indices using the same stack. The fused validator is simpler than the original validate_brackets: the SIMD scanner guarantees no in-string structural char is ever emitted (struct_mask & !inside excludes them), so `"`, `:`, `,` are unconditional no-ops. The in_string toggle that validate_brackets used to maintain is no longer required. Code cleanup: - emit_bits and validate_brackets in scan/mod.rs replaced by emit_bits_validate and validate_tail_indices - doc references updated in scalar.rs and crosscheck.rs Perf on the multimodal bench (Apple M4): within noise (±2%), as expected — the workload is string-heavy with sparse structural density, so the eliminated validate_brackets walk was already cheap. The win is structural: a single pass over indices instead of two, and a simpler validator that drops the in_string state. Larger effect expected on config / JSONL / object-heavy payloads.
Collaborator
Author
|
Closing without merge. The fusion is correct and simplifies code, but on the current string-heavy multimodal benchmark it shows no consistent improvement (±2% noise) — the workload has sparse structural density so the eliminated validate_brackets pass was already cheap. Keeping the two-pass design avoids adding per-emit overhead. Will revisit if a structurally-dense workload (config / JSONL / object-shape JSON) becomes a target. |
2 tasks
membphis
added a commit
that referenced
this pull request
May 16, 2026
The validate_brackets fusion entry now references the closed PR #18, explains why the prototype showed no measurable improvement on the string-heavy multimodal bench (per-emit buf[pos] lookup cancels the savings from eliminating the second indices pass), and pins the revisit condition to a structurally-dense bench fixture appearing. Keeps the entry actionable: future contributors know the design has been tried, what failed, and what data to gather before retrying.
This was referenced May 16, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Completes the second half of the validate_brackets fusion started in #17.
PR #17 fused validation into
ScalarScanneronly; AVX2 and NEON scannersstill ran the two-pass
emit_bits+validate_brackets(buf, &indices)design. This PR folds bracket validation into the SIMD emit loops via
emit_bits_validate, eliminating the second linear pass overindices.Design
emit_bits_validate(new, replacesemit_bits): walks set bits ofa 64-bit chunk mask, pushes each position into
out, AND validates{/}/[/]against a depth stack inline. ReturnsErr(pos)onthe first mismatch.
validate_tail_indices(new): same per-index logic, but consumesalready-emitted indices from the scalar tail handler (which still uses
the unchanged
scan_emit_resume). Called once at end ofscan_*_impl.validate_brackets(removed): no longer needed.Key invariant exploited: the SIMD scanner already filters out structural
chars inside strings via
struct_mask & !inside, so the fused validatortreats
",:,,as unconditional no-ops — noin_stringtoggleneeded. The original
validate_bracketshad to maintain that state asa defensive measure.
Performance
Within ±2% noise on the string-heavy multimodal bench, matching the
README Roadmap note (~0.3% effect on this workload). The structural
win is what matters: a single pass over
indicesinstead of two,plus a simpler validator. Larger effect expected on structurally-dense
inputs (config / JSONL / object-shape JSON).
Test plan
cargo test --release(all 70 unit + 50+ integration tests pass)cargo test --release --no-default-features(scalar-only gate)cargo clippy --release --all-targets -- -D warningscleanmake benchruns end-to-end on Apple M4