perf(scan): fuse validate_brackets into NEON scanner#35
Closed
membphis wants to merge 2 commits into
Closed
Conversation
Eliminate the separate validate_brackets pass in the NEON scanner by carrying a depth stack inline during emit. This mirrors the scalar scanner's fused scan_and_validate approach. Changes: - Add emit_bits_validate() that validates brackets while emitting - Add scan_tail_validate() for the scalar tail with inline validation - Gate emit_bits, validate_brackets, scan_emit_resume with #[cfg] for AVX2-only (they remain used by the AVX2 scanner) Profile on bench fixtures showed validate_brackets consuming ~30% of scan time on structure-dense payloads (small_api.json). The fusion eliminates this second pass. Closes #25
Add make_dense_payload() that generates ~100KB JSON with 46% structural density (vs <0.1% for multimodal payloads). This exercises the validate_brackets fusion path more heavily. Results show the fusion provides ~3% improvement even on structure-dense payloads, confirming issue #25's analysis that per-emit buf[pos] lookups offset the eliminated pass.
Collaborator
Author
|
Closing: benchmark shows only ~3% improvement (within noise), confirming that per-emit buf[pos] lookups offset the eliminated pass. Not worth the added complexity. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Eliminate the separate
validate_bracketspass in the NEON scanner by carrying a depth stack inline during emit. This mirrors the scalar scanner's fusedscan_and_validateapproach.Changes
emit_bits_validate()that validates brackets while emitting structural offsetsscan_tail_validate()for the scalar tail with inline validationemit_bits,validate_brackets,scan_emit_resumewith#[cfg]for AVX2-only (they remain used by the AVX2 scanner)dense-100kbench scenario (46% structural density) to measure fusion impactPerformance
Measured improvement: ~3% (within noise)
Profile analysis predicted
validate_bracketsconsuming ~30% of scan time on structure-dense payloads. However, benchmarks show only ~3% end-to-end improvement even on a 46% structural density payload.This confirms issue #25's analysis: the per-emit
buf[pos]lookup to determine bracket type offsets the savings from eliminating the separate pass.Why the limited gain:
buf[pos]random memory accessvalidate_bracketspassvalidate_bracketswas never a bottleneck thereValue of this change:
Testing
cargo test --release— NEON gatecargo test --release --no-default-features— scalar unchangedcargo test --features test-panic --release— FFI panic barrier intactmake lint— clippy cleanmake bench— no regression, ~3% improvement on dense payloadsCloses #25