
perf(scan): fuse validate_brackets into NEON scanner #35

Closed
membphis wants to merge 2 commits into main from worktree-issue-25-validate-brackets-fusion

membphis (Collaborator) commented May 17, 2026

Summary

Eliminate the separate validate_brackets pass in the NEON scanner by carrying a depth stack inline during emit. This mirrors the scalar scanner's fused scan_and_validate approach.
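For readers unfamiliar with the fused approach, here is a minimal scalar sketch of scanning and bracket validation in a single pass. The names `fused_scan_sketch` and `BracketError`, the choice to emit only `{ } [ ] : ,`, and the error shape are assumptions for illustration; this is not the crate's actual `scan_and_validate`.

```rust
/// Hypothetical error type for this sketch (not the crate's real error).
#[derive(Debug, PartialEq)]
pub enum BracketError {
    /// A closer at this byte offset does not match the innermost opener.
    Mismatch(usize),
    /// A closer at this byte offset has no opener at all.
    Unbalanced(usize),
    /// Input ended with this many brackets still open.
    Unclosed(usize),
}

/// Single-pass scan: emits offsets of `{ } [ ] : ,` outside strings and
/// validates bracket nesting with an inline depth stack (no second pass).
pub fn fused_scan_sketch(buf: &[u8], out: &mut Vec<u32>) -> Result<(), BracketError> {
    let mut stack: Vec<u8> = Vec::new(); // open brackets, innermost last
    let mut in_string = false;
    let mut escaped = false;
    for (pos, &b) in buf.iter().enumerate() {
        if in_string {
            // Inside a string only escapes and the closing quote matter.
            if escaped {
                escaped = false;
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_string = false;
            }
            continue;
        }
        match b {
            b'"' => in_string = true,
            b'{' | b'[' => {
                stack.push(b);
                out.push(pos as u32);
            }
            b'}' | b']' => {
                let want = if b == b'}' { b'{' } else { b'[' };
                match stack.pop() {
                    Some(open) if open == want => out.push(pos as u32),
                    Some(_) => return Err(BracketError::Mismatch(pos)),
                    None => return Err(BracketError::Unbalanced(pos)),
                }
            }
            b':' | b',' => out.push(pos as u32),
            _ => {}
        }
    }
    if stack.is_empty() {
        Ok(())
    } else {
        Err(BracketError::Unclosed(stack.len()))
    }
}
```

Because the stack is checked at the moment each closer is emitted, an error is reported at the first bad offset without ever revisiting the emitted offsets.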

Changes

  • Add emit_bits_validate() that validates brackets while emitting structural offsets
  • Add scan_tail_validate() for the scalar tail with inline validation
  • Gate emit_bits, validate_brackets, and scan_emit_resume behind #[cfg] so they are compiled only for the AVX2 build (the AVX2 scanner still uses them)
  • Add dense-100k bench scenario (46% structural density) to measure fusion impact
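The list names emit_bits_validate() but not its shape. Below is a minimal sketch, assuming the NEON stage has already produced a `u64` mask with one bit per structural byte of a 64-byte block (quotes and in-string bytes already masked out, not shown). The name `emit_bits_validate_sketch`, its signature, the `u32` offset type, and the `Err(offset)` convention are all hypothetical. The `match buf[pos]` line is the per-emit lookup discussed under Performance.

```rust
/// Sketch of a fused emitter for one 64-byte block. Bit i of `bits` is set
/// when byte `base + i` is structural. `stack` carries open brackets across
/// blocks. Returns Err(offset) at the first mismatched or unbalanced closer.
pub fn emit_bits_validate_sketch(
    mut bits: u64,
    base: usize,
    buf: &[u8],
    stack: &mut Vec<u8>,
    out: &mut Vec<u32>,
) -> Result<(), usize> {
    while bits != 0 {
        let pos = base + bits.trailing_zeros() as usize;
        bits &= bits - 1; // clear the lowest set bit
        out.push(pos as u32);
        // The mask only says "structural"; this per-emit load recovers which byte it is.
        match buf[pos] {
            c @ (b'{' | b'[') => stack.push(c),
            c @ (b'}' | b']') => {
                let want = if c == b'}' { b'{' } else { b'[' };
                if stack.pop() != Some(want) {
                    return Err(pos);
                }
            }
            _ => {} // ':' or ','
        }
    }
    Ok(())
}
```

Keeping the stack outside the function lets one call per block share nesting state, so a bracket opened in one 64-byte block can be closed in a later one.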

Performance

Measured improvement: up to ~3% (within noise)

Profiling predicted that validate_brackets consumed ~30% of scan time on structure-dense payloads (small_api.json). However, benchmarks show only a ~3% end-to-end improvement, even on a payload with 46% structural density.

This confirms issue #25's analysis: the per-emit buf[pos] lookup needed to determine the bracket type offsets the savings from eliminating the separate pass.
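A rough Amdahl-style bound makes the gap concrete. Assuming, hypothetically, that scan time dominates end-to-end time on dense-100k, removing a pass worth $p = 0.30$ of scan time would allow at most

$$
S_{\max} = \frac{1}{1 - p} = \frac{1}{1 - 0.30} \approx 1.43,
\qquad
S_{\text{obs}} = \frac{9{,}677}{9{,}360} \approx 1.034 .
$$

If the fused path adds back a per-emit cost $c$ (as a fraction of the old scan time), then

$$
\frac{1}{1 - p + c} = S_{\text{obs}}
\;\Rightarrow\;
c = \frac{1}{S_{\text{obs}}} - (1 - p) \approx 0.967 - 0.70 \approx 0.27 ,
$$

so under that assumption the inline bookkeeping gives back roughly 27 of the 30 percentage points saved.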

| Payload | Structural density | Old (ops/s) | New (ops/s) | Change |
|---|---|---|---|---|
| dense-100k | 46.57% | 9,360 | 9,677 | +3.4% |
| small (2KB) | 12.72% | 831,947 | 848,464 | +2.0% |
| medium | 0.12% | 362,319 | 366,300 | +1.1% |
| 100k (multimodal) | 0.08% | 278,552 | 280,899 | +0.8% |
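The PR does not define "structural density". A plausible reading, assumed here, is the fraction of input bytes that are `{ } [ ] : ,` outside JSON strings; the bench may also count quote characters. A hypothetical helper under that assumption:

```rust
/// Hypothetical helper: fraction of bytes that are structural characters
/// ({ } [ ] : ,) outside JSON strings. Assumed definition, not the bench's.
pub fn structural_density(buf: &[u8]) -> f64 {
    if buf.is_empty() {
        return 0.0;
    }
    let (mut in_string, mut escaped, mut count) = (false, false, 0usize);
    for &b in buf {
        if in_string {
            // Skip string contents, honoring backslash escapes.
            if escaped {
                escaped = false;
            } else if b == b'\\' {
                escaped = true;
            } else if b == b'"' {
                in_string = false;
            }
        } else {
            match b {
                b'"' => in_string = true,
                b'{' | b'}' | b'[' | b']' | b':' | b',' => count += 1,
                _ => {}
            }
        }
    }
    count as f64 / buf.len() as f64
}
```

Under this definition, 46.57% means roughly every other byte produces an emit, while a base64 blob inside a string contributes nothing, which matches the 0.08% figure for the multimodal payload.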

Why the limited gain:

  1. Each emitted structural character now requires an extra scalar buf[pos] load, plus a branch on the result, to determine the bracket type
  2. This added cost roughly equals that of the eliminated validate_brackets pass
  3. String-heavy payloads (multimodal, base64) have <0.1% structural density, so validate_brackets was never a bottleneck there

Value of this change:

  • Cleaner single-pass architecture (matches scalar scanner design)
  • Marginal improvement on structure-dense workloads
  • No regression on any measured workload

Testing

  • cargo test --release (NEON gate)
  • cargo test --release --no-default-features (scalar unchanged)
  • cargo test --features test-panic --release (FFI panic barrier intact)
  • make lint (clippy clean)
  • make bench (no regression, ~3% improvement on dense payloads)

Closes #25

membphis added 2 commits May 17, 2026 16:04
Eliminate the separate validate_brackets pass in the NEON scanner by
carrying a depth stack inline during emit. This mirrors the scalar
scanner's fused scan_and_validate approach.

Changes:
- Add emit_bits_validate() that validates brackets while emitting
- Add scan_tail_validate() for the scalar tail with inline validation
- Gate emit_bits, validate_brackets, scan_emit_resume with #[cfg] for
  AVX2-only (they remain used by the AVX2 scanner)

Profile on bench fixtures showed validate_brackets consuming ~30% of
scan time on structure-dense payloads (small_api.json). The fusion
eliminates this second pass.

Closes #25

Add make_dense_payload() that generates ~100KB JSON with 46% structural
density (vs <0.1% for multimodal payloads). This exercises the
validate_brackets fusion path more heavily.

Results show the fusion provides ~3% improvement even on structure-dense
payloads, confirming issue #25's analysis that per-emit buf[pos] lookups
offset the eliminated pass.
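The commit does not show how make_dense_payload() builds its records. The sketch below is a hypothetical stand-in, `dense_payload_sketch`, with an assumed record shape `{"id":D,"v":[D,D]}`. Counting `{ } [ ] : ,` as structural, n records give (9n+1)/(19n+1), about 47% density: close to, but not the same as, the 46% the PR reports.

```rust
/// Hypothetical stand-in for the bench's make_dense_payload(); the real record
/// shape is not shown in the PR. Builds a JSON array of small records until it
/// is at least `target_len` bytes. Each record `{"id":D,"v":[D,D]}` (D = one
/// digit) is 18 bytes, 8 of them structural.
pub fn dense_payload_sketch(target_len: usize) -> String {
    let digit = |k: usize| char::from(b'0' + (k % 10) as u8);
    let mut s = String::with_capacity(target_len + 32);
    s.push('[');
    let mut i = 0usize;
    while s.len() < target_len {
        if i > 0 {
            s.push(','); // separator between records
        }
        s.push_str(&format!("{{\"id\":{},\"v\":[{},{}]}}", digit(i), digit(i + 1), digit(i + 2)));
        i += 1;
    }
    s.push(']');
    s
}
```

Single-digit values and one-letter keys keep the non-structural filler to a minimum, which is what pushes density toward one structural byte in two.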

membphis (Collaborator, Author) commented:

Closing: benchmark shows only ~3% improvement (within noise), confirming that per-emit buf[pos] lookups offset the eliminated pass. Not worth the added complexity.

membphis closed this May 17, 2026
membphis deleted the worktree-issue-25-validate-brackets-fusion branch May 17, 2026 08:19


Development: successfully merging this pull request may close issue "perf(scan): validate_brackets fusion in AVX2 and NEON scanners".
