perf(scan): fuse validate_brackets into SIMD emit loops #18

Closed
membphis wants to merge 1 commit into main from worktree-simd-bracket-fusion

@membphis
Collaborator

Summary

Completes the second half of the validate_brackets fusion started in #17.
PR #17 fused validation into ScalarScanner only; AVX2 and NEON scanners
still ran the two-pass emit_bits + validate_brackets(buf, &indices)
design. This PR folds bracket validation into the SIMD emit loops via
emit_bits_validate, eliminating the second linear pass over indices.

Design

  • emit_bits_validate (new, replaces emit_bits): walks the set bits of
    a 64-bit chunk mask, pushes each position into out, and validates
    `{`, `}`, `[`, `]` against a depth stack inline. Returns Err(pos) on
    the first mismatch.
  • validate_tail_indices (new): same per-index logic, but consumes
    already-emitted indices from the scalar tail handler (which still uses
    the unchanged scan_emit_resume). Called once at end of scan_*_impl.
  • validate_brackets (removed): no longer needed.

Key invariant exploited: the SIMD scanner already filters out structural
chars inside strings via struct_mask & !inside, so the fused validator
treats `"`, `:`, `,` as unconditional no-ops, and no in_string toggle is
needed. The original validate_brackets had to maintain that state as
a defensive measure.
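
To make the fused design concrete, here is a minimal sketch of what emit_bits_validate could look like. The signature, the `base` chunk offset, the `u32` index type, and the byte-valued stack are assumptions for illustration, not the code from this PR. It relies on the invariant above: every bit in `mask` is already known to lie outside a string.

```rust
/// Hypothetical sketch of a fused emit + validate step.
/// `base` is the byte offset of the 64-byte chunk within `buf`;
/// `mask` has one bit set per structural byte outside strings.
fn emit_bits_validate(
    buf: &[u8],
    base: usize,
    mut mask: u64,
    out: &mut Vec<u32>,
    stack: &mut Vec<u8>,
) -> Result<(), usize> {
    while mask != 0 {
        // Position of the lowest set bit, translated to a buffer offset.
        let pos = base + mask.trailing_zeros() as usize;
        // Clear the lowest set bit so the next iteration sees the next one.
        mask &= mask - 1;
        out.push(pos as u32);

        let byte = buf[pos];
        match byte {
            b'{' | b'[' => stack.push(byte),
            b'}' => {
                if stack.pop() != Some(b'{') {
                    return Err(pos);
                }
            }
            b']' => {
                if stack.pop() != Some(b'[') {
                    return Err(pos);
                }
            }
            // `"`, `:`, `,` need no action: in-string bytes were never emitted.
            _ => {}
        }
    }
    Ok(())
}
```

The `buf[pos]` read inside the loop is the per-emit cost that the two-pass design pays in a separate walk instead.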

Performance

| Size | NEON before (PR #17) | NEON after (this PR) | delta |
|---|---|---|---|
| 2 KB | 654,108 | 630,994 | -3.5% |
| 60 KB | 167,954 | 170,242 | +1.4% |
| 100 KB | 108,932 | 108,578 | -0.3% |
| 1 MB | 11,905 | 12,733 | +7.0% |
| 10 MB | 1,218 | 1,228 | +0.8% |
| interleaved | 26,281 | 26,226 | -0.2% |

Most sizes land within ±2% noise on the string-heavy multimodal bench,
matching the README Roadmap note (~0.3% effect on this workload). The
two rows outside that band, 2 KB (-3.5%) and 1 MB (+7.0%), move in
opposite directions, so the numbers show no consistent trend. The structural
win is what matters: a single pass over indices instead of two,
plus a simpler validator. A larger effect is expected on structurally dense
inputs (config / JSONL / object-shape JSON).

Test plan

  • cargo test --release (all 70 unit + 50+ integration tests pass)
  • cargo test --release --no-default-features (scalar-only gate)
  • cargo clippy --release --all-targets -- -D warnings clean
  • make bench runs end-to-end on Apple M4
  • Proptest crosscheck (2000 cases): scalar parity with NEON
  • CI: x86_64 default + scalar-only + test-panic + Lua busted

Previously the AVX2 and NEON scanners ran a two-pass design:
1. SIMD chunk loop emits all structural offsets via emit_bits
2. validate_brackets walks the emitted indices, tracking in_string
   and verifying bracket pairing
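
For comparison, the two-pass baseline described in these steps might look roughly like the sketch below. Only the names emit_bits and validate_brackets and the call shape validate_brackets(buf, &indices) come from the description; the parameter types, the error value, and the end-of-input check for unclosed brackets are assumptions.

```rust
/// Pass 1 (hypothetical sketch): emit the offset of every set bit in a chunk mask.
fn emit_bits(base: usize, mut mask: u64, out: &mut Vec<u32>) {
    while mask != 0 {
        out.push((base + mask.trailing_zeros() as usize) as u32);
        mask &= mask - 1; // clear the lowest set bit
    }
}

/// Pass 2 (hypothetical sketch): walk the emitted indices, toggling
/// `in_string` on quotes and checking bracket pairing.
fn validate_brackets(buf: &[u8], indices: &[u32]) -> Result<(), usize> {
    let mut stack: Vec<u8> = Vec::new();
    let mut in_string = false;
    for &idx in indices {
        let pos = idx as usize;
        let byte = buf[pos];
        if byte == b'"' {
            in_string = !in_string;
            continue;
        }
        if in_string {
            // Defensive: ignore anything emitted between quotes.
            continue;
        }
        match byte {
            b'{' | b'[' => stack.push(byte),
            b'}' => {
                if stack.pop() != Some(b'{') {
                    return Err(pos);
                }
            }
            b']' => {
                if stack.pop() != Some(b'[') {
                    return Err(pos);
                }
            }
            _ => {}
        }
    }
    // Assumption: unclosed brackets are reported at end of input.
    if stack.is_empty() { Ok(()) } else { Err(buf.len()) }
}
```

The second function is the extra linear walk over `indices` that the fusion removes.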

Now both scanners maintain a depth stack inline and fuse bracket
validation into the emit step via emit_bits_validate. The tail
handler (scan_emit_resume) still emits without validation; its
output is folded in via validate_tail_indices using the same stack.
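
As a rough illustration of how the tail could reuse the same stack, the sketch below validates already-emitted tail indices against a stack left over from the chunk loop. The helper check_bracket and both signatures are hypothetical; only the name validate_tail_indices comes from this PR.

```rust
/// Hypothetical per-index step, usable by both the chunk loop and the tail.
fn check_bracket(byte: u8, pos: usize, stack: &mut Vec<u8>) -> Result<(), usize> {
    match byte {
        b'{' | b'[' => stack.push(byte),
        b'}' => {
            if stack.pop() != Some(b'{') {
                return Err(pos);
            }
        }
        b']' => {
            if stack.pop() != Some(b'[') {
                return Err(pos);
            }
        }
        // Quotes, colons and commas carry no pairing information.
        _ => {}
    }
    Ok(())
}

/// Sketch: consume indices the scalar tail handler already emitted,
/// continuing with whatever `stack` the SIMD chunks left open.
fn validate_tail_indices(
    buf: &[u8],
    tail: &[u32],
    stack: &mut Vec<u8>,
) -> Result<(), usize> {
    for &idx in tail {
        let pos = idx as usize;
        check_bracket(buf[pos], pos, stack)?;
    }
    Ok(())
}
```

In this sketch the caller decides what a non-empty stack after the tail means; the description does not say how unclosed brackets are reported.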

The fused validator is simpler than the original validate_brackets:
the SIMD scanner guarantees no in-string structural char is ever
emitted (struct_mask & !inside excludes them), so `"`, `:`, `,` are
unconditional no-ops. The in_string toggle that validate_brackets
used to maintain is no longer required.

Code cleanup:
- emit_bits and validate_brackets in scan/mod.rs replaced by
  emit_bits_validate and validate_tail_indices
- doc references updated in scalar.rs and crosscheck.rs

Perf on the multimodal bench (Apple M4): within noise (±2%), as
expected — the workload is string-heavy with sparse structural
density, so the eliminated validate_brackets walk was already
cheap. The win is structural: a single pass over indices instead
of two, and a simpler validator that drops the in_string state.
Larger effect expected on config / JSONL / object-heavy payloads.
@membphis
Collaborator Author

Closing without merge. The fusion is correct and simplifies code, but on the current string-heavy multimodal benchmark it shows no consistent improvement (±2% noise) — the workload has sparse structural density so the eliminated validate_brackets pass was already cheap. Keeping the two-pass design avoids adding per-emit overhead. Will revisit if a structurally-dense workload (config / JSONL / object-shape JSON) becomes a target.

@membphis membphis closed this May 16, 2026
@membphis membphis deleted the worktree-simd-bracket-fusion branch May 16, 2026 11:31
membphis added a commit that referenced this pull request May 16, 2026
The validate_brackets fusion entry now references the closed PR #18,
explains why the prototype showed no measurable improvement on the
string-heavy multimodal bench (per-emit buf[pos] lookup cancels the
savings from eliminating the second indices pass), and pins the
revisit condition to a structurally-dense bench fixture appearing.

Keeps the entry actionable: future contributors know the design has
been tried, what failed, and what data to gather before retrying.