perf: streaming ReplaceAll + DFA-first FindSubmatchAt (#135)#136
Merged
perf: streaming ReplaceAll + DFA-first FindSubmatchAt (#135)#136
Conversation
…135) Convert ReplaceAllStringFunc, ReplaceAllFunc, ReplaceAllLiteral, ReplaceAllLiteralString from two-pass (collect indices then replace) to single-pass streaming. Eliminates [][]int allocation for high-match-count inputs. Returns original string when no matches (Cow-like optimization).
Implement Rust-style two-phase search for capture extraction: Phase 1: DFA/strategy finds match boundaries [start, end] Phase 2: PikeVM runs anchored within [start..end] for captures Add SearchWithCapturesInSpan to PikeVM. Reduces PikeVM work from O(remaining_haystack) to O(match_len) per match. For 50K matches on 10MB: ~400x less PikeVM work. Also optimize: skip PikeVM entirely when CaptureCount <= 1 (only group 0 needed — DFA result already provides boundaries). Rewrite FindAllSubmatch to use FindSubmatchAt internally, benefiting from the same two-phase optimization.
Benchmark ComparisonComparing Summary:
|
…pturesAt Two fixes for PR #136 CI failure: 1. FindSubmatchAt: skip two-phase search for UseBoundedBacktracker and UseNFA strategies — BT's recursive backtrackFindWithState overflows 250MB stack on 386 with deep UTF-8 NFA chains. These strategies don't benefit from two-phase anyway (Phase 1 uses the same engine as Phase 2). 2. SearchWithCapturesAt: use matchesEmptyAt(haystack, at) instead of matchesEmpty() for at==len(haystack) fast path. matchesEmpty() loses lookbehind context (evaluates with nil,0), causing \B false positives at end-of-haystack. SearchWithSlotTableAt already used the correct matchesEmptyAt — this aligns SearchWithCapturesAt to match.
…rhead FindAllSubmatch now acquires SearchState once for entire iteration loop, matching the pattern in findAllIndicesLoop. Extracted findSubmatchAtWithState internal method shared by both FindSubmatchAt (public) and FindAllSubmatch. Prevents race detector test timeouts (>10 min) caused by thousands of sync.Pool get/put operations per FindAllSubmatch call.
Strategies UseDFA, UseBoth, UseDigitPrefilter access shared mutable state (e.dfa lazy DFA, e.pikevm) in their findIndicesAt dispatch paths. When findSubmatchAtWithState routes Phase 1 through these strategies, concurrent FindSubmatch calls race on the shared state. Fix: extend the two-phase bypass guard to include all strategies that use shared mutable state. These strategies now go directly to the pooled PikeVM (state.pikevm) for capture extraction, which is thread-safe by design. Strategies that remain eligible for two-phase search all use their own immutable instances (ReverseSuffix, ReverseInner, CharClassSearcher, CompositeSearcher, etc.).
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
…verflow, \B context
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Rust-style two-phase search architecture + streaming ReplaceAll for Issue #135.
Streaming ReplaceAll —
ReplaceAllStringFunc,ReplaceAllFunc,ReplaceAllLiteral,ReplaceAllLiteralStringconverted from two-pass (collect all indices → iterate) to single-pass streaming. Eliminates[][]intallocation. Returns original string when no matches (Cow-like optimization).DFA-first FindSubmatchAt — Phase 1: DFA/strategy finds match boundaries
[start, end]. Phase 2: PikeVM runs anchored within[start..end]for captures. Reduces PikeVM work from O(remaining_haystack) to O(match_len) per match. For 50K matches on 10MB: ~400x less PikeVM work. Addsis_capture_search_neededoptimization: when only group 0 is needed, PikeVM is skipped entirely.FindAllSubmatch context fix — now uses
FindSubmatchAtwith full haystack preservation instead of slicing (haystack[pos:]), fixing lookbehind context loss for\bat match boundaries.Based on deep research of Rust regex architecture (
docs/dev/research/dfa-first-replaceall-research.md).Closes #135.
Test plan
go test ./...— all 11 packages passgofmt -l .— cleangolangci-lint run— 0 issues