-
Notifications
You must be signed in to change notification settings - Fork 5
Open
Labels
arch: amd64x86-64 specific (AVX2, SSSE3)x86-64 specific (AVX2, SSSE3)area: simdLow-level SIMD assembly (AVX2, SSSE3)Low-level SIMD assembly (AVX2, SSSE3)effort: 8Very large, ~1 weekVery large, ~1 weekpriority: lowBacklog, nice to haveBacklog, nice to havestrategy: char-classCharClassSearcher optimizationCharClassSearcher optimizationtype: enhancementImprove existing featureImprove existing feature
Description
Summary
Add SIMD-accelerated search for CharClassSearcher when matches are sparse (>64 non-matching bytes between matches).
Background
In PR #63, we evaluated SIMD for char_class patterns but found it slower for the common case where matches are frequent (30-50% of positions). However, SIMD can provide significant speedup when:
- Searching for rare characters in large text
- Patterns like
[0-9]+in mostly alphabetic text - Log parsing where keywords are sparse
Proposed Implementation
Hybrid Approach
func (s *CharClassSearcher) SearchAt(haystack []byte, at int) (int, int, bool) {
// For sparse search (gap > 64 bytes), use SIMD to find first match
if s.sparseHint && len(haystack)-at > 64 {
pos := simd.MemchrInTable(haystack[at:], &s.membership)
if pos == -1 {
return -1, -1, false
}
at += pos
}
// Then use scalar for match extent
// ...
}SIMD Functions (already implemented)
simd.MemchrWord(haystack)- find first[A-Za-z0-9_]simd.MemchrNotWord(haystack)- find first non-word charsimd.MemchrInTable(haystack, table)- generic 256-byte table lookup
Acceptance Criteria
- Detect sparse vs dense patterns (heuristic or adaptive)
- Use SIMD
MemchrInTablefor sparse search - Benchmark: >2x speedup on sparse patterns
- No regression on dense patterns (existing tests)
References
- PR perf: streaming state machine for CharClassSearcher FindAll #63: Streaming state machine for CharClassSearcher
simd/memchr_class_amd64.s: AVX2 implementation ready- Rust regex approach: Uses DFA with SIMD byte classification
Priority
Low - Current streaming implementation is already 4.9x faster than stdlib. This is an optimization for edge cases.
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
arch: amd64x86-64 specific (AVX2, SSSE3)x86-64 specific (AVX2, SSSE3)area: simdLow-level SIMD assembly (AVX2, SSSE3)Low-level SIMD assembly (AVX2, SSSE3)effort: 8Very large, ~1 weekVery large, ~1 weekpriority: lowBacklog, nice to haveBacklog, nice to havestrategy: char-classCharClassSearcher optimizationCharClassSearcher optimizationtype: enhancementImprove existing featureImprove existing feature