-
Notifications
You must be signed in to change notification settings - Fork 5
Open
Labels
area: nfaThompson NFA, PikeVM, BoundedBacktrackerThompson NFA, PikeVM, BoundedBacktrackerpriority: mediumNormal priorityNormal prioritytype: performanceSpeed/memory improvement or regressionSpeed/memory improvement or regression
Description
Problem
When multiple compiled patterns are used on large inputs, memory usage becomes excessive. On LangArena's LogParser benchmark (13 patterns × 7MB text), coregex uses 237MB vs stdlib 28MB (8.5x more) vs Rust 11MB (21x more).
Root Causes
- BoundedBacktracker visited tables: Each
FindAllStringIndexcall allocates a visited table sizedlen(haystack) * nfa_states. With uint16 generation counters, each entry is 2 bytes. For 7MB × 35 states = 245MB per pattern scan. - Lazy DFA cache: Each compiled pattern has its own DFA state cache. 13 patterns × independent caches.
- No sharing: Unlike Rust, visited tables and DFA caches are not shared across patterns.
Proposed Solutions
- Visited table reuse via sync.Pool — Pool visited tables by size class, reuse across
FindAllcalls - Visited table cap — Limit visited table to search span, not full haystack (partially done in v0.12.6 span fix, but could be more aggressive)
- Shared DFA cache pool — Allow multiple compiled patterns to share a DFA cache pool
- 1-bit visited with lazy reset — Use 1-bit per entry with segmented lazy clear (avoid O(n) full clear)
Impact
- LogParser: 237MB → ~30-50MB target
- Any workload with multiple patterns on large inputs
References
- Issue Not success usage in my project #124 (original report by @kostya)
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
area: nfaThompson NFA, PikeVM, BoundedBacktrackerThompson NFA, PikeVM, BoundedBacktrackerpriority: mediumNormal priorityNormal prioritytype: performanceSpeed/memory improvement or regressionSpeed/memory improvement or regression