Skip to content

perf: excessive memory usage with multiple patterns on large inputs (237MB vs 28MB stdlib) #133

@kolkov

Description

@kolkov

Problem

When multiple compiled patterns are used on large inputs, memory usage becomes excessive. On LangArena's LogParser benchmark (13 patterns × 7MB text), coregex uses 237MB vs stdlib 28MB (8.5x more) vs Rust 11MB (21x more).

Root Causes

  1. BoundedBacktracker visited tables: Each FindAllStringIndex call allocates a visited table sized len(haystack) * nfa_states. With uint16 generation counters, each entry is 2 bytes. For 7MB × 35 states = 245MB per pattern scan.
  2. Lazy DFA cache: Each compiled pattern has its own DFA state cache. 13 patterns × independent caches.
  3. No sharing: Unlike Rust, visited tables and DFA caches are not shared across patterns.

Proposed Solutions

  1. Visited table reuse via sync.Pool — Pool visited tables by size class, reuse across FindAll calls
  2. Visited table cap — Limit visited table to search span, not full haystack (partially done in v0.12.6 span fix, but could be more aggressive)
  3. Shared DFA cache pool — Allow multiple compiled patterns to share a DFA cache pool
  4. 1-bit visited with lazy reset — Use 1-bit per entry with segmented lazy clear (avoid O(n) full clear)

Impact

  • LogParser: 237MB → ~30-50MB target
  • Any workload with multiple patterns on large inputs

References

Metadata

Metadata

Assignees

No one assigned

    Labels

    area: nfaThompson NFA, PikeVM, BoundedBacktrackerpriority: mediumNormal prioritytype: performanceSpeed/memory improvement or regression

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions