Skip to content

perf: PikeVM UTF-8 dot alternation overhead (3x slower than stdlib on Template::Regex) #132

@kolkov

Description

@kolkov

Problem

PikeVM compiles . (AnyCharNotNL) into ~10 UTF-8 byte-range alternation splits in the NFA. For patterns like \{\{(.*?)\}\}, PikeVM must evaluate all 10 branches at every byte position.

On LangArena's Template::Regex benchmark (50000 template vars, 50 iterations, ~10MB text), this results in ~5 billion branch evaluations. coregex takes 20.2s vs stdlib 6.5s (3x slower) vs Rust 3.8s (5.3x slower).

Stdlib avoids this because it decodes one rune at a time natively without alternation splits. Rust uses lazy DFA for forward scanning, PikeVM only for captures.

Proposed Solutions

  1. Rune-native . in PikeVM — decode UTF-8 rune directly instead of splitting into byte-range alternations. Eliminates 10x branch overhead.
  2. DFA-based ReplaceAll — for simple capture patterns like \{\{(.*?)\}\}, use lazy DFA for forward scan to find match boundaries, then PikeVM only for capture extraction within the match.
  3. Hybrid approach — ASCII fast path (single byte, no alternation) + rune decode fallback for non-ASCII.

Impact

  • Template::Regex: 20s → ~4-6s target (parity with stdlib/Rust)
  • Any pattern with .*? or .+ on large inputs
  • Affects PikeVM, BoundedBacktracker

References

Metadata

Metadata

Assignees

No one assigned

    Labels

    area: nfaThompson NFA, PikeVM, BoundedBacktrackerpriority: highImportant for next releasetype: performanceSpeed/memory improvement or regression

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions