Skip to content

Isolate alternation branches in RegexGroupParser::walkGroupAst() to prevent non-empty/non-falsy state bleeding#5602

Merged
staabm merged 4 commits intophpstan:2.1.xfrom
phpstan-bot:create-pull-request/patch-grhfx5b
May 6, 2026
Merged

Isolate alternation branches in RegexGroupParser::walkGroupAst() to prevent non-empty/non-falsy state bleeding#5602
staabm merged 4 commits intophpstan:2.1.xfrom
phpstan-bot:create-pull-request/patch-grhfx5b

Conversation

@phpstan-bot
Copy link
Copy Markdown
Collaborator

Summary

When a regex alternation contained a zero-width assertion (like $, ^, \b) alongside a non-empty pattern (like foo), preg_match incorrectly inferred the match as non-empty-string instead of string. The root cause was in RegexGroupParser::walkGroupAst() where the alternation handler accumulated nonEmpty/nonFalsy/numeric state across branches instead of evaluating each branch independently.

Changes

  • src/Type/Regex/RegexGroupParser.php: Rewrote the #alternation handler in walkGroupAst():

    • Each branch now starts with fresh nonEmpty/nonFalsy/numeric state (all maybe) instead of inheriting state from previous branches
    • Branch results are combined with TrinaryLogic::and() — the property holds only if ALL branches agree
    • The combined alternation result is merged with the parent context using or() for nonEmpty/nonFalsy (preserves parent concatenation's positive determination) and and() for numeric (preserves parent's negative determination)
    • Changed inAlternation parameter from true to false for each branch, since the AND combination now handles cross-branch semantics that the flag was originally designed to enforce
  • tests/PHPStan/Analyser/nsrt/bug-14575.php: New regression test covering:

    • $, ^, \b anchors in alternations (the reported bug)
    • Anchor in alternation within capturing groups
    • Anchor in alternation not affecting parent concatenation (e.g., ^abc(def|$))
    • All-non-empty alternations still produce non-empty-string/non-falsy-string
    • Three-way alternation with anchor
    • Single-character alternations (a|b)
  • tests/PHPStan/Analyser/nsrt/bug-12210.php, tests/PHPStan/Analyser/nsrt/preg_match_shapes.php: Updated existing assertions from non-empty-string to non-falsy-string where the fix correctly improves type precision (patterns where all alternatives are multi-character non-falsy strings)

Root cause

The walkGroupAst() alternation handler reused the $walkResult object across alternation branches. When the first branch (e.g., foo) set nonEmpty = yes, this state carried into subsequent branches (e.g., $). Since the $ anchor returns empty string '' from getLiteralValue() but doesn't actively reset nonEmpty, the yes value persisted, making the overall alternation appear non-empty.

The inAlternation flag was a partial workaround — it suppressed nonFalsy determination within alternation branches to prevent similar bleeding, but it didn't address nonEmpty and also prevented correct nonFalsy inference when all branches genuinely were non-falsy.

Test

  • Regression test tests/PHPStan/Analyser/nsrt/bug-14575.php covers the reported bug (foo|$string) and analogous cases with ^, \b, capturing groups, parent concatenations, and multi-way alternations.
  • Probed and found no additional bugs in: the createGroupType method's alternation handling (already correct — processes alternatives independently and unions types), the isMaybeEmptyNode function (correctly identifies anchors as maybe-empty), and the subject base type computation (correctly uses walkGroupAst whose alternation handler is now fixed).

Fixes phpstan/phpstan#14575

phpstan-bot and others added 2 commits May 5, 2026 17:42
… prevent non-empty/non-falsy state bleeding

- Walk each alternation branch with fresh nonEmpty/nonFalsy/numeric state instead of carrying state from previous branches
- Combine branch results with AND (all branches must agree for the property to hold)
- Merge alternation result with parent context using OR for nonEmpty/nonFalsy (parent's positive determination is preserved) and AND for numeric (parent's negative determination is preserved)
- Pass inAlternation=false to each branch since the AND combination now handles cross-branch semantics that the flag was meant to enforce
- Update test assertions where the fix correctly improves type precision from non-empty-string to non-falsy-string (all alternatives in those patterns are multi-character non-falsy strings)
@staabm staabm requested a review from VincentLanglet May 5, 2026 18:49
@staabm
Copy link
Copy Markdown
Contributor

staabm commented May 5, 2026

the actual change is in commit 1.
commit 2 is just cleaning up the now unused parameter

Comment thread src/Type/Regex/RegexGroupParser.php
…ty/nonFalsy/numeric

When #alternation has no children, return $walkResult unchanged instead of
incorrectly propagating Yes for all three flags from the unexecuted loop.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@staabm staabm merged commit 0beee48 into phpstan:2.1.x May 6, 2026
659 checks passed
@staabm staabm deleted the create-pull-request/patch-grhfx5b branch May 6, 2026 05:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants