Perf: O(1) conditions checks + cached arrow-fn scopes in ConsistentIndentSniff#63
Merged
Merged
Conversation
Three helpers in ConsistentIndentSniff (isInsideClosure,
isInsideSwitchCase, couldBeInCaseBlock) walked the token stream
backwards up to 2000 / 500 / 500 tokens for every indented line.
The tokens they searched for are already tracked in the per-token
'conditions' map built by the phpcs PHP tokenizer:
- T_CLOSURE: PHP::processAdditional() rewrites the conditions
of every token inside an anonymous function from T_FUNCTION
to T_CLOSURE (PHP.php around line 2840).
- T_SWITCH / T_CASE / T_DEFAULT: all scope openers, so
createLevelMap() adds them to inner tokens' conditions.
So the inner-loop scans were redundant. Replace with O(depth) walks
of the conditions array (typically <10 entries). T_FN is the one
exception — arrow functions get scope_opener/scope_closer but don't
propagate to inner tokens' conditions — so keep a bounded fallback
scan for it.
Removed the outdated "Case blocks don't show up in conditions"
comment; they do.
Measured on the same workload as #62 (CakePHP 5 app, 1095 files,
biggest controller 11k lines):
parallel=1, single file: 20s -> 16.7s
parallel=16, whole codebase: 1m30s -> 1m16s
Sniff perf on the slowest file: 4.12s (24%) -> 1.09s (8%).
Existing test suite (100 tests / 122 assertions) passes unchanged.
Output diff across 1095 real-world files: identical violations.
Initial draft used a 500-token backward scan to find a containing T_FN (arrow function) when the conditions-based fast path didn't match. Codex review correctly flagged that as a behaviour regression: arrow function bodies can legitimately exceed 500 tokens (a long match expression, a deeply nested array as the body, a long method chain), and the bounded scan would then fail to detect that we're inside the arrow and could emit false-positive over-indentation errors on lines the original implementation correctly skipped. Replace the bounded scan with a per-file cache of [scope_opener, scope_closer] ranges for every T_FN token. Built lazily on first use in O(file) and checked in O(arrow_count) per query (typically a handful of arrows per file). No scan limit, no false negatives. Cache stores the token count alongside the ranges so it self- invalidates if the file is re-tokenized inside a phpcbf fix loop. Net effect vs the bounded-scan draft: parallel=1 ResidentsController.php: 16.7s -> 6.5s parallel=16 whole codebase wall: 1m16s -> 27.9s The big extra speedup comes from cheap empty-array iteration in files with no arrow functions at all, which was previously paying for a 500-token scan per indented line.
This was referenced May 13, 2026
dereuromark
added a commit
that referenced
this pull request
May 13, 2026
UseStatementSniff has its own getUseStatements() implementation that duplicates UseStatementsTrait::getUseStatements() (cached in #64). It already has an instance-level cache via existingStatements, which covers repeated calls within a single phpcs pass; the new static cache adds coverage across phpcbf fix iterations, where populateTokenListeners() creates fresh sniff instances per pass and resets the instance cache. Cache invalidation follows the same fingerprint-based scheme as #64: token count alone is not strong enough, since an alias rename keeps it constant. Cached entries record a content fingerprint of each use statement range and re-verify them against the live tokens before being trusted. The cache also refuses to serve an empty result so it cannot return stale state for a file where a fix added a first use statement while another simultaneous fix happened to keep the file's overall token count unchanged. Measured on the same CakePHP 5 app from #62 / #63 / #64 / #65 (parallel=1, --report=performance): PhpCollective.Namespaces.UseStatement 3.36s -> 2.08s Existing test suite (100 tests / 122 assertions) passes unchanged. The FQCN cache changes from the earlier draft of this PR were dropped in favour of the cache that landed via #65.
dereuromark
added a commit
that referenced
this pull request
May 13, 2026
…#66) UseStatementSniff has its own getUseStatements() implementation that duplicates UseStatementsTrait::getUseStatements() (cached in #64). It already has an instance-level cache via existingStatements, which covers repeated calls within a single phpcs pass; the new static cache adds coverage across phpcbf fix iterations, where populateTokenListeners() creates fresh sniff instances per pass and resets the instance cache. Cache invalidation follows the same fingerprint-based scheme as #64: token count alone is not strong enough, since an alias rename keeps it constant. Cached entries record a content fingerprint of each use statement range and re-verify them against the live tokens before being trusted. The cache also refuses to serve an empty result so it cannot return stale state for a file where a fix added a first use statement while another simultaneous fix happened to keep the file's overall token count unchanged. Measured on the same CakePHP 5 app from #62 / #63 / #64 / #65 (parallel=1, --report=performance): PhpCollective.Namespaces.UseStatement 3.36s -> 2.08s Existing test suite (100 tests / 122 assertions) passes unchanged. The FQCN cache changes from the earlier draft of this PR were dropped in favour of the cache that landed via #65.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Follow-up to #62. Replaces three expensive backward token scans in
ConsistentIndentSniffwith O(1) lookups against the pre-computedconditionsmap, plus a per-file cache of T_FN scope ranges. No behaviour change; substantially faster on large files.Background
ConsistentIndentSniffregisters onT_WHITESPACE, so it runs against every whitespace token in the file. For each token at the start of a line it called three helpers that walked the token stream backwards:isInsideClosureisInsideSwitchCasecouldBeInCaseBlockEach of those tokens IS in fact tracked by phpcs's PHP tokenizer in the per-token
conditionsmap:T_CLOSURE—PHP::processAdditional()rewrites the conditions of every inner token fromT_FUNCTIONtoT_CLOSUREfor anonymous functions (tokenizer source:PHP.phparound line 2840).T_SWITCH,T_CASE,T_DEFAULT— all scope openers inPHP::$scopeOpeners, socreateLevelMap()already adds them to inner tokens' conditions.So the inner-loop scans were redundant work. An empirical probe sniff confirms it:
The outdated comment "Case blocks don't show up in conditions, so expected might be wrong" was wrong for current phpcs; it has been removed.
Changes
isInsideClosure: O(1) walk ofconditionsforT_CLOSURE. ForT_FN, which is the one case where phpcs does not propagate the token into inner tokens' conditions, build a per-file cache of[scope_opener, scope_closer]ranges on first use (single O(file) walk, then O(arrow_count) per query — typically a handful per file). The cache stores the token count alongside the ranges so it self-invalidates if the file is re-tokenized during a phpcbf fix loop.isInsideSwitchCase: pure O(1) walk ofconditionsforT_SWITCH. The old backward scan was completely replaceable.couldBeInCaseBlock: pure O(1) walk ofconditionsforT_CASE/T_DEFAULT. Same.Why a cache and not a scan limit
An earlier draft used a 500-token bounded scan for the
T_FNfallback. Codex review correctly flagged that as a behaviour regression: arrow function bodies can legitimately exceed 500 tokens (a longmatchexpression, a deeply nested array literal as the body, a long method chain), and a bounded scan would then fail to detect that we're inside the arrow and could emit false-positive over-indentation errors on lines the original implementation correctly skipped. The cached-ranges approach is both unbounded and faster, so it strictly dominates.Benchmarks
Measured on the same 1095-file CakePHP 5 app from #62 (
ResidentsController.php, 11k LOC):parallel=1,ResidentsController.phpparallel=16, whole codebase wallSniff-level perf report on
ResidentsController.php:PhpCollective.WhiteSpace.ConsistentIndentThe full-codebase speedup is larger than just "the sniff got faster on the slow file" — much of it comes from the cheap empty-array iteration in files with no arrow functions at all, which previously paid for a backward scan per indented line.
Verification
ConsistentIndentSniffTestexercises the closure + switch/case scenarios.composer cs-checkandcomposer stanon this repo: clean.matcharrow expression): both versions correctly skip indent checks throughout — confirming the codex regression concern is addressed by the cached-ranges approach.codex review --uncommittedon the final patch: clean.