
Perf: cache class-name resolution and skip redundant return-type body scans#62

Merged
dereuromark merged 1 commit into master from perf-cache-classname-and-skip-body-scans on May 13, 2026
Conversation

@dereuromark
Contributor

Summary

Two small changes that cut whole-codebase composer cs-check time by ~30% and the slowest single file by ~50% on real-world large CakePHP code, with no behaviour change.

1. AbstractSniff: per-file cache for class-name resolution

getClassNameWithNamespace() calls $phpcsFile->findPrevious([T_CLASS, T_TRAIT, T_INTERFACE, T_ENUM], $lastToken) — i.e. it scans backwards from the last token of the file to find the class declaration, which sits at the top. The result is stable for the lifetime of a phpcs run on a given file.

It is invoked indirectly by every sniff that needs the current class name, once per T_FUNCTION:

  • MethodTypeHintSniff::process -> getCurrentClassName()
  • ReturnTypeHintSniff::assertNotThisOrStatic -> getClassNameWithNamespace()
  • DocBlockReturnSelfSniff::fixClassToThis -> getClassName()
  • FullyQualifiedClassNameInDocBlockSniff::findInSameNameSpace -> getNamespace()
  • Psr4Sniff::process -> getClassName()

On an 11k-line controller with ~100 methods, that is hundreds of full-file walks per file. Caching by filename turns this into one walk per file. Cache key is $phpcsFile->getFilename(); entries live for the lifetime of the worker process, which is bounded by the file set assigned to it.
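As a minimal sketch, the caching pattern described above could look like the following. Only `getClassNameWithNamespace()` and `$phpcsFile->getFilename()` come from the PR; the static property and the `resolveClassNameWithNamespace()` helper are hypothetical names for illustration:

```php
// Hypothetical sketch of the per-file cache; not the literal patch.
protected static array $classNameCache = [];

protected function getClassNameWithNamespace(File $phpcsFile): ?string
{
    $key = $phpcsFile->getFilename();
    // array_key_exists() so that a legitimate null result
    // (file without a class declaration) is also cached.
    if (!array_key_exists($key, static::$classNameCache)) {
        // One backwards walk from the last token, as before; the result
        // is stable for the lifetime of the phpcs run on this file.
        static::$classNameCache[$key] = $this->resolveClassNameWithNamespace($phpcsFile);
    }

    return static::$classNameCache[$key];
}
```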

2. DocBlockReturnSelfSniff: pre-filter before expensive return-type scan

getReturnTypes() walks the entire function body looking for return statements. The original process() called it unconditionally for every method with a return docblock entry, but the three assertions guarded by its result only fire when the annotation contains self, $this, or the class's own name; for plain types like void, int, string, or array<...>, all that work was wasted.

This PR adds a cheap pre-filter on the parts array before the body scan, skipping the scan in the common case.
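A sketch of the pre-filter, assuming `$parts` holds the annotation's type strings and `$className` the resolved class name (variable names are illustrative, taken from the description rather than the diff):

```php
// Hypothetical sketch: only the guard is new; the scan itself is unchanged.
$needsBodyScan = false;
foreach ($parts as $part) {
    if ($part === 'self' || $part === '$this' || $part === $className) {
        $needsBodyScan = true;
        break;
    }
}
if (!$needsBodyScan) {
    // Plain types like void/int/string/array<...> can never trigger
    // the three guarded assertions, so skip the expensive body walk.
    return;
}
$returnTypes = $this->getReturnTypes($phpcsFile, $stackPtr);
```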

Benchmarks

Measured on a 1095-file CakePHP 5 app whose biggest controller is ResidentsController.php at 11,111 lines / ~100 actions, on squizlabs/php_codesniffer:^4.0.1 and slevomat/coding-standard:^8.23.0:

| Run | Before | After | Delta |
| --- | --- | --- | --- |
| parallel=1, ResidentsController.php alone | 40s | 20s | -50% |
| parallel=16, whole codebase (wall) | 2m08s | 1m30s | -30% |
| parallel=16, whole codebase (CPU) | 5m55s | 4m37s | -22% |

Top entries in the perf report on the slowest file went from:

  • before: PhpCollective.Commenting.DocBlockReturnSelf 5.86s (32%), PhpCollective.Classes.MethodTypeHint 3.27s (18%)
  • after: both dropped out of the top 22 entirely

The 50% gain on the biggest file is the key win — it eliminates the load-imbalance bottleneck that previously prevented parallel=16 from scaling much beyond 2.7x.

Verification

  • Existing PHPUnit suite passes (100 tests / 122 assertions).
  • composer cs-check and composer stan on this repo: clean.
  • Output diff on a real 1095-file codebase: identical violations reported before vs after.
  • Smoke test on a hand-crafted file exercising self, $this, and fully-qualified class name in return docblocks: identical 3 errors reported before vs after, with both fixable markers preserved.

AbstractSniff::getClassNameWithNamespace() and getClassName() walk the
token stream backwards from the last token of the file on every call,
and are invoked by several sniffs once per T_FUNCTION
(MethodTypeHintSniff, ReturnTypeHintSniff, DocBlockReturnSelfSniff,
FullyQualifiedClassNameInDocBlockSniff, Psr4Sniff). On large files with
many methods this becomes O(methods * file_size). Cache the result by
filename so each file resolves at most once.

DocBlockReturnSelfSniff::process() called getReturnTypes() unconditionally
for every method with a return annotation. getReturnTypes() walks the
entire function body looking for return statements. Since the three
assertions guarded by it only fire when the annotation contains 'self',
'$this', or the own class name, add a cheap pre-filter that skips the
body scan for plain types like 'void', 'int', 'string', 'array<...>',
etc.

Measured on a real-world ~11k-line CakePHP controller with ~100 actions:

  parallel=1, single file:        40s -> 20s   (-50%)
  parallel=16, whole codebase:   128s -> 90s   (-30%)
  Total CPU across all workers:  355s -> 277s  (-22%)

Existing test suite (100 tests / 122 assertions) passes unchanged.
@dereuromark dereuromark merged commit fe9726f into master May 13, 2026
4 checks passed
@dereuromark dereuromark deleted the perf-cache-classname-and-skip-body-scans branch May 13, 2026 19:01
dereuromark added a commit that referenced this pull request May 13, 2026
…dentSniff (#63)

* Use O(1) conditions lookups in ConsistentIndentSniff

Three helpers in ConsistentIndentSniff (isInsideClosure,
isInsideSwitchCase, couldBeInCaseBlock) walked the token stream
backwards up to 2000 / 500 / 500 tokens for every indented line.

The tokens they searched for are already tracked in the per-token
'conditions' map built by the phpcs PHP tokenizer:

  - T_CLOSURE: PHP::processAdditional() rewrites the conditions
    of every token inside an anonymous function from T_FUNCTION
    to T_CLOSURE (PHP.php around line 2840).
  - T_SWITCH / T_CASE / T_DEFAULT: all scope openers, so
    createLevelMap() adds them to inner tokens' conditions.

So the inner-loop scans were redundant. Replace with O(depth) walks
of the conditions array (typically <10 entries). T_FN is the one
exception — arrow functions get scope_opener/scope_closer but don't
propagate to inner tokens' conditions — so keep a bounded fallback
scan for it.

Removed the outdated "Case blocks don't show up in conditions"
comment; they do.
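The conditions-based lookup described above might be sketched like this; the method name mirrors the commit message, but the exact signature in the patch may differ:

```php
// Hypothetical sketch of the O(depth) replacement for the bounded scan.
protected function isInsideClosure(File $phpcsFile, int $stackPtr): bool
{
    $tokens = $phpcsFile->getTokens();
    // 'conditions' maps each enclosing scope opener's pointer to its token
    // code; walking it is O(nesting depth), typically under 10 entries,
    // instead of scanning up to 2000 tokens backwards.
    $conditions = $tokens[$stackPtr]['conditions'] ?? [];

    return in_array(T_CLOSURE, $conditions, true);
}
```

The same shape works for T_SWITCH / T_CASE / T_DEFAULT; only T_FN needs the separate handling discussed below, since arrow functions do not propagate into inner tokens' conditions.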

Measured on the same workload as #62 (CakePHP 5 app, 1095 files,
biggest controller 11k lines):

  parallel=1, single file:       20s -> 16.7s
  parallel=16, whole codebase:  1m30s -> 1m16s

Sniff perf on the slowest file: 4.12s (24%) -> 1.09s (8%).

Existing test suite (100 tests / 122 assertions) passes unchanged.
Output diff across 1095 real-world files: identical violations.

* Cache T_FN scope ranges instead of bounded backward scan

Initial draft used a 500-token backward scan to find a containing
T_FN (arrow function) when the conditions-based fast path didn't
match. Codex review correctly flagged that as a behaviour regression:
arrow function bodies can legitimately exceed 500 tokens (a long
match expression, a deeply nested array as the body, a long method
chain), and the bounded scan would then fail to detect that we're
inside the arrow and could emit false-positive over-indentation
errors on lines the original implementation correctly skipped.

Replace the bounded scan with a per-file cache of [scope_opener,
scope_closer] ranges for every T_FN token. Built lazily on first use
in O(file) and checked in O(arrow_count) per query (typically a
handful of arrows per file). No scan limit, no false negatives.

Cache stores the token count alongside the ranges so it self-
invalidates if the file is re-tokenized inside a phpcbf fix loop.

Net effect vs the bounded-scan draft:
  parallel=1 ResidentsController.php:  16.7s -> 6.5s
  parallel=16 whole codebase wall:     1m16s -> 27.9s

The big extra speedup comes from cheap empty-array iteration in
files with no arrow functions at all, which was previously paying
for a 500-token scan per indented line.
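A sketch of the T_FN range cache as described, with the token-count self-invalidation; property and method names are hypothetical:

```php
// Hypothetical sketch of the lazily built per-file T_FN scope-range cache.
protected static array $fnRanges = [];

protected function isInsideArrowFunction(File $phpcsFile, int $stackPtr): bool
{
    $key = $phpcsFile->getFilename();
    $tokens = $phpcsFile->getTokens();
    $count = count($tokens);

    // Rebuild in O(file) on first use, or when the token count changed
    // (i.e. the file was re-tokenized inside a phpcbf fix loop).
    if (!isset(static::$fnRanges[$key]) || static::$fnRanges[$key]['count'] !== $count) {
        $ranges = [];
        for ($i = 0; $i < $count; $i++) {
            if ($tokens[$i]['code'] === T_FN && isset($tokens[$i]['scope_opener'])) {
                $ranges[] = [$tokens[$i]['scope_opener'], $tokens[$i]['scope_closer']];
            }
        }
        static::$fnRanges[$key] = ['count' => $count, 'ranges' => $ranges];
    }

    // O(arrow_count) per query; files without arrows iterate an empty array.
    foreach (static::$fnRanges[$key]['ranges'] as [$open, $close]) {
        if ($stackPtr > $open && $stackPtr < $close) {
            return true;
        }
    }

    return false;
}
```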
dereuromark added a commit that referenced this pull request May 13, 2026
UseStatementSniff has its own getUseStatements() implementation that
duplicates UseStatementsTrait::getUseStatements() (cached in #64). It
already has an instance-level cache via existingStatements, which
covers repeated calls within a single phpcs pass; the new static
cache adds coverage across phpcbf fix iterations, where
populateTokenListeners() creates fresh sniff instances per pass and
resets the instance cache.

Cache invalidation follows the same fingerprint-based scheme as #64:
token count alone is not strong enough, since an alias rename keeps
it constant. Cached entries record a content fingerprint of each
use statement range and re-verify them against the live tokens
before being trusted. The cache also refuses to serve an empty
result so it cannot return stale state for a file where a fix added
a first use statement while another simultaneous fix happened to
keep the file's overall token count unchanged.

Measured on the same CakePHP 5 app from #62 / #63 / #64 / #65
(parallel=1, --report=performance):

  PhpCollective.Namespaces.UseStatement   3.36s -> 2.08s

Existing test suite (100 tests / 122 assertions) passes unchanged.

The FQCN cache changes from the earlier draft of this PR were
dropped in favour of the cache that landed via #65.
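The fingerprint re-verification described above could be sketched as follows; the cache layout and fingerprint helper are assumptions for illustration, not the actual patch:

```php
// Hypothetical sketch: cached use-statement ranges are re-verified against
// the live token stream before being trusted.
$key = $phpcsFile->getFilename();
$tokens = $phpcsFile->getTokens();
$cached = static::$useStatementCache[$key] ?? null;

// Never serve an empty cached result, so a fix that adds a first use
// statement cannot be masked by an unchanged overall token count.
if ($cached !== null && $cached['statements'] !== []) {
    foreach ($cached['statements'] as $statement) {
        $content = '';
        for ($i = $statement['start']; $i <= $statement['end']; $i++) {
            $content .= $tokens[$i]['content'] ?? '';
        }
        if (md5($content) !== $statement['fingerprint']) {
            $cached = null; // e.g. an alias rename kept the token count constant
            break;
        }
    }
}
if ($cached !== null) {
    return $cached['statements'];
}
```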
dereuromark added a commit that referenced this pull request May 13, 2026
…#66)
