
Perf: cache class-name resolution and skip redundant return-type body scans#62

Merged
dereuromark merged 1 commit into master from perf-cache-classname-and-skip-body-scans on May 13, 2026
Conversation

@dereuromark
Contributor

Summary

Two small changes that cut whole-codebase composer cs-check time by ~30% and the slowest single file by ~50% on real-world large CakePHP code, with no behaviour change.

1. AbstractSniff: per-file cache for class-name resolution

getClassNameWithNamespace() calls $phpcsFile->findPrevious([T_CLASS, T_TRAIT, T_INTERFACE, T_ENUM], $lastToken) — i.e. it scans backwards from the last token of the file to find the class declaration, which sits at the top. The result is stable for the lifetime of a phpcs run on a given file.

It is invoked indirectly by every sniff that needs the current class name, once per T_FUNCTION:

  • MethodTypeHintSniff::process -> getCurrentClassName()
  • ReturnTypeHintSniff::assertNotThisOrStatic -> getClassNameWithNamespace()
  • DocBlockReturnSelfSniff::fixClassToThis -> getClassName()
  • FullyQualifiedClassNameInDocBlockSniff::findInSameNameSpace -> getNamespace()
  • Psr4Sniff::process -> getClassName()

On an 11k-line controller with ~100 methods, that is hundreds of full-file walks per file. Caching by filename turns this into one walk per file. Cache key is $phpcsFile->getFilename(); entries live for the lifetime of the worker process, which is bounded by the file set assigned to it.
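As a minimal sketch, the caching pattern described above could look like the following. Only `getClassNameWithNamespace()` and `$phpcsFile->getFilename()` come from the PR; the static property and the `resolveClassNameWithNamespace()` helper are hypothetical names for illustration:

```php
// Hypothetical sketch of the per-file cache; not the literal patch.
protected static array $classNameCache = [];

protected function getClassNameWithNamespace(File $phpcsFile): ?string
{
    $key = $phpcsFile->getFilename();
    // array_key_exists() so that a legitimate null result
    // (file without a class declaration) is also cached.
    if (!array_key_exists($key, static::$classNameCache)) {
        // One backwards walk from the last token, as before; the result
        // is stable for the lifetime of the phpcs run on this file.
        static::$classNameCache[$key] = $this->resolveClassNameWithNamespace($phpcsFile);
    }

    return static::$classNameCache[$key];
}
```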

2. DocBlockReturnSelfSniff: pre-filter before expensive return-type scan

getReturnTypes() walks the entire function body looking for return statements. The original process() called it unconditionally for every method with a return docblock entry, but the three assertions guarded by its result only fire when the annotation contains self, $this, or the class's own name; for plain types like void, int, string, or array<...>, all that work was wasted.

This PR adds a cheap pre-filter on the parts array before the body scan, skipping the scan in the common case.
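A sketch of the pre-filter, assuming `$parts` holds the annotation's type strings and `$className` the resolved class name (variable names are illustrative, taken from the description rather than the diff):

```php
// Hypothetical sketch: only the guard is new; the scan itself is unchanged.
$needsBodyScan = false;
foreach ($parts as $part) {
    if ($part === 'self' || $part === '$this' || $part === $className) {
        $needsBodyScan = true;
        break;
    }
}
if (!$needsBodyScan) {
    // Plain types like void/int/string/array<...> can never trigger
    // the three guarded assertions, so skip the expensive body walk.
    return;
}
$returnTypes = $this->getReturnTypes($phpcsFile, $stackPtr);
```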

Benchmarks

Measured on a 1095-file CakePHP 5 app whose biggest controller is ResidentsController.php at 11,111 lines / ~100 actions, on squizlabs/php_codesniffer:^4.0.1 and slevomat/coding-standard:^8.23.0:

| Run | Before | After | Delta |
| --- | --- | --- | --- |
| parallel=1, ResidentsController.php alone | 40s | 20s | -50% |
| parallel=16, whole codebase (wall) | 2m08s | 1m30s | -30% |
| parallel=16, whole codebase (CPU) | 5m55s | 4m37s | -22% |

Top entries in the perf report on the slowest file went from:

  • before: PhpCollective.Commenting.DocBlockReturnSelf 5.86s (32%), PhpCollective.Classes.MethodTypeHint 3.27s (18%)
  • after: both dropped out of the top 22 entirely

The 50% gain on the biggest file is the key win — it eliminates the load-imbalance bottleneck that previously prevented parallel=16 from scaling much beyond 2.7x.

Verification

  • Existing PHPUnit suite passes (100 tests / 122 assertions).
  • composer cs-check and composer stan on this repo: clean.
  • Output diff on a real 1095-file codebase: identical violations reported before vs after.
  • Smoke test on a hand-crafted file exercising self, $this, and fully-qualified class name in return docblocks: identical 3 errors reported before vs after, with both fixable markers preserved.

AbstractSniff::getClassNameWithNamespace() and getClassName() walk the
token stream backwards from the last token of the file on every call,
and are invoked by several sniffs once per T_FUNCTION
(MethodTypeHintSniff, ReturnTypeHintSniff, DocBlockReturnSelfSniff,
FullyQualifiedClassNameInDocBlockSniff, Psr4Sniff). On large files with
many methods this becomes O(methods * file_size). Cache the result by
filename so each file resolves at most once.

DocBlockReturnSelfSniff::process() called getReturnTypes() unconditionally
for every method with a return annotation. getReturnTypes() walks the
entire function body looking for return statements. Since the three
assertions guarded by it only fire when the annotation contains 'self',
'$this', or the own class name, add a cheap pre-filter that skips the
body scan for plain types like 'void', 'int', 'string', 'array<...>',
etc.

Measured on a real-world ~11k-line CakePHP controller with ~100 actions:

  parallel=1, single file:        40s -> 20s   (-50%)
  parallel=16, whole codebase:   128s -> 90s   (-30%)
  Total CPU across all workers:  355s -> 277s  (-22%)

Existing test suite (100 tests / 122 assertions) passes unchanged.
@dereuromark dereuromark merged commit fe9726f into master May 13, 2026
4 checks passed
@dereuromark dereuromark deleted the perf-cache-classname-and-skip-body-scans branch May 13, 2026 19:01
dereuromark added a commit that referenced this pull request May 13, 2026
…dentSniff (#63)

* Use O(1) conditions lookups in ConsistentIndentSniff

Three helpers in ConsistentIndentSniff (isInsideClosure,
isInsideSwitchCase, couldBeInCaseBlock) walked the token stream
backwards up to 2000 / 500 / 500 tokens for every indented line.

The tokens they searched for are already tracked in the per-token
'conditions' map built by the phpcs PHP tokenizer:

  - T_CLOSURE: PHP::processAdditional() rewrites the conditions
    of every token inside an anonymous function from T_FUNCTION
    to T_CLOSURE (PHP.php around line 2840).
  - T_SWITCH / T_CASE / T_DEFAULT: all scope openers, so
    createLevelMap() adds them to inner tokens' conditions.

So the inner-loop scans were redundant. Replace with O(depth) walks
of the conditions array (typically <10 entries). T_FN is the one
exception — arrow functions get scope_opener/scope_closer but don't
propagate to inner tokens' conditions — so keep a bounded fallback
scan for it.

Removed the outdated "Case blocks don't show up in conditions"
comment; they do.
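The conditions-based lookup described above might be sketched like this; the method name mirrors the commit message, but the exact signature in the patch may differ:

```php
// Hypothetical sketch of the O(depth) replacement for the bounded scan.
protected function isInsideClosure(File $phpcsFile, int $stackPtr): bool
{
    $tokens = $phpcsFile->getTokens();
    // 'conditions' maps each enclosing scope opener's pointer to its token
    // code; walking it is O(nesting depth), typically under 10 entries,
    // instead of scanning up to 2000 tokens backwards.
    $conditions = $tokens[$stackPtr]['conditions'] ?? [];

    return in_array(T_CLOSURE, $conditions, true);
}
```

The same shape works for T_SWITCH / T_CASE / T_DEFAULT; only T_FN needs the separate handling discussed below, since arrow functions do not propagate into inner tokens' conditions.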

Measured on the same workload as #62 (CakePHP 5 app, 1095 files,
biggest controller 11k lines):

  parallel=1, single file:       20s -> 16.7s
  parallel=16, whole codebase:  1m30s -> 1m16s

Sniff perf on the slowest file: 4.12s (24%) -> 1.09s (8%).

Existing test suite (100 tests / 122 assertions) passes unchanged.
Output diff across 1095 real-world files: identical violations.

* Cache T_FN scope ranges instead of bounded backward scan

Initial draft used a 500-token backward scan to find a containing
T_FN (arrow function) when the conditions-based fast path didn't
match. Codex review correctly flagged that as a behaviour regression:
arrow function bodies can legitimately exceed 500 tokens (a long
match expression, a deeply nested array as the body, a long method
chain), and the bounded scan would then fail to detect that we're
inside the arrow and could emit false-positive over-indentation
errors on lines the original implementation correctly skipped.

Replace the bounded scan with a per-file cache of [scope_opener,
scope_closer] ranges for every T_FN token. Built lazily on first use
in O(file) and checked in O(arrow_count) per query (typically a
handful of arrows per file). No scan limit, no false negatives.

Cache stores the token count alongside the ranges so it self-
invalidates if the file is re-tokenized inside a phpcbf fix loop.

Net effect vs the bounded-scan draft:
  parallel=1 ResidentsController.php:  16.7s -> 6.5s
  parallel=16 whole codebase wall:     1m16s -> 27.9s

The big extra speedup comes from cheap empty-array iteration in
files with no arrow functions at all, which was previously paying
for a 500-token scan per indented line.
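A sketch of the T_FN range cache as described, with the token-count self-invalidation; property and method names are hypothetical:

```php
// Hypothetical sketch of the lazily built per-file T_FN scope-range cache.
protected static array $fnRanges = [];

protected function isInsideArrowFunction(File $phpcsFile, int $stackPtr): bool
{
    $key = $phpcsFile->getFilename();
    $tokens = $phpcsFile->getTokens();
    $count = count($tokens);

    // Rebuild in O(file) on first use, or when the token count changed
    // (i.e. the file was re-tokenized inside a phpcbf fix loop).
    if (!isset(static::$fnRanges[$key]) || static::$fnRanges[$key]['count'] !== $count) {
        $ranges = [];
        for ($i = 0; $i < $count; $i++) {
            if ($tokens[$i]['code'] === T_FN && isset($tokens[$i]['scope_opener'])) {
                $ranges[] = [$tokens[$i]['scope_opener'], $tokens[$i]['scope_closer']];
            }
        }
        static::$fnRanges[$key] = ['count' => $count, 'ranges' => $ranges];
    }

    // O(arrow_count) per query; files without arrows iterate an empty array.
    foreach (static::$fnRanges[$key]['ranges'] as [$open, $close]) {
        if ($stackPtr > $open && $stackPtr < $close) {
            return true;
        }
    }

    return false;
}
```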
dereuromark added a commit that referenced this pull request May 13, 2026
UseStatementSniff has its own getUseStatements() implementation that
duplicates UseStatementsTrait::getUseStatements() (cached in #64). It
already has an instance-level cache via existingStatements, which
covers repeated calls within a single phpcs pass; the new static
cache adds coverage across phpcbf fix iterations, where
populateTokenListeners() creates fresh sniff instances per pass and
resets the instance cache.

Cache invalidation follows the same fingerprint-based scheme as #64:
token count alone is not strong enough, since an alias rename keeps
it constant. Cached entries record a content fingerprint of each
use statement range and re-verify them against the live tokens
before being trusted. The cache also refuses to serve an empty
result so it cannot return stale state for a file where a fix added
a first use statement while another simultaneous fix happened to
keep the file's overall token count unchanged.

Measured on the same CakePHP 5 app from #62 / #63 / #64 / #65
(parallel=1, --report=performance):

  PhpCollective.Namespaces.UseStatement   3.36s -> 2.08s

Existing test suite (100 tests / 122 assertions) passes unchanged.

The FQCN cache changes from the earlier draft of this PR were
dropped in favour of the cache that landed via #65.
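The fingerprint re-verification described above could be sketched as follows; the cache layout and fingerprint helper are assumptions for illustration, not the actual patch:

```php
// Hypothetical sketch: cached use-statement ranges are re-verified against
// the live token stream before being trusted.
$key = $phpcsFile->getFilename();
$tokens = $phpcsFile->getTokens();
$cached = static::$useStatementCache[$key] ?? null;

// Never serve an empty cached result, so a fix that adds a first use
// statement cannot be masked by an unchanged overall token count.
if ($cached !== null && $cached['statements'] !== []) {
    foreach ($cached['statements'] as $statement) {
        $content = '';
        for ($i = $statement['start']; $i <= $statement['end']; $i++) {
            $content .= $tokens[$i]['content'] ?? '';
        }
        if (md5($content) !== $statement['fingerprint']) {
            $cached = null; // e.g. an alias rename kept the token count constant
            break;
        }
    }
}
if ($cached !== null) {
    return $cached['statements'];
}
```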
dereuromark added a commit that referenced this pull request May 13, 2026
…#66)
