perf(lexer): skip allocation in ensureLF when input has no carriage returns #1275
Closed
b-wulan wants to merge 1 commit into
Owner
This optimisation doesn't seem worthwhile. Also please don't use these incredibly verbose AI PR descriptions. A rough guide is that if your PR description is many times longer than your code change, it's bad.
Contributor
Author
Thank you for your time Alec, my apologies for the poor PR.
What
Add an early-return guard at the top of ensureLF in regexp.go using strings.IndexByte(text, '\r') < 0. When the input string contains no \r bytes, the function returns the original string immediately: no allocation, no byte-by-byte copy, no string() construction.
Why
ensureLF is called on every Tokenise invocation (when EnsureLF is set, which is the default). The overwhelming majority of real-world source files contain no carriage returns: virtually all Linux/macOS-authored code, and most files written with modern editors on Windows. The current implementation unconditionally allocates a []byte of the full input length, copies every byte, and constructs a new string, even when the input is already clean. The early return eliminates that entire allocation and copy path for the common case. strings.IndexByte uses an assembly-optimised byte search that is vectorised on common architectures, so the check itself is O(n) but with a very small constant, significantly cheaper than the alloc+copy it replaces.
How to verify
The existing TestEnsureLFFunc and TestEnsureLFOption tests cover both the CR-only and CRLF cases and continue to pass. To confirm the no-CR fast path explicitly, run the following before and after the change; allocations should drop to 0 B/op for the no-CR input:
go test -bench=BenchmarkEnsureLFNoCR -benchmem .
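For readers without the diff to hand, the guard described above might look roughly like the sketch below. The body of ensureLF is an assumption reconstructed from this description (CRLF and lone CR both become LF), not a copy of regexp.go. The body of BenchmarkEnsureLFNoCR is not shown on this page, so the sketch uses testing.AllocsPerRun from the standard library as a stand-in allocation check. The sink variable and the sample inputs are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// ensureLF normalises CRLF and lone CR line endings to LF.
// Assumed shape: reconstructed from the PR description, not copied from regexp.go.
func ensureLF(text string) string {
	// Fast path described in the PR: no '\r' means there is nothing to rewrite.
	if strings.IndexByte(text, '\r') < 0 {
		return text
	}
	buf := make([]byte, len(text))
	j := 0
	for i := 0; i < len(text); i++ {
		c := text[i]
		if c == '\r' {
			if i+1 < len(text) && text[i+1] == '\n' {
				continue // drop the CR of a CRLF pair; the LF is copied on the next iteration
			}
			c = '\n' // a lone CR becomes LF
		}
		buf[j] = c
		j++
	}
	return string(buf[:j])
}

// sink (hypothetical) keeps the compiler from discarding the measured call.
var sink string

func main() {
	fmt.Printf("%q\n", ensureLF("a\r\nb\rc")) // prints "a\nb\nc"

	// Hypothetical no-CR input standing in for a typical source file.
	clean := strings.Repeat("no carriage returns here\n", 1000)
	allocs := testing.AllocsPerRun(100, func() { sink = ensureLF(clean) })
	fmt.Println("allocations per call on no-CR input:", allocs)
}
```

The trade-off is that inputs which do contain a carriage return are scanned twice, once by the guard and once by the copy loop. Whether that is acceptable depends on how common such inputs are in practice.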