
perf(lexer): skip allocation in ensureLF when input has no carriage returns #1275

Closed
b-wulan wants to merge 1 commit into alecthomas:master from b-wulan:opt/2026-05-26-ensurelf-skip-alloc

@b-wulan
Contributor

@b-wulan b-wulan commented May 26, 2026

What

Add an early-return guard at the top of ensureLF in regexp.go using strings.IndexByte(text, '\r') < 0. When the input string contains no \r bytes, the function returns immediately with the original string — no allocation, no byte-by-byte copy, no string() construction.
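
For illustration, a minimal sketch of the guard described above. The name normalizeLF and the loop body are assumptions made for this example; they stand in for ensureLF and are not copied from regexp.go.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeLF is a hypothetical stand-in for ensureLF: it rewrites "\r\n"
// and lone "\r" to "\n". It is a sketch, not the code from regexp.go.
func normalizeLF(text string) string {
	// Fast path: no carriage returns, so return the input unchanged
	// without allocating.
	if strings.IndexByte(text, '\r') < 0 {
		return text
	}
	// Slow path: copy into a buffer, dropping or rewriting each '\r'.
	buf := make([]byte, 0, len(text))
	for i := 0; i < len(text); i++ {
		c := text[i]
		if c == '\r' {
			if i+1 < len(text) && text[i+1] == '\n' {
				continue // CRLF: drop '\r'; the '\n' is copied on the next iteration
			}
			c = '\n' // lone CR becomes LF
		}
		buf = append(buf, c)
	}
	return string(buf)
}

func main() {
	fmt.Printf("%q\n", normalizeLF("a\nb"))      // "a\nb" (returned as-is)
	fmt.Printf("%q\n", normalizeLF("a\r\nb\rc")) // "a\nb\nc"
}
```

The guard costs one extra scan up to the first '\r' when the input does contain carriage returns, which is the trade-off for skipping the copy on clean input.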

Why

ensureLF is called on every Tokenise invocation (when EnsureLF is set, which is the default). For the overwhelming majority of real-world source files — virtually all Linux/macOS-authored code, and most modern editors on Windows — there are no carriage returns. The current implementation unconditionally allocates a []byte of the full input length, copies every byte, and constructs a new string, even when the input is already clean. The early-return eliminates that entire allocation+copy path for the common case.

strings.IndexByte is backed by hand-written assembly on common architectures (SIMD on amd64, for example), so the check is still O(n) but with a very small constant factor, and it is significantly cheaper than the alloc+copy it replaces.

How to verify

The existing TestEnsureLFFunc and TestEnsureLFOption tests cover both the CR-only and CRLF cases and continue to pass. To confirm the no-CR fast path explicitly:

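// In regexp_test.go, or a scratch benchmark in the same package.
import (
    "strings"
    "testing"
)

// BenchmarkEnsureLFNoCR measures ensureLF on input that contains no '\r'.
func BenchmarkEnsureLFNoCR(b *testing.B) {
    text := strings.Repeat("hello world\n", 1000)
    b.ReportAllocs()
    // b.Loop needs Go 1.24+; it resets the timer on its first call,
    // so no explicit b.ResetTimer is required.
    for b.Loop() {
        ensureLF(text)
    }
}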

Run go test -bench=BenchmarkEnsureLFNoCR -benchmem . before and after; allocations should drop to 0 B/op for the no-CR input.

@alecthomas
Owner

This optimisation doesn't seem worthwhile.

Also please don't use these incredibly verbose AI PR descriptions. A rough guide is that if your PR description is many times longer than your code change, it's bad.

@alecthomas alecthomas closed this May 26, 2026
@b-wulan
Contributor Author

b-wulan commented May 27, 2026

Thank you for your time, Alec. My apologies for the poor PR.
