
Reduce allocations in parseLog / parseDiff via Substring and Scanner #2

@poolcamacho


Context

GitService.parseDiff(_:), the log parser, and the blame / status parsers all use String.components(separatedBy:), which copies every line into a new String. That cost grows linearly with history size (large git log runs) and diff size (large changesets), and shows up in Instruments as a large number of small heap allocations.

Suggested by feedback on r/swift.

Proposal

Two-step refactor:

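  1. Switch to Substring views by replacing components(separatedBy:) with split(separator:). A Substring shares storage with the original String, so line contents are not copied (the slice does keep the whole original string alive for as long as it exists). Note that split(separator:) drops empty subsequences by default, so pass omittingEmptySubsequences: false wherever blank lines must be preserved. Most of the parsers only need slices for matching prefixes (@@, +, -, ...).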

  2. Use Scanner (or a hand-rolled state machine over UnicodeScalarView) for the parsers that walk character by character — particularly the hunk-header parser (@@ -oldStart,oldCount +newStart,newCount @@) and the blame porcelain header. Scanner fits well here because the format is line-anchored and structured.
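A minimal sketch of both steps, not the project's actual code: `HunkHeader`, `parseHunkHeader`, and `summarizeDiff` are hypothetical names chosen for illustration, and the real `parseDiff(_:)` output type is not shown in this issue. It assumes the modern `Scanner` API (`scanString(_:)`, `scanInt()`), available on macOS 10.15+ / iOS 13+.

```swift
import Foundation

// Hypothetical value type for illustration; the project's real model may differ.
struct HunkHeader: Equatable {
    var oldStart: Int
    var oldCount: Int
    var newStart: Int
    var newCount: Int
}

// Parses "@@ -oldStart[,oldCount] +newStart[,newCount] @@ ...".
// Returns nil if the line is not a well-formed hunk header.
func parseHunkHeader(_ line: Substring) -> HunkHeader? {
    // Scanner needs a String, so only hunk-header lines pay for this copy.
    let scanner = Scanner(string: String(line))

    // Reads an optional ",count" suffix; git omits the count when it is 1.
    func scanCount() -> Int? {
        guard scanner.scanString(",") != nil else { return 1 }
        return scanner.scanInt()
    }

    guard scanner.scanString("@@") != nil,
          scanner.scanString("-") != nil,
          let oldStart = scanner.scanInt(),
          let oldCount = scanCount(),
          scanner.scanString("+") != nil,
          let newStart = scanner.scanInt(),
          let newCount = scanCount(),
          scanner.scanString("@@") != nil
    else { return nil }

    return HunkHeader(oldStart: oldStart, oldCount: oldCount,
                      newStart: newStart, newCount: newCount)
}

// Walks a unified diff line by line using Substring slices.
// Simplification: any line starting with "+++" or "---" is treated as a
// file header, which would miscount a changed line whose content begins
// with "++" or "--".
func summarizeDiff(_ diff: String) -> (added: Int, removed: Int, hunks: [HunkHeader]) {
    var added = 0
    var removed = 0
    var hunks: [HunkHeader] = []

    // omittingEmptySubsequences: false keeps blank lines in the result.
    for line in diff.split(separator: "\n", omittingEmptySubsequences: false) {
        if line.hasPrefix("@@") {
            if let header = parseHunkHeader(line) { hunks.append(header) }
        } else if line.hasPrefix("+++") || line.hasPrefix("---") {
            continue
        } else if line.hasPrefix("+") {
            added += 1
        } else if line.hasPrefix("-") {
            removed += 1
        }
    }
    return (added, removed, hunks)
}
```

With this shape, prefix matching runs directly on the slices, and a `String` is only materialised for the few lines that go through `Scanner`. A hand-rolled parser over the slice's `utf8` or `unicodeScalars` view would avoid even that copy, at the cost of more code.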

Files in scope

  • Services/GitService.swift: parseDiff, parseBlame, log, status, branches, parseBranchFromRefs
  • Services/ConflictParser.swift: currently uses components(separatedBy:) over the whole file content

Acceptance criteria

  • No behavioural change in the parsed output (existing call sites keep working)
  • Measurable reduction in allocation count when parsing a git log of 200+ commits or a diff over a few hundred KB (Instruments → Allocations template)
  • Build still passes, SwiftLint still clean
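One way to guard the "no behavioural change" criterion is a small equivalence check between the old and new line splitting. This is a sketch only: `linesOld` and `linesNew` are hypothetical helper names, not functions from the repository, and the sample input is made up.

```swift
import Foundation

// Old approach: every line is copied into its own String.
func linesOld(_ text: String) -> [String] {
    text.components(separatedBy: "\n")
}

// New approach: slices that share storage with `text`.
// omittingEmptySubsequences: false is required to keep blank lines,
// which components(separatedBy:) always preserves.
func linesNew(_ text: String) -> [Substring] {
    text.split(separator: "\n", omittingEmptySubsequences: false)
}

let sampleLog = "commit abc\n\nAuthor: x\n"

// Same lines, including the blank line and the trailing empty element.
assert(linesOld(sampleLog) == linesNew(sampleLog).map(String.init))

// The default split would silently drop the empty lines.
assert(sampleLog.split(separator: "\n").count == 2)
```

One caveat worth testing separately: in Swift, "\r\n" is a single Character, so `split(separator: "\n")` does not split CRLF-terminated text. If any git output can contain CRLF line endings, the check should include such input, or the split could be done over the `utf8` view instead.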
