Output safety gates: dedup, insertion safety, sentence boundary by FuJacob · Pull Request #485 · FuJacob/cotabby

FuJacob · 2026-05-31T18:45:42Z

Summary

Moves last-mile suggestion-quality control into small, testable helpers and suppresses more classes of bad completion before they reach ghost text. This follows the project principle that a suppressed completion beats a wrong one.

First of two stacked PRs. App-only, no engine change.

TrailingDuplicationFilter: replaces the raw hasPrefix(trailingText) guard in SuggestionTextNormalizer with a folded (lowercased, alphanumeric-only) check covering three duplication shapes, so a stray leading glyph, a case difference, or a contained suffix no longer slips a duplicate through.
InsertionSafetyGate: final gate in SuggestionTextNormalizer that rejects control characters, U+FFFD replacement glyphs, and whitespace-only output. Does not judge punctuation, so a lone ")" or "." still passes.
SentenceBoundaryClassifier: consulted by SuggestionSessionReconciler.endsInSentenceTerminator so phrase acceptance no longer stops early on decimals ("1.2"), list numbers ("1."), single-letter initials, or common abbreviations ("e.g.", "U.S.").
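
As a rough illustration of the InsertionSafetyGate rules listed above, here is a minimal sketch. The type name `InsertionSafetyGateSketch` and the `isSafe` function are hypothetical and not taken from the PR, and treating tab and newline as allowed whitespace is an assumption; the real gate may handle them differently.

```swift
// Hypothetical sketch; the PR's actual InsertionSafetyGate may differ.
enum InsertionSafetyGateSketch {
    /// Returns true when `text` looks safe to show as ghost text.
    static func isSafe(_ text: String) -> Bool {
        // Reject empty and whitespace-only output.
        guard text.contains(where: { !$0.isWhitespace }) else { return false }

        for scalar in text.unicodeScalars {
            // Reject the U+FFFD replacement glyph.
            if scalar == "\u{FFFD}" { return false }
            // Assumption: tab and newline count as ordinary whitespace here.
            if scalar == "\n" || scalar == "\t" { continue }
            // Reject every other control character (general category Cc).
            if scalar.properties.generalCategory == .control { return false }
        }
        // Punctuation is deliberately not judged, so ")" or "." passes.
        return true
    }
}
```

Under these assumptions, `isSafe(")")` and `isSafe(".")` return true, while a whitespace-only string or one containing U+FFFD is rejected.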

Typo suppression is intentionally not included here. It overlaps the more complete #353 (which adds a spell checker, correction mode, and settings), so this PR defers to that rather than shipping a weaker duplicate.

Validation

swiftlint lint --strict --quiet <touched files>          # exit 0
xcodebuild ... build-for-testing -derivedDataPath ...     # ** TEST BUILD SUCCEEDED **
xcodebuild test ... CODE_SIGNING_ALLOWED=NO \
  -only-testing:CotabbyTests/TrailingDuplicationFilterTests \
  -only-testing:CotabbyTests/InsertionSafetyGateTests \
  -only-testing:CotabbyTests/SentenceBoundaryClassifierTests
# ** TEST SUCCEEDED **  Executed 19 tests, with 0 failures

Local Team ID caveat: the default signed test run cannot load the xctest bundle locally, so the suites are run with CODE_SIGNING_ALLOWED=NO. build-for-testing passes signed.

Linked issues

None. Related: #353 (typo suppression, intentionally not duplicated here).

Risk / rollout notes

Behavior change: these gates suppress more completions than before. Thresholds are conservative (minimum overlap length, abbreviation allowlist) to avoid suppressing valid suggestions.
pbxproj migration: three new source files and three new test files, regenerated with xcodegen generate.
No settings, schema, or runtime-engine changes.

Greptile Summary

This PR introduces five new helper types that form a multi-layer output safety pipeline, tightening last-mile filtering before suggestions reach ghost text. All new types are pure functions with dedicated test suites (19 tests, 0 failures reported).

New gates in SuggestionTextNormalizer: TrailingDuplicationFilter replaces the raw hasPrefix guard with a folded (lowercased, alphanumeric-only) check across three duplication shapes; InsertionSafetyGate rejects control characters, U+FFFD glyphs, and whitespace-only output as the final step before a suggestion is surfaced.
Phrase-acceptance improvement: SentenceBoundaryClassifier is integrated into SuggestionSessionReconciler.endsInSentenceTerminator so decimals, list numbers, single-letter initials, and common abbreviations no longer cause early phrase breaks.
Runtime additions: ConfidenceSuppressionPolicy and MidWordContinuationPolicy are wired into LlamaRuntimeCore / LlamaSuggestionEngine with all new options disabled by default; the CotabbyInference package is switched to a feature branch to expose the required engine APIs.
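
The mid-word rule mentioned above ("both sides of the caret are word characters") could look roughly like the sketch below. The names `MidWordContinuationSketch`, `isWordCharacter`, and `shouldForceWordContinuation` are hypothetical, and defining a word character as a letter or number is an assumption rather than the PR's definition.

```swift
// Hypothetical sketch of a mid-word check; not the PR's MidWordContinuationPolicy.
enum MidWordContinuationSketch {
    /// Assumption: a "word character" is any letter or number.
    static func isWordCharacter(_ character: Character) -> Bool {
        character.isLetter || character.isNumber
    }

    /// True only when the caret sits between two word characters,
    /// i.e. the user is typing in the middle of a word.
    static func shouldForceWordContinuation(before: String, after: String) -> Bool {
        guard let left = before.last, let right = after.first else { return false }
        return isWordCharacter(left) && isWordCharacter(right)
    }
}
```

With the caret between "hel" and "lo" the check returns true; with a space or nothing on either side it returns false, which keeps the constraint narrowly scoped.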

Confidence Score: 3/5

Not safe to merge as-is: the CotabbyInference dependency is pinned to a feature branch, so any build made after that branch is deleted or rebased may fail to resolve the package.

The new helper types are well-structured and the test suite is solid, but the dependency on feat/generation-quality-controls instead of a stable ref is a real build-stability problem. Additionally, TrailingDuplicationFilter Shape 3 uses integer division that collapses the effective threshold to the same 3-character floor for short completions, which can suppress valid 4–6 character suggestions sharing a common prefix with trailing text.

project.yml and Cotabby.xcodeproj/project.pbxproj both reference the feat/generation-quality-controls branch; TrailingDuplicationFilter.swift Shape 3 threshold deserves a second look for short completion inputs.

Important Files Changed

Filename	Overview
project.yml	CotabbyInference package branch changed from main to feat/generation-quality-controls. Pinning to a feature branch makes builds non-reproducible and fragile.
Cotabby/Support/TrailingDuplicationFilter.swift	New file implementing folded duplication detection across three shapes. Shape 3 integer-division threshold can cause false-positive suppression for short 4–6 character completions sharing a 3-character prefix with trailing text.
Cotabby/Support/SentenceBoundaryClassifier.swift	New classifier disambiguating periods for phrase acceptance. Logic is correct but the abbreviation allow-list is small and will miss common cases like Prof., Jr., Sr.
Cotabby/Support/InsertionSafetyGate.swift	New final gate rejecting control characters, U+FFFD glyphs, and whitespace-only output. Logic and edge cases are sound.
Cotabby/Support/ConfidenceSuppressionPolicy.swift	New pure policy suppressing completions below a log-probability floor. Disabled by default; guard for the −∞ sentinel is correct and well-tested.
Cotabby/Support/MidWordContinuationPolicy.swift	New policy constraining the first token to a word continuation only when both sides of the caret are word characters. Narrowly scoped and well-tested.
Cotabby/Support/SuggestionTextNormalizer.swift	Integrates TrailingDuplicationFilter and InsertionSafetyGate; extracts stripThinkBlocks into a named helper. Changes are clean.
Cotabby/Support/SuggestionSessionReconciler.swift	endsInSentenceTerminator now delegates period disambiguation to SentenceBoundaryClassifier. Updated test correctly validates the new behavior.
Cotabby/Services/Runtime/LlamaRuntimeCore.swift	Adds log-probability accumulation, confidence suppression gate, forceWordContinuation wiring on both paths, and single_line flag forwarding. Defer-based KV cleanup still runs on early return.
Cotabby/Models/LlamaRuntimeModels.swift	Adds singleLine, forceWordContinuation, and confidenceFloor fields with safe defaults so all existing call sites remain unaffected.
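
To make the ConfidenceSuppressionPolicy row more concrete, here is a minimal sketch of a log-probability floor. Everything in it is an assumption: the type name `ConfidenceSuppressionSketch`, modelling "disabled by default" as a `nil` floor, comparing the mean per-token log-probability rather than the sum, and treating −∞ as a "no usable score" sentinel that never suppresses.

```swift
// Hypothetical sketch; the PR's ConfidenceSuppressionPolicy may differ in every detail.
struct ConfidenceSuppressionSketch {
    /// Assumption: nil means the policy is disabled (the default).
    var confidenceFloor: Double? = nil

    /// Decides whether a completion should be dropped for low confidence.
    func shouldSuppress(sumLogprob: Double, tokenCount: Int) -> Bool {
        // Disabled policy or empty generation: never suppress.
        guard let floor = confidenceFloor, tokenCount > 0 else { return false }
        // Assumption: non-finite sums (including the -infinity sentinel)
        // mean no usable score is available, so the completion is kept.
        guard sumLogprob.isFinite else { return false }
        // Assumption: compare the mean per-token log-probability to the floor.
        let meanLogprob = sumLogprob / Double(tokenCount)
        return meanLogprob < floor
    }
}
```

For example, with a floor of -1.0, a three-token completion with a summed log-probability of -6.0 (mean -2.0) would be suppressed, while one with -1.5 (mean -0.5) would pass.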

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[LlamaSuggestionEngine] --> B["LlamaRuntimeCore<br/>generates tokens<br/>accumulates sumLogprob"]
    B --> C{ConfidenceSuppressionPolicy}
    C -- suppressed --> Z[return empty]
    C -- passes --> D[SuggestionTextNormalizer]
    D --> D1[stripThinkBlocks]
    D1 --> D2[prompt-echo strip]
    D2 --> D3{"TrailingDuplicationFilter<br/>shapes 1·2·3"}
    D3 -- duplicate --> Z
    D3 -- unique --> D4[whitespace trim]
    D4 --> D5{InsertionSafetyGate}
    D5 -- unsafe --> Z
    D5 -- safe --> E[ghost text shown]
    F[SuggestionSessionReconciler] --> G{endsInSentenceTerminator}
    G -- exclamation or question --> H[stop phrase]
    G -- period --> I{SentenceBoundaryClassifier}
    I -- decimal/initial/abbrev --> J[continue phrase]
    I -- real sentence end --> H

Comments Outside Diff (1)

Cotabby/Support/SentenceBoundaryClassifier.swift, lines 541-542

Abbreviation list omits several high-frequency titles and terms

The set covers basics but is missing commonly occurring terms: "prof", "jr", "sr", "corp", "dept", "ave", "blvd", "vol", "ed" (editor), "est" (established), and "ca" (circa). When any of these appear at the end of a chunk in phrase-acceptance mode, the classifier will fall through to return true and treat their trailing period as a sentence end. Because the classifier deliberately biases toward continuing on ambiguous periods (safe default), extending this list with a few more well-known cases costs nothing in correctness and reduces unnecessary early breaks.
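
For readers unfamiliar with the classifier, the period disambiguation discussed here can be sketched as follows. This is not the PR's code: `SentenceBoundarySketch`, `periodEndsSentence`, and the contents of the allow-list are hypothetical (a few entries are borrowed from the suggestions in this comment), and treating any all-digit token before the period as non-terminal is an assumption that covers both "1." and the "1." in "1.2".

```swift
// Hypothetical sketch; not the PR's SentenceBoundaryClassifier.
enum SentenceBoundarySketch {
    /// Illustrative allow-list, stored lowercased and without the final period.
    static let abbreviations: Set<String> = ["etc", "vs", "mr", "dr", "prof", "jr", "sr"]

    /// Returns true when the trailing "." of `text` looks like a real sentence end.
    static func periodEndsSentence(_ text: String) -> Bool {
        guard text.last == "." else { return false }
        let body = text.dropLast()
        // A period with nothing (or only whitespace) before it: treat as terminal.
        guard let previous = body.last, !previous.isWhitespace else { return true }

        // Last whitespace-delimited token, lowercased, without the final period.
        let token = body.split(whereSeparator: { $0.isWhitespace }).last
            .map { $0.lowercased() } ?? ""

        // Decimals and list numbers: only digits before the period.
        if token.allSatisfy({ $0.isNumber }) { return false }

        // Runs of single-letter initials such as "u.s.a", "e.g", or "j".
        let parts = token.split(omittingEmptySubsequences: false,
                                whereSeparator: { $0 == "." })
        if parts.allSatisfy({ $0.count == 1 && $0.first?.isLetter == true }) {
            return false
        }

        // Known abbreviations from the allow-list.
        if abbreviations.contains(token) { return false }
        return true
    }
}
```

In this sketch "U.S.A." and "1." are non-terminal, while "U.S.A. is great." ends at a real sentence boundary. Adding an entry such as "corp" to the set is a one-line change, which is the low-cost extension the comment argues for.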


Reviews (1): Last reviewed commit: "Generation-time quality controls: token ..."

Greptile also left 2 inline comments on this PR.

Move suggestion-quality controls into shared, testable helpers and suppress more bad completions before they reach ghost text: - TrailingDuplicationFilter: folded, multi-shape after-caret duplication check replacing the raw hasPrefix guard in SuggestionTextNormalizer. - InsertionSafetyGate: reject control chars, U+FFFD, and whitespace-only output. - SentenceBoundaryClassifier: disambiguate periods (decimals, list numbers, initials, abbreviations) so phrase acceptance does not stop early. Refactor <think>-block stripping into a helper to keep normalize() under the cyclomatic-complexity threshold. Typo suppression is intentionally left to the more complete PR #353 rather than duplicated here.

Phrase-terminator detection now routes periods through SentenceBoundaryClassifier, which treats a run of single-letter initials like "U.S.A." as non-terminal. The test still asserted the pre-classifier "known limitation" (an early break at "U.S.A."), so it failed once the classifier was wired in. Update it to assert the corrected behavior: acceptance walks past the initials to the real sentence end, yielding "U.S.A. is great.". Matches SentenceBoundaryClassifierTests, which already covers the same rule.

…continuation (#488)

greptile-apps · 2026-06-01T04:15:57Z

  CotabbyInference:
    url: https://github.com/FuJacob/cotabbyinference.git


Dependency pinned to a feature branch

CotabbyInference is now locked to feat/generation-quality-controls instead of main. Branch references are not content-addressed: if that branch is deleted, rebased, or diverges after this PR is merged, every subsequent build will silently fail to resolve the package. The same change appears in project.pbxproj, so both the YAML spec and the generated project file are affected. This should be resolved before merge — either by merging the inference-engine PR first and pointing back to main, or by pinning to a concrete commit SHA that survives branch lifecycle events.

greptile-apps · 2026-06-01T04:15:58Z

+        // common "model re-emits the next few words" case where the two diverge only after a while.
+        let overlap = commonPrefixLength(foldedCompletion, foldedTrailing)
+        return overlap >= max(minimumFoldedOverlap, foldedCompletion.count / 2)


Shape 3 threshold collapses to minimumFoldedOverlap for short completions

foldedCompletion.count / 2 is integer division, so for completions of 4–6 folded characters the quotient is only 2 or 3, and max(minimumFoldedOverlap, ...) always equals minimumFoldedOverlap (3). A 4-character completion like "work" shares 3 folded characters (wor) with trailing text "world...", so 3 >= max(3, 4/2) → 3 >= 3 → suppressed, even though the two words are distinct. The intent of shape 3 is to catch re-emitted multi-word runs; that goal is better served by requiring the overlap to be strictly more than half the completion length, e.g. overlap > foldedCompletion.count / 2, or by raising minimumFoldedOverlap for this shape alone.
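
The arithmetic in this comment can be reproduced with a small standalone sketch. The `fold` helper, this `commonPrefixLength` implementation, and `shape3Suppresses` are hypothetical reconstructions based on the PR description (lowercased, alphanumeric-only folding) and the quoted diff lines; only the final comparison is copied from the diff, and the value 3 for `minimumFoldedOverlap` is the one stated in the comment.

```swift
// Hypothetical reconstruction for illustration; only the threshold
// expression in shape3Suppresses mirrors the quoted diff.
let minimumFoldedOverlap = 3

/// Lowercases and keeps only letters and numbers.
func fold(_ text: String) -> String {
    text.lowercased().filter { $0.isLetter || $0.isNumber }
}

/// Number of leading characters the two strings share.
func commonPrefixLength(_ lhs: String, _ rhs: String) -> Int {
    var length = 0
    for (left, right) in zip(lhs, rhs) {
        if left != right { break }
        length += 1
    }
    return length
}

/// Shape 3 as quoted: suppress when the folded overlap reaches the threshold.
func shape3Suppresses(completion: String, trailing: String) -> Bool {
    let foldedCompletion = fold(completion)
    let foldedTrailing = fold(trailing)
    let overlap = commonPrefixLength(foldedCompletion, foldedTrailing)
    return overlap >= max(minimumFoldedOverlap, foldedCompletion.count / 2)
}
```

Here `shape3Suppresses(completion: "work", trailing: "world")` evaluates 3 >= max(3, 2) and returns true. Under the same integer arithmetic, a strict comparison against the same max(...) threshold would let this pair through (3 > 3 is false), whereas overlap > foldedCompletion.count / 2 on its own would still flag it (3 > 2).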

FuJacob mentioned this pull request May 31, 2026

Generation-time quality controls: token masks, single-line, mid-word continuation #488

Merged

FuJacob force-pushed the feat/output-safety-gates branch from 023d913 to c42c482 Compare May 31, 2026 19:13

FuJacob changed the title ~~Output safety gates: dedup, insertion safety, typo, sentence boundary~~ Output safety gates: dedup, insertion safety, sentence boundary May 31, 2026

FuJacob added 2 commits May 31, 2026 19:30

Generation-time quality controls: token masks, single-line, mid-word …

9d9db99

…continuation (#488)

FuJacob marked this pull request as ready for review June 1, 2026 04:09

FuJacob merged commit 14be489 into main Jun 1, 2026
4 checks passed

greptile-apps Bot reviewed Jun 1, 2026

FuJacob merged 3 commits into main from feat/output-safety-gates
