feat(diarization): cross-chunk + recovery speaker consistency (epic #71) #72

Open
fmasi wants to merge 6 commits into fix/v0.7.x-review-followups from feature/diarization-consistency
@fmasi fmasi commented May 20, 2026

Owner

Diarization & speaker-tagging consistency work — epic #71. Targets two user-reported symptoms: too many speaker boxes, and speaker identities not staying consistent across chunks/sections.

Stacked on fix/v0.7.x-review-followups (#63) so the diff shows only diarization changes; retarget to main once #63 merges.

Landed (all of v0.7.x scope)

Deferred

Test plan

  • swift test --filter TranscriberTests green (492/492), swift build clean
  • Manual (human): multi-chunk dual-stream recording → stable, consistent speaker boxes across sections
  • Manual (human): session-dialog Speakers control renders; "Exactly N" warns; "At least N" still admits extra speakers
  • Manual (human, optional): a crash-recovered (multi-segment) recording shows consistent speakers across segments

🤖 Generated with Claude Code

fmasi and others added 6 commits May 20, 2026 14:28
Reconciliation was computed then silently discarded: ChunkProcessor tags
segment speakers with a source prefix ("Local Speaker 1") before storing,
but the finalize() lookup used those prefixed names as keys into a
bare-keyed reconciliation map, so every lookup missed. Local mic embeddings
were also never persisted, so local speakers could never be reconciled.

- ChunkSession: add backward-compatible local_speaker_database to ProcessedChunk
- ChunkProcessor: persist mic pool as localSpeakerDatabase
- SpeakerReconciler: extract reconcile(databases:) core; keep reconcile(chunks:)
  as a thin wrapper over the remote pool (existing behavior unchanged)
- SpeakerLabeler (new Core helper): reconcile local/remote pools separately,
  strip source prefixes for lookup, and assign readable per-source display names
- TranscriptionRunner.finalize(): use SpeakerLabeler; drop redundant tagging

Fixes #64

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
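
The lookup mismatch described above can be illustrated with a minimal sketch. Everything here is hypothetical: the "Remote " prefix, the helper names, and the single shared map are assumptions for illustration (the commit reconciles local and remote pools separately), not the project's SpeakerLabeler API.

```swift
// Hypothetical sketch: only "Local Speaker 1" appears in the commit message;
// the "Remote " prefix and these helper names are assumptions.
let sourcePrefixes = ["Local ", "Remote "]

/// Splits "Local Speaker 1" into (source: "Local", bare: "Speaker 1").
func splitSourcePrefix(_ label: String) -> (source: String?, bare: String) {
    for prefix in sourcePrefixes where label.hasPrefix(prefix) {
        let source = String(prefix.dropLast())  // drop the trailing space
        return (source, String(label.dropFirst(prefix.count)))
    }
    return (nil, label)
}

/// Resolves a stored (prefixed) label through a bare-keyed reconciliation map,
/// then re-attaches the source so the display name stays readable.
func canonicalLabel(for stored: String, in map: [String: String]) -> String {
    let (source, bare) = splitSourcePrefix(stored)
    let resolved = map[bare] ?? bare
    return source.map { "\($0) \(resolved)" } ?? resolved
}

let reconciliation = ["Speaker 3": "Speaker 1"]

// The bug: a prefixed key never matches a bare-keyed map.
let buggy = reconciliation["Local Speaker 3"]

// The fix: strip the prefix before the lookup.
let fixed = canonicalLabel(for: "Local Speaker 3", in: reconciliation)
```

Here `buggy` is `nil` while `fixed` resolves to "Local Speaker 1", which is the difference between discarding and applying the reconciliation result.
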
Add a diarization smoothing pass (SpeakerAssignment.smoothDiarization) that
collapses sub-threshold diarized turns into their temporally-dominant neighbor
and merges adjacent same-speaker turns, run at both consumer call sites before
buildSpeakerMap/assign. This stops spurious <0.5s fragments from becoming their
own "Speaker N" box and inflating the speaker count. Threshold is wired through
Config.minSpeakerTurnDuration (min_speaker_turn_duration), defaulting to 0.5s.

Fixes #65

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
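
A rough sketch of what such a smoothing pass could look like follows. It is not the SpeakerAssignment.smoothDiarization implementation: the `Turn` type and function names are hypothetical, and "temporally-dominant neighbor" is interpreted here as the adjacent turn with the longer duration, which is an assumption.

```swift
// Hypothetical turn type; the real diarization segment type is not shown in the PR.
struct Turn: Equatable {
    var speaker: String
    var start: Double
    var end: Double
    var duration: Double { end - start }
}

/// Merges consecutive turns that share a speaker into one turn.
func mergeAdjacent(_ turns: [Turn]) -> [Turn] {
    var out: [Turn] = []
    for turn in turns {
        if let last = out.last, last.speaker == turn.speaker {
            out[out.count - 1].end = max(last.end, turn.end)
        } else {
            out.append(turn)
        }
    }
    return out
}

/// Reassigns each sub-threshold turn to its longer neighbor (assumed meaning of
/// "temporally dominant"), then merges. Every pass removes one turn, so it terminates.
func smooth(_ turns: [Turn], minDuration: Double = 0.5) -> [Turn] {
    var result = mergeAdjacent(turns.sorted { $0.start < $1.start })
    while result.count > 1,
          let i = result.firstIndex(where: { $0.duration < minDuration }) {
        let prev = i > 0 ? result[i - 1] : nil
        let next = i + 1 < result.count ? result[i + 1] : nil
        let winner: Turn
        switch (prev, next) {
        case (let p?, let n?): winner = p.duration >= n.duration ? p : n
        case (let p?, nil): winner = p
        case (nil, let n?): winner = n
        case (nil, nil): return result
        }
        result[i].speaker = winner.speaker
        result = mergeAdjacent(result)
    }
    return result
}

let raw = [
    Turn(speaker: "A", start: 0.0, end: 2.0),
    Turn(speaker: "B", start: 2.0, end: 2.2),   // spurious 0.2s fragment
    Turn(speaker: "A", start: 2.2, end: 4.0),
    Turn(speaker: "C", start: 4.0, end: 6.0),
]
let smoothed = smooth(raw)
```

In this example the 0.2s "B" fragment is absorbed into the surrounding "A" turns, so only two speakers remain instead of three.
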
Add optional clustering-threshold and speaker-count knobs to Config
(diarization_clustering_threshold / min / max / exact speakers) and thread
them into FluidAudioDiarizer via a DiarizationTuning struct. When all fields
are nil the diarizer config is unchanged, preserving current behavior.

Fixes #66

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
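
The "all nil means unchanged" contract can be sketched as below. The field names mirror the Config keys in the commit message, but the struct layout, the `StandInDiarizerConfig` type, its placeholder default, and `configure(_:with:)` are assumptions; the FluidAudio diarizer config is not reproduced here.

```swift
// Assumed shape of the tuning struct: every knob is optional.
struct DiarizationTuning {
    var clusteringThreshold: Double? = nil
    var minSpeakers: Int? = nil
    var maxSpeakers: Int? = nil
    var exactSpeakers: Int? = nil
}

// Stand-in for the SDK's diarizer config (hypothetical fields and default).
struct StandInDiarizerConfig: Equatable {
    var clusteringThreshold: Double = 0.7  // placeholder, not the SDK default
    var minSpeakers: Int? = nil
    var maxSpeakers: Int? = nil
    var numSpeakers: Int? = nil
}

/// Overrides only the fields the user set; nil fields leave the base untouched.
func configure(_ base: StandInDiarizerConfig,
               with tuning: DiarizationTuning) -> StandInDiarizerConfig {
    var config = base
    if let threshold = tuning.clusteringThreshold { config.clusteringThreshold = threshold }
    if let minimum = tuning.minSpeakers { config.minSpeakers = minimum }
    if let maximum = tuning.maxSpeakers { config.maxSpeakers = maximum }
    if let exact = tuning.exactSpeakers { config.numSpeakers = exact }
    return config
}

let base = StandInDiarizerConfig()
let untouched = configure(base, with: DiarizationTuning())
let tuned = configure(base, with: DiarizationTuning(clusteringThreshold: 0.6, maxSpeakers: 4))
```

Because each override is guarded by `if let`, an empty `DiarizationTuning()` yields a config equal to the base, which is how existing behavior would be preserved.
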
Add a Speakers control to the session-name dialog with Auto (default),
At least N, and Exactly N modes. Auto imposes no constraint; "At least"
maps to the SDK minSpeakers floor (never caps); "Exactly" forces a hard
count and is shown with a warning that extra speakers will be merged.

Introduces SpeakerSelection plus DiarizationTuning.applying(_:) in Core
and threads the selection from the dialog through MenuView into the
runner's refreshDiarizer. Auto is behavior-identical to before and the
diarizerOverridden mock-injection guard is untouched.

Fixes #67

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
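
One way the three dialog modes could map onto tuning values is sketched below. The type names SpeakerSelection and DiarizationTuning.applying(_:) come from the commit message; the enum cases, the struct fields, and the method body are assumptions.

```swift
// Assumed cases for the three dialog modes.
enum SpeakerSelection: Equatable {
    case auto
    case atLeast(Int)
    case exactly(Int)
}

// Reduced, hypothetical version of the tuning struct (speaker-count knobs only).
struct DiarizationTuning: Equatable {
    var minSpeakers: Int? = nil
    var maxSpeakers: Int? = nil
    var exactSpeakers: Int? = nil

    /// Returns a copy with the dialog selection applied.
    func applying(_ selection: SpeakerSelection) -> DiarizationTuning {
        var tuned = self
        switch selection {
        case .auto:
            break                     // no constraint: behavior-identical to before
        case .atLeast(let n):
            tuned.minSpeakers = n     // floor only; maxSpeakers stays nil, so no cap
        case .exactly(let n):
            tuned.exactSpeakers = n   // hard count; extra speakers get merged
        }
        return tuned
    }
}

let unconstrained = DiarizationTuning().applying(.auto)
let floorOfThree = DiarizationTuning().applying(.atLeast(3))
let exactlyTwo = DiarizationTuning().applying(.exactly(2))
```

Keeping `.atLeast` away from `maxSpeakers` is what lets that mode admit extra speakers, matching the manual test in the test plan.
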
… configurable

Add optional `reconciliation_threshold` (default 0.65) and
`reconciliation_ema_alpha` (default 0.9) Config fields and thread them
through SpeakerLabeler.label and SpeakerReconciler.reconcile. The
previously hardcoded EMA alpha is now a parameter. Defaults preserve
exact current behavior when the Config fields are nil.

Fixes #69

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
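
To show how the two parameters might interact, here is a minimal sketch of threshold-gated matching with an EMA centroid update. Only the default values 0.65 and 0.9 come from the commit message. Cosine similarity as the metric, alpha weighting the existing centroid, and all type and function names are assumptions, not SpeakerReconciler's code.

```swift
// Assumed similarity metric: cosine similarity between embeddings.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "embeddings must have equal length")
    var dot: Float = 0, normA: Float = 0, normB: Float = 0
    for i in a.indices {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    let denominator = (normA * normB).squareRoot()
    return denominator > 0 ? dot / denominator : 0
}

/// EMA update, assuming alpha is the weight kept by the existing centroid.
func emaUpdate(_ centroid: [Float], with sample: [Float], alpha: Float) -> [Float] {
    zip(centroid, sample).map { alpha * $0 + (1 - alpha) * $1 }
}

struct ReconcileParams {
    var threshold: Float = 0.65  // reconciliation_threshold default
    var emaAlpha: Float = 0.9    // reconciliation_ema_alpha default
}

struct KnownSpeaker {
    let name: String
    var centroid: [Float]
}

/// Matches an embedding to the best speaker at or above the threshold,
/// otherwise registers a new speaker.
func match(_ embedding: [Float], pool: inout [KnownSpeaker],
           params: ReconcileParams = ReconcileParams()) -> String {
    var best: (index: Int, score: Float)? = nil
    for (i, speaker) in pool.enumerated() {
        let score = cosineSimilarity(embedding, speaker.centroid)
        if score >= params.threshold, score > (best?.score ?? -Float.infinity) {
            best = (i, score)
        }
    }
    if let hit = best {
        pool[hit.index].centroid = emaUpdate(pool[hit.index].centroid,
                                             with: embedding, alpha: params.emaAlpha)
        return pool[hit.index].name
    }
    let name = "Speaker \(pool.count + 1)"
    pool.append(KnownSpeaker(name: name, centroid: embedding))
    return name
}

var pool: [KnownSpeaker] = []
let first = match([1, 0], pool: &pool)       // new speaker
let second = match([0.9, 0.1], pool: &pool)  // close to the first centroid
let third = match([0, 1], pool: &pool)       // nearly orthogonal
```

Under these assumptions, a higher threshold splits speakers more readily, and an alpha close to 1 makes centroids drift slowly as new chunks arrive.
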
Multi-segment recordings recovered after a crash were transcribed and
diarized independently per segment, so "Speaker 1" in one segment was
unrelated to the next. Apply SpeakerLabeler cross-segment reconciliation
(the #64 chunked-path fix) in run() when there is more than one recovery
segment. Single-segment recordings (the common path, including the CLI
transcribe command) keep their inline labels unchanged. Timing is
preserved by sharing one meetingStart as every chunk's startTime, so the
reconciler's chunk offset is 0 and segment times pass through untouched.

Fixes #70

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
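
The timing argument can be checked with a small sketch: if every chunk reports the same start time as the meeting, the per-chunk offset is zero and segment times are unchanged. The `Segment` and `Chunk` types and the `flatten` function are hypothetical stand-ins, not the runner's actual types.

```swift
import Foundation

// Hypothetical stand-ins for the recovery-path data.
struct Segment: Equatable {
    var speaker: String
    var start: TimeInterval
    var end: TimeInterval
}

struct Chunk {
    var startTime: Date
    var segments: [Segment]
}

/// Shifts each chunk's segments by the chunk's offset from the meeting start.
func flatten(_ chunks: [Chunk], meetingStart: Date) -> [Segment] {
    chunks.flatMap { chunk -> [Segment] in
        let offset = chunk.startTime.timeIntervalSince(meetingStart)
        return chunk.segments.map {
            Segment(speaker: $0.speaker, start: $0.start + offset, end: $0.end + offset)
        }
    }
}

let meetingStart = Date(timeIntervalSince1970: 0)
let segmentA = Segment(speaker: "Speaker 1", start: 0, end: 5)
let segmentB = Segment(speaker: "Speaker 1", start: 42, end: 47)

// Both recovery segments share meetingStart, so each offset is 0.
let chunks = [
    Chunk(startTime: meetingStart, segments: [segmentA]),
    Chunk(startTime: meetingStart, segments: [segmentB]),
]
let flattened = flatten(chunks, meetingStart: meetingStart)
```

With a shared start time the output equals the input segments in order, so only speaker labels, not timestamps, are affected by reconciliation.
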
@fmasi fmasi changed the title feat(diarization): cross-chunk speaker consistency (epic #71) feat(diarization): cross-chunk + recovery speaker consistency (epic #71) May 21, 2026