feat(diarization): cross-chunk + recovery speaker consistency (epic #71) #72
Open
fmasi wants to merge 6 commits into
Reconciliation was computed and then silently discarded: ChunkProcessor tags
segment speakers with a source prefix (e.g. "Local Speaker 1") before storing them,
but finalize() looked those prefixed names up in a reconciliation map keyed on
bare names, so every lookup missed. Local mic embeddings were also never
persisted, so local speakers could never be reconciled.
- ChunkSession: add backward-compatible local_speaker_database to ProcessedChunk
- ChunkProcessor: persist mic pool as localSpeakerDatabase
- SpeakerReconciler: extract reconcile(databases:) core; keep reconcile(chunks:)
as a thin wrapper over the remote pool (existing behavior unchanged)
- SpeakerLabeler (new Core helper): reconcile local/remote pools separately,
strip source prefixes for lookup, and assign readable per-source display names
- TranscriptionRunner.finalize(): use SpeakerLabeler; drop redundant tagging
Fixes #64
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
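The keying mismatch and its fix can be illustrated with a minimal Swift sketch. It assumes stored labels have the form "<Source> Speaker N" and that each per-source reconciliation map is keyed on the bare "Speaker N" name. `SpeakerSource`, `splitSourcePrefix`, and `displayName` are hypothetical names, not SpeakerLabeler's actual API.

```swift
/// Source tag on a stored speaker label (hypothetical; the PR's real type may differ).
enum SpeakerSource: String, CaseIterable {
    case local = "Local"
    case remote = "Remote"
}

/// Splits "Local Speaker 1" into (.local, "Speaker 1").
/// Labels without a known prefix come back unchanged with a nil source.
func splitSourcePrefix(_ label: String) -> (source: SpeakerSource?, bare: String) {
    for source in SpeakerSource.allCases {
        let prefix = source.rawValue + " "
        if label.hasPrefix(prefix) {
            return (source, String(label.dropFirst(prefix.count)))
        }
    }
    return (nil, label)
}

/// Resolves a stored label through per-source maps keyed on bare names,
/// then re-applies the source prefix so local and remote speakers stay distinct.
func displayName(for storedLabel: String,
                 reconciled: [SpeakerSource: [String: String]]) -> String {
    let parts = splitSourcePrefix(storedLabel)
    guard let source = parts.source else { return storedLabel }
    // Looking up `storedLabel` itself (the prefixed name) would always miss,
    // because the maps only contain bare "Speaker N" keys.
    let resolved = reconciled[source]?[parts.bare] ?? parts.bare
    return "\(source.rawValue) \(resolved)"
}
```

For example, with a local map of `["Speaker 2": "Speaker 1"]`, the stored label "Local Speaker 2" resolves to "Local Speaker 1". Keeping one map per source means a local and a remote "Speaker 1" are never merged by accident.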
Add a diarization smoothing pass (SpeakerAssignment.smoothDiarization) that collapses sub-threshold diarized turns into their temporally dominant neighbor and merges adjacent same-speaker turns. The pass runs at both consumer call sites, before buildSpeakerMap/assign. This stops spurious fragments shorter than 0.5 s from becoming their own "Speaker N" box and inflating the speaker count. The threshold is wired through Config.minSpeakerTurnDuration (min_speaker_turn_duration) and defaults to 0.5 s.
Fixes #65
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
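Here is a minimal sketch of a smoothing pass of this shape. It assumes turns are non-overlapping, and it reads "temporally dominant neighbor" as the adjacent turn (previous or next) with the longer duration. `Turn` and `smoothTurns` are hypothetical stand-ins, not the actual SpeakerAssignment.smoothDiarization signature.

```swift
/// A diarized turn (hypothetical shape; times in seconds).
struct Turn: Equatable {
    var speaker: String
    var start: Double
    var end: Double
    var duration: Double { end - start }
}

/// Relabels turns shorter than `minDuration` with their longer neighbor's speaker,
/// then merges adjacent turns that now share a speaker.
func smoothTurns(_ input: [Turn], minDuration: Double = 0.5) -> [Turn] {
    let original = input.sorted { $0.start < $1.start }
    guard original.count > 1 else { return original }

    // Pass 1: decisions read from `original`, so one relabel cannot cascade into the next.
    var relabeled = original
    for i in original.indices where original[i].duration < minDuration {
        let prevDuration = i > 0 ? original[i - 1].duration : -1
        let nextDuration = i + 1 < original.count ? original[i + 1].duration : -1
        relabeled[i].speaker = prevDuration >= nextDuration
            ? original[i - 1].speaker
            : original[i + 1].speaker
    }

    // Pass 2: merge consecutive same-speaker turns into one span.
    var merged: [Turn] = []
    for turn in relabeled {
        if let last = merged.last, last.speaker == turn.speaker {
            merged[merged.count - 1].end = max(last.end, turn.end)
        } else {
            merged.append(turn)
        }
    }
    return merged
}
```

With turns A (0 to 4 s), B (4 to 4.3 s), A (4.3 to 6 s), the 0.3 s B fragment is absorbed and the result is a single A turn from 0 to 6 s, so no extra "Speaker N" box appears.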
Add optional clustering-threshold and speaker-count knobs to Config (diarization_clustering_threshold and diarization_min/max/exact_speakers) and thread them into FluidAudioDiarizer via a DiarizationTuning struct. When all fields are nil the diarizer config is unchanged, preserving current behavior.
Fixes #66
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
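The "all-nil means unchanged" guarantee follows if only the fields that are set get overridden. Below is a sketch of that pattern. `DiarizerSettings` stands in for the diarizer's configuration, and its default threshold is a placeholder, not FluidAudio's real value. `TuningOverrides` is a hypothetical stand-in for the PR's DiarizationTuning.

```swift
/// Simplified diarizer configuration (hypothetical; defaults are placeholders).
struct DiarizerSettings: Equatable {
    var clusteringThreshold: Double = 0.7
    var minSpeakers: Int? = nil
    var maxSpeakers: Int? = nil
    var exactSpeakers: Int? = nil
}

/// Optional knobs read from Config; nil means "leave the diarizer default alone".
struct TuningOverrides {
    var clusteringThreshold: Double? = nil
    var minSpeakers: Int? = nil
    var maxSpeakers: Int? = nil
    var exactSpeakers: Int? = nil

    /// Returns `base` with only the non-nil overrides applied.
    func apply(to base: DiarizerSettings) -> DiarizerSettings {
        var settings = base
        if let t = clusteringThreshold { settings.clusteringThreshold = t }
        if let n = minSpeakers { settings.minSpeakers = n }
        if let n = maxSpeakers { settings.maxSpeakers = n }
        if let n = exactSpeakers { settings.exactSpeakers = n }
        return settings
    }
}
```

Because an empty `TuningOverrides()` returns the base settings untouched, existing configs without the new keys behave exactly as before.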
Add a Speakers control to the session-name dialog with three modes: Auto (default), At least N, and Exactly N. Auto imposes no constraint; "At least" maps to the SDK's minSpeakers floor (it never caps); "Exactly" forces a hard count and is shown with a warning that extra speakers will be merged. Introduces SpeakerSelection plus DiarizationTuning.applying(_:) in Core and threads the selection from the dialog through MenuView into the runner's refreshDiarizer. Auto behaves identically to before, and the diarizerOverridden mock-injection guard is untouched.
Fixes #67
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
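The three dialog modes map naturally onto an enum. This sketch assumes the tuning struct carries optional minSpeakers/maxSpeakers/exactSpeakers fields. The case names, `SpeakerCountTuning`, and the body of `applying(_:)` are illustrative guesses, not the PR's code.

```swift
/// Hypothetical mirror of the dialog's Speakers control.
enum SpeakerSelection: Equatable {
    case auto
    case atLeast(Int)
    case exactly(Int)
}

/// Hypothetical subset of the tuning knobs that concern speaker counts.
struct SpeakerCountTuning: Equatable {
    var minSpeakers: Int? = nil
    var maxSpeakers: Int? = nil
    var exactSpeakers: Int? = nil

    /// Layers the dialog selection on top of any Config-derived values.
    func applying(_ selection: SpeakerSelection) -> SpeakerCountTuning {
        var copy = self
        switch selection {
        case .auto:
            break  // no constraint: keep whatever Config supplied
        case .atLeast(let n):
            copy.minSpeakers = n  // floor only; this sketch never sets maxSpeakers
        case .exactly(let n):
            copy.exactSpeakers = n  // hard count; surplus speakers get merged
        }
        return copy
    }
}
```

Making `.auto` a no-op is what keeps the default path identical to the pre-dialog behavior.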
… configurable
Add optional `reconciliation_threshold` (default 0.65) and `reconciliation_ema_alpha` (default 0.9) Config fields and thread them through SpeakerLabeler.label and SpeakerReconciler.reconcile. The previously hardcoded EMA alpha is now a parameter. When the Config fields are nil, the defaults preserve current behavior exactly.
Fixes #69
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
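To show what the two knobs control, here is a sketch of greedy centroid matching. Only the parameter names and the defaults (0.65, 0.9) come from the PR. The rest is assumed: cosine similarity between embeddings, and an EMA update of the form `centroid = alpha * centroid + (1 - alpha) * embedding`. `reconcileSpeaker` is a hypothetical helper, not SpeakerReconciler's API.

```swift
/// Cosine similarity of two equal-length embeddings (0 if either is all zeros).
func cosineSimilarity(_ a: [Double], _ b: [Double]) -> Double {
    precondition(a.count == b.count, "embedding dimensions must match")
    var dot = 0.0, normA = 0.0, normB = 0.0
    for (x, y) in zip(a, b) {
        dot += x * y
        normA += x * x
        normB += y * y
    }
    guard normA > 0, normB > 0 else { return 0 }
    return dot / (normA.squareRoot() * normB.squareRoot())
}

/// Assigns `embedding` to the most similar global centroid if it clears `threshold`,
/// otherwise registers a new global speaker. Returns the global speaker index.
func reconcileSpeaker(_ embedding: [Double],
                      centroids: inout [[Double]],
                      threshold: Double = 0.65,
                      emaAlpha: Double = 0.9) -> Int {
    var bestIndex: Int? = nil
    var bestScore = -Double.infinity
    for (i, centroid) in centroids.enumerated() {
        let score = cosineSimilarity(embedding, centroid)
        if score > bestScore {
            bestScore = score
            bestIndex = i
        }
    }
    if let i = bestIndex, bestScore >= threshold {
        // Assumed convention: alpha weights the existing centroid, so it drifts slowly.
        centroids[i] = zip(centroids[i], embedding).map { emaAlpha * $0 + (1 - emaAlpha) * $1 }
        return i
    }
    centroids.append(embedding)
    return centroids.count - 1
}
```

A higher threshold splits more speakers apart. Under this convention, a higher alpha makes each global identity less sensitive to any single chunk's embedding.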
Multi-segment recordings recovered after a crash were transcribed and diarized independently per segment, so "Speaker 1" in one segment was unrelated to "Speaker 1" in the next. Apply SpeakerLabeler cross-segment reconciliation (the #64 chunked-path fix) in run() when there is more than one recovery segment. Single-segment recordings (the common path, including the CLI transcribe command) keep their inline labels unchanged. Timing is preserved by using one shared meetingStart as every chunk's startTime, so the reconciler's chunk offset is 0 and segment times pass through untouched.
Fixes #70
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
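The timing argument reduces to simple arithmetic. This sketch assumes the reconciler shifts each segment time by `chunk.startTime - meetingStart`. `RecoveryChunk` and `meetingRelativeTimes` are hypothetical names.

```swift
import Foundation

/// Hypothetical recovered chunk: a start date plus segment times in seconds.
struct RecoveryChunk {
    var startTime: Date
    var segmentTimes: [TimeInterval]
}

/// Shifts segment times by the chunk's offset from the meeting start.
func meetingRelativeTimes(_ chunk: RecoveryChunk, meetingStart: Date) -> [TimeInterval] {
    let offset = chunk.startTime.timeIntervalSince(meetingStart)
    return chunk.segmentTimes.map { $0 + offset }
}
```

If every recovery chunk is given `startTime == meetingStart`, `offset` is exactly 0. Segment times therefore come back unchanged, and only the speaker labels are affected by reconciliation.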
Diarization and speaker-tagging consistency work for epic #71. It targets two user-reported symptoms: too many speaker boxes, and speaker identities that don't stay consistent across chunks/sections.
Stacked on fix/v0.7.x-review-followups (#63) so the diff shows only diarization changes; retarget to main once #63 merges.
Landed (all of v0.7.x scope)
- … Local/Remote Speaker N labels via new SpeakerLabeler. +5 tests.
- … min_speaker_turn_duration. +8 tests.
- … Config (diarization_clustering_threshold, diarization_min/max/exact_speakers); all-nil preserves behavior. +5 tests.
- … reconciliation_threshold / reconciliation_ema_alpha). +7 tests.
- … run() path now reconciles speakers across recovery segments (reuses SpeakerLabeler); gated to multi-segment so the single-file/CLI path is byte-for-byte unchanged; timing preserved. +1 test.
Deferred
Test plan
- swift test --filter TranscriberTests green (492/492)
- swift build clean
🤖 Generated with Claude Code