feat(diarization): cross-chunk + recovery speaker consistency (epic #71) by fmasi · Pull Request #72 · fmasi/parley

fmasi · 2026-05-20T13:29:44Z

Diarization & speaker-tagging consistency work — epic #71. Targets two user-reported symptoms: too many speaker boxes, and speaker identities not staying consistent across chunks/sections.

Stacked on fix/v0.7.x-review-followups (#63) so the diff shows only diarization changes; retarget to main once #63 merges.

Landed (all of v0.7.x scope)

[Tier 1] Cross-chunk speaker reconciliation is never applied (key mismatch + missing local embeddings) #64 — cross-chunk reconciliation now actually applies (was silently discarded: prefixed-vs-bare key mismatch). Stores per-chunk local embeddings, reconciles local/remote in separate pools, assigns deduplicated Local/Remote Speaker N labels via new SpeakerLabeler. +5 tests.
[Tier 2] Merge tiny/adjacent same-speaker diarization fragments before labeling #65 — sub-0.5s diarization fragments absorbed into dominant neighbor before labeling. Configurable min_speaker_turn_duration. +8 tests.
[Tier 2] Expose FluidAudio diarizer tuning knobs (clustering threshold / min duration / numSpeakers hint) #66 — FluidAudio diarizer tuning via Config (diarization_clustering_threshold, diarization_min/max/exact_speakers); all-nil preserves behavior. +5 tests.
[Tier 2] Optional expected-speaker-count hint in session dialog #67 — per-recording Speakers dialog control: Auto (default) / "At least N" / "Exactly N" (⚠ warned). +7 tests.
[Tier 4] Make reconciliation cosine threshold and EMA alpha configurable #69 — reconciliation cosine threshold + EMA alpha configurable (reconciliation_threshold/reconciliation_ema_alpha). +7 tests.
[Tier 4] Apply cross-speaker reconciliation to single-file / crash-recovery run() path #70 — crash-recovery run() path now reconciles speakers across recovery segments (reuses SpeakerLabeler); gated to multi-segment so the single-file/CLI path is byte-for-byte unchanged; timing preserved. +1 test.

Deferred

[Tier 3] Cross-session speaker profiles + persistent renames #68 — cross-session speaker profiles + persistent renames → v0.8 (new persistent biometric store + rename-flow embedding capture + privacy posture).

Test plan

swift test --filter TranscriberTests green (492/492), swift build clean
Manual (human): multi-chunk dual-stream recording → stable, consistent speaker boxes across sections
Manual (human): session-dialog Speakers control renders; "Exactly N" warns; "At least N" still admits extra speakers
Manual (human, optional): a crash-recovered (multi-segment) recording shows consistent speakers across segments

🤖 Generated with Claude Code

Reconciliation was computed then silently discarded: ChunkProcessor tags segment speakers with a source prefix ("Local Speaker 1") before storing, but the finalize() lookup keyed on those prefixed names against a bare-keyed reconciliation map, always missing. Local mic embeddings were also never persisted, so local speakers could never be reconciled.

- ChunkSession: add backward-compatible local_speaker_database to ProcessedChunk
- ChunkProcessor: persist mic pool as localSpeakerDatabase
- SpeakerReconciler: extract reconcile(databases:) core; keep reconcile(chunks:) as a thin wrapper over the remote pool (existing behavior unchanged)
- SpeakerLabeler (new Core helper): reconcile local/remote pools separately, strip source prefixes for lookup, and assign readable per-source display names
- TranscriptionRunner.finalize(): use SpeakerLabeler; drop redundant tagging

Fixes #64

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
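The key mismatch behind #64 can be illustrated with a minimal sketch. It is not the actual SpeakerLabeler: `reconciledBySource`, `splitSourcePrefix`, and `displayName(for:)` are hypothetical names, and the shape of the reconciliation map is an assumption made for illustration.

```swift
// Hypothetical per-source reconciliation maps, keyed on bare diarizer names
// (chunk-level name -> session-level name). The real map's shape is an assumption.
let reconciledBySource: [String: [String: String]] = [
    "Local": ["Speaker 1": "Speaker 1", "Speaker 3": "Speaker 1"],
    "Remote": ["Speaker 1": "Speaker 1", "Speaker 2": "Speaker 2"],
]

// Hypothetical helper: split "Local Speaker 3" into ("Local", "Speaker 3").
func splitSourcePrefix(_ name: String) -> (source: String, bare: String)? {
    for source in ["Local", "Remote"] where name.hasPrefix(source + " ") {
        return (source, String(name.dropFirst(source.count + 1)))
    }
    return nil
}

// The bug: a prefixed name looked up in a bare-keyed map never matches.
let buggyLookup = reconciledBySource["Local"]?["Local Speaker 3"]

// The fix: strip the prefix, look up in that source's pool, re-apply the prefix.
func displayName(for stored: String) -> String {
    guard let parts = splitSourcePrefix(stored) else { return stored }
    let resolved = reconciledBySource[parts.source]?[parts.bare] ?? parts.bare
    return "\(parts.source) \(resolved)"
}
```

Here `buggyLookup` is nil, while `displayName(for: "Local Speaker 3")` resolves through the local pool. Keeping one map per source mirrors the "separate pools" wording of the commit, so a local and a remote "Speaker 1" are never merged by accident.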

Add a diarization smoothing pass (SpeakerAssignment.smoothDiarization) that collapses sub-threshold diarized turns into their temporally-dominant neighbor and merges adjacent same-speaker turns, run at both consumer call sites before buildSpeakerMap/assign. This stops spurious <0.5s fragments from becoming their own "Speaker N" box and inflating the speaker count. Threshold is wired through Config.minSpeakerTurnDuration (min_speaker_turn_duration), defaulting to 0.5s.

Fixes #65

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
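A rough sketch of such a smoothing pass follows. It is not the real SpeakerAssignment.smoothDiarization: the `Turn` type, `mergeAdjacent`, and `smooth` are hypothetical, and "temporally-dominant neighbor" is interpreted here as the longer of the two adjacent turns, which is an assumption.

```swift
// Hypothetical diarized turn; the real segment type is not shown in this PR.
struct Turn: Equatable {
    var speaker: String
    var start: Double
    var end: Double
    var duration: Double { end - start }
}

// Merge consecutive turns that share a speaker into a single turn.
func mergeAdjacent(_ turns: [Turn]) -> [Turn] {
    var merged: [Turn] = []
    for turn in turns {
        if let last = merged.last, last.speaker == turn.speaker {
            merged[merged.count - 1].end = max(last.end, turn.end)
        } else {
            merged.append(turn)
        }
    }
    return merged
}

// Relabel turns shorter than minDuration to their longer neighbor, then merge.
func smooth(_ turns: [Turn], minDuration: Double = 0.5) -> [Turn] {
    var result = mergeAdjacent(turns.sorted { $0.start < $1.start })
    for i in result.indices where result[i].duration < minDuration {
        var neighbors: [Turn] = []
        if i > 0 { neighbors.append(result[i - 1]) }
        if i + 1 < result.count { neighbors.append(result[i + 1]) }
        // Assumption: the "dominant" neighbor is simply the longer one.
        if let dominant = neighbors.max(by: { $0.duration < $1.duration }) {
            result[i].speaker = dominant.speaker
        }
    }
    return mergeAdjacent(result)
}

let raw = [
    Turn(speaker: "A", start: 0.0, end: 3.0),
    Turn(speaker: "B", start: 3.0, end: 3.2),  // spurious 0.2s fragment
    Turn(speaker: "A", start: 3.2, end: 6.0),
    Turn(speaker: "C", start: 6.0, end: 8.0),
]
let smoothed = smooth(raw)
```

In this example the 0.2s "B" fragment is absorbed into the surrounding "A" turns, so the four input turns collapse to two and no extra speaker box would be created for "B".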

Add optional clustering-threshold and speaker-count knobs to Config (diarization_clustering_threshold / min / max / exact speakers) and thread them into FluidAudioDiarizer via a DiarizationTuning struct. When all fields are nil the diarizer config is unchanged, preserving current behavior.

Fixes #66

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
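The "all-nil preserves behavior" contract can be sketched as below. Only the DiarizationTuning name comes from the PR; `SketchDiarizerConfig` is a hypothetical stand-in for the FluidAudio configuration, and its field names and the 0.7 placeholder default are assumptions, not values from the SDK.

```swift
// Hypothetical stand-in for the diarizer configuration. Field names and the
// 0.7 default are placeholders, not taken from FluidAudio.
struct SketchDiarizerConfig: Equatable {
    var clusteringThreshold: Double = 0.7
    var minSpeakers: Int? = nil
    var maxSpeakers: Int? = nil
    var exactSpeakers: Int? = nil
}

// Optional overrides; nil means "leave the diarizer's own value alone".
struct DiarizationTuning {
    var clusteringThreshold: Double?
    var minSpeakers: Int?
    var maxSpeakers: Int?
    var exactSpeakers: Int?

    // Copy the base config and override only the fields that were set.
    func apply(to base: SketchDiarizerConfig) -> SketchDiarizerConfig {
        var config = base
        if let value = clusteringThreshold { config.clusteringThreshold = value }
        if let value = minSpeakers { config.minSpeakers = value }
        if let value = maxSpeakers { config.maxSpeakers = value }
        if let value = exactSpeakers { config.exactSpeakers = value }
        return config
    }
}

let base = SketchDiarizerConfig()
let untouched = DiarizationTuning().apply(to: base)
let tuned = DiarizationTuning(clusteringThreshold: 0.6, maxSpeakers: 4).apply(to: base)
```

With every field nil, `untouched` equals `base`, which is the property the commit relies on to keep existing behavior unchanged.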

Add a Speakers control to the session-name dialog with Auto (default), At least N, and Exactly N modes. Auto imposes no constraint; "At least" maps to the SDK minSpeakers floor (never caps); "Exactly" forces a hard count and is shown with a warning that extra speakers will be merged. Introduces SpeakerSelection plus DiarizationTuning.applying(_:) in Core and threads the selection from the dialog through MenuView into the runner's refreshDiarizer. Auto is behavior-identical to before and the diarizerOverridden mock-injection guard is untouched.

Fixes #67

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
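One way the three dialog modes could map onto the tuning struct is sketched here. SpeakerSelection and DiarizationTuning.applying(_:) are named in the commit, but the case names, the stored fields, and the exact mapping are assumptions.

```swift
// Assumed shape of the per-recording selection; case names are hypothetical.
enum SpeakerSelection: Equatable {
    case auto
    case atLeast(Int)
    case exactly(Int)
}

// Reduced version of the tuning struct with only the speaker-count fields.
struct DiarizationTuning: Equatable {
    var minSpeakers: Int? = nil
    var exactSpeakers: Int? = nil

    // Return a copy with the dialog selection layered on top of the config.
    func applying(_ selection: SpeakerSelection) -> DiarizationTuning {
        var tuning = self
        switch selection {
        case .auto:
            break                      // no constraint: identical to before
        case .atLeast(let count):
            tuning.minSpeakers = count // floor only, never a cap
        case .exactly(let count):
            tuning.exactSpeakers = count // hard count; extra speakers get merged
        }
        return tuning
    }
}

let fromConfig = DiarizationTuning()
let auto = fromConfig.applying(.auto)
let atLeastThree = fromConfig.applying(.atLeast(3))
let exactlyTwo = fromConfig.applying(.exactly(2))
```

Returning a modified copy rather than mutating in place keeps the Config-derived tuning reusable across recordings, since the selection is per recording.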

… configurable

Add optional `reconciliation_threshold` (default 0.65) and `reconciliation_ema_alpha` (default 0.9) Config fields and thread them through SpeakerLabeler.label and SpeakerReconciler.reconcile. The previously hardcoded EMA alpha is now a parameter. Defaults preserve exact current behavior when the Config fields are nil.

Fixes #69

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
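How a cosine threshold and an EMA alpha typically interact in this kind of reconciliation is sketched below. This is not the SpeakerReconciler implementation: `cosineSimilarity` and `reconcile(embedding:into:threshold:emaAlpha:)` are hypothetical, and the EMA direction (alpha weighting the existing centroid) is an assumption. Only the defaults 0.65 and 0.9 come from the commit.

```swift
// Cosine similarity between two equal-length embeddings.
func cosineSimilarity(_ a: [Double], _ b: [Double]) -> Double {
    precondition(a.count == b.count, "embeddings must have the same dimension")
    let dot = zip(a, b).reduce(0.0) { sum, pair in sum + pair.0 * pair.1 }
    let normA = a.reduce(0.0) { $0 + $1 * $1 }.squareRoot()
    let normB = b.reduce(0.0) { $0 + $1 * $1 }.squareRoot()
    guard normA > 0, normB > 0 else { return 0 }
    return dot / (normA * normB)
}

// Match an embedding against known centroids. On a match, blend it into the
// centroid with an exponential moving average; otherwise register a new speaker.
// Returns the index of the matched or newly created centroid.
func reconcile(
    embedding: [Double],
    into centroids: inout [[Double]],
    threshold: Double = 0.65,
    emaAlpha: Double = 0.9
) -> Int {
    var bestIndex: Int? = nil
    var bestScore = threshold
    for (index, centroid) in centroids.enumerated() {
        let score = cosineSimilarity(embedding, centroid)
        if score >= bestScore {
            bestIndex = index
            bestScore = score
        }
    }
    guard let index = bestIndex else {
        centroids.append(embedding)
        return centroids.count - 1
    }
    // Assumption: alpha weights the existing centroid, (1 - alpha) the new sample.
    centroids[index] = zip(centroids[index], embedding).map { old, new in
        emaAlpha * old + (1 - emaAlpha) * new
    }
    return index
}

var centroids: [[Double]] = [[1.0, 0.0]]
let sameSpeaker = reconcile(embedding: [0.9, 0.1], into: &centroids)
let newSpeaker = reconcile(embedding: [0.0, 1.0], into: &centroids)
```

Under these assumptions a higher threshold splits speakers more readily, while a higher alpha makes each centroid drift more slowly as new chunks arrive.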

Multi-segment recordings recovered after a crash were transcribed and diarized independently per segment, so "Speaker 1" in one segment was unrelated to the next. Apply SpeakerLabeler cross-segment reconciliation (the #64 chunked-path fix) in run() when there is more than one recovery segment. Single-segment recordings (the common path, incl. the CLI transcribe command) keep their inline labels unchanged. Timing is preserved by sharing one meetingStart as every chunk's startTime, so the reconciler's chunk offset is 0 and segment times pass through untouched.

Fixes #70

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

fmasi and others added 6 commits May 20, 2026 14:28

fmasi changed the title ~~feat(diarization): cross-chunk speaker consistency (epic #71)~~ feat(diarization): cross-chunk + recovery speaker consistency (epic #71) May 21, 2026

fmasi wants to merge 6 commits into fix/v0.7.x-review-followups from feature/diarization-consistency
