Fix microphone capture: device-true source format + fragment-aware clip reading by MaxHeimbrock · Pull Request #308 · livekit/client-sdk-unity

MaxHeimbrock · 2026-06-12T14:17:20Z

Consolidated, cleaned-up version of the microphone capture fix (supersedes #303, #304, #305, #306 — see History).

Problem

Publishing the microphone with a Bluetooth HFP headset on macOS produced "sample_rate and num_channels don't match" errors from the native source and, beyond the error, persistently choppy or garbled audio on receivers. The same headset works fine from Android.

Root causes (both proven empirically)

1. Hardcoded native source format. The native (Rust) audio source was created at a fixed 48000 Hz / 2ch, while captured frames arrive at whatever Unity actually delivers. The Rust source rejects mismatched frames; it does not resample.

2. Fragmented mic clip buffer on macOS + BT-HFP. A raw WAV dump of the clip's ring buffer (analyzed offline) showed: FMOD writes each real 20 ms packet of clip.frequency audio, then advances Microphone.GetPosition as if it had written k ≈ 3.2× as much, zero-filling the gap. The buffer holds valid fragments of exactly 320 samples at a stride of exactly 1024 (320/1024 = 1/k), and the fragments join continuously — the stream is intact, just scattered with zero padding. Hence:

- any playback-based capture (AudioSource + OnAudioFilterRead) plays 31% voice + 69% padding → chop
- reading at GetPosition's pace replays fragments + padding too fast over a live buffer → garbled noise
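The figures quoted above are mutually consistent, which a few lines of arithmetic can confirm. This is only a check of the numbers in the description (320 valid samples, stride 1024, clip labelled 16 kHz); nothing here is measured, and the variable names are illustrative.

```python
# Figures quoted in the description; this snippet only checks their arithmetic.
clip_frequency = 16_000   # Hz, the rate the clip is labelled with
valid = 320               # valid samples per fragment
stride = 1024             # counter jump per fragment (valid audio + zero padding)

k = stride / valid                         # counter inflation factor
packet_ms = 1000 * valid / clip_frequency  # duration of one real packet
counter_rate = k * clip_frequency          # apparent advance of GetPosition, samples/s
voice_share = valid / stride               # fraction of the buffer holding real audio

print(f"k = {k:.2f}")
print(f"packet = {packet_ms:.0f} ms")
print(f"counter rate = {counter_rate:.0f} samples/s")
print(f"voice = {voice_share:.0%}, padding = {1 - voice_share:.0%}")
```

The result matches the 20 ms packet, the k ≈ 3.2 inflation, the roughly 51 k/s counter rate, and the 31% / 69% split mentioned in this description.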

Change

- RtcAudioSource: two constructors, device-mode (format resolved from Unity's output configuration) and explicit-format (type, sampleRate, channels) for sources that know their exact rate. Frames that still mismatch the configured format are dropped with a throttled warning instead of erroring natively.
- MicrophoneSource: reads the clip ring buffer directly (no AudioSource and no OnAudioFilterRead, which also decouples capture from the output device's clock for all devices). A ~0.3 s pre-roll measures k = counterRate / clip.frequency and the counter's smallest discrete jump (the stride J):
  - k ≈ 1 (healthy devices): plain contiguous read at the counter's pace.
  - k > 1.05 (fragmented state): read only the first J/k samples of each stride, which are exactly the valid fragments.
  - Downmix → mono, streaming-resample clip.frequency → fixed 48 kHz native source (preserves the publish-before-start contract). Stall backlog beyond 200 ms is dropped stride-aligned so the native queue can't overrun.
- BasicAudioSource uses device-mode (dropping its unused channels parameter, a minor source-breaking change); the test SineWaveAudioSource declares its exact format; the Meet sample drops a redundant Microphone.Start.
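The pre-roll measurement and the fragment-aware read can be sketched in a few lines. This is a language-neutral Python illustration, not the C# in this PR: `measure_preroll`, `valid_ranges`, the millisecond timestamps, and the assumption that fragments start on stride boundaries from counter 0 are all hypothetical choices made for the example.

```python
from typing import List, Sequence, Tuple

FRAGMENTED_THRESHOLD = 1.05  # threshold quoted in the description above


def measure_preroll(observations: Sequence[Tuple[int, int]],
                    clip_frequency: int) -> Tuple[float, int]:
    """observations: (time_ms, unwrapped counter) pairs, oldest first.

    Returns (k, stride): the counter inflation and its smallest discrete jump.
    """
    (t0, p0), (t1, p1) = observations[0], observations[-1]
    counter_rate = (p1 - p0) * 1000 / (t1 - t0)          # samples per second
    jumps = [b - a for (_, a), (_, b) in zip(observations, observations[1:]) if b > a]
    return counter_rate / clip_frequency, min(jumps)


def valid_ranges(start: int, end: int, k: float, stride: int,
                 ring_len: int) -> List[Tuple[int, int]]:
    """(offset, length) reads covering the real audio between two counter values."""
    if k <= FRAGMENTED_THRESHOLD:
        spans = [(start, end - start)] if end > start else []
    else:
        valid = round(stride / k)                        # first J/k samples of each stride
        spans = [(pos, valid) for pos in range(start, end - stride + 1, stride)]
    reads = []
    for pos, length in spans:
        offset = pos % ring_len
        first = min(length, ring_len - offset)
        reads.append((offset, first))
        if first < length:                               # ring wrap: split into two reads
            reads.append((0, length - first))
    return reads


# Synthetic buffer with the structure from the dump: 320 valid samples per 1024 stride.
ring_len = 16 * 1024
source = [float(i + 1) for i in range(16 * 320)]         # non-zero ramp as stand-in audio
ring = [0.0] * ring_len
for i in range(16):
    ring[i * 1024:i * 1024 + 320] = source[i * 320:(i + 1) * 320]

# The counter jumps by one stride every 20 ms during a 0.3 s pre-roll.
k, stride = measure_preroll([(i * 20, i * 1024) for i in range(16)], 16_000)
rebuilt = [s for off, n in valid_ranges(0, ring_len, k, stride, ring_len)
           for s in ring[off:off + n]]
print(f"k={k:.2f}, stride={stride}, lossless={rebuilt == source}")
```

With k ≈ 1 the same function returns one contiguous range (split in two at a ring wrap), so healthy devices take the plain path.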

Log signature in the bad state:

MicrophoneSource: fragmented clip detected (k=3.20); reading 320 of every 1024 samples at 16000Hz

Healthy devices log contiguous capture (k=1.00).

Verification

End-to-end, on hardware: macOS publisher with the Bluetooth headset mic → Android receiver now sounds clean, correct-pitch, and continuous (previously chopped/garbled). Reconstruction was first validated offline by dumping the buffer, concatenating fragments, and listening.
Runtime, PlayModeTests, and Meet Assembly-CSharp compile clean.
Recommended before merge: a PlayMode E2E run against a dev server and a quick healthy-mic check (expect k≈1.0, contiguous path).

Notes

Microphone.GetPosition's counter is packet-granular and can be rate-inflated on macOS; this PR never trusts it directly and uses only its measured average and jump size. The dump utility used for the diagnosis is in PR #307, "Add AudioClipDump debugging utility (dump audio buffers to WAV)".
This is arguably a Unity bug worth reporting upstream (clip labeled 16 kHz, position counter at ~51 k/s, zero-padded fragment writes).
Platform Audio (native ADM capture) remains the preferred path where applications can use it; the k measurement doubles as a detector for surfacing degraded-mic states in the future.

History

The investigation went through several falsified designs, preserved on their branches: #303 (recreate + republish on mismatch), #304 (device-config init + output-rate mic open), #305 (naive direct polling — three experiments), #306 (pitch servos, then the working fragment-aware capture that this PR consolidates). The buffer dump was the decisive step.

🤖 Generated with Claude Code

…ip reading

Publishing the microphone with a Bluetooth HFP headset on macOS produced "sample_rate and num_channels don't match" errors from the native source and, beyond that, persistently choppy or garbled audio on receivers.

Two root causes, both fixed here:

1. The native (Rust) audio source was created with a hardcoded format (48000Hz/2ch) while captured frames arrive at whatever format the device actually delivers. The native source rejects mismatched frames (it does not resample). RtcAudioSource now has two constructors: a device-mode one that resolves the format from Unity's output configuration, and an explicit-format one for sources that know their exact rate/channels. Frames that still mismatch are dropped with a throttled warning instead of erroring natively.

2. On macOS with a Bluetooth HFP headset, Unity's Microphone clip buffer is fragmented: FMOD writes each real 20ms packet of clip.frequency audio, then advances Microphone.GetPosition as if it had written ~3.2x as much, zero-filling the skipped range. A raw buffer dump showed valid fragments of exactly 320 samples at a stride of exactly 1024 (320/1024 = 1/k, where k is the counter inflation), with the fragments joining continuously - the stream is intact, just scattered. Every playback-based capture strategy therefore chops (31% voice, 69% padding) and counter-paced reading garbles.

MicrophoneSource now reads the clip ring buffer directly (no AudioSource, no OnAudioFilterRead - which also decouples capture from the output device's clock). A short pre-roll measures the counter rate (k = counterRate / clip.frequency) and the counter's smallest discrete jump (the stride). Healthy devices (k ~ 1) use a plain contiguous read; fragmented devices (k > 1.05) read only the first stride/k samples of each stride - exactly the valid fragments. Captured audio is downmixed to mono and resampled from clip.frequency to a fixed 48kHz native source, preserving the publish-before-start contract. Backlog beyond 200ms after a stall is dropped, stride-aligned, to avoid overrunning the native queue.

Also removes the redundant Microphone.Start in the Meet sample and lets the test sine source declare its exact format explicitly.

Verified end-to-end: macOS publisher with the Bluetooth headset microphone to an Android receiver now sounds clean and correct-pitch; healthy microphones take the contiguous path unchanged.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
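The "dropped with a throttled warning" behaviour for mismatched frames can be illustrated as follows. This is a hypothetical Python sketch, not the RtcAudioSource code: the `FormatGate` class, the 5-second warning interval, and the injected clock are assumptions made for the example.

```python
import time
from typing import Callable, List, Optional


class FormatGate:
    """Drops frames whose format differs from the configured one,
    recording at most one warning per interval."""

    def __init__(self, sample_rate: int, channels: int,
                 warn_interval_s: float = 5.0,           # assumed value, not from the PR
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.sample_rate = sample_rate
        self.channels = channels
        self.warn_interval_s = warn_interval_s
        self.clock = clock
        self.dropped = 0
        self.warnings: List[str] = []
        self._last_warn: Optional[float] = None

    def accept(self, frame_rate: int, frame_channels: int) -> bool:
        """True if the frame may be pushed to the native source."""
        if (frame_rate, frame_channels) == (self.sample_rate, self.channels):
            return True
        self.dropped += 1
        now = self.clock()
        if self._last_warn is None or now - self._last_warn >= self.warn_interval_s:
            self._last_warn = now
            self.warnings.append(
                f"dropping {frame_rate}Hz/{frame_channels}ch frame; "
                f"source expects {self.sample_rate}Hz/{self.channels}ch"
            )
        return False


# 100 mismatched 20 ms frames (about 2 s of audio) driven by a fake clock.
fake_now = [0.0]
gate = FormatGate(48_000, 1, clock=lambda: fake_now[0])
for i in range(100):
    fake_now[0] = i * 0.02
    gate.accept(16_000, 1)
print(gate.dropped, len(gate.warnings))
```

Every mismatched frame is dropped, but only the first one inside the interval produces a warning, so a persistently wrong format cannot flood the log.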

The fragment-aware capture logic is subtle and was painful to diagnose, but most of it is pure logic that doesn't need a microphone. Extract it from MicrophoneSource into two UnityEngine-free internal classes:

- MicClipReader: pre-roll measurement (counter rate k, smallest jump = stride), contiguous vs fragmented mode selection, per-stride valid-range emission, ring-wrap splitting, and stride-aligned backlog dropping.
- StreamingResampler: the streaming linear resampler (state carries across chunks so fragment junctions stay continuous).

MicrophoneSource.CaptureLoop becomes a thin Unity shell: poll GetPosition, feed the reader, GetData the emitted ranges, downmix, resample, push. Behavior is unchanged.

Add EditMode tests covering:

- healthy contiguous capture (k~1, every sample emitted)
- fragmented detection (k=3.2, stride 1024, valid 320 - the exact structure dumped from the Sony MDR-1000X on macOS)
- lossless reconstruction of a synthetic fragmented buffer across multiple ring laps (strictly sequential output, no gaps/repeats/padding)
- stride-aligned backlog drops bounded by the limit
- pre-roll emitting nothing
- resampler frequency/length preservation
- chunked-equals-whole resampling (1-sample tail tolerance for float boundary rounding)

Logic verified by executing all test scenarios in a standalone harness (mono) in addition to compiling the Unity assemblies.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
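The chunked-equals-whole property follows from carrying the last input sample and the read position across calls. Below is a minimal Python sketch of such a streaming linear resampler; it is not the StreamingResampler from this commit. The class name and the integer position bookkeeping are assumptions for the example, chosen so that chunked and whole outputs match exactly rather than within a tail tolerance.

```python
import math
from typing import List, Sequence


class StreamingLinearResampler:
    """Linear-interpolation resampler whose state survives chunk boundaries."""

    def __init__(self, src_rate: int, dst_rate: int) -> None:
        self.src_rate = src_rate
        self.dst_rate = dst_rate
        self.pos = 0                 # read position, in units of 1/dst_rate input samples
        self.tail: List[float] = []  # last input sample of the previous chunk

    def process(self, chunk: Sequence[float]) -> List[float]:
        data = self.tail + list(chunk)
        if not data:
            return []
        out: List[float] = []
        last = len(data) - 1
        # Emit while both neighbours (i and i + 1) of the read position are available.
        while self.pos < last * self.dst_rate:
            i, rem = divmod(self.pos, self.dst_rate)
            frac = rem / self.dst_rate
            out.append(data[i] + (data[i + 1] - data[i]) * frac)
            self.pos += self.src_rate
        # Carry the final sample so the next chunk interpolates across the junction.
        self.tail = data[-1:]
        self.pos -= last * self.dst_rate
        return out


# 100 ms of a 440 Hz tone at 16 kHz, resampled to 48 kHz whole and in 20 ms chunks.
signal = [math.sin(2 * math.pi * 440 * n / 16_000) for n in range(1600)]
whole = StreamingLinearResampler(16_000, 48_000).process(signal)

chunked_resampler = StreamingLinearResampler(16_000, 48_000)
chunked: List[float] = []
for start in range(0, len(signal), 320):
    chunked.extend(chunked_resampler.process(signal[start:start + 320]))

print(len(whole), chunked == whole)
```

Because the position is tracked as an integer numerator over `dst_rate`, both runs evaluate the same interpolation at the same input indices. A float position, as the 1-sample tolerance in the tests suggests, trades that exactness for simpler arithmetic.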

MaxHeimbrock · 2026-06-12T14:39:30Z

Added test coverage for the capture logic (35031cc): the fragment reconstruction and resampling logic is now extracted into UnityEngine-free internal classes (MicClipReader, StreamingResampler) with EditMode tests — including a replay of the exact fragmented structure dumped from the Sony MDR-1000X (320 valid / 1024 stride, k=3.2) asserting lossless, strictly-sequential reconstruction across multiple ring laps, plus contiguous-mode, backlog-drop, and resampler continuity tests. All scenarios were additionally executed in a standalone harness to verify behavior, not just compilation. CaptureLoop is now a thin Unity shell; behavior unchanged.

MicrophoneSource no longer attaches an AudioSource to its GameObject (it reads the mic clip directly), but the Meet sample still called GetComponent<AudioSource>()?.Stop() on unpublish. The ?. operator bypasses Unity's overloaded null-check on the editor's missing-component stub, so Stop() ran on the stub and threw MissingComponentException. Remove the obsolete call. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…eanup

Field testing device transitions surfaced a false positive: right after recovering onto the healthy MacBook microphone, the pre-roll measured k=1.07 (counter startup burst while driver buffers flush) which crossed the old 1.05 threshold and engaged fragmented mode - silently discarding ~6% of real audio (heard as choppiness) until the next re-measurement.

Engaging fragmented mode discards (stride - valid) samples per stride, so a false positive guarantees audio loss while a false negative only risks mild artifacts. Fix both sides of the measurement:

- Raise the fragmented threshold from 1.05 to 1.5: the observed pathological device measures k=3.2, healthy devices ~1.0 plus a few percent of noise - keep a wide margin between the two.
- Add a 100ms settle window that discards the counter's startup burst before the rate measurement begins.

Add a regression test for the borderline case (k=1.07 must stay contiguous).

Also fix the second AudioSource null-propagation site (CleanUpAllTracks via OnDestroy) with TryGetComponent - same MissingComponentException class as the unpublish path, hit because the local mic object no longer carries an AudioSource.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
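Both sides of this fix can be seen on a synthetic counter trace. The sketch below is Python rather than the C# in this commit; `measure_k`, `select_mode`, and the shape of the startup burst (448 extra samples spread over the first 100 ms) are assumptions constructed to reproduce the k=1.07 reading, not measured data.

```python
from typing import Sequence, Tuple

OLD_THRESHOLD = 1.05
NEW_THRESHOLD = 1.5   # raised threshold from this commit
SETTLE_MS = 100       # settle window from this commit


def measure_k(observations: Sequence[Tuple[int, int]], clip_frequency: int,
              settle_ms: int = SETTLE_MS) -> float:
    """observations: (time_ms, unwrapped counter) pairs, oldest first."""
    start = observations[0][0]
    settled = [(t, p) for t, p in observations if t - start >= settle_ms]
    (t0, p0), (t1, p1) = settled[0], settled[-1]
    return (p1 - p0) * 1000 / (t1 - t0) / clip_frequency


def select_mode(k: float, threshold: float = NEW_THRESHOLD) -> str:
    return "fragmented" if k > threshold else "contiguous"


# Healthy 16 kHz device, polled every 20 ms, with a synthetic startup burst.
trace = [(t, 16 * t + min(t, 100) * 448 // 100) for t in range(0, 401, 20)]

k_raw = measure_k(trace, 16_000, settle_ms=0)   # burst included in the measurement
k_settled = measure_k(trace, 16_000)            # burst discarded by the settle window

print(f"raw k={k_raw:.2f}: old={select_mode(k_raw, OLD_THRESHOLD)}, "
      f"new={select_mode(k_raw)}")
print(f"settled k={k_settled:.2f}: {select_mode(k_settled)}")
print(f"audio lost by a false positive at k=1.07: {1 - 1 / 1.07:.1%}")
```

The wider threshold keeps a borderline k=1.07 on the contiguous path, and the settle window removes the burst from the measurement in the first place. The last line shows where the roughly 6% loss comes from: reading only stride/k of each stride discards a fraction 1 - 1/k of the audio.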

MaxHeimbrock and others added 2 commits June 12, 2026 16:16

MaxHeimbrock mentioned this pull request Jun 12, 2026

Recover microphone capture when the device disappears mid-call #309

Open

MaxHeimbrock and others added 2 commits June 12, 2026 17:19

Add Unity meta files for the new scripts

457ce3f

Generated by the editor; required for stable GUIDs when the package is imported. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

MaxHeimbrock wants to merge 5 commits into main from max/mic-fragment-aware-capture
