
Capture microphone by polling the clip directly (no AudioSource/OnAudioFilterRead) #305

Open
MaxHeimbrock wants to merge 2 commits into main from max/mic-direct-clip-poll

Conversation

@MaxHeimbrock

Contributor

Problem

MicrophoneSource captured the mic by playing its looping AudioClip through an AudioSource and tapping the DSP output in AudioProbe.OnAudioFilterRead. That path has two unsynchronized clocks: the mic hardware clock (fills the clip) and the audio-output/DSP clock (plays it back). Nothing synchronizes the write cursor with the playback read cursor, so they drift, and each time they cross there is a gap in the captured audio, heard as periodic choppiness. The old path also resampled the mic to the output rate and ran the capture/encode work on the real-time audio thread.

Change

Read the mic clip's ring buffer directly instead of playing it:

  • A main-thread coroutine polls each frame: it reads the new samples between the last read position and Microphone.GetPosition (splitting at the ring-buffer wrap so each AudioClip.GetData read is contiguous), downmixes to mono if needed, and pushes to the native source. No AudioSource, no AudioProbe, no playback cursor → no drift.
  • Capture runs on the main thread (Unity audio APIs + FFI are main-thread-friendly); the native source's internal queue absorbs the per-frame pacing jitter and re-paces to 10 ms frames for WebRTC.
  • The capture rate is resolved before start from Microphone.GetDeviceCaps (DefaultMicrophoneSampleRate clamped into the device's supported range), and the native source is created at that rate / mono via a new explicit-format RtcAudioSource(type, sampleRate, channels) constructor — so pushed frames always match the native source and never trip a rate/channel mismatch. We don't assume the requested rate is honored: we read clip.frequency and, if it differs (e.g. the device changed since construction), skip capture with a warning instead of resampling.
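The wrap-aware read and the mono downmix described in the first bullet can be sketched in a language-neutral way. This is a minimal illustration, not the code in MicrophoneSource.cs: `pending_spans` and `downmix_to_mono` are hypothetical names, positions are in frames, and the write position stands in for whatever Microphone.GetPosition returns. One caveat the sketch makes visible: if the write cursor equals the last read position, it is treated as "nothing new", so a full lap between polls would go undetected.

```python
from typing import List, Tuple


def pending_spans(last_pos: int, write_pos: int, clip_frames: int) -> List[Tuple[int, int]]:
    """Return (offset, length) spans, in frames, covering everything written
    since last_pos. A span never crosses the end of the ring, so each one
    could be fetched with a single contiguous read."""
    if write_pos == last_pos:
        return []  # nothing new since the previous poll
    if write_pos > last_pos:
        return [(last_pos, write_pos - last_pos)]
    # The write cursor wrapped: read the tail of the ring, then the head.
    spans = [(last_pos, clip_frames - last_pos)]
    if write_pos > 0:
        spans.append((0, write_pos))
    return spans


def downmix_to_mono(interleaved: List[float], channels: int) -> List[float]:
    """Average the channels of each interleaved frame into one mono sample."""
    if channels == 1:
        return list(interleaved)
    frames = len(interleaved) // channels
    return [
        sum(interleaved[i * channels:(i + 1) * channels]) / channels
        for i in range(frames)
    ]
```

Averaging is only one possible downmix; the PR text does not say which weighting the real implementation uses.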

RtcAudioSource's existing (int channels, RtcAudioSourceType) constructor now delegates to the new explicit-format one; no behavior change for other sources. The MicrophoneSource(deviceName, sourceObject) signature is kept for compatibility (sourceObject is no longer used).

Scope / trade-offs

  • Touches only MicrophoneSource.cs and a small additive constructor in RtcAudioSource.cs. Track.cs/Participant.cs/MeetManager.cs unchanged.
  • Main-thread pacing trades a little latency/jitter (absorbed by the native queue) for safety — no real-time work on the audio thread, no DSP resample, no clock drift.
  • A mid-call device change (caps change after construction) is not auto-recovered: clip.frequency won't match and capture is skipped with a warning until the track is restarted. (Separate concern from this PR.)
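The rate resolution and the mismatch guard from this description can be sketched as follows. The function names are hypothetical and the numeric rates in the usage note are illustrative only; the actual value of DefaultMicrophoneSampleRate is not stated in this PR. The sketch assumes the Unity convention that Microphone.GetDeviceCaps reports 0 for both bounds when the device accepts any rate.

```python
def resolve_capture_rate(default_rate: int, min_freq: int, max_freq: int) -> int:
    """Clamp the preferred rate into the device's reported range."""
    # Both bounds 0 is taken to mean "any rate is supported".
    if min_freq == 0 and max_freq == 0:
        return default_rate
    return max(min_freq, min(default_rate, max_freq))


def should_capture(expected_rate: int, clip_frequency: int) -> bool:
    """Guard: only push frames when the clip reports the rate the native
    source was created with; otherwise the caller skips and warns."""
    return clip_frequency == expected_rate
```

For example, a preferred rate of 48000 against a device reporting 8000 to 16000 resolves to 16000, and a clip that later reports 44100 would fail the guard until the track is restarted.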

Verification

  • LiveKit Runtime, PlayModeTests, and Meet Assembly-CSharp all compile clean.
  • Not yet run: live publish to a remote receiver (confirm non-choppy audio) and an on-device Bluetooth check — the one-time Utils.Info log reports clip.frequency/clip.channels/native rate for confirmation.

🤖 Generated with Claude Code

MaxHeimbrock and others added 2 commits June 12, 2026 12:06
MicrophoneSource captured by playing the looping mic AudioClip through an
AudioSource and tapping the DSP output in AudioProbe.OnAudioFilterRead. That
path has two unsynchronized clocks — the mic hardware clock (fills the clip)
and the audio-output clock (plays it) — so the read cursor drifts against the
write cursor and produces periodic gaps (choppy audio). It also resampled the
mic to the output rate and ran capture work on the real-time audio thread.

Read the mic clip's ring buffer directly instead: each frame, read the new
samples between the last read position and Microphone.GetPosition (splitting at
the ring wrap), downmix to mono, and push to the native source. No AudioSource,
no playback cursor, no drift. Capture runs on the main thread; the native
source's queue absorbs the per-frame pacing.

The capture rate is resolved before start from Microphone.GetDeviceCaps
(DefaultMicrophoneSampleRate clamped to the device's supported range) and the
native source is created at that rate/mono via a new explicit-format
RtcAudioSource constructor, so pushed frames always match it. If the device
opens at a different rate than expected, capture is skipped with a warning
rather than pushing a mismatch.

The MicrophoneSource(deviceName, sourceObject) signature is kept for
compatibility; sourceObject is no longer used.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

In some device states (observed with a Bluetooth headset) Unity misreports
the microphone clip's sample rate: clip.frequency/clip.samples claim 16000
while Microphone.GetPosition advances at the device's true ~51kHz, so the
clip is filled ~3x faster than its label. Pushing those samples labeled as
the (wrong) declared rate overfed the native source's 1-second buffer, which
then rejected ~2/3 of frames with "InvalidState - failed to capture frame".

Stop trusting clip.frequency. Configure the native source at a fixed 48kHz
mono, measure the true capture rate at startup from how fast GetPosition
advances (refined with an EMA to track slow drift), and resample the captured
audio from that measured rate to 48kHz with a streaming linear resampler
before pushing. We then push exactly 48,000 samples per second, matching the
native drain rate, so the buffer no longer overruns and the audio is correctly
pitched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
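
The second commit's approach (estimate the true capture rate from cursor advance, smooth it with an EMA, then linearly resample to 48 kHz) can be illustrated with a small sketch. This is not the PR's implementation: `RateEstimator`, `LinearResampler`, the smoothing factor `alpha`, and the way the estimator is seeded are all assumptions made for illustration.

```python
from typing import List, Optional


class RateEstimator:
    """Tracks the capture rate from how far the write cursor advanced."""

    def __init__(self, initial_rate: float, alpha: float = 0.05):
        self.rate = initial_rate  # seed, e.g. from a startup measurement
        self.alpha = alpha        # EMA smoothing factor (assumed value)

    def update(self, frames_advanced: int, elapsed_seconds: float) -> float:
        if elapsed_seconds <= 0:
            return self.rate
        instant = frames_advanced / elapsed_seconds
        self.rate += self.alpha * (instant - self.rate)
        return self.rate


class LinearResampler:
    """Streaming linear interpolation; state carries across chunks."""

    def __init__(self, out_rate: float):
        self.out_rate = out_rate
        self.pos = 0.0                      # fractional read position in the buffer
        self.prev: Optional[float] = None   # last sample of the previous chunk

    def process(self, chunk: List[float], in_rate: float) -> List[float]:
        if not chunk:
            return []
        # Prepend the carried sample so interpolation spans chunk boundaries.
        buf = ([self.prev] if self.prev is not None else []) + list(chunk)
        step = in_rate / self.out_rate      # input frames per output frame
        out = []
        pos = self.pos
        while pos + 1 < len(buf):           # needs buf[i] and buf[i + 1]
            i = int(pos)
            frac = pos - i
            out.append(buf[i] + (buf[i + 1] - buf[i]) * frac)
            pos += step
        self.prev = buf[-1]
        self.pos = pos - (len(buf) - 1)     # re-base onto the carried sample
        return out
```

With a measured input rate of 51200 Hz and an output rate of 48000 Hz, one second of input yields roughly 48000 output samples, which is the property the commit relies on to stop overfeeding the native buffer. Linear interpolation is cheap but does no anti-alias filtering, so it is a quality trade-off rather than a general-purpose resampler.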