Configure native audio source from device config instead of hardcoded defaults #304

Open
MaxHeimbrock wants to merge 3 commits into main from max/mic-samplerate-device-init

@MaxHeimbrock

Contributor

Problem

When the device's audio output configuration differs from the hardcoded defaults (most commonly with a Bluetooth headset), microphone capture fails with the error "sample_rate and num_channels don't match", accompanied by the RtcAudioSource metadata-mismatch warning.

Root cause: RtcAudioSource created the native (Rust) audio source with a fixed sample_rate (48000) and num_channels (2), but captured frames flow through Unity's audio graph (AudioProbe.OnAudioFilterRead) at the actual DSP output configuration. The native source does not resample: NativeAudioSource::capture_frame rejects any frame whose rate or channel count differs from the configuration the source was created with.
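
To make the failure mode concrete, here is a minimal, language-neutral model in Python. It is not the SDK's C# or the Rust implementation: `AudioFormat`, `StrictSource`, and `try_capture` are hypothetical names, and the 16 kHz mono headset format is an assumed example. It only illustrates why a source that never resamples rejects every frame once the real output format differs from the configured one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioFormat:
    sample_rate: int
    num_channels: int


class StrictSource:
    """Toy stand-in for a native source that never resamples (hypothetical)."""

    def __init__(self, fmt: AudioFormat):
        self.fmt = fmt
        self.accepted = 0

    def capture_frame(self, frame_fmt: AudioFormat) -> None:
        # Anything other than the exact configured format is rejected.
        if frame_fmt != self.fmt:
            raise ValueError("sample_rate and num_channels don't match")
        self.accepted += 1


def try_capture(source: StrictSource, frame_fmt: AudioFormat) -> bool:
    """Return True if the frame was accepted, False if it was rejected."""
    try:
        source.capture_frame(frame_fmt)
        return True
    except ValueError:
        return False


# Source configured with the old hardcoded defaults.
hardcoded = StrictSource(AudioFormat(48000, 2))
# Assumed example of what a Bluetooth headset might switch the output to.
headset = AudioFormat(16000, 1)
```

With the hardcoded configuration, a 48000 Hz stereo frame is accepted, while every frame in the headset format is rejected, which matches the capture failure described above.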

Change

  • RtcAudioSource now reads the sample rate and channel count from Unity's output configuration (AudioSettings.GetConfiguration) instead of hardcoded defaults, falling back to the platform defaults only when Unity can't report one.
  • The base constructor exposes two overloads: a device-mode one (RtcAudioSourceType only) for capture sources whose format is whatever Unity delivers, and an explicit one (type, sampleRate, channels) for sources that generate a fixed, known format.
  • MicrophoneSource and BasicAudioSource use device mode; BasicAudioSource drops its unused channels parameter. SineWaveAudioSource (tests) declares its exact format.
  • If a frame's format still doesn't match the source (an inconsistent Unity report, or the output configuration changing at runtime), the frame is dropped with a throttled warning rather than forwarded to the native side, which would reject it with an error.
  • Removes the redundant Microphone.Start(null, …, 44100) in the Meet sample.
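
The format resolution in the first bullet can be sketched as follows. This is a Python model under stated assumptions, not the C# in this PR: `OutputConfig` and `resolve_format` are hypothetical names, "Unity can't report one" is modelled as a missing config or a non-positive field, and the per-field fallback is a guess at reasonable behavior. The default values are the previously hardcoded 48000 Hz and 2 channels.

```python
from typing import NamedTuple, Optional

# The previously hardcoded values, now used only as a fallback.
DEFAULT_SAMPLE_RATE = 48000
DEFAULT_CHANNELS = 2


class OutputConfig(NamedTuple):
    sample_rate: int
    channels: int


def resolve_format(reported: Optional[OutputConfig]) -> OutputConfig:
    """Prefer the reported output format; fall back only when it is unusable."""
    if reported is None:
        return OutputConfig(DEFAULT_SAMPLE_RATE, DEFAULT_CHANNELS)
    # Assumption: a non-positive value means the engine could not report it.
    rate = reported.sample_rate if reported.sample_rate > 0 else DEFAULT_SAMPLE_RATE
    channels = reported.channels if reported.channels > 0 else DEFAULT_CHANNELS
    return OutputConfig(rate, channels)
```

For example, `resolve_format(OutputConfig(16000, 1))` keeps the reported headset format, while `resolve_format(None)` yields the defaults.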

Scope

This is the focused "get the format right at init" fix. It deliberately does not attempt to hot-swap the source mid-call when the audio device changes after publishing. That is a larger change: because a track is bound to its source handle, it requires recreating and republishing the track. It is better handled separately, e.g. by having MicrophoneSource react to AudioSettings.OnAudioConfigurationChanged and restart capture. Until then, a device change during a call drops frames (quiet audio) rather than auto-recovering.

Verification

  • Compiled the Runtime, PlayModeTests, and Meet Assembly-CSharp assemblies cleanly.
  • Not yet run: live E2E PlayMode tests (require livekit-server --dev) and an on-device Bluetooth repro.

🤖 Generated with Claude Code

MaxHeimbrock and others added 3 commits June 11, 2026 17:45
The native (Rust) audio source was created with a hardcoded sample rate
(48000) and channel count (2). Microphone frames flow through Unity's
audio graph (AudioProbe) at the actual DSP output configuration, which
often differs — e.g. with a Bluetooth headset. The Rust source does not
resample; it rejects frames whose rate/channels don't match, causing the
metadata-mismatch warning and capture failures.

Read the source's sample rate and channel count from Unity's output
configuration (AudioSettings.GetConfiguration) instead of hardcoded
defaults, falling back to the defaults only when Unity can't report one.
The base constructor now exposes a device-mode overload (type only) and an
explicit overload (type, sampleRate, channels) for sources that generate a
fixed format. MicrophoneSource and BasicAudioSource use device mode;
BasicAudioSource drops its unused channels parameter. SineWaveAudioSource
declares its exact format.

If a frame's format still doesn't match (inconsistent Unity report or a
runtime output change), drop it with a throttled warning instead of
sending a mismatch the native side would error on. Also removes the
redundant Microphone.Start in the Meet sample.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
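
The drop-with-throttled-warning behavior from the commit above could look roughly like this. It is a Python sketch, not the SDK code: `ThrottledWarning` and `GuardedSource` are hypothetical names, the 2-second interval is an assumed value (the commit does not state the interval for this warning), and timestamps are passed in explicitly so the behavior is deterministic.

```python
from typing import List, Optional


class ThrottledWarning:
    """Emit at most one warning per interval and count the suppressed ones."""

    def __init__(self, interval_s: float):
        self.interval_s = interval_s
        self._last: Optional[float] = None
        self.suppressed = 0

    def warn(self, now: float, message: str) -> Optional[str]:
        """Return the text to log, or None if this warning is throttled."""
        if self._last is not None and now - self._last < self.interval_s:
            self.suppressed += 1
            return None
        self._last = now
        suffix = f" ({self.suppressed} similar warnings suppressed)" if self.suppressed else ""
        self.suppressed = 0
        return message + suffix


class GuardedSource:
    """Drops mismatched frames instead of forwarding them (hypothetical)."""

    def __init__(self, sample_rate: int, channels: int, interval_s: float = 2.0):
        self.expected = (sample_rate, channels)
        self._warning = ThrottledWarning(interval_s)
        self.sent = 0
        self.dropped = 0
        self.log: List[str] = []

    def push(self, now: float, sample_rate: int, channels: int) -> bool:
        """Return True if the frame was forwarded, False if it was dropped."""
        if (sample_rate, channels) != self.expected:
            self.dropped += 1
            text = self._warning.warn(
                now,
                f"dropping frame: got {sample_rate} Hz / {channels} ch, "
                f"expected {self.expected[0]} Hz / {self.expected[1]} ch",
            )
            if text is not None:
                self.log.append(text)
            return False
        self.sent += 1  # a real source would hand the frame to the native side here
        return True
```

Counting suppressed warnings is a design choice of this sketch: it keeps the log quiet during a sustained mismatch while still showing how many frames were affected.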
Temporary, ~2s-throttled diagnostics to investigate choppy received audio:

- RtcAudioSource logs the effective capture sample rate (samples/sec by
  wall clock) vs the rate declared to the native source. A measured rate
  that differs from the declared rate means the frame format label is
  wrong, which would sound fast/slow/choppy on the receiver.
- AudioStream logs buffer fill, underrun count, callback count and frames
  received, to distinguish receive-side starvation from a clean stream.

Emitted via Utils.Info so they appear without LK_DEBUG (Utils.Debug is
compiled out unless LK_DEBUG is defined).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
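
The measured-versus-declared rate diagnostic from the commit above can be modelled as below. This is a Python sketch under assumptions, not the actual logging code: `RateMeter` is a hypothetical name, the 2% tolerance is an assumed threshold, and timestamps are injected rather than read from a wall clock. The first buffer only starts the measurement window, because its samples were produced before the window began.

```python
from typing import Optional


class RateMeter:
    """Estimate the effective capture rate from buffer arrivals (hypothetical)."""

    def __init__(self, declared_rate: int, channels: int, tolerance: float = 0.02):
        self.declared_rate = declared_rate
        self.channels = channels
        self.tolerance = tolerance  # assumed threshold for flagging a mismatch
        self._start: Optional[float] = None
        self._frames = 0  # samples per channel counted inside the window

    def on_buffer(self, now: float, interleaved_len: int) -> None:
        if self._start is None:
            # The first buffer opens the window; its samples predate it.
            self._start = now
            return
        self._frames += interleaved_len // self.channels

    def measured_rate(self, now: float) -> Optional[float]:
        if self._start is None or now <= self._start:
            return None
        return self._frames / (now - self._start)

    def is_mismatch(self, now: float) -> bool:
        rate = self.measured_rate(now)
        if rate is None:
            return False
        return abs(rate - self.declared_rate) / self.declared_rate > self.tolerance


def simulate(declared_rate: int, actual_rate: int, buffers: int = 100) -> RateMeter:
    """Feed stereo buffers of 1024 frames arriving at the actual device rate."""
    meter = RateMeter(declared_rate, channels=2)
    for i in range(buffers + 1):
        meter.on_buffer(i * 1024 / actual_rate, interleaved_len=2048)
    return meter
```

In this model, a source declared at 48000 Hz but fed at 44100 Hz measures roughly 44100 samples per second and is flagged, which is the "frame format label is wrong" case the diagnostic is meant to surface.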
MicrophoneSource started the device at the hardcoded DefaultMicrophoneSampleRate
and played the looping clip through an AudioSource read on the DSP thread. When
the device's actual rate differs from the engine output rate, the clip fills and
plays back at different rates, so the read position drifts against the write
position and the captured audio becomes choppy.

Open the microphone at AudioSettings.outputSampleRate when the device supports
it (clamped to the device's reported caps; falling back to the default when the
output rate is unknown), so capture and playback run at the same rate. This also
aligns the mic rate with the native source rate, which is taken from the same
output configuration.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
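
The microphone rate selection from the commit above can be sketched as a small pure function. This is a Python model, not the C# in the commit: `choose_mic_rate` is a hypothetical name, and `default_rate` stands in for DefaultMicrophoneSampleRate, whose value is not stated here. It assumes the Unity convention that Microphone.GetDeviceCaps reporting 0 for both the minimum and maximum frequency means the device supports any rate, and it models an unknown output rate as a non-positive value.

```python
def choose_mic_rate(output_rate: int, min_freq: int, max_freq: int, default_rate: int) -> int:
    """Pick the rate to open the microphone at.

    output_rate: the engine's output sample rate (<= 0 if unknown, by assumption)
    min_freq, max_freq: the device's reported capability range
    default_rate: fallback used when the output rate is unknown
    """
    target = output_rate if output_rate > 0 else default_rate
    # Both caps at 0 is treated as "any rate is supported".
    if min_freq == 0 and max_freq == 0:
        return target
    # Otherwise clamp the target into the device's supported range.
    return max(min_freq, min(target, max_freq))
```

When the clamp changes the rate, the microphone and the output no longer run at the same rate, so the drift described in the commit can still occur on such devices; this sketch only selects the closest supported rate.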