Fix audio interleaving and thread safety in screencapture-audio #47
Open
pasrom wants to merge 3 commits into m96-chan:main from
Replace the concurrent global queue (.global(qos: .userInteractive)) with a dedicated serial DispatchQueue for SCStream audio output handling. The concurrent queue allows multiple audio callbacks to execute simultaneously, causing interleaved writes to stdout that corrupt the PCM byte stream and produce crackling artifacts. A serial queue with explicit .userInteractive QoS ensures callbacks execute one at a time, eliminating byte-level interleaving without requiring locks.
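The ordering guarantee described above can be sketched with Dispatch alone. This is a minimal illustration, not the PR's code: the queue label, the `ChunkLog` class, and `runCallbacks` are hypothetical names, and the array stands in for stdout.

```swift
import Dispatch

// Hypothetical stand-in for stdout: records chunks in the order callbacks ran.
// Marked @unchecked Sendable because all access goes through the serial queue.
final class ChunkLog: @unchecked Sendable {
    var chunks: [Int] = []
}

// Simulates `count` audio callbacks delivered on a dedicated serial queue.
func runCallbacks(count: Int) -> [Int] {
    // A DispatchQueue is serial unless created with the .concurrent attribute.
    let audioQueue = DispatchQueue(label: "example.audio-output",
                                   qos: .userInteractive)
    let log = ChunkLog()
    for chunk in 0..<count {
        // Each block finishes before the next starts, so appends never overlap.
        audioQueue.async { log.chunks.append(chunk) }
    }
    // sync waits for the queued blocks to drain, then reads the result.
    return audioQueue.sync { log.chunks }
}

let order = runCallbacks(count: 100)
```

With a serial queue, `order` comes back in submission order. In the real tool the same queue would presumably be handed to SCStream as the sample handler queue instead of `.global(qos: .userInteractive)`.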
ScreenCaptureKit on macOS 13+ delivers non-interleaved (planar) float32 audio by default:

  Buffer 0: [L0, L1, ..., Ln]
  Buffer 1: [R0, R1, ..., Rn]

The previous code used CMBlockBufferGetDataPointer, which reads raw bytes sequentially, treating planar data as if it were interleaved. This causes Python consumers doing reshape(-1, 2).mean(axis=1) to average adjacent same-channel samples, destroying high-frequency content and producing metallic/robotic-sounding audio.

Fix: inspect the AudioStreamBasicDescription format flags. When kAudioFormatFlagIsNonInterleaved is set, use the two-pass CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer pattern to read the per-channel AudioBuffers and interleave them to [L0, R0, L1, R1, ...] before writing to stdout. The interleaved path is preserved as a fallback for future macOS versions that may change the default format.
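To make the failure mode concrete, here is a small pure-Swift sketch of the interleaving step and of what a pair-averaging consumer sees with and without it. The function names and sample values are illustrative assumptions; the PR itself reads the channels from an AudioBufferList rather than from Swift arrays.

```swift
/// Interleaves two planar channels:
/// [L0, L1, ...] + [R0, R1, ...] -> [L0, R0, L1, R1, ...].
func interleave(left: [Float], right: [Float]) -> [Float] {
    precondition(left.count == right.count, "channels must have equal frame counts")
    var out = [Float]()
    out.reserveCapacity(left.count * 2)
    for i in 0..<left.count {
        out.append(left[i])
        out.append(right[i])
    }
    return out
}

/// Mono downmix as a consumer might do it: average each pair of adjacent
/// samples (the equivalent of reshape(-1, 2).mean(axis=1)).
func downmixPairs(_ samples: [Float]) -> [Float] {
    stride(from: 0, to: samples.count - 1, by: 2).map {
        (samples[$0] + samples[$0 + 1]) / 2
    }
}

let left: [Float] = [1, 2, 3, 4]
let right: [Float] = [10, 20, 30, 40]

// Old behavior: planar bytes dumped back to back, [L0..Ln, R0..Rn].
let planarDump = left + right
let wrong = downmixPairs(planarDump)      // averages L0 with L1, L2 with L3, ...

// Fixed behavior: interleave first, so each pair is (Li, Ri).
let correct = downmixPairs(interleave(left: left, right: right))
```

Here `wrong` is `[1.5, 3.5, 15.0, 35.0]`, which smears neighboring samples of one channel together, while `correct` is `[5.5, 11.0, 16.5, 22.0]`, the per-frame average of left and right.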
Replace FileHandle.standardOutput.write(Data(...)) with direct POSIX write() calls to avoid per-callback Data allocation and Foundation overhead in the real-time audio path.

- Add writeAllToStdout() helper that loops until all bytes are written, handles partial writes, and retries on EINTR (signal interruption)
- Disable C stdout buffering with setbuf(stdout, nil) so PCM data reaches the pipe consumer immediately
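A write loop of the kind described above might look like the following. This is a sketch under assumptions, not the PR's writeAllToStdout(): the name `writeAll` and the explicit `fd` parameter are hypothetical, chosen so the loop can be exercised against any file descriptor.

```swift
#if canImport(Darwin)
import Darwin
#else
import Glibc
#endif

/// Writes every byte of `buffer` to `fd`, continuing after partial writes
/// and retrying when write() is interrupted by a signal (EINTR).
/// Returns false on any other error.
func writeAll(fd: Int32, _ buffer: UnsafeRawBufferPointer) -> Bool {
    // An empty buffer has no base address; nothing to write.
    guard var cursor = buffer.baseAddress else { return true }
    var remaining = buffer.count
    while remaining > 0 {
        let n = write(fd, cursor, remaining)
        if n < 0 {
            if errno == EINTR { continue }  // interrupted: retry the same span
            return false                    // real error (e.g. EPIPE)
        }
        // Partial write: advance past the bytes that were accepted.
        cursor += n
        remaining -= n
    }
    return true
}
```

A caller could pass the interleaved samples without an intermediate Data allocation, for example `samples.withUnsafeBytes { writeAll(fd: STDOUT_FILENO, $0) }`.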
Summary
- ScreenCaptureKit delivers non-interleaved (planar) audio ([L0..Ln, R0..Rn]). The previous code used CMBlockBufferGetDataPointer, which reads raw bytes sequentially, treating planar data as interleaved. This causes consumers to average adjacent same-channel samples, destroying high-frequency content and producing metallic/robotic-sounding audio.
- Use a dedicated serial DispatchQueue for audio output handling, preventing interleaved writes to stdout from concurrent callbacks.
- Replace FileHandle.standardOutput.write(Data(...)) with direct POSIX write() calls to avoid per-callback Data allocation in the real-time audio path, and disable stdout buffering for immediate pipe delivery.

Root Cause
ScreenCaptureKit delivers audio in non-interleaved (planar) format:

  Buffer 0: [L0, L1, ..., Ln]
  Buffer 1: [R0, R1, ..., Rn]

The old code dumped these bytes sequentially via CMBlockBufferGetDataPointer, producing [L0..Ln, R0..Rn] on stdout. Any consumer interpreting this as interleaved [L0, R0, L1, R1, ...] would mix adjacent same-channel samples, causing metallic audio.

Fix
- Check AudioStreamBasicDescription.mFormatFlags for kAudioFormatFlagIsNonInterleaved
- When it is set, use CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer to read per-channel AudioBuffers, then interleave to [L0, R0, L1, R1, ...]

Test plan
- swift build -c release on macOS 15 (Apple Silicon)
- Audio format: 48000 Hz, 2ch, 32-bit, flags=0x... (nonInterleaved=true)
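For reference, a log line in the shape shown above could be derived from the stream's AudioStreamBasicDescription roughly as follows. This is a sketch only: `isNonInterleaved`, `describe`, and the example format are assumptions for illustration, and the flags value it prints is not taken from the PR's actual output.

```swift
import CoreAudio

/// True when the format stores each channel in its own buffer (planar).
func isNonInterleaved(_ asbd: AudioStreamBasicDescription) -> Bool {
    return (asbd.mFormatFlags & kAudioFormatFlagIsNonInterleaved) != 0
}

/// Formats the fields the test plan's log line reports.
func describe(_ asbd: AudioStreamBasicDescription) -> String {
    let flagsHex = String(asbd.mFormatFlags, radix: 16)
    let rate = Int(asbd.mSampleRate)
    return "Audio format: \(rate) Hz, \(asbd.mChannelsPerFrame)ch, "
        + "\(asbd.mBitsPerChannel)-bit, flags=0x\(flagsHex) "
        + "(nonInterleaved=\(isNonInterleaved(asbd)))"
}

// Hypothetical example format: planar float32 stereo at 48 kHz.
// For planar data the per-frame byte count describes a single channel.
let example = AudioStreamBasicDescription(
    mSampleRate: 48_000,
    mFormatID: kAudioFormatLinearPCM,
    mFormatFlags: kAudioFormatFlagIsFloat
        | kAudioFormatFlagIsPacked
        | kAudioFormatFlagIsNonInterleaved,
    mBytesPerPacket: 4,
    mFramesPerPacket: 1,
    mBytesPerFrame: 4,
    mChannelsPerFrame: 2,
    mBitsPerChannel: 32,
    mReserved: 0
)

print(describe(example))
```

The same `isNonInterleaved` check is what would select between the planar path (interleave before writing) and the interleaved fallback.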