fix: auto-recover from audio cutout caused by stale inputOutputSampleDelta #66
scottblackmgfx wants to merge 1 commit into briankendall:master
…Delta inputOutputSampleDelta is computed once per IO session and never adjusted. When the proxy device's sample time epoch shifts (IO restart, power management, GetZeroTimeStamp recalculation), the ring buffer fill level jumps from ~1600 to ~48 million — far beyond the 16384-frame capacity. The output proc silently reads stale zeros, causing permanent silence. Add a bounds check in outputDeviceIOProc before each Fetch: if the fill level exceeds ring capacity, recalculate inputOutputSampleDelta from the current input position and recompute startFrame on the spot. One integer comparison per audio cycle; recalculation only fires when the delta is stale. Fixes briankendall#19, briankendall#14, briankendall#43.
Hello, I'm glad to see you've submitted a pull request to fix this longstanding issue. I haven't been able to test in production if this fix works on my end since it's really difficult for me to reproduce the issue in the first place, because it doesn't happen consistently on any of my mac systems (which is part of the reason I've never managed to make a fix). However it's encouraging to hear that it's been working for you. However, I have a number of questions about this PR, and assuming it does fix the issue there's a lot about it that needs to be cleaned up before I can accept it, some of which seems to be related to using an AI to author it. Before I delve into any of that, though, I want to ask more about what you observed as being the root cause of the problem. When you write:
...can you clarify what you mean by this? What exactly are the conditions that cause the input and output reads and writes to go out of sync with each other? I've never observed anything related to power management being a trigger for this situation, and IO restart causes Also, if I understand correctly, what you were observing on your end was there being a sudden jump in the position the output thread was reading from the ring buffer. From my previous work on this, what I found is that the input and output data would gradually go out of time with each other until the output thread was trying to read data from the ring buffer that was gradually more and more out of its range. It wouldn't cause the audio to cut off instantly, but rather have it gradually become more distorted until finally it cut out completely (caused by the output thread gradually reading more and more off the end of the valid range of the ring buffer). So I don't think the diagnostics you added that suddenly shift Recalculating |
Thank you for the detailed response and for taking the time to review this.
To be fully transparent: I did use AI to help write the solution, the associated tests, and diagnostics. I intentionally kept the actual recovery logic as cleanly self-contained as possible (DesyncRecovery.h is a single pure function) so that it could be selectively merged, adapted, or discarded at your discretion. Please feel free to use as much or as little of what I submitted, if at all. Part of the thinking behind the tests and diagnostics was to demonstrate the reproducibility of the bug and the reliability of the compensation logic as clearly as possible. Regardless of the methodology, this has been an effective solution for my setup, and IMO is a worthwhile addition that I've been able to verify.
To address your questions:
The failure I captured was a sudden, discrete jump. The fill level went from the low thousands to several million in a single cycle, not a gradual drift. You can see this in the provided logs. This is interesting given what you've described (gradual distortion leading to eventual silence), which could be a separate cause or a different manifestation of the same underlying timing issue. My fix should catch both cases: any time the fill level exceeds the ring buffer capacity, whether it happens suddenly or gradually.
Regarding the root cause description, I can't say with certainty exactly what triggers the epoch shift. The "IO restart, power management" language in the description is my best guess from observing when the jumps occur during normal use. Still, I haven't definitively proven the specific macOS-level trigger. What I can confirm is that inputOutputSampleDelta becomes stale relative to the actual buffer positions, and that this happens repeatedly during normal use. My use of this since has logged hundreds of instances over the course of a day. The frequency of these corrections suggests this is also addressing the slower, gradual drifting you observed, not just the sudden jumps I initially captured.
On audio discontinuity - this is a fair concern. Since submitting two days ago, I've been using this nonstop as my daily driver. In the majority of recovery cases, I cannot detect any discontinuity: the recalculation occurs before the fetch, so the corrected read position is used for that cycle's output. There are occasional, very brief audio interruptions, very likely related to this issue, but they're rare and substantially less disruptive than restarting CoreAudio after a cutout. I did sincerely use this for 11+ hours after producing that smoketest and noticed zero cutouts in that time, including during the tests, whereas previously I was mashing Alt+F11 to toggle audio devices multiple times a day. I would like to hear how this performs for others.
I fully agree that this is compensation rather than a true fix. I wasn't able to identify the underlying cause of the desynchronisation. But in practice, it should resolve the issue for most users most of the time, and isn't antagonistic to a proper fix. If the root cause is ever identified and prevented, the recovery check would never trigger (and could be removed entirely, or even be initially added as optional).
I look forward to a fix, regardless of whether it is a true fix or error recovery mechanism. Several times a day, I have to switch output device and switch back to restore audio output.
I've uploaded the build I've been using on the fork I created if you want to give it a go. Any feedback would be useful data.
It would certainly be good to know if scottblackmgfx's build fixes the problem for anyone else! I also think I may have figured out what was causing the issue with very gradual drift that would cause the driver to gradually stop working over the course of hours. So I'm hopeful that a combination of scottblackmgfx's fix for failure states plus an improved fix for the drift will properly nip the problem in the bud.
This patch has been great for me so far. It fixed every audio cut I had (which I previously fixed by changing the buffer size manually every time it cut out).
First - thank you for creating this; it's my preferred option for being able to control my Scarlett 2i2 with native Mac volume controls.
I've been experiencing the seemingly random audio-cutting-out issue that others have reported. I've taken a crack at it, and I've identified the underlying issue, reliably reproduced it, and created an effective fix.
Fix: Auto-recover from audio cut-out caused by stale inputOutputSampleDelta
Fixes issues #19, #14, #43. Likely related to #62 (macOS 26), but untested on that version.
The problem
Audio randomly goes silent after minutes to hours of playback. The only recovery is to restart CoreAudio or toggle the settings. This has been the project's longest-standing bug, reported since 2021 across all hardware and macOS versions.
Root cause
inputOutputSampleDelta is computed once per IO session and never adjusted. It maps between the proxy device's clock domain (mach_absolute_time) and the hardware device's clock domain (USB clock). When the proxy device's sample time epoch shifts (due to IO restart, GetZeroTimeStamp recalculation, or macOS power management), the ring buffer's write position (mEndFrame) jumps relative to the read position (startFrame).
I implemented a basic logging and diagnostics system, and captured the exact failure. Below are logs highlighting the issue:
Three seconds after the audio started, the fill level jumped from 1,609 to 47,924,021. The output proc continues fetching from the ring buffer "successfully" (no overrun flag), but it's reading from positions that were overwritten thousands of times — returning stale zeros. The result is audio cutting out silently with no error.
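To put the logged numbers in perspective, here is a small arithmetic sketch. It assumes the 16384-frame ring capacity quoted in the commit message; `WrapInfo` and `wrapsFor` are hypothetical names used only for illustration and are not part of the driver.

```cpp
#include <cstdint>

// Hypothetical helper type, not part of the project.
struct WrapInfo {
    int64_t fullWraps;   // whole ring lengths between the read and write positions
    int64_t remainder;   // leftover frames after those whole wraps
};

// Expresses a fill level as a number of whole ring lengths plus a remainder.
constexpr WrapInfo wrapsFor(int64_t fillFrames, int64_t capacityFrames) {
    return WrapInfo{fillFrames / capacityFrames, fillFrames % capacityFrames};
}

// Values from the failure described above: the healthy fill of 1,609 frames
// fits inside the ring, while 47,924,021 frames spans 2,925 ring lengths.
static_assert(wrapsFor(1609, 16384).fullWraps == 0, "healthy fill fits in the ring");
static_assert(wrapsFor(47924021, 16384).fullWraps == 2925, "stale fill spans thousands of rings");
```

A read position that far behind the write position cannot refer to data that is still in the buffer, which is consistent with the "overwritten thousands of times" observation.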
The fix
A single bounds check in outputDeviceIOProc, before the Fetch call:
If the fill level exceeds the ring buffer capacity, we recalculate inputOutputSampleDelta on the spot and recompute startFrame, all before the Fetch. Audio continues seamlessly; I do not hear any discernible interruption when the self-healing logic kicks in.
The check runs every audio cycle: one integer subtraction and comparison; this is sub-nanosecond and imperceptible. The recalculation only runs when something has gone wrong, which is rare (ranging from seconds to hours).
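As a rough illustration of the check described above, the following is a minimal sketch and not the code in this PR. The names `RecoveryResult`, `recoverIfStale`, and `targetFill` are hypothetical, integer frame counts stand in for whatever sample-time type the driver uses, and the relation `startFrame = outputSampleTime - sampleDelta` is an assumed sign convention.

```cpp
#include <cstdint>

// Hypothetical result type; the real DesyncRecovery.h may differ.
struct RecoveryResult {
    bool recovered;       // true if the delta was stale and has been recalculated
    int64_t sampleDelta;  // delta to use from now on
    int64_t startFrame;   // read position to pass to the fetch for this cycle
};

// Pure function: no allocation, no locks, no I/O.
//   outputSampleTime: current sample time of the output device
//   writeEndFrame:    ring buffer write position (mEndFrame in the description)
//   sampleDelta:      current inputOutputSampleDelta
//   capacityFrames:   ring buffer capacity in frames
//   targetFill:       assumed distance to restore between write and read positions
inline RecoveryResult recoverIfStale(int64_t outputSampleTime,
                                     int64_t writeEndFrame,
                                     int64_t sampleDelta,
                                     int64_t capacityFrames,
                                     int64_t targetFill) {
    const int64_t startFrame = outputSampleTime - sampleDelta;  // assumed convention
    const int64_t fill = writeEndFrame - startFrame;

    // Common path: one subtraction and one comparison per audio cycle.
    if (fill <= capacityFrames) {
        return RecoveryResult{false, sampleDelta, startFrame};
    }

    // Stale delta: re-anchor the read position just behind the writer.
    const int64_t newStart = writeEndFrame - targetFill;
    return RecoveryResult{true, outputSampleTime - newStart, newStart};
}
```

Under these assumptions the caller would invoke the function once per cycle, store the returned `sampleDelta`, and fetch from the returned `startFrame`. Only the "exceeds capacity" case from the description is handled here; a negative fill level would need its own policy.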
The fix logic lives in its own file (DesyncRecovery.h): a pure function with no allocations, no locks, no side effects. Safe for the real-time audio thread.
Verification
After identifying the issue, I produced a smoke test to invoke the 'audio-cutting-out' behaviour artificially. I added a trigger mechanism (touch a file to inject a 50M-frame delta corruption during music/video playback). Without the fix, audio cuts out. With the fix, audio continues uninterrupted.
I then formalised this into a series of tests, including a system test that demonstrates the fix in action with audio output to the default device (make sure to set the proxy as your default device before starting):
Automated tests (tests/):
TestDesyncRecovery: Unit tests for the recovery function using exact values from the production failure logs, boundary conditions, and edge cases
TestE2ERecovery: End-to-end simulation using the real AudioRingBuffer: proves injection causes permanent silence without the fix, and zero silence with it
TestAudibleRecovery: Plays a sine wave through Core Audio, injects the desync mid-stream, verifies audio continuity both audibly and programmatically
Production soak test: Fixed driver ran for 12 hours of normal use with zero audio cut-outs. Logs showed instances of the fill level automatically recalculating when it would otherwise have become stale, resulting in a cut-out.
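The idea behind the end-to-end simulation can be illustrated with a toy model. This is a sketch under stated assumptions, not the project's test code: `ToyRing` is a stand-in for the real AudioRingBuffer (it zero-fills reads outside its valid window, which is assumed behaviour), and `cycleHasAudio` and `targetFill` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy single-channel ring buffer addressed by absolute frame position.
class ToyRing {
public:
    explicit ToyRing(int64_t capacity)
        : data_(static_cast<std::size_t>(capacity), 0.0f), capacity_(capacity) {}

    // Writes frames at [start, start + frames.size()); assumes size <= capacity.
    void store(const std::vector<float>& frames, int64_t start) {
        for (std::size_t i = 0; i < frames.size(); ++i) {
            data_[slot(start + static_cast<int64_t>(i))] = frames[i];
        }
        endFrame_ = start + static_cast<int64_t>(frames.size());
    }

    // Reads count frames from start; positions outside the valid window
    // [endFrame - capacity, endFrame) come back as zeros.
    std::vector<float> fetch(int64_t start, int64_t count) const {
        std::vector<float> out(static_cast<std::size_t>(count), 0.0f);
        for (int64_t i = 0; i < count; ++i) {
            const int64_t pos = start + i;
            if (pos >= endFrame_ - capacity_ && pos < endFrame_) {
                out[static_cast<std::size_t>(i)] = data_[slot(pos)];
            }
        }
        return out;
    }

    int64_t endFrame() const { return endFrame_; }
    int64_t capacity() const { return capacity_; }

private:
    std::size_t slot(int64_t pos) const {
        return static_cast<std::size_t>(((pos % capacity_) + capacity_) % capacity_);
    }
    std::vector<float> data_;
    int64_t capacity_;
    int64_t endFrame_ = 0;
};

// Simulates one output cycle; returns true if the fetched block is not silent.
// With recovery enabled, a fill level above capacity re-anchors the delta first.
inline bool cycleHasAudio(const ToyRing& ring, int64_t outputTime, int64_t& delta,
                          bool recoveryEnabled, int64_t blockFrames, int64_t targetFill) {
    int64_t start = outputTime - delta;  // assumed sign convention
    if (recoveryEnabled && ring.endFrame() - start > ring.capacity()) {
        start = ring.endFrame() - targetFill;
        delta = outputTime - start;
    }
    for (float sample : ring.fetch(start, blockFrames)) {
        if (sample != 0.0f) return true;
    }
    return false;
}
```

In this model, storing non-zero frames, corrupting `delta` by a large offset, and then calling `cycleHasAudio` with recovery disabled yields a silent block, while the same call with recovery enabled reads valid data again and restores the delta. The toy allocates in `fetch`, so it is only suitable for tests, not for an audio thread.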
How to test