Add source@v1 role for audio input devices #52
rudyberends wants to merge 5 commits into Sendspin:main from
Conversation
This would be amazing! One thing that I’ve been toying with on my current setup (turntable preamp->ADC->ffmpeg->icecast) is using an audio fingerprinting library or service to identify what’s playing on the turntable, and injecting the metadata into the stream. If it’s in the scope of what Sendspin is intended to do, it would be awesome to support that natively. edit: or would this be better handled by the server? |
Any updates or comments? Btw, I believe I read somewhere they were considering calling this new type of client role "sender"? Ping @maximmaxim345 and @marcelveldt
For reference, and if interested in discussing this further, please also see these related discussions and requests:
There isn't much progress yet toward getting a source or sender role into the specification, since it would be nice to have a working implementation in Music Assistant first (so we can figure out issues with the specification of a new role before it becomes part of the spec). Right now it's rather convoluted to add new roles in
maximmaxim345 left a comment
Thanks for the proposal and reference implementations @rudyberends !
One major thing missing from this role is sending the base64-encoded codec header for Opus and FLAC.
I think the most consistent way to solve this is to create a copy of the stream messages:
- `input_stream/start`
- `input_stream/request-format` (this can replace the format section of `server/command.source`)
- `input_stream/end`
In case we ever have another role that sends data from the client to the server, these input_stream messages can be reused.
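For illustration, such an `input_stream/start` message might look roughly like this, assuming JSON messages and hypothetical field names (`codec_header` carrying the base64-encoded Opus/FLAC header; none of these names are confirmed spec):

```python
import base64
import json

# Hypothetical Opus identification header bytes; a real client would send the
# header produced by its encoder. All field names here are assumptions.
opus_header = b"OpusHead" + bytes(11)

input_stream_start = {
    "type": "input_stream/start",
    "format": {"codec": "opus", "channels": 2, "sample_rate": 48000, "bit_depth": 16},
    # Base64-encoded codec header: required for Opus/FLAC, omitted for PCM.
    "codec_header": base64.b64encode(opus_header).decode("ascii"),
}

message = json.dumps(input_stream_start)
```

The point of a dedicated message (rather than folding the header into `server/command.source`) is exactly the reuse noted above: any future client-to-server stream could start with the same handshake.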
README.md
Outdated
- `format`: object - capture/encode format used by this source
- `codec`: 'opus' | 'flac' | 'pcm' - codec identifier
- `channels`: integer - number of channels (e.g., 1 = mono, 2 = stereo)
- `sample_rate`: integer - sample rate in Hz (e.g., 44100, 48000)
- `bit_depth`: integer - bit depth (e.g., 16, 24)
I think we should expand this to a list of supported formats, just like we do now with the player role.
Then servers can show a dropdown of supported formats.
Just sending a format via server/command is a gamble, since we don't know whether the client supports that exact format, potentially causing user confusion.
updated the spec to match the player role
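A list-of-formats shape mirroring the player role might look like this (a sketch; the field name `supported_formats` and the selection helper are assumptions, not confirmed spec):

```python
# Hypothetical source@v1 support payload advertising multiple capture formats,
# so the server can offer a dropdown instead of guessing a single format.
source_support = {
    "supported_formats": [
        {"codec": "opus", "channels": 2, "sample_rate": 48000, "bit_depth": 16},
        {"codec": "flac", "channels": 2, "sample_rate": 44100, "bit_depth": 16},
        {"codec": "pcm", "channels": 1, "sample_rate": 16000, "bit_depth": 16},
    ]
}

def server_pick_format(supported: list, preferred_codec: str = "opus") -> dict:
    # The server selects only from formats the client declared, avoiding the
    # "gamble" of commanding a format the client may not support.
    for fmt in supported:
        if fmt["codec"] == preferred_codec:
            return fmt
    return supported[0]
```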
- Bytes 1-8: timestamp (big-endian int64) - server clock time in microseconds when the first sample was captured
- Rest of bytes: encoded audio frame
The timestamp indicates when the first audio sample in this chunk was captured (in server time domain). The server may resample/transcode and then distribute the audio to players with its normal buffering and synchronization strategy.
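The layout above can be sketched with Python's `struct`; the 1-byte prefix occupying byte 0 is inferred from the timestamp starting at byte 1, and the type value used here is a placeholder, not the spec's real binary message allocation:

```python
import struct

SOURCE_FRAME_TYPE = 0x05  # placeholder value, not the spec's actual allocation

def pack_source_frame(capture_ts_us: int, payload: bytes) -> bytes:
    # Byte 0: binary message type; bytes 1-8: big-endian int64 timestamp in
    # microseconds (server time domain); remaining bytes: encoded audio frame.
    return struct.pack(">Bq", SOURCE_FRAME_TYPE, capture_ts_us) + payload

def unpack_source_frame(frame: bytes) -> tuple:
    msg_type, ts_us = struct.unpack_from(">Bq", frame, 0)
    return msg_type, ts_us, frame[9:]
```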
Since timestamps are in the server time domain (which may not be 100% accurate, plus potential clock drift of the ADC), let's add a disclaimer or note for server implementations that the timestamps may not be continuous.
The data itself, however, should still be continuous.
Good point — added a note to the spec clarifying that source timestamps are derived from client clock offset and may show small discontinuities/drift, while the sample stream itself should remain continuous.
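One way a server implementation might quantify such discontinuities, assuming it tracks the sample count of the previous chunk (a sketch of the idea, not spec-mandated behavior):

```python
def timestamp_gap_us(prev_ts_us: int, prev_samples: int, sample_rate: int,
                     next_ts_us: int) -> int:
    # Where the next chunk's timestamp would land if the client clock offset
    # were perfectly stable; the signed difference exposes drift or jumps.
    expected_us = prev_ts_us + prev_samples * 1_000_000 // sample_rate
    return next_ts_us - expected_us
```

A small nonzero gap here is expected (offset re-estimation, ADC drift) and should not be treated as lost audio: the sample stream itself must remain gapless.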
- `supported_commands`: string[] - subset of: 'play' | 'pause' | 'stop' | 'next' | 'previous' | 'volume' | 'mute' | 'repeat_off' | 'repeat_one' | 'repeat_all' | 'shuffle' | 'unshuffle' | 'switch' | 'select_source'
- `volume`: integer - volume of the whole group, range 0-100
- `muted`: boolean - mute state of the whole group
- `sources?`: object[] - list of available/known sources on the server
Let's remove the select_source command from this PR.
If we include this, it should rather be part of a future role, since it adds quite a lot of data for basic controller use cases.
Just an idea: maybe that future role will also allow you to see your library and select an album or playlist for playback? But that's something for later.
Agreed — removed select_source from this PR and left it for a future “media/inputs” role. The reference implementation has been updated accordingly (no controller command, no select/clear CLI; only source listing remains).
- `level?`: number - optional normalized RMS/peak level (0.0-1.0), only if 'level' is supported
- `signal?`: 'unknown' | 'present' | 'absent' - optional line sensing/signal presence, only if 'line_sense' is supported
I like this idea!
Especially since it's optional, very low-powered devices can just skip the level computation.
- `level?`: number - optional normalized RMS/peak level (0.0-1.0), only if 'level' is supported
- `signal?`: 'unknown' | 'present' | 'absent' - optional line sensing/signal presence, only if 'line_sense' is supported
What is the use case of unknown?
Maybe I'm missing something, but couldn't the client just set line_sense to false?
It isn’t strictly required. We could simplify by only using present/absent, and treat signal as “unknown” when it’s omitted (or when line_sense=false).
The only reason to keep unknown is semantic clarity for clients that do support line sensing but can’t determine it yet (startup, device not ready, no samples). If we want to keep the spec minimal, dropping unknown is perfectly fine.
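The simplification discussed here could look like this on the consuming side, treating an omitted `signal` field as unknown (a sketch; the field names follow the spec excerpt above):

```python
def effective_signal(state: dict) -> str:
    # 'present' / 'absent' come from clients that support line sensing;
    # anything else (field omitted, or line_sense unsupported) maps to 'unknown'.
    value = state.get("signal")
    return value if value in ("present", "absent") else "unknown"
```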
I think we should consider the synchronization implications for sources with native local output that cannot be modelled as simply having both a source and player role. Many users are drawn to Sendspin for its open-source, easy-to-implement multi-room sync [citation needed], but certain hardware inputs present a challenge:
Proposed Requirement: If we want sources with native output to remain viable in a synced environment, the spec should optionally allow sources to:
Thanks for the note, totally get the concern. The key point is that source@v1 is intentionally capture-only. The client is meant to be as dumb as possible: it timestamps audio in the server time domain (using the existing time-sync offset) and sends frames upstream. From there, the server already does what it does for every stream: buffer, resample/encode if needed, and distribute synchronized playback to the group.

If the device also wants to hear its own input, the correct model is simply source + player on the same client, and the server will send the synchronized stream back to it like any other player. The reference implementation already demonstrates this: a source can be selected and played back in perfect sync across multiple clients, including the device that captured the input. So "synced playback" isn't missing; it's already solved by the existing server → player pipeline.

What is outside scope is a source trying to keep its native local output in sync with the network stream. That would require hardware-specific delay control and isn't part of the source role by design. In short: capture stays dumb, the server owns sync, and local playback is handled by the standard player path.
@HarvsG also see these related discussions and requests, as they talk more about different use-case scenarios for this feature, including ideas for how a client as an appliance-like "product" (e.g. a device based on an ESP32 or a Raspberry Pi) could have two roles, acting as both source (capture) and player (output) at the same time: and
For such a scenario to work practically, I think the client device needs two active roles, as both source (capture) and player (output) at the same time, as that would allow the client to send the source to the server and then receive the returning synchronized stream to play locally. Hence the client device cannot simply pass through the local audio, as then it would be impossible to synchronize. Therefore the physical client "product" needs both inputs and outputs on the same device. While not compatible software-wise, check out the ports on existing devices, for example the "UniFi PoE Audio Port" and the "WiiM Ultra", which are mentioned solely as a visual aid to show the types of audio input and output ports that could be featured on the same device in order to do both audio capture (input source to the server) and playback (output to local speakers) at the same time:
I agree, and I implemented exactly that in the reference flow: added input_stream/start with codec_header (base64) for Opus/FLAC. I also added optional source control commands in the reference implementation (play/pause/next/previous/activate/deactivate). These are advertised via source@v1_support.controls and sent as command.source.control. They're purely optional and intended for controllable sources (e.g. networked players), while line-in sources simply omit them.


Summary
This PR introduces a new source@v1 role to the Sendspin protocol, allowing audio input devices (e.g. line-in, turntable preamps, HDMI, Bluetooth receivers, microphones) to be represented and selected in a consistent, protocol-native way.
The goal is to enable remote audio inputs without increasing client complexity, while keeping the Sendspin server as the single place where all heavy processing happens.
All changes are additive and backward-compatible.
Motivation
Several real-world setups require audio to enter Sendspin from a device rather than originate inside the server:
• Line-in or turntable inputs connected to speakers or satellites
• HDMI / ARC inputs from TVs
• Bluetooth receivers acting as a local input
• Voice assistant or microphone satellites forwarding captured audio
Today the protocol focuses primarily on server-originated playback streams. This PR adds an additive source@v1 role to represent audio inputs as first-class sources, while keeping the server responsible for processing and distribution.
Design overview
The source role represents a client that:
• captures audio locally
• streams it to the server
• optionally reports basic signal presence or level
• does not perform any heavy processing
The server remains fully authoritative:
• resampling, transcoding, EQ
• buffering and synchronization
• visualization and distribution to players
Sources are intentionally kept simple so they can run on constrained devices.
Input semantics
Sources explicitly describe their behavior using two orthogonal concepts:
Input type
• analog – line-level style inputs (AUX, turntable preamp)
Audio presence depends on physical user interaction.
• digital – HDMI, S/PDIF, Bluetooth, or similar
Audio is usually continuous or remotely controllable.
Activation model
• manual – cannot be reliably started remotely (e.g. turntable)
• remote – server can start/stop capture predictably
• always_on – capture is always available
This allows controllers and UIs to behave sensibly without hard-coding device assumptions.
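Put together, a source might declare these semantics in its hello payload roughly like this (a sketch with assumed field names; the exact client/hello schema is defined in the spec itself):

```python
# Hypothetical client/hello fragment for a turntable line-in source:
# analog input, manual activation (the server cannot start the turntable).
hello_source_support = {
    "roles": ["source@v1"],
    "source@v1_support": {
        "input_type": "analog",   # 'analog' | 'digital'
        "activation": "manual",   # 'manual' | 'remote' | 'always_on'
    },
}

def can_start_remotely(support: dict) -> bool:
    # A controller should only offer a "start capture" action when the
    # activation model makes remote starting predictable.
    return support["source@v1_support"]["activation"] in ("remote", "always_on")
```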
Signal presence and feedback
For analog inputs especially, it is important to avoid “playing silence”.
Therefore the role optionally supports:
• signal: present | absent | unknown (line sensing)
• level: normalized audio level (RMS/peak)
Both are optional and only reported if the source supports them.
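As an illustration of how cheap the optional reporting can be, here is a normalized RMS level plus a threshold-based presence heuristic for 16-bit PCM (assumptions: little-endian signed samples and an arbitrary 0.01 threshold; constrained devices can skip this entirely):

```python
import math
import struct

def rms_level(pcm: bytes) -> float:
    # Normalized RMS (0.0-1.0) of signed 16-bit little-endian PCM.
    n = len(pcm) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack("<%dh" % n, pcm[: n * 2])
    rms = math.sqrt(sum(s * s for s in samples) / n)
    return min(rms / 32768.0, 1.0)

def signal_state(level: float, threshold: float = 0.01) -> str:
    # Simple line-sense heuristic: report 'present' above a small RMS floor,
    # 'absent' otherwise. Threshold choice is an assumption, not spec.
    return "present" if level >= threshold else "absent"
```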
Controller integration
• Controllers receive a list of available sources via server/state
• A new controller command select_source allows selecting an active source for a group
• Server-local inputs can be exposed as virtual source clients, using the same model
This keeps source selection aligned with existing group control concepts.
Protocol changes (high level)
• New role: source@v1
• Additions to:
• client/hello
• client/state
• client/command
• server/state
• server/command
• Binary message allocation for source audio frames
• Controller extensions for listing and selecting sources
No existing roles or message semantics are changed.
Compatibility
• Fully backward-compatible
• Existing clients and servers can ignore the new role
• No impact on playback, grouping, timing, or synchronization
Open for feedback
This proposal is meant as a starting point.
Feedback is very welcome on:
• field naming and structure
• activation semantics
• signal/level reporting
• whether anything should be simplified or removed
If parts of this feel out of scope or misaligned with Sendspin’s direction, I’m very happy to adjust or iterate.
If helpful, I am also happy to adapt or provide reference implementations.
Thanks for the great project and for taking the time to review this.