Add source@v1 role for audio input devices #52
rudyberends wants to merge 5 commits into Sendspin:main from
Conversation
This would be amazing! One thing that I’ve been toying with on my current setup (turntable preamp->ADC->ffmpeg->icecast) is using an audio fingerprinting library or service to identify what’s playing on the turntable, and injecting the metadata into the stream. If it’s in the scope of what Sendspin is intended to do, it would be awesome to support that natively. edit: or would this be better handled by the server? |
Any updates or comments? Btw, I believe I read somewhere they were considering calling this new type of client role "sender"? Ping @maximmaxim345 and @marcelveldt
For reference, and if interested in discussing this further, please also see these related discussions and requests:
There isn't much progress yet toward getting a source or sender role into the specification, since it would be nice to have a working implementation in Music Assistant first (so we can figure out issues with the specification of a new role before it becomes part of the spec). Right now it's rather convoluted to add new roles in
maximmaxim345 left a comment
Thanks for the proposal and reference implementations @rudyberends !
One major thing missing from this role is sending the base64-encoded codec header for Opus and FLAC.
I think the most consistent way to solve this is to create a copy of the stream messages:
- `input_stream/start`
- `input_stream/request-format` (this can replace the format section of `server/command.source`)
- `input_stream/end`
In case we ever have another role that sends data from the client to the server, these input_stream messages can be reused.
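For illustration, such an `input_stream/start` message might look roughly like this, assuming JSON messages and hypothetical field names (`codec_header` carrying the base64-encoded Opus/FLAC header; none of these names are confirmed spec):

```python
import base64
import json

# Hypothetical Opus identification header bytes; a real client would send the
# header produced by its encoder. All field names here are assumptions.
opus_header = b"OpusHead" + bytes(11)

input_stream_start = {
    "type": "input_stream/start",
    "format": {"codec": "opus", "channels": 2, "sample_rate": 48000, "bit_depth": 16},
    # Base64-encoded codec header: required for Opus/FLAC, omitted for PCM.
    "codec_header": base64.b64encode(opus_header).decode("ascii"),
}

message = json.dumps(input_stream_start)
```

The point of a dedicated message (rather than folding the header into `server/command.source`) is exactly the reuse noted above: any future client-to-server stream could start with the same handshake.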
README.md
Outdated
- `format`: object - capture/encode format used by this source
- `codec`: 'opus' | 'flac' | 'pcm' - codec identifier
- `channels`: integer - number of channels (e.g., 1 = mono, 2 = stereo)
- `sample_rate`: integer - sample rate in Hz (e.g., 44100, 48000)
- `bit_depth`: integer - bit depth (e.g., 16, 24)
I think we should expand this to a list of supported formats, just like we do now with the player role.
Then servers can show a dropdown of supported formats.
Just sending a format via server/command is a gamble, since we don't know whether the client supports that exact format, potentially causing user confusion.
updated the spec to match the player role
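A list-of-formats shape mirroring the player role might look like this (a sketch; the field name `supported_formats` and the selection helper are assumptions, not confirmed spec):

```python
# Hypothetical source@v1 support payload advertising multiple capture formats,
# so the server can offer a dropdown instead of guessing a single format.
source_support = {
    "supported_formats": [
        {"codec": "opus", "channels": 2, "sample_rate": 48000, "bit_depth": 16},
        {"codec": "flac", "channels": 2, "sample_rate": 44100, "bit_depth": 16},
        {"codec": "pcm", "channels": 1, "sample_rate": 16000, "bit_depth": 16},
    ]
}

def server_pick_format(supported: list, preferred_codec: str = "opus") -> dict:
    # The server selects only from formats the client declared, avoiding the
    # "gamble" of commanding a format the client may not support.
    for fmt in supported:
        if fmt["codec"] == preferred_codec:
            return fmt
    return supported[0]
```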
- Bytes 1-8: timestamp (big-endian int64) - server clock time in microseconds when the first sample was captured
- Rest of bytes: encoded audio frame
The timestamp indicates when the first audio sample in this chunk was captured (in server time domain). The server may resample/transcode and then distribute the audio to players with its normal buffering and synchronization strategy.
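The layout above can be sketched with Python's `struct`; the 1-byte prefix occupying byte 0 is inferred from the timestamp starting at byte 1, and the type value used here is a placeholder, not the spec's real binary message allocation:

```python
import struct

SOURCE_FRAME_TYPE = 0x05  # placeholder value, not the spec's actual allocation

def pack_source_frame(capture_ts_us: int, payload: bytes) -> bytes:
    # Byte 0: binary message type; bytes 1-8: big-endian int64 timestamp in
    # microseconds (server time domain); remaining bytes: encoded audio frame.
    return struct.pack(">Bq", SOURCE_FRAME_TYPE, capture_ts_us) + payload

def unpack_source_frame(frame: bytes) -> tuple:
    msg_type, ts_us = struct.unpack_from(">Bq", frame, 0)
    return msg_type, ts_us, frame[9:]
```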
Since timestamps are in the server time domain (which may not be 100% accurate, plus potential clock drift of the ADC), let's add a disclaimer or note for server implementations that the timestamps may not be continuous.
The data itself, however, should still be continuous.
Good point — added a note to the spec clarifying that source timestamps are derived from client clock offset and may show small discontinuities/drift, while the sample stream itself should remain continuous.
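One way a server implementation might quantify such discontinuities, assuming it tracks the sample count of the previous chunk (a sketch of the idea, not spec-mandated behavior):

```python
def timestamp_gap_us(prev_ts_us: int, prev_samples: int, sample_rate: int,
                     next_ts_us: int) -> int:
    # Where the next chunk's timestamp would land if the client clock offset
    # were perfectly stable; the signed difference exposes drift or jumps.
    expected_us = prev_ts_us + prev_samples * 1_000_000 // sample_rate
    return next_ts_us - expected_us
```

A small nonzero gap here is expected (offset re-estimation, ADC drift) and should not be treated as lost audio: the sample stream itself must remain gapless.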
- `supported_commands`: string[] - subset of: 'play' | 'pause' | 'stop' | 'next' | 'previous' | 'volume' | 'mute' | 'repeat_off' | 'repeat_one' | 'repeat_all' | 'shuffle' | 'unshuffle' | 'switch' | 'select_source'
- `volume`: integer - volume of the whole group, range 0-100
- `muted`: boolean - mute state of the whole group
- `sources?`: object[] - list of available/known sources on the server
Let's remove the select_source command from this PR.
If we include this, it should rather be part of a future role, since it adds quite a lot of data for basic controller use cases.
Just an idea: maybe that future role will also allow you to see your library and select an album or playlist for playback? But that's something for later.
Agreed — removed select_source from this PR and left it for a future “media/inputs” role. The reference implementation has been updated accordingly (no controller command, no select/clear CLI; only source listing remains).
- `level?`: number - optional normalized RMS/peak level (0.0-1.0), only if 'level' is supported
- `signal?`: 'unknown' | 'present' | 'absent' - optional line sensing/signal presence, only if 'line_sense' is supported
I like this idea!
Especially since it's optional, very low-powered devices can just skip the level computation.
- `level?`: number - optional normalized RMS/peak level (0.0-1.0), only if 'level' is supported
- `signal?`: 'unknown' | 'present' | 'absent' - optional line sensing/signal presence, only if 'line_sense' is supported
What is the use case of unknown?
Maybe I'm missing something, but couldn't the client just set line_sense to false?
It isn’t strictly required. We could simplify by only using present/absent, and treat signal as “unknown” when it’s omitted (or when line_sense=false).
The only reason to keep unknown is semantic clarity for clients that do support line sensing but can’t determine it yet (startup, device not ready, no samples). If we want to keep the spec minimal, dropping unknown is perfectly fine.
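The simplification discussed here could look like this on the consuming side, treating an omitted `signal` field as unknown (a sketch; the field names follow the spec excerpt above):

```python
def effective_signal(state: dict) -> str:
    # 'present' / 'absent' come from clients that support line sensing;
    # anything else (field omitted, or line_sense unsupported) maps to 'unknown'.
    value = state.get("signal")
    return value if value in ("present", "absent") else "unknown"
```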
I think we should consider the synchronization implications for sources with native local output that cannot be modelled as simply having both a source and player role. Many users are drawn to Sendspin for its open-source, easy-to-implement multi-room sync [citation needed], but certain hardware inputs present a challenge:
Proposed Requirement: If we want sources with native output to remain viable in a synced environment, the spec should optionally allow sources to:
Thanks for the note, totally get the concern. The key point is that source@v1 is intentionally capture-only. The client is meant to be as dumb as possible: it timestamps audio in the server time domain (using the existing time-sync offset) and sends frames upstream. From there, the server already does what it does for every stream: buffer, resample/encode if needed, and distribute synchronized playback to the group.

If the device also wants to hear its own input, the correct model is simply source + player on the same client, and the server will send the synchronized stream back to it like any other player. The reference implementation already demonstrates this: a source can be selected and played back in perfect sync across multiple clients, including the device that captured the input. So "synced playback" isn't missing; it's already solved by the existing server → player pipeline.

What is outside scope is a source trying to keep its native local output in sync with the network stream. That would require hardware-specific delay control and isn't part of the source role by design. In short: capture stays dumb, the server owns sync, and local playback is handled by the standard player path.
@HarvsG also see these related discussions and requests, as they talk more about different use-case scenarios for this feature, including ideas for how a client as an appliance-like "product" (e.g. a device based on an ESP32 or a Raspberry Pi) could have two roles, acting as both source (capture) and player (output) at the same time: and
For such a scenario to work practically, I think the client device needs two active roles, as both source (capture) and player (output) at the same time, as that would allow the client to send the source to the server and then receive the returning synchronized stream to play locally. Hence the client device cannot simply pass through the local audio, as then it would be impossible to synchronize. Therefore the physical client "product" needs both inputs and outputs on the same device. While not compatible software-wise, check out the ports on existing devices, for example the "UniFi PoE Audio Port" and the "WiiM Ultra", which are mentioned solely as a visual aid to show the types of audio input and output ports that could be featured on the same device in order to do both audio capture (input source to the server) and playback (output to local speakers) at the same time:
I agree, and I implemented exactly that in the reference flow: added input_stream/start with codec_header (base64) for Opus/FLAC. I also added optional source control commands in the reference implementation (play/pause/next/previous/activate/deactivate). These are advertised via source@v1_support.controls and sent as command.source.control. They're purely optional and intended for controllable sources (e.g. networked players), while line-in sources simply omit them.


Summary
This PR introduces a new source@v1 role to the Sendspin protocol, allowing audio input devices (e.g. line-in, turntable preamps, HDMI, Bluetooth receivers, microphones) to be represented and selected in a consistent, protocol-native way.
The goal is to enable remote audio inputs without increasing client complexity, while keeping the Sendspin server as the single place where all heavy processing happens.
All changes are additive and backward-compatible.
Motivation
Several real-world setups require audio to enter Sendspin from a device rather than originate inside the server:
• Line-in or turntable inputs connected to speakers or satellites
• HDMI / ARC inputs from TVs
• Bluetooth receivers acting as a local input
• Voice assistant or microphone satellites forwarding captured audio
Today the protocol focuses primarily on server-originated playback streams. This PR adds an additive source@v1 role to represent audio inputs as first-class sources, while keeping the server responsible for processing and distribution.
Design overview
The source role represents a client that:
• captures audio locally
• streams it to the server
• optionally reports basic signal presence or level
• does not perform any heavy processing
The server remains fully authoritative:
• resampling, transcoding, EQ
• buffering and synchronization
• visualization and distribution to players
Sources are intentionally kept simple so they can run on constrained devices.
Input semantics
Sources explicitly describe their behavior using two orthogonal concepts:
Input type
• analog – line-level style inputs (AUX, turntable preamp)
Audio presence depends on physical user interaction.
• digital – HDMI, S/PDIF, Bluetooth, or similar
Audio is usually continuous or remotely controllable.
Activation model
• manual – cannot be reliably started remotely (e.g. turntable)
• remote – server can start/stop capture predictably
• always_on – capture is always available
This allows controllers and UIs to behave sensibly without hard-coding device assumptions.
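Put together, a source might declare these semantics in its hello payload roughly like this (a sketch with assumed field names; the exact client/hello schema is defined in the spec itself):

```python
# Hypothetical client/hello fragment for a turntable line-in source:
# analog input, manual activation (the server cannot start the turntable).
hello_source_support = {
    "roles": ["source@v1"],
    "source@v1_support": {
        "input_type": "analog",   # 'analog' | 'digital'
        "activation": "manual",   # 'manual' | 'remote' | 'always_on'
    },
}

def can_start_remotely(support: dict) -> bool:
    # A controller should only offer a "start capture" action when the
    # activation model makes remote starting predictable.
    return support["source@v1_support"]["activation"] in ("remote", "always_on")
```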
Signal presence and feedback
For analog inputs especially, it is important to avoid “playing silence”.
Therefore the role optionally supports:
• signal: present | absent | unknown (line sensing)
• level: normalized audio level (RMS/peak)
Both are optional and only reported if the source supports them.
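As an illustration of how cheap the optional reporting can be, here is a normalized RMS level plus a threshold-based presence heuristic for 16-bit PCM (assumptions: little-endian signed samples and an arbitrary 0.01 threshold; constrained devices can skip this entirely):

```python
import math
import struct

def rms_level(pcm: bytes) -> float:
    # Normalized RMS (0.0-1.0) of signed 16-bit little-endian PCM.
    n = len(pcm) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack("<%dh" % n, pcm[: n * 2])
    rms = math.sqrt(sum(s * s for s in samples) / n)
    return min(rms / 32768.0, 1.0)

def signal_state(level: float, threshold: float = 0.01) -> str:
    # Simple line-sense heuristic: report 'present' above a small RMS floor,
    # 'absent' otherwise. Threshold choice is an assumption, not spec.
    return "present" if level >= threshold else "absent"
```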
Controller integration
• Controllers receive a list of available sources via server/state
• A new controller command select_source allows selecting an active source for a group
• Server-local inputs can be exposed as virtual source clients, using the same model
This keeps source selection aligned with existing group control concepts.
Protocol changes (high level)
• New role: source@v1
• Additions to:
• client/hello
• client/state
• client/command
• server/state
• server/command
• Binary message allocation for source audio frames
• Controller extensions for listing and selecting sources
No existing roles or message semantics are changed.
Compatibility
• Fully backward-compatible
• Existing clients and servers can ignore the new role
• No impact on playback, grouping, timing, or synchronization
Open for feedback
This proposal is meant as a starting point.
Feedback is very welcome on:
• field naming and structure
• activation semantics
• signal/level reporting
• whether anything should be simplified or removed
If parts of this feel out of scope or misaligned with Sendspin’s direction, I’m very happy to adjust or iterate.
If helpful, I am also happy to adapt or provide reference implementations.
Thanks for the great project and for taking the time to review this.