Add source@v1 role for audio input devices #52

Open

rudyberends wants to merge 5 commits into Sendspin:main from rudyberends:spec/add-source-role
Conversation


@rudyberends rudyberends commented Dec 16, 2025

Summary

This PR introduces a new source@v1 role to the Sendspin protocol, allowing audio input devices (e.g. line-in, turntable preamps, HDMI, Bluetooth receivers, microphones) to be represented and selected in a consistent, protocol-native way.

The goal is to enable remote audio inputs without increasing client complexity, while keeping the Sendspin server as the single place where all heavy processing happens.

All changes are additive and backward-compatible.

Motivation

Several real-world setups require audio to enter Sendspin from a device rather than originate inside the server:
• Line-in or turntable inputs connected to speakers or satellites
• HDMI / ARC inputs from TVs
• Bluetooth receivers acting as a local input
• Voice assistant or microphone satellites forwarding captured audio

Today the protocol focuses primarily on server-originated playback streams. This PR adds an additive source@v1 role to represent audio inputs as first-class sources, while keeping the server responsible for processing and distribution.

Design overview

The source role represents a client that:
• captures audio locally
• streams it to the server
• optionally reports basic signal presence or level
• does not perform any heavy processing

The server remains fully authoritative:
• resampling, transcoding, EQ
• buffering and synchronization
• visualization and distribution to players

Sources are intentionally kept simple so they can run on constrained devices.

Input semantics

Sources explicitly describe their behavior using two orthogonal concepts:

Input type
• analog – line-level style inputs (AUX, turntable preamp)
Audio presence depends on physical user interaction.
• digital – HDMI, S/PDIF, Bluetooth, or similar
Audio is usually continuous or remotely controllable.

Activation model
• manual – cannot be reliably started remotely (e.g. turntable)
• remote – server can start/stop capture predictably
• always_on – capture is always available

This allows controllers and UIs to behave sensibly without hard-coding device assumptions.
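As a sketch, a controller could combine the two orthogonal fields to decide its UI behavior. The field names below follow this proposal, but the helper functions themselves are purely illustrative, not part of the spec:

```python
# Hypothetical controller-side helpers; only the field names/values
# (input_type, activation) come from the proposal.

def can_auto_start(source: dict) -> bool:
    """True if the server can start capture without physical user action."""
    return source.get("activation") in ("remote", "always_on")

def should_line_sense(source: dict) -> bool:
    """Analog, manually activated inputs benefit most from line sensing."""
    return source.get("input_type") == "analog" and source.get("activation") == "manual"

turntable = {"input_type": "analog", "activation": "manual"}
hdmi_in = {"input_type": "digital", "activation": "remote"}

assert not can_auto_start(turntable)   # user must drop the needle
assert can_auto_start(hdmi_in)         # server may start capture remotely
```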

Signal presence and feedback

For analog inputs especially, it is important to avoid “playing silence”.
Therefore the role optionally supports:
• signal: present | absent | unknown (line sensing)
• level: normalized audio level (RMS/peak)

Both are optional and only reported if the source supports them.
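As an illustration, a source could derive both values from raw PCM in a few lines. This is a minimal sketch assuming 16-bit little-endian PCM; the threshold constant is a made-up example, not part of the spec:

```python
import math
import struct

def rms_level(pcm: bytes) -> float:
    """Normalized RMS (0.0-1.0) of 16-bit little-endian signed PCM samples."""
    n = len(pcm) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}h", pcm[: n * 2])
    return math.sqrt(sum(s * s for s in samples) / n) / 32768.0

# Hypothetical threshold (roughly -40 dBFS) for deriving `signal` from `level`.
SIGNAL_THRESHOLD = 0.01

def signal_state(level: float) -> str:
    return "present" if level >= SIGNAL_THRESHOLD else "absent"
```

Very low-powered devices can skip this computation entirely and simply omit both fields.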

Controller integration
• Controllers receive a list of available sources via server/state
• A new controller command select_source allows selecting an active source for a group
• Server-local inputs can be exposed as virtual source clients, using the same model

This keeps source selection aligned with existing group control concepts.

Protocol changes (high level)
• New role: source@v1
• Additions to:
• client/hello
• client/state
• client/command
• server/state
• server/command
• Binary message allocation for source audio frames
• Controller extensions for listing and selecting sources

No existing roles or message semantics are changed.

Compatibility
• Fully backward-compatible
• Existing clients and servers can ignore the new role
• No impact on playback, grouping, timing, or synchronization

Open for feedback

This proposal is meant as a starting point.

Feedback is very welcome on:
• field naming and structure
• activation semantics
• signal/level reporting
• whether anything should be simplified or removed

If parts of this feel out of scope or misaligned with Sendspin’s direction, I’m very happy to adjust or iterate.

If helpful, I am also happy to adapt or provide reference implementations.

Thanks for the great project and taking the time to review this.


jgillies commented Dec 31, 2025

This would be amazing! One thing that I’ve been toying with on my current setup (turntable preamp->ADC->ffmpeg->icecast) is using an audio fingerprinting library or service to identify what’s playing on the turntable, and injecting the metadata into the stream. If it’s in the scope of what Sendspin is intended to do, it would be awesome to support that natively.

edit: or would this be better handled by the server?


Hedda commented Jan 19, 2026

Any updates or comments? Btw, I believe I read somewhere that they were considering calling this new type of client role "sender"?

Ping @maximmaxim345 and @marcelveldt

Motivation

Several real-world setups require audio to enter Sendspin from a device rather than originate inside the server:
• Line-in or turntable inputs connected to speakers or satellites
• HDMI / ARC inputs from TVs
• Bluetooth receivers acting as a local input
• Voice assistant or microphone satellites forwarding captured audio

Today the protocol focuses primarily on server-originated playback streams. This PR adds an additive source@v1 role to represent audio inputs as first-class sources, while keeping the server responsible for processing and distribution.

For reference, and if you are interested in discussing this further, please also see these related discussions and requests:

Member

maximmaxim345 commented Jan 19, 2026

There hasn't been much progress on getting a source or sender role into the specification yet, since it would be nice to have a working implementation in Music Assistant first (so we can figure out issues with the specification of a new role before it's part of the spec). Right now it's rather convoluted to add new roles in aiosendspin (the server library used by Music Assistant), so I'm working on rewriting parts of aiosendspin first.
After that is done, getting the Visualization role included and tested (#28) is a higher priority, but I'd also love to see a source/sender role as part of the Sendspin specification.

Member

@maximmaxim345 maximmaxim345 left a comment

Thanks for the proposal and reference implementations @rudyberends !

One major thing missing with this role is sending of the base64 encoded header for opus and flac.

I think the most consistent way to solve this is to create a copy of the stream messages:

  • input_stream/start
  • input_stream/request-format (this can replace the format section of server/command.source)
  • input_stream/end

In case we ever have another role that sends data from the client to the server, these input_stream messages can be reused.

README.md Outdated
Comment on lines 544 to 548
- `format`: object - capture/encode format used by this source
- `codec`: 'opus' | 'flac' | 'pcm' - codec identifier
- `channels`: integer - number of channels (e.g., 1 = mono, 2 = stereo)
- `sample_rate`: integer - sample rate in Hz (e.g., 44100, 48000)
- `bit_depth`: integer - bit depth (e.g., 16, 24)
Member

I think we should expand this to a list of supported formats, just like we do now with the player role.
Then servers can show a dropdown of supported formats.
Just sending a format via server/command is a gamble, since we don't know whether the client supports that exact format or not, potentially causing user confusion.
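To illustrate the suggestion, a format list mirroring the player role might look like the sketch below; all field names here are assumptions based on this thread, not the final spec:

```python
# Hypothetical source@v1 support section advertising supported capture
# formats, mirroring the player role's format-list pattern.
hello_source_support = {
    "supported_formats": [
        {"codec": "opus", "sample_rate": 48000, "channels": 2, "bit_depth": 16},
        {"codec": "pcm",  "sample_rate": 44100, "channels": 2, "bit_depth": 16},
    ],
}

def choose_format(advertised: list, preferred: dict) -> dict:
    """Server-side: never request a format the client didn't advertise."""
    return preferred if preferred in advertised else advertised[0]
```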

Author

Updated the spec to match the player role.

- Bytes 1-8: timestamp (big-endian int64) - server clock time in microseconds when the first sample was captured
- Rest of bytes: encoded audio frame

The timestamp indicates when the first audio sample in this chunk was captured (in server time domain). The server may resample/transcode and then distribute the audio to players with its normal buffering and synchronization strategy.
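A minimal sketch of packing and unpacking such a frame, assuming byte 0 carries the binary message type allocated for source audio frames (the actual type value comes from the spec's binary message allocation, so the constant below is a placeholder):

```python
import struct

SOURCE_AUDIO_FRAME = 0x05  # placeholder; real value defined by the spec

def pack_source_frame(timestamp_us: int, encoded_audio: bytes) -> bytes:
    """Byte 0: message type; bytes 1-8: big-endian int64 capture timestamp
    in server-clock microseconds; remaining bytes: encoded audio frame."""
    return struct.pack(">Bq", SOURCE_AUDIO_FRAME, timestamp_us) + encoded_audio

def unpack_source_frame(frame: bytes) -> tuple:
    msg_type, ts_us = struct.unpack(">Bq", frame[:9])
    if msg_type != SOURCE_AUDIO_FRAME:
        raise ValueError(f"unexpected message type {msg_type}")
    return ts_us, frame[9:]
```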
Member

Since timestamps are in the server time domain (which may not be 100% accurate, and potential clock drift of the ADC), lets add a disclaimer or note for server implementations that the timestamps may not be continuous.
But the data itself should still be continuous.

Author

Good point — added a note to the spec clarifying that source timestamps are derived from client clock offset and may show small discontinuities/drift, while the sample stream itself should remain continuous.
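Server-side, that note could translate into a tolerance check like the following sketch (class and method names are illustrative), which flags timestamp drift while treating the sample data itself as continuous:

```python
# Illustrative server-side continuity tracker for a source stream,
# assuming frames whose duration follows from sample count and rate.

class SourceStreamTracker:
    def __init__(self, sample_rate: int, tolerance_us: int = 1_000):
        self.sample_rate = sample_rate
        self.tolerance_us = tolerance_us
        self.expected_ts_us = None

    def on_frame(self, ts_us: int, n_samples: int) -> int:
        """Return the timestamp drift (µs) versus the expected position."""
        drift = 0
        if self.expected_ts_us is not None:
            drift = ts_us - self.expected_ts_us
            # Small drift is expected (clock offset jitter, ADC drift);
            # the sample stream itself is still treated as continuous.
        self.expected_ts_us = ts_us + n_samples * 1_000_000 // self.sample_rate
        return drift
```

For example, at 48 kHz a 480-sample frame should advance the expected timestamp by exactly 10 000 µs; anything beyond the tolerance is worth logging, not dropping.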

- `supported_commands`: string[] - subset of: 'play' | 'pause' | 'stop' | 'next' | 'previous' | 'volume' | 'mute' | 'repeat_off' | 'repeat_one' | 'repeat_all' | 'shuffle' | 'unshuffle' | 'switch' | 'select_source'
- `volume`: integer - volume of the whole group, range 0-100
- `muted`: boolean - mute state of the whole group
- `sources?`: object[] - list of available/known sources on the server
Member

Let's remove the select_source command from this PR.
If we include this, it should rather be part of a future role, since it adds quite a lot of data for basic controller use cases.

Just an idea: Maybe that future role will also allow you to see your library and select a album or playlist for playback? But that's something for later.

Author

Agreed — removed select_source from this PR and left it for a future “media/inputs” role. The reference implementation has been updated accordingly (no controller command, no select/clear CLI; only source listing remains).

Comment on lines +587 to +588
- `level?`: number - optional normalized RMS/peak level (0.0-1.0), only if 'level' is supported
- `signal?`: 'unknown' | 'present' | 'absent' - optional line sensing/signal presence, only if 'line_sense' is supported
Member

I like this idea!
Especially since its optional, very low powered devices can just skip the level computation.

Comment on lines +587 to +588
- `level?`: number - optional normalized RMS/peak level (0.0-1.0), only if 'level' is supported
- `signal?`: 'unknown' | 'present' | 'absent' - optional line sensing/signal presence, only if 'line_sense' is supported
Member

What is the use case of unknown?
Maybe I'm missing something, but couldn't the client just set line_sense to false?

Author

It isn’t strictly required. We could simplify by only using present/absent, and treat signal as “unknown” when it’s omitted (or when line_sense=false).

The only reason to keep unknown is semantic clarity for clients that do support line sensing but can’t determine it yet (startup, device not ready, no samples). If we want to keep the spec minimal, dropping unknown is perfectly fine.


HarvsG commented Jan 24, 2026

I think we should consider the synchronization implications for sources with native local output that cannot be modelled as simply having both a source and client role. Many users are drawn to Sendspin for its open-source, easy-to-implement multi-room sync [citation needed], but certain hardware inputs present a challenge:

  • Local Playback Conflict: Example inputs like a TV (with built-in speakers) or a turntable (via a pre-amp) are often configured to play audio locally - whilst they would be simultaneously feeding Sendspin.
  • Lack of Latency Control: Most consumer hardware cannot report internal timestamps or buffer/delay their local output. Consequently, the local audio will play ahead of the synchronized Sendspin network stream.
  • The AV-Sync Dilemma: To achieve sync, users would have to mute the native output.
    • For video sources (TVs), this creates significant lip-sync issues as the video will remain ahead of the distributed audio.
    • For turntables this may mean muting the best speakers in the house, or switching inputs to a Sendspin client.

Proposed Requirement: If we want sources with native output to remain viable in a synced environment, the spec should optionally allow sources to:

  • Report accurate timestamp information.
  • Support internal buffering/delay to align local playback with the rest of the Sendspin group.

@rudyberends
Author

I think we should consider the synchronization implications for sources with native local output that cannot be modelled as simply having both a source and client role. Many users are drawn to Sendspin for its open-source, easy-to-implement multi-room sync [citation needed], but certain hardware inputs present a challenge:

  • Local Playback Conflict: Example inputs like a TV (with built-in speakers) or a turntable (via a pre-amp) are often configured to play audio locally - whilst they would be simultaneously feeding Sendspin.

  • Lack of Latency Control: Most consumer hardware cannot report internal timestamps or buffer/delay their local output. Consequently, the local audio will play ahead of the synchronized Sendspin network stream.

  • The AV-Sync Dilemma: To achieve sync, users would have to mute the native output.

    • For video sources (TVs), this creates significant lip-sync issues as the video will remain ahead of the distributed audio.
    • For turntables this may mean muting the best speakers in the house, or switching inputs to a Sendspin client.

Proposed Requirement: If we want sources with native output to remain viable in a synced environment, the spec should optionally allow sources to:

  • Report accurate timestamp information.
  • Support internal buffering/delay to align local playback with the rest of the Sendspin group.

Thanks for the note — totally get the concern. The key point is that source@v1 is intentionally capture‑only. The client is meant to be as dumb as possible: it timestamps audio in the server time domain (using the existing time‑sync offset) and sends frames upstream.

From there, the server already does what it does for every stream: buffer, resample/encode if needed, and distribute synchronized playback to the group. If the device also wants to hear its own input, the correct model is simply source + player on the same client, and the server will send the synchronized stream back to it like any other player.

The reference implementation already demonstrates this: a source can be selected and played back in perfect sync across multiple clients, including the device that captured the input.

So “synced playback” isn’t missing — it’s already solved by the existing server → player pipeline. What is outside scope is a source trying to keep its native local output in sync with the network stream. That would require hardware‑specific delay control and isn’t part of the source role by design.

In short: capture stays dumb, server owns sync, and local playback is handled by the standard player path.


Hedda commented Jan 25, 2026

I think we should consider the synchronization implications for sources with native local output that cannot be modelled as simply having both a source and client role. Many users are drawn to Sendspin for its open-source, easy-to-implement multi-room sync [citation needed], but certain hardware inputs present a challenge

@HarvsG also see these related discussions and requests, as they talk more about different use-case scenarios for this feature, including ideas for how a client could be an appliance-like "product" (e.g. a device based on an ESP32 or a Raspberry Pi) with two active roles at the same time, both source (capture) and player (output):


Local Playback Conflict: Example inputs like a TV (with built-in speakers) or a turntable (via a pre-amp) are often configured to play audio locally - whilst they would be simultaneously feeding Sendspin.

For such a scenario to work practically, I think the client device needs to have two active roles at the same time, both source (capture) and player (output), as that would allow the client to send the source audio to the server and then play the returning synchronized stream locally.

Hence the client device cannot simply pass the local audio through, as that would make synchronization impossible. Therefore the physical client "product" needs both inputs and outputs on the same device.

While not compatible software-wise, check out the ports on existing devices such as the "UniFi PoE Audio Port" and the "WiiM Ultra". They are referenced solely as a visual aid, to show the types of audio input and output ports that could be featured on a single device in order to do both audio capture (as an input source for the server) and playback (to local speakers) at the same time:

[images: rear-panel ports of the UniFi PoE Audio Port and the WiiM Ultra]

@rudyberends
Author

Thanks for the proposal and reference implementations @rudyberends !

One major thing missing with this role is sending of the base64 encoded header for opus and flac.

I think the most consistent way to solve this is to create a copy of the stream messages:

  • input_stream/start
  • input_stream/request-format (this can replace the format section of server/command.source)
  • input_stream/end

In case we ever have another role that sends data from the client to the server, these input_stream messages can be reused.

I agree, and I implemented exactly that in the reference flow:

  • Added input_stream/start with codec_header (base64) for Opus/FLAC
  • Added input_stream/request-format and removed format from command.source
  • Added input_stream/end, and now require input_stream/start before sending audio chunks

So the source role now mirrors the stream message pattern and is reusable for future client→server media roles.

I also added optional source control commands in the reference implementation (play/pause/next/previous/activate/deactivate). These are advertised via source@v1_support.controls and sent as command.source.control. They’re purely optional and intended for controllable sources (e.g. networked players), while line‑in sources simply omit them.
Are you OK with including this in the spec as an optional capability, or should we keep it out of the spec for now?
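For illustration, the resulting message flow might look like the sketch below; the payload field names and the example codec header are assumptions based on this thread, not the final spec:

```python
import base64
import json

# Hypothetical codec header bytes (e.g. an OpusHead packet) for illustration.
codec_header_b64 = base64.b64encode(b"OpusHead-example").decode("ascii")

start = {
    "type": "input_stream/start",
    "payload": {
        "codec": "opus",
        "sample_rate": 48000,
        "channels": 2,
        "codec_header": codec_header_b64,  # required for opus/flac
    },
}
request_format = {
    "type": "input_stream/request-format",
    "payload": {"codec": "opus", "sample_rate": 48000, "channels": 2},
}
end = {"type": "input_stream/end"}

# Messages travel as JSON text frames; binary audio chunks may only be
# sent after input_stream/start.
wire = json.dumps(start)
```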
