voice support: ElevenLabs STT + TTS on Telegram and web by cjus · Pull Request #30 · cjus/solrac

cjus · 2026-05-18T20:29:34Z

Why

Solrac's pitch was "configure, hack, converse with" — but the conversation was text-only. This PR adds optional voice on both transports without changing the agent loop, queue, audit shape, or cost-cap posture. Voice is off by default; enabling it requires VOICE_ENABLED=true plus an ELEVENLABS_API_KEY + ELEVENLABS_VOICE_ID.

Two design constraints shaped the work:

No new framework, no SDK. ElevenLabs gets a typed fetch wrapper (~175 lines, src/elevenlabs.ts) — same minimalism as src/telegram.ts. One additional library on the public surface would have been disproportionate.
No bleed into the Anthropic cost cap. Voice spend rides an independent axis (voice_events.cost_usd_estimate, sliding 60-min per-chat + global ceilings). A voice runaway never starves a Claude turn and vice versa.
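
For readers who want the second constraint in concrete terms, here is a minimal sketch of a sliding 60-minute check against a per-chat and a global ceiling. It is not the code in src/voice.ts: the `VoiceSpend` shape, the `checkVoiceCap` name, and both cap parameters are assumptions made for illustration. Only the 60-minute window, the two ceilings, and the `denied_cap` outcome come from this description.

```typescript
// Hypothetical subset of what a voice_events row might carry.
interface VoiceSpend {
  chatId: string;
  costUsd: number; // stands in for cost_usd_estimate
  atMs: number;    // event time, epoch milliseconds
}

type CapDecision = "ok" | "denied_cap";

const WINDOW_MS = 60 * 60 * 1000; // sliding 60-minute window

// Decide whether a new voice request fits under both ceilings.
// perChatCapUsd and globalCapUsd are assumed config values.
function checkVoiceCap(
  events: VoiceSpend[],
  chatId: string,
  estimateUsd: number,
  nowMs: number,
  perChatCapUsd: number,
  globalCapUsd: number,
): CapDecision {
  const cutoff = nowMs - WINDOW_MS;
  let chatTotal = 0;
  let globalTotal = 0;
  for (const e of events) {
    if (e.atMs <= cutoff) continue; // older than the window, ignore
    globalTotal += e.costUsd;
    if (e.chatId === chatId) chatTotal += e.costUsd;
  }
  // The request is denied if it would push either total past its ceiling.
  if (chatTotal + estimateUsd > perChatCapUsd) return "denied_cap";
  if (globalTotal + estimateUsd > globalCapUsd) return "denied_cap";
  return "ok";
}
```

Because the check reads only voice spend, nothing here touches the Anthropic cost cap, which is the point of keeping the two axes independent.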

What's in scope

STT. Telegram voice notes get transcribed via Scribe and re-enqueued as a synthesized text Update — so the rest of the agent doesn't know it came from audio. Web UI gets a mic button that pre-fills the composer.
TTS. /voice on (per-chat sticky, persisted in sessions.voice_replies) makes Telegram replies attach an audio voice note and tells the model to keep replies terse via a <voice-mode> system block. Web UI gets a per-message 🔊 button with a cached blob — replays don't re-bill.
Audit. New voice_events table (parallel to audit, not nested) records every attempt — ok / denied_cap / denied_gate / error — with kind, source, model, cost estimate, duration / chars. Denied-gate STTs still get a row even though they never reach audit.
Safety. Independent per-chat + global voice cost cap. Allowlist gates apply uniformly. ELEVENLABS_* and VOICE_* env vars are scrubbed from the Claude SDK subprocess (agent.ts::sanitizedSubprocessEnv) so a compromised model can't exfiltrate the billed credential.
Docs. Voice covered in CONFIG / ARCHITECTURE / USAGE / OPERATIONS / FEATURES / README, plus this PR's final commit backfilling SCHEMA (voice_events table reference + 6 debugging queries), GLOSSARY (ElevenLabs, STT, TTS, voice cost cap, voice mode, voice_events), and RUNBOOK (voice cost runaway + ElevenLabs error recovery).
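
The env scrubbing described under Safety can be pictured as a prefix filter over the environment handed to the subprocess. The sketch below is an illustration only, not the body of agent.ts::sanitizedSubprocessEnv. The `scrubVoiceEnv` name and the prefix list are assumptions derived from the ELEVENLABS_* and VOICE_* patterns named above.

```typescript
// Assumed prefixes, taken from the ELEVENLABS_* and VOICE_* patterns above.
const SCRUBBED_PREFIXES = ["ELEVENLABS_", "VOICE_"];

// Return a copy of env without any voice-related variables.
// Undefined values are dropped so the result is a plain string map.
function scrubVoiceEnv(
  env: Record<string, string | undefined>,
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [key, value] of Object.entries(env)) {
    if (value === undefined) continue;
    if (SCRUBBED_PREFIXES.some((prefix) => key.startsWith(prefix))) continue;
    out[key] = value;
  }
  return out;
}
```

In Node, `scrubVoiceEnv(process.env)` would produce the map to pass as the child's `env`. Filtering by prefix rather than by exact name means a voice variable added later is scrubbed without touching this function.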

Anti-goal status

Does not reverse any anti-goals.

No new HTTP framework — same fetch-only posture.
No new Telegram framework — sendVoice / sendAudio added to the existing typed client.
No queue server, no Docker.
ElevenLabs is one more runtime HTTP provider (joining Anthropic, Telegram, and OpenRouter / Ollama / LMStudio), but it sits behind a master switch and locks itself off via the independent cost cap.
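
To show what "fetch-only" can look like for a provider like this, here is a small sketch of a typed text-to-speech call. It is not the contents of src/elevenlabs.ts. The `synthesize`, `TtsConfig`, `FetchLike`, and `ElevenLabsError` names are hypothetical. The endpoint path, the `xi-api-key` header, and the `model_id` body field reflect my understanding of the public ElevenLabs API and should be checked against its reference. The fetch function is injected so the sketch can be exercised without the network.

```typescript
// Minimal structural type for the part of fetch this sketch needs.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; arrayBuffer(): Promise<ArrayBuffer> }>;

// Hypothetical error type carrying the HTTP status.
class ElevenLabsError extends Error {
  readonly status: number;
  constructor(status: number) {
    super(`ElevenLabs request failed with HTTP ${status}`);
    this.status = status;
  }
}

// Hypothetical config shape; modelId is left to the caller.
interface TtsConfig {
  apiKey: string;
  voiceId: string;
  modelId: string;
}

// POST the text and return the raw audio bytes.
async function synthesize(
  fetchFn: FetchLike,
  cfg: TtsConfig,
  text: string,
): Promise<ArrayBuffer> {
  const url =
    "https://api.elevenlabs.io/v1/text-to-speech/" +
    encodeURIComponent(cfg.voiceId);
  const res = await fetchFn(url, {
    method: "POST",
    headers: { "xi-api-key": cfg.apiKey, "content-type": "application/json" },
    body: JSON.stringify({ text, model_id: cfg.modelId }),
  });
  if (!res.ok) throw new ElevenLabsError(res.status);
  return res.arrayBuffer();
}
```

In production the global `fetch` can be passed as `fetchFn`; a test can pass a stub that records the request and returns canned bytes.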

Impact

17 implementation commits + 1 docs backfill commit. ~1500 lines net add across src/elevenlabs.ts, src/voice.ts, src/voice.test.ts, src/db.ts, src/config.ts, src/agent.ts, src/engine.ts, src/telegram.ts, src/main.ts, src/commands.ts, src/web.ts, public/index.html, public/app.js, public/style.css, plus docs.
29 new pure-logic tests for footer-stripping, cost math, prompt construction. Full suite passes (828/828); typecheck clean.
Default-off — operators on main see no behavior change after upgrade.

Test plan

cjus added 18 commits May 18, 2026 12:08

add typed elevenlabs http wrapper

450d30a

add voice_events table and voice helpers in db.ts

c0fbfee

parse voice and elevenlabs env vars in config.ts

77059b8

add voice.ts orchestration and pure-logic tests

4c6f784

inject voice-mode prompt at solrac.md sites; scrub voice env from sdk subprocess

7440c80

add voice env vars to .env.example

069ef78

add typed sendVoice/sendAudio helpers on TelegramClient

a8ac536

add post-turn tts hook in agent.ts and engine.ts

c1acdbc

add /voice on|off slash command

0c22fc9

wire msg.voice dispatcher and tts hook in main.ts

607cca6

add /api/stt and /api/tts routes to web server

214ae2c

add mic and speak buttons to web ui

fe73922

strip agent and engine footer from tts input

a27d51b

move voice-mode badge to header; click to disable

24dcd99

hide audio controls; toggle speak button between play and stop

0e32160

document voice support across config, usage, architecture, operations, features, readme

2080efe

punch up readme tagline to lead with conversation

f730506

backfill schema, glossary, runbook for voice

be95476

cjus merged commit 51610da into main May 18, 2026
1 check passed

cjus deleted the carlos/solrac-voice-support branch May 18, 2026 20:37

cjus merged 18 commits into main from carlos/solrac-voice-support
