
voice support: ElevenLabs STT + TTS on Telegram and web #30

Merged
cjus merged 18 commits into main from carlos/solrac-voice-support
May 18, 2026


@cjus cjus commented May 18, 2026

Why

Solrac's pitch was "configure, hack, converse with", but the conversation was text-only. This PR adds optional voice on both transports without changing the agent loop, queue, audit shape, or cost-cap posture. Voice is off by default; enabling it requires VOICE_ENABLED=true plus both ELEVENLABS_API_KEY and ELEVENLABS_VOICE_ID.
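For reviewers, the gating described above can be sketched roughly as follows. This is an illustration only: `loadVoiceConfig` and `VoiceConfig` are hypothetical names, not the actual contents of src/config.ts, and throwing on a missing credential is an assumption about how a half-configured setup might be handled.

```typescript
// Hypothetical sketch; the real src/config.ts may be shaped differently.
interface VoiceConfig {
  apiKey: string;
  voiceId: string;
}

// Returns null when voice is off (the default) and a config when the master
// switch and both credentials are present. Throwing when the switch is on but
// a credential is missing is an assumption made for this sketch.
function loadVoiceConfig(
  env: Record<string, string | undefined>,
): VoiceConfig | null {
  if (env["VOICE_ENABLED"] !== "true") return null;
  const apiKey = env["ELEVENLABS_API_KEY"];
  const voiceId = env["ELEVENLABS_VOICE_ID"];
  if (!apiKey || !voiceId) {
    throw new Error(
      "VOICE_ENABLED=true requires ELEVENLABS_API_KEY and ELEVENLABS_VOICE_ID",
    );
  }
  return { apiKey, voiceId };
}
```

Passing the environment in as a parameter, rather than reading it globally, keeps the check testable as pure logic.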

Two design constraints shaped the work:

  • No new framework, no SDK. ElevenLabs gets a typed fetch wrapper (~175 lines, src/elevenlabs.ts), with the same minimalism as src/telegram.ts. Pulling in an additional library for a single provider would have been disproportionate.
  • No bleed into the Anthropic cost cap. Voice spend rides an independent axis (voice_events.cost_usd_estimate, sliding 60-min per-chat + global ceilings). A voice runaway never starves a Claude turn and vice versa.
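A minimal sketch of the independent cost axis, assuming the check sums cost_usd_estimate over the trailing 60 minutes and denies when the next request would push either ceiling over. The names `checkVoiceCap`, `VoiceCaps`, and the in-memory event array are hypothetical; the PR itself stores events in the voice_events table, and whether it counts the pending request's cost is not stated here.

```typescript
// Hypothetical in-memory model of rows in voice_events.
interface VoiceEvent {
  chatId: string;
  atMs: number; // when the attempt was recorded (epoch ms)
  costUsdEstimate: number;
}

// Hypothetical ceiling names.
interface VoiceCaps {
  perChatUsd: number;
  globalUsd: number;
}

const WINDOW_MS = 60 * 60 * 1000; // sliding 60-minute window

type CapDecision = "ok" | "denied_cap";

function checkVoiceCap(
  events: VoiceEvent[],
  chatId: string,
  nextCostUsd: number,
  caps: VoiceCaps,
  nowMs: number,
): CapDecision {
  const cutoff = nowMs - WINDOW_MS;
  let chatSpend = 0;
  let globalSpend = 0;
  for (const e of events) {
    if (e.atMs <= cutoff) continue; // older than the window: ignored
    globalSpend += e.costUsdEstimate;
    if (e.chatId === chatId) chatSpend += e.costUsdEstimate;
  }
  // Assumption: the pending request's estimate counts toward both ceilings.
  if (chatSpend + nextCostUsd > caps.perChatUsd) return "denied_cap";
  if (globalSpend + nextCostUsd > caps.globalUsd) return "denied_cap";
  return "ok";
}
```

Because the function only reads voice spend, nothing here touches the Anthropic cap, which is the separation the second constraint describes.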

What's in scope

  • STT. Telegram voice notes get transcribed via Scribe and re-enqueued as a synthesized text Update — so the rest of the agent doesn't know it came from audio. Web UI gets a mic button that pre-fills the composer.
  • TTS. /voice on (per-chat sticky, persisted in sessions.voice_replies) makes Telegram replies attach an audio voice note and tells the model to keep replies terse via a <voice-mode> system block. Web UI gets a per-message 🔊 button with a cached blob — replays don't re-bill.
  • Audit. New voice_events table (parallel to audit, not nested) records every attempt — ok / denied_cap / denied_gate / error — with kind, source, model, cost estimate, duration / chars. Denied-gate STTs still get a row even though they never reach audit.
  • Safety. Independent per-chat + global voice cost cap. Allowlist gates apply uniformly. ELEVENLABS_* and VOICE_* env vars are scrubbed from the Claude SDK subprocess (agent.ts::sanitizedSubprocessEnv) so a compromised model can't exfiltrate the billed credential.
  • Docs. Voice covered in CONFIG / ARCHITECTURE / USAGE / OPERATIONS / FEATURES / README, plus this PR's final commit backfilling SCHEMA (voice_events table reference + 6 debugging queries), GLOSSARY (ElevenLabs, STT, TTS, voice cost cap, voice mode, voice_events), and RUNBOOK (voice cost runaway + ElevenLabs error recovery).
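The env scrubbing in the Safety bullet can be pictured along these lines. This is a sketch under assumptions, not the body of agent.ts::sanitizedSubprocessEnv: the helper name `scrubVoiceEnv` and the prefix-match approach are illustrative, and the real function may filter other variables as well.

```typescript
// Prefixes named in the PR description as scrubbed from the subprocess env.
const SCRUBBED_PREFIXES = ["ELEVENLABS_", "VOICE_"];

// Hypothetical helper: returns a copy of the environment without any
// ELEVENLABS_* or VOICE_* entries. The input object is not mutated.
function scrubVoiceEnv(
  env: Record<string, string | undefined>,
): Record<string, string | undefined> {
  const out: Record<string, string | undefined> = {};
  for (const [key, value] of Object.entries(env)) {
    if (SCRUBBED_PREFIXES.some((prefix) => key.startsWith(prefix))) continue;
    out[key] = value;
  }
  return out;
}
```

Matching on prefixes rather than an explicit key list means a future VOICE_* or ELEVENLABS_* variable would be dropped without a code change, at the cost of also dropping any unrelated variable that happens to share a prefix.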

Anti-goal status

Does not reverse any anti-goals.

  • No new HTTP framework — same fetch-only posture.
  • No new Telegram framework — sendVoice / sendAudio added to the existing typed client.
  • No queue server, no Docker.
  • ElevenLabs is one more runtime HTTP provider (alongside Anthropic, Telegram, and OpenRouter / Ollama / LMStudio), but it sits behind a master switch and locks itself off via the independent cost cap.

Impact

  • 17 implementation commits + 1 docs backfill commit. ~1500 lines net add across src/elevenlabs.ts, src/voice.ts, src/voice.test.ts, src/db.ts, src/config.ts, src/agent.ts, src/engine.ts, src/telegram.ts, src/main.ts, src/commands.ts, src/web.ts, public/index.html, public/app.js, public/style.css, plus docs.
  • 29 new pure-logic tests for footer-stripping, cost math, prompt construction. Full suite passes (828/828); typecheck clean.
  • Default-off — operators on main see no behavior change after upgrade.

Test plan

  • npm run typecheck — clean
  • bun test — 828/828 pass
  • Telegram voice note → transcribed → routed through engine
  • /voice on → Telegram reply attaches voice note
  • Web mic button → composer pre-fill
  • Web 🔊 button → first click synthesizes, subsequent clicks replay from cached blob
  • Voice mode badge toggles on header click
  • Cost cap (per-chat + global) writes denied_cap rows when hit
  • Allowlist gate writes denied_gate rows
  • ElevenLabs auth failure surfaces as error row with verbatim upstream message
  • Reviewer: smoke-test with their own ElevenLabs key before merge
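The "first click synthesizes, subsequent clicks replay" item can be checked against a mental model like the one below. It is a generic sketch, not the code in public/app.js: `makeAudioCache` is a hypothetical name, the cache is keyed by an assumed message id, and the cached value stands in for the audio blob.

```typescript
// Caches one in-flight or completed synthesis per message id, so repeated
// clicks on the same message reuse the first result instead of re-billing.
function makeAudioCache<T>(
  synthesize: (messageId: string) => Promise<T>,
): (messageId: string) => Promise<T> {
  const cache = new Map<string, Promise<T>>();
  return function getAudio(messageId: string): Promise<T> {
    let hit = cache.get(messageId);
    if (!hit) {
      hit = synthesize(messageId).catch((err: unknown) => {
        // Assumption: failures are not cached, so a later click can retry.
        cache.delete(messageId);
        throw err;
      });
      cache.set(messageId, hit);
    }
    return hit;
  };
}
```

Storing the promise rather than the resolved value also covers a double-click while the first request is still in flight.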

@cjus cjus merged commit 51610da into main May 18, 2026
1 check passed
@cjus cjus deleted the carlos/solrac-voice-support branch May 18, 2026 20:37