DevRohit06 · Apr 11, 2026
diff --git a/.gitignore b/.gitignore
@@ -9,3 +9,4 @@ build/
 docs/dist/
 docs/.astro/
 docs/node_modules/
+.claude/
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -4,14 +4,17 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 ## Project Overview
 
-discli is a Discord CLI for AI agents and humans — a Python command-line tool for managing Discord servers, messages, reactions, threads, DMs, and events from the terminal. Published on PyPI as `discord-cli-agent`.
+discli is a Discord CLI for AI agents and humans — a Python command-line tool for managing Discord servers, messages, reactions, threads, DMs, events, voice audio, and rich interactive components (modals, workflows, dashboards) from the terminal. Published on PyPI as `discord-cli-agent`.
 
 ## Commands
 
 ```bash
 # Install (editable, with dev deps)
 pip install -e ".[dev]"
 
+# Install with voice features (TTS/STT providers)
+pip install -e ".[voice,elevenlabs,deepgram]"
+
 # Run tests
 pytest tests/ -v
 
@@ -32,13 +35,17 @@ No linter is configured. Commit style: conventional commits (`feat:`, `fix:`, `d
 
 **Key modules:**
 - `client.py` — Token resolution (flag → env var → config file), `run_discord()` sync wrapper around async Discord actions
-- `security.py` — Permission profiles (full/chat/readonly/moderation), audit logging to `~/.discli/audit.log` (JSONL), token-bucket rate limiter
+- `security.py` — Permission profiles (full/chat/readonly/moderation/voice/interact), audit logging to `~/.discli/audit.log` (JSONL), token-bucket rate limiter
 - `utils.py` — Output formatting (text/JSON via `--json` flag), channel/guild resolvers
 - `config.py` — Token storage at `~/.discli/config.json`
+- `voice_engine.py` — VoiceEngine with AudioPlayer, AudioListener, VAD-based speech segmentation, audio codec handling
+- `interact_engine.py` — InteractEngine for modals, workflows, and dashboards with interaction routing and state management
+- `tts.py` — TTS provider protocol with ElevenLabs and OpenAI implementations
+- `stt.py` — STT provider protocol with Deepgram and OpenAI Whisper implementations
 
 **Command pattern:** Each module in `commands/` defines Click commands, takes an async action function, and calls `run_discord(ctx, action)`. Follow existing modules when adding new commands.
 
-**`serve` command (commands/serve.py):** The largest module (~1100 lines). Runs a persistent bot with bidirectional JSONL over stdin/stdout. Features: event forwarding, action dispatch (send/reply/edit/delete/stream/typing/reactions/threads/polls/channels/members/roles/DMs), slash command registration, streaming message edits with periodic flush, Windows-compatible stdin reading via threading.
+**`serve` command (commands/serve.py):** The largest module (~1200 lines). Runs a persistent bot with bidirectional JSONL over stdin/stdout. Features: event forwarding, action dispatch (send/reply/edit/delete/stream/typing/reactions/threads/polls/channels/members/roles/DMs), voice actions (connect/speak/play/listen/etc.), interactive actions (workflows, dashboards), slash command registration, streaming message edits with periodic flush, Windows-compatible stdin reading via threading.
 
 ## Adding a New Command
 

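The `client.py` entry above describes token resolution as flag → env var → config file. A minimal sketch of that lookup order follows. It is not the project's code: the environment variable name `DISCORD_BOT_TOKEN` and the `"token"` key in the config file are assumptions made for illustration. Only the config path `~/.discli/config.json` comes from the diff.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional

CONFIG_PATH = Path.home() / ".discli" / "config.json"  # path named in CLAUDE.md
ENV_VAR = "DISCORD_BOT_TOKEN"  # hypothetical name, not confirmed by this diff


def resolve_token(
    flag_token: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    config_path: Path = CONFIG_PATH,
) -> Optional[str]:
    """Return the first token found: CLI flag, then env var, then config file."""
    if flag_token:
        return flag_token
    env = os.environ if env is None else env
    if env.get(ENV_VAR):
        return env[ENV_VAR]
    try:
        data = json.loads(config_path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return None  # missing or unreadable config: nothing left to try
    if isinstance(data, dict):
        return data.get("token")  # the "token" key is an assumption
    return None
```

Passing `env` and `config_path` as parameters keeps the precedence testable without touching the real environment or home directory.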
diff --git a/agents/discord-agent.md b/agents/discord-agent.md
@@ -116,6 +116,27 @@ discli event create "server" "Voice Hangout" "2026-04-01T18:00:00" --channel #vo
 discli event delete "server" <event_id>
 ```
 
+### Voice
+```bash
+discli voice join <voice_channel>
+discli voice leave <voice_channel>
+discli voice speak <voice_channel> "text" --tts <elevenlabs|openai> --voice "voice-id"
+discli voice play <voice_channel> path/to/audio.mp3
+discli voice stop <voice_channel>
+discli voice pause <voice_channel>
+discli voice resume <voice_channel>
+discli voice listen <voice_channel> --stt <deepgram|openai> --duration 10
+discli voice status <voice_channel>
+discli voice config <voice_channel> --tts elevenlabs --stt deepgram --vad enabled
+```
+
+### Interactive
+```bash
+discli interact modal <channel> <message_id> --title "Form" --custom-id "myform" --field "name::string::true" --field "email::string::true"
+discli interact workflow <channel> <message_id> --workflow-id "workflow123"
+discli interact dashboard <channel> --dashboard-id "dash1" --data '{"key": "value"}'
+```
+
 ### Live Monitoring
 ```bash
 discli listen --events messages,reactions,members,edits,deletes,voice
@@ -127,7 +148,7 @@ discli listen --server "server name" --channel "#channel"
 ```bash
 discli serve --slash-commands commands.json --status online
 ```
-**stdin commands:** `send`, `reply`, `edit`, `delete`, `typing_start`, `typing_stop`, `presence`, `reaction_add`, `reaction_remove`, `stream_start`, `stream_chunk`, `stream_end`, `interaction_followup`, `modal_send`, `channel_edit`, `channel_set_permissions`, `forum_post`, `thread_archive`, `thread_rename`, `thread_add_member`, `thread_remove_member`, `member_timeout`, `role_edit`, `reaction_users`, `poll_results`, `poll_end`, `webhook_list`, `webhook_create`, `webhook_delete`, `event_list`, `event_create`, `message_bulk_delete`
+**stdin commands:** `send`, `reply`, `edit`, `delete`, `typing_start`, `typing_stop`, `presence`, `reaction_add`, `reaction_remove`, `stream_start`, `stream_chunk`, `stream_end`, `interaction_followup`, `modal_send`, `channel_edit`, `channel_set_permissions`, `forum_post`, `thread_archive`, `thread_rename`, `thread_add_member`, `thread_remove_member`, `member_timeout`, `role_edit`, `reaction_users`, `poll_results`, `poll_end`, `webhook_list`, `webhook_create`, `webhook_delete`, `event_list`, `event_create`, `message_bulk_delete`, `voice_connect`, `voice_disconnect`, `voice_move`, `voice_speak`, `voice_play`, `voice_stop`, `voice_pause`, `voice_resume`, `voice_listen_start`, `voice_listen_stop`, `voice_status`, `voice_set_config`, `workflow_start`, `workflow_cancel`, `dashboard_create`, `dashboard_update`, `dashboard_delete`
 
 **stdin examples:**
 ```json
@@ -152,15 +173,35 @@ discli serve --slash-commands commands.json --status online
 {"action": "event_list", "guild_id": "111"}
 {"action": "event_create", "guild_id": "111", "name": "Hangout", "start_time": "2026-04-01T18:00:00", "location": "Park", "end_time": "2026-04-01T20:00:00"}
 {"action": "message_bulk_delete", "channel_id": "456", "message_ids": ["111", "222", "333"]}
+{"action": "voice_connect", "channel_id": "789"}
+{"action": "voice_disconnect", "channel_id": "789"}
+{"action": "voice_speak", "channel_id": "789", "text": "Hello everyone!", "tts": "openai", "voice": "alloy"}
+{"action": "voice_play", "channel_id": "789", "audio_url": "https://example.com/audio.mp3"}
+{"action": "voice_stop", "channel_id": "789"}
+{"action": "voice_pause", "channel_id": "789"}
+{"action": "voice_resume", "channel_id": "789"}
+{"action": "voice_listen_start", "channel_id": "789", "stt": "deepgram", "duration": 30}
+{"action": "voice_listen_stop", "channel_id": "789"}
+{"action": "voice_status", "channel_id": "789"}
+{"action": "voice_set_config", "channel_id": "789", "tts": "openai", "stt": "openai", "vad_enabled": true}
+{"action": "workflow_start", "guild_id": "111", "workflow_id": "wf123", "context": {"key": "value"}}
+{"action": "workflow_cancel", "guild_id": "111", "workflow_id": "wf123"}
+{"action": "dashboard_create", "guild_id": "111", "dashboard_id": "dash1", "data": {"title": "Dashboard"}}
+{"action": "dashboard_update", "guild_id": "111", "dashboard_id": "dash1", "data": {"title": "Updated Dashboard"}}
+{"action": "dashboard_delete", "guild_id": "111", "dashboard_id": "dash1"}
 ```
 
-**stdout events:** `ready`, `message`, `slash_command`, `message_edit`, `message_delete`, `reaction_add`, `reaction_remove`, `member_join`, `member_remove`, `voice_state`, `component_interaction`, `modal_submit`, `disconnected`, `resumed`, `response`, `error`
+**stdout events:** `ready`, `message`, `slash_command`, `message_edit`, `message_delete`, `reaction_add`, `reaction_remove`, `member_join`, `member_remove`, `voice_state`, `voice_speech_detected`, `voice_audio_received`, `component_interaction`, `modal_submit`, `workflow_event`, `dashboard_interaction`, `disconnected`, `resumed`, `response`, `error`
 
 **stdout event examples:**
 ```json
 {"event": "voice_state", "action": "joined", "member": "alice", "channel": "General", "channel_id": "456"}
+{"event": "voice_speech_detected", "channel_id": "789", "text": "What's up?", "confidence": 0.95, "language": "en"}
+{"event": "voice_audio_received", "channel_id": "789", "duration_ms": 2000, "member": "bob"}
 {"event": "component_interaction", "custom_id": "ok_btn", "user": "alice", "interaction_token": "itk"}
 {"event": "modal_submit", "custom_id": "myform", "fields": {"name": "Alice"}, "interaction_token": "itk"}
+{"event": "workflow_event", "workflow_id": "wf123", "event_type": "step_completed", "data": {"step": "validate"}}
+{"event": "dashboard_interaction", "dashboard_id": "dash1", "action": "button_click", "user": "charlie"}
 {"event": "disconnected"}
 {"event": "resumed"}
 ```

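The stdin/stdout examples in `agents/discord-agent.md` follow a one-JSON-object-per-line shape. A small sketch of how a caller might build action lines and read events back is given below. The helper names `encode_action` and `decode_events` are hypothetical and not part of discli; the `"action"` and `"event"` keys are taken from the examples in the diff.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def encode_action(action: str, **fields: Any) -> str:
    """Serialize one serve-mode action as a single JSONL line."""
    payload = {"action": action, **fields}
    # json.dumps escapes newlines inside strings, so the result stays on one line.
    return json.dumps(payload, separators=(",", ":")) + "\n"


def decode_events(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield parsed events from stdout lines, skipping blanks and non-JSON output."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore anything that is not a JSON line
        if isinstance(obj, dict) and "event" in obj:
            yield obj
```

In practice the encoded lines would be written to the stdin of a `discli serve` process and `decode_events` fed from its stdout. That wiring is left out here because it depends on how the caller manages the subprocess.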
diff --git a/docs/plans/2026-04-11-voice-and-interactive-design.md b/docs/plans/2026-04-11-voice-and-interactive-design.md
@@ -0,0 +1,144 @@
+# Voice & Interactive Features Design
+
+**Date:** 2026-04-11
+**Status:** Approved
+**Approach:** Layered Modules (engines separate from CLI/serve)
+
+## Summary
+
+Add full-duplex voice (TTS speak + STT listen/transcribe), audio playback, and rich interactive components (modals, multi-step workflows, persistent dashboards) to discli. All features are exposed both as CLI commands and as serve mode JSONL actions.
+
+## Voice Engine
+
+### Dependencies
+
+Required:
+- `PyNaCl` — discord.py voice encryption
+- `discord-ext-voice-recv` — voice receive support (discord.py doesn't support it natively)
+- `silero-vad` — voice activity detection
+- `audioop-lts` — PCM audio manipulation (Python 3.13+ compatible)
+
+TTS providers (optional extras):
+- `elevenlabs` — best quality, ~200ms TTFB, streaming (recommended default)
+- `openai` — solid middle ground
+- `piper-tts` — local/offline, lightweight
+
+STT providers (optional extras):
+- `deepgram-sdk` — real-time WebSocket streaming, ~100-300ms (recommended default)
+- `openai` — Whisper API, batch only
+- `faster-whisper` — local, near-real-time with VAD
+
+### Architecture (`src/discli/voice_engine.py`)
+
+- **`VoiceEngine`** — connection pool (one voice client per guild), exposes async methods for connect/disconnect/move/speak/play/listen
+- **`TTSProvider` protocol** — `async synthesize(text, voice, speed) -> AsyncIterator[bytes]` (streaming PCM chunks)
+- **`STTProvider` protocol** — `async transcribe(audio_stream) -> AsyncIterator[TranscriptionResult]` (streaming partial + final results)
+- **`AudioPlayer`** — per-connection queue-based player. Handles TTS output + file/URL playback. Supports interrupt priority.
+- **`AudioListener`** — wraps discord-ext-voice-recv, per-user PCM buffers, silero-vad for speech segmentation, feeds segments to STT provider
+- **`VoiceSession`** — full-duplex: simultaneous listen + speak on one connection
+
+### Configuration (`~/.discli/config.json`)
+
+```json
+{
+  "voice": {
+    "tts_provider": "elevenlabs",
+    "tts_voice": "default",
+    "stt_provider": "deepgram",
+    "vad_threshold": 0.5,
+    "silence_duration_ms": 800,
+    "playback_volume": 1.0
+  }
+}
+```
+
+API keys via env vars: `ELEVENLABS_API_KEY`, `DEEPGRAM_API_KEY`, `OPENAI_API_KEY`.
+
+### CLI Commands (`commands/voice.py`)
+
+- `discli voice join <channel>` / `leave` / `move` / `status`
+- `discli voice speak <text> [--voice] [--speed]`
+- `discli voice play <source> [--volume]` (file path or URL)
+- `discli voice stop` / `pause` / `resume`
+- `discli voice listen [--duration] [--continuous]`
+- `discli voice converse [--channel]` — full duplex mode
+
+### Serve Mode Actions
+
+- `voice_connect`, `voice_disconnect`, `voice_move`
+- `voice_speak`, `voice_play`, `voice_stop`, `voice_pause`, `voice_resume`
+- `voice_listen_start`, `voice_listen_stop`
+- `voice_set_config`
+
+### Serve Mode Events
+
+- `voice_transcription` — `{user_id, username, text, confidence, channel_id, is_partial}`
+- `voice_playback_started` / `voice_playback_finished`
+- `voice_connected` / `voice_disconnected`
+- `voice_user_speaking` / `voice_user_silent`
+
+## Interactive Engine
+
+### Architecture (`src/discli/interact_engine.py`)
+
+**Modals & Forms:**
+- `Modal` class — builds Discord modals with text inputs (short/paragraph), validation rules
+- Serve action: `modal_send {trigger_interaction_id, title, fields}`
+- Serve event: `modal_submit {custom_id, user_id, values}`
+
+**Multi-Step Workflows:**
+- `Workflow` class — sequence of steps (message, select, modal, confirm), tracks user state per `(user_id, workflow_id)`
+- Supports conditional branching based on user input
+- Configurable per-step timeout
+- Serve action: `workflow_start {user_id, channel_id, workflow_definition}` / `workflow_cancel`
+- Serve events: `workflow_step_completed`, `workflow_finished`, `workflow_timeout`
+
+**Persistent Dashboards:**
+- `Dashboard` class — auto-updating message with embeds + components
+- Pagination, role menus, live counters
+- State in memory with optional JSON file persistence
+- Serve action: `dashboard_create` / `dashboard_update` / `dashboard_delete`
+- Serve event: `dashboard_interaction`
+
+**Interaction Router:**
+Central dispatcher for `on_interaction` events, routes by `custom_id` prefix:
+- `modal:` → modal handler
+- `wf:` → workflow handler
+- `dash:` → dashboard handler
+- `voice:` → voice engine
+
+### CLI Commands (`commands/interact.py`)
+
+- `discli interact modal <title> --field "Name:short:required" ...`
+- `discli interact workflow <definition.json>`
+- `discli interact dashboard create|update|delete|list`
+
+## Integration
+
+**Serve mode:** Both engines initialized lazily. Actions registered as thin handlers in serve.py's dispatch table. Events flow through existing JSONL emission system. Everything on the existing asyncio event loop.
+
+**CLI:** Commands follow existing pattern (Click group -> async action -> `run_discord()`). Persistent commands (listen, converse) run until Ctrl+C or --duration.
+
+**Security (security.py):**
+- New permission scopes: `voice`, `interact`
+- `readonly`: can view status but not connect/send
+- `chat`: gets `interact` but not `voice`
+- `full`: gets everything
+- `moderation`: gets `voice` (monitoring) + `interact`
+- All actions audit-logged
+
+**Error handling:**
+- Voice connection failures → clear error in JSONL/CLI
+- Provider failures → fallback to next provider if available, otherwise error event
+- Workflow timeouts → cleanup state + event
+
+## Dependency Groups (`pyproject.toml`)
+
+```toml
+[project.optional-dependencies]
+voice = ["PyNaCl", "discord-ext-voice-recv", "silero-vad", "audioop-lts", "elevenlabs", "deepgram-sdk"]
+local-voice = ["PyNaCl", "discord-ext-voice-recv", "silero-vad", "audioop-lts", "piper-tts", "faster-whisper"]
+openai-voice = ["PyNaCl", "discord-ext-voice-recv", "silero-vad", "audioop-lts", "openai"]
+interact = []
+dev = ["pytest", "pytest-asyncio"]
+```
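The design doc's "Interaction Router" section describes routing `on_interaction` events by `custom_id` prefix (`modal:`, `wf:`, `dash:`, `voice:`). One way this could look is sketched below, under stated assumptions: the `InteractionRouter` class, its method names, and the handler signature are hypothetical and do not come from `interact_engine.py`. Only the prefix convention is taken from the doc.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Optional

# Hypothetical handler shape: receives the custom_id remainder and the raw payload.
Handler = Callable[[str, dict], Awaitable[str]]


class InteractionRouter:
    """Dispatch interactions to handlers keyed by custom_id prefix."""

    def __init__(self) -> None:
        self._routes: Dict[str, Handler] = {}

    def register(self, prefix: str, handler: Handler) -> None:
        """Register a handler for a prefix such as "wf:" or "dash:"."""
        self._routes[prefix] = handler

    async def dispatch(self, custom_id: str, payload: dict) -> Optional[str]:
        """Route by the text before the first colon; return None if unrouted."""
        prefix, sep, rest = custom_id.partition(":")
        handler = self._routes.get(prefix + sep) if sep else None
        if handler is None:
            return None  # unknown or missing prefix: the caller decides what to emit
        return await handler(rest, payload)


async def workflow_handler(rest: str, payload: dict) -> str:
    # Stand-in for the workflow engine; just echoes what it was given.
    return f"workflow step {rest} from {payload.get('user', 'unknown')}"


if __name__ == "__main__":
    router = InteractionRouter()
    router.register("wf:", workflow_handler)
    print(asyncio.run(router.dispatch("wf:validate", {"user": "alice"})))
```

Keeping the router a plain prefix table means each engine can register its own handler lazily, which fits the doc's note that both engines are initialized on demand.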