1 change: 1 addition & 0 deletions .gitignore
@@ -9,3 +9,4 @@ build/
docs/dist/
docs/.astro/
docs/node_modules/
.claude/
13 changes: 10 additions & 3 deletions CLAUDE.md
@@ -4,14 +4,17 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

## Project Overview

discli is a Discord CLI for AI agents and humans — a Python command-line tool for managing Discord servers, messages, reactions, threads, DMs, and events from the terminal. Published on PyPI as `discord-cli-agent`.
discli is a Discord CLI for AI agents and humans — a Python command-line tool for managing Discord servers, messages, reactions, threads, DMs, events, voice audio, and rich interactive components (modals, workflows, dashboards) from the terminal. Published on PyPI as `discord-cli-agent`.

## Commands

```bash
# Install (editable, with dev deps)
pip install -e ".[dev]"

# Install with voice features (TTS/STT providers)
pip install -e ".[voice,elevenlabs,deepgram]"

# Run tests
pytest tests/ -v

@@ -32,13 +35,17 @@ No linter is configured. Commit style: conventional commits (`feat:`, `fix:`, `d

**Key modules:**
- `client.py` — Token resolution (flag → env var → config file), `run_discord()` sync wrapper around async Discord actions
- `security.py` — Permission profiles (full/chat/readonly/moderation), audit logging to `~/.discli/audit.log` (JSONL), token-bucket rate limiter
- `security.py` — Permission profiles (full/chat/readonly/moderation/voice/interact), audit logging to `~/.discli/audit.log` (JSONL), token-bucket rate limiter
- `utils.py` — Output formatting (text/JSON via `--json` flag), channel/guild resolvers
- `config.py` — Token storage at `~/.discli/config.json`
- `voice_engine.py` — VoiceEngine with AudioPlayer, AudioListener, VAD-based speech segmentation, audio codec handling
- `interact_engine.py` — InteractEngine for modals, workflows, and dashboards with interaction routing and state management
- `tts.py` — TTS provider protocol with ElevenLabs and OpenAI implementations
- `stt.py` — STT provider protocol with Deepgram and OpenAI Whisper implementations

**Command pattern:** Each module in `commands/` defines Click commands, takes an async action function, and calls `run_discord(ctx, action)`. Follow existing modules when adding new commands.

**`serve` command (commands/serve.py):** The largest module (~1100 lines). Runs a persistent bot with bidirectional JSONL over stdin/stdout. Features: event forwarding, action dispatch (send/reply/edit/delete/stream/typing/reactions/threads/polls/channels/members/roles/DMs), slash command registration, streaming message edits with periodic flush, Windows-compatible stdin reading via threading.
**`serve` command (commands/serve.py):** The largest module (~1200 lines). Runs a persistent bot with bidirectional JSONL over stdin/stdout. Features: event forwarding, action dispatch (send/reply/edit/delete/stream/typing/reactions/threads/polls/channels/members/roles/DMs), voice actions (connect/speak/play/listen/etc.), interactive actions (workflows, dashboards), slash command registration, streaming message edits with periodic flush, Windows-compatible stdin reading via threading.

## Adding a New Command

45 changes: 43 additions & 2 deletions agents/discord-agent.md
@@ -116,6 +116,27 @@ discli event create "server" "Voice Hangout" "2026-04-01T18:00:00" --channel #vo
discli event delete "server" <event_id>
```

### Voice
```bash
discli voice join <voice_channel>
discli voice leave <voice_channel>
discli voice speak <voice_channel> "text" --tts elevenlabs|openai --voice "voice-id"
discli voice play <voice_channel> path/to/audio.mp3
discli voice stop <voice_channel>
discli voice pause <voice_channel>
discli voice resume <voice_channel>
discli voice listen <voice_channel> --stt deepgram|openai --duration 10
discli voice status <voice_channel>
discli voice config <voice_channel> --tts elevenlabs --stt deepgram --vad enabled
```

### Interactive
```bash
discli interact modal <channel> <message_id> --title "Form" --custom-id "myform" --field "name::string::true" --field "email::string::true"
discli interact workflow <channel> <message_id> --workflow-id "workflow123"
discli interact dashboard <channel> --dashboard-id "dash1" --data '{"key": "value"}'
```

### Live Monitoring
```bash
discli listen --events messages,reactions,members,edits,deletes,voice
@@ -127,7 +148,7 @@ discli listen --server "server name" --channel "#channel"
```bash
discli serve --slash-commands commands.json --status online
```
**stdin commands:** `send`, `reply`, `edit`, `delete`, `typing_start`, `typing_stop`, `presence`, `reaction_add`, `reaction_remove`, `stream_start`, `stream_chunk`, `stream_end`, `interaction_followup`, `modal_send`, `channel_edit`, `channel_set_permissions`, `forum_post`, `thread_archive`, `thread_rename`, `thread_add_member`, `thread_remove_member`, `member_timeout`, `role_edit`, `reaction_users`, `poll_results`, `poll_end`, `webhook_list`, `webhook_create`, `webhook_delete`, `event_list`, `event_create`, `message_bulk_delete`
**stdin commands:** `send`, `reply`, `edit`, `delete`, `typing_start`, `typing_stop`, `presence`, `reaction_add`, `reaction_remove`, `stream_start`, `stream_chunk`, `stream_end`, `interaction_followup`, `modal_send`, `channel_edit`, `channel_set_permissions`, `forum_post`, `thread_archive`, `thread_rename`, `thread_add_member`, `thread_remove_member`, `member_timeout`, `role_edit`, `reaction_users`, `poll_results`, `poll_end`, `webhook_list`, `webhook_create`, `webhook_delete`, `event_list`, `event_create`, `message_bulk_delete`, `voice_connect`, `voice_disconnect`, `voice_move`, `voice_speak`, `voice_play`, `voice_stop`, `voice_pause`, `voice_resume`, `voice_listen_start`, `voice_listen_stop`, `voice_status`, `voice_set_config`, `workflow_start`, `workflow_cancel`, `dashboard_create`, `dashboard_update`, `dashboard_delete`

**stdin examples:**
```json
@@ -152,15 +173,35 @@ discli serve --slash-commands commands.json --status online
{"action": "event_list", "guild_id": "111"}
{"action": "event_create", "guild_id": "111", "name": "Hangout", "start_time": "2026-04-01T18:00:00", "location": "Park", "end_time": "2026-04-01T20:00:00"}
{"action": "message_bulk_delete", "channel_id": "456", "message_ids": ["111", "222", "333"]}
{"action": "voice_connect", "channel_id": "789"}
{"action": "voice_disconnect", "channel_id": "789"}
{"action": "voice_speak", "channel_id": "789", "text": "Hello everyone!", "tts": "elevenlabs", "voice": "alloy"}
{"action": "voice_play", "channel_id": "789", "audio_url": "https://example.com/audio.mp3"}
{"action": "voice_stop", "channel_id": "789"}
{"action": "voice_pause", "channel_id": "789"}
{"action": "voice_resume", "channel_id": "789"}
{"action": "voice_listen_start", "channel_id": "789", "stt": "deepgram", "duration": 30}
{"action": "voice_listen_stop", "channel_id": "789"}
{"action": "voice_status", "channel_id": "789"}
{"action": "voice_set_config", "channel_id": "789", "tts": "openai", "stt": "openai", "vad_enabled": true}
{"action": "workflow_start", "guild_id": "111", "workflow_id": "wf123", "context": {"key": "value"}}
{"action": "workflow_cancel", "guild_id": "111", "workflow_id": "wf123"}
{"action": "dashboard_create", "guild_id": "111", "dashboard_id": "dash1", "data": {"title": "Dashboard"}}
{"action": "dashboard_update", "guild_id": "111", "dashboard_id": "dash1", "data": {"title": "Updated Dashboard"}}
{"action": "dashboard_delete", "guild_id": "111", "dashboard_id": "dash1"}
```

**stdout events:** `ready`, `message`, `slash_command`, `message_edit`, `message_delete`, `reaction_add`, `reaction_remove`, `member_join`, `member_remove`, `voice_state`, `component_interaction`, `modal_submit`, `disconnected`, `resumed`, `response`, `error`
**stdout events:** `ready`, `message`, `slash_command`, `message_edit`, `message_delete`, `reaction_add`, `reaction_remove`, `member_join`, `member_remove`, `voice_state`, `voice_speech_detected`, `voice_audio_received`, `component_interaction`, `modal_submit`, `workflow_event`, `dashboard_interaction`, `disconnected`, `resumed`, `response`, `error`

**stdout event examples:**
```json
{"event": "voice_state", "action": "joined", "member": "alice", "channel": "General", "channel_id": "456"}
{"event": "voice_speech_detected", "channel_id": "789", "text": "What's up?", "confidence": 0.95, "language": "en"}
{"event": "voice_audio_received", "channel_id": "789", "duration_ms": 2000, "member": "bob"}
{"event": "component_interaction", "custom_id": "ok_btn", "user": "alice", "interaction_token": "itk"}
{"event": "modal_submit", "custom_id": "myform", "fields": {"name": "Alice"}, "interaction_token": "itk"}
{"event": "workflow_event", "workflow_id": "wf123", "event_type": "step_completed", "data": {"step": "validate"}}
{"event": "dashboard_interaction", "dashboard_id": "dash1", "action": "button_click", "user": "charlie"}
{"event": "disconnected"}
{"event": "resumed"}
```
144 changes: 144 additions & 0 deletions docs/plans/2026-04-11-voice-and-interactive-design.md
@@ -0,0 +1,144 @@
# Voice & Interactive Features Design

**Date:** 2026-04-11
**Status:** Approved
**Approach:** Layered Modules (engines separate from CLI/serve)

## Summary

Add full-duplex voice (TTS speak + STT listen/transcribe), audio playback, and rich interactive components (modals, multi-step workflows, persistent dashboards) to discli. Every feature is exposed both as a CLI command and as a serve-mode JSONL action.

## Voice Engine

### Dependencies

Required:
- `PyNaCl` — discord.py voice encryption
- `discord-ext-voice-recv` — voice receive support (discord.py doesn't support it natively)
- `silero-vad` — voice activity detection
- `audioop-lts` — PCM audio manipulation (Python 3.13+ compatible)

TTS providers (optional extras):
- `elevenlabs` — best quality, ~200ms TTFB, streaming (recommended default)
- `openai` — solid middle ground
- `piper-tts` — local/offline, lightweight

STT providers (optional extras):
- `deepgram-sdk` — real-time WebSocket streaming, ~100-300ms (recommended default)
- `openai` — Whisper API, batch only
- `faster-whisper` — local, near-real-time with VAD

### Architecture (`src/discli/voice_engine.py`)

- **`VoiceEngine`** — connection pool (one voice client per guild), exposes async methods for connect/disconnect/move/speak/play/listen
- **`TTSProvider` protocol** — `async synthesize(text, voice, speed) -> AsyncIterator[bytes]` (streaming PCM chunks)
- **`STTProvider` protocol** — `async transcribe(audio_stream) -> AsyncIterator[TranscriptionResult]` (streaming partial + final results)
- **`AudioPlayer`** — per-connection queue-based player. Handles TTS output + file/URL playback. Supports interrupt priority.
- **`AudioListener`** — wraps discord-ext-voice-recv, per-user PCM buffers, silero-vad for speech segmentation, feeds segments to STT provider
- **`VoiceSession`** — full-duplex: simultaneous listen + speak on one connection
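
The two provider protocols above could be expressed with `typing.Protocol`, roughly as follows. This is a minimal sketch, not the actual `tts.py`/`stt.py` code: the fields of `TranscriptionResult`, the `SilenceTTS` toy provider, and the `collect` helper are all assumptions made for illustration. The frame size assumes the 48 kHz, 16-bit stereo PCM that Discord voice uses.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Protocol


@dataclass
class TranscriptionResult:
    # Assumed fields: the design names the type but not its shape.
    text: str
    is_partial: bool
    confidence: float = 1.0


class TTSProvider(Protocol):
    # Declared with a plain "def": calling an async generator function
    # returns the AsyncIterator directly, without an await.
    def synthesize(self, text: str, voice: str, speed: float) -> AsyncIterator[bytes]: ...


class STTProvider(Protocol):
    def transcribe(
        self, audio_stream: AsyncIterator[bytes]
    ) -> AsyncIterator[TranscriptionResult]: ...


class SilenceTTS:
    """Toy provider (hypothetical): one 20 ms frame of silence per word."""

    FRAME_BYTES = 3840  # 48 kHz * 2 channels * 2 bytes/sample * 0.02 s

    async def synthesize(self, text: str, voice: str, speed: float) -> AsyncIterator[bytes]:
        for _ in text.split():
            yield bytes(self.FRAME_BYTES)


async def collect(provider: TTSProvider, text: str) -> list[bytes]:
    # Drain the stream; a real AudioPlayer would enqueue chunks instead.
    return [chunk async for chunk in provider.synthesize(text, "default", 1.0)]


chunks = asyncio.run(collect(SilenceTTS(), "hello voice channel"))
print(len(chunks), len(chunks[0]))
```

Because the protocols are structural, a provider only needs a method with the matching name and signature; it does not have to inherit from a base class.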

### Configuration (`~/.discli/config.json`)

```json
{
"voice": {
"tts_provider": "elevenlabs",
"tts_voice": "default",
"stt_provider": "deepgram",
"vad_threshold": 0.5,
"silence_duration_ms": 800,
"playback_volume": 1.0
}
}
```

API keys via env vars: `ELEVENLABS_API_KEY`, `DEEPGRAM_API_KEY`, `OPENAI_API_KEY`.
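
One way to read the `voice` section with defaults for missing keys is sketched below. The `VoiceConfig` dataclass and both helper functions are hypothetical names, not part of the existing `config.py`; the default values are taken from the JSON example above.

```python
import json
from dataclasses import dataclass, fields
from pathlib import Path


@dataclass
class VoiceConfig:
    # Defaults mirror the example config in this design.
    tts_provider: str = "elevenlabs"
    tts_voice: str = "default"
    stt_provider: str = "deepgram"
    vad_threshold: float = 0.5
    silence_duration_ms: int = 800
    playback_volume: float = 1.0


def voice_config_from_dict(raw: dict) -> VoiceConfig:
    """Build a VoiceConfig from a parsed config, ignoring unknown keys."""
    section = raw.get("voice", {})
    known = {f.name for f in fields(VoiceConfig)}
    return VoiceConfig(**{k: v for k, v in section.items() if k in known})


def load_voice_config(path: Path) -> VoiceConfig:
    """Read the config file; a missing file yields the defaults."""
    try:
        raw = json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        raw = {}
    return voice_config_from_dict(raw)


cfg = voice_config_from_dict({"voice": {"vad_threshold": 0.7, "unknown_key": 1}})
print(cfg)
```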

### CLI Commands (`commands/voice.py`)

- `discli voice join <channel>` / `leave` / `move` / `status`
- `discli voice speak <text> [--voice] [--speed]`
- `discli voice play <source> [--volume]` (file path or URL)
- `discli voice stop` / `pause` / `resume`
- `discli voice listen [--duration] [--continuous]`
- `discli voice converse [--channel]` — full duplex mode

### Serve Mode Actions

- `voice_connect`, `voice_disconnect`, `voice_move`
- `voice_speak`, `voice_play`, `voice_stop`, `voice_pause`, `voice_resume`
- `voice_listen_start`, `voice_listen_stop`
- `voice_set_config`

### Serve Mode Events

- `voice_transcription` — `{user_id, username, text, confidence, channel_id, is_partial}`
- `voice_playback_started` / `voice_playback_finished`
- `voice_connected` / `voice_disconnected`
- `voice_user_speaking` / `voice_user_silent`
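
As an illustration of the event shape, the sketch below serializes a `voice_transcription` event to a single JSONL line. The function name is hypothetical, and the top-level `"event"` key is an assumption based on how other serve-mode events are written in the agent documentation.

```python
import json


def voice_transcription_event(
    *,
    user_id: str,
    username: str,
    text: str,
    confidence: float,
    channel_id: str,
    is_partial: bool,
) -> str:
    """Return one JSONL line carrying the fields listed in the design."""
    payload = {
        "event": "voice_transcription",  # assumed key name for the event type
        "user_id": user_id,
        "username": username,
        "text": text,
        "confidence": confidence,
        "channel_id": channel_id,
        "is_partial": is_partial,
    }
    # json.dumps escapes newlines inside strings, so the result is one line.
    return json.dumps(payload, ensure_ascii=False)


line = voice_transcription_event(
    user_id="42",
    username="alice",
    text="What's up?",
    confidence=0.95,
    channel_id="789",
    is_partial=False,
)
print(line)
```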

## Interactive Engine

### Architecture (`src/discli/interact_engine.py`)

**Modals & Forms:**
- `Modal` class — builds Discord modals with text inputs (short/paragraph), validation rules
- Serve action: `modal_send {trigger_interaction_id, title, fields}`
- Serve event: `modal_submit {custom_id, user_id, values}`

**Multi-Step Workflows:**
- `Workflow` class — sequence of steps (message, select, modal, confirm), tracks user state per `(user_id, workflow_id)`
- Supports conditional branching based on user input
- Configurable per-step timeout
- Serve action: `workflow_start {user_id, channel_id, workflow_definition}` / `workflow_cancel`
- Serve events: `workflow_step_completed`, `workflow_finished`, `workflow_timeout`
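
The per-user state tracking and per-step timeout could look roughly like this. It is a sketch under stated assumptions: `WorkflowStore`, `WorkflowState`, and their methods are hypothetical names, and the clock is injectable only so that the timeout can be exercised without sleeping.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional

Key = tuple[str, str]  # (user_id, workflow_id), as in the design


@dataclass
class WorkflowState:
    step_index: int = 0
    answers: dict[str, str] = field(default_factory=dict)
    deadline: float = 0.0  # clock value after which the current step times out


class WorkflowStore:
    def __init__(self, step_timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self._timeout = step_timeout_s
        self._clock = clock
        self._states: dict[Key, WorkflowState] = {}

    def start(self, user_id: str, workflow_id: str) -> WorkflowState:
        state = WorkflowState(deadline=self._clock() + self._timeout)
        self._states[(user_id, workflow_id)] = state
        return state

    def get(self, user_id: str, workflow_id: str) -> Optional[WorkflowState]:
        return self._states.get((user_id, workflow_id))

    def advance(self, user_id: str, workflow_id: str, step_name: str, answer: str) -> WorkflowState:
        # Record the answer, move to the next step, and restart the timer.
        state = self._states[(user_id, workflow_id)]
        state.answers[step_name] = answer
        state.step_index += 1
        state.deadline = self._clock() + self._timeout
        return state

    def expire(self) -> list[Key]:
        # Drop timed-out workflows; the caller would emit workflow_timeout for each.
        now = self._clock()
        expired = [key for key, state in self._states.items() if state.deadline <= now]
        for key in expired:
            del self._states[key]
        return expired


now = [0.0]  # fake clock for the demonstration
store = WorkflowStore(step_timeout_s=30.0, clock=lambda: now[0])
store.start("u1", "wf123")
store.advance("u1", "wf123", "confirm", "yes")
now[0] = 31.0
expired = store.expire()
print(expired)
```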

**Persistent Dashboards:**
- `Dashboard` class — auto-updating message with embeds + components
- Pagination, role menus, live counters
- State in memory with optional JSON file persistence
- Serve action: `dashboard_create` / `dashboard_update` / `dashboard_delete`
- Serve event: `dashboard_interaction`

**Interaction Router:**
Central dispatcher for `on_interaction` events, routes by `custom_id` prefix:
- `modal:` → modal handler
- `wf:` → workflow handler
- `dash:` → dashboard handler
- `voice:` → voice engine
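
Prefix-based routing of this kind can be sketched in a few lines. The `InteractionRouter` class and the handler signature are assumptions for illustration; the real router would receive discord.py interaction objects rather than plain dictionaries.

```python
from typing import Callable, Optional

# A handler receives the part of custom_id after the prefix, plus a payload.
Handler = Callable[[str, dict], str]


class InteractionRouter:
    def __init__(self) -> None:
        self._routes: dict[str, Handler] = {}

    def register(self, prefix: str, handler: Handler) -> None:
        self._routes[prefix] = handler

    def dispatch(self, custom_id: str, payload: dict) -> Optional[str]:
        # "wf:signup:2" splits into prefix "wf" and rest "signup:2".
        prefix, sep, rest = custom_id.partition(":")
        handler = self._routes.get(prefix + sep)
        if not sep or handler is None:
            return None  # not ours; fall through to generic event forwarding
        return handler(rest, payload)


router = InteractionRouter()
router.register("modal:", lambda rest, payload: f"modal<{rest}>")
router.register("wf:", lambda rest, payload: f"workflow<{rest}>")

print(router.dispatch("modal:myform", {}))
print(router.dispatch("wf:signup:2", {}))
print(router.dispatch("ok_btn", {}))
```

Returning `None` for unknown prefixes lets components that were not created by either engine keep flowing through the existing `component_interaction` event.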

### CLI Commands (`commands/interact.py`)

- `discli interact modal <title> --field "Name:short:required" ...`
- `discli interact workflow <definition.json>`
- `discli interact dashboard create|update|delete|list`
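
A parser for the `--field "Name:short:required"` syntax might look like the sketch below. `FieldSpec` and `parse_field` are hypothetical names, and the `optional` flag value is an assumption, since the design only shows `required`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    label: str
    style: str  # "short" or "paragraph", matching the Modal text input kinds
    required: bool


def parse_field(spec: str) -> FieldSpec:
    """Parse "Label:style:flag" into a FieldSpec, rejecting malformed input."""
    parts = [part.strip() for part in spec.split(":")]
    if len(parts) != 3 or not parts[0]:
        raise ValueError(f"expected 'Label:style:flag', got {spec!r}")
    label, style, flag = parts
    if style not in ("short", "paragraph"):
        raise ValueError(f"unknown field style {style!r}")
    if flag not in ("required", "optional"):  # "optional" is an assumed value
        raise ValueError(f"unknown field flag {flag!r}")
    return FieldSpec(label=label, style=style, required=(flag == "required"))


print(parse_field("Name:short:required"))
```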

## Integration

**Serve mode:** Both engines are initialized lazily. Actions are registered as thin handlers in serve.py's dispatch table, events flow through the existing JSONL emission system, and everything runs on the existing asyncio event loop.

**CLI:** Commands follow existing pattern (Click group -> async action -> `run_discord()`). Persistent commands (listen, converse) run until Ctrl+C or --duration.

**Security (security.py):**
- New permission scopes: `voice`, `interact`
- `readonly`: can view status but not connect/send
- `chat`: gets `interact` but not `voice`
- `full`: gets everything
- `moderation`: gets `voice` (monitoring) + `interact`
- All actions audit-logged
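
The scope rules above can be written as a small lookup table. This is only a sketch: the table and function names are hypothetical, the mapping from action prefix to scope is an assumption, and treating `voice_status` as the one read-only action is one reading of "can view status".

```python
from typing import Optional

# Scopes granted to each profile, per the list above.
PROFILE_SCOPES = {
    "full": frozenset({"voice", "interact"}),
    "chat": frozenset({"interact"}),
    "moderation": frozenset({"voice", "interact"}),
    "readonly": frozenset(),
}

# Assumed: status queries are the only voice action readonly may use.
READ_ONLY_ACTIONS = frozenset({"voice_status"})

# Assumed mapping from serve action prefix to the scope it requires.
SCOPE_BY_PREFIX = {
    "voice_": "voice",
    "workflow_": "interact",
    "dashboard_": "interact",
    "modal_": "interact",
}


def scope_for(action: str) -> Optional[str]:
    for prefix, scope in SCOPE_BY_PREFIX.items():
        if action.startswith(prefix):
            return scope
    return None


def is_allowed(profile: str, action: str) -> bool:
    if profile not in PROFILE_SCOPES:
        return False
    if action in READ_ONLY_ACTIONS:
        return True
    scope = scope_for(action)
    if scope is None:
        # Not a voice/interact action; existing checks apply elsewhere.
        raise ValueError(f"{action!r} is outside the voice/interact scopes")
    return scope in PROFILE_SCOPES[profile]


print(is_allowed("chat", "dashboard_create"), is_allowed("chat", "voice_speak"))
```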

**Error handling:**
- Voice connection failures → clear error in JSONL/CLI
- Provider failures → fallback to next provider if available, otherwise error event
- Workflow timeouts → cleanup state + event
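
The provider fallback rule could be sketched as follows. `ProviderError`, `synthesize_with_fallback`, and the two stand-in providers are hypothetical; a real implementation would call the TTS provider objects and emit an error event instead of raising.

```python
import asyncio
from typing import Awaitable, Callable, Sequence


class ProviderError(Exception):
    """Raised by a provider when synthesis fails (hypothetical type)."""


Synth = Callable[[str], Awaitable[bytes]]


async def synthesize_with_fallback(
    providers: Sequence[tuple[str, Synth]], text: str
) -> tuple[str, bytes]:
    """Try providers in order; return the first success with its name."""
    failures: list[str] = []
    for name, synth in providers:
        try:
            return name, await synth(text)
        except ProviderError as exc:
            failures.append(f"{name}: {exc}")
    # No provider left: the caller turns this into an error event.
    raise ProviderError("all providers failed: " + "; ".join(failures))


async def flaky(text: str) -> bytes:
    raise ProviderError("quota exceeded")


async def working(text: str) -> bytes:
    return b"pcm:" + text.encode()


used, audio = asyncio.run(
    synthesize_with_fallback([("elevenlabs", flaky), ("openai", working)], "hi")
)
print(used, audio)
```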

## Dependency Groups (`pyproject.toml`)

```toml
[project.optional-dependencies]
voice = ["PyNaCl", "discord-ext-voice-recv", "silero-vad", "audioop-lts", "elevenlabs", "deepgram-sdk"]
local-voice = ["PyNaCl", "discord-ext-voice-recv", "silero-vad", "audioop-lts", "piper-tts", "faster-whisper"]
openai-voice = ["PyNaCl", "discord-ext-voice-recv", "silero-vad", "audioop-lts", "openai"]
interact = []
dev = ["pytest", "pytest-asyncio"]
```