Add AI voice control for the board#1

Open
brahim-guaali wants to merge 1 commit into main from feat/voice-control

Conversation

@brahim-guaali

Summary

  • AI voice control via floating mic button — users speak natural language commands to control the board (select cards, create streams, change status, search, focus, etc.)
  • Persistent voice session — mic stays active across commands until the user taps to stop. After each TTS confirmation, recognition automatically restarts for the next command.
  • Conversation context — Gemini chat history is maintained across turns so users can say "mark it as done" after "select the auth card" without re-specifying which card.
  • Uses Gemini 2.5 Flash (already integrated via Firebase AI) with structured JSON output and Web Speech API (SpeechRecognition + SpeechSynthesis) — no new dependencies.
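The diff itself isn't shown here, so as a rough sketch of the structured-output contract between the hook and Gemini — all field names below are assumptions, not the PR's actual schema:

```typescript
// Hypothetical shape of the JSON Gemini is asked to return per utterance.
interface ParsedCommand {
  action: string;   // e.g. "select_stream", "update_status"
  target?: string;  // card/stream the command refers to, if any
  value?: string;   // new status, title, due date, etc.
  say: string;      // short confirmation to speak back via TTS
}

// Defensive parse: model output is untrusted text, so validate before dispatch.
function parseCommand(raw: string): ParsedCommand | null {
  try {
    const obj = JSON.parse(raw);
    if (typeof obj.action !== "string" || typeof obj.say !== "string") {
      return null;
    }
    return obj as ParsedCommand;
  } catch {
    return null;
  }
}
```

Returning `null` on malformed output lets the hook surface an error toast and resume listening instead of crashing the session.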

New files

| File | Purpose |
| --- | --- |
| `src/types/speech-recognition.d.ts` | TypeScript declarations for the Web Speech API |
| `src/hooks/useVoiceControl.ts` | Core hook: state machine, speech recognition, Gemini command parsing, dispatch, TTS |
| `src/components/voice/VoiceControlButton.tsx` | Floating mic button with visual states (idle/listening/processing/speaking/error) |
| `src/components/voice/VoiceToast.tsx` | Transient toast for AI confirmations and errors |
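The five visual states listed for `VoiceControlButton` suggest the hook drives a small state machine. A sketch of plausible transitions, inferred from the description above — the event names here are hypothetical, not taken from the PR:

```typescript
type VoiceState = "idle" | "listening" | "processing" | "speaking" | "error";
type VoiceEvent = "tapMic" | "gotTranscript" | "commandDone" | "ttsEnded" | "failed";

// Pure transition function: the persistent session loops
// listening -> processing -> speaking -> listening until the user taps to stop.
function transition(state: VoiceState, event: VoiceEvent): VoiceState {
  switch (event) {
    case "tapMic":
      // Tapping the mic during any active state kills the session immediately.
      return state === "idle" ? "listening" : "idle";
    case "gotTranscript":
      return state === "listening" ? "processing" : state;
    case "commandDone":
      return state === "processing" ? "speaking" : state;
    case "ttsEnded":
      // After each TTS confirmation, recognition restarts automatically.
      return state === "speaking" ? "listening" : state;
    case "failed":
      return state === "idle" ? state : "error";
  }
}
```

Keeping the transition function pure makes the restart loop easy to unit-test without mocking `SpeechRecognition` itself.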

Modified files

| File | Change |
| --- | --- |
| `src/pages/ProjectPage.tsx` | Integrates the `useVoiceControl` hook; mounts `VoiceControlButton` in the canvas (gated by `!isReadOnly`) |
| `src/hooks/index.ts` | Exports `useVoiceControl` |
| `src/index.css` | Adds `fadeIn` / `slideUp` keyframe animations |

Supported voice commands

`select_stream`, `create_stream`, `branch_stream`, `update_status`, `update_title`, `update_description`, `set_due_date`, `add_dependency`, `remove_dependency`, `add_note`, `delete_stream`, `search`, `focus_stream`, `unfocus`, `reset_view`
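These actions could be typed as a string-literal union so the dispatcher rejects any action the model invents outside the list. A sketch — the PR's actual typing may differ:

```typescript
// The 15 supported actions as a readonly tuple, doubling as the runtime allowlist.
const VOICE_ACTIONS = [
  "select_stream", "create_stream", "branch_stream", "update_status",
  "update_title", "update_description", "set_due_date", "add_dependency",
  "remove_dependency", "add_note", "delete_stream", "search",
  "focus_stream", "unfocus", "reset_view",
] as const;

type VoiceAction = (typeof VOICE_ACTIONS)[number];

// Type guard: narrows an arbitrary string from the model to a known action.
function isVoiceAction(s: string): s is VoiceAction {
  return (VOICE_ACTIONS as readonly string[]).includes(s);
}
```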

Test plan

  • Chrome: mic button visible, tap starts session (red pulse), speak "select [card name]" → card selected, sidebar opens, TTS confirms, mic resumes listening
  • "Create a task called Sprint Planning" → new card appears on canvas
  • "Mark [card] as done" → status changes, confetti fires
  • Multi-turn: "Select auth card" → "Mark it as done" → resolves "it" from context
  • "Search for auth" → search bar populates
  • "Focus on [card]" → focus mode; "Exit focus" → returns
  • "Zoom to fit" → canvas recenters
  • Gibberish → AI returns error toast, session resumes listening
  • Tap mic during processing/speaking → kills session immediately
  • Read-only board: mic button not shown
  • Unsupported browser (Firefox): mic button hidden
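Hiding the button on unsupported browsers presumably comes down to feature-detecting the `SpeechRecognition` constructor, which ships unprefixed in some browsers and webkit-prefixed in Chrome/Safari, while Firefox exposes neither. A sketch of that check:

```typescript
// Returns true only when the environment exposes a SpeechRecognition
// constructor (prefixed or not); false in Firefox and non-browser contexts.
function isSpeechRecognitionSupported(): boolean {
  if (typeof window === "undefined") return false; // SSR / non-browser environment
  const w = window as unknown as Record<string, unknown>;
  return "SpeechRecognition" in w || "webkitSpeechRecognition" in w;
}
```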

🤖 Generated with Claude Code

Lets users control the entire board with natural language voice commands
(select, create, update, delete, search, focus, etc.) via a persistent
voice session with conversation context across commands.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
