Conversation
|
Hey, thanks for working on adding voice support — really appreciate the effort here. I went through the implementation and there are a few areas we’ll need to address before merging:
Looking forward to the update! |
6b8acb4 to
20e3419
Compare
|
Hey
Thanks for the review. I'll look into the improvements as suggested. |
20e3419 to
5b0e29e
Compare
…Bindu#353) Worker accessed task_operation["_current_span"] but scheduler now sends primitive trace_id/span_id strings. Add _reconstruct_span() helper to rebuild a NonRecordingSpan from hex-encoded IDs with graceful fallback.
Replace math.inf buffer size with a constant of 100 to prevent unbounded memory growth while still allowing task enqueue before the worker loop is ready.
Add SpanContext, TraceFlags, NonRecordingSpan, and INVALID_SPAN_CONTEXT mocks. Register opentelemetry.trace.span submodule so worker imports resolve in the test environment.
…audio config, pipeline builder, service factory, session manager, extension class)
5592bd1 to
51e7e91
Compare
- Add asyncio.Lock to AgentBridgeProcessor to prevent concurrent processing race conditions - Change pipeline_builder.build_voice_pipeline to synchronous (non-async entry point) - Add lock-protected state updates in VoiceSessionManager.update_state - Fix conversation history to pass copy to avoid mutation during retry
- Add proper host detection for WebSocket URL with fallback to client host - Wrap audio buffer flush in try-catch to prevent transcription errors from crashing cleanup - Wrap session state updates and WebSocket close in individual try-catch blocks - Ensure robust cleanup on WebSocket disconnect regardless of errors
- Change from await send() to send_nowait() to fail fast when buffer is full - Catch anyio.WouldBlock and raise clear RuntimeError with buffer full message - Prevents API handler from hanging indefinitely on full task queue - Adds documentation about non-blocking behavior and buffer capacity
- Constrain grpcio to >=1.67.0,<1.78.0 to avoid version conflicts - Add note about bindu:// internal routing scheme in VoiceSettings - Update both base dependencies and grpc extra
ChatInput: - Add error handling for file reading with user alert fallback - Clear files and input after successful submit ChatWindow: - Add submit function with proper error handling and cleanup - Fix voice error state clearing when error resolves LiveTranscript: - Use item.ts as key instead of index for proper list updates VoiceCallButton: - Add disabled state styling and correct aria-label VoiceCallPanel: - Add proper audio playback error handling and cleanup on unmount Mime types: - Expand DOCUMENT_MIME_ALLOWLIST to include more document types Voice client: - Return boolean from sendControl to only update state on success - Fix mute/unmute to only set state when control message sent Chat store: - Only add message if text part exists and has non-empty text Voice store: - Clean up existing client before creating new one - Attach handlers only after successful session start - Add proper error handling and state reset on failure
Quiz agent: - Validate OPENROUTER_API_KEY is defined before creating LLM - Add validation for user input with proper error handling Voice example: - Mask API key in debug output instead of exposing it Test stubs: - Create proper NonRecordingSpan class that returns False for is_recording Test agent bridge: - Add assertions for assistant messages in history to verify response handling
- Update dependency version to use ~= for compatibility - Add security and authentication settings (session auth, rate limiting, etc.) - Add privacy and compliance settings (transcript storage, retention, consent) - Add production-grade AgentBridgeProcessor flow description - Add WebSocket security requirements and frame types - Update audio capture pipeline with more detailed configuration - Add integration, E2E, performance, and browser test sections - Update example voice handler signature to match A2A protocol - Add security, performance, and browser compatibility checklists
|
Hey @chandan-1427 |
Voice Agent Extension — Progress & Documentation PR
Overview
This PR introduces the initial implementation of the Voice Agent Extension for Bindu, enabling real-time voice conversations between users and agents. The extension integrates backend, frontend, and testing components, following the architecture and plan outlined in
docs/VOICE_AGENT_PLAN.md.What’s Included
Backend
bindu/extensions/voice/with:__init__.py,voice_agent_extension.py,service_factory.py,pipeline_builder.py,session_manager.py,agent_bridge.py,audio_config.pybindu/server/endpoints/voice_endpoints.py(REST + WebSocket)bindu/settings.pyupdated withVoiceSettingsbindu/server/applications.pyupdated for conditional voice route registration and session managerbindu/utils/capabilities.pyupdated for voice extension helpersbindu/penguin/bindufy.pyupdated to accept voice config and add the extensionFrontend
frontend/src/lib/services/voice-client.ts: WebSocket client, audio capture/playbackfrontend/src/lib/stores/voice.ts: Svelte stores for voice state and transcriptsfrontend/src/lib/components/voice/VoiceCallPanel.svelte,VoiceCallButton.svelte,LiveTranscript.svelte: UI components for voice sessionTests
tests/unit/extensions/voice/test_voice_extension.pytests/unit/extensions/voice/test_session_manager.pytests/unit/extensions/voice/test_service_factory.pytests/unit/extensions/voice/test_agent_bridge.pytests/unit/extensions/voice/test_voice_endpoints.pyExamples & Docs
examples/voice-agent/main.py,.env.example, andREADME.mddocs/VOICE_AGENT_PLAN.md(implementation plan)Current Progress
docs/VOICE.md) are planned.How to Test
pipecat-ai[deepgram,elevenlabs,silero]andwebsocketsare installed (seepyproject.tomlvoice group).VOICE__STT_API_KEY,VOICE__TTS_API_KEY(see.env.example)uv run pytest tests/unit/extensions/voice/ -vGET /tasks/get.Next Steps & Improvements
docs/VOICE_AGENT_PLAN.md)docs/VOICE.md)uv run pre-commit run --all-filesReferences
Contributors:
For questions or feedback, please comment on this PR.