Feature/voice agent by Co-vengers · Pull Request #386 · GetBindu/Bindu

Co-vengers · 2026-03-22T09:32:06Z

Voice Agent Extension — Progress & Documentation PR

Overview

This PR introduces the initial implementation of the Voice Agent Extension for Bindu, enabling real-time voice conversations between users and agents. The extension integrates backend, frontend, and testing components, following the architecture and plan outlined in docs/VOICE_AGENT_PLAN.md.

What’s Included

Backend

New voice extension module: bindu/extensions/voice/ with:
- __init__.py, voice_agent_extension.py, service_factory.py, pipeline_builder.py, session_manager.py, agent_bridge.py, audio_config.py
Endpoints: bindu/server/endpoints/voice_endpoints.py (REST + WebSocket)
Settings: bindu/settings.py updated with VoiceSettings
App integration: bindu/server/applications.py updated for conditional voice route registration and session manager
Capabilities: bindu/utils/capabilities.py updated for voice extension helpers
Penguin integration: bindu/penguin/bindufy.py updated to accept voice config and add the extension

Frontend

Voice UI and client:
- frontend/src/lib/services/voice-client.ts: WebSocket client, audio capture/playback
- frontend/src/lib/stores/voice.ts: Svelte stores for voice state and transcripts
- frontend/src/lib/components/voice/VoiceCallPanel.svelte, VoiceCallButton.svelte, LiveTranscript.svelte: UI components for voice session
Integration: Existing chat and agent message handler files updated for voice support

Tests

Unit tests for all major backend components:
- tests/unit/extensions/voice/test_voice_extension.py
- tests/unit/extensions/voice/test_session_manager.py
- tests/unit/extensions/voice/test_service_factory.py
- tests/unit/extensions/voice/test_agent_bridge.py
- tests/unit/extensions/voice/test_voice_endpoints.py

Examples & Docs

Example agent: examples/voice-agent/main.py, .env.example, and README.md
Plan: docs/VOICE_AGENT_PLAN.md (implementation plan)

Current Progress

All major backend, frontend, and test files are present and staged.
Integration into the main app and settings is in progress.
Endpoints and frontend integration are actively being refined.
Unit tests for the extension and its components are included.
Example agent and configuration are provided.
Documentation plan is present; full user-facing docs (docs/VOICE.md) are planned.

How to Test

Install dependencies:
- Ensure pipecat-ai[deepgram,elevenlabs,silero] and websockets are installed (see pyproject.toml voice group).
Set environment variables:
- VOICE__STT_API_KEY, VOICE__TTS_API_KEY (see .env.example)
Run backend tests:
- uv run pytest tests/unit/extensions/voice/ -v
Run frontend:
- Start the Svelte frontend and verify the voice call UI appears for voice-enabled agents.
Manual E2E:
- Start a voice session from the UI, speak, and verify agent responses and transcripts.
Check task persistence:
- After a session, verify conversation history via GET /tasks/get.

Next Steps & Improvements

Complete and verify all items in the implementation plan checklist (see docs/VOICE_AGENT_PLAN.md)
Finalize and publish user documentation (docs/VOICE.md)
Polish frontend UI/UX and error handling
Expand test coverage (integration, E2E, edge cases)
Lint and format: uv run pre-commit run --all-files
Optimize session cleanup and resource management
Add more example agents and configuration scenarios
Prepare for future extensions (telephony, WebRTC, multi-language, etc.)

References

Contributors:

@Co-vengers

For questions or feedback, please comment on this PR.

chandan-1427 · 2026-03-23T13:11:45Z

Hey, thanks for working on adding voice support — really appreciate the effort here.

I went through the implementation and there are a few areas we’ll need to address before merging:

Multi-worker compatibility: The current session handling relies on a local store, which won’t work reliably with Uvicorn’s multi-worker setup. We’ll need to move this to a centralized solution (e.g., Redis) to avoid state inconsistencies.
Transport & latency: The current flow is based on HTTP requests. For voice interactions, we should aim for a real-time streaming approach (like WebSockets or SSE) to reduce latency and improve responsiveness.
Base branch alignment: It looks like this was built on an older version of Bindu. There are conflicts with recent changes, so rebasing onto the latest main would help before proceeding.

Looking forward to the update!

Co-vengers · 2026-03-24T15:06:50Z

Hey

Hey, thanks for working on adding voice support — really appreciate the effort here.

I went through the implementation and there are a few areas we’ll need to address before merging:

Multi-worker compatibility: The current session handling relies on a local store, which won’t work reliably with Uvicorn’s multi-worker setup. We’ll need to move this to a centralized solution (e.g., Redis) to avoid state inconsistencies.

Transport & latency: The current flow is based on HTTP requests. For voice interactions, we should aim for a real-time streaming approach (like WebSockets or SSE) to reduce latency and improve responsiveness.

Base branch alignment: It looks like this was built on an older version of Bindu. There are conflicts with recent changes, so rebasing onto the latest main would help before proceeding.

Looking forward to the update!

Thanks for the review. I'll look into the improvements as suggested.

…Bindu#353) Worker accessed task_operation["_current_span"] but scheduler now sends primitive trace_id/span_id strings. Add _reconstruct_span() helper to rebuild a NonRecordingSpan from hex-encoded IDs with graceful fallback.

Replace math.inf buffer size with a constant of 100 to prevent unbounded memory growth while still allowing task enqueue before the worker loop is ready.

Add SpanContext, TraceFlags, NonRecordingSpan, and INVALID_SPAN_CONTEXT mocks. Register opentelemetry.trace.span submodule so worker imports resolve in the test environment.

…audio config, pipeline builder, service factory, session manager, extension class)

… manager

…sions with tests

…ent capabilities

…ests

…ript integration

… state handling

…ipt display

… mic capture

…ript management

- Add asyncio.Lock to AgentBridgeProcessor to prevent concurrent processing race conditions - Change pipeline_builder.build_voice_pipeline to synchronous (non-async entry point) - Add lock-protected state updates in VoiceSessionManager.update_state - Fix conversation history to pass copy to avoid mutation during retry

- Add proper host detection for WebSocket URL with fallback to client host - Wrap audio buffer flush in try-catch to prevent transcription errors from crashing cleanup - Wrap session state updates and WebSocket close in individual try-catch blocks - Ensure robust cleanup on WebSocket disconnect regardless of errors

- Change from await send() to send_nowait() to fail fast when buffer is full - Catch anyio.WouldBlock and raise clear RuntimeError with buffer full message - Prevents API handler from hanging indefinitely on full task queue - Adds documentation about non-blocking behavior and buffer capacity

- Constrain grpcio to >=1.67.0,<1.78.0 to avoid version conflicts - Add note about bindu:// internal routing scheme in VoiceSettings - Update both base dependencies and grpc extra

ChatInput: - Add error handling for file reading with user alert fallback - Clear files and input after successful submit ChatWindow: - Add submit function with proper error handling and cleanup - Fix voice error state clearing when error resolves LiveTranscript: - Use item.ts as key instead of index for proper list updates VoiceCallButton: - Add disabled state styling and correct aria-label VoiceCallPanel: - Add proper audio playback error handling and cleanup on unmount Mime types: - Expand DOCUMENT_MIME_ALLOWLIST to include more document types Voice client: - Return boolean from sendControl to only update state on success - Fix mute/unmute to only set state when control message sent Chat store: - Only add message if text part exists and has non-empty text Voice store: - Clean up existing client before creating new one - Attach handlers only after successful session start - Add proper error handling and state reset on failure

Quiz agent: - Validate OPENROUTER_API_KEY is defined before creating LLM - Add validation for user input with proper error handling Voice example: - Mask API key in debug output instead of exposing it Test stubs: - Create proper NonRecordingSpan class that returns False for is_recording Test agent bridge: - Add assertions for assistant messages in history to verify response handling

- Update dependency version to use ~= for compatibility - Add security and authentication settings (session auth, rate limiting, etc.) - Add privacy and compliance settings (transcript storage, retention, consent) - Add production-grade AgentBridgeProcessor flow description - Add WebSocket security requirements and frame types - Update audio capture pipeline with more detailed configuration - Add integration, E2E, performance, and browser test sections - Update example voice handler signature to match A2A protocol - Add security, performance, and browser compatibility checklists

Co-vengers · 2026-03-28T14:01:23Z

Hey @chandan-1427
I have implemented redis backed multi worker voice session management plus real time WebSocket streaming for low latency interactions, with config/lifecycle wiring and pre commit cleanup.

Co-vengers force-pushed the feature/voice-agent branch from 6b8acb4 to 20e3419 Compare March 24, 2026 14:52

Co-vengers force-pushed the feature/voice-agent branch from 20e3419 to 5b0e29e Compare March 25, 2026 18:52

raahulrahl requested a review from chandan-1427 March 27, 2026 08:57

Co-vengers added 25 commits March 28, 2026 14:06

fix(scheduler): replace unbounded stream buffer with bounded limit

6d7e850

Replace math.inf buffer size with a constant of 100 to prevent unbounded memory growth while still allowing task enqueue before the worker loop is ready.

test: add opentelemetry.trace.span stubs for NonRecordingSpan imports

071bedf

Add SpanContext, TraceFlags, NonRecordingSpan, and INVALID_SPAN_CONTEXT mocks. Register opentelemetry.trace.span submodule so worker imports resolve in the test environment.

chore(voice): register voice extension in extensions module

f105e0b

feat(voice): add backend voice extension module (init, agent bridge, …

a3685b0

…audio config, pipeline builder, service factory, session manager, extension class)

feat(bindufy): support voice extension config in agent creation

2b7e1a3

feat(server): add conditional voice endpoint registration and session…

99f0943

… manager

feat(voice-endpoints): add REST and WebSocket endpoints for voice ses…

9738558

…sions with tests

feat(settings): add VoiceSettings and extension config with tests

91bc550

feat(capabilities): add helper for extracting voice extension from ag…

3257dcb

…ent capabilities

feat(service-factory): add service factory for STT/TTS with tests

a0e500c

feat(session-manager): add session manager for voice sessions with tests

4fbff61

feat(agent-bridge): add agent bridge processor for STT↔A2A↔TTS with t…

16d73a9

…ests

feat(frontend): update ChatInput.svelte for voice session integration

e1e7496

feat(frontend): update ChatWindow.svelte for voice overlay and transc…

b56a1b0

…ript integration

feat(frontend): add voice MIME type support in constants

689f3a0

feat(frontend): update chat store for voice session state integration

8f80680

feat(frontend): update agent message handler for voice transcript and…

d67b5f4

… state handling

feat(frontend): add LiveTranscript.svelte for real-time voice transcr…

8ff3a89

…ipt display

feat(frontend): add VoiceCallButton.svelte for starting voice sessions

a72b263

feat(frontend): add VoiceCallPanel.svelte for voice call overlay UI

d929bf8

feat(frontend): add voice-client.ts for WebSocket audio transport and…

584bdb8

… mic capture

feat(frontend): add voice.ts store for voice session state and transc…

2f7273d

…ript management

chore(utils): update __init__.py for voice extension support

564e7ea

docs(examples): update README with voice agent example info

aba0a80

Co-vengers and others added 11 commits March 28, 2026 14:16

docs(voice): add VOICE_AGENT_PLAN.md implementation plan

270e76f

feat(examples): add example voice agent and config files

5f101ba

test(extensions): add __init__.py for extensions unit tests

f606b9a

test(voice): add __init__.py for voice extension unit tests

9034752

WIP: local changes before rebase

ffa804f

Update agent_handler_pb2.py: regenerate protobuf bindings

90c9283

Update agent_handler_pb2.pyi: regenerate type stubs for protobuf

e75e216

Update agent_handler_pb2_grpc.py: regenerate gRPC service bindings

53f1922

Update pyproject.toml: adjust dependencies or build config

18db5b1

Update uv.lock: sync lockfile with dependency changes

e4eb8fc

adding quiz agent in examples

51e7e91

Co-vengers force-pushed the feature/voice-agent branch from 5592bd1 to 51e7e91 Compare March 28, 2026 08:52

Co-vengers added 14 commits March 28, 2026 18:19

chore: pin grpcio version for compatibility with existing dependencies

bda8f9c

- Constrain grpcio to >=1.67.0,<1.78.0 to avoid version conflicts - Add note about bindu:// internal routing scheme in VoiceSettings - Update both base dependencies and grpc extra

chore: update lock file for dependency changes

4183b4f

Use voice session manager factory in app lifecycle

46c55f1

Add VoiceSession serialization helpers for backend storage

08ae112

Add voice session backend and Redis configuration settings

ab0d549

Add factory helpers for voice session manager backends

d0287f6

Add Redis-backed voice session manager implementation

8042e0c

Apply pre-commit fixes and track frontend .env.example

49f1fc9

Run full pytest suite in CI coverage job

4cf1ea4

raahulrahl assigned chandan-1427 Mar 30, 2026

raahulrahl added the inprogress label Mar 30, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature/voice agent#386

Feature/voice agent#386
Co-vengers wants to merge 53 commits intoGetBindu:mainfrom
Co-vengers:feature/voice-agent

Co-vengers commented Mar 22, 2026

Uh oh!

chandan-1427 commented Mar 23, 2026

Uh oh!

Co-vengers commented Mar 24, 2026

Uh oh!

Co-vengers commented Mar 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

Co-vengers commented Mar 22, 2026

Voice Agent Extension — Progress & Documentation PR

Overview

What’s Included

Backend

Frontend

Tests

Examples & Docs

Current Progress

How to Test

Next Steps & Improvements

References

Uh oh!

chandan-1427 commented Mar 23, 2026

Uh oh!

Co-vengers commented Mar 24, 2026

Uh oh!

Co-vengers commented Mar 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants