feat(voice): Add PCM audio debugging utilities and fix binary storage #51
- Add VoiceHandler with WebSocket support for real-time audio/text communication
- Implement voice session management integrated with existing chat sessions
- Add voice message logging with transcript capture and confidence scoring
- Create voice data models with base64 audio/text support and validation
- Configure voice handler in dev environment

Core Features:
- WebSocket connection to existing chat sessions via /voice/ws
- Audio PCM and text message processing with size validation
- Real-time transcript logging with duration and confidence metrics
- Voice session lifecycle tracking (start/end events)
- Integration with ADK agents for character voice responses

Technical Implementation:
- Voice module with handler, models, and config classes
- Extended ChatLogger with voice-specific logging methods
- Character model enhanced with optional voice_id field
- Async WebSocket handling with proper authentication
- Base64 audio chunk processing with configurable limits

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
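The "base64 audio chunk processing with configurable limits" item can be pictured with a minimal sketch. The function name `decode_audio_chunk` and the 64 KiB default are assumptions made for illustration; the PR does not show the handler's actual names or limits.

```python
import base64
import binascii

# Hypothetical default limit; the real configuration value is not shown in this PR.
MAX_CHUNK_BYTES = 64 * 1024


def decode_audio_chunk(payload: str, max_bytes: int = MAX_CHUNK_BYTES) -> bytes:
    """Decode one base64 audio chunk and enforce a size limit."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        raw = base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("audio chunk is not valid base64") from exc
    if not raw:
        raise ValueError("audio chunk is empty")
    if len(raw) > max_bytes:
        raise ValueError(f"audio chunk is {len(raw)} bytes, limit is {max_bytes}")
    return raw
```

Checking the decoded length rather than the encoded string length keeps the limit expressed in audio bytes, which is the unit that matters for storage and streaming.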
- Create organized test suite in test/scripts/voice/ directory
- Add setup_voice_test.py for quick session creation and HTML generation
- Add interactive HTML test page with enhanced features:
  - Connection retry logic with exponential backoff
  - Export functionality for debug logs and transcripts
  - Real-time WebSocket message monitoring
  - Push-to-talk and text messaging support
- Include automated test_voice_backend.py for CI/CD integration
- Add extensive README documentation with troubleshooting guides

Features:
- Three testing approaches: interactive, automated, and manual
- Browser-based testing with pre-filled credentials
- Comprehensive test coverage for auth, WebSocket, audio, and transcripts
- Performance benchmarks and debug tips
- Support for custom credentials and multiple sessions

The test suite validates voice backend functionality including:
- WebSocket connection and message flow
- Audio streaming with PCM format
- Real-time transcript updates (partial and final)
- Session lifecycle management
- Error handling and recovery

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
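For readers unfamiliar with the retry strategy named above, here is a small sketch of an exponential backoff schedule. The test page itself runs in the browser and its code is not shown here, so the function name, base delay, and cap below are illustrative assumptions only.

```python
from typing import List


def backoff_delays(max_attempts: int = 5, base: float = 0.5, cap: float = 8.0) -> List[float]:
    """Return the wait time in seconds before each reconnect attempt.

    The delay doubles on every attempt and is clamped to `cap`.
    """
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_attempts)]
```

A real client would typically add random jitter to each delay so that many clients do not reconnect at the same instant.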
This commit introduces a feature to log raw incoming PCM audio streams from clients during voice sessions. This is intended for debugging and analysis purposes and is strictly limited to non-production environments.

The implementation follows best practices by:
- Introducing a structured `EnvironmentInfo` model and a corresponding `get_environment_info` dependency to provide clean, injectable access to the current deployment environment.
- Modifying the `VoiceHandler` to inject this new dependency and conditionally log audio based on the environment (`dev` or `beta`).
- Adding a `log_pcm_audio` method to `ChatLogger`, which handles writing the audio data to a separate `voice_logs` directory in the storage backend, ensuring separation from standard chat logs.

This provides a powerful debugging tool without affecting production performance or data privacy.

gemini:
- Added EnvironmentInfo model to common/models.py
- Added get_environment_info dependency to server/dependencies.py
- Injected EnvironmentInfo into VoiceHandler to conditionally log audio
- Implemented log_pcm_audio in ChatLogger to save raw audio data
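A rough sketch of the environment gate described in this commit follows. Only the names `EnvironmentInfo` and `get_environment_info` come from the commit message; the `name` field, the `allows_audio_logging` property, and the `APP_ENV` variable are assumptions, since the real model's fields are not shown.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Environments where raw audio logging is permitted, per the commit message.
NON_PRODUCTION = frozenset({"dev", "beta"})


@dataclass(frozen=True)
class EnvironmentInfo:
    name: str  # hypothetical field

    @property
    def allows_audio_logging(self) -> bool:
        return self.name in NON_PRODUCTION


def get_environment_info(environ: Optional[Mapping[str, str]] = None) -> EnvironmentInfo:
    """Build EnvironmentInfo from a mapping (defaults to os.environ)."""
    env = os.environ if environ is None else environ
    # APP_ENV is a hypothetical variable name. Defaulting to "prod" means a
    # missing setting disables audio logging instead of enabling it.
    return EnvironmentInfo(name=env.get("APP_ENV", "prod").strip().lower())
```

Passing the mapping as a parameter keeps the function easy to override in tests, which is the same benefit an injectable dependency gives the handler.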
- Fix PCM audio logging bug: Change storage.write() to storage.write_bytes() in chat_logger.py for proper binary data handling
- Add debug_audio.py utility for reassembling and analyzing PCM chunks:
  * info command: Show session audio statistics and timing
  * reassemble command: Combine PCM chunks into playable WAV files
  * play command: Playback reassembled audio for debugging
- Update voice testing README.md with comprehensive audio debugging documentation
- Support 16-bit PCM, 16kHz, mono format matching Gemini Live API requirements
- Enable environment-based audio recording (dev/beta only, not production)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
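To make the reassemble step concrete, the sketch below joins raw PCM chunks into a WAV file with Python's standard `wave` module, using the 16-bit, 16 kHz, mono format stated above. It is not the code from `debug_audio.py`; the function name is hypothetical, and it assumes chunk filenames sort in recording order.

```python
import wave
from pathlib import Path

SAMPLE_RATE = 16_000  # Hz
SAMPLE_WIDTH = 2      # bytes per sample (16-bit PCM)
CHANNELS = 1          # mono


def reassemble_to_wav(session_dir: Path, out_path: Path) -> float:
    """Concatenate *.pcm chunks into one WAV file and return its duration in seconds.

    Assumes that sorting chunk filenames yields chronological order.
    """
    chunks = sorted(session_dir.glob("*.pcm"))
    pcm = b"".join(chunk.read_bytes() for chunk in chunks)
    with wave.open(str(out_path), "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)  # the WAV header is finalized when the file closes
    return len(pcm) / (SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS)
```

Raw PCM carries no header, so the format parameters must be supplied by the tool. If they do not match what the client actually sent, the WAV will still open but will play at the wrong speed or as noise.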
Claude finished @xCatG's task

Voice Chat PCM Audio Debug - Code Review Complete

I've conducted a thorough review of this PR adding PCM audio debugging utilities and fixing binary storage. Here's my comprehensive analysis:

Review Checklist

🔧 Critical Bug Fix Analysis

✅ Binary Storage Fix (chat_logger.py:539)

Issue: PCM audio data was being written as text using `storage.write()`
Fix: Correctly changed to `storage.write_bytes()`
Impact: Critical fix - without this, all PCM audio debugging was broken. The fix is minimal, focused, and correct.

🎯 Code Quality Assessment

✅ debug_audio.py - Well Structured Utility

Strengths:
Minor Suggestions:

✅ Voice Handler (handler.py) - Solid Architecture

Strengths:
Areas for improvement:

🔒 Security Analysis

✅ Audio Logging Security

Positive aspects:
Recommendations:

⚡ Performance Considerations

✅ Efficient Implementation

Good practices:
Potential optimizations:

📊 Test Coverage Analysis

✅ Comprehensive Testing Infrastructure

Excellent test suite:
Documentation quality:

🐛 Potential Issues Found
- Fix async context manager mocking for storage.lock in conftest
- Remove problematic start_server fixture that caused shutdown messages
- Add comprehensive unit tests for VoiceHandler (14 tests)
- Add VoiceRequest model validation tests
- Add PCM audio logging tests to ChatLogger test suite
- Add debug_audio.py utility tests (with 3 skipped for simpleaudio)
- Update test requirements with websockets dependency
- Remove broken integration tests that relied on non-existent APIs

Test Results:
- 14/14 voice tests passing
- 322 total tests passing (3 skipped)
- No more server shutdown messages after tests
- 57.51% overall test coverage

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
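The async context manager mocking mentioned in the first item can be sketched with `unittest.mock`. The real conftest is not shown, so this assumes `storage.lock(path)` returns an async context manager and that `storage.write_bytes` is awaitable; `make_storage_mock` and `save_chunk` are hypothetical names.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


def make_storage_mock() -> MagicMock:
    """Build a storage double whose lock() works with `async with`."""
    storage = MagicMock()
    # Since Python 3.8, MagicMock provides __aenter__/__aexit__ as AsyncMock,
    # so the object returned by storage.lock(...) can be entered with `async with`.
    lock_cm = storage.lock.return_value
    lock_cm.__aenter__.return_value = None
    lock_cm.__aexit__.return_value = False  # do not swallow exceptions
    storage.write_bytes = AsyncMock()
    return storage


async def save_chunk(storage, path: str, data: bytes) -> None:
    """Hypothetical code under test: write one chunk while holding the lock."""
    async with storage.lock(path):
        await storage.write_bytes(path, data)
```

A common failure mode is making `lock` itself an `AsyncMock`: calling it then returns a coroutine, which cannot be used directly in `async with`.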
- Add explicit production check in log_pcm_audio() to prevent accidental data collection
- Document JWT in query params risk and future ticket-based auth approach
- Document DoS vulnerability in session limit check with production warning
- Add comprehensive security considerations to VoiceHandler class documentation

Security improvements:
- PCM audio logging now double-checked for production environment
- Clear documentation of current limitations for beta deployment
- Explicit acknowledgment of DoS risk with production mitigation plan
- Future security TODOs documented for production readiness

🤖 Generated with Claude Code

Co-Authored-By: Claude <noreply@anthropic.com>
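The defensive check inside the logger might look roughly like the sketch below. The real `log_pcm_audio` signature is not shown in this PR, so the parameters and the in-memory storage class are assumptions; the path pattern follows the one given in the PR's technical notes. The sketch uses an allowlist of `dev` and `beta`, a stricter variant than a plain production check.

```python
import asyncio
from typing import Dict, Optional

ALLOWED_ENVIRONMENTS = frozenset({"dev", "beta"})


class InMemoryStorage:
    """Hypothetical stand-in for the real storage backend."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}

    async def write_bytes(self, path: str, data: bytes) -> None:
        self.files[path] = bytes(data)


async def log_pcm_audio(storage, environment: str, user_id: str,
                        session_id: str, timestamp: str, data: bytes) -> Optional[str]:
    """Write one raw PCM chunk, or do nothing outside dev/beta."""
    # Second line of defense: refuse to write even if the caller forgot to check.
    if environment not in ALLOWED_ENVIRONMENTS:
        return None
    path = f"users/{user_id}/voice_logs/{session_id}/audio_in_{timestamp}.pcm"
    await storage.write_bytes(path, data)  # binary write, not a text write
    return path
```

With an allowlist, an unrecognized or misspelled environment name disables logging, whereas a check that only excludes production would let it through.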
Claude finished @xCatG's task

Voice Chat PCM Audio Debug - Code Review Complete

I've conducted a thorough review of this PR adding PCM audio debugging utilities and fixing binary storage. Here's my comprehensive analysis:

Review Checklist

🔧 Critical Bug Fix Analysis

✅ Binary Storage Fix (chat_logger.py:549)

Issue: PCM audio data was being written as text using `storage.write()`
Fix: Correctly changed to `storage.write_bytes()`
Impact: Critical fix - without this, all PCM audio debugging was broken. The fix is minimal, focused, and correct.

🎯 Code Quality Assessment

✅ debug_audio.py - Well Structured Utility

Strengths:
Minor Suggestions:

✅ Voice Handler (handler.py) - Solid Architecture

Strengths:
Areas for improvement:

🔒 Security Analysis

✅ Audio Logging Security

Positive aspects:
Recommendations:

⚡ Performance Considerations

✅ Efficient Implementation

Good practices:
Potential optimizations:

📊 Test Coverage Analysis

✅ Comprehensive Testing Infrastructure

Excellent test suite:
Documentation quality:

🐛 Potential Issues Found
Summary
• Fixed critical PCM audio logging bug by changing `storage.write()` to `storage.write_bytes()` for proper binary data handling
• Added comprehensive PCM audio debugging utility (`debug_audio.py`) for reassembling and analyzing recorded audio chunks
• Enhanced voice testing documentation with detailed audio debugging workflows
• Fixed voice chat tests and removed problematic server fixture
• Addressed security review feedback with defensive measures and documentation
Changes Made
Bug Fix
- Fixed PCM audio logging in `chat_logger.py:539` by using `write_bytes()` instead of `write()`

Audio Debug Utility
- `debug_audio.py` script with three commands:
  - `info <session_dir>` - Show audio session statistics, timing, and chunk analysis
  - `reassemble <session_dir>` - Combine PCM chunks into playable WAV files
  - `play <session_dir>` - Playback reassembled audio for debugging (requires simpleaudio)

Test Improvements
- Fixed async context manager mocking for `storage.lock` in conftest
- Removed problematic `start_server` fixture that caused shutdown messages after tests

Security Improvements
- Added explicit production check in `log_pcm_audio()` method to prevent accidental data collection even if environment is misconfigured

Documentation
Security Review Response
✅ Security Improvements Implemented
🔄 Security Considerations for Future PRs
📋 What Security Review Confirmed Was Done Well
Test Results
TODO for Future PRs
Testing Enhancements
Add error handling tests for ChatLogger
Implement simpleaudio mocking for debug_audio tests
Add WebSocket integration tests for VoiceHandler
Security Enhancements (Production Readiness)
Implement DoS protection via session limits
Implement ticket-based WebSocket authentication
Add role-based authorization for voice features
Code Quality & Robustness Improvements
Fix message numbering for voice transcripts
`message_number=-1` which could interfere with session sequencing
Add PCM debug file cleanup mechanism
Improve error handling and debugging
Add disk usage protection for voice sessions
Strengthen filename collision prevention
Technical Notes
`users/{user_id}/voice_logs/{session_id}/audio_in_{timestamp}.pcm`

🤖 Generated with Claude Code