diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000..a033790 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,47 @@ +# Repository Guidelines + +## Project Structure & Module Organization +- Backend (Python): `src/python/role_play/*` (API, chat, voice, evaluation, common). Entry point: `src/python/run_server.py`. +- Frontend (Vue + TS): `src/ts/role_play/ui` (Vite app: `src`, `components`, `composables`, `services`). +- Tests: `test/python/{unit,integration}` with shared fixtures in `test/python/fixtures`. +- Config: environment YAMLs in `config/{dev,beta,prod}.yaml` (env vars can override). Data/resources in `data/`. +- Tooling: `Makefile` (build/test/deploy), `Dockerfile`, `pytest.ini`, `.env(.example)`. + +## Build, Test, and Development Commands +- Backend setup: `python -m venv venv && source venv/bin/activate && pip install -r src/python/requirements-dev.txt`. +- Run API locally: `source venv/bin/activate && python src/python/run_server.py` (ensure `STORAGE_PATH` exists; defaults to `./data`). +- Frontend dev: `cd src/ts/role_play/ui && npm i && npm run dev` (Vite at http://localhost:5173). +- Test suite: `make test` (pytest with coverage) or `pytest -q` (see markers below). +- Docker (local): `make run-local-docker DATA_DIR=./data` (serves on http://localhost:8080). +- Build/Deploy: `make build-docker`, `make push-docker`, `make deploy ENV=dev` (requires GCP config; see `ENVIRONMENTS.md`). + +## Coding Style & Naming Conventions +- Python: format with Black; imports via isort; prefer type hints. Naming: `snake_case` (functions/modules), `PascalCase` (classes), `UPPER_SNAKE` (constants). +- TypeScript/Vue: `PascalCase` for components (`*.vue`), `camelCase` for composables/services (e.g., `useChatData.ts`). Two-space indent. +- Keep modules under existing namespaces (do not create parallel roots). + +## Testing Guidelines +- Framework: pytest. Coverage target: 25%+ (HTML at `test/python/htmlcov/index.html`). +- Discovery: files `test_*.py`; classes `Test*`; functions `test_*`. +- Markers: `unit`, `integration`, `e2e`, `slow`, `auth`, `storage`, `cloud`. Example: `pytest -m unit`. + +## Commit & Pull Request Guidelines +- Style: Conventional Commits when possible (e.g., `feat: ...`, `fix(deps): ...`). +- Commits: small, descriptive, present tense; reference issues (e.g., `#42`). +- PRs: include summary, rationale, test plan, and screenshots for UI changes. Link issues and note any config/devops changes. +- CI: ensure `make test` passes locally before requesting review. + +## Security & Configuration Tips +- Never commit secrets. Use `.env` for local dev; production secrets live in GCP Secret Manager. +- Adjust runtime via `config/*.yaml` and env vars (`PORT`, `STORAGE_PATH`, `CORS_ALLOWED_ORIGINS`, etc.). See `ENVIRONMENTS.md` and `STORAGE_CONFIG.md`. + +## Agent-Specific Instructions (Claude/Gemini) +- Architecture: layered modules; handlers are stateless and created per request/connection. Register handlers via YAML in `config/*.yaml`. +- Dependency Injection: use FastAPI `Depends()`; cache singletons with `functools.lru_cache` (e.g., ContentLoader, ChatLogger). Avoid mutable state on handler instances. +- Storage & Locking: abstract through `StorageBackend` (file/GCS/S3). Use key paths without extensions (e.g., `users/{user_id}/profile`). Separate lock lease duration from acquisition timeout; wrap blocking I/O with `asyncio.to_thread`. +- Chat System: persist messages as JSONL under `users/{user_id}/chat_logs/{session_id}`; create a fresh ADK runner per message; drive prompts by user language. See `/GEMINI.md` and root `/CLAUDE.md` for ADK notes. +- Evaluation Reports: store at `users/{user_id}/eval_reports/{session_id}/{timestamp_uuid}` with metadata; expose GET latest/all and POST re-evaluate endpoints. +- Frontend Patterns: domain-based Vue structure, composables for async ops and confirmations, sync TS types with Pydantic models, inject JWT via `Authorization: Bearer `; i18n supports `en` and `zh-TW`. +- Testing: prefer fast unit tests; mark `integration`, `e2e`, `slow`, `cloud` selectively. Use `make test-chat` for chat-only coverage. + +For deeper guidance, refer to: `GEMINI.md` (model/runtime, storage/locking overview), `CLAUDE.md` (repo-wide workflows), `src/python/CLAUDE.md` (Python DI/stateless patterns), `src/ts/CLAUDE.md` (frontend patterns), and `test/CLAUDE.md` (test layout and conventions). diff --git a/CLAUDE.md b/CLAUDE.md index b24760a..dde74b0 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -182,6 +182,29 @@ make test-specific TEST_PATH="test/python/unit/chat/test_chat_logger.py" - [x] **Internationalization**: Full English/Traditional Chinese support for new UI elements - [x] **CSS Improvements**: Fixed radio button alignment issues with proper flexbox layout +### Voice Chat with Direct ADK Integration (Completed & Radically Simplified) +- [x] **Radical Architecture Simplification**: Eliminated over-engineered transcript management and wrapper classes + - **Direct ADK Integration**: Handler stores `Runner`, `live_events`, `live_request_queue` directly + - **Native Transcript Handling**: Uses ADK's built-in `is_final` flags instead of custom buffering + - **Minimal Models**: Reduced from 150+ lines with 7+ types to 30 lines with 2 generic types (`VoiceRequest`, `VoiceMessage`) + - **Code Reduction**: ~470 lines removed, 4-layer abstraction simplified to 2-layer +- [x] **Streamlined Backend Voice Module** (`src/python/role_play/voice/`): + - **VoiceChatHandler**: Direct ADK integration with WebSocket endpoint (`/api/voice/ws/{session_id}`) + - **No Wrapper Classes**: Eliminated `LiveVoiceSession`, `TranscriptBuffer`, `SessionTranscriptManager` + - **ADK Event Processing**: Direct processing of `run_live()` events without intermediate transformations + - **Generic Models**: Flexible `VoiceRequest`/`VoiceMessage` with `extra="allow"` for any field structure +- [x] **Preserved Functionality**: All original features maintained with radical simplification + - **Transcript Capture**: Reliable logging using ADK's native finalization mechanisms + - **Real-time Streaming**: Bidirectional audio/text communication preserved + - **Session Management**: WebSocket lifecycle and error handling maintained + - **ChatLogger Integration**: Voice logging methods unchanged, full JSONL compatibility +- [x] **Architecture Benefits**: + - **Maximum Simplification**: Direct ADK utilization without wrapper overhead + - **Future-Proof**: Automatic benefits from ADK improvements + - **Maintainable**: Fewer abstractions, easier to understand and modify + - **Performance**: Reduced memory footprint and processing overhead +- [x] **Testing Updated**: 13 comprehensive tests covering simplified architecture, all 328 tests passing + ### Pending Development - [ ] **Resource Architecture for Script Creator**: - [x] Design LayeredResourceLoader for base + user resources (see RESOURCE_ARCHITECTURE.md) @@ -197,7 +220,6 @@ make test-specific TEST_PATH="test/python/unit/chat/test_chat_logger.py" - [ ] Create utility functions for date formatting across components - [ ] Add validation that session belongs to requesting user before creating evaluation reports - [ ] Add retry logic for transient storage failures in evaluation system -- [ ] WebSocket: `server/websocket.py` connection manager - [ ] Auth Module: Complete OAuth implementation - [ ] Scripter: Complete module implementation - [ ] Frontend: Modular monolith restructure, chat/eval interfaces @@ -279,6 +301,7 @@ make test-specific TEST_PATH="test/python/unit/chat/test_chat_logger.py" ### Architecture Highlights - **Storage**: Async distributed locking, lease (60-300s) vs timeout (5-30s) separation - **Chat**: Separated ADK runtime from JSONL persistence, per-message Runner creation, utility methods for JSONL parsing, centralized agent configuration +- **Voice**: Three-tier transcript management (partial/stabilization/final), ADK `run_live()` integration, intelligent buffering prevents fragmented logs - **Backend Structure**: Helper methods for session validation, message logging, content loading, response generation - **Frontend Patterns**: Composable architecture for modal management, async operations, data loading, dual-flow session creation with script/character selection - **Config**: YAML + env vars, dynamic handler loading, fail-fast validation diff --git a/config/dev.yaml b/config/dev.yaml index e5ad963..a7d67c5 100644 --- a/config/dev.yaml +++ b/config/dev.yaml @@ -59,6 +59,7 @@ enabled_handlers: user_account: "role_play.server.user_account_handler.UserAccountHandler" chat: "role_play.chat.handler.ChatHandler" evaluation: "role_play.evaluation.handler.EvaluationHandler" + voice: "role_play.voice.handler.VoiceChatHandler" # Add more handlers as they're implemented: # scripter: "role_play.scripter.handler.ScripterHandler" @@ -71,3 +72,29 @@ supported_languages: # Resource configuration resources: base_prefix: "resources/" + +# Voice chat configuration +voice: + # Transcript buffering settings + transcript: + stability_threshold: "${VOICE_STABILITY_THRESHOLD:0.8}" + finalization_timeout_ms: "${VOICE_FINALIZATION_TIMEOUT:2000}" + min_utterance_length: "${VOICE_MIN_UTTERANCE_LENGTH:3}" + sentence_boundary_patterns: + - "[.!?]+\\s*$" + - "\\n+" + + # Audio processing settings + audio: + default_format: "pcm" + default_sample_rate: 16000 + default_channels: 1 + default_bit_depth: 16 + chunk_size_ms: "${VOICE_CHUNK_SIZE_MS:100}" + + # Voice model settings + model: + default_voice: "Aoede" + gemini_api_key: "${GEMINI_API_KEY:}" + # Mock mode when no API key is available + enable_mock: "${VOICE_ENABLE_MOCK:true}" diff --git a/src/python/CLAUDE.md b/src/python/CLAUDE.md index 1dd0221..77ac0db 100644 --- a/src/python/CLAUDE.md +++ b/src/python/CLAUDE.md @@ -1,365 +1,15 @@ -# Python Implementation Guidelines + -## Handler Architecture +# src/python/CLAUDE.md (Redirect) -### Stateless Design -- **New instance per request**: Handlers instantiated via dependency injection -- **No instance variables**: Never store state in handler attributes -- **Request lifecycle**: HTTP handler lives for one request, WebSocket for connection duration +This document has moved. Please see `../../AGENTS.md` for the up-to-date, centralized guidance covering Python backend patterns, dependency injection, stateless handlers, storage/locking, and testing. -```python -# GOOD - Stateless handler -class ChatHandler(BaseHandler): - def __init__(self, auth_manager: AuthManager, chat_logger: ChatLogger): - self.auth_manager = auth_manager # Injected dependencies only - self.chat_logger = chat_logger +Direct link: `../../AGENTS.md` -# BAD - Stateful handler -class ChatHandler(BaseHandler): - def __init__(self): - self.sessions = {} # NEVER do this! -``` +Notes: +- This file remains only to preserve existing links and references. +- Do not update detailed guidance here; add or edit content in `AGENTS.md` instead. -## Dependency Injection - -### Singleton Services -Use `@lru_cache` for services that should be shared across requests: - -```python -# dependencies.py -from functools import lru_cache - -@lru_cache -def get_content_loader() -> ContentLoader: - """Singleton content loader - loads once, reused across requests""" - return ContentLoader() - -@lru_cache -def get_chat_logger(storage: StorageBackend = Depends(get_storage)) -> ChatLogger: - """Singleton chat logger with injected storage""" - return ChatLogger(storage) -``` - -### Factory Functions -Pure functions that create new instances: - -```python -def get_storage() -> StorageBackend: - """Factory - creates new storage instance per request""" - storage_path = os.environ.get("STORAGE_PATH", "./storage") - config = StorageConfig(type="file", path=storage_path) - return FileStorage(config) -``` - -## Async Operations - -### Use asyncio.to_thread for Blocking I/O -```python -async def read_file(self, path: str) -> str: - """Wrap blocking I/O in asyncio.to_thread for FastAPI""" - return await asyncio.to_thread(self._blocking_read, path) - -def _blocking_read(self, path: str) -> str: - """Actual blocking I/O operation""" - with open(path, 'r') as f: - return f.read() -``` - -## Storage Patterns - -### Key Conventions -- **No file extensions**: `users/123/profile` not `users/123.json` -- **User data prefix**: `users/{user_id}/...` -- **Opaque strings**: Keys work identically across FileStorage/GCS/S3 - -### Distributed Locking -```python -# Separate lease duration from acquisition timeout -lock_config = LockConfig( - strategy="file", - lease_duration_seconds=300, # Lock valid for 5 min if holder crashes - timeout=30 # Try acquiring for 30 seconds -) - -async with storage.lock("resource", timeout=30): - # Critical section - pass -``` - -## Chat System Implementation - -### Session Lifecycle -1. **Create**: ChatLogger creates JSONL, ADK stores metadata with user's preferred language -2. **Message**: Log → Create Runner with language context → Process → Log response → Discard Runner -3. **End**: Log session_end, remove from ADK memory -4. **Export**: Read JSONL directly, format as text - -### File Locking for JSONL -```python -# ChatLogger uses FileLock for concurrent access -with FileLock(f"{log_path}.lock", timeout=5): - with open(log_path, 'a') as f: - f.write(json.dumps(event) + '\n') -``` - -### ADK Integration -- **Per-message Runners**: Create new Runner for each message -- **No persistent state**: Runners immediately discarded after use -- **Separation of concerns**: ADK for runtime, ChatLogger for persistence -- **Language Context**: Agent system prompts include language instructions - -## Authentication Patterns - -### RoleChecker Dependency (Preferred) -```python -# Modern pattern using Depends() -@router.get("/admin/users") -async def list_users( - user: User = Depends(RoleChecker(min_role=UserRole.ADMIN)) -): - return {"users": []} -``` - -### Role Hierarchy -```python -ADMIN > SCRIPTER > USER > GUEST -``` - -## Evaluation System Implementation - -### Report Storage Pattern -```python -# Store evaluation reports with timestamp-based unique IDs -timestamp = utc_now_isoformat() -# Replace colons with underscores for filesystem compatibility -safe_timestamp = timestamp.replace(':', '_') -unique_id = str(uuid.uuid4())[:8] -storage_id = f"{safe_timestamp}_{unique_id}" -report_path = f"users/{user_id}/eval_reports/{session_id}/{storage_id}" - -# Report includes metadata and full evaluation -report_data = { - "eval_session_id": eval_session_id, - "chat_session_id": session_id, - "user_id": user_id, - "created_at": timestamp, - "evaluation_type": "comprehensive", - "report": final_review_report.model_dump() -} -``` - -### Evaluation Handler Patterns -```python -# Helper methods for report management -async def _get_latest_report(user_id, session_id, storage): - """Get most recent report by sorting keys""" - prefix = f"users/{user_id}/eval_reports/{session_id}/" - keys = await storage.list_keys(prefix) - if not keys: - return None - latest_key = sorted(keys, reverse=True)[0] - return json.loads(await storage.read(latest_key)) - -# Storage injection in handler methods -async def evaluate_session( - self, - request: EvaluationRequest, - current_user: User = Depends(require_user_or_higher), - storage: StorageBackend = Depends(get_storage_backend) -): - # Store report after generation - await storage.write(report_path, json.dumps(report_data)) -``` - -### API Design for Report Retrieval -- **GET /session/{id}/report**: Returns latest or 404 (check existing first) -- **POST /session/{id}/evaluate**: Always creates new (explicit re-evaluation) -- **GET /session/{id}/all_reports**: Historical reports list -- **GET /reports/{report_id}**: Specific report by ID - -### Evaluation Error Handling -```python -# Session ownership validation -async def _validate_session_ownership(user_id: str, session_id: str, chat_logger: ChatLogger): - """Validate that session belongs to user before evaluation.""" - sessions = await chat_logger.list_user_sessions(user_id) - session_ids = {s["session_id"] for s in sessions} - if session_id not in session_ids: - raise HTTPException(status_code=403, detail="Session access denied") - -# Storage error handling with retry -async def _store_report_with_retry(storage: StorageBackend, path: str, data: str, max_retries: int = 3): - """Store evaluation report with retry logic for transient failures.""" - for attempt in range(max_retries): - try: - await storage.write(path, data) - return - except Exception as e: - if attempt == max_retries - 1: - raise HTTPException(status_code=500, detail="Failed to store evaluation report") - logger.warning(f"Storage attempt {attempt + 1} failed: {e}") - await asyncio.sleep(2 ** attempt) # Exponential backoff -``` - -### Callback Implementation Patterns -```python -# TODO completion pattern for agents -def agent_callback(callback_context: CallbackContext, llm_response: LlmResponse) -> Optional[LlmResponse]: - """Post-process agent responses to complete TODOs and aggregate data.""" - if not llm_response.content or not llm_response.content.parts: - return None - - try: - # Parse structured output - response_data = json.loads(llm_response.content.parts[0].text) - - # Complete missing fields from callback state - if "area_assessments" not in response_data or not response_data["area_assessments"]: - response_data["area_assessments"] = _extract_assessments_from_state(callback_context.state) - - # Calculate derived fields (e.g., overall_score) - response_data["overall_score"] = _calculate_overall_score(response_data["area_assessments"]) - - # Return modified response - modified_parts = [copy.deepcopy(part) for part in llm_response.content.parts] - modified_parts[0].text = json.dumps(response_data) - return LlmResponse(content=types.Content(role="model", parts=modified_parts)) - - except Exception as e: - logger.error(f"Callback processing failed: {e}") - return None # Return original response on error -``` - -## Common Pitfalls - -1. **Global State**: Never use global variables, use dependency injection -2. **Blocking I/O**: Always wrap in `asyncio.to_thread()` for FastAPI -3. **File Extensions in Keys**: Storage keys should be extension-free -4. **Persistent Runners**: ADK Runners must be created per-message -5. **Handler State**: Handlers must remain stateless -6. **Report Storage**: Always include timestamps in report paths for uniqueness -7. **Session Validation**: Always validate session ownership before operations -8. **Storage Failures**: Handle transient storage errors with retry logic - -## Performance Considerations - -- **Singleton Services**: Use `@lru_cache` for expensive initializations -- **Concurrent JSONL**: Use FileLock with 5-second timeout -- **Lock Tuning**: Lease duration (60-300s) vs acquisition timeout (5-30s) -- **Async Everything**: All I/O operations should be async - -## Storage Monitoring - -### StorageMonitor Integration -```python -# Storage backends automatically use global StorageMonitor -from role_play.common.storage_monitoring import get_storage_monitor - -# Monitor tracks operations automatically -async with storage.read("key") as data: - # Read operation is monitored - pass - -async with storage.lock("resource") as lock: - # Lock acquisition/hold times tracked - pass -``` - -### Monitoring Metrics -- **Lock Metrics**: Acquisition attempts, successes, failures, timing -- **Storage Metrics**: Read/write/delete operations, error rates, latencies -- **Decision Support**: Automatic recommendations for lock strategy upgrades - -### Usage in Scripts -```python -# Validation and metadata scripts use asyncio patterns -class ResourceValidator: - def __init__(self, resource_dir: Path, storage_monitor: Optional[StorageMonitor] = None): - self.monitor = storage_monitor or get_storage_monitor() - - async def validate_all(self): - # Async validation with monitoring - async with self.monitor.monitor_storage_operation("read"): - data = await self._load_resource(path) -``` - -## Language Support Implementation - -### ContentLoader Language Architecture -```python -# Language-aware content loading -loader = ContentLoader(supported_languages=["en", "zh-TW", "ja"]) - -# Per-language caching -en_scenarios = loader.get_scenarios("en") -zh_scenarios = loader.get_scenarios("zh-TW") - -# Language-specific resource files -# scenarios.json (English default) -# scenarios_zh-TW.json (Traditional Chinese) -``` - -### User Language Preferences -```python -# User model with language preference -class User(BaseModel): - preferred_language: str = "en" # IETF BCP 47 format - -# Language preference API -@router.patch("/auth/language") -async def update_language_preference( - request: UpdateLanguageRequest, - current_user: User = Depends(get_current_user) -): - # Update user language preference - pass -``` - -### Chat Handler Language Context -```python -# Session creation with user language -async def create_session(request: CreateSessionRequest, current_user: User): - user_language = current_user.preferred_language - - # Load content in user's language - scenario = content_loader.get_scenario_by_id(request.scenario_id, user_language) - character = content_loader.get_character_by_id(request.character_id, user_language) - - # Agent with language-specific instructions - system_prompt = f""" - **IMPORTANT: Respond in {language_name} language as specified.** - {character.system_prompt} - """ -``` - -### Language Validation Patterns -```python -# ContentLoader language validation -def _validate_languages(self, data: Dict) -> None: - for scenario in data.get("scenarios", []): - scenario_lang = scenario.get("language", "en") - if scenario_lang not in self.supported_languages: - raise ValueError(f"Unsupported language '{scenario_lang}'") -``` - -## Testing Patterns - -### Mock Storage for Evaluation Tests -```python -@pytest.fixture -def mock_storage(): - """Create mock storage backend for evaluation tests.""" - storage = AsyncMock() - storage.write = AsyncMock() - storage.read = AsyncMock() - storage.list_keys = AsyncMock() - return storage - -# Inject into test methods -async def test_evaluate_session(mock_storage): - response = await handler.evaluate_session( - request=request, - current_user=user, - storage=mock_storage - ) -``` \ No newline at end of file diff --git a/src/python/role_play/chat/chat_logger.py b/src/python/role_play/chat/chat_logger.py index bfbe029..65efed4 100644 --- a/src/python/role_play/chat/chat_logger.py +++ b/src/python/role_play/chat/chat_logger.py @@ -456,4 +456,136 @@ async def export_session_text(self, user_id: str, session_id: str, export_format lines.append("SESSION ACTIVE OR NOT PROPERLY ENDED") lines.append("=" * 70) - return "\n".join(lines) \ No newline at end of file + return "\n".join(lines) + + async def log_voice_message( + self, + user_id: str, + session_id: str, + role: str, + transcript_text: str, + duration_ms: int, + confidence: float, + message_number: int, + voice_metadata: Optional[Dict[str, Any]] = None + ) -> None: + """ + Logs a voice message with transcript and metadata. + + Args: + user_id: The user ID who owns the session. + session_id: The application session ID. + role: The role of the speaker ("user", "assistant"). + transcript_text: The transcribed text from speech. + duration_ms: Duration of the speech in milliseconds. + confidence: Confidence score of the transcription (0.0-1.0). + message_number: The sequential number of the message in the session. + voice_metadata: Optional voice-specific metadata. + """ + storage_path = self._get_chat_log_path(user_id, session_id) + + if not await self.storage.exists(storage_path): + logger.error(f"Log file {storage_path} does not exist. Cannot log voice message.") + raise StorageError(f"Session log file not found: {storage_path}") + + voice_message_event = { + "type": "voice_message", + "timestamp": utc_now_isoformat(), + "app_session_id": session_id, + "role": role, + "content": transcript_text, + "message_number": message_number, + "voice_metadata": { + "duration_ms": duration_ms, + "confidence": confidence, + "is_voice": True, + **(voice_metadata or {}) + } + } + + try: + async with self.storage.lock(storage_path): + # Append the voice message event as a new line + event_line = json.dumps(voice_message_event) + '\n' + await self.storage.append(storage_path, event_line) + + logger.debug(f"Logged voice message to {storage_path} (Msg#: {message_number}, Role: {role}, Duration: {duration_ms}ms)") + except Exception as e: + logger.error(f"Error logging voice message to {storage_path}: {e}") + raise + + async def log_voice_session_start( + self, + user_id: str, + session_id: str, + voice_config: Dict[str, Any] + ) -> None: + """ + Logs the start of voice capabilities for a session. + + Args: + user_id: The user ID who owns the session. + session_id: The application session ID. + voice_config: Voice configuration details. + """ + storage_path = self._get_chat_log_path(user_id, session_id) + + if not await self.storage.exists(storage_path): + logger.error(f"Log file {storage_path} does not exist. Cannot log voice session start.") + raise StorageError(f"Session log file not found: {storage_path}") + + voice_start_event = { + "type": "voice_session_start", + "timestamp": utc_now_isoformat(), + "app_session_id": session_id, + "voice_config": voice_config + } + + try: + async with self.storage.lock(storage_path): + # Append the voice session start event + event_line = json.dumps(voice_start_event) + '\n' + await self.storage.append(storage_path, event_line) + + logger.info(f"Logged voice session start for {session_id}") + except Exception as e: + logger.error(f"Error logging voice session start for {session_id}: {e}") + raise + + async def log_voice_session_end( + self, + user_id: str, + session_id: str, + voice_stats: Dict[str, Any] + ) -> None: + """ + Logs the end of voice capabilities for a session. + + Args: + user_id: The user ID who owns the session. + session_id: The application session ID. + voice_stats: Voice session statistics. + """ + storage_path = self._get_chat_log_path(user_id, session_id) + + if not await self.storage.exists(storage_path): + logger.warning(f"Log file {storage_path} does not exist for voice session end.") + return # Don't raise error since session might be deleted + + voice_end_event = { + "type": "voice_session_end", + "timestamp": utc_now_isoformat(), + "app_session_id": session_id, + "voice_stats": voice_stats + } + + try: + async with self.storage.lock(storage_path): + # Append the voice session end event + event_line = json.dumps(voice_end_event) + '\n' + await self.storage.append(storage_path, event_line) + + logger.info(f"Logged voice session end for {session_id}") + except Exception as e: + logger.error(f"Error logging voice session end for {session_id}: {e}") + raise \ No newline at end of file diff --git a/src/python/role_play/common/resource_loader.py b/src/python/role_play/common/resource_loader.py index f115cb0..af65802 100644 --- a/src/python/role_play/common/resource_loader.py +++ b/src/python/role_play/common/resource_loader.py @@ -4,7 +4,7 @@ import os from typing import Any, Dict, List -from role_play.common.storage import StorageBackend +from .storage import StorageBackend logger = logging.getLogger(__name__) diff --git a/src/python/role_play/voice/__init__.py b/src/python/role_play/voice/__init__.py new file mode 100644 index 0000000..03a1628 --- /dev/null +++ b/src/python/role_play/voice/__init__.py @@ -0,0 +1,12 @@ +"""Voice chat module for real-time bidirectional audio communication.""" + +from .handler import VoiceChatHandler +from .models import VoiceRequest, VoiceMessage +from .config import VoiceConfig + +__all__ = [ + "VoiceChatHandler", + "VoiceRequest", + "VoiceMessage", + "VoiceConfig", +] \ No newline at end of file diff --git a/src/python/role_play/voice/config.py b/src/python/role_play/voice/config.py new file mode 100644 index 0000000..e1961cd --- /dev/null +++ b/src/python/role_play/voice/config.py @@ -0,0 +1,24 @@ +"""Voice chat configuration constants.""" + + +class VoiceConfig: + """Configuration constants for voice chat functionality.""" + + # Audio parameters + AUDIO_SAMPLE_RATE = 16000 + AUDIO_CHANNELS = 1 + AUDIO_BIT_DEPTH = 16 + AUDIO_FORMAT = "pcm" + + # Size limits for security + MAX_AUDIO_CHUNK_SIZE = 1024 * 100 # 100KB per audio chunk + MAX_TEXT_SIZE = 1024 * 10 # 10KB per text message + + # Session management + MAX_SESSIONS_PER_USER = 5 # Prevent resource exhaustion + SESSION_TIMEOUT_SECONDS = 3600 # 1 hour timeout + + # WebSocket codes + WS_MISSING_TOKEN = 1008 + WS_INVALID_TOKEN = 1008 + WS_SESSION_NOT_FOUND = 1008 \ No newline at end of file diff --git a/src/python/role_play/voice/handler.py b/src/python/role_play/voice/handler.py new file mode 100644 index 0000000..8b9340b --- /dev/null +++ b/src/python/role_play/voice/handler.py @@ -0,0 +1,429 @@ +"""Direct ADK integration voice handler - radically simplified. + +Architecture Flow: + Client (WebSocket) + ↓↑ + VoiceChatHandler (Direct ADK integration) + ↓↑ + ADK Runner (run_live streaming) + ↓↑ + Gemini Live API + +Design Principles: +- No intermediate wrappers or abstractions +- Stateless handler design (no session tracking in handler instance) +- ADK events processed directly without transformation +- Uses ADK's native is_final flags for transcript finalization +- Minimal models: VoiceRequest/VoiceMessage with flexible fields +- WebSocket-scoped ADK components (created per connection) + +Security Features: +- JWT authentication for WebSocket connections +- Input validation with size limits (100KB audio, 10KB text) +- Session limits per user (max 5 concurrent) +- Proper error handling and resource cleanup +""" + +import asyncio +import logging +import base64 +from typing import Optional, Dict, Any, Tuple, Protocol +from fastapi import WebSocket, WebSocketDisconnect, Query, HTTPException, APIRouter +from google.adk.runners import Runner +from google.adk.agents import LiveRequestQueue +from google.adk.agents.run_config import RunConfig +from google.genai.types import AudioTranscriptionConfig, Content, Part, Blob + +from ..server.base_handler import BaseHandler +from ..server.dependencies import ( + get_chat_logger, get_adk_session_service, get_storage_backend, get_auth_manager +) +from ..common.models import User +from ..common.time_utils import utc_now_isoformat +from ..common.storage import StorageBackend +from ..chat.chat_logger import ChatLogger +from ..dev_agents.roleplay_agent.agent import get_production_agent +from google.adk.sessions import InMemorySessionService + +from .models import VoiceRequest, VoiceMessage +from .config import VoiceConfig + +logger = logging.getLogger(__name__) + + +class ADKEvent(Protocol): + """Protocol for ADK live event types.""" + turn_complete: Optional[bool] + interrupted: Optional[bool] + input_transcription: Optional[Any] + output_transcription: Optional[Any] + content: Optional[Any] + + +class VoiceChatHandler(BaseHandler): + """Direct ADK integration for voice chat.""" + + def __init__(self): + super().__init__() + + @property + def router(self) -> APIRouter: + if self._router is None: + self._router = APIRouter() + self._router.websocket("/ws/{session_id}")(self.voice_websocket_endpoint) + return self._router + + @property + def prefix(self) -> str: + return "/voice" + + async def voice_websocket_endpoint(self, websocket: WebSocket, session_id: str): + await websocket.accept() + token = websocket.query_params.get("token") + if not token: + await websocket.close(code=VoiceConfig.WS_MISSING_TOKEN, reason="Missing token parameter") + return + await self.handle_voice_session(websocket, session_id, token) + + async def handle_voice_session(self, websocket: WebSocket, session_id: str, token: str): + """Handle voice chat with direct ADK integration.""" + user, adk_components = None, None + try: + logger.info(f"Voice WebSocket connection for session {session_id}") + + # Validate user and session + user = await self._validate_jwt_token(token) + if not user: + await websocket.close(code=VoiceConfig.WS_INVALID_TOKEN, reason="Invalid authentication token") + return + + storage = get_storage_backend() + chat_logger = get_chat_logger(storage) + adk_session_service = get_adk_session_service() + + # Check session limits per user + if not await self._check_session_limit(user.id, storage): + await websocket.close(code=VoiceConfig.WS_INVALID_TOKEN, reason="Maximum sessions per user exceeded") + return + + adk_session = await self._validate_session(session_id, user.id, adk_session_service, chat_logger) + if not adk_session: + await websocket.close(code=VoiceConfig.WS_SESSION_NOT_FOUND, reason="Session not found or access denied") + return + + # Send initial status + await websocket.send_json({ + "type": "status", + "status": "connecting", + "message": "Initializing voice session" + }) + + # Initialize ADK components directly + adk_components = await self._initialize_adk(session_id, user, adk_session, adk_session_service) + + # Send configuration + await websocket.send_json({ + "type": "config", + "audio_format": VoiceConfig.AUDIO_FORMAT, + "sample_rate": VoiceConfig.AUDIO_SAMPLE_RATE, + "channels": VoiceConfig.AUDIO_CHANNELS, + "bit_depth": VoiceConfig.AUDIO_BIT_DEPTH, + "language": getattr(user, 'preferred_language', 'en') + }) + + # Log session start + await chat_logger.log_voice_session_start(user.id, session_id, voice_config={ + "language": getattr(user, 'preferred_language', 'en') + }) + + await websocket.send_json({ + "type": "status", + "status": "ready", + "message": "Voice session ready" + }) + + # Handle bidirectional streaming + await self._handle_streaming(websocket, adk_components, chat_logger, user.id) + + except WebSocketDisconnect: + logger.info(f"WebSocket disconnected for session {session_id}") + await self._handle_connection_error(session_id, adk_components) + except ConnectionError as e: + logger.error(f"Connection error for session {session_id}: {e}") + await self._handle_connection_error(session_id, adk_components) + except Exception as e: + logger.error(f"Unexpected error for session {session_id}: {e}", exc_info=True) + await self._handle_connection_error(session_id, adk_components) + try: + await websocket.send_json({ + "type": "error", + "error": str(e), + "timestamp": utc_now_isoformat() + }) + except: + pass # Connection might be closed + finally: + if adk_components: + stats = await self._cleanup_adk(adk_components) + if user: + storage = get_storage_backend() + chat_logger = get_chat_logger(storage) + await chat_logger.log_voice_session_end(user.id, session_id, voice_stats=stats) + logger.info(f"Voice session {session_id} cleanup completed") + + async def _initialize_adk(self, session_id: str, user: User, adk_session: Any, adk_session_service: InMemorySessionService) -> Dict[str, Any]: + """Initialize ADK components directly.""" + # Create agent + agent = await get_production_agent( + character_id=adk_session.state.get("character_id"), + scenario_id=adk_session.state.get("scenario_id"), + language=getattr(user, 'preferred_language', 'en'), + scripted=bool(adk_session.state.get("script_data")) + ) + if not agent: + raise ValueError("Failed to create roleplay agent") + + # Create runner and start live streaming + runner = Runner(app_name="roleplay_voice", agent=agent, session_service=adk_session_service) + run_config = RunConfig( + response_modalities=["AUDIO"], + output_audio_transcription=AudioTranscriptionConfig(), + input_audio_transcription=AudioTranscriptionConfig() + ) + live_request_queue = LiveRequestQueue() + live_events = runner.run_live( + session=adk_session, + live_request_queue=live_request_queue, + run_config=run_config + ) + + return { + "session_id": session_id, + "user_id": user.id, + "runner": runner, + "live_events": live_events, + "live_request_queue": live_request_queue, + "adk_session": adk_session, + "active": True, + "stats": { + "started_at": utc_now_isoformat(), + "audio_chunks_sent": 0, + "audio_chunks_received": 0, + "transcripts_processed": 0, + "errors": 0 + } + } + + async def _handle_streaming(self, websocket: WebSocket, adk: Dict[str, Any], chat_logger: ChatLogger, user_id: str): + """Handle bidirectional streaming with direct ADK integration.""" + receive_task = asyncio.create_task(self._receive_from_client(websocket, adk)) + send_task = asyncio.create_task(self._send_to_client(websocket, adk, chat_logger, user_id)) + + done, pending = await asyncio.wait([receive_task, send_task], return_when=asyncio.FIRST_COMPLETED) + for task in pending: + task.cancel() + + async def _receive_from_client(self, websocket: WebSocket, adk: Dict[str, Any]): + """Receive from client and send directly to ADK.""" + try: + while adk["active"]: + data = await websocket.receive_text() + + try: + request = VoiceRequest.model_validate_json(data) + except ValueError as e: + logger.warning(f"Invalid request data: {e}") + adk["stats"]["errors"] += 1 + continue + + if request.end_session: + adk["active"] = False + adk["live_request_queue"].close() + break + + try: + # Send directly to ADK + if request.mime_type == "audio/pcm": + blob = Blob(mime_type=request.mime_type, data=request.decode_data()) + await adk["live_request_queue"].send_realtime(blob) + adk["stats"]["audio_chunks_sent"] += 1 + elif request.mime_type == "text/plain": + content = Content(parts=[Part(text=request.decode_data())]) + await adk["live_request_queue"].send_content(content) + except ValueError as e: + logger.warning(f"Data validation error: {e}") + adk["stats"]["errors"] += 1 + except Exception as e: + logger.error(f"Error sending to ADK: {e}") + adk["stats"]["errors"] += 1 + + except WebSocketDisconnect: + logger.info(f"Client disconnected from session {adk['session_id']}") + adk["active"] = False + except Exception as e: + logger.error(f"Error receiving from client: {e}") + adk["active"] = False + + async def _send_to_client(self, websocket: WebSocket, adk: Dict[str, Any], chat_logger: ChatLogger, user_id: str): + """Process ADK events directly and send to client.""" + message_counter = 0 + + try: + async for event in adk["live_events"]: + if not adk["active"]: + break + + message = self._process_adk_event(event, adk["stats"]) + if message: + # Log final transcripts + if message["type"] == "transcript_final": + message_counter += 1 + await chat_logger.log_voice_message( + user_id=user_id, + session_id=adk["session_id"], + role=message["role"], + transcript_text=message["text"], + duration_ms=0, + confidence=message.get("confidence", 1.0), + message_number=message_counter, + voice_metadata={} + ) + + # Send to client + await websocket.send_json(message) + + except asyncio.CancelledError: + logger.info(f"Event processing cancelled for session {adk['session_id']}") + except ConnectionError as e: + logger.error(f"Connection error during event processing: {e}") + adk["stats"]["errors"] += 1 + except Exception as e: + logger.error(f"Unexpected error processing events: {e}", exc_info=True) + adk["stats"]["errors"] += 1 + try: + await websocket.send_json({ + "type": "error", + "error": str(e), + "timestamp": utc_now_isoformat() + }) + except: + pass # Connection might be closed + + def _process_adk_event(self, event: ADKEvent, stats: Dict[str, Any]) -> Optional[Dict[str, Any]]: + """Process a single ADK event directly.""" + stats["transcripts_processed"] += 1 + + # Turn status events + if hasattr(event, 'turn_complete') or hasattr(event, 'interrupted'): + return { + "type": "turn_status", + "turn_complete": getattr(event, 'turn_complete', False), + "interrupted": getattr(event, 'interrupted', False), + "timestamp": utc_now_isoformat() + } + + # Transcript events + if hasattr(event, 'input_transcription') and event.input_transcription: + return self._process_transcript(event.input_transcription, "user") + if hasattr(event, 'output_transcription') and event.output_transcription: + return self._process_transcript(event.output_transcription, "assistant") + + # Audio events + if hasattr(event, 'content') and event.content and event.content.parts: + for part in event.content.parts: + if hasattr(part, 'inline_data') and part.inline_data: + stats["audio_chunks_received"] += 1 + return { + "type": "audio", + "data": base64.b64encode(part.inline_data.data).decode('utf-8'), + "mime_type": part.inline_data.mime_type, + "timestamp": utc_now_isoformat() + } + + return None + + def _process_transcript(self, transcription: Any, role: str) -> Dict[str, Any]: + """Process transcript from ADK.""" + is_final = getattr(transcription, 'is_final', True) + + if is_final: + return { + "type": "transcript_final", + "text": transcription.text, + "role": role, + "confidence": getattr(transcription, 'confidence', 1.0), + "timestamp": utc_now_isoformat() + } + else: + return { + "type": "transcript_partial", + "text": transcription.text, + "role": role, + "stability": getattr(transcription, 'stability', 1.0), + "timestamp": utc_now_isoformat() + } + + async def _cleanup_adk(self, adk: Dict[str, Any]) -> Dict[str, Any]: + """Cleanup ADK components.""" + adk["active"] = False + if adk["live_request_queue"]: + adk["live_request_queue"].close() + + stats = {**adk["stats"], "ended_at": utc_now_isoformat()} + logger.info(f"Session {adk['session_id']} stats: {stats}") + return stats + + async def _validate_jwt_token(self, token: str) -> Optional[User]: + """Validate JWT token and return user.""" + try: + storage = get_storage_backend() + auth_manager = get_auth_manager(storage) + token_data = auth_manager.verify_token(token) + return await storage.get_user(token_data.user_id) + except Exception as e: + logger.error(f"JWT validation error: {e}") + return None + + async def _validate_session(self, session_id: str, user_id: str, adk_session_service: InMemorySessionService, chat_logger: ChatLogger) -> Optional[Any]: + """Validate that a chat session exists and belongs to the user.""" + adk_session = await adk_session_service.get_session( + app_name="roleplay_chat", user_id=user_id, session_id=session_id + ) + if adk_session: + return adk_session + if await chat_logger.get_session_end_info(user_id, session_id): + logger.warning(f"Attempted to connect to ended session {session_id}") + return None + logger.warning(f"Session {session_id} not found for user {user_id}") + return None + + async def _check_session_limit(self, user_id: str, storage: StorageBackend) -> bool: + """Check if user hasn't exceeded session limit. + + TODO: Implement distributed session tracking via storage backend + For now, always return True (no limit enforcement). + + Future implementation: + - Store active sessions in storage: voice_sessions/{user_id}/active/{session_id} + - Include server_id, started_at timestamp + - Clean up stale sessions (>1 hour old) + - Count active sessions across all servers + - Enforce MAX_SESSIONS_PER_USER limit + + Example: + active_sessions = await storage.list_keys(f"voice_sessions/{user_id}/active/") + # Filter stale sessions older than 1 hour + # Return len(active_sessions) < VoiceConfig.MAX_SESSIONS_PER_USER + """ + # For now, no limit enforcement in distributed environment + return True + + async def _handle_connection_error(self, session_id: str, adk_components: Optional[Dict] = None): + """Clean up resources on connection error.""" + if adk_components: + try: + await self._cleanup_adk(adk_components) + except Exception as e: + logger.error(f"Error during cleanup for {session_id}: {e}") + finally: + logger.info(f"Cleaned up session {session_id} after connection error") \ No newline at end of file diff --git a/src/python/role_play/voice/models.py b/src/python/role_play/voice/models.py new file mode 100644 index 0000000..c457b9d --- /dev/null +++ b/src/python/role_play/voice/models.py @@ -0,0 +1,48 @@ +"""Simplified voice models - minimal essential types.""" + +import base64 +from typing import Optional, Dict, Any, Union +from pydantic import BaseModel, Field, validator + +from .config import VoiceConfig + + +class VoiceRequest(BaseModel): + """Generic client request for voice sessions.""" + mime_type: str = Field(..., description="MIME type (audio/pcm, text/plain)") + data: str = Field(..., description="Base64-encoded data") + end_session: bool = Field(default=False, description="Whether to end session") + + @validator('mime_type') + def validate_mime_type(cls, v): + """Validate MIME type is supported.""" + allowed = ['audio/pcm', 'text/plain'] + if v not in allowed: + raise ValueError(f"Unsupported MIME type: {v}. Allowed: {allowed}") + return v + + def decode_data(self) -> Union[bytes, str]: + """Decode and validate base64 data.""" + try: + data = base64.b64decode(self.data) + except Exception as e: + raise ValueError(f"Invalid base64 data: {e}") + + if self.mime_type.startswith("audio/"): + if len(data) > VoiceConfig.MAX_AUDIO_CHUNK_SIZE: + raise ValueError(f"Audio chunk too large: {len(data)} bytes (max: {VoiceConfig.MAX_AUDIO_CHUNK_SIZE})") + return data + else: + if len(data) > VoiceConfig.MAX_TEXT_SIZE: + raise ValueError(f"Text too large: {len(data)} bytes (max: {VoiceConfig.MAX_TEXT_SIZE})") + return data.decode('utf-8') + + +class VoiceMessage(BaseModel): + """Generic server message for voice sessions.""" + type: str = Field(..., description="Message type") + timestamp: Optional[str] = Field(None, description="ISO timestamp") + + class Config: + """Pydantic configuration.""" + extra = "allow" # Allow any additional fields for flexibility \ No newline at end of file diff --git a/src/ts/CLAUDE.md b/src/ts/CLAUDE.md index 8535dfe..b45bffb 100644 --- a/src/ts/CLAUDE.md +++ b/src/ts/CLAUDE.md @@ -1,445 +1,15 @@ -# TypeScript/Frontend Implementation Guidelines + -## Directory Rules -ONLY create TypeScript source code files under this directory. +# src/ts/CLAUDE.md (Redirect) -## Architecture Overview +This document has moved. Please see `../../AGENTS.md` for the up-to-date, centralized guidance covering frontend (Vue + TS) patterns, composables, services, and i18n. -### Current Structure -- **Domain-Based Organization**: Separated by feature (auth/, chat/, evaluation/) -- **Composable Patterns**: Reusable Vue composables for common workflows -- **Type Safety**: Full TypeScript with backend Pydantic model sync +Direct link: `../../AGENTS.md` -### Domain Organization -``` -src/ts/role_play/ -├── types/ # TypeScript interfaces -├── services/ # API clients -├── composables/ # Reusable Vue logic -├── components/ # UI components by domain -└── views/ # Page-level components -``` +Notes: +- This file remains only to preserve existing links and references. +- Do not update detailed guidance here; add or edit content in `AGENTS.md` instead. -## Type Synchronization - -### Backend Pydantic → Frontend TypeScript -Always keep types in sync with Python models: - -```python -# Python (Pydantic) -class User(BaseModel): - id: str - email: str - role: UserRole - preferred_language: str = "en" - created_at: datetime -``` - -```typescript -// TypeScript -interface User { - id: string; - email: string; - role: UserRole; - preferred_language: string; - createdAt: string; // ISO 8601 UTC -} -``` - -### API Response Types -```typescript -interface ApiResponse { - data?: T; - error?: string; - status: number; -} -``` - -## Composable Patterns - -### Reusable Vue Composables -```typescript -// composables/useAsyncOperation.ts -export function useAsyncOperation() { - const loading = ref(false); - const error = ref(null); - - const execute = async (operation: () => Promise): Promise => { - loading.value = true; - error.value = null; - try { - return await operation(); - } catch (e) { - error.value = e instanceof Error ? e.message : 'Unknown error'; - return null; - } finally { - loading.value = false; - } - }; - - return { loading: readonly(loading), error: readonly(error), execute }; -} - -// composables/useConfirmModal.ts -export function useConfirmModal() { - const showModal = ref(false); - const modalConfig = ref({}); - - const confirm = (config: ConfirmModalConfig): Promise => { - return new Promise((resolve) => { - modalConfig.value = { ...config, onConfirm: () => resolve(true), onCancel: () => resolve(false) }; - showModal.value = true; - }); - }; - - return { showModal, modalConfig, confirm }; -} -``` - -## State Management - -### Store Pattern -```typescript -// stores/auth.ts -export const useAuthStore = defineStore('auth', { - state: () => ({ - user: null as User | null, - token: null as string | null, - }), - - actions: { - async login(credentials: LoginRequest) { - const response = await authApi.login(credentials); - this.token = response.token; - this.user = response.user; - } - } -}); -``` - -## API Integration - -### Service Layer -```typescript -// services/auth-api.ts -class AuthApi { - private baseUrl = '/api/auth'; - - async login(data: LoginRequest): Promise { - const response = await fetch(`${this.baseUrl}/login`, { - method: 'POST', - headers: { 'Content-Type': 'application/json' }, - body: JSON.stringify(data) - }); - - if (!response.ok) { - throw new ApiError(response.status, await response.text()); - } - - return response.json(); - } -} - -export const authApi = new AuthApi(); -``` - -### Token Management -```typescript -// Automatic token injection -fetch(url, { - headers: { - 'Authorization': `Bearer ${authStore.token}`, - 'Content-Type': 'application/json' - } -}); -``` - -## Component Guidelines - -### Domain Components -```typescript -// components/chat/MessageList.vue - - - -``` - -### Cross-Domain Integration -```typescript -// When chat needs to show user info -import { useAuthStore } from '@/stores/auth'; -import { useChatStore } from '@/stores/chat'; - -const authStore = useAuthStore(); -const chatStore = useChatStore(); - -// Access current user from auth domain -const currentUser = computed(() => authStore.user); -``` - -## Development Patterns - -### Environment Variables -```typescript -const API_BASE = import.meta.env.VITE_API_BASE || 'http://localhost:8000'; -``` - -### Error Handling -```typescript -try { - await chatApi.sendMessage(sessionId, message); -} catch (error) { - if (error instanceof ApiError) { - if (error.status === 401) { - // Handle auth error - await authStore.logout(); - } - } -} -``` - -### WebSocket Integration (Future) -```typescript -// services/chat-websocket.ts -class ChatWebSocket { - private ws: WebSocket | null = null; - - connect(sessionId: string, token: string) { - this.ws = new WebSocket(`ws://localhost:8000/ws/chat/${sessionId}`); - - // Send auth token as first message - this.ws.onopen = () => { - this.ws?.send(JSON.stringify({ type: 'auth', token })); - }; - } -} -``` - -## Build & Development - -### Vite Configuration -```javascript -// vite.config.js -export default { - server: { - host: '0.0.0.0', // For container support - proxy: { - '/api': 'http://localhost:8000' - } - } -} -``` - -### Type Checking -```bash -npm run type-check # Run TypeScript compiler without emit -``` - -## Internationalization (i18n) - -### Vue i18n Setup -```typescript -// main.ts -import { createI18n } from 'vue-i18n' -import en from './locales/en.json' -import zhTW from './locales/zh-TW.json' - -const i18n = createI18n({ - locale: 'en', - fallbackLocale: 'en', - messages: { en, 'zh-TW': zhTW } -}) -``` - -### Language Management -```typescript -// Language preference sync with backend -async function updateLanguagePreference(language: string) { - // Update Vue i18n locale - i18n.global.locale.value = language - - // Persist to localStorage - localStorage.setItem('language', language) - - // Sync with backend if authenticated - if (authStore.token) { - await authApi.updateLanguagePreference(authStore.token, { language }) - authStore.user.preferred_language = language - } -} -``` - -### Component Localization -```vue - - - -``` - -### Language-Specific API Types -```typescript -// Language preference API types -interface UpdateLanguageRequest { - language: string; // IETF BCP 47 format: "en", "zh-TW" -} - -interface UpdateLanguageResponse { - success: boolean; - language: string; - message: string; -} - -// Content API with language support -interface GetScenariosParams { - language?: string; // Filter scenarios by language -} -``` - -## Evaluation System Integration - -### Evaluation API Types -```typescript -// Core evaluation types -interface StoredEvaluationReport { - success: boolean; - report_id: string; - chat_session_id: string; - created_at: string; - evaluation_type: string; - report: FinalReviewReport; -} - -interface EvaluationReportSummary { - report_id: string; - chat_session_id: string; - created_at: string; - evaluation_type: string; -} - -interface EvaluationReportListResponse { - success: boolean; - reports: EvaluationReportSummary[]; -} -``` - -### Evaluation Service Implementation -```typescript -// services/evaluationApi.ts -export const evaluationApi = { - // Check for existing report first - async getLatestReport(sessionId: string): Promise { - try { - const response = await fetch(`/api/eval/session/${sessionId}/report`, { - headers: { 'Authorization': `Bearer ${authStore.token}` } - }); - if (response.status === 404) return null; - if (!response.ok) throw new Error('Failed to fetch report'); - return await response.json(); - } catch (error) { - throw error; - } - }, - - // Always creates new evaluation - async createNewEvaluation(sessionId: string, evaluationType = 'comprehensive'): Promise { - const response = await fetch(`/api/eval/session/${sessionId}/evaluate?evaluation_type=${evaluationType}`, { - method: 'POST', - headers: { 'Authorization': `Bearer ${authStore.token}` } - }); - if (!response.ok) throw new Error('Failed to create evaluation'); - return await response.json(); - }, - - // List all historical reports - async listAllReports(sessionId: string): Promise { - const response = await fetch(`/api/eval/session/${sessionId}/all_reports`, { - headers: { 'Authorization': `Bearer ${authStore.token}` } - }); - if (!response.ok) throw new Error('Failed to list reports'); - return await response.json(); - } -}; -``` - -### Smart Report Loading Pattern -```typescript -// Using composables for evaluation workflow -const { loading: evaluationLoading, execute } = useAsyncOperation(); -const { confirm } = useConfirmModal(); - -const sendToEvaluation = async () => { - showEvaluationReport.value = true; - - const result = await execute(async () => { - // First check for existing report - const existingReport = await evaluationApi.getLatestReport(session.session_id); - - if (existingReport) { - evaluationReport.value = existingReport.report; - isExistingReport.value = true; - return existingReport; - } else { - // Generate new report only if none exists - const newReport = await evaluationApi.createNewEvaluation(session.session_id); - evaluationReport.value = newReport.report; - isExistingReport.value = false; - return newReport; - } - }); - - if (!result) { - showEvaluationReport.value = false; // Hide on error - } -}; -``` - -### Re-evaluation UI Pattern -```vue - - - - -``` \ No newline at end of file diff --git a/src/ts/role_play/ui/src/components/VoiceTranscript.vue b/src/ts/role_play/ui/src/components/VoiceTranscript.vue new file mode 100644 index 0000000..b3307f8 --- /dev/null +++ b/src/ts/role_play/ui/src/components/VoiceTranscript.vue @@ -0,0 +1,502 @@ + + + + + \ No newline at end of file diff --git a/src/ts/role_play/ui/src/composables/useTranscriptBuffer.ts b/src/ts/role_play/ui/src/composables/useTranscriptBuffer.ts new file mode 100644 index 0000000..fdbeb97 --- /dev/null +++ b/src/ts/role_play/ui/src/composables/useTranscriptBuffer.ts @@ -0,0 +1,199 @@ +/** + * Composable for managing transcript buffering on the frontend. + * Mirrors the backend transcript management logic for consistent UX. + */ + +import { ref, computed } from 'vue' +import type { TranscriptMessage, PartialTranscript, FinalTranscript } from '../types/voice' + +interface TranscriptBufferConfig { + stabilityThreshold?: number + maxPartialAge?: number // milliseconds +} + +export function useTranscriptBuffer(config: TranscriptBufferConfig = {}) { + const { + stabilityThreshold = 0.8, + maxPartialAge = 5000 // 5 seconds + } = config + + // State + const finalMessages = ref([]) + const partialMessage = ref(null) + const messageCounter = ref(0) + + // Computed + const hasMessages = computed(() => finalMessages.value.length > 0) + const displayText = computed(() => { + const finalText = finalMessages.value.map(m => m.text).join(' ') + const partialText = partialMessage.value?.text || '' + return [finalText, partialText].filter(Boolean).join(' ') + }) + + // Methods + const addPartialTranscript = (partial: PartialTranscript) => { + // Update partial message for live display + partialMessage.value = { + ...partial, + timestamp: partial.timestamp || new Date().toISOString() + } + + // Clean up old partial if it's been too long + if (partialMessage.value) { + const age = Date.now() - new Date(partialMessage.value.timestamp).getTime() + if (age > maxPartialAge) { + partialMessage.value = null + } + } + } + + const addFinalTranscript = (final: FinalTranscript) => { + // Clear any partial message for this role + if (partialMessage.value && partialMessage.value.role === final.role) { + partialMessage.value = null + } + + // Add to final messages + const message: TranscriptMessage = { + id: `msg-${Date.now()}-${messageCounter.value++}`, + text: final.text, + role: final.role, + timestamp: final.timestamp || new Date().toISOString(), + isVoice: true, + duration: final.duration_ms, + confidence: final.confidence, + metadata: final.metadata || {} + } + + finalMessages.value.push(message) + + // Keep only recent messages to prevent memory bloat + if (finalMessages.value.length > 100) { + finalMessages.value = finalMessages.value.slice(-80) // Keep last 80 messages + } + } + + const addTextMessage = (text: string, role: 'user' | 'assistant') => { + const message: TranscriptMessage = { + id: `text-${Date.now()}-${messageCounter.value++}`, + text, + role, + timestamp: new Date().toISOString(), + isVoice: false + } + + finalMessages.value.push(message) + } + + const updatePartialStability = (stability: number) => { + if (partialMessage.value) { + partialMessage.value.stability = stability + } + } + + const clear = () => { + finalMessages.value = [] + partialMessage.value = null + messageCounter.value = 0 + } + + const getMessageById = (id: string): TranscriptMessage | undefined => { + return finalMessages.value.find(m => m.id === id) + } + + const getMessagesInRange = (startTime: string, endTime: string): TranscriptMessage[] => { + const start = new Date(startTime).getTime() + const end = new Date(endTime).getTime() + + return finalMessages.value.filter(message => { + const msgTime = new Date(message.timestamp).getTime() + return msgTime >= start && msgTime <= end + }) + } + + const exportTranscript = (): string => { + const lines = finalMessages.value.map(message => { + const timestamp = new Date(message.timestamp).toLocaleTimeString() + const roleLabel = message.role === 'user' ? 'You' : 'Character' + const voiceLabel = message.isVoice ? ' [Voice]' : '' + const durationLabel = message.duration ? ` (${(message.duration / 1000).toFixed(1)}s)` : '' + + return `[${timestamp}] ${roleLabel}${voiceLabel}${durationLabel}: ${message.text}` + }) + + return lines.join('\n') + } + + const getStatistics = () => { + const totalMessages = finalMessages.value.length + const voiceMessages = finalMessages.value.filter(m => m.isVoice).length + const textMessages = totalMessages - voiceMessages + const averageConfidence = finalMessages.value + .filter(m => m.confidence !== undefined) + .reduce((sum, m) => sum + (m.confidence || 0), 0) / voiceMessages || 0 + + const totalDuration = finalMessages.value + .filter(m => m.duration !== undefined) + .reduce((sum, m) => sum + (m.duration || 0), 0) + + return { + totalMessages, + voiceMessages, + textMessages, + averageConfidence: Math.round(averageConfidence * 100) / 100, + totalDurationMs: totalDuration, + totalDurationSeconds: Math.round(totalDuration / 1000 * 10) / 10 + } + } + + // Auto-cleanup for old partial messages + let partialCleanupInterval: NodeJS.Timeout | null = null + + const startPartialCleanup = () => { + if (partialCleanupInterval) return + + partialCleanupInterval = setInterval(() => { + if (partialMessage.value) { + const age = Date.now() - new Date(partialMessage.value.timestamp).getTime() + if (age > maxPartialAge) { + partialMessage.value = null + } + } + }, 1000) // Check every second + } + + const stopPartialCleanup = () => { + if (partialCleanupInterval) { + clearInterval(partialCleanupInterval) + partialCleanupInterval = null + } + } + + // Start cleanup on initialization + startPartialCleanup() + + return { + // State + finalMessages: readonly(finalMessages), + partialMessage: readonly(partialMessage), + + // Computed + hasMessages, + displayText, + + // Methods + addPartialTranscript, + addFinalTranscript, + addTextMessage, + updatePartialStability, + clear, + getMessageById, + getMessagesInRange, + exportTranscript, + getStatistics, + + // Lifecycle + startPartialCleanup, + stopPartialCleanup + } +} \ No newline at end of file diff --git a/src/ts/role_play/ui/src/composables/useVoiceWebSocket.ts b/src/ts/role_play/ui/src/composables/useVoiceWebSocket.ts new file mode 100644 index 0000000..482678e --- /dev/null +++ b/src/ts/role_play/ui/src/composables/useVoiceWebSocket.ts @@ -0,0 +1,444 @@ +/** + * Composable for managing voice WebSocket connections with audio streaming. + */ + +import { ref, onUnmounted } from 'vue' +import type { + VoiceStatus, + PartialTranscript, + FinalTranscript, + VoiceConfig, + AudioChunk, + TurnStatus +} from '../types/voice' + +interface VoiceWebSocketConfig { + sessionId: string + token: string + onPartialTranscript?: (transcript: PartialTranscript) => void + onFinalTranscript?: (transcript: FinalTranscript) => void + onAudioChunk?: (chunk: AudioChunk) => void + onTurnStatus?: (status: TurnStatus) => void + onStatusChange?: (status: VoiceStatus) => void + onError?: (error: string) => void +} + +export function useVoiceWebSocket(config: VoiceWebSocketConfig) { + // State + const isConnected = ref(false) + const isConnecting = ref(false) + const isRecording = ref(false) + const canRecord = ref(false) + const connectionStatus = ref(null) + const voiceConfig = ref(null) + + // WebSocket and audio references + let websocket: WebSocket | null = null + let mediaRecorder: MediaRecorder | null = null + let audioContext: AudioContext | null = null + let audioWorkletNode: AudioWorkletNode | null = null + let audioStream: MediaStream | null = null + let audioQueue: Float32Array[] = [] + let isPlaying = ref(false) + + // Audio configuration + const SAMPLE_RATE = 16000 + const CHANNELS = 1 + const CHUNK_SIZE = 1600 // 100ms at 16kHz + + // WebSocket connection + const connect = async (): Promise => { + if (isConnected.value || isConnecting.value) { + return + } + + isConnecting.value = true + connectionStatus.value = { + type: 'connecting', + message: 'Connecting to voice service...', + timestamp: new Date().toISOString() + } + + try { + const protocol = window.location.protocol === 'https:' ? 'wss:' : 'ws:' + const host = window.location.host + const wsUrl = `${protocol}//${host}/api/voice/ws/${config.sessionId}?token=${config.token}` + + websocket = new WebSocket(wsUrl) + + websocket.onopen = () => { + isConnected.value = true + isConnecting.value = false + connectionStatus.value = { + type: 'connected', + message: 'Connected to voice service', + timestamp: new Date().toISOString() + } + config.onStatusChange?.(connectionStatus.value) + } + + websocket.onmessage = async (event) => { + try { + const message = JSON.parse(event.data) + await handleWebSocketMessage(message) + } catch (error) { + console.error('Error parsing WebSocket message:', error) + } + } + + websocket.onclose = (event) => { + isConnected.value = false + isConnecting.value = false + isRecording.value = false + canRecord.value = false + + connectionStatus.value = { + type: 'disconnected', + message: `Connection closed: ${event.reason || 'Unknown reason'}`, + timestamp: new Date().toISOString() + } + config.onStatusChange?.(connectionStatus.value) + + cleanup() + } + + websocket.onerror = (error) => { + console.error('WebSocket error:', error) + connectionStatus.value = { + type: 'error', + message: 'Connection error occurred', + timestamp: new Date().toISOString() + } + config.onError?.('WebSocket connection failed') + config.onStatusChange?.(connectionStatus.value) + } + + } catch (error) { + isConnecting.value = false + connectionStatus.value = { + type: 'error', + message: `Failed to connect: ${error}`, + timestamp: new Date().toISOString() + } + config.onError?.(`Connection failed: ${error}`) + config.onStatusChange?.(connectionStatus.value) + throw error + } + } + + // Handle incoming WebSocket messages + const handleWebSocketMessage = async (message: any) => { + switch (message.type) { + case 'config': + voiceConfig.value = message as VoiceConfig + await initializeAudio() + break + + case 'status': + if (message.status === 'ready') { + canRecord.value = true + } + connectionStatus.value = { + type: message.status, + message: message.message, + timestamp: message.timestamp + } + config.onStatusChange?.(connectionStatus.value) + break + + case 'transcript_partial': + config.onPartialTranscript?.(message as PartialTranscript) + break + + case 'transcript_final': + config.onFinalTranscript?.(message as FinalTranscript) + break + + case 'audio': + await playAudioChunk(message as AudioChunk) + config.onAudioChunk?.(message as AudioChunk) + break + + case 'turn_status': + config.onTurnStatus?.(message as TurnStatus) + break + + case 'error': + config.onError?.(message.error) + connectionStatus.value = { + type: 'error', + message: message.error, + timestamp: message.timestamp + } + config.onStatusChange?.(connectionStatus.value) + break + } + } + + // Initialize audio context and worklet + const initializeAudio = async () => { + try { + // Initialize AudioContext for playback + audioContext = new AudioContext({ sampleRate: SAMPLE_RATE }) + + // Resume context if suspended (required for user interaction) + if (audioContext.state === 'suspended') { + await audioContext.resume() + } + + console.log('Audio initialized:', { + sampleRate: audioContext.sampleRate, + state: audioContext.state + }) + + } catch (error) { + console.error('Failed to initialize audio:', error) + config.onError?.(`Audio initialization failed: ${error}`) + } + } + + // Start audio recording + const startRecording = async (): Promise => { + if (!canRecord.value || isRecording.value) { + return + } + + try { + // Request microphone access + audioStream = await navigator.mediaDevices.getUserMedia({ + audio: { + sampleRate: SAMPLE_RATE, + channelCount: CHANNELS, + echoCancellation: true, + noiseSuppression: true, + autoGainControl: true + } + }) + + // Create MediaRecorder for capturing audio + const options = { + mimeType: 'audio/webm;codecs=opus', // Fallback to available format + audioBitsPerSecond: 16000 + } + + // Find a supported MIME type + const supportedTypes = [ + 'audio/webm;codecs=opus', + 'audio/webm', + 'audio/mp4', + 'audio/wav' + ] + + const mimeType = supportedTypes.find(type => MediaRecorder.isTypeSupported(type)) + if (mimeType) { + options.mimeType = mimeType + } + + mediaRecorder = new MediaRecorder(audioStream, options) + let audioChunks: Blob[] = [] + + mediaRecorder.ondataavailable = (event) => { + if (event.data.size > 0) { + audioChunks.push(event.data) + } + } + + mediaRecorder.onstop = async () => { + if (audioChunks.length > 0) { + const audioBlob = new Blob(audioChunks, { type: options.mimeType }) + await sendAudioBlob(audioBlob) + audioChunks = [] + } + } + + // Start recording in chunks + mediaRecorder.start(100) // Record in 100ms chunks + isRecording.value = true + + console.log('Recording started') + + } catch (error) { + console.error('Failed to start recording:', error) + config.onError?.(`Recording failed: ${error}`) + throw error + } + } + + // Stop audio recording + const stopRecording = async (): Promise => { + if (!isRecording.value || !mediaRecorder) { + return + } + + try { + mediaRecorder.stop() + isRecording.value = false + + // Stop all tracks to release microphone + if (audioStream) { + audioStream.getTracks().forEach(track => track.stop()) + audioStream = null + } + + console.log('Recording stopped') + + } catch (error) { + console.error('Failed to stop recording:', error) + config.onError?.(`Failed to stop recording: ${error}`) + } + } + + // Convert audio blob to PCM and send + const sendAudioBlob = async (blob: Blob): Promise => { + if (!websocket || websocket.readyState !== WebSocket.OPEN) { + return + } + + try { + // Convert blob to array buffer + const arrayBuffer = await blob.arrayBuffer() + + // For now, send the raw audio data + // In production, you'd want to convert to PCM format + const base64Data = btoa(String.fromCharCode(...new Uint8Array(arrayBuffer))) + + const audioMessage = { + mime_type: 'audio/pcm', + data: base64Data, + end_session: false + } + + websocket.send(JSON.stringify(audioMessage)) + + } catch (error) { + console.error('Failed to send audio:', error) + config.onError?.(`Failed to send audio: ${error}`) + } + } + + // Send text message + const sendTextMessage = async (text: string): Promise => { + if (!websocket || websocket.readyState !== WebSocket.OPEN) { + throw new Error('WebSocket not connected') + } + + try { + const textMessage = { + mime_type: 'text/plain', + data: btoa(text), + end_session: false + } + + websocket.send(JSON.stringify(textMessage)) + + } catch (error) { + console.error('Failed to send text:', error) + config.onError?.(`Failed to send text: ${error}`) + throw error + } + } + + // Play audio chunk + const playAudioChunk = async (audioChunk: AudioChunk): Promise => { + if (!audioContext) { + return + } + + try { + // Decode base64 audio data + const audioData = atob(audioChunk.data) + const audioBuffer = new ArrayBuffer(audioData.length) + const audioView = new Uint8Array(audioBuffer) + + for (let i = 0; i < audioData.length; i++) { + audioView[i] = audioData.charCodeAt(i) + } + + // Decode audio buffer + const decodedBuffer = await audioContext.decodeAudioData(audioBuffer) + + // Create buffer source and play + const source = audioContext.createBufferSource() + source.buffer = decodedBuffer + source.connect(audioContext.destination) + source.start() + + } catch (error) { + console.error('Failed to play audio chunk:', error) + // Don't throw here, just log the error to avoid breaking the flow + } + } + + // Disconnect WebSocket + const disconnect = async (): Promise => { + if (isRecording.value) { + await stopRecording() + } + + if (websocket) { + // Send end session message + try { + if (websocket.readyState === WebSocket.OPEN) { + const endMessage = { + mime_type: 'text/plain', + data: '', + end_session: true + } + websocket.send(JSON.stringify(endMessage)) + } + } catch (error) { + console.error('Failed to send end session message:', error) + } + + websocket.close() + websocket = null + } + + cleanup() + } + + // Cleanup resources + const cleanup = () => { + isConnected.value = false + isConnecting.value = false + isRecording.value = false + canRecord.value = false + + if (audioStream) { + audioStream.getTracks().forEach(track => track.stop()) + audioStream = null + } + + if (audioContext) { + audioContext.close() + audioContext = null + } + + mediaRecorder = null + audioWorkletNode = null + audioQueue = [] + } + + // Cleanup on unmount + onUnmounted(() => { + disconnect() + }) + + return { + // State + isConnected: readonly(isConnected), + isConnecting: readonly(isConnecting), + isRecording: readonly(isRecording), + canRecord: readonly(canRecord), + connectionStatus: readonly(connectionStatus), + voiceConfig: readonly(voiceConfig), + isPlaying: readonly(isPlaying), + + // Methods + connect, + disconnect, + startRecording, + stopRecording, + sendTextMessage + } +} \ No newline at end of file diff --git a/src/ts/role_play/ui/src/locales/en.json b/src/ts/role_play/ui/src/locales/en.json index 7fe156a..8be9249 100644 --- a/src/ts/role_play/ui/src/locales/en.json +++ b/src/ts/role_play/ui/src/locales/en.json @@ -39,7 +39,27 @@ "continueExistingSession": "View Previous Sessions:", "deleteSession": "Delete session", "confirmDeleteSession": "Are you sure you want to delete this session? This action cannot be undone.", - "confirmEndSession": "Are you sure you want to end this session? You will no longer be able to send messages." + "confirmEndSession": "Are you sure you want to end this session? You will no longer be able to send messages.", + "voice": { + "connect": "Start Voice Chat", + "connecting": "Connecting...", + "disconnect": "End Voice Chat", + "startRecording": "Start Talking", + "stop": "Stop Talking", + "send": "Send", + "textFallback": "Type your message...", + "speaking": "Speaking...", + "processing": "Processing...", + "stability": "Transcript Stability", + "you": "You", + "character": "Character", + "transcriptPlaceholder": "Your conversation will appear here...", + "permissionDenied": "Microphone permission denied", + "notSupported": "Voice chat not supported", + "connectionError": "Connection error", + "recordingError": "Recording error", + "playbackError": "Playback error" + } }, "warnings": { "languageSwitch": "Switching language will hide scenarios in the current language until you switch back. Continue?", diff --git a/src/ts/role_play/ui/src/locales/zh-TW.json b/src/ts/role_play/ui/src/locales/zh-TW.json index 9704158..aa5e019 100644 --- a/src/ts/role_play/ui/src/locales/zh-TW.json +++ b/src/ts/role_play/ui/src/locales/zh-TW.json @@ -39,7 +39,27 @@ "continueExistingSession": "查看先前對話:", "deleteSession": "刪除對話", "confirmDeleteSession": "確定要刪除此對話嗎?此操作無法復原。", - "confirmEndSession": "確定要結束此對話嗎?結束後將無法再發送訊息。" + "confirmEndSession": "確定要結束此對話嗎?結束後將無法再發送訊息。", + "voice": { + "connect": "開始語音對話", + "connecting": "連接中...", + "disconnect": "結束語音對話", + "startRecording": "開始說話", + "stop": "停止說話", + "send": "發送", + "textFallback": "輸入您的訊息...", + "speaking": "說話中...", + "processing": "處理中...", + "stability": "轉錄穩定度", + "you": "您", + "character": "角色", + "transcriptPlaceholder": "您的對話記錄將顯示在這裡...", + "permissionDenied": "麥克風權限被拒絕", + "notSupported": "不支援語音對話", + "connectionError": "連接錯誤", + "recordingError": "錄音錯誤", + "playbackError": "播放錯誤" + } }, "warnings": { "languageSwitch": "切換語言將隱藏目前語言的情境,直到您切換回來。是否繼續?", diff --git a/src/ts/role_play/ui/src/types/voice.ts b/src/ts/role_play/ui/src/types/voice.ts new file mode 100644 index 0000000..221ccb9 --- /dev/null +++ b/src/ts/role_play/ui/src/types/voice.ts @@ -0,0 +1,157 @@ +/** + * TypeScript type definitions for voice chat functionality. + */ + +export interface VoiceConfig { + type: 'config' + audio_format: string + sample_rate: number + channels: number + bit_depth: number + language: string + voice_name: string + output_audio_format: string +} + +export interface VoiceStatus { + type: 'status' | 'connected' | 'ready' | 'error' | 'disconnected' | 'connecting' + message: string + timestamp: string +} + +export interface PartialTranscript { + type: 'transcript_partial' + text: string + role: 'user' | 'assistant' + stability: number + timestamp: string +} + +export interface FinalTranscript { + type: 'transcript_final' + text: string + role: 'user' | 'assistant' + duration_ms: number + confidence: number + metadata: Record + timestamp: string +} + +export interface AudioChunk { + type: 'audio' + data: string // base64 encoded audio + mime_type: string + sequence?: number + timestamp: string +} + +export interface TurnStatus { + type: 'turn_status' + turn_complete: boolean + interrupted: boolean + timestamp: string +} + +export interface VoiceError { + type: 'error' + error: string + code?: string + timestamp: string +} + +export interface TranscriptMessage { + id: string + text: string + role: 'user' | 'assistant' + timestamp: string + isVoice: boolean + duration?: number + confidence?: number + metadata?: Record +} + +export interface VoiceSessionInfo { + session_id: string + user_id: string + character_id?: string + scenario_id?: string + language: string + started_at?: string + transcript_available: boolean +} + +export interface VoiceSessionStats { + session_id: string + started_at: string + ended_at?: string + duration_ms?: number + audio_chunks_sent: number + audio_chunks_received: number + transcripts_processed: number + total_utterances: number + total_partials: number + errors: number +} + +export interface VoiceClientRequest { + mime_type: string + data: string // base64 encoded + end_session: boolean +} + +// Union types for WebSocket messages +export type VoiceServerMessage = + | VoiceConfig + | VoiceStatus + | PartialTranscript + | FinalTranscript + | AudioChunk + | TurnStatus + | VoiceError + +export type VoiceClientMessage = VoiceClientRequest + +// Audio processing types +export interface AudioBufferInfo { + sampleRate: number + channels: number + length: number + duration: number +} + +export interface AudioProcessingOptions { + sampleRate?: number + channels?: number + bitDepth?: number + chunkSize?: number + enableEchoCancellation?: boolean + enableNoiseSuppression?: boolean + enableAutoGainControl?: boolean +} + +// Transcript buffer configuration +export interface TranscriptBufferConfig { + stabilityThreshold?: number + finalizationTimeout?: number + minUtteranceLength?: number + maxPartialAge?: number +} + +// Voice chat statistics +export interface VoiceChatStatistics { + totalMessages: number + voiceMessages: number + textMessages: number + averageConfidence: number + totalDurationMs: number + totalDurationSeconds: number +} + +// WebSocket connection states +export type WebSocketState = 'connecting' | 'connected' | 'disconnecting' | 'disconnected' | 'error' + +// Audio recording states +export type RecordingState = 'idle' | 'starting' | 'recording' | 'stopping' | 'error' + +// Audio playback states +export type PlaybackState = 'idle' | 'playing' | 'paused' | 'buffering' | 'error' \ No newline at end of file diff --git a/test/python/unit/voice/__init__.py b/test/python/unit/voice/__init__.py new file mode 100644 index 0000000..76baec7 --- /dev/null +++ b/test/python/unit/voice/__init__.py @@ -0,0 +1 @@ +"""Unit tests for voice chat functionality.""" \ No newline at end of file diff --git a/test/python/unit/voice/test_voice_handler.py b/test/python/unit/voice/test_voice_handler.py new file mode 100644 index 0000000..2b66d6a --- /dev/null +++ b/test/python/unit/voice/test_voice_handler.py @@ -0,0 +1,458 @@ +"""Tests for the simplified voice chat handler.""" + +import pytest +import asyncio +import json +import base64 +from unittest.mock import Mock, AsyncMock, patch, MagicMock +from fastapi.testclient import TestClient +from fastapi import WebSocket, WebSocketDisconnect + +from src.python.role_play.voice.handler import VoiceChatHandler +from src.python.role_play.voice.models import VoiceRequest, VoiceMessage +from src.python.role_play.common.models import User, UserRole + + +class MockWebSocket: + """Mock WebSocket for testing.""" + + def __init__(self): + self.accepted = False + self.closed = False + self.close_code = None + self.close_reason = None + self.sent_messages = [] + self.query_params = {} + + async def accept(self): + self.accepted = True + + async def close(self, code=None, reason=None): + self.closed = True + self.close_code = code + self.close_reason = reason + + async def send_json(self, data): + self.sent_messages.append(data) + + async def receive_text(self): + # Mock receiving client requests + return json.dumps({ + "mime_type": "text/plain", + "data": "dGVzdCBtZXNzYWdl", # "test message" in base64 + "end_session": False + }) + + +class TestVoiceChatHandler: + """Test cases for simplified VoiceChatHandler.""" + + @pytest.fixture + def handler(self): + """Create a voice chat handler for testing.""" + return VoiceChatHandler() + + @pytest.fixture + def mock_user(self): + """Create a mock user.""" + from datetime import datetime, timezone + return User( + id="user123", + username="testuser", + email="test@example.com", + role=UserRole.USER, + preferred_language="en", + created_at=datetime.now(timezone.utc), + updated_at=datetime.now(timezone.utc) + ) + + @pytest.fixture + def mock_websocket(self): + """Create a mock WebSocket.""" + ws = MockWebSocket() + ws.query_params = {"token": "valid_token"} + return ws + + def test_handler_initialization(self, handler): + """Test handler initializes correctly.""" + assert handler.prefix == "/voice" + assert handler.active_sessions is not None + assert handler.router is not None + + def test_router_endpoints(self, handler): + """Test that WebSocket endpoint is registered.""" + router = handler.router + routes = [route.path for route in router.routes] + assert "/ws/{session_id}" in routes + + @patch('src.python.role_play.voice.handler.get_storage_backend') + @patch('src.python.role_play.voice.handler.get_auth_manager') + async def test_jwt_validation_success( + self, + mock_get_auth_manager, + mock_get_storage, + handler, + mock_user + ): + """Test successful JWT token validation.""" + # Mock storage backend + mock_storage = AsyncMock() + mock_storage.get_user.return_value = mock_user + mock_get_storage.return_value = mock_storage + + # Mock auth manager + mock_auth_manager = Mock() + mock_token_data = Mock(user_id="user123") + mock_auth_manager.verify_token.return_value = mock_token_data + mock_get_auth_manager.return_value = mock_auth_manager + + result = await handler._validate_jwt_token("valid_token") + + assert result == mock_user + mock_auth_manager.verify_token.assert_called_once_with("valid_token") + mock_storage.get_user.assert_called_once_with("user123") + + async def test_jwt_validation_failure(self, handler): + """Test JWT token validation failure.""" + with patch('src.python.role_play.voice.handler.get_auth_manager') as mock_get_auth: + mock_auth_manager = Mock() + mock_auth_manager.verify_token.side_effect = Exception("Invalid token") + mock_get_auth.return_value = mock_auth_manager + + result = await handler._validate_jwt_token("invalid_token") + + assert result is None + + @patch('src.python.role_play.voice.handler.get_storage_backend') + @patch('src.python.role_play.voice.handler.get_chat_logger') + @patch('src.python.role_play.voice.handler.get_adk_session_service') + async def test_session_validation_success( + self, + mock_adk_service, + mock_chat_logger, + mock_storage, + handler + ): + """Test successful session validation.""" + # Mock ADK session + mock_adk_session = Mock() + mock_adk_session.state = { + "character_id": "char123", + "scenario_id": "scenario123" + } + + mock_adk_service_instance = AsyncMock() + mock_adk_service_instance.get_session.return_value = mock_adk_session + mock_adk_service.return_value = mock_adk_service_instance + + mock_chat_logger_instance = AsyncMock() + mock_chat_logger.return_value = mock_chat_logger_instance + + result = await handler._validate_session( + "session123", + "user123", + mock_adk_service_instance, + mock_chat_logger_instance + ) + + assert result == mock_adk_session + + async def test_websocket_missing_token(self, handler): + """Test WebSocket connection without token.""" + ws = MockWebSocket() + ws.query_params = {} # No token + + await handler.handle_voice_session(ws, "session123", None) + + assert ws.closed + assert ws.close_code == 1008 + + @patch('src.python.role_play.voice.handler.VoiceChatHandler._validate_jwt_token') + async def test_websocket_invalid_token(self, mock_validate_jwt, handler): + """Test WebSocket connection with invalid token.""" + mock_validate_jwt.return_value = None # Invalid token + + ws = MockWebSocket() + ws.query_params = {"token": "invalid_token"} + + await handler.handle_voice_session(ws, "session123", "invalid_token") + + assert ws.closed + assert ws.close_code == 1008 + + def test_voice_request_validation(self): + """Test VoiceRequest model validation.""" + # Valid request + request = VoiceRequest( + mime_type="audio/pcm", + data="dGVzdCBhdWRpbw==", # base64 encoded + end_session=False + ) + + assert request.mime_type == "audio/pcm" + assert request.data == "dGVzdCBhdWRpbw==" + assert request.end_session is False + + # Test data decoding + decoded = request.decode_data() + assert isinstance(decoded, bytes) + + def test_voice_request_text_decoding(self): + """Test VoiceRequest text data decoding.""" + request = VoiceRequest( + mime_type="text/plain", + data="dGVzdCB0ZXh0", # "test text" in base64 + end_session=False + ) + + decoded = request.decode_data() + assert decoded == "test text" + assert isinstance(decoded, str) + + def test_voice_message_creation(self): + """Test VoiceMessage creation with extra fields.""" + message = VoiceMessage( + type="status", + timestamp="2025-01-14T10:30:00Z", + status="ready", # Extra field allowed + message="Session ready" # Extra field allowed + ) + + assert message.type == "status" + assert message.timestamp == "2025-01-14T10:30:00Z" + # Extra fields should be preserved due to Config.extra = "allow" + assert hasattr(message, "__pydantic_extra__") or message.dict()["status"] == "ready" + + def test_adk_event_processing(self, handler): + """Test direct ADK event processing.""" + stats = {"transcripts_processed": 0} + + # Mock transcript event with only the attributes we want + mock_event = Mock(spec=['input_transcription']) + mock_event.input_transcription = Mock() + mock_event.input_transcription.text = "Hello world" + mock_event.input_transcription.is_final = True + mock_event.input_transcription.confidence = 0.95 + + result = handler._process_adk_event(mock_event, stats) + + assert result["type"] == "transcript_final" + assert result["text"] == "Hello world" + assert result["role"] == "user" + assert result["confidence"] == 0.95 + assert stats["transcripts_processed"] == 1 + + @pytest.mark.asyncio + async def test_adk_initialization(self, handler, mock_user): + """Test ADK components initialization.""" + # Mock ADK session + mock_adk_session = Mock() + mock_adk_session.state = { + "character_id": "char123", + "scenario_id": "scenario123", + "script_data": None + } + + with patch('src.python.role_play.voice.handler.get_production_agent') as mock_agent, \ + patch('src.python.role_play.voice.handler.Runner') as mock_runner, \ + patch('src.python.role_play.voice.handler.LiveRequestQueue') as mock_queue: + + # Mock agent creation + mock_agent_instance = Mock() + mock_agent.return_value = mock_agent_instance + + # Mock runner + mock_runner_instance = Mock() + mock_runner.return_value = mock_runner_instance + mock_runner_instance.run_live.return_value = AsyncMock() + + # Mock queue + mock_queue_instance = Mock() + mock_queue.return_value = mock_queue_instance + + result = await handler._initialize_adk("session123", mock_user, mock_adk_session) + + assert result["session_id"] == "session123" + assert result["user_id"] == mock_user.id + assert result["active"] is True + assert "stats" in result + assert result["stats"]["audio_chunks_sent"] == 0 + + +class TestVoiceHandlerIntegration: + """Integration tests for simplified voice handler.""" + + @pytest.fixture + def app_with_voice_handler(self): + """Create FastAPI app with voice handler for testing.""" + from fastapi import FastAPI + app = FastAPI() + handler = VoiceChatHandler() + app.include_router(handler.router, prefix=handler.prefix) + return app + + def test_voice_handler_routes_registered(self, app_with_voice_handler): + """Test that voice handler routes are properly registered.""" + # Get the router from the app + voice_router = None + for route in app_with_voice_handler.routes: + if hasattr(route, 'path') and route.path.startswith('/voice'): + voice_router = route + break + + assert voice_router is not None + + +class TestVoiceValidationAndLimits: + """Test validation and security limits.""" + + @pytest.fixture + def handler(self): + return VoiceChatHandler() + + @pytest.fixture + def mock_user(self): + """Create a mock user.""" + from datetime import datetime, timezone + return User( + id="user123", + username="testuser", + email="test@example.com", + role=UserRole.USER, + preferred_language="en", + created_at=datetime.now(timezone.utc), + updated_at=datetime.now(timezone.utc) + ) + + def test_audio_chunk_size_validation(self): + """Test audio chunk size limit validation.""" + from src.python.role_play.voice.config import VoiceConfig + + # Create oversized audio data + large_audio = b"x" * (VoiceConfig.MAX_AUDIO_CHUNK_SIZE + 1) + large_audio_b64 = base64.b64encode(large_audio).decode() + + request = VoiceRequest( + mime_type="audio/pcm", + data=large_audio_b64, + end_session=False + ) + + # Should raise ValueError on decode + with pytest.raises(ValueError, match="Audio chunk too large"): + request.decode_data() + + def test_text_size_validation(self): + """Test text size limit validation.""" + from src.python.role_play.voice.config import VoiceConfig + + # Create oversized text data + large_text = "x" * (VoiceConfig.MAX_TEXT_SIZE + 1) + large_text_b64 = base64.b64encode(large_text.encode()).decode() + + request = VoiceRequest( + mime_type="text/plain", + data=large_text_b64, + end_session=False + ) + + # Should raise ValueError on decode + with pytest.raises(ValueError, match="Text too large"): + request.decode_data() + + def test_invalid_mime_type_validation(self): + """Test unsupported MIME type validation.""" + with pytest.raises(ValueError, match="Unsupported MIME type"): + VoiceRequest( + mime_type="video/mp4", # Unsupported + data="dGVzdA==", + end_session=False + ) + + def test_malformed_base64_data(self): + """Test malformed base64 data handling.""" + request = VoiceRequest( + mime_type="text/plain", + data="invalid_base64!!!", # Malformed base64 + end_session=False + ) + + with pytest.raises(ValueError, match="Invalid base64 data"): + request.decode_data() + + def test_session_limit_per_user(self, handler): + """Test session limit enforcement per user.""" + from src.python.role_play.voice.config import VoiceConfig + + # Create max allowed sessions for user + for i in range(VoiceConfig.MAX_SESSIONS_PER_USER): + handler.active_sessions[f"session_{i}"] = {"user_id": "user123"} + + # Should still allow for this user + assert handler._check_session_limit("user123") is False + + # Should allow for different user + assert handler._check_session_limit("user456") is True + + # Remove one session + del handler.active_sessions["session_0"] + assert handler._check_session_limit("user123") is True + + @pytest.mark.asyncio + async def test_connection_error_cleanup(self, handler): + """Test connection error cleanup.""" + # Setup mock session + mock_adk = { + "session_id": "test_session", + "user_id": "user123", + "active": True, + "live_request_queue": Mock(), + "stats": {} + } + handler.active_sessions["test_session"] = mock_adk + + with patch.object(handler, '_cleanup_adk') as mock_cleanup: + mock_cleanup.return_value = {"stats": "test"} + + # Test cleanup + await handler._handle_connection_error("test_session") + + # Should call cleanup and remove session + mock_cleanup.assert_called_once_with(mock_adk) + assert "test_session" not in handler.active_sessions + + @pytest.mark.asyncio + async def test_websocket_disconnect_during_streaming(self, handler): + """Test WebSocket disconnect during active streaming.""" + mock_adk = { + "session_id": "test_session", + "active": True, + "stats": {"errors": 0} + } + + # Mock WebSocket that raises disconnect + mock_websocket = AsyncMock() + mock_websocket.receive_text.side_effect = WebSocketDisconnect() + + # Should handle disconnect gracefully + await handler._receive_from_client(mock_websocket, mock_adk) + + # Session should be marked inactive + assert mock_adk["active"] is False + + @pytest.mark.asyncio + async def test_adk_initialization_failure(self, handler, mock_user): + """Test ADK initialization failure handling.""" + mock_adk_session = Mock() + mock_adk_session.state = {"character_id": "char123", "scenario_id": "scenario123"} + + with patch('src.python.role_play.voice.handler.get_production_agent') as mock_agent: + # Mock agent creation failure + mock_agent.return_value = None + + with pytest.raises(ValueError, match="Failed to create roleplay agent"): + await handler._initialize_adk("session123", mock_user, mock_adk_session) + + +if __name__ == "__main__": + pytest.main([__file__]) \ No newline at end of file diff --git a/test/voice/README.md b/test/voice/README.md new file mode 100644 index 0000000..c604dc4 --- /dev/null +++ b/test/voice/README.md @@ -0,0 +1,406 @@ +# Voice Backend Testing Suite + +This directory contains comprehensive testing tools for the voice backend, allowing developers to test voice functionality without launching the full frontend. + +## 🎯 Overview + +The voice testing suite provides three approaches to testing: + +1. **🚀 Quick Setup + Interactive Testing** - Generate HTML page with credentials +2. **🤖 Automated Testing** - Comprehensive backend functionality verification +3. **📊 Manual Testing** - Direct WebSocket testing with existing tools + +## 📁 Files + +| File | Purpose | Usage | +|------|---------|-------| +| `setup_voice_test.py` | Creates test session and HTML page | Interactive testing | +| `test_voice_backend.py` | Automated test suite | CI/CD and validation | +| `voice_test_template.html` | HTML template for interactive testing | Browser-based testing | +| `README.md` | This documentation | Reference | + +## 🚀 Quick Start + +### 1. Setup Interactive Testing + +```bash +# Start the backend server first +python src/python/run_server.py + +# In another terminal, create a test session +python test/voice/setup_voice_test.py + +# Output: +# ✅ Session created: session_abc123 +# ✅ Test page generated: test/voice/test_session.html +# +# Open in browser: file:///path/to/test/voice/test_session.html +``` + +### 2. Open Test Page + +**Option A: Direct file access** +```bash +# Copy the file path from setup output and open in browser +open test/voice/test_session.html # macOS +xdg-open test/voice/test_session.html # Linux +``` + +**Option B: Local HTTP server** +```bash +cd test/voice/ +python -m http.server 8080 +# Open: http://localhost:8080/test_session.html +``` + +### 3. Test Voice Functionality + +1. Click **"🔗 Connect to Voice"** - Should show "Connected" status +2. **Text Testing**: Type a message and click "📝 Send Text" +3. **Voice Testing**: Hold "🎤 Push to Talk" and speak +4. **Monitor**: Watch transcript and debug log for real-time feedback + +## 🤖 Automated Testing + +Run the comprehensive test suite: + +```bash +# Basic test run +python test/voice/test_voice_backend.py + +# With custom credentials +python test/voice/test_voice_backend.py --user admin@example.com --password secret + +# Verbose output for debugging +python test/voice/test_voice_backend.py --verbose +``` + +### Test Coverage + +The automated test verifies: + +- ✅ **Authentication** - Login and JWT token validation +- ✅ **Session Creation** - Chat session setup with scenario/character +- ✅ **WebSocket Connection** - Voice WebSocket establishment +- ✅ **Text Messaging** - Send text, receive transcript and audio +- ✅ **Audio Simulation** - Send PCM audio data, verify processing +- ✅ **Graceful Disconnect** - Clean session termination +- ✅ **Error Handling** - Invalid data handling and stability + +### Example Output + +``` +🎙️ Voice Backend Automated Test Suite +============================================================ + ✅ Authentication (0.34s) + ✅ Content Loading (0.12s) + ✅ Session Creation (0.28s) + ✅ WebSocket Connection (1.45s) + ✅ Text Messaging (3.21s) + ✅ Audio Simulation (2.18s) + ✅ Graceful Disconnect (0.52s) + ✅ Error Handling (1.03s) + +📈 Overall: 8/8 tests passed (100.0%) +⏱️ Total time: 9.13s +🎉 All tests passed! Voice backend is working correctly. +``` + +## 🛠️ Advanced Usage + +### Custom Test Credentials + +```bash +# Use different user account +python test/voice/setup_voice_test.py --user alice@company.com --password mypass + +# Test with admin account +python test/voice/test_voice_backend.py --user admin@example.com --password admin123 +``` + +### Multiple Test Sessions + +```bash +# Create multiple test sessions for load testing +for i in {1..5}; do + python test/voice/setup_voice_test.py --user "test${i}@example.com" & +done +``` + +### CI/CD Integration + +```bash +#!/bin/bash +# ci_voice_test.sh + +# Start server in background +python src/python/run_server.py & +SERVER_PID=$! + +# Wait for server startup +sleep 5 + +# Run voice tests +python test/voice/test_voice_backend.py +TEST_RESULT=$? + +# Cleanup +kill $SERVER_PID + +exit $TEST_RESULT +``` + +## 🎛️ Interactive Test Features + +### HTML Test Page Capabilities + +- **🔗 Connection Management**: Connect/disconnect with status indicators +- **📝 Text Input**: Send text messages with Enter key support +- **🎤 Audio Recording**: Push-to-talk with microphone access +- **📊 Real-time Stats**: Message counts, connection time, audio chunks +- **🔍 Debug Log**: WebSocket message inspection with timestamps +- **📜 Transcript View**: Conversation history with partial updates + +### Browser Requirements + +- **WebSocket Support**: Modern browsers (Chrome 16+, Firefox 11+, Safari 7+) +- **Microphone Access**: HTTPS or localhost required for getUserMedia() +- **Audio Context**: Web Audio API support for audio processing + +### Troubleshooting Interactive Tests + +| Issue | Cause | Solution | +|-------|-------|---------| +| "Connection failed" | Server not running | Start `python src/python/run_server.py` | +| "Microphone error" | Permission denied | Allow microphone access in browser | +| "Session not found" | Expired session | Run setup script again | +| "Invalid token" | JWT expired | Re-run setup script for new token | + +## 🔧 Backend Requirements + +### Server Setup + +```bash +# 1. Install dependencies +pip install -r src/python/requirements.txt + +# 2. Set environment variables +export STORAGE_PATH="./data/test" +export JWT_SECRET_KEY="test-secret-key" + +# 3. Start server +python src/python/run_server.py +``` + +### Test User Setup + +```bash +# Create test user (if needed) +python -c " +import asyncio +from src.python.role_play.common.auth import AuthManager +from src.python.role_play.common.storage import FileStorage +from src.python.role_play.common.storage_factory import create_storage + +async def create_user(): + storage = create_storage() + auth = AuthManager(storage) + await auth.register_user('test@example.com', 'password', 'USER') + print('Test user created') + +asyncio.run(create_user()) +" +``` + +## 📊 Message Format Reference + +### Client to Server (VoiceRequest) + +```json +{ + "mime_type": "text/plain" | "audio/pcm", + "data": "base64_encoded_data", + "end_session": false +} +``` + +### Server to Client (VoiceMessage) + +```json +// Configuration +{ + "type": "config", + "audio_format": "pcm", + "sample_rate": 16000, + "channels": 1, + "bit_depth": 16, + "language": "en" +} + +// Status updates +{ + "type": "status", + "status": "connecting" | "ready" | "ended", + "message": "Human readable status" +} + +// Transcripts +{ + "type": "transcript_partial", + "text": "Partial text...", + "role": "user" | "assistant", + "stability": 0.85 +} + +{ + "type": "transcript_final", + "text": "Final transcribed text", + "role": "user" | "assistant", + "confidence": 0.92 +} + +// Audio data +{ + "type": "audio", + "data": "base64_encoded_pcm_audio", + "mime_type": "audio/pcm" +} + +// Turn management +{ + "type": "turn_status", + "turn_complete": true, + "interrupted": false +} + +// Errors +{ + "type": "error", + "error": "Error description", + "timestamp": "2025-01-01T12:00:00Z" +} +``` + +## 🚨 Common Issues + +### Connection Issues + +1. **"Connection refused"** + - Check if server is running on port 8000 + - Verify `python src/python/run_server.py` is active + +2. **"Authentication failed"** + - Verify test user exists: `test@example.com` / `password` + - Check JWT_SECRET_KEY environment variable + +3. **"Session not found"** + - Sessions expire after 1 hour + - Re-run setup script to create fresh session + +### Audio Issues + +1. **"Microphone error"** + - Grant microphone permissions in browser + - Use HTTPS or localhost for getUserMedia() + +2. **"No audio received"** + - Check ADK/Gemini API configuration + - Verify voice model availability + +### Performance Issues + +1. **Slow responses** + - Check network connectivity to Gemini API + - Monitor server logs for errors + - Verify adequate system resources + +## 🔍 Debug Tips + +### Server-side Debugging + +```bash +# Run with debug logging +PYTHONPATH=./src/python LOG_LEVEL=DEBUG python src/python/run_server.py + +# Monitor voice handler logs +tail -f logs/voice_handler.log +``` + +### Client-side Debugging + +1. **Browser Developer Tools** + - Network tab: WebSocket connection details + - Console tab: JavaScript errors and debug messages + - Application tab: localStorage inspection + +2. **WebSocket Message Inspection** + - Use the debug panel in the HTML test page + - Monitor sent/received message timestamps + - Check message size and format + +### API Testing + +```bash +# Test REST endpoints +curl -H "Authorization: Bearer $JWT_TOKEN" \ + http://localhost:8000/api/chat/content/scenarios + +# Check WebSocket endpoint +wscat -c "ws://localhost:8000/api/voice/ws/$SESSION_ID?token=$JWT_TOKEN" +``` + +## 📈 Performance Benchmarks + +| Test | Expected Duration | Pass Criteria | +|------|------------------|---------------| +| Authentication | < 1s | JWT token received | +| Session Creation | < 2s | Valid session ID | +| WebSocket Connection | < 3s | Ready status received | +| Text Messaging | < 10s | Transcript + audio response | +| Audio Simulation | < 8s | Audio processing confirmed | +| Graceful Disconnect | < 2s | Clean connection close | + +## 🤝 Contributing + +To add new tests: + +1. **Add test method** to `VoiceBackendTester` class +2. **Include in test suite** by adding to `tests` array in `run_all_tests()` +3. **Document expected behavior** in this README +4. **Update HTML template** if testing new WebSocket message types + +### Test Method Template + +```python +async def test_new_feature(self) -> bool: + """Test new voice feature.""" + start_time = time.time() + + try: + async with websockets.connect(self.ws_url) as websocket: + await self._wait_for_ready(websocket) + + # Test implementation here + + duration = time.time() - start_time + self.add_test_result("New Feature", True, "Success details", duration) + return True + + except Exception as e: + duration = time.time() - start_time + self.add_test_result("New Feature", False, str(e), duration) + return False +``` + +--- + +## 📞 Support + +For issues with voice testing: + +1. **Check server logs** for backend errors +2. **Verify test user credentials** and permissions +3. **Test with simple curl/wscat** to isolate issues +4. **Review WebSocket message format** against API documentation + +Happy testing! 🎙️✨ \ No newline at end of file diff --git a/test/voice/setup_voice_test.py b/test/voice/setup_voice_test.py new file mode 100644 index 0000000..493fb51 --- /dev/null +++ b/test/voice/setup_voice_test.py @@ -0,0 +1,552 @@ +#!/usr/bin/env python3 +""" +Voice Test Setup Script + +This script creates a voice testing session by: +1. Logging in with test credentials +2. Creating a chat session with scenario/character +3. Generating an HTML test page with embedded credentials +4. Providing instructions for testing + +Usage: + python test/voice/setup_voice_test.py [--user email] [--password pass] +""" + +import asyncio +import httpx +import sys +import os +import argparse +from pathlib import Path +from urllib.parse import quote + +BASE_URL = "http://localhost:8000/api" +HTML_TEMPLATE_FILE = "voice_test_template.html" +OUTPUT_HTML_FILE = "test_session.html" + +async def create_voice_test_session(email="test@example.com", password="password"): + """Create a complete voice test session setup.""" + + print("🎙️ Voice Backend Test Setup") + print("=" * 50) + + async with httpx.AsyncClient() as client: + # 1. Login and get JWT token + print("🔐 Authenticating...") + try: + login_data = {"email": email, "password": password} + resp = await client.post(f"{BASE_URL}/auth/login", json=login_data) + + if resp.status_code != 200: + print(f" ❌ Login failed: {resp.text}") + print(f" 💡 Try: python run_server.py first, then create test user") + return False + + jwt_token = resp.json()["access_token"] + print(f" ✅ Authenticated as {email}") + + except Exception as e: + print(f" ❌ Connection failed: {e}") + print(f" 💡 Make sure server is running: python src/python/run_server.py") + return False + + # 2. Get available content + print("\n📋 Setting up chat session...") + headers = {"Authorization": f"Bearer {jwt_token}"} + + try: + # Get scenarios + resp = await client.get(f"{BASE_URL}/chat/content/scenarios", headers=headers) + scenarios = resp.json()["scenarios"] + + if not scenarios: + print(" ❌ No scenarios available") + return False + + scenario = scenarios[0] + print(f" 📖 Using scenario: {scenario['name']}") + + # Get characters for this scenario + resp = await client.get(f"{BASE_URL}/chat/content/scenarios/{scenario['id']}/characters", headers=headers) + characters = resp.json()["characters"] + + if not characters: + print(" ❌ No characters available for scenario") + return False + + character = characters[0] + print(f" 👤 Using character: {character['name']}") + + except Exception as e: + print(f" ❌ Failed to get content: {e}") + return False + + # 3. Create chat session + try: + session_data = { + "scenario_id": scenario["id"], + "character_id": character["id"], + "participant_name": "Voice Test User" + } + + resp = await client.post(f"{BASE_URL}/chat/session", json=session_data, headers=headers) + if resp.status_code != 200: + print(f" ❌ Session creation failed: {resp.text}") + return False + + session_id = resp.json()["session_id"] + print(f" ✅ Session created: {session_id}") + + except Exception as e: + print(f" ❌ Failed to create session: {e}") + return False + + # 4. Generate HTML test page + print("\n🌐 Generating test page...") + try: + test_dir = Path(__file__).parent + html_content = generate_test_html(jwt_token, session_id, scenario, character) + + output_path = test_dir / OUTPUT_HTML_FILE + with open(output_path, 'w') as f: + f.write(html_content) + + print(f" ✅ Test page created: {output_path}") + + except Exception as e: + print(f" ❌ Failed to create HTML: {e}") + return False + + # 5. Print instructions + print("\n🚀 Ready to test!") + print("=" * 50) + print("📁 Files created:") + print(f" • {output_path}") + print("\n🌐 Open in browser:") + print(f" • file://{output_path.absolute()}") + print("\n🖥️ Or start local server:") + print(f" • cd {test_dir}") + print(f" • python -m http.server 8080") + print(f" • Open: http://localhost:8080/{OUTPUT_HTML_FILE}") + print("\n🎯 Test with:") + print(" • JWT Token and Session ID are pre-filled") + print(" • Click 'Connect' to start voice session") + print(" • Use 'Push to Talk' or type text messages") + print(" • Check browser dev tools for WebSocket messages") + print("\n💡 Credentials:") + print(f" • JWT: {jwt_token[:50]}...") + print(f" • Session: {session_id}") + + return True + +def generate_test_html(jwt_token, session_id, scenario, character): + """Generate HTML test page with embedded credentials.""" + + # Read the template from the same directory or create inline + test_dir = Path(__file__).parent + template_path = test_dir / HTML_TEMPLATE_FILE + + if template_path.exists(): + with open(template_path, 'r') as f: + template = f.read() + else: + # Use inline template if file doesn't exist + template = create_inline_html_template() + + # Replace placeholders + html_content = template.replace("{{JWT_TOKEN}}", jwt_token) + html_content = html_content.replace("{{SESSION_ID}}", session_id) + html_content = html_content.replace("{{SCENARIO_NAME}}", scenario.get('name', 'Unknown')) + html_content = html_content.replace("{{CHARACTER_NAME}}", character.get('name', 'Unknown')) + html_content = html_content.replace("{{BASE_URL}}", BASE_URL.replace('http://', 'ws://').replace('https://', 'wss://')) + + return html_content + +def create_inline_html_template(): + """Create HTML template inline if template file doesn't exist.""" + return ''' + + + + + Voice Backend Test - {{SCENARIO_NAME}} with {{CHARACTER_NAME}} + + + +
+

🎙️ Voice Backend Test

+ +
+ Test Session: {{SCENARIO_NAME}} with {{CHARACTER_NAME}}
+ Session ID: {{SESSION_ID}}
+ Status: Ready to connect +
+ + + +
Ready to connect
+ +
+ + + + +
+ +
+ + +
+ +
+
Transcript will appear here...
+
+
+ + + +''' + +async def main(): + parser = argparse.ArgumentParser(description='Setup voice testing session') + parser.add_argument('--user', default='test@example.com', help='Login email') + parser.add_argument('--password', default='password', help='Login password') + args = parser.parse_args() + + success = await create_voice_test_session(args.user, args.password) + sys.exit(0 if success else 1) + +if __name__ == "__main__": + asyncio.run(main()) \ No newline at end of file diff --git a/test/voice/test_session.html b/test/voice/test_session.html new file mode 100644 index 0000000..4a10ed1 --- /dev/null +++ b/test/voice/test_session.html @@ -0,0 +1,673 @@ + + + + + + Voice Backend Test - Medical Patient Interview with Sarah - Chronic Pain Patient + + + +
+

+ 🎙️ Voice Backend Test +
+ Real-time WebSocket Testing +
+

+ +
+ 📖 Scenario: Medical Patient Interview
+ 👤 Character: Sarah - Chronic Pain Patient
+ 🔗 Session ID: 1dc7a4fe-2038-4044-9a15-d9d63f201b7b
+ 🎯 Purpose: Test voice backend without full frontend +
+ + + + + +
Ready to connect
+ +
+ + + + +
+ +
+ + +
+ +
+
+
0
+
Messages Sent
+
+
+
0
+
Messages Received
+
+
+
0
+
Audio Chunks
+
+
+
0s
+
Connected Time
+
+
+ +
+
+ 💬 Conversation transcript will appear here...
+ Try sending a text message or using push-to-talk +
+
+ +
+

🔍 WebSocket Debug Log

+
+
Debug messages will appear here...
+
+
+
+ + + + \ No newline at end of file diff --git a/test/voice/test_voice_backend.py b/test/voice/test_voice_backend.py new file mode 100644 index 0000000..26107d1 --- /dev/null +++ b/test/voice/test_voice_backend.py @@ -0,0 +1,560 @@ +#!/usr/bin/env python3 +""" +Voice Backend Automated Test Script + +This script performs comprehensive testing of the voice backend: +1. Authentication and session creation +2. WebSocket connection establishment +3. Text message sending and response verification +4. Audio message simulation +5. Transcript capture verification +6. Error handling and cleanup + +Usage: + python test/voice/test_voice_backend.py [--user email] [--password pass] [--verbose] +""" + +import asyncio +import websockets +import json +import httpx +import sys +import base64 +import argparse +import time +from typing import Optional, Dict, Any, List +from pathlib import Path + +BASE_URL = "http://localhost:8000/api" + +class VoiceBackendTester: + """Comprehensive voice backend testing class.""" + + def __init__(self, email: str = "test@example.com", password: str = "password", verbose: bool = False): + self.email = email + self.password = password + self.verbose = verbose + self.jwt_token: Optional[str] = None + self.session_id: Optional[str] = None + self.ws_url: Optional[str] = None + self.test_results: List[Dict[str, Any]] = [] + + def log(self, message: str, level: str = "INFO"): + """Log message with optional verbosity control.""" + if level == "ERROR" or self.verbose: + timestamp = time.strftime("%H:%M:%S") + print(f"[{timestamp}] {level}: {message}") + + def add_test_result(self, test_name: str, success: bool, details: str = "", duration: float = 0): + """Record test result.""" + result = { + "test": test_name, + "success": success, + "details": details, + "duration": duration + } + self.test_results.append(result) + + status = "✅" if success else "❌" + duration_str = f" ({duration:.2f}s)" if duration > 0 else "" + print(f" {status} {test_name}{duration_str}") + if details and (not success or self.verbose): + print(f" {details}") + + async def setup_session(self) -> bool: + """Setup authentication and create chat session.""" + start_time = time.time() + + try: + async with httpx.AsyncClient() as client: + # 1. Login + self.log("Authenticating with backend...") + login_data = {"email": self.email, "password": self.password} + resp = await client.post(f"{BASE_URL}/auth/login", json=login_data, timeout=10.0) + + if resp.status_code != 200: + self.add_test_result( + "Authentication", + False, + f"Login failed: {resp.status_code} {resp.text[:100]}" + ) + return False + + self.jwt_token = resp.json()["access_token"] + self.add_test_result("Authentication", True, f"Logged in as {self.email}") + + # 2. Get content for session creation + headers = {"Authorization": f"Bearer {self.jwt_token}"} + + # Get scenarios + resp = await client.get(f"{BASE_URL}/chat/content/scenarios", headers=headers) + if resp.status_code != 200: + self.add_test_result("Content Loading", False, "Failed to get scenarios") + return False + + scenarios = resp.json()["scenarios"] + if not scenarios: + self.add_test_result("Content Loading", False, "No scenarios available") + return False + + scenario = scenarios[0] + + # Get characters + resp = await client.get( + f"{BASE_URL}/chat/content/scenarios/{scenario['id']}/characters", + headers=headers + ) + characters = resp.json()["characters"] + if not characters: + self.add_test_result("Content Loading", False, "No characters available") + return False + + character = characters[0] + self.add_test_result( + "Content Loading", + True, + f"Using {scenario['name']} with {character['name']}" + ) + + # 3. Create session + session_data = { + "scenario_id": scenario["id"], + "character_id": character["id"], + "participant_name": "Voice Test Bot" + } + + resp = await client.post(f"{BASE_URL}/chat/session", json=session_data, headers=headers) + if resp.status_code != 200: + self.add_test_result("Session Creation", False, f"Failed: {resp.text[:100]}") + return False + + self.session_id = resp.json()["session_id"] + self.ws_url = f"ws://localhost:8000/api/voice/ws/{self.session_id}?token={self.jwt_token}" + + duration = time.time() - start_time + self.add_test_result( + "Session Creation", + True, + f"Created session {self.session_id}", + duration + ) + + return True + + except Exception as e: + duration = time.time() - start_time + self.add_test_result("Setup", False, f"Exception: {str(e)}", duration) + return False + + async def test_websocket_connection(self) -> bool: + """Test WebSocket connection establishment.""" + start_time = time.time() + + try: + async with websockets.connect(self.ws_url, open_timeout=10) as websocket: + self.log("WebSocket connected, waiting for ready status...") + + # Wait for ready status + ready = False + config_received = False + timeout_count = 0 + max_timeouts = 10 + + while not ready and timeout_count < max_timeouts: + try: + message = await asyncio.wait_for(websocket.recv(), timeout=1.0) + data = json.loads(message) + + self.log(f"Received: {data.get('type', 'unknown')} - {data}", "DEBUG") + + if data.get('type') == 'error': + self.log(f"WebSocket error: {data.get('error', 'Unknown error')}", "ERROR") + break + elif data.get('type') == 'config': + config_received = True + self.log(f"Config: {data.get('audio_format')} @ {data.get('sample_rate')}Hz") + + elif data.get('type') == 'status': + status = data.get('status', '') + if status == 'ready': + ready = True + break + + except asyncio.TimeoutError: + timeout_count += 1 + continue + + duration = time.time() - start_time + + if ready and config_received: + self.add_test_result( + "WebSocket Connection", + True, + "Connected and received ready status", + duration + ) + return True + else: + self.add_test_result( + "WebSocket Connection", + False, + f"Timeout waiting for ready (config: {config_received})", + duration + ) + return False + + except Exception as e: + duration = time.time() - start_time + self.add_test_result("WebSocket Connection", False, str(e), duration) + return False + + async def test_text_messaging(self) -> bool: + """Test text message sending and response.""" + start_time = time.time() + + try: + async with websockets.connect(self.ws_url) as websocket: + # Wait for ready + await self._wait_for_ready(websocket) + + # Send text message + test_message = "Hello! Please respond with just 'Hi there!' to confirm you received this." + text_base64 = base64.b64encode(test_message.encode('utf-8')).decode('ascii') + + message = { + "mime_type": "text/plain", + "data": text_base64, + "end_session": False + } + + await websocket.send(json.dumps(message)) + self.log(f"Sent text: '{test_message}'") + + # Wait for response + transcript_received = False + audio_received = False + response_text = "" + + start_wait = time.time() + while time.time() - start_wait < 15: # 15 second timeout + try: + response = await asyncio.wait_for(websocket.recv(), timeout=2.0) + data = json.loads(response) + + if data.get('type') == 'transcript_final': + transcript_received = True + response_text = data.get('text', '') + self.log(f"Received transcript: '{response_text}'") + + elif data.get('type') == 'audio': + audio_received = True + self.log(f"Received audio chunk: {len(data.get('data', ''))} chars") + + elif data.get('type') == 'turn_status' and data.get('turn_complete'): + self.log("Turn completed") + break + + except asyncio.TimeoutError: + continue + + duration = time.time() - start_time + + # Evaluate results + if transcript_received and audio_received: + self.add_test_result( + "Text Messaging", + True, + f"Received transcript and audio. Response: '{response_text[:50]}...'", + duration + ) + return True + else: + missing = [] + if not transcript_received: + missing.append("transcript") + if not audio_received: + missing.append("audio") + + self.add_test_result( + "Text Messaging", + False, + f"Missing: {', '.join(missing)}", + duration + ) + return False + + except Exception as e: + duration = time.time() - start_time + self.add_test_result("Text Messaging", False, str(e), duration) + return False + + async def test_audio_simulation(self) -> bool: + """Test simulated audio message sending.""" + start_time = time.time() + + try: + async with websockets.connect(self.ws_url) as websocket: + await self._wait_for_ready(websocket) + + # Generate fake audio data (1 second of silent PCM) + sample_rate = 16000 + duration_seconds = 1 + samples = sample_rate * duration_seconds + + # Create silent audio (16-bit PCM) + import struct + audio_data = b''.join(struct.pack(' bool: + """Test graceful session termination.""" + start_time = time.time() + + try: + async with websockets.connect(self.ws_url) as websocket: + await self._wait_for_ready(websocket) + + # Send end session message + end_message = { + "mime_type": "text/plain", + "data": "", + "end_session": True + } + + await websocket.send(json.dumps(end_message)) + self.log("Sent end session message") + + # Wait for connection to close gracefully + closed_gracefully = False + try: + await asyncio.wait_for(websocket.recv(), timeout=3.0) + except websockets.exceptions.ConnectionClosed: + closed_gracefully = True + except asyncio.TimeoutError: + pass + + duration = time.time() - start_time + + if closed_gracefully: + self.add_test_result( + "Graceful Disconnect", + True, + "Session ended cleanly", + duration + ) + return True + else: + self.add_test_result( + "Graceful Disconnect", + False, + "Connection did not close gracefully", + duration + ) + return False + + except Exception as e: + duration = time.time() - start_time + self.add_test_result("Graceful Disconnect", False, str(e), duration) + return False + + async def test_error_handling(self) -> bool: + """Test error handling with invalid data.""" + start_time = time.time() + + try: + async with websockets.connect(self.ws_url) as websocket: + await self._wait_for_ready(websocket) + + # Send invalid message + invalid_message = { + "mime_type": "invalid/type", + "data": "invalid_base64_data!!!", + "end_session": False + } + + await websocket.send(json.dumps(invalid_message)) + self.log("Sent invalid message") + + # Check if we get an error response or connection stays stable + error_handled = False + connection_stable = True + + try: + for _ in range(5): # Check for 5 seconds + response = await asyncio.wait_for(websocket.recv(), timeout=1.0) + data = json.loads(response) + + if data.get('type') == 'error': + error_handled = True + self.log(f"Received error response: {data.get('error', '')}") + break + + except asyncio.TimeoutError: + pass # No response is also valid + except websockets.exceptions.ConnectionClosed: + connection_stable = False + + duration = time.time() - start_time + + if connection_stable: + self.add_test_result( + "Error Handling", + True, + f"Connection stable, error handled: {error_handled}", + duration + ) + return True + else: + self.add_test_result( + "Error Handling", + False, + "Connection closed unexpectedly", + duration + ) + return False + + except Exception as e: + duration = time.time() - start_time + self.add_test_result("Error Handling", False, str(e), duration) + return False + + async def _wait_for_ready(self, websocket, timeout: float = 10.0) -> bool: + """Wait for WebSocket to reach ready state.""" + start_time = time.time() + + while time.time() - start_time < timeout: + try: + message = await asyncio.wait_for(websocket.recv(), timeout=1.0) + data = json.loads(message) + + if data.get('type') == 'status' and data.get('status') == 'ready': + return True + + except asyncio.TimeoutError: + continue + + return False + + async def run_all_tests(self) -> bool: + """Run complete test suite.""" + print("🎙️ Voice Backend Automated Test Suite") + print("=" * 60) + + overall_start = time.time() + + # 1. Setup + if not await self.setup_session(): + return False + + # 2. Core tests + tests = [ + self.test_websocket_connection, + self.test_text_messaging, + self.test_audio_simulation, + self.test_graceful_disconnect, + self.test_error_handling, + ] + + passed = 0 + total = len(tests) + + for test in tests: + if await test(): + passed += 1 + + # 3. Results summary + overall_duration = time.time() - overall_start + success_rate = (passed / total) * 100 if total > 0 else 0 + + print("\n" + "=" * 60) + print("📊 Test Results Summary") + print("=" * 60) + + for result in self.test_results: + status = "✅" if result["success"] else "❌" + duration = f" ({result['duration']:.2f}s)" if result["duration"] > 0 else "" + print(f"{status} {result['test']}{duration}") + if result["details"] and (not result["success"] or self.verbose): + print(f" └─ {result['details']}") + + print(f"\n📈 Overall: {passed}/{total} tests passed ({success_rate:.1f}%)") + print(f"⏱️ Total time: {overall_duration:.2f}s") + + if passed == total: + print("🎉 All tests passed! Voice backend is working correctly.") + return True + else: + print(f"⚠️ {total - passed} test(s) failed. Check the details above.") + return False + +async def main(): + parser = argparse.ArgumentParser(description='Test voice backend functionality') + parser.add_argument('--user', default='test@example.com', help='Login email') + parser.add_argument('--password', default='password', help='Login password') + parser.add_argument('--verbose', '-v', action='store_true', help='Verbose output') + args = parser.parse_args() + + tester = VoiceBackendTester(args.user, args.password, args.verbose) + + try: + success = await tester.run_all_tests() + sys.exit(0 if success else 1) + except KeyboardInterrupt: + print("\n⏹️ Test interrupted by user") + sys.exit(1) + except Exception as e: + print(f"\n❌ Test suite failed: {e}") + sys.exit(1) + +if __name__ == "__main__": + asyncio.run(main()) \ No newline at end of file diff --git a/test/voice/voice_test_template.html b/test/voice/voice_test_template.html new file mode 100644 index 0000000..fcc2e85 --- /dev/null +++ b/test/voice/voice_test_template.html @@ -0,0 +1,673 @@ + + + + + + Voice Backend Test - {{SCENARIO_NAME}} with {{CHARACTER_NAME}} + + + +
+

+ 🎙️ Voice Backend Test +
+ Real-time WebSocket Testing +
+

+ +
+ 📖 Scenario: {{SCENARIO_NAME}}
+ 👤 Character: {{CHARACTER_NAME}}
+ 🔗 Session ID: {{SESSION_ID}}
+ 🎯 Purpose: Test voice backend without full frontend +
+ + + + + +
Ready to connect
+ +
+ + + + +
+ +
+ + +
+ +
+
+
0
+
Messages Sent
+
+
+
0
+
Messages Received
+
+
+
0
+
Audio Chunks
+
+
+
0s
+
Connected Time
+
+
+ +
+
+ 💬 Conversation transcript will appear here...
+ Try sending a text message or using push-to-talk +
+
+ +
+

🔍 WebSocket Debug Log

+
+
Debug messages will appear here...
+
+
+
+ + + + \ No newline at end of file