xCatG · xCatG · Aug 14, 2025 · Aug 15, 2025 · Aug 15, 2025 · Aug 15, 2025
diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1,47 @@
+# Repository Guidelines
+
+## Project Structure & Module Organization
+- Backend (Python): `src/python/role_play/*` (API, chat, voice, evaluation, common). Entry point: `src/python/run_server.py`.
+- Frontend (Vue + TS): `src/ts/role_play/ui` (Vite app: `src`, `components`, `composables`, `services`).
+- Tests: `test/python/{unit,integration}` with shared fixtures in `test/python/fixtures`.
+- Config: environment YAMLs in `config/{dev,beta,prod}.yaml` (env vars can override). Data/resources in `data/`.
+- Tooling: `Makefile` (build/test/deploy), `Dockerfile`, `pytest.ini`, `.env(.example)`.
+
+## Build, Test, and Development Commands
+- Backend setup: `python -m venv venv && source venv/bin/activate && pip install -r src/python/requirements-dev.txt`.
+- Run API locally: `source venv/bin/activate && python src/python/run_server.py` (ensure `STORAGE_PATH` exists; defaults to `./data`).
+- Frontend dev: `cd src/ts/role_play/ui && npm i && npm run dev` (Vite at http://localhost:5173).
+- Test suite: `make test` (pytest with coverage) or `pytest -q` (see markers below).
+- Docker (local): `make run-local-docker DATA_DIR=./data` (serves on http://localhost:8080).
+- Build/Deploy: `make build-docker`, `make push-docker`, `make deploy ENV=dev` (requires GCP config; see `ENVIRONMENTS.md`).
+
+## Coding Style & Naming Conventions
+- Python: format with Black; imports via isort; prefer type hints. Naming: `snake_case` (functions/modules), `PascalCase` (classes), `UPPER_SNAKE` (constants).
+- TypeScript/Vue: `PascalCase` for components (`*.vue`), `camelCase` for composables/services (e.g., `useChatData.ts`). Two-space indent.
+- Keep modules under existing namespaces (do not create parallel roots).
+
+## Testing Guidelines
+- Framework: pytest. Coverage target: 25%+ (HTML at `test/python/htmlcov/index.html`).
+- Discovery: files `test_*.py`; classes `Test*`; functions `test_*`.
+- Markers: `unit`, `integration`, `e2e`, `slow`, `auth`, `storage`, `cloud`. Example: `pytest -m unit`.
+
+## Commit & Pull Request Guidelines
+- Style: Conventional Commits when possible (e.g., `feat: ...`, `fix(deps): ...`).
+- Commits: small, descriptive, present tense; reference issues (e.g., `#42`).
+- PRs: include summary, rationale, test plan, and screenshots for UI changes. Link issues and note any config/devops changes.
+- CI: ensure `make test` passes locally before requesting review.
+
+## Security & Configuration Tips
+- Never commit secrets. Use `.env` for local dev; production secrets live in GCP Secret Manager.
+- Adjust runtime via `config/*.yaml` and env vars (`PORT`, `STORAGE_PATH`, `CORS_ALLOWED_ORIGINS`, etc.). See `ENVIRONMENTS.md` and `STORAGE_CONFIG.md`.
+
+## Agent-Specific Instructions (Claude/Gemini)
+- Architecture: layered modules; handlers are stateless and created per request/connection. Register handlers via YAML in `config/*.yaml`.
+- Dependency Injection: use FastAPI `Depends()`; cache singletons with `functools.lru_cache` (e.g., ContentLoader, ChatLogger). Avoid mutable state on handler instances.
+- Storage & Locking: abstract through `StorageBackend` (file/GCS/S3). Use key paths without extensions (e.g., `users/{user_id}/profile`). Separate lock lease duration from acquisition timeout; wrap blocking I/O with `asyncio.to_thread`.
+- Chat System: persist messages as JSONL under `users/{user_id}/chat_logs/{session_id}`; create a fresh ADK runner per message; drive prompts by user language. See `/GEMINI.md` and root `/CLAUDE.md` for ADK notes.
+- Evaluation Reports: store at `users/{user_id}/eval_reports/{session_id}/{timestamp_uuid}` with metadata; expose GET latest/all and POST re-evaluate endpoints.
+- Frontend Patterns: domain-based Vue structure, composables for async ops and confirmations, sync TS types with Pydantic models, inject JWT via `Authorization: Bearer <token>`; i18n supports `en` and `zh-TW`.
+- Testing: prefer fast unit tests; mark `integration`, `e2e`, `slow`, `cloud` selectively. Use `make test-chat` for chat-only coverage.
+
+For deeper guidance, refer to: `GEMINI.md` (model/runtime, storage/locking overview), `CLAUDE.md` (repo-wide workflows), `src/python/CLAUDE.md` (Python DI/stateless patterns), `src/ts/CLAUDE.md` (frontend patterns), and `test/CLAUDE.md` (test layout and conventions).
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -182,6 +182,29 @@ make test-specific TEST_PATH="test/python/unit/chat/test_chat_logger.py"
 - [x] **Internationalization**: Full English/Traditional Chinese support for new UI elements
 - [x] **CSS Improvements**: Fixed radio button alignment issues with proper flexbox layout
 
+### Voice Chat with Direct ADK Integration (Completed & Radically Simplified)
+- [x] **Radical Architecture Simplification**: Eliminated over-engineered transcript management and wrapper classes
+  - **Direct ADK Integration**: Handler stores `Runner`, `live_events`, `live_request_queue` directly
+  - **Native Transcript Handling**: Uses ADK's built-in `is_final` flags instead of custom buffering
+  - **Minimal Models**: Reduced from 150+ lines with 7+ types to 30 lines with 2 generic types (`VoiceRequest`, `VoiceMessage`)
+  - **Code Reduction**: ~470 lines removed, 4-layer abstraction simplified to 2-layer
+- [x] **Streamlined Backend Voice Module** (`src/python/role_play/voice/`):
+  - **VoiceChatHandler**: Direct ADK integration with WebSocket endpoint (`/api/voice/ws/{session_id}`)
+  - **No Wrapper Classes**: Eliminated `LiveVoiceSession`, `TranscriptBuffer`, `SessionTranscriptManager`
+  - **ADK Event Processing**: Direct processing of `run_live()` events without intermediate transformations
+  - **Generic Models**: Flexible `VoiceRequest`/`VoiceMessage` with `extra="allow"` for any field structure
+- [x] **Preserved Functionality**: All original features maintained with radical simplification
+  - **Transcript Capture**: Reliable logging using ADK's native finalization mechanisms
+  - **Real-time Streaming**: Bidirectional audio/text communication preserved
+  - **Session Management**: WebSocket lifecycle and error handling maintained
+  - **ChatLogger Integration**: Voice logging methods unchanged, full JSONL compatibility
+- [x] **Architecture Benefits**:
+  - **Maximum Simplification**: Direct ADK utilization without wrapper overhead
+  - **Future-Proof**: Automatic benefits from ADK improvements
+  - **Maintainable**: Fewer abstractions, easier to understand and modify
+  - **Performance**: Reduced memory footprint and processing overhead
+- [x] **Testing Updated**: 13 comprehensive tests covering simplified architecture, all 328 tests passing
+
 ### Pending Development
 - [ ] **Resource Architecture for Script Creator**:
   - [x] Design LayeredResourceLoader for base + user resources (see RESOURCE_ARCHITECTURE.md)
@@ -197,7 +220,6 @@ make test-specific TEST_PATH="test/python/unit/chat/test_chat_logger.py"
   - [ ] Create utility functions for date formatting across components
   - [ ] Add validation that session belongs to requesting user before creating evaluation reports
   - [ ] Add retry logic for transient storage failures in evaluation system
-- [ ] WebSocket: `server/websocket.py` connection manager
 - [ ] Auth Module: Complete OAuth implementation
 - [ ] Scripter: Complete module implementation  
 - [ ] Frontend: Modular monolith restructure, chat/eval interfaces
@@ -279,6 +301,7 @@ make test-specific TEST_PATH="test/python/unit/chat/test_chat_logger.py"
 ### Architecture Highlights
 - **Storage**: Async distributed locking, lease (60-300s) vs timeout (5-30s) separation
 - **Chat**: Separated ADK runtime from JSONL persistence, per-message Runner creation, utility methods for JSONL parsing, centralized agent configuration
+- **Voice**: Three-tier transcript management (partial/stabilization/final), ADK `run_live()` integration, intelligent buffering prevents fragmented logs
 - **Backend Structure**: Helper methods for session validation, message logging, content loading, response generation
 - **Frontend Patterns**: Composable architecture for modal management, async operations, data loading, dual-flow session creation with script/character selection
 - **Config**: YAML + env vars, dynamic handler loading, fail-fast validation

diff --git a/config/dev.yaml b/config/dev.yaml
@@ -59,6 +59,7 @@ enabled_handlers:
   user_account: "role_play.server.user_account_handler.UserAccountHandler"
   chat: "role_play.chat.handler.ChatHandler"
   evaluation: "role_play.evaluation.handler.EvaluationHandler"
+  voice: "role_play.voice.handler.VoiceChatHandler"
   # Add more handlers as they're implemented:
   # scripter: "role_play.scripter.handler.ScripterHandler"
 
@@ -71,3 +72,29 @@ supported_languages:
 # Resource configuration
 resources:
   base_prefix: "resources/"
+
+# Voice chat configuration
+voice:
+  # Transcript buffering settings
+  transcript:
+    stability_threshold: "${VOICE_STABILITY_THRESHOLD:0.8}"
+    finalization_timeout_ms: "${VOICE_FINALIZATION_TIMEOUT:2000}"
+    min_utterance_length: "${VOICE_MIN_UTTERANCE_LENGTH:3}"
+    sentence_boundary_patterns:
+      - "[.!?]+\\s*$"
+      - "\\n+"
+
+  # Audio processing settings
+  audio:
+    default_format: "pcm"
+    default_sample_rate: 16000
+    default_channels: 1
+    default_bit_depth: 16
+    chunk_size_ms: "${VOICE_CHUNK_SIZE_MS:100}"
+
+  # Voice model settings
+  model:
+    default_voice: "Aoede"
+    gemini_api_key: "${GEMINI_API_KEY:}"
+    # Mock mode when no API key is available
+    enable_mock: "${VOICE_ENABLE_MOCK:true}"