fix(reflection): improve plan mode and route feedback #52
Status: Closed · +39,877 −210
Added an image to the README and improved the description.
- promptAsync() + polling pattern
- Proper timeout constants (180s for judge)
- Logging for debugging

The issue is that the default model xai/grok-3-mini-latest isn't responding within 60 seconds. This is an infrastructure/provider issue, not a code issue.

Summary of changes made:

1. reflection.ts:
   - Added JUDGE_RESPONSE_TIMEOUT = 180_000 (3 min) and POLL_INTERVAL = 2_000 (2s)
   - Added waitForJudgeResponse() function that polls for judge completion
   - Changed client.session.prompt() to client.session.promptAsync() for judge calls
   - Changed feedback delivery to use promptAsync() as well
   - Added logging for debugging
2. test/e2e.test.ts:
   - Changed to use promptAsync() for sending tasks
   - Updated stability check to look for the completed timestamp
   - Improved logging to show completion status

The tests will pass when:
- The configured model responds within the timeout
- Or you use a faster/working model
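A minimal sketch of the promptAsync + polling pattern described above. The timeout constants match the commit message, but the `Message` shape and the `fetchMessages` callback are illustrative assumptions, not the plugin's actual API.

```typescript
// Constants from the commit message; the rest is an illustrative sketch.
const JUDGE_RESPONSE_TIMEOUT = 180_000; // 3 min
const POLL_INTERVAL = 2_000;            // 2 s

type Message = { role: string; completed?: number };

// Polls a message-fetching callback until the latest assistant message
// carries a completed timestamp, or the timeout elapses.
async function waitForJudgeResponse(
  fetchMessages: () => Promise<Message[]>,
): Promise<Message | null> {
  const deadline = Date.now() + JUDGE_RESPONSE_TIMEOUT;
  while (Date.now() < deadline) {
    const messages = await fetchMessages();
    const last = messages[messages.length - 1];
    if (last?.role === "assistant" && last.completed) return last;
    await new Promise((r) => setTimeout(r, POLL_INTERVAL));
  }
  return null; // timed out — caller should stop retrying
}
```

The caller fires the judge prompt with promptAsync() (non-blocking), then polls with this helper instead of blocking on a single 60-second request.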
- Add 10s cooldown after sending feedback before allowing another reflection
- Configure E2E tests to use the github-copilot/gpt-4o model (temp dirs need an explicit model)
- Track lastFeedbackTime to prevent re-judging immediately after the agent responds

This fixes the infinite loop: judge → feedback → agent responds → session idles → judge again.
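The cooldown guard described above can be sketched as follows; the constant and function names are illustrative, not the plugin's actual code.

```typescript
// 10 s cooldown after feedback, per the commit message above.
const FEEDBACK_COOLDOWN = 10_000;

let lastFeedbackTime = 0;

// True when enough time has passed since the last feedback
// to allow another reflection pass.
function canReflect(now: number): boolean {
  return now - lastFeedbackTime >= FEEDBACK_COOLDOWN;
}

// Called right after feedback is injected into the session.
function recordFeedback(now: number): void {
  lastFeedbackTime = now;
}
```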
- Task complete now shows a toast notification only (no prompt())
- Task incomplete still sends feedback via prompt() to continue work
- Updated AGENTS.md to document this critical design decision

The bug: calling prompt() on complete tasks triggered an agent response, which fired session.idle, causing reflection to run again infinitely.
- Add completedSessions on timeout/parse error/catch to stop retries
- Move completedSessions.add() before the async showToast() in the complete path
- Ensures concurrent session.idle events are blocked immediately
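A sketch of the ordering fix: the session is marked complete synchronously, before any await, so a concurrent session.idle event sees the guard immediately. The `showToast` callback and session id are stand-ins for illustration.

```typescript
const completedSessions = new Set<string>();

// Returns true only for the first caller; concurrent calls for the same
// session see the guard (set before the first await) and bail out.
async function handleComplete(
  sessionId: string,
  showToast: () => Promise<void>,
): Promise<boolean> {
  if (completedSessions.has(sessionId)) return false; // already handled
  completedSessions.add(sessionId); // guard set synchronously, pre-await
  await showToast();
  return true;
}
```

Had the `add()` come after the `await`, two idle events arriving close together would both pass the `has()` check and both show the toast.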
The in-memory Sets (createdByPlugin, judgeSessionIds) only work within a single process. When multiple plugin instances run (py/node servers), they don't share state. Now we detect judge sessions by checking for 'TASK VERIFICATION' in message content BEFORE attempting to judge. This works across processes.

E2E results:
- Before: 87 messages, 44 feedback loops
- After: 11 messages, 1 feedback (legitimate incomplete)
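The cross-process detection amounts to scanning message content for the marker string instead of consulting per-process state. A minimal sketch, with an assumed `Msg` shape:

```typescript
type Msg = { content: string };

// A judge session is identified by its content marker, which survives
// across processes, unlike an in-memory Set.
function isJudgeSession(messages: Msg[]): boolean {
  return messages.some((m) => m.content.includes("TASK VERIFICATION"));
}
```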
- Removed all logging
- Simplified state tracking to processedSessions and activeReflections
- Cleaner code structure
CRITICAL FINDING: OpenCode loads plugins from ~/.config/opencode/plugin/, NOT from npm global installs. The npm install was being ignored.

Changes:
- Updated AGENTS.md with deployment instructions
- Copied the fixed plugin to ~/.config/opencode/plugin/reflection.ts

The fixed plugin:
- No console.log statements
- Toast only on complete (prevents the infinite loop)
- prompt() only on incomplete
- Add tts.ts plugin that reads agent responses aloud using the macOS say command
- Clean markdown, code blocks, and URLs from text before speaking
- Truncate long messages (1000 char limit)
- Skip judge/reflection sessions to avoid reading internal prompts
- Track sessions to prevent duplicate speech
- Add unit tests (15 tests) and a manual test script
- Update docs and package.json with new test commands
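A hedged sketch of the cleanup pipeline described above (strip code blocks, inline code, URLs, and markdown punctuation, then truncate). The regexes and function name are illustrative; only the 1000-char limit comes from the commit message.

```typescript
const MAX_SPEECH_LENGTH = 1000;

function cleanForSpeech(text: string): string {
  return text
    .replace(/```[\s\S]*?```/g, "") // fenced code blocks
    .replace(/`[^`]*`/g, "")        // inline code
    .replace(/https?:\/\/\S+/g, "") // URLs
    .replace(/[*_#>]/g, "")         // markdown punctuation
    .replace(/\s+/g, " ")           // collapse whitespace
    .trim()
    .slice(0, MAX_SPEECH_LENGTH);   // 1000-char limit per the commit
}
```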
- Add Chatterbox as primary TTS engine (high-quality neural TTS)
- Auto-install Chatterbox in a virtualenv on first use
- Support GPU (CUDA) and CPU device selection
- Auto-detect GPU; fall back to OS TTS if no GPU (unless CPU is forced)
- Add configuration via ~/.config/opencode/tts.json
- Support voice cloning, emotion control, and the Turbo model
- Automatic fallback to OS TTS (macOS say) when Chatterbox is unavailable
- Update tests for new engine configuration
- Update README with Chatterbox setup instructions
- Remove all console.log/console.error statements
- Fix caching bug that prevented Chatterbox from being used
- Increase timeout to 5 minutes for CPU mode
- Simplify availability check logic
…erence
- Default to the macOS Samantha voice (female) for a better out-of-box experience
- Add OS TTS voice/rate configuration options
- Add Chatterbox server mode to keep the model loaded between requests
- Add Turbo model support for 10x faster inference
- Add Apple Silicon (MPS) device support
- Use Unix socket IPC for low-latency server communication
- Update tests for new features
- Add lock file mechanism to prevent multiple server startups
- Check if the server is already running before starting a new one
- Run the server detached so it survives across sessions
- Save a PID file for server tracking
- Increase timeout to 120s for MPS/CPU model loading
- Allow socket permissions for all users
- Add graceful shutdown handling
- Add shared server feature to the features list
- Add MPS speed comparison row
- Add server architecture diagram
- Document server files and management commands
- Update AGENTS.md with Chatterbox configuration and debugging info
The function was only checking for CUDA GPU or explicit CPU device, causing MPS (Apple Silicon) users to fall back to OS TTS even when chatterbox.device was set to 'mps' in config. Now returns true for mps/cpu devices explicitly, and only checks CUDA availability when cuda device is configured.
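The corrected decision logic described above can be sketched as a small pure function: mps and cpu are accepted directly, and only the cuda device consults GPU availability. The function name and the `hasCuda` probe are illustrative stand-ins for the plugin's real check.

```typescript
function isChatterboxUsable(device: string, hasCuda: boolean): boolean {
  // Explicit mps/cpu configuration is honored without a GPU probe.
  if (device === "mps" || device === "cpu") return true;
  // Only a cuda device needs CUDA to actually be available.
  if (device === "cuda") return hasCuda;
  return false; // unknown device → fall back to OS TTS
}
```

The old behavior treated everything that wasn't CUDA-with-GPU or explicit CPU as unusable, which is why MPS users silently fell back to OS TTS.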
The embedded tts.py script was missing 'mps' in its argparse choices and MPS fallback logic. This caused the script to fail when device='mps' was configured, silently falling back to OS TTS.

Root cause: code duplication between the embedded scripts in tts.ts and the standalone files that get written to disk. Fixing the standalone files doesn't persist because ensureChatterboxScript() overwrites them.

Added tests to prevent regression:
- Verify argparse accepts --device mps
- Verify MPS fallback when unavailable
- Verify auto-detection of MPS when CUDA is unavailable
- Verify consistency between one-shot and server scripts
…64 endpoint
- Fix API endpoint mismatch: /transcribe -> /transcribe-base64 for opencode-manager compatibility
- Update DEFAULT_SUPABASE_ANON_KEY to the new token (expires 2081)
- Add comprehensive Telegram test instructions to AGENTS.md
- Add a Quick Reference test sequence for all tests
- Fix test/test-telegram-whisper.ts to use the correct port (5552) and endpoint
- Verified real voice transcription: 'It's ready to use, maybe.' from 1.6s audio

Tests: typecheck (0 errors), unit (132), plugin-load (5), telegram-whisper (5/5)
- Add ReflectionConfig interface with customRules, taskPatterns, severityMapping
- Load config from <project>/.opencode/reflection.json or ~/.config/opencode/reflection.json
- Support query-based customization via task patterns with regex matching
- Patterns can override task type detection (coding/research) and add extra rules
- Add 15 new unit tests for findMatchingPattern, buildCustomRules, mergeConfig
- Document all config options with examples in AGENTS.md
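An illustrative sketch of the task-pattern matching described above. The `findMatchingPattern` name appears in the commit message; the `TaskPattern` fields and regex semantics are assumptions about the schema, not the real one.

```typescript
interface TaskPattern {
  pattern: string; // regex tested against the user's query
  taskType?: "coding" | "research"; // optional task-type override
  extraRules?: string[];            // extra rules appended to the prompt
}

// Returns the first configured pattern whose regex matches the query.
function findMatchingPattern(
  query: string,
  patterns: TaskPattern[],
): TaskPattern | undefined {
  return patterns.find((p) => new RegExp(p.pattern, "i").test(query));
}
```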
#41)

* fix(telegram,tts): fix Whisper endpoint and switch to Coqui VCTK model
  - Fix Telegram voice transcription: change endpoint from /transcribe-base64 to /transcribe
  - Switch default TTS engine to Coqui with the vctk_vits model (tts_models/en/vctk/vits)
  - Set default speaker to p226 (clear, professional British male voice)
  - Add vctk_vits model support to Coqui TTS scripts and server
  - Update AGENTS.md documentation with the new TTS configuration

* docs: add comprehensive TTS model documentation to README
  - Document all 6 Coqui TTS models with descriptions
  - Add a configuration options table for each engine
  - Recommend vctk_vits with the p226 speaker as default
  - Add Chatterbox and OS TTS configuration options

* docs: add comprehensive VCTK speaker list and XTTS voice cloning info
  - List all 109 VCTK speakers with popular choices highlighted
  - Add speaker descriptions (gender, accent, characteristics)
  - Document XTTS v2 voice cloning with the voiceRef option
  - List XTTS supported languages
Add a new reflection-static.ts plugin that uses a simpler approach:
1. Ask the agent a static self-assessment question when the session idles
2. Use a GenAI judge to analyze the agent's response
3. If the agent confirms completion → toast notification, no feedback loop
4. If the agent identifies improvements → push to continue

Features:
- Simple self-assessment question: "What was the task? Are you sure you completed it?"
- GenAI-powered analysis of the agent's self-assessment
- Prevents infinite feedback loops by tracking confirmed completions
- Tracks aborted sessions to skip reflection
- E2E test that verifies plugin effectiveness (scored 5/5)

New npm scripts:
- test:reflection-static: Run the E2E evaluation test
- install:reflection-static: Deploy reflection-static instead of reflection.ts
- Add multiple abort detection layers (session.error, message.aborted)
- Add a delay before reflection to allow abort events to arrive
- Check whether the last message was aborted/incomplete in runReflection
- Remove the mock evaluation fallback - require a real Azure LLM
- Use the AZURE_OPENAI_DEPLOYMENT env var for the eval model
- Change from a Set to a Map with timestamps for abort tracking
- Add a 10 second cooldown period after an Esc press
- Add a type cast for the error property to fix a TypeScript error
- Separate the completed check from the error check for clearer debugging
- Match the pattern from reflection.ts for consistent behavior
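The Set-to-Map change above can be sketched as follows: a map of session id to abort timestamp lets an Esc press suppress reflection for a bounded window instead of forever. Names and structure are illustrative.

```typescript
// 10 s suppression window after an abort, per the commit message.
const ABORT_COOLDOWN = 10_000;

const abortedAt = new Map<string, number>();

// Called when an Esc press / abort event is observed for a session.
function recordAbort(sessionId: string, now: number): void {
  abortedAt.set(sessionId, now);
}

// Reflection is skipped while the session is inside its cooldown window.
function isInAbortCooldown(sessionId: string, now: number): boolean {
  const t = abortedAt.get(sessionId);
  return t !== undefined && now - t < ABORT_COOLDOWN;
}
```

With a plain Set, an aborted session would be skipped indefinitely; the timestamp map lets reflection resume once the cooldown expires.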
…ach (#43)

* feat: add reflection-static plugin with simpler self-assessment approach
* fix: prevent reflection spam on Esc abort, use real Azure eval
* fix: use override:true for dotenv to ensure correct Azure credentials
* fix: improve abort detection with cooldown-based tracking

Co-authored-by: engineer <engineer@opencode.ai>
- telegram.ts was incorrectly placed in the lib/ subdirectory (not loaded as a plugin)
- Fix: deploy telegram.ts directly to ~/.config/opencode/plugin/
- Fix isSessionComplete to check the completed timestamp (same as tts.ts)
- Remove install:global; add individual install scripts per plugin
- Update plugin-load.test.ts for the new deployment pattern
- Improve the reflection-static.ts analysis prompt to be stricter about completion

Fixes telegram notifications not being sent since commit d10a8f5
Telegram plugin fixes:
- Changed plugin initialization to non-blocking (setTimeout instead of await)
- Fixed Whisper endpoint from /transcribe to /transcribe-base64

send-notify function fix:
- Fixed placeholder leak by using null bytes instead of underscores

Test consolidation:
- Deleted redundant test files (telegram-e2e-real.ts, telegram-forward-e2e.test.ts, test-telegram-whisper.ts)
- Consolidated 17 real integration tests in test/telegram.test.ts
- All tests use real Supabase (no mocks)

Documentation updates:
- Added warnings about pkill and deployment
- Updated AGENTS.md with test requirements
- Updated plan.md with status

All tests pass: typecheck (0 errors), unit (130), plugin-load (5)
- Posts agent messages to associated GitHub issues as comments
- Auto-detects issues from: a URL in the first message, a .github-issue file, the PR's closingIssuesReferences, or branch name conventions
- Configurable via ~/.config/opencode/github.json
- Batches messages (5s interval) to avoid API rate limits
- Optional: create a new issue if none is found
- 18 unit tests for URL parsing, branch detection, and message formatting
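One of the detection sources above, issue numbers from branch names, can be sketched as a small parser. The supported patterns here (`issue-123`, `fix/123-foo`) are assumptions based on common conventions, not necessarily the plugin's exact rules.

```typescript
// Extracts an issue number from a branch name, or null if none is found.
// Matches a digit run at the start of the branch or of a path segment,
// optionally prefixed with "issue-"/"issue_".
function issueFromBranch(branch: string): number | null {
  const m = branch.match(/(?:^|\/)(?:issue[-_]?)?(\d+)(?:[-_]|$)/i);
  return m ? parseInt(m[1], 10) : null;
}
```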
- Add github.ts to the available plugins list
- Document all configuration options in table format
- Add .github-issue file format examples
- Add branch name pattern documentation
- Add debug logging instructions
- Update deployment instructions to include github.ts
- Update plan.md to mark all tasks complete
feat: GitHub issue plugin + telegram fixes
Added descriptions for new plugins and updated the README layout.
…t support (#46)

* fix(reflection-static): allow recursive reflection and reset completion on new messages
* fix(test): increase timeout for the telegram send-notify test
* fix(reflection-static): use message ID tracking instead of counting to handle compression
* Fix the install:global script and update the github plugin config
* feat(reflection-static): fix Plan Mode detection and add custom reflection.md support
  - Fix Plan Mode detection to check system/developer messages (not just user messages)
  - Add support for a custom reflection prompt via a ./reflection.md file
  - Fall back to the default 4-question prompt if reflection.md is not found
  - Fixes the issue where reflection triggered in Plan Mode and interrupted the agent workflow

Co-authored-by: engineer <engineer@opencode.ai>
Extended timeouts for tests that make real HTTP requests to Supabase:
- stores text reply with correct session_id: 15s
- routes replies to correct session: 15s
- webhook handles malformed JSON: 10s
- webhook handles missing message field: 10s

These tests were occasionally failing due to network latency.
OpenCode's plugin loader treats ALL named exports as plugin functions. The _test_internal object export caused 'fn3 is not a function' errors because OpenCode tried to call the object as a plugin.

Changes:
- telegram.ts: Remove the _test_internal named export (keeping only TelegramPlugin + default)
- test/telegram-internal.test.ts: Skip tests that depended on internal exports
- test/telegram.test.ts: Add a 15s timeout to the send-notify test
- test/plugin-load.test.ts: Increase server timeout to 60s, add debug logging

Root cause: the OpenCode plugin loader at src/plugin/index.ts:89 iterates all exports and calls them as functions. Non-function exports cause a TypeError.

All 5 plugins now load successfully together.
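To illustrate the failure mode: a defensive loader variant (illustrative only, not OpenCode's actual code, where the fix was instead to remove the non-function export) would filter module exports down to functions before calling them.

```typescript
// Keeps only the exports that are callable; an object export like
// _test_internal would be skipped instead of causing a TypeError.
function loadPlugins(mod: Record<string, unknown>): Array<() => unknown> {
  return Object.values(mod).filter(
    (v): v is () => unknown => typeof v === "function",
  );
}
```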
…lopment
- Replace 'opencode run' (single-shot) with 'opencode attach' (persistent TUI)
- Create the session via the API before launching the TUI
- Send the initial task to the session via the API
- Add a worktree_attach tool for resuming work on an existing worktree
- Enhance worktree_status with remote tracking info and active sessions
- Enhance worktree_delete with a branch deletion option and uncommitted-changes warnings
- Add configuration support via ~/.config/opencode/worktree.json
- Add isServerRunning() to check the server health endpoint
- Add startServer() to spawn opencode serve in the background
- Add an ensureServer() wrapper that starts the server if needed
- Save the server PID to ~/.config/opencode/worktree-server.pid
- Update launchTerminal to be async and use ensureServer
- Show a 'Started OpenCode server automatically' message when the server was started
- Support a serverPort config option (default: 4096)

This improves UX by not requiring users to manually start the server.
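The ensureServer flow above reduces to: probe the health endpoint, spawn only if the probe fails. A minimal sketch with the health check and spawner injected as stand-ins for the real fetch/spawn calls:

```typescript
async function ensureServer(
  checkHealth: () => Promise<boolean>,   // e.g. GET the health endpoint
  spawnServer: () => Promise<void>,      // e.g. spawn `opencode serve` detached
): Promise<"already-running" | "started"> {
  if (await checkHealth()) return "already-running";
  await spawnServer(); // caller would also persist the PID file here
  return "started";
}
```

Returning which branch was taken is what drives the 'Started OpenCode server automatically' message.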
… analysis

Race condition: reflection sends a self-assessment question, waits for the response, then analyzes it with the GenAI judge. During this time (which could be 30+ seconds), the human may type a new message. Without this fix, reflection would still inject its 'Please continue...' prompt even though the human had already provided new instructions.

Fix: After the GenAI analysis completes, re-fetch messages and compare the current lastUserMsgId with the initial one captured at reflection start. If they differ, abort the reflection to avoid injecting stale prompts.

This prevents confusing UX where reflection feedback appears after the human has already moved on to a new task.
…evaluation

Apply the same race condition fix from reflection-static.ts to reflection.ts.

Problem: The GenAI judge evaluation can take 30+ seconds. During this time, the human might type a new message. When the judge finishes, the plugin would inject feedback for the OLD task, which is stale and confusing.

Solution: After waitForResponse() completes, re-fetch messages and compare currentUserMsgId with initialUserMsgId. If they differ, abort the feedback injection to avoid stale prompts.

- Capture initialUserMsgId at the start of runReflection()
- After the judge verdict is parsed, re-fetch messages
- If currentUserMsgId != initialUserMsgId, abort and mark the original as reflected
- Add unit tests for the race condition scenarios
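The stale-feedback guard described above can be sketched as two pure helpers: capture the last user message id when reflection starts, then compare it against a fresh fetch after the slow judge call. The `UserMsg` shape and function names are illustrative.

```typescript
type UserMsg = { id: string; role: string };

// Returns the id of the most recent user message, if any.
function lastUserMsgId(messages: UserMsg[]): string | undefined {
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === "user") return messages[i].id;
  }
  return undefined;
}

// True when the human sent a new message mid-reflection, meaning the
// pending feedback targets the OLD task and must be dropped.
function feedbackIsStale(
  initialId: string | undefined,
  currentMessages: UserMsg[],
): boolean {
  return lastUserMsgId(currentMessages) !== initialId;
}
```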
…re notifications
- Write verdict signals in reflection-static for coordination
- Telegram waits for the verdict and skips reflection prompts
- TTS requires a verdict before speaking and records missing-verdict metrics
- Load the model list from ~/.config/opencode/reflection.yaml
- Try models in order and fall back on timeout/invalid JSON
- Document the config format
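The fallback behavior above amounts to a loop over configured models that moves on whenever a call fails. A minimal sketch, with `askJudge` as a stand-in for the real model call (which would throw on timeout or unparseable JSON):

```typescript
async function judgeWithFallback<T>(
  models: string[],
  askJudge: (model: string) => Promise<T>,
): Promise<{ model: string; verdict: T } | null> {
  for (const model of models) {
    try {
      return { model, verdict: await askJudge(model) };
    } catch {
      // timeout or invalid JSON — fall through to the next model
    }
  }
  return null; // every configured model failed
}
```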
Force-pushed from 832d496 to e58b10b
Summary

Add a /transcribe-base64 alias and harden telegram webhook tests with timeouts.