Conversation

@dzianisv (Owner)
Summary

  • Skip reflection-static in Plan mode using metadata and expanded plan-only patterns, with new tests
  • Route reflection feedback retries via GenAI task classification to provider-specific models
  • Add whisper /transcribe-base64 alias and harden telegram webhook tests with timeouts

Testing

  • npm run typecheck
  • npm test
  • npm run test:load
  • OPENCODE_E2E=1 npm run test:e2e
  • npm run test:telegram
  • npx tsx test/test-telegram-whisper.ts

Added an image to the README and improved the description.
- promptAsync() + polling pattern
- Proper timeout constants (180s for judge)
- Logging for debugging
The issue is that the default model xai/grok-3-mini-latest isn't responding within 60 seconds. This is an infrastructure/provider issue, not a code issue.
Summary of changes made:
1. reflection.ts:
   - Added JUDGE_RESPONSE_TIMEOUT = 180_000 (3 min) and POLL_INTERVAL = 2_000 (2s)
   - Added waitForJudgeResponse() function that polls for judge completion
   - Changed client.session.prompt() to client.session.promptAsync() for judge calls
   - Changed feedback delivery to use promptAsync() as well
   - Added logging for debugging
2. test/e2e.test.ts:
   - Changed to use promptAsync() for sending tasks
   - Updated stability check to look for completed timestamp
   - Improved logging to show completion status
The tests will pass when:
- The configured model responds within the timeout
- Or you use a faster/working model
- Add 10s cooldown after sending feedback before allowing another reflection
- Configure E2E tests to use github-copilot/gpt-4o model (temp dirs need explicit model)
- Track lastFeedbackTime to prevent re-judging immediately after agent responds
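The promptAsync() + polling pattern above can be sketched as follows. The names `waitForJudgeResponse`, `JUDGE_RESPONSE_TIMEOUT`, and the message shape follow the notes here but are assumptions, not the plugin's actual API:

```typescript
const JUDGE_RESPONSE_TIMEOUT = 180_000; // 3 min
const POLL_INTERVAL = 2_000; // 2 s

interface JudgeMessage { role: string; completed?: number }

// The judge is considered done once the last assistant message carries a
// completed timestamp (the same stability check the E2E test uses).
function isJudgeDone(messages: JudgeMessage[]): boolean {
  const last = messages[messages.length - 1];
  return last?.role === "assistant" && typeof last.completed === "number";
}

async function waitForJudgeResponse(
  fetchMessages: () => Promise<JudgeMessage[]>,
): Promise<JudgeMessage[] | null> {
  const deadline = Date.now() + JUDGE_RESPONSE_TIMEOUT;
  while (Date.now() < deadline) {
    const messages = await fetchMessages();
    if (isJudgeDone(messages)) return messages;
    await new Promise((r) => setTimeout(r, POLL_INTERVAL));
  }
  return null; // timed out: caller reports the provider issue instead of hanging
}
```

Fire-and-forget promptAsync() plus this poll loop avoids blocking on a single long-lived request when the provider is slow.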

This fixes the infinite loop where: judge → feedback → agent responds →
session idles → judge again → infinite loop
- Task complete now shows toast notification only (no prompt())
- Task incomplete still sends feedback via prompt() to continue work
- Updated AGENTS.md to document this critical design decision

The bug: calling prompt() on complete tasks triggered agent response,
which fired session.idle, causing reflection to run again infinitely.
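A minimal sketch of the complete/incomplete branch; the `Verdict` shape is illustrative, not the real plugin type:

```typescript
interface Verdict { complete: boolean; feedback?: string }

// Choosing the delivery channel is the whole fix: prompt() on a complete task
// triggers an agent response, fires session.idle, and re-enters reflection,
// so complete tasks get a toast only while incomplete ones get prompt().
function verdictAction(v: Verdict): "toast" | "prompt" {
  return v.complete ? "toast" : "prompt";
}
```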
- Add completedSessions on timeout/parse error/catch to stop retries
- Move completedSessions.add() before async showToast() in complete path
- Ensures concurrent session.idle events are blocked immediately
The in-memory Sets (createdByPlugin, judgeSessionIds) only work within
a single process. When multiple plugin instances run (py/node servers),
they don't share state.

Now we detect judge sessions by checking for 'TASK VERIFICATION' in
message content BEFORE attempting to judge. This works across processes.
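The content-based detection can be sketched like this; the 'TASK VERIFICATION' marker is from the notes above, the message shape is an assumption:

```typescript
const JUDGE_MARKER = "TASK VERIFICATION";

// Works across processes: no shared in-memory Set is required because the
// marker travels with the session's own message content.
function isJudgeSession(messages: { text: string }[]): boolean {
  return messages.some((m) => m.text.includes(JUDGE_MARKER));
}
```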

E2E results:
- Before: 87 messages, 44 feedback loops
- After: 11 messages, 1 feedback (legitimate incomplete)
- Removed all logging
- Simplified state tracking to processedSessions and activeReflections
- Cleaner code structure
CRITICAL FINDING: OpenCode loads plugins from ~/.config/opencode/plugin/,
NOT from npm global installs. The npm install was being ignored.

Changes:
- Updated AGENTS.md with deployment instructions
- Copied fixed plugin to ~/.config/opencode/plugin/reflection.ts

The fixed plugin:
- No console.log statements
- Toast only on complete (prevents infinite loop)
- prompt() only on incomplete
- Add tts.ts plugin that reads agent responses aloud using macOS say command
- Clean markdown, code blocks, URLs from text before speaking
- Truncate long messages (1000 char limit)
- Skip judge/reflection sessions to avoid reading internal prompts
- Track sessions to prevent duplicate speech
- Add unit tests (15 tests) and manual test script
- Update docs and package.json with new test commands
- Add Chatterbox as primary TTS engine (high-quality neural TTS)
- Auto-install Chatterbox in virtualenv on first use
- Support GPU (CUDA) and CPU device selection
- Auto-detect GPU, fall back to OS TTS if no GPU (unless CPU forced)
- Add configuration via ~/.config/opencode/tts.json
- Support voice cloning, emotion control, and Turbo model
- Automatic fallback to OS TTS (macOS say) when Chatterbox unavailable
- Update tests for new engine configuration
- Update README with Chatterbox setup instructions
- Remove all console.log/console.error statements
- Fix caching bug that prevented Chatterbox from being used
- Increase timeout to 5 minutes for CPU mode
- Simplify availability check logic
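The markdown/URL cleanup described above for tts.ts could look like this rough sketch; the regexes and the 1000-char constant mirror the notes but are illustrative:

```typescript
const MAX_SPEECH_LENGTH = 1000;

// Strip everything a TTS voice should not read aloud, then truncate.
function cleanForSpeech(text: string): string {
  let t = text
    .replace(/```[\s\S]*?```/g, "") // drop fenced code blocks
    .replace(/`[^`]*`/g, "")        // drop inline code
    .replace(/https?:\/\/\S+/g, "") // drop URLs
    .replace(/[*_#>]/g, "")         // strip markdown markers
    .replace(/\s+/g, " ")
    .trim();
  if (t.length > MAX_SPEECH_LENGTH) t = t.slice(0, MAX_SPEECH_LENGTH);
  return t;
}
```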
…erence

- Default to macOS Samantha voice (female) for better out-of-box experience
- Add OS TTS voice/rate configuration options
- Add Chatterbox server mode to keep model loaded between requests
- Add Turbo model support for 10x faster inference
- Add Apple Silicon (MPS) device support
- Use Unix socket IPC for low-latency server communication
- Update tests for new features
- Add lock file mechanism to prevent multiple server startups
- Check if server is already running before starting new one
- Run server detached so it survives across sessions
- Save PID file for server tracking
- Increase timeout to 120s for MPS/CPU model loading
- Allow socket permissions for all users
- Add graceful shutdown handling
- Add shared server feature to features list
- Add MPS speed comparison row
- Add server architecture diagram
- Document server files and management commands
- Update AGENTS.md with Chatterbox configuration and debugging info
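The lock-file mechanism can be sketched as below; the real plugin keeps its files under ~/.config/opencode, while this sketch uses the temp directory so it is self-contained:

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

const LOCK_PATH = path.join(os.tmpdir(), "tts-server.lock");

// 'wx' fails if the file already exists, so only one starter wins the race
// even when several plugin instances try to launch the server at once.
function acquireLock(pid: number = process.pid): boolean {
  try {
    fs.writeFileSync(LOCK_PATH, String(pid), { flag: "wx" });
    return true;
  } catch {
    return false; // another process already holds the lock
  }
}

function releaseLock(): void {
  try { fs.unlinkSync(LOCK_PATH); } catch { /* already gone */ }
}
```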
The function was only checking for CUDA GPU or explicit CPU device,
causing MPS (Apple Silicon) users to fall back to OS TTS even when
chatterbox.device was set to 'mps' in config.

Now returns true for mps/cpu devices explicitly, and only checks
CUDA availability when cuda device is configured.
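The corrected check reduces to this sketch; `cudaAvailable` stands in for the real runtime probe (hypothetical):

```typescript
type Device = "cuda" | "mps" | "cpu";

// mps/cpu are always usable when explicitly configured; only the cuda path
// needs a runtime availability check.
function chatterboxUsable(device: Device, cudaAvailable: () => boolean): boolean {
  if (device === "mps" || device === "cpu") return true;
  return cudaAvailable();
}
```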
The embedded tts.py script was missing 'mps' in argparse choices and
MPS fallback logic. This caused the script to fail when device='mps'
was configured, falling back to OS TTS silently.

Root cause: Code duplication between embedded scripts in tts.ts and
the standalone files that get written to disk. Fixing standalone files
doesn't persist because ensureChatterboxScript() overwrites them.

Added tests to prevent regression:
- Verify argparse accepts --device mps
- Verify MPS fallback when unavailable
- Verify auto-detection of MPS when CUDA unavailable
- Verify consistency between one-shot and server scripts
dzianisv and others added 26 commits January 30, 2026 12:30
…64 endpoint

- Fix API endpoint mismatch: /transcribe -> /transcribe-base64 for opencode-manager compatibility
- Update DEFAULT_SUPABASE_ANON_KEY to new token (expires 2081)
- Add comprehensive Telegram test instructions to AGENTS.md
- Add Quick Reference test sequence for all tests
- Fix test/test-telegram-whisper.ts to use correct port (5552) and endpoint
- Verified real voice transcription: 'It's ready to use, maybe.' from 1.6s audio

Tests: typecheck (0 errors), unit (132), plugin-load (5), telegram-whisper (5/5)
- Add ReflectionConfig interface with customRules, taskPatterns, severityMapping
- Load config from <project>/.opencode/reflection.json or ~/.config/opencode/reflection.json
- Support query-based customization via task patterns with regex matching
- Patterns can override task type detection (coding/research) and add extra rules
- Add 15 new unit tests for findMatchingPattern, buildCustomRules, mergeConfig
- Document all config options with examples in AGENTS.md
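The pattern matching could be sketched as follows; the field names echo the notes above (pattern, taskType, extra rules) but the exact config schema is an assumption:

```typescript
interface TaskPattern {
  pattern: string;                     // regex matched against the user's query
  taskType?: "coding" | "research";    // optional task-type override
  extraRules?: string[];               // extra rules appended for matches
}

// First pattern whose regex matches the query wins.
function findMatchingPattern(
  query: string,
  patterns: TaskPattern[],
): TaskPattern | undefined {
  return patterns.find((p) => new RegExp(p.pattern, "i").test(query));
}
```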
#41)

* fix(telegram,tts): fix Whisper endpoint and switch to Coqui VCTK model

- Fix Telegram voice transcription: change endpoint from /transcribe-base64 to /transcribe
- Switch default TTS engine to Coqui with vctk_vits model (tts_models/en/vctk/vits)
- Set default speaker to p226 (clear, professional British male voice)
- Add vctk_vits model support to Coqui TTS scripts and server
- Update AGENTS.md documentation with new TTS configuration

* docs: add comprehensive TTS model documentation to README

- Document all 6 Coqui TTS models with descriptions
- Add configuration options table for each engine
- Recommend vctk_vits with p226 speaker as default
- Add Chatterbox and OS TTS configuration options

* docs: add comprehensive VCTK speaker list and XTTS voice cloning info

- List all 109 VCTK speakers with popular choices highlighted
- Add speaker descriptions (gender, accent, characteristics)
- Document XTTS v2 voice cloning with voiceRef option
- List XTTS supported languages
Add a new reflection-static.ts plugin that uses a simpler approach:
1. Ask the agent a static self-assessment question when session idles
2. Use GenAI judge to analyze the agent's response
3. If agent confirms completion → toast notification, no feedback loop
4. If agent identifies improvements → push to continue

Features:
- Simple self-assessment question: "What was the task? Are you sure you completed it?"
- GenAI-powered analysis of agent's self-assessment
- Prevents infinite feedback loops by tracking confirmed completions
- Tracks aborted sessions to skip reflection
- E2E test that verifies plugin effectiveness (scored 5/5)

New npm scripts:
- test:reflection-static: Run E2E evaluation test
- install:reflection-static: Deploy reflection-static instead of reflection.ts
- Add multiple abort detection layers (session.error, message.aborted)
- Add delay before reflection to allow abort events to arrive
- Check if last message was aborted/incomplete in runReflection
- Remove mock evaluation fallback - require real Azure LLM
- Use AZURE_OPENAI_DEPLOYMENT env var for eval model
- Change from Set to Map with timestamps for abort tracking
- Add 10 second cooldown period after Esc press
- Add type cast for error property to fix TypeScript error
- Separate completed check from error check for clearer debugging
- Match pattern from reflection.ts for consistent behavior
…ach (#43)

* feat: add reflection-static plugin with simpler self-assessment approach

* fix: prevent reflection spam on Esc abort, use real Azure eval

* fix: use override:true for dotenv to ensure correct Azure credentials

* fix: improve abort detection with cooldown-based tracking
- telegram.ts was incorrectly placed in lib/ subdirectory (not loaded as plugin)
- Fix: deploy telegram.ts directly to ~/.config/opencode/plugin/
- Fix isSessionComplete to check completed timestamp (same as tts.ts)
- Remove install:global, add individual install scripts per plugin
- Update plugin-load.test.ts for new deployment pattern
- Improve reflection-static.ts analysis prompt to be stricter about completion

Fixes telegram notifications not being sent since commit d10a8f5
Telegram plugin fixes:
- Changed plugin initialization to non-blocking (setTimeout instead of await)
- Fixed Whisper endpoint from /transcribe to /transcribe-base64

send-notify function fix:
- Fixed placeholder leak by using null bytes instead of underscores

Test consolidation:
- Deleted redundant test files (telegram-e2e-real.ts, telegram-forward-e2e.test.ts, test-telegram-whisper.ts)
- Consolidated 17 real integration tests in test/telegram.test.ts
- All tests use real Supabase (no mocks)

Documentation updates:
- Added warnings about pkill and deployment
- Updated AGENTS.md with test requirements
- Updated plan.md with status

All tests pass: typecheck (0 errors), unit (130), plugin-load (5)
- Posts agent messages to associated GitHub issues as comments
- Auto-detects issues from: URL in first message, .github-issue file,
  PR's closingIssuesReferences, branch name conventions
- Configurable via ~/.config/opencode/github.json
- Batches messages (5s interval) to avoid API rate limits
- Optional: create new issue if none found
- 18 unit tests for URL parsing, branch detection, message formatting
- Add github.ts to available plugins list
- Document all configuration options with table format
- Add .github-issue file format examples
- Add branch name pattern documentation
- Add debug logging instructions
- Update deployment instructions to include github.ts
- Update plan.md to mark all tasks complete
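The 5-second batching described for the GitHub plugin can be sketched as a simple drain; the queue shape and separator are illustrative:

```typescript
const BATCH_INTERVAL_MS = 5_000; // flush cadence to stay under API rate limits

// Join everything accumulated since the last flush into one comment body,
// draining the queue in place; a timer calling this every BATCH_INTERVAL_MS
// turns many small messages into one API call.
function batchMessages(queue: string[]): string | null {
  if (queue.length === 0) return null;
  const body = queue.join("\n\n---\n\n");
  queue.length = 0;
  return body;
}
```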
feat: GitHub issue plugin + telegram fixes
Added descriptions for new plugins and updated the README layout.
…t support (#46)

* fix(reflection-static): allow recursive reflection and reset completion on new messages

* fix(test): increase timeout for telegram send-notify test

* fix(reflection-static): use message ID tracking instead of counting to handle compression

* Fix install:global script and update github plugin config

* feat(reflection-static): fix Plan Mode detection and add custom reflection.md support

- Fix Plan Mode detection to check system/developer messages (not just user messages)
- Add support for custom reflection prompt via ./reflection.md file
- Falls back to default 4-question prompt if reflection.md not found
- Fixes issue where reflection triggered in Plan Mode and interrupted agent workflow
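The reflection.md fallback can be sketched like this; the default prompt is the one quoted elsewhere in this log, and the lookup path is an assumption:

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

const DEFAULT_PROMPT = "What was the task? Are you sure you completed it?";

// Prefer a project-local ./reflection.md; fall back to the built-in prompt
// when the file is missing or empty.
function loadReflectionPrompt(projectDir: string): string {
  const custom = path.join(projectDir, "reflection.md");
  try {
    const text = fs.readFileSync(custom, "utf8").trim();
    if (text) return text;
  } catch {
    // no reflection.md: fall through to the default prompt
  }
  return DEFAULT_PROMPT;
}
```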

---------

Co-authored-by: engineer <engineer@opencode.ai>
Extended timeouts for tests that make real HTTP requests to Supabase:
- stores text reply with correct session_id: 15s
- routes replies to correct session: 15s
- webhook handles malformed JSON: 10s
- webhook handles missing message field: 10s

These tests were occasionally failing due to network latency.
OpenCode's plugin loader treats ALL named exports as plugin functions.
The _test_internal object export caused 'fn3 is not a function' errors
because OpenCode tried to call the object as a plugin.

Changes:
- telegram.ts: Remove _test_internal named export (keeping only TelegramPlugin + default)
- test/telegram-internal.test.ts: Skip tests that depended on internal exports
- test/telegram.test.ts: Add 15s timeout to send-notify test
- test/plugin-load.test.ts: Increase server timeout to 60s, add debug logging

Root cause: OpenCode plugin loader at src/plugin/index.ts:89 iterates all
exports and calls them as functions. Non-function exports cause TypeError.

All 5 plugins now load successfully together.
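A sketch of the loader's contract that the `_test_internal` export violated — every named export must be callable, which this check mimics:

```typescript
// Mirrors the plugin loader's assumption: it iterates every named export and
// calls it as a plugin factory, so a non-function export raises a TypeError
// ("fn3 is not a function" in the observed failure).
function assertAllExportsCallable(mod: Record<string, unknown>): void {
  for (const [name, value] of Object.entries(mod)) {
    if (typeof value !== "function") {
      throw new TypeError(`${name} is not a function`);
    }
  }
}
```

Keeping only function exports (plus the default) in plugin entry files satisfies this contract.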
…lopment

- Replace 'opencode run' (single-shot) with 'opencode attach' (persistent TUI)
- Create session via API before launching TUI
- Send initial task to session via API
- Add worktree_attach tool for resuming work on existing worktree
- Enhance worktree_status with remote tracking info and active sessions
- Enhance worktree_delete with branch deletion option and uncommitted warnings
- Add configuration support via ~/.config/opencode/worktree.json
- Add isServerRunning() to check server health endpoint
- Add startServer() to spawn opencode serve in background
- Add ensureServer() wrapper that starts server if needed
- Save server PID to ~/.config/opencode/worktree-server.pid
- Update launchTerminal to be async and use ensureServer
- Show 'Started OpenCode server automatically' message when server was started
- Support serverPort config option (default: 4096)

This improves UX by not requiring users to manually start the server.
… analysis

Race condition: reflection sends self-assessment question, waits for response,
then analyzes with GenAI judge. During this time (could be 30+ seconds), human
may type a new message. Without this fix, reflection would still inject its
'Please continue...' prompt even though human already provided new instructions.

Fix: After GenAI analysis completes, re-fetch messages and compare the current
lastUserMsgId with the initial one captured at reflection start. If they differ,
abort the reflection to avoid injecting stale prompts.

This prevents confusing UX where reflection feedback appears after human already
moved on to a new task.
…evaluation

Apply the same race condition fix from reflection-static.ts to reflection.ts.

Problem: The GenAI judge evaluation can take 30+ seconds. During this time,
the human might type a new message. When the judge finishes, the plugin would
inject feedback for the OLD task, which is stale and confusing.

Solution: After waitForResponse() completes, re-fetch messages and compare
currentUserMsgId with initialUserMsgId. If they differ, abort the feedback
injection to avoid stale prompts.

- Capture initialUserMsgId at start of runReflection()
- After judge verdict is parsed, re-fetch messages
- If currentUserMsgId != initialUserMsgId, abort and mark original as reflected
- Add unit tests for the race condition scenarios
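The stale-prompt guard shared by both reflection plugins can be sketched as below; the message shape and id fields are assumptions:

```typescript
interface Msg { id: string; role: "user" | "assistant" }

function lastUserMsgId(messages: Msg[]): string | undefined {
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === "user") return messages[i].id;
  }
  return undefined;
}

// If the human sent a new message while the judge was thinking, the verdict
// refers to the old task and the feedback injection must be aborted.
function feedbackIsStale(
  initialUserMsgId: string | undefined,
  current: Msg[],
): boolean {
  return lastUserMsgId(current) !== initialUserMsgId;
}
```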
…re notifications

- Write verdict signals in reflection-static for coordination
- Telegram waits for verdict and skips reflection prompts
- TTS requires verdict before speaking and records missing-verdict metrics
- Load model list from ~/.config/opencode/reflection.yaml
- Try models in order and fall back on timeout/invalid JSON
- Document config format
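The try-in-order fallback could be sketched as follows; `tryModel` is a hypothetical hook that returns parsed judge output, returns null on invalid JSON, or throws on timeout:

```typescript
// Walk the configured model list and return the first usable verdict;
// timeouts and unparseable responses both fall through to the next model.
function firstWorkingVerdict(
  models: string[],
  tryModel: (model: string) => { verdict: string } | null,
): { model: string; verdict: string } | null {
  for (const model of models) {
    try {
      const v = tryModel(model);
      if (v) return { model, verdict: v.verdict };
    } catch {
      // timeout: try the next configured model
    }
  }
  return null;
}
```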
@dzianisv dzianisv force-pushed the feature/task-model-routing branch from 832d496 to e58b10b Compare February 11, 2026 05:49
@dzianisv dzianisv closed this Feb 11, 2026
@dzianisv dzianisv deleted the feature/task-model-routing branch February 11, 2026 07:21