test(openrouter): add comprehensive briefing-aggregator tests for Pha…
- Split /start and /help into separate messages
- /start: friendly welcome explaining 7 capabilities (Chat, Vision, Tools, Images, Reasoning, JSON, Briefing) with quick-start tips
- /help: full command reference with all 12 tools listed individually, grouped sections (Core, Costs, Briefing, Image Gen, Checkpoints, Models, Tools, Prefixes, Vision)
- Add TEST_PROTOCOL.md: 39-step manual test checklist covering basics, model switching, all tool types, vision, JSON mode, reasoning, image gen, briefing, bug regressions, and session management
- Update briefing-aggregator tests for new help message format

https://claude.ai/code/session_01NbL359VJGJE4Xsg5tTVR8u
feat(telegram): rewrite /help and /start, add manual test protocol
Model catalog cleanup:
- Remove mimo (xiaomi/mimo-v2-flash:free) — free period ended Jan 2026
- Remove llama405free — deprecated, not in OpenRouter free collection
- Remove nemofree (mistral-nemo:free) — no longer in free collection
- Fix opus cost: $15/$75 → $5/$25 (actual OpenRouter price)
- Fix qwenthink maxContext: 131072 → 262144

Checkpoint preview feature:
- Add getCheckpointConversation() to storage — reads messages from R2
- /save <name> now generates an AI summary of the conversation content using the /auto model, showing what was discussed and accomplished
- Falls back gracefully to metadata-only if the summary fails

Update TEST_PROTOCOL.md with checkpoint summary test (#35)

https://claude.ai/code/session_01NbL359VJGJE4Xsg5tTVR8u
fix(models): remove dead models, fix prices; feat(telegram): checkpoint summary
- Add xiaomi/mimo-v2-flash as paid model ($0.10/$0.30)
- Add /syncmodels command to fetch free models from OpenRouter API at runtime
- Dynamic models system: DYNAMIC_MODELS map with registerDynamicModels(), getAllModels(), getModel() that checks dynamic before static
- R2 persistence for synced models (survives redeploys)
- Auto-load dynamic models from R2 on handler init
- Update /help with /syncmodels documentation

https://claude.ai/code/session_01NbL359VJGJE4Xsg5tTVR8u
Rewrite /syncmodels from auto-add-all to an interactive Telegram inline keyboard picker:
- Fetches free models from OpenRouter API
- Shows new models (not in catalog) and stale models (no longer free) with context size, vision support, and model IDs
- Toggle buttons (☐/☑) to select which models to add/remove
- Validate button applies all selections at once
- Cancel button discards without changes

Supporting changes:
- Add blocked models mechanism (BLOCKED_ALIASES set in models.ts) so stale models can be hidden at runtime via getModel()/getAllModels()
- Add editMessageWithButtons to TelegramBot for updating message text + inline keyboard in a single API call
- Update storage.ts to persist blocked list alongside dynamic models
- Fix /pick button: mimo is now paid, not free

https://claude.ai/code/session_01NbL359VJGJE4Xsg5tTVR8u
Claude/test briefing aggregator c brl k
When a free model hits 429/503 rate limits during a DO task, the processor now automatically rotates to the next free tool-supporting model and continues from the same iteration. It cycles through all free models (qwencoderfree, pony, trinitymini, devstral, gptoss, phi4reason) before giving up.

Also fixes "No response generated" — when a model returns empty content after tool calls, the processor now nudges it up to 2 times with a follow-up message before accepting the empty result.

Changes:
- task-processor.ts: free model rotation on 429/503 errors, empty content retry with MAX_EMPTY_RETRIES=2, use task.modelAlias instead of request.modelAlias for rotation support
- models.ts: add getFreeToolModels() helper
- handler.ts: add /syncreset command to clean up stale auto-synced dynamic models from R2

https://claude.ai/code/session_01NbL359VJGJE4Xsg5tTVR8u
feat(task-processor): free model rotation + empty response retry
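The rotation described above can be sketched as follows. The aliases come from the commit message; the helper shape and names are assumptions, not the real task-processor.ts code:

```typescript
// Free tool-supporting models, in rotation order (from the commit message).
const FREE_TOOL_MODELS = [
  "qwencoderfree", "pony", "trinitymini", "devstral", "gptoss", "phi4reason",
];

// Given the alias that just failed with 429/503 and the set of aliases
// already tried, return the next free model to rotate to, or null once
// every free model has been exhausted.
function nextFreeModel(failed: string, tried: Set<string>): string | null {
  for (const alias of FREE_TOOL_MODELS) {
    if (alias !== failed && !tried.has(alias)) return alias;
  }
  return null; // full cycle exhausted — give up
}
```

Because the task keeps its iteration counter and conversation, a rotation is just a model swap plus a retry of the same iteration.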
Cloudflare Workers are stateless — the in-memory syncSessions Map was lost between requests, making all toggle buttons non-functional. Sync sessions are now stored in R2 (saveSyncSession/loadSyncSession/deleteSyncSession) so button callbacks work across Worker invocations. Also changed selectedAdd/selectedRemove from Set to string[] for JSON serialization compatibility.

https://claude.ai/code/session_01NbL359VJGJE4Xsg5tTVR8u
fix(telegram): persist sync sessions in R2 instead of in-memory Map
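The Set → string[] change matters because JSON.stringify serializes a Set as "{}", silently dropping the selection on the R2 round-trip, while arrays survive. A minimal illustration (the SyncSession shape here is a simplification of the real type):

```typescript
// Simplified session shape: arrays survive a JSON round-trip, Sets do not.
interface SyncSession {
  selectedAdd: string[];
  selectedRemove: string[];
}

// Serialize-then-parse, as happens when a session is written to and read
// back from R2 as JSON.
function roundTrip<T>(value: T): T {
  return JSON.parse(JSON.stringify(value));
}
```

This is why the commit swaps the field types rather than adding a custom serializer.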
Non-tool models (like auto-synced free models) were routed through the Worker's direct path, which has a 10s timeout. Slow models like DeepSeek R1 would silently time out with no response.

Changes:
- handler.ts: always route through the Durable Object when available, regardless of tool support. Worker fallback only when the DO is not configured.
- task-processor.ts: conditionally inject tools based on the model's supportsTools flag. Non-tool models go through the DO but without tool definitions — they get unlimited time, checkpointing, and auto-resume for free.

https://claude.ai/code/session_01NbL359VJGJE4Xsg5tTVR8u
The original deepseek/deepseek-r1:free endpoint was removed from
OpenRouter ("No endpoints found" error). Update to the newer
deepseek/deepseek-r1-0528:free which is still available.
https://claude.ai/code/session_01NbL359VJGJE4Xsg5tTVR8u
fix(models): update deepfree to deepseek-r1-0528 (old endpoint dead)
Free models cost nothing, so they get 50 auto-resume attempts instead of 10, letting complex tasks grind through rate limits and timeouts. Paid models keep the 10x limit to avoid burning credits on stuck tasks.

https://claude.ai/code/session_01NbL359VJGJE4Xsg5tTVR8u
feat(task-processor): dynamic auto-resume limits (50x free, 10x paid)
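The tiered policy reduces to one comparison. The limits are from the commit; the function name is hypothetical:

```typescript
// Auto-resume budgets (from the commit message).
const MAX_AUTO_RESUME_FREE = 50;
const MAX_AUTO_RESUME_PAID = 10;

// Decide whether a task may auto-resume again, given how many resumes it
// has already used and whether the current model is free.
function canAutoResume(resumeCount: number, isFreeModel: boolean): boolean {
  const limit = isFreeModel ? MAX_AUTO_RESUME_FREE : MAX_AUTO_RESUME_PAID;
  return resumeCount < limit;
}
```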
Add two new tools for code modification capabilities:

1. github_create_pr: creates a branch, commits file changes (create/update/delete), and opens a PR using the GitHub Git Data API. Supports up to 20 files, 1MB total. Auto-prefixes branches with bot/ to avoid conflicts. Full input validation (owner/repo format, path traversal, branch names, content size).
2. sandbox_exec: executes shell commands in a Cloudflare Sandbox container for complex refactors needing build/test. Runs commands sequentially with fail-fast behavior, configurable timeout (5-300s), and dangerous command blocking. Injects the GitHub token as env vars for git/gh CLI auth.

Also extends ToolContext with a SandboxLike interface, wires the sandbox through TelegramHandler, and updates the /help and /status commands. Adds 30 new tests covering validation, API mocking, error handling, and edge cases.

https://claude.ai/code/session_01E4joY3pFyYfTxVZegqe52P
feat(tools): add github_create_pr and sandbox_exec tools
Extract structured metadata (tools used, model, iterations, success/failure, category, duration) after each completed DO task and store it in R2. Before new tasks, inject relevant past patterns into the system prompt to improve future tool selection and execution strategy.

- New: src/openrouter/learnings.ts — extraction, storage, retrieval
- New: src/openrouter/learnings.test.ts — 36 tests
- Modified: task-processor.ts — learning extraction on completion/failure
- Modified: handler.ts — learning injection into system prompt

AI: Claude Opus 4.6 (Session: 018gmCDcuBJqs9ffrrDHHBBd)
https://claude.ai/code/session_018gmCDcuBJqs9ffrrDHHBBd
Add gap tests identified in the test protocol:
- categorizeTask: tie-breaking, duplicates, all-github-tools
- extractLearning: empty message, zero duration/iterations, auto-timestamp
- storeLearning: write error propagation, updatedAt, key format per user
- loadLearnings: R2 get() throw, key verification
- getRelevantLearnings: null history, category mismatch, no-bonus-without-base, short word filtering, case insensitivity, combined scoring, partial vs exact
- formatLearningsForPrompt: multi-tool display, leading newlines, duration boundaries (0s, 59999ms, 60000ms)

AI: Claude Opus 4.6 (Session: 018gmCDcuBJqs9ffrrDHHBBd)
https://claude.ai/code/session_018gmCDcuBJqs9ffrrDHHBBd
1. GLM supportsTools: add the missing flag so glmfree uses tools instead of hallucinating (models.ts)
2. 402 error handling: fail fast on quota exceeded, rotate to a free model if possible, show a helpful message (task-processor.ts)
3. Cross-task context: store the last task summary in R2 and inject it into the next task's system prompt (expires after 1h) to prevent "I haven't seen your website" amnesia (learnings.ts, handler.ts)
4. Elapsed time cap: 15min for free models, 30min for paid, prevents runaway auto-resume loops (task-processor.ts)
5. Tool-intent detection: warn users when a message needs tools but the model doesn't support them, suggest alternatives (models.ts, handler.ts)
6. Parallel tool-call prompt: stronger instruction for models with the parallelCalls flag to batch tool calls (handler.ts)

Tests: 447 total (33 new — 22 models, 11 learnings)
https://claude.ai/code/session_018gmCDcuBJqs9ffrrDHHBBd
Claude/extract task metadata 8l mcm
The auto-resume counter was persisting across different tasks because processTask() inherited autoResumeCount from any previous task in DO storage. It now only inherits when resuming the SAME task (matching taskId).

Reverted supportsTools on glmfree — live testing confirmed GLM 4.5 Air free tier doesn't generate tool_calls (answers from training data with 0 unique tools). Paid GLM 4.7 still has tools enabled.

https://claude.ai/code/session_018gmCDcuBJqs9ffrrDHHBBd
Claude/extract task metadata 8l mcm
Includes the complete system prompt reflecting all 14 tools, tool usage guidelines, and response style for Telegram. README explains R2 bucket structure and upload instructions. https://claude.ai/code/session_018gmCDcuBJqs9ffrrDHHBBd
docs(r2): add storia-orchestrator skill prompt for R2 bucket
- /start now shows an inline keyboard with 8 feature categories (Coding, Research, Images, Tools, Vision, Reasoning, Pick Model, All Commands)
- Each button sends a detailed guide for that feature with actionable examples and model recommendations
- Back to Menu and Pick Model buttons for navigation
- Added setMyCommands to the TelegramBot class and registered 12 commands during /setup so Telegram shows the correct command menu
- Enhanced the R2 skill prompt with Storia identity, model recommendations, stronger tool-first behavior, and better response style guidelines

https://claude.ai/code/session_018gmCDcuBJqs9ffrrDHHBBd
- GLOBAL_ROADMAP: add Model Sync section (MS.1-6), 2 changelog entries, update project overview
- WORK_STATUS: add MS.1-6 tasks, update test count (1227), sprint velocity (57 tasks), strikethrough completed priorities
- next_prompt: add MS.1-6 to Recently Completed, update timestamp

https://claude.ai/code/session_01V82ZPEL4WPcLtvGC6szgt5
Route simple queries (weather, greetings, crypto) to GPT-4o Mini for lower latency when the user is on the default 'auto' model. Explicit model choices via /use are never overridden.

- routeByComplexity() in src/openrouter/model-router.ts
- FAST_MODEL_CANDIDATES: mini > flash > haiku (ordered by cost)
- autoRoute user preference (default: true, toggle via /autoroute)
- Logging: [ModelRouter] on every routing decision
- /status shows auto-route state
- 15 new tests (1242 total)

https://claude.ai/code/session_01V82ZPEL4WPcLtvGC6szgt5
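The routing rule above can be sketched in a few lines. The patterns, the "mini" alias, and the function body are illustrative assumptions, not the real model-router.ts implementation:

```typescript
// Heuristics for "simple" messages (greetings, weather, price checks).
const SIMPLE_PATTERNS = [/^(hi|hello|hey)\b/i, /\bweather\b/i, /\bprice of\b/i];

// Only the default 'auto' model is ever rerouted; an explicit /use choice
// always wins. Simple messages go to the fast candidate ("mini" here).
function routeByComplexity(message: string, currentAlias: string): string {
  if (currentAlias !== "auto") return currentAlias; // never override /use
  const simple = SIMPLE_PATTERNS.some((re) => re.test(message));
  return simple ? "mini" : currentAlias;
}
```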
- GLOBAL_ROADMAP: mark 7B.2 complete, add changelog entry, update dependency graph
- WORK_STATUS: add 7B.2 task, update test count (1242), sprint velocity (58)
- next_prompt: advance to 7B.3 Pre-fetching Context as next task

https://claude.ai/code/session_01V82ZPEL4WPcLtvGC6szgt5
Claude/execute next prompt psd ex
Extract file paths from user messages and pre-fetch them from GitHub in parallel with the first LLM call. When the model calls github_read_file, the content is already in the prefetch cache.

- extractFilePaths() regex extraction with false-positive filtering
- extractGitHubContext() finds owner/repo from the system prompt or message
- startFilePrefetch() in task-processor fires GitHub reads in parallel
- Prefetch cache checked in executeToolWithCache() for github_read_file
- Export githubReadFile from tools.ts for direct pre-fetch use
- 31 new tests (1273 total)

https://claude.ai/code/session_01V82ZPEL4WPcLtvGC6szgt5
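A hedged sketch of the extractFilePaths() idea: grab tokens that look like repo file paths and filter obvious false positives such as URLs and bare domains. The regex and filters here are assumptions; the real implementation may differ:

```typescript
// Pull path-like tokens out of a user message, then drop anything that is
// a URL fragment or a bare domain rather than a repo-relative path.
function extractFilePaths(message: string): string[] {
  const candidates = message.match(/[\w./-]+\.[a-z]{1,4}\b/gi) ?? [];
  return candidates.filter(
    (p) => p.includes("/") && !p.startsWith("http") && !/\.(com|org|net)$/i.test(p),
  );
}
```

Each extracted path can then be handed to the GitHub read tool before the first LLM call returns.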
- GLOBAL_ROADMAP: mark 7B.3 complete, add changelog entry, update dependency graph
- WORK_STATUS: add 7B.3 task, update test count (1273), sprint velocity (59)
- next_prompt: advance to 7A.4 Structured Step Decomposition as next task

https://claude.ai/code/session_01V82ZPEL4WPcLtvGC6szgt5
Claude/execute next prompt psd ex
Replace free-form plan phase prompt with STRUCTURED_PLAN_PROMPT that requests
JSON {steps: [{action, files, description}]}. parseStructuredPlan() uses 3-tier
parsing: code block → raw JSON → free-form file extraction fallback.
prefetchPlanFiles() pre-loads all referenced files at plan→work transition,
merging into existing prefetch cache. 26 new tests (1299 total).
https://claude.ai/code/session_01V82ZPEL4WPcLtvGC6szgt5
feat(quality): 7A.4 Structured Step Decomposition — JSON plan steps with file pre-loading
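The 3-tier parse order (fenced JSON block → raw JSON → free-form file extraction) can be sketched as below. The PlanStep shape and fallback details are assumptions, not the real step-decomposition module:

```typescript
// Step shape requested by STRUCTURED_PLAN_PROMPT (per the commit message).
interface PlanStep { action: string; files: string[]; description: string; }

function parseStructuredPlan(text: string): PlanStep[] {
  // Tier 1: JSON inside a fenced code block; Tier 2: the raw text itself.
  const fenced = text.match(/```(?:json)?\s*([\s\S]*?)```/);
  for (const candidate of [fenced?.[1], text]) {
    if (!candidate) continue;
    try {
      const parsed = JSON.parse(candidate);
      if (Array.isArray(parsed?.steps)) return parsed.steps;
    } catch { /* fall through to the next tier */ }
  }
  // Tier 3: free-form fallback — one catch-all step with extracted paths.
  const files = text.match(/[\w./-]+\/[\w.-]+\.\w+/g) ?? [];
  return [{ action: "work", files, description: text.slice(0, 100) }];
}
```

The fallback guarantees the plan→work transition always has something to pre-fetch, even when the model ignores the JSON format.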
After the plan→work transition, awaits all prefetch promises and injects [FILE: path] blocks directly into the conversation context. The model sees files already loaded and skips github_read_file calls, reducing typical multi-file tasks from ~8 iterations to 3-4.

- awaitAndFormatPrefetchedFiles() in step-decomposition.ts
- Binary detection, 8KB/file truncation, 50KB total cap
- Also injects user-message prefetch files (7B.3 fallback path)
- 13 new tests (1312 total), typecheck clean

https://claude.ai/code/session_01V82ZPEL4WPcLtvGC6szgt5
feat(perf): 7B.4 Reduce Iteration Count — inject pre-loaded files into context
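The caps (8KB per file, 50KB total) and the [FILE: path] framing can be sketched as follows; the exact block format and function name are assumptions:

```typescript
// Caps from the commit message.
const PER_FILE_CAP = 8 * 1024;   // 8KB per file
const TOTAL_CAP = 50 * 1024;     // 50KB across all injected files

// Render prefetched files as [FILE: path] blocks, truncating oversized
// files and stopping once the total budget is spent.
function formatPrefetchedFiles(files: Map<string, string>): string {
  let out = "";
  for (const [path, content] of files) {
    const body = content.length > PER_FILE_CAP
      ? content.slice(0, PER_FILE_CAP) + "\n…(truncated)"
      : content;
    const block = `[FILE: ${path}]\n${body}\n[END FILE]\n`;
    if (out.length + block.length > TOTAL_CAP) break; // respect total cap
    out += block;
  }
  return out;
}
```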
At the work→review transition, scans tool results for unacknowledged mutation errors, test failures, missing PRs, and unverified claims. If failures are found, injects the details and gives the model one retry iteration before proceeding to the review phase.

- shouldVerify() + verifyWorkPhase() in cove-verification.ts
- Smart "0 failed" exclusion to avoid false positives
- coveRetried flag limits to a single retry
- 24 new tests (1336 total), typecheck clean

https://claude.ai/code/session_01V82ZPEL4WPcLtvGC6szgt5
feat(quality): 7A.1 CoVe Verification Loop — post-work verification with retry
- github_create_pr description now explains the read-modify-write update workflow (read with github_read_file → modify → pass COMPLETE content with action "update")
- github_read_file description mentions the 50KB limit
- LARGE_FILE_THRESHOLD raised: 300→500 lines, 15→30KB (tools support 50KB; the previous thresholds were overly conservative for modern models)
- Orchestra run prompt gets a "How to Update Existing Files" section
- Orchestra run prompt gets a "Step 4.5: HANDLE PARTIAL FAILURES" section for logging blocked/partial tasks in WORK_LOG.md and ROADMAP.md
- Orchestra redo prompt gets the matching update workflow + failure handling
- 12 new tests (1348 total), typecheck clean

Fixes issues observed in real bot conversations where models incorrectly claimed they couldn't edit existing files or silently gave up on large files.

https://claude.ai/code/session_01V82ZPEL4WPcLtvGC6szgt5
fix(orchestra+tools): improve tool descriptions + add partial failure handling
Replace generic "Thinking..." with rich real-time progress updates in Telegram:
- formatProgressMessage() builds phase-aware strings with emoji labels:
📋 Planning, 🔨 Working, 🔍 Reviewing, 🔄 Verifying
- humanizeToolName() maps 16 tool names to readable labels
("github_read_file" → "Reading", "sandbox_exec" → "Running commands")
- extractToolContext() extracts display info from tool args
(file paths, URLs, commands, PR titles, search queries)
- estimateCurrentStep() shows plan step progress (step 2/5: Add JWT)
- shouldSendUpdate() throttle gate (15s interval)
- sendProgressUpdate() helper wired into task-processor iteration loop
- Both parallel and sequential tool execution paths update progress
- 44 new tests (1392 total), typecheck clean
Example progress messages:
⏳ 🔨 Reading: src/App.tsx (12s)
⏳ 🔨 Working (step 2/5: Add JWT validation) (iter 4, 6 tools, 35s)
⏳ 🔨 Running commands: npm test (48s)
⏳ 🔄 Verifying results… (1m30s)
https://claude.ai/code/session_01V82ZPEL4WPcLtvGC6szgt5
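The phase labels and the 15s throttle above reduce to two small helpers. Emoji and labels mirror the examples in the commit; the function shapes are assumptions:

```typescript
// Phase → emoji label, matching the examples above.
const PHASE_LABELS: Record<string, string> = {
  plan: "📋 Planning", work: "🔨 Working",
  review: "🔍 Reviewing", verify: "🔄 Verifying",
};

// Build a progress line like "⏳ 🔨 Working (12s)" or "⏳ 🔄 Verifying (1m30s)".
function formatProgressMessage(phase: string, elapsedMs: number): string {
  const secs = Math.floor(elapsedMs / 1000);
  const time = secs < 60 ? `${secs}s` : `${Math.floor(secs / 60)}m${secs % 60}s`;
  return `⏳ ${PHASE_LABELS[phase] ?? phase} (${time})`;
}

// Throttle gate: send at most one message edit per 15s.
function shouldSendUpdate(lastSentMs: number, nowMs: number): boolean {
  return nowMs - lastSentMs >= 15_000;
}
```

The real formatter also appends tool context, step progress, and counters; this sketch only shows the core shape.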
Add an onToolCallReady callback to parseSSEStream that fires when a tool_call is complete during SSE streaming. createSpeculativeExecutor() starts PARALLEL_SAFE tools immediately while the model continues generating. The task-processor checks the speculative cache before executing, reusing pre-computed results and saving 2-10s per multi-tool iteration.

Detection: fires on a new tool_call index (previous done) and on finish_reason='tool_calls' (all done). Safety: only PARALLEL_SAFE_TOOLS, max 5 speculative, 30s timeout. 19 new tests (1411 total).

All of Phase 7 (Performance & Quality Engine) is now complete — 10/10 tasks.

https://claude.ai/code/session_01V82ZPEL4WPcLtvGC6szgt5
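The completion-detection rule (a tool_call at index i is done when a delta for index i+1 arrives, or when the stream ends with finish_reason 'tool_calls') can be sketched over simplified delta events. The Delta shape here is an assumption, not the real SSE types:

```typescript
// Simplified view of streaming deltas: each either advances a tool_call
// index or carries the final finish_reason.
interface Delta { toolCallIndex?: number; finishReason?: string; }

// Yield tool_call indices in the order they become complete: a new index
// completes the previous call; finish_reason 'tool_calls' completes the last.
function* completedToolCalls(deltas: Delta[]): Generator<number> {
  let current = -1;
  for (const d of deltas) {
    if (d.toolCallIndex !== undefined && d.toolCallIndex !== current) {
      if (current >= 0) yield current; // previous call is now complete
      current = d.toolCallIndex;
    }
    if (d.finishReason === "tool_calls" && current >= 0) {
      yield current; // final call completes with the stream
      current = -1;
    }
  }
}
```

In the real executor, each yielded index would immediately start the corresponding PARALLEL_SAFE tool while generation continues.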
Claude/execute next prompt psd ex
Routes the review phase to a different model than the worker for independent verification. A "fresh pair of eyes" catches hallucinated claims, incomplete answers, and unacknowledged tool errors that self-review misses.

- New reviewer.ts: model selection (cross-family), context building, response parsing (approve/revise)
- Reviewer candidates: Sonnet > Grok > Gemini Pro > Mini > Flash
- Eligibility: mutation tools, 3+ tool calls, or 3+ iterations
- Falls back to same-model review when no reviewer is available or the call fails
- Progress shows the reviewer model: "⏳ 🔍 Reviewing (sonnet)…"
- Attribution footer: "🔍 Reviewed by Claude Sonnet 4.5"
- 47 new tests (1458 total), typecheck clean

https://claude.ai/code/session_01V82ZPEL4WPcLtvGC6szgt5
Claude/execute next prompt psd ex
Compares curated models against the live OpenRouter catalog to detect:
- Models removed from OpenRouter (deprecated upstream)
- Pricing changes for existing curated models
- New models from tracked families (anthropic, google, openai, etc.)

Available via Telegram /synccheck and GET /api/admin/models/check.

https://claude.ai/code/session_01K2mQTABDGY7DnnposPdDjw
After /syncall, auto-synced models get hyphenated aliases like
"claude-sonnet-46" but users try "sonnet46" or "claudesonnet".
getModel() only did exact key lookups, so these all failed.
Added fuzzy fallback with 4 passes:
1. Normalized exact (strip hyphens/dots)
2. Suffix match ("sonnet46" → "claude-sonnet-46")
3. Prefix match ("claudesonnet" → "claude-sonnet-46")
4. Model ID match ("gpt4o" → openai/gpt-4o)
Also stores canonical alias in /use handler so subsequent lookups
are always exact matches.
https://claude.ai/code/session_01K2mQTABDGY7DnnposPdDjw
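The 4-pass fallback above can be sketched over a toy catalog. The normalize rule (strip hyphens and dots) and the pass order come from the commit; the catalog contents and function shape are illustrative:

```typescript
// Toy alias → model ID catalog for illustration only.
const CATALOG: Record<string, string> = {
  "claude-sonnet-46": "anthropic/claude-sonnet-4.6",
  "gpt4o": "openai/gpt-4o",
};

const normalize = (s: string) => s.toLowerCase().replace(/[-.]/g, "");

// Resolve a user-typed alias via the four passes described above.
function getModelAlias(query: string): string | null {
  const q = normalize(query);
  const aliases = Object.keys(CATALOG);
  return (
    aliases.find((a) => normalize(a) === q) ??              // 1. normalized exact
    aliases.find((a) => normalize(a).endsWith(q)) ??        // 2. suffix match
    aliases.find((a) => normalize(a).startsWith(q)) ??      // 3. prefix match
    aliases.find((a) => normalize(CATALOG[a]).includes(q))  // 4. model ID match
    ?? null
  );
}
```

Storing the resolved canonical alias back into the user's settings (as /use now does) means the fuzzy path only runs once per new spelling.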
/models changes:
- Add an "AUTO-SYNCED HIGHLIGHTS" section showing the top 2 flagship models per major provider (Anthropic, Google, OpenAI, etc.)
- Filter value tier sections to curated-only (prevents 300+ models flooding the listing)
- Sonnet 4.6, Opus 4.1, etc. are now visible in /models

/synccheck changes:
- Group models by family with a count, show the top 4 per family (flagship first)
- Collapse older/variant models into a "+N older/variant" summary
- Show the auto-sync alias (→ /claude-sonnet-46) for each model
- Add a note that models are usable via /use after /syncall

https://claude.ai/code/session_01K2mQTABDGY7DnnposPdDjw
- Bump openclaw 2026.2.3 → 2026.2.6-3 in the Dockerfile (upstream PR cloudflare#204)
- Add redactWsPayload() to sanitize sensitive fields (api_key, token, auth, etc.) from WebSocket debug logs (upstream PR cloudflare#206)
- Add a container-level lock file to prevent concurrent R2 sync operations, with 5-min stale lock cleanup (upstream PRs cloudflare#199, cloudflare#202)
- Add logging.test.ts for redaction utilities

https://claude.ai/code/session_01K2mQTABDGY7DnnposPdDjw
MiniMax API rejects requests with reasoning disabled — error: "Reasoning is mandatory for this endpoint and cannot be disabled." Change from 'configurable' to 'fixed' so getReasoningParam() returns undefined (no reasoning param sent), letting MiniMax handle it natively. https://claude.ai/code/session_01K2mQTABDGY7DnnposPdDjw
Models (especially Grok) can claim "PR #3 created successfully" when github_create_pr actually failed with guardrail violations. This adds three layers of protection:

- Fix 2: tag github_create_pr errors with an unmistakable ❌ PR NOT CREATED banner + a "Do NOT claim a PR was created" instruction in the tool result.
- Fix 3: validateOrchestraResult() cross-references the parsed ORCHESTRA_RESULT against all tool outputs — if failure patterns are found (Destructive update blocked, INCOMPLETE REFACTOR, DATA FABRICATION, etc.) with no matching success evidence, it flags a phantom PR and clears the URL.
- Fix 1: post-execution PR verification via the GitHub API — after all parsing, if a PR URL survives, verify it actually exists (GET /repos/.../pulls/N). Non-fatal on network errors, but catches any edge case the other layers miss.

https://claude.ai/code/session_01K2mQTABDGY7DnnposPdDjw
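The Fix 3 cross-reference can be sketched as a pattern check: a claimed PR URL is only trusted when no tool output shows a blocking failure without matching success evidence. The pattern strings come from the commit; the success heuristic and function shape are assumptions:

```typescript
// Failure banners that indicate github_create_pr did not actually succeed.
const FAILURE_PATTERNS = [
  "PR NOT CREATED", "Destructive update blocked",
  "INCOMPLETE REFACTOR", "DATA FABRICATION",
];

// Return the PR URL only if tool outputs support it; clear phantom claims.
function validatePrClaim(prUrl: string | null, toolOutputs: string[]): string | null {
  if (!prUrl) return null;
  const failed = toolOutputs.some((o) => FAILURE_PATTERNS.some((p) => o.includes(p)));
  const succeeded = toolOutputs.some((o) => o.includes("pull/")); // success evidence
  return failed && !succeeded ? null : prUrl; // phantom PR → clear the URL
}
```

Fix 1 then re-verifies any surviving URL against the GitHub API, so a false positive here is still caught downstream.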
5 representative tasks testing each 7B optimization:
- Task A: Simple chat → 7B.2 model routing (< 5s, fast model)
- Task B: Multi-tool → 7B.1 speculative execution (< 20s, 2 tools/1 iter)
- Task C: GitHub read → 7B.3+7B.4 prefetch+injection (< 30s, ≤ 3 iter)
- Task D: Orchestra → all optimizations end-to-end (< 3min, ≤ 15 iter)
- Task E: Reasoning → 7B.5 streaming feedback (first update < 3s)

Includes pass/conditional/fail criteria and comparison notes.

https://claude.ai/code/session_01K2mQTABDGY7DnnposPdDjw
Two bugs found during the Phase 7B.6 benchmark:

1. extractUserQuestion() iterated forward and returned the FIRST user message. In multi-turn conversations the reviewer evaluated the assistant's answer against the wrong question (e.g. "capital of France" instead of "read README.md and summarize"). Fixed by iterating backwards. Also skips 7B.4 file-injection blocks.
2. Model routing used classifyTaskComplexity(msg, conversationLength), which gates on conversationLength >= 3 → 'complex', preventing simple messages from routing to fast models in longer conversations. Fixed by passing conversationLength=0 for routing decisions so only message content determines complexity.

https://claude.ai/code/session_01K2mQTABDGY7DnnposPdDjw
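The bug 1 fix can be sketched as a backwards walk that skips injected file blocks. The message shape and the "[FILE:" skip heuristic are simplifications of the real types:

```typescript
// Simplified conversation message.
interface Msg { role: "user" | "assistant" | "system"; content: string; }

// Walk backwards so the reviewer sees the LATEST user question, skipping
// 7B.4 file-injection blocks that also arrive with role "user".
function extractUserQuestion(messages: Msg[]): string | null {
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    if (m.role === "user" && !m.content.startsWith("[FILE:")) return m.content;
  }
  return null;
}
```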
Bug 1: startTime reset on every auto-resume — each processTask() call
created a new TaskState with startTime=Date.now(), so the elapsed time
cap (15min free / 30min paid) never triggered across resumes. Fix:
preserve startTime from the original task when resuming.
Bug 2: elapsed time cap only checked when task appears stuck — the
alarm handler returned early ("still active") before reaching the
elapsed check. Fix: move elapsed check before the "still active"
early return so it fires regardless of task activity.
Bug 3: no total tool call limit — a model could make unlimited tool
calls across its lifetime. Fix: add MAX_TOTAL_TOOLS_FREE=50 and
MAX_TOTAL_TOOLS_PAID=100 with a nudge message when exceeded.
Also adds defense-in-depth elapsed check in the main processTask loop.
These bugs caused a 2-file GitHub read to take 46 minutes with 8
auto-resumes and 29 tool calls instead of stopping at the time cap.
https://claude.ai/code/session_01K2mQTABDGY7DnnposPdDjw
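The Bug 1 fix reduces to one rule: reuse the original startTime (and counter) only when the incoming taskId matches the stored task. The TaskState shape here is a simplification:

```typescript
// Simplified task state persisted in DO storage.
interface TaskState { taskId: string; startTime: number; autoResumeCount: number; }

// Build the state for a processTask() call: same task resuming keeps the
// original clock so the elapsed cap can trigger; a new task gets a fresh one.
function makeTaskState(taskId: string, prev: TaskState | null, now: number): TaskState {
  if (prev && prev.taskId === taskId) {
    return { taskId, startTime: prev.startTime, autoResumeCount: prev.autoResumeCount + 1 };
  }
  return { taskId, startTime: now, autoResumeCount: 0 }; // different or first task
}
```

With the preserved startTime, the 15/30-minute elapsed checks described above finally see the true task age instead of the age of the latest resume.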