fix: self-healing agent, tool timeout, message ordering #7

Open
Blankll wants to merge 11 commits into master from fix/agent-self-healing-and-timeout


Blankll (Member) commented Jun 20, 2026

Summary

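  • session_store: ORDER BY rowid ASC fixes a UUID ordering race between writes that land within the same 1ms
  • build_llm_messages: strip orphan tool_calls at the end of the message list; include tool messages with unparseable JSON as-is
  • loop: continue on empty prepared instead of exiting (a tool error no longer ends the loop)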
  • loop: prepare_for_llm cancelable via tokio::select with cancel_rx
  • loop: 400 'insufficient tool messages' auto-repair + retry up to 3x
  • loop: Phase 3 tool storage isolated (one failure doesn't cascade)
  • loop: tool execution wrapped in tokio::time::timeout(30s)
  • loop: tool retry up to 3 attempts with 2s/5s backoff
  • loop: retry events emitted for UI progress indicators
  • loop: runaway-loop guard threshold 3→5, progress-aware reset
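The tool retry policy in the summary (up to 3 attempts, waiting 2s and then 5s, with an event per retry for the UI) can be sketched as follows. This is a minimal synchronous illustration under stated assumptions, not the PR's code: `run_with_retry`, `RetryEvent`, `RETRY_DELAYS` and `MAX_ATTEMPTS` are hypothetical names, and the sleep and the event sink are injected as closures so the policy can be exercised without real waiting. The actual loop is async (tokio) and would await a timer instead.

```rust
use std::time::Duration;

// Hypothetical constants, values taken from the PR summary:
// 3 attempts, 2 s before the second and 5 s before the third.
const MAX_ATTEMPTS: usize = 3;
const RETRY_DELAYS: [Duration; 2] = [Duration::from_secs(2), Duration::from_secs(5)];

/// Hypothetical payload for the UI "retrying" indicator.
#[derive(Debug, Clone, PartialEq)]
pub struct RetryEvent {
    pub attempt: usize,  // attempt about to start (2 or 3)
    pub delay: Duration, // wait before that attempt
    pub last_error: String,
}

/// Calls `op(attempt)` until it succeeds or MAX_ATTEMPTS is reached.
/// `sleep` and `emit` are injected; an async loop would await a timer instead.
pub fn run_with_retry<T>(
    mut op: impl FnMut(usize) -> Result<T, String>,
    mut sleep: impl FnMut(Duration),
    mut emit: impl FnMut(RetryEvent),
) -> Result<T, String> {
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            Err(err) if attempt < MAX_ATTEMPTS => {
                let delay = RETRY_DELAYS[attempt - 1];
                attempt += 1;
                // Tell the UI before waiting, so the indicator shows during the delay.
                emit(RetryEvent { attempt, delay, last_error: err });
                sleep(delay);
            }
            // Out of attempts: surface the last error to the caller.
            Err(err) => return Err(err),
        }
    }
}
```

Injecting `sleep` keeps the backoff schedule testable: a test can record the requested delays instead of waiting 7 seconds.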

Blankll and others added 11 commits June 20, 2026 21:40
- session_store: ORDER BY rowid ASC fixes UUID race within 1ms writes
- build_llm_messages: strip orphan tool_calls at end of message list
- build_llm_messages: include unparseable tool messages as-is (no silent drops)
- loop: continue on empty prepared instead of returning (tool retry)
- loop: prepare_for_llm cancelable via tokio::select with cancel_rx
- loop: 400 'insufficient tool messages' auto-repair + retry up to 3x
- loop: Phase 3 tool storage isolated (one failure doesn't cascade)
- loop: tool execution wrapped in tokio::time::timeout(30s)
- loop: tool retry up to 3 attempts with 2s/5s backoff
- loop: retry events emitted for UI progress indicators
- loop: runaway-loop guard threshold 3→5, progress-aware reset
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
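A progress-aware runaway guard with the new threshold of 5 might look like the sketch below: the counter only grows on iterations that make no progress and resets as soon as one does. What counts as "progress" is not specified in the PR, so the sketch leaves that decision to the caller. `RunawayGuard` and `record` are hypothetical; only the value 5 and the name `MAX_RUNAWAY_ITERATIONS` come from the PR.

```rust
/// Threshold from the PR (raised from 3 to 5).
pub const MAX_RUNAWAY_ITERATIONS: u32 = 5;

/// Hypothetical guard: counts consecutive iterations without progress.
#[derive(Debug, Default)]
pub struct RunawayGuard {
    stalled: u32,
}

impl RunawayGuard {
    /// Records one loop iteration and returns true when the loop should stop.
    /// `made_progress` is caller-defined (an assumption of this sketch).
    pub fn record(&mut self, made_progress: bool) -> bool {
        if made_progress {
            // Progress-aware reset: earlier stalls no longer count.
            self.stalled = 0;
            false
        } else {
            self.stalled += 1;
            self.stalled >= MAX_RUNAWAY_ITERATIONS
        }
    }
}
```

With the reset, a long but productive run is never cut off; only 5 consecutive unproductive iterations trip the guard.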
Sets up cargo-llvm-cov for lcov generation on Linux, uploads to Codecov. Adds codecov.yml with 2% project / 5% patch thresholds and Rust flag mapping.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
…tfmt.toml)

The rustfmt.toml uses nightly-only options (imports_granularity, group_imports, trailing_comma, etc.). Stable cargo fmt ignores them and produces different formatting. Install both stable (clippy) and nightly (rustfmt) toolchains.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
dtolnay/rust-toolchain@nightly sets the default toolchain to nightly, causing 'cargo clippy' to fail. Use 'cargo +stable clippy' to stay on stable.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
The project's rustfmt.toml uses nightly-only options. Bulk-format the entire codebase with nightly rustfmt so CI (now using cargo +nightly fmt) passes.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Remove all nightly-only rustfmt options (imports_granularity, group_imports, trailing_comma, trailing_semicolon, struct_field_align_threshold, enum_discrim_align_threshold, format_macro_matchers, normalize_comments, wrap_comments, comment_width). CI now uses single stable toolchain for both clippy and fmt.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
cargo llvm-cov --output-path coverage/lcov.info fails when coverage/ doesn't exist. Add mkdir -p coverage before running it.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
The else if role == "assistant" branch was accidentally removed in bd34c5a, merging assistant-message handling into the tool block. This broke all tool-calling conversations: assistant messages with tool_calls were pushed as plain text (tool_calls not extracted, pending_tool_call_ids never populated) and tool responses were silently dropped.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
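To illustrate why the missing branch mattered, here is a hypothetical and heavily simplified build_llm_messages: the assistant branch records tool-call ids in a pending set, tool messages clear them, and ids still pending on the trailing assistant message are stripped as orphans. The `Row` and `Msg` types are assumptions standing in for the real session-store and LLM message types; JSON parsing of tool payloads and any other roles are omitted.

```rust
use std::collections::HashSet;

/// Hypothetical stored row, loosely modeled on a session-store record.
pub struct Row<'a> {
    pub role: &'a str,
    pub content: &'a str,
    pub tool_call_ids: Vec<&'a str>,   // assistant rows: calls it made
    pub tool_call_id: Option<&'a str>, // tool rows: the call answered
}

/// Hypothetical LLM-facing message.
#[derive(Debug, PartialEq)]
pub enum Msg {
    User(String),
    Assistant { text: String, tool_call_ids: Vec<String> },
    Tool { tool_call_id: Option<String>, content: String },
}

pub fn build_llm_messages(rows: &[Row<'_>]) -> Vec<Msg> {
    let mut out = Vec::new();
    let mut pending_tool_call_ids: HashSet<String> = HashSet::new();
    for row in rows {
        if row.role == "user" {
            out.push(Msg::User(row.content.to_string()));
        } else if row.role == "assistant" {
            // The branch the regression removed: extract tool calls and
            // remember them until a tool response arrives.
            let ids: Vec<String> = row.tool_call_ids.iter().map(|id| id.to_string()).collect();
            pending_tool_call_ids.extend(ids.iter().cloned());
            out.push(Msg::Assistant { text: row.content.to_string(), tool_call_ids: ids });
        } else if row.role == "tool" {
            if let Some(id) = row.tool_call_id {
                pending_tool_call_ids.remove(id);
            }
            // Kept even without an id rather than silently dropped.
            out.push(Msg::Tool {
                tool_call_id: row.tool_call_id.map(str::to_string),
                content: row.content.to_string(),
            });
        }
    }
    // Strip orphans: if the list ends with an assistant message (optionally
    // followed only by tool messages), drop its calls that never got a response.
    if let Some(pos) = out.iter().rposition(|m| matches!(m, Msg::Assistant { .. })) {
        if out[pos + 1..].iter().all(|m| matches!(m, Msg::Tool { .. })) {
            if let Msg::Assistant { tool_call_ids, .. } = &mut out[pos] {
                tool_call_ids.retain(|id| !pending_tool_call_ids.contains(id));
            }
        }
    }
    out
}
```

Without the assistant branch, `pending_tool_call_ids` is never populated and the assistant's tool calls never reach the output, which matches the breakage described in the commit.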
The parallel tool execution path called tool_executor.execute() directly with no timeout and no retry, while the sequential path had 30s timeout with 3 retry attempts. A hanging tool in a parallel batch would block indefinitely. Now both paths have consistent timeout and retry behavior, emitting agent-loop-tool-retry events for UI visibility.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
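The effect of a per-tool deadline in a parallel batch can be shown without an async runtime. This sketch deliberately swaps tokio::time::timeout for one thread per tool plus std::sync::mpsc::Receiver::recv_timeout against a single shared deadline, so a hanging tool yields TimedOut instead of blocking the whole batch. `run_batch`, `Tool` and `ToolOutcome` are hypothetical, and retries are left out to keep the timeout behavior in focus.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical per-tool result.
#[derive(Debug, PartialEq)]
pub enum ToolOutcome {
    Done(String),
    TimedOut,
    Failed, // the tool thread panicked before sending
}

/// Hypothetical tool: a boxed closure producing its output.
pub type Tool = Box<dyn FnOnce() -> String + Send>;

/// Starts every tool on its own thread, then collects results against one
/// shared deadline, so a hanging tool cannot block the rest of the batch.
pub fn run_batch(tools: Vec<Tool>, timeout: Duration) -> Vec<ToolOutcome> {
    let receivers: Vec<mpsc::Receiver<String>> = tools
        .into_iter()
        .map(|tool| {
            let (tx, rx) = mpsc::channel();
            thread::spawn(move || {
                // Ignore send errors: the receiver may have given up already.
                let _ = tx.send(tool());
            });
            rx
        })
        .collect();

    let deadline = Instant::now() + timeout;
    receivers
        .into_iter()
        .map(|rx| {
            let remaining = deadline.saturating_duration_since(Instant::now());
            match rx.recv_timeout(remaining) {
                Ok(output) => ToolOutcome::Done(output),
                Err(RecvTimeoutError::Timeout) => ToolOutcome::TimedOut,
                Err(RecvTimeoutError::Disconnected) => ToolOutcome::Failed,
            }
        })
        .collect()
}
```

One difference from the tokio version: a timed-out thread here keeps running detached, whereas tokio::time::timeout drops the future, cancelling it at its next await point.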
Three minor fixes from code review:

- Update stale runaway guard comment and message from 3 to 5 iterations to match MAX_RUNAWAY_ITERATIONS.
- Persist 400 'insufficient tool messages' errors via inline_append before retry, giving the UI visibility and recording the failure in DB history.
- Emit agent-loop-tool-storage-error event when Phase 3 storage fails, so the UI can detect lost tool results instead of only logging to stderr.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
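Isolating Phase 3 storage amounts to handling each result's error locally instead of propagating it with `?`. A minimal sketch, assuming a fallible per-result store function: `store_all` and `StorageErrorEvent` are hypothetical, the latter standing in for the agent-loop-tool-storage-error event.

```rust
/// Hypothetical stand-in for the agent-loop-tool-storage-error event.
#[derive(Debug, PartialEq)]
pub struct StorageErrorEvent {
    pub tool_call_id: String,
    pub error: String,
}

/// Stores each (tool_call_id, output) pair independently. A failure is logged
/// and emitted as an event instead of being propagated with `?`, so later
/// results are still stored. Returns how many were stored successfully.
pub fn store_all<S>(
    results: &[(String, String)],
    mut store: S,
    mut emit: impl FnMut(StorageErrorEvent),
) -> usize
where
    S: FnMut(&str, &str) -> Result<(), String>,
{
    let mut stored = 0;
    for (id, output) in results {
        match store(id.as_str(), output.as_str()) {
            Ok(()) => stored += 1,
            Err(error) => {
                // Previously only this stderr line existed; the event lets the UI react.
                eprintln!("tool storage failed for {id}: {error}");
                emit(StorageErrorEvent { tool_call_id: id.clone(), error });
            }
        }
    }
    stored
}
```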