Skip to content

fix(agent): replay reasoning_content across native tool-call turns to prevent DeepSeek 400s#2918

Merged
oxoxDev merged 4 commits into
tinyhumansai:mainfrom
YellowSnnowmann:fix/reasoning-content-replay
May 29, 2026
Merged

fix(agent): replay reasoning_content across native tool-call turns to prevent DeepSeek 400s#2918
oxoxDev merged 4 commits into
tinyhumansai:mainfrom
YellowSnnowmann:fix/reasoning-content-replay

Conversation

@YellowSnnowmann
Copy link
Copy Markdown
Contributor

@YellowSnnowmann YellowSnnowmann commented May 29, 2026

Summary

  • Persisted assistant reasoning_content on ConversationMessage::AssistantToolCalls so thinking-mode context survives tool-call turns.
  • Updated native dispatcher serialization to include reasoning_content in assistant tool-call payloads sent back to providers.
  • Patched agent session turn flow to capture and store non-empty reasoning_content when persisting assistant tool-call intents.
  • Added a targeted regression test to verify reasoning_content is preserved in native tool-call history replay.
  • Updated affected test fixtures/constructors for the new optional field to keep behavior backward-compatible.

Problem

  • Sentry issue TAURI-RUST-4K9 reports repeated DeepSeek 400 failures: The reasoning_content in the thinking mode must be passed back to the API.
  • In multi-turn/native tool-call flows, reasoning output could be dropped before follow-up provider calls, causing request rejection.
  • This resulted in high error volume and degraded reliability for thinking-capable providers.

Solution

  • Extended ConversationMessage::AssistantToolCalls with optional reasoning_content (serde-defaulted for compatibility with existing persisted transcripts).
  • In agent turn execution, when assistant returns tool calls, capture trimmed non-empty response.reasoning_content and persist it alongside tool-call history.
  • In NativeToolDispatcher::to_provider_messages, include reasoning_content in serialized assistant tool-call payload JSON so provider replay contract is satisfied.
  • Preserved existing TAURI-RUST-7 pairing safeguards (unpaired tool-call cycles are still dropped), and adapted regression testing to valid paired cycles.
  • Tradeoff: slightly larger serialized assistant tool-call payloads, but only when thinking content exists.

Submission Checklist

If a section does not apply to this change, mark the item as N/A with a one-line reason. Do not delete items.

  • Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
  • Diff coverage ≥ 80% — changed lines (Vitest + cargo-llvm-cov merged via diff-cover) meet the gate enforced by .github/workflows/coverage.yml. Run pnpm test:coverage and pnpm test:rust locally; PRs below 80% on changed lines will not merge.
  • Coverage matrix updated — added/removed/renamed feature rows in docs/TEST-COVERAGE-MATRIX.md reflect this change (or N/A: behaviour-only change)
  • All affected feature IDs from the matrix are listed in the PR description under ## Related
  • No new external network dependencies introduced (mock backend used per Testing Strategy)
  • Manual smoke checklist updated if this touches release-cut surfaces (docs/RELEASE-MANUAL-SMOKE.md)
  • Linked issue closed via Closes #NNN in the ## Related section

Impact

  • Runtime/platform: Rust core + Tauri desktop inference path; affects native tool-calling providers in multi-turn sessions.
  • Compatibility: backward-compatible transcript/history schema change (reasoning_content is optional + serde default).
  • Performance: negligible overhead (small optional metadata field/payload extension).
  • Security: no new permissions, secrets, or network surfaces introduced.

Related

Summary by CodeRabbit

  • New Features

    • Conversation messages now optionally capture and propagate model "reasoning content" alongside tool-call results; this is included in outgoing payloads when present and omitted when absent.
    • Token estimation and transcript/history reconstruction include reasoning content when available.
  • Testing

    • Added and updated tests to verify serialization, preservation in history, omission when absent, and related handling.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 29, 2026

📝 Walkthrough

Walkthrough

Adds an optional reasoning_content: Option to AssistantToolCalls, captures it from provider responses, persists it in turns, serializes it into dispatcher-emitted provider JSON when present, updates token accounting, and adjusts tests/fixtures and pattern matches for compatibility.

Changes

Reasoning Content Propagation

Layer / File(s) Summary
Schema field definition
src/openhuman/inference/provider/traits.rs
ConversationMessage::AssistantToolCalls now includes optional reasoning_content: Option<String> with Serde attributes to skip when None and default when absent.
Response capture and turn persistence
src/openhuman/agent/harness/session/turn.rs
Extracts response.reasoning_content, trims it, and persists it on AssistantToolCalls when non-empty.
Provider serialization and regression tests
src/openhuman/agent/dispatcher.rs, src/openhuman/agent/dispatcher_tests.rs
NativeToolDispatcher::to_provider_messages serializes reasoning_content into the assistant message JSON payload. Adds tests that assert presence when set and omission when None; updates test helpers to set reasoning_content: None where appropriate.
Token budget accounting
src/openhuman/agent/harness/token_budget.rs
Token estimation for AssistantToolCalls now includes reasoning_content when present; corresponding test fixture updated.
Pattern matching and fixture compatibility
src/openhuman/agent/tests.rs, src/openhuman/agent/harness/session/runtime_tests.rs, src/openhuman/agent/harness/session/turn_tests.rs, src/openhuman/context/*, src/openhuman/context/summarizer.rs, src/openhuman/context/segment_recap_summarizer.rs
Updated fixtures to set reasoning_content: None where appropriate and adjusted match patterns to use .. or include the new field to keep existing logic unchanged.

Sequence Diagram

sequenceDiagram
  participant Provider
  participant Turn
  participant History
  participant Dispatcher
  Provider->>Turn: response with reasoning_content
  Turn->>History: construct AssistantToolCalls with reasoning_content
  History->>Dispatcher: serialize message for provider
  Dispatcher->>Provider: send JSON with reasoning_content field
Loading

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly Related PRs

Suggested Reviewers

  • M3gA-Mind

Poem

🐰 I nibble through the message stream,
Tuck the model's thinking in a seam,
Saved and sent when it is found,
Quiet reasoning, wrapped and sound. ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and specifically describes the main change: persisting and replaying reasoning_content across native tool-call turns to fix a DeepSeek provider issue.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@YellowSnnowmann YellowSnnowmann marked this pull request as ready for review May 29, 2026 07:57
@YellowSnnowmann YellowSnnowmann requested a review from a team May 29, 2026 07:57
@coderabbitai coderabbitai Bot added rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. agent Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/. bug labels May 29, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/openhuman/agent/harness/token_budget.rs`:
- Around line 36-38: The AssistantToolCalls arm in token_budget.rs currently
ignores the new reasoning_content field; update the pattern matching for
ConversationMessage::AssistantToolCalls to capture reasoning_content and include
its token count when estimating tokens for that message in
trim_conversation_history_to_budget by calling
model_token_estimator.tokens_for_text(&reasoning_content) (or the project's
equivalent tokens_for_text method) and adding it to the existing text/tool_calls
token sum so large reasoning payloads are counted in the budget.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e24d24cf-19fb-40d4-8b6c-d8892c85a83c

📥 Commits

Reviewing files that changed from the base of the PR and between c7f4da7 and b396587.

📒 Files selected for processing (14)
  • src/openhuman/agent/dispatcher.rs
  • src/openhuman/agent/dispatcher_tests.rs
  • src/openhuman/agent/harness/session/runtime_tests.rs
  • src/openhuman/agent/harness/session/turn.rs
  • src/openhuman/agent/harness/session/turn_tests.rs
  • src/openhuman/agent/harness/token_budget.rs
  • src/openhuman/agent/tests.rs
  • src/openhuman/context/manager_tests.rs
  • src/openhuman/context/microcompact.rs
  • src/openhuman/context/pipeline.rs
  • src/openhuman/context/segment_recap_summarizer.rs
  • src/openhuman/context/summarizer.rs
  • src/openhuman/context/summarizer_tests.rs
  • src/openhuman/inference/provider/traits.rs

Comment thread src/openhuman/agent/harness/token_budget.rs
coderabbitai[bot]
coderabbitai Bot previously approved these changes May 29, 2026
Copy link
Copy Markdown
Contributor

@graycyrus graycyrus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@YellowSnnowmann this is a solid fix for the DeepSeek 400s — the approach is right and the backward compat via serde default is the correct call. CI is still pending so holding off on approving until those pass.

One thing to check in the meantime: in NativeToolDispatcher::to_provider_messages, reasoning_content is included directly in serde_json::json!, which means when it's None, the serialized assistant payload includes "reasoning_content": null rather than omitting the key entirely. For DeepSeek and other thinking-capable providers that's fine since the session had no reasoning to replay. But other providers routing through NativeToolDispatcher will now receive a field they've never seen before. Most JSON APIs tolerate unexpected null fields, but worth verifying against each provider's schema — or conditionally including the key only when reasoning_content is Some.

Fix CI and i'll approve.

Comment thread src/openhuman/agent/dispatcher.rs Outdated
…e reasoning_content in output and add test for omission
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/openhuman/agent/dispatcher_tests.rs (1)

436-452: 🏗️ Heavy lift

Split this test module to stay within the repo’s Rust file-size guideline.

This file is now ~545 lines, and this new test adds more surface to an already oversized module. Please extract related test groups (for example, reasoning-content replay/omission cases) into dedicated sibling test files to keep this module maintainable.

As per coding guidelines, "**/*.{ts,tsx,rs}: Keep React component files and Rust modules at ≤ ~500 lines; split growing modules into multiple files" and "{src,app/src,app/src-tauri}/**/*.{rs,ts,tsx}: File size: prefer ≤ ~500 lines per source file; split modules when growing to maintain readability and single responsibility".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/openhuman/agent/dispatcher_tests.rs` around lines 436 - 452, The test
module has grown over the file-size guideline; extract the reasoning-content
tests into a new sibling test file: create a new tests module (e.g., a
dispatcher_reasoning tests file) and move
native_dispatcher_omits_reasoning_content_when_absent and any paired
replay/omission tests there, keeping the exact test names
(native_dispatcher_omits_reasoning_content_when_absent) and references to
NativeToolDispatcher and to_provider_messages; ensure the new file imports or
re-exports required helpers (assistant_tool_calls, tool_results) and types
(NativeToolDispatcher) from the original module (make helpers pub(crate) or add
appropriate use crate::... paths) so the tests compile, and update module
declarations if needed so the test suite still runs.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/openhuman/agent/dispatcher_tests.rs`:
- Around line 436-452: The test module has grown over the file-size guideline;
extract the reasoning-content tests into a new sibling test file: create a new
tests module (e.g., a dispatcher_reasoning tests file) and move
native_dispatcher_omits_reasoning_content_when_absent and any paired
replay/omission tests there, keeping the exact test names
(native_dispatcher_omits_reasoning_content_when_absent) and references to
NativeToolDispatcher and to_provider_messages; ensure the new file imports or
re-exports required helpers (assistant_tool_calls, tool_results) and types
(NativeToolDispatcher) from the original module (make helpers pub(crate) or add
appropriate use crate::... paths) so the tests compile, and update module
declarations if needed so the test suite still runs.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1f773ed6-c38b-4f11-82c6-339e0e52885c

📥 Commits

Reviewing files that changed from the base of the PR and between e2aeba9 and 5c3079b.

📒 Files selected for processing (2)
  • src/openhuman/agent/dispatcher.rs
  • src/openhuman/agent/dispatcher_tests.rs
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/openhuman/agent/dispatcher.rs

Copy link
Copy Markdown
Contributor

@graycyrus graycyrus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@YellowSnnowmann hey! previous changes are addressed — the conditional reasoning_content insertion landed in 5c3079b exactly as requested. code looks good to me, but there are still some CI checks pending (Windows Rust tests, E2E suites, Tauri build). once those are green, i'll come back and approve this.

Copy link
Copy Markdown
Contributor

@oxoxDev oxoxDev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Clean, correctly-scoped fix for TAURI-RUST-4K9 — reasoning_content is persisted on AssistantToolCalls and replayed so DeepSeek thinking-mode follow-ups stop 400ing.

Verified:

  • #[serde(default, skip_serializing_if = "Option::is_none")] keeps old transcripts deserializing (None) and never emits null — the exact DeepSeek-reject case @graycyrus flagged, fixed via the conditional payload["reasoning_content"] insert in commit 5c3079b.
  • turn.rs trims + drops empty reasoning before storing.
  • token_budget now counts reasoning tokens, so the budget estimate doesn't under-count.
  • Good coverage: serde roundtrip, serialize-present, omit-absent, token estimate.

Minor (optional, non-blocking)

reasoning_content is replayed for every past AssistantToolCalls in history, but DeepSeek only needs it on the most recent pre-tool-call turn. Long thinking-mode sessions will carry every turn's reasoning blob forward — it's budget-counted and microcompact/summarizer will trim it, so it's bounded, but if token cost shows up consider replaying only the latest turn's reasoning. Approving as-is.

@oxoxDev oxoxDev merged commit 9f3e161 into tinyhumansai:main May 29, 2026
33 checks passed
@oxoxDev
Copy link
Copy Markdown
Contributor

oxoxDev commented May 29, 2026

Merged ✅ — thanks @graycyrus for the close review here, especially catching the serde_json::json! null-serialization issue ("reasoning_content": null would have been sent to DeepSeek when absent). The switch to conditional insertion in 5c3079b resolved it cleanly.

CI was fully green and the change reviewed sane on a re-read — backward-compatible #[serde(default, skip_serializing_if = "Option::is_none")], reasoning trimmed/filtered before persist, and token-budget accounting updated — so we went ahead and merged. Nice fix for TAURI-RUST-4K9.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/. bug rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants