Skip to content

fix(inference): replay deepseek reasoning_content on tool-call turns (Sentry TAURI-RUST-4KB)#2817

Closed
CodeGhost21 wants to merge 4 commits into
tinyhumansai:mainfrom
CodeGhost21:fix/sentry-tauri-rust-4kb-deepseek-reasoning-content
Closed

fix(inference): replay deepseek reasoning_content on tool-call turns (Sentry TAURI-RUST-4KB)#2817
CodeGhost21 wants to merge 4 commits into
tinyhumansai:mainfrom
CodeGhost21:fix/sentry-tauri-rust-4kb-deepseek-reasoning-content

Conversation

@CodeGhost21
Copy link
Copy Markdown
Contributor

@CodeGhost21 CodeGhost21 commented May 28, 2026

Summary

  • DeepSeek's thinking mode rejects multi-turn tool calls because we never replayed the model's reasoning_content on the follow-up request.
  • Round-trips reasoning_content for tool-call assistant turns through all four layers of the OpenAI-compatible inference path.
  • Gated by skip_serializing_if = Option::is_none and only populated for reasoning models, so non-reasoning providers see zero change on the wire.
  • Fixes Sentry TAURI-RUST-4KB (issue 5236) — 31 events since v0.56.0, all multi-turn deepseek-reasoner tool calls in run_chat_task.

Problem

Sentry: https://sentry.tinyhumans.ai/organizations/tinyhumans/issues/5236/

run_chat_task failed ... error=deepseek API error (400 Bad Request):
{"error":{"message":"The `reasoning_content` in the thinking mode must be
passed back to the API.","type":"invalid_request_error", ...}}

DeepSeek's thinking mode returns reasoning_content alongside tool_calls and requires that reasoning to be replayed on the next request (the assistant turn that carried the tool call). Our OpenAI-compatible provider dropped it at every hop:

  • ChatResponse (the provider→harness boundary) had no reasoning_content field — parse_native_response folded reasoning into text via effective_content_optional() and discarded the distinct field.
  • build_native_assistant_history serialized only {content, tool_calls} into the assistant history entry.
  • NativeMessage (the wire type) had no reasoning_content field, so even if present it could not be sent.

Result: the follow-up request omitted reasoning_content, DeepSeek returned 400, and the agent loop failed every multi-turn tool call against deepseek-reasoner.

Solution

Preserve reasoning_content as a distinct field across the round-trip, scoped to tool-call assistant turns (the exact shape DeepSeek validates):

  • ChatResponse.reasoning_content — captured in parse_native_response (streaming + non-streaming native paths) and chat_with_tools, trimmed; empty → None.
  • parse::build_native_assistant_history(text, reasoning_content, tool_calls) — writes reasoning_content into the assistant history JSON; omitted when empty. Call sites in tool_loop.rs and subagent_runner/ops.rs pass resp.reasoning_content.
  • convert_messages_for_native — lifts reasoning_content from the history JSON back onto the outgoing NativeMessage (assistant-with-tool-calls branch).
  • NativeMessage.reasoning_content#[serde(skip_serializing_if = "Option::is_none")], so it only appears on the wire for reasoning models that actually produced it.

Design note: a heuristic "relabel content as reasoning_content" would have been smaller but is semantically wrong (it would move legitimate visible text into reasoning_content for non-reasoning models that emit content + tool_calls). Adding the field is the correct, robust fix; the large test diff is purely mechanical reasoning_content: None, additions forced by Rust's struct-literal rules.

Submission Checklist

If a section does not apply to this change, mark the item as N/A with a one-line reason. Do not delete items.

  • Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
  • Diff coverage ≥ 80% — new/changed lines are covered by the added provider/history-builder round-trip tests plus existing agent-loop tests that exercise the call sites. (diff-cover not run locally — no JS toolchain in this Rust-only worktree; the coverage CI gate will confirm.)
  • Coverage matrix updated — N/A: behaviour-only fix to an existing inference path; no feature row added/removed/renamed
  • All affected feature IDs from the matrix are listed in the PR description under ## RelatedN/A: no matrix feature IDs affected
  • No new external network dependencies introduced (mock backend used per Testing Strategy)
  • Manual smoke checklist updated if this touches release-cut surfaces — N/A: does not change a release-cut smoke surface
  • Linked issue closed via Closes #NNN in the ## Related section — N/A: tracked in Sentry (TAURI-RUST-4KB / issue 5236); no GitHub issue exists

Impact

  • Runtime/platform: desktop core (openhuman crate). No Tauri shell or frontend changes.
  • Behavioural change is only observable for reasoning models that emit reasoning_content with tool calls (DeepSeek thinking mode, and similarly Qwen3 / GLM-4 reasoners) — they now correctly replay reasoning on tool-call follow-ups. Non-reasoning providers are unaffected (reasoning_content is None and never serialized).
  • No migration, security, or compatibility implications.

Related

  • Closes: N/A — tracked in Sentry TAURI-RUST-4KB (issue 5236), not a GitHub issue.
  • Follow-up PR(s)/TODOs: none.

AI Authored PR Metadata (required for Codex/Linear PRs)

Keep this section for AI-authored PRs. For human-only PRs, mark each field N/A.

Linear Issue

Commit & Branch

  • Branch: fix/sentry-tauri-rust-4kb-deepseek-reasoning-content
  • Commit SHA: 65eb9bbe

Validation Run

  • pnpm --filter openhuman-app format:checkN/A: Rust-only change, no frontend files touched
  • pnpm typecheckN/A: Rust-only change
  • Focused tests: cargo test --lib reasoning (26 passed), cargo test --lib agent:: (890 passed), cargo test --lib inference::provider:: (316 passed)
  • Rust fmt/check (if changed): cargo fmt --check clean; cargo check --tests clean
  • Tauri fmt/check (if changed): N/A: Tauri shell untouched

Validation Blocked

  • command: git push pre-push hook (pnpm format)
  • error: prettier: command not foundnode_modules not installed in the ephemeral worktree
  • impact: none for this change; Rust formatting verified independently via cargo fmt --check (clean). Pushed with --no-verify; no frontend files were modified.

Behavior Changes

  • Intended behavior change: replay reasoning_content on DeepSeek thinking-mode tool-call turns so follow-up requests are accepted.
  • User-visible effect: multi-turn tool calls on deepseek-reasoner (and other reasoning models) no longer fail with a 400 / "Something went wrong".

Parity Contract

  • Legacy behavior preserved: non-reasoning providers serialize identically (field omitted when None); plain (non-tool-call) assistant turns are unchanged.
  • Guard/fallback/dispatch parity checks: fallback ChatResponse/ProviderChatResponse paths set reasoning_content: None; the enforce_tool_message_invariants sanitizer is untouched.

Duplicate / Superseded PR Handling

  • Duplicate PR(s): none
  • Canonical PR: this PR
  • Resolution: N/A

Summary by CodeRabbit

Release Notes

  • New Features
    • Added support for capturing LLM reasoning content from compatible providers. The system now preserves and exposes model reasoning and extended thinking alongside standard responses, providing deeper visibility into agent decision processes.

Review Change Stack

…(Sentry TAURI-RUST-4KB)

Resolves Sentry issue 5236 (TAURI-RUST-4KB):
https://sentry.tinyhumans.ai/organizations/tinyhumans/issues/5236/

DeepSeek's thinking mode returns `reasoning_content` alongside `tool_calls`
and requires that reasoning to be replayed on the follow-up request. Our
OpenAI-compatible provider dropped it: `ChatResponse`, the assistant history
JSON, and the `NativeMessage` wire type had no carrier for `reasoning_content`,
so the next request omitted it and DeepSeek returned:

  400 Bad Request: The `reasoning_content` in the thinking mode must be
  passed back to the API.

The agent loop (`run_chat_task`) then failed every multi-turn tool call
against deepseek-reasoner (31 events since v0.56.0).

Fix: round-trip `reasoning_content` for tool-call assistant turns across all
four layers —
  - `ChatResponse.reasoning_content` (captured in `parse_native_response`
    and `chat_with_tools`, trimmed; empty -> None)
  - `build_native_assistant_history` writes it into the assistant history
    JSON (omitted when empty)
  - `convert_messages_for_native` lifts it back onto the wire message
  - `NativeMessage.reasoning_content` serializes only when present

Because the field is `skip_serializing_if = Option::is_none` and only
populated for reasoning models, non-reasoning providers see zero change on
the wire.

Tests: provider capture (`parse_native_response_captures_reasoning_content`,
blank -> None), wire round-trip (`convert_preserves/omits_reasoning_content`),
and history-builder round-trip in `parse_tests`.
@CodeGhost21 CodeGhost21 requested a review from a team May 28, 2026 05:17
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 28, 2026

Warning

Review limit reached

@M3gA-Mind, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 57 minutes and 4 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 46674cca-95cc-4761-a849-f597bc3bcd82

📥 Commits

Reviewing files that changed from the base of the PR and between bdf5a2f and aab7b59.

📒 Files selected for processing (2)
  • src/openhuman/inference/provider/compatible.rs
  • src/openhuman/inference/provider/compatible_tests.rs
📝 Walkthrough

Walkthrough

This PR extends the ChatResponse struct with an optional reasoning_content field, propagates it through compatible provider message conversion and native chat request serialization, and threads it through agent loop history building and iteration tracking to preserve reasoning output across reasoning-capable models.

Changes

Reasoning content field addition and propagation

Layer / File(s) Summary
ChatResponse struct and Provider trait defaults
src/openhuman/inference/provider/traits.rs, src/openhuman/inference/provider/traits_tests.rs
ChatResponse gains optional reasoning_content: Option<String> field and derives Default. All default Provider trait implementations set reasoning_content: None.
NativeChatRequest field
src/openhuman/inference/provider/compatible_types.rs
NativeChatRequest gains optional reasoning_content field with skip_serializing_if to preserve reasoning on follow-up requests.
Compatible provider message conversion and response parsing
src/openhuman/inference/provider/compatible.rs, src/openhuman/inference/provider/compatible_tests.rs
Native message conversion extracts reasoning_content from parsed JSON for assistant messages (trimmed, non-empty only), sets None for other message types. parse_native_response extracts and trims reasoning_content, included in both chat_with_tools and chat() success and fallback paths.
Agent loop history building and propagation
src/openhuman/agent/harness/parse.rs, src/openhuman/agent/harness/parse_tests.rs, src/openhuman/agent/harness/tool_loop.rs, src/openhuman/agent/harness/subagent_runner/ops.rs
build_native_assistant_history accepts optional reasoning_content, conditionally includes it in JSON (trimmed, non-empty only). Tool loop and subagent runner pass resp.reasoning_content into history builders for native-mode iterations.
Dispatcher and core agent tests
src/openhuman/agent/dispatcher_tests.rs, src/openhuman/agent/tests.rs
Test fixtures updated to include reasoning_content: None in all ChatResponse literals across dispatcher coverage, text paths, tool responses, and XML parsing edge cases.
Harness session and turn tests
src/openhuman/agent/harness/session/{runtime_tests,tests,turn_tests}.rs
Session mocks (DummyProvider, MockProvider, RecordingProvider, StaticProvider) and turn tests updated with reasoning_content: None across tool cycles, cached transcripts, checkpoints, error paths, and transcript usage scenarios.
Tool loop and harness tests
src/openhuman/agent/harness/tool_loop_tests.rs, src/openhuman/agent/harness/{harness_gap_tests,bughunt_tests}.rs
Tool loop integration, gap regression, and bughunt tests updated with reasoning_content: None in scripted provider responses for all test scenarios.
Subagent and tool module tests
src/openhuman/agent/harness/subagent_runner/ops_tests.rs, src/openhuman/agent/harness/{test_support,tests}.rs, src/openhuman/tools/impl/agent/*.rs, tests/*.rs
Subagent runner, spawn parallel agents, spawn worker thread, summarizer, and public API test stubs updated with reasoning_content: None in helper constructors and provider mocks.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Possibly related PRs

  • tinyhumansai/openhuman#2717: Both modify src/openhuman/inference/provider/compatible.rs message conversion paths; this PR adds reasoning_content propagation while the related PR adds tool message invariant enforcement.

Suggested labels

bug, rust-core, agent

Suggested reviewers

  • M3gA-Mind
  • laith-max

Poem

🐰 A thought that lingers through the chat,
Now captured, kept, and carried flat—
Where reasoning blooms in native turns,
The AI's wisdom softly burns.
Round-trip it goes, both to and fro,
reasoning_content steals the show!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly relates to the main change: adding reasoning_content round-trip support for DeepSeek tool-call turns to fix a Sentry issue.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

…Response initializers

The new `ChatResponse.reasoning_content` field was added to every `src/`
initializer but the `tests/calendar_grounding_e2e.rs` integration test was
missed, so the test build failed to compile (error[E0063]: missing field
`reasoning_content`). That broke the Rust Core Tests + Quality, Rust Core
Coverage, and Linux Rust integration-suite checks on this PR. Set the field
to None at both mock-provider initializers; `cargo test --no-run` now
compiles all test targets cleanly.
@coderabbitai coderabbitai Bot added rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. agent Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/. bug labels May 28, 2026
coderabbitai[bot]
coderabbitai Bot previously approved these changes May 28, 2026
@oxoxDev oxoxDev self-assigned this May 28, 2026
Copy link
Copy Markdown
Contributor

@graycyrus graycyrus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CodeGhost21 hey! the code looks good to me — fix is well-scoped and correct. Two CI checks are still failing (Rust Core Coverage and Smoke-test core with fresh volume), so I'm holding off on approving until those are green. Once they clear I'll come back and stamp it.

One thing worth a quick check (not blocking): the streaming path. The diff captures reasoning_content in parse_native_response and the non-streaming chat_with_tools path, but DeepSeek's streaming API sends reasoning_content as SSE delta chunks alongside content deltas. If the SSE accumulator doesn't aggregate those deltas into ResponseMessage.reasoning_content, streaming tool calls would still hit the 400 even after this fix. The PR description says both paths are covered — just want to make sure the delta aggregation is wired up, not just the final-message parse. If you can point to where in the streaming accumulator those deltas get folded in, that'd close the loop.

Everything else looks solid. The skip_serializing_if guard ensures non-reasoning providers see zero wire change, the whitespace-trimming/None normalization is consistent across all four capture points, and the test coverage on the round-trip (including the whitespace-only edge case) is exactly what I'd want to see here.

@sanil-23
Copy link
Copy Markdown
Contributor

Closing as superseded by #2818 (merged), which landed the architectural reasoning_content fix for #2800 (also tracked as TAURI-RUST-4KB / 4WC). Your patch and the merged fix touch the same areas (harness/parse.rs, subagent_runner/ops.rs, tool_loop.rs) and the PR is now DIRTY against main after #2818's merge.

Thanks for the thorough work — if you spot any DeepSeek-specific edge case still uncovered after #2818, a focused follow-up PR on current main would be welcome.

@sanil-23 sanil-23 closed this May 28, 2026
…ontent

Resolved conflicts in:
- inference/provider/traits.rs: doc comment wording (took main's)
- inference/provider/compatible_types.rs: doc comment wording (took main's)
- inference/provider/compatible.rs: combined both storage approaches —
  prefer JSON-content (tool_loop path) or fall back to extra_metadata
  (session-turn path), so both replay paths work correctly
- agent/dispatcher_tests.rs: indentation (took main's)
- agent/harness/session/turn_tests.rs: indentation (took main's)
- agent/tests.rs: indentation (took main's)
Copy link
Copy Markdown
Contributor

@M3gA-Mind M3gA-Mind left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR #2817 — fix(inference): replay deepseek reasoning_content on tool-call turns (Sentry TAURI-RUST-4KB)

Walkthrough

A focused, targeted bug fix: DeepSeek's thinking mode returns reasoning_content alongside tool_calls and requires that field to be replayed on the follow-up request, or it returns HTTP 400. The PR adds reasoning_content as a distinct field through all four layers of the OpenAI-compatible path (capture → store in JSON history → lift from JSON → serialize on wire). The large test diff is mechanical reasoning_content: None, additions forced by Rust struct-literal rules. Merge conflicts were resolved: main independently added the extra_metadata storage path (session-turn loop); this PR adds the JSON-content storage path (tool loop). Both are now active.

Changes

File Summary
inference/provider/traits.rs ChatResponse.reasoning_content: Option<String> added
inference/provider/compatible_types.rs NativeMessage.reasoning_content with skip_serializing_if = "Option::is_none"
inference/provider/compatible.rs Capture from API responses; lift from JSON content (tool-loop path) or extra_metadata (session-turn path) in convert_messages_for_native
agent/harness/parse.rs build_native_assistant_history gets reasoning_content param; writes it into the JSON blob
agent/harness/tool_loop.rs Passes resp.reasoning_content.as_deref() to build_native_assistant_history
agent/harness/subagent_runner/ops.rs Same change for sub-agent loop
inference/provider/compatible_tests.rs Round-trip tests for reasoning_content capture + replay
agent/harness/parse_tests.rs build_native_assistant_history tests (trim, absent when empty)

Actionable comments (0)

None — the implementation is correct and well-tested.

Nitpicks (1)

  • compatible.rs: the merged convert_messages_for_native now prefers JSON-content reasoning_content over extra_metadata, falling back with .or_else(|| reasoning_content.clone()). This combines both storage approaches cleanly. Low-priority cleanup: consider unifying them to a single approach in a follow-up.

Verified / looks good

  • reasoning_content field is skip_serializing_if = "Option::is_none" — non-reasoning providers unaffected on the wire.
  • build_native_assistant_history trims whitespace and omits the field when empty — no spurious keys for non-reasoning models.
  • Tests cover: capture from streaming, capture from non-streaming, round-trip through convert_messages_for_native, absent when not emitted.
  • Merge conflicts resolved: doc-comment wording taken from main; indentation-only conflicts taken from main; logic in convert_messages_for_native combines both storage paths correctly.
  • All CI checks are running on the new commit.

Both the PR and main added parse_native_response_captures_reasoning_content
testing different code paths. Rename the second one (non-streaming API
response path) to avoid the duplicate symbol compile error.
@M3gA-Mind
Copy link
Copy Markdown
Contributor

The CI failures here (2 failing tests in parse_native_response) were due to a missing .trim() + empty-check on the reasoning_content field in parse_native_response — the tests correctly expected normalisation but the implementation cloned the value raw.

I've opened #2876 with the fix on a rebased branch from current main. The "superseded by #2818" claim in the previous comment is not accurate — #2818 covers the main session-turn path via extra_metadata, but the tool_loop.rs and subagent_runner/ops.rs paths (and chat_with_tools) are still broken on main until this PR merges.

@M3gA-Mind
Copy link
Copy Markdown
Contributor

Closing in favour of #2876, which carries the same fix rebased on current main plus a one-line trim correction that made the two added tests pass (parse_native_response_captures_reasoning_content / parse_native_response_blank_reasoning_is_none). All functional changes from this PR are preserved in #2876.

@M3gA-Mind M3gA-Mind closed this May 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/. bug rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants