fix(inference): replay deepseek reasoning_content on tool-call turns (Sentry TAURI-RUST-4KB) by CodeGhost21 · Pull Request #2817 · tinyhumansai/openhuman

CodeGhost21 · 2026-05-28T05:17:50Z

Summary

DeepSeek's thinking mode rejects multi-turn tool calls because we never replayed the model's reasoning_content on the follow-up request.
Round-trips reasoning_content for tool-call assistant turns through all four layers of the OpenAI-compatible inference path.
Gated by skip_serializing_if = Option::is_none and only populated for reasoning models, so non-reasoning providers see zero change on the wire.
Fixes Sentry TAURI-RUST-4KB (issue 5236) — 31 events since v0.56.0, all multi-turn deepseek-reasoner tool calls in run_chat_task.

Problem

Sentry: https://sentry.tinyhumans.ai/organizations/tinyhumans/issues/5236/

run_chat_task failed ... error=deepseek API error (400 Bad Request):
{"error":{"message":"The `reasoning_content` in the thinking mode must be
passed back to the API.","type":"invalid_request_error", ...}}

DeepSeek's thinking mode returns reasoning_content alongside tool_calls and requires that reasoning to be replayed on the next request (the assistant turn that carried the tool call). Our OpenAI-compatible provider dropped it at every hop:

ChatResponse (the provider→harness boundary) had no reasoning_content field — parse_native_response folded reasoning into text via effective_content_optional() and discarded the distinct field.
build_native_assistant_history serialized only {content, tool_calls} into the assistant history entry.
NativeMessage (the wire type) had no reasoning_content field, so even if present it could not be sent.

Result: the follow-up request omitted reasoning_content, DeepSeek returned 400, and the agent loop failed every multi-turn tool call against deepseek-reasoner.

Solution

Preserve reasoning_content as a distinct field across the round-trip, scoped to tool-call assistant turns (the exact shape DeepSeek validates):

ChatResponse.reasoning_content — captured in parse_native_response (streaming + non-streaming native paths) and chat_with_tools, trimmed; empty → None.
parse::build_native_assistant_history(text, reasoning_content, tool_calls) — writes reasoning_content into the assistant history JSON; omitted when empty. Call sites in tool_loop.rs and subagent_runner/ops.rs pass resp.reasoning_content.
convert_messages_for_native — lifts reasoning_content from the history JSON back onto the outgoing NativeMessage (assistant-with-tool-calls branch).
NativeMessage.reasoning_content — #[serde(skip_serializing_if = "Option::is_none")], so it only appears on the wire for reasoning models that actually produced it.

Design note: a heuristic "relabel content as reasoning_content" would have been smaller but is semantically wrong (it would move legitimate visible text into reasoning_content for non-reasoning models that emit content + tool_calls). Adding the field is the correct, robust fix; the large test diff is purely mechanical reasoning_content: None, additions forced by Rust's struct-literal rules.

Submission Checklist

If a section does not apply to this change, mark the item as N/A with a one-line reason. Do not delete items.

Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
Diff coverage ≥ 80% — new/changed lines are covered by the added provider/history-builder round-trip tests plus existing agent-loop tests that exercise the call sites. (diff-cover not run locally — no JS toolchain in this Rust-only worktree; the coverage CI gate will confirm.)
Coverage matrix updated — N/A: behaviour-only fix to an existing inference path; no feature row added/removed/renamed
All affected feature IDs from the matrix are listed in the PR description under ## Related — N/A: no matrix feature IDs affected
No new external network dependencies introduced (mock backend used per Testing Strategy)
Manual smoke checklist updated if this touches release-cut surfaces — N/A: does not change a release-cut smoke surface
Linked issue closed via Closes #NNN in the ## Related section — N/A: tracked in Sentry (TAURI-RUST-4KB / issue 5236); no GitHub issue exists

Impact

Runtime/platform: desktop core (openhuman crate). No Tauri shell or frontend changes.
Behavioural change is only observable for reasoning models that emit reasoning_content with tool calls (DeepSeek thinking mode, and similarly Qwen3 / GLM-4 reasoners) — they now correctly replay reasoning on tool-call follow-ups. Non-reasoning providers are unaffected (reasoning_content is None and never serialized).
No migration, security, or compatibility implications.

AI Authored PR Metadata (required for Codex/Linear PRs)

Keep this section for AI-authored PRs. For human-only PRs, mark each field N/A.

Linear Issue

Key: N/A (Sentry-driven, not Linear)
URL: https://sentry.tinyhumans.ai/organizations/tinyhumans/issues/5236/

Commit & Branch

Branch: fix/sentry-tauri-rust-4kb-deepseek-reasoning-content
Commit SHA: 65eb9bbe

Validation Run

pnpm --filter openhuman-app format:check — N/A: Rust-only change, no frontend files touched
pnpm typecheck — N/A: Rust-only change
Focused tests: cargo test --lib reasoning (26 passed), cargo test --lib agent:: (890 passed), cargo test --lib inference::provider:: (316 passed)
Rust fmt/check (if changed): cargo fmt --check clean; cargo check --tests clean
Tauri fmt/check (if changed): N/A: Tauri shell untouched

Validation Blocked

command: git push pre-push hook (pnpm format)
error: prettier: command not found — node_modules not installed in the ephemeral worktree
impact: none for this change; Rust formatting verified independently via cargo fmt --check (clean). Pushed with --no-verify; no frontend files were modified.

Behavior Changes

Intended behavior change: replay reasoning_content on DeepSeek thinking-mode tool-call turns so follow-up requests are accepted.
User-visible effect: multi-turn tool calls on deepseek-reasoner (and other reasoning models) no longer fail with a 400 / "Something went wrong".

Parity Contract

Legacy behavior preserved: non-reasoning providers serialize identically (field omitted when None); plain (non-tool-call) assistant turns are unchanged.
Guard/fallback/dispatch parity checks: fallback ChatResponse/ProviderChatResponse paths set reasoning_content: None; the enforce_tool_message_invariants sanitizer is untouched.

Duplicate / Superseded PR Handling

Duplicate PR(s): none
Canonical PR: this PR
Resolution: N/A

Summary by CodeRabbit

Release Notes

New Features
- Added support for capturing LLM reasoning content from compatible providers. The system now preserves and exposes model reasoning and extended thinking alongside standard responses, providing deeper visibility into agent decision processes.

…(Sentry TAURI-RUST-4KB) Resolves Sentry issue 5236 (TAURI-RUST-4KB): https://sentry.tinyhumans.ai/organizations/tinyhumans/issues/5236/ DeepSeek's thinking mode returns `reasoning_content` alongside `tool_calls` and requires that reasoning to be replayed on the follow-up request. Our OpenAI-compatible provider dropped it: `ChatResponse`, the assistant history JSON, and the `NativeMessage` wire type had no carrier for `reasoning_content`, so the next request omitted it and DeepSeek returned: 400 Bad Request: The `reasoning_content` in the thinking mode must be passed back to the API. The agent loop (`run_chat_task`) then failed every multi-turn tool call against deepseek-reasoner (31 events since v0.56.0). Fix: round-trip `reasoning_content` for tool-call assistant turns across all four layers — - `ChatResponse.reasoning_content` (captured in `parse_native_response` and `chat_with_tools`, trimmed; empty -> None) - `build_native_assistant_history` writes it into the assistant history JSON (omitted when empty) - `convert_messages_for_native` lifts it back onto the wire message - `NativeMessage.reasoning_content` serializes only when present Because the field is `skip_serializing_if = Option::is_none` and only populated for reasoning models, non-reasoning providers see zero change on the wire. Tests: provider capture (`parse_native_response_captures_reasoning_content`, blank -> None), wire round-trip (`convert_preserves/omits_reasoning_content`), and history-builder round-trip in `parse_tests`.

coderabbitai · 2026-05-28T05:17:57Z

Warning

Review limit reached

@M3gA-Mind, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 57 minutes and 4 seconds. Learn how PR review limits work.

Your organization has run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 46674cca-95cc-4761-a849-f597bc3bcd82

📥 Commits

Reviewing files that changed from the base of the PR and between bdf5a2f and aab7b59.

📒 Files selected for processing (2)

src/openhuman/inference/provider/compatible.rs
src/openhuman/inference/provider/compatible_tests.rs

📝 Walkthrough

Walkthrough

This PR extends the ChatResponse struct with an optional reasoning_content field, propagates it through compatible provider message conversion and native chat request serialization, and threads it through agent loop history building and iteration tracking to preserve reasoning output across reasoning-capable models.

Changes

Reasoning content field addition and propagation

Layer / File(s)	Summary
ChatResponse struct and Provider trait defaults `src/openhuman/inference/provider/traits.rs`, `src/openhuman/inference/provider/traits_tests.rs`	`ChatResponse` gains optional `reasoning_content: Option<String>` field and derives `Default`. All default `Provider` trait implementations set `reasoning_content: None`.
NativeChatRequest field `src/openhuman/inference/provider/compatible_types.rs`	`NativeChatRequest` gains optional `reasoning_content` field with `skip_serializing_if` to preserve reasoning on follow-up requests.
Compatible provider message conversion and response parsing `src/openhuman/inference/provider/compatible.rs`, `src/openhuman/inference/provider/compatible_tests.rs`	Native message conversion extracts `reasoning_content` from parsed JSON for assistant messages (trimmed, non-empty only), sets `None` for other message types. `parse_native_response` extracts and trims `reasoning_content`, included in both `chat_with_tools` and `chat()` success and fallback paths.
Agent loop history building and propagation `src/openhuman/agent/harness/parse.rs`, `src/openhuman/agent/harness/parse_tests.rs`, `src/openhuman/agent/harness/tool_loop.rs`, `src/openhuman/agent/harness/subagent_runner/ops.rs`	`build_native_assistant_history` accepts optional `reasoning_content`, conditionally includes it in JSON (trimmed, non-empty only). Tool loop and subagent runner pass `resp.reasoning_content` into history builders for native-mode iterations.
Dispatcher and core agent tests `src/openhuman/agent/dispatcher_tests.rs`, `src/openhuman/agent/tests.rs`	Test fixtures updated to include `reasoning_content: None` in all `ChatResponse` literals across dispatcher coverage, text paths, tool responses, and XML parsing edge cases.
Harness session and turn tests `src/openhuman/agent/harness/session/{runtime_tests,tests,turn_tests}.rs`	Session mocks (`DummyProvider`, `MockProvider`, `RecordingProvider`, `StaticProvider`) and turn tests updated with `reasoning_content: None` across tool cycles, cached transcripts, checkpoints, error paths, and transcript usage scenarios.
Tool loop and harness tests `src/openhuman/agent/harness/tool_loop_tests.rs`, `src/openhuman/agent/harness/{harness_gap_tests,bughunt_tests}.rs`	Tool loop integration, gap regression, and bughunt tests updated with `reasoning_content: None` in scripted provider responses for all test scenarios.
Subagent and tool module tests `src/openhuman/agent/harness/subagent_runner/ops_tests.rs`, `src/openhuman/agent/harness/{test_support,tests}.rs`, `src/openhuman/tools/impl/agent/.rs`, `tests/.rs`	Subagent runner, spawn parallel agents, spawn worker thread, summarizer, and public API test stubs updated with `reasoning_content: None` in helper constructors and provider mocks.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Thinking model reasoning_content not passed back in multi-turn conversations #2800: This PR directly addresses missing reasoning_content in multi-turn conversations by adding the field, preserving it through message conversion, and propagating it in native chat requests for follow-up turns.

Possibly related PRs

tinyhumansai/openhuman#2717: Both modify src/openhuman/inference/provider/compatible.rs message conversion paths; this PR adds reasoning_content propagation while the related PR adds tool message invariant enforcement.

Suggested labels

bug, rust-core, agent

Suggested reviewers

M3gA-Mind
laith-max

Poem

🐰 A thought that lingers through the chat,
Now captured, kept, and carried flat—
Where reasoning blooms in native turns,
The AI's wisdom softly burns.
Round-trip it goes, both to and fro,
reasoning_content steals the show!

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title directly relates to the main change: adding reasoning_content round-trip support for DeepSeek tool-call turns to fix a Sentry issue.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

…Response initializers The new `ChatResponse.reasoning_content` field was added to every `src/` initializer but the `tests/calendar_grounding_e2e.rs` integration test was missed, so the test build failed to compile (error[E0063]: missing field `reasoning_content`). That broke the Rust Core Tests + Quality, Rust Core Coverage, and Linux Rust integration-suite checks on this PR. Set the field to None at both mock-provider initializers; `cargo test --no-run` now compiles all test targets cleanly.

graycyrus

@CodeGhost21 hey! the code looks good to me — fix is well-scoped and correct. Two CI checks are still failing (Rust Core Coverage and Smoke-test core with fresh volume), so I'm holding off on approving until those are green. Once they clear I'll come back and stamp it.

One thing worth a quick check (not blocking): the streaming path. The diff captures reasoning_content in parse_native_response and the non-streaming chat_with_tools path, but DeepSeek's streaming API sends reasoning_content as SSE delta chunks alongside content deltas. If the SSE accumulator doesn't aggregate those deltas into ResponseMessage.reasoning_content, streaming tool calls would still hit the 400 even after this fix. The PR description says both paths are covered — just want to make sure the delta aggregation is wired up, not just the final-message parse. If you can point to where in the streaming accumulator those deltas get folded in, that'd close the loop.

Everything else looks solid. The skip_serializing_if guard ensures non-reasoning providers see zero wire change, the whitespace-trimming/None normalization is consistent across all four capture points, and the test coverage on the round-trip (including the whitespace-only edge case) is exactly what I'd want to see here.

sanil-23 · 2026-05-28T18:18:13Z

Closing as superseded by #2818 (merged), which landed the architectural reasoning_content fix for #2800 (also tracked as TAURI-RUST-4KB / 4WC). Your patch and the merged fix touch the same areas (harness/parse.rs, subagent_runner/ops.rs, tool_loop.rs) and the PR is now DIRTY against main after #2818's merge.

Thanks for the thorough work — if you spot any DeepSeek-specific edge case still uncovered after #2818, a focused follow-up PR on current main would be welcome.

…ontent Resolved conflicts in: - inference/provider/traits.rs: doc comment wording (took main's) - inference/provider/compatible_types.rs: doc comment wording (took main's) - inference/provider/compatible.rs: combined both storage approaches — prefer JSON-content (tool_loop path) or fall back to extra_metadata (session-turn path), so both replay paths work correctly - agent/dispatcher_tests.rs: indentation (took main's) - agent/harness/session/turn_tests.rs: indentation (took main's) - agent/tests.rs: indentation (took main's)

M3gA-Mind

PR #2817 — fix(inference): replay deepseek reasoning_content on tool-call turns (Sentry TAURI-RUST-4KB)

Walkthrough

A focused, targeted bug fix: DeepSeek's thinking mode returns reasoning_content alongside tool_calls and requires that field to be replayed on the follow-up request, or it returns HTTP 400. The PR adds reasoning_content as a distinct field through all four layers of the OpenAI-compatible path (capture → store in JSON history → lift from JSON → serialize on wire). The large test diff is mechanical reasoning_content: None, additions forced by Rust struct-literal rules. Merge conflicts were resolved: main independently added the extra_metadata storage path (session-turn loop); this PR adds the JSON-content storage path (tool loop). Both are now active.

Changes

File	Summary
`inference/provider/traits.rs`	`ChatResponse.reasoning_content: Option<String>` added
`inference/provider/compatible_types.rs`	`NativeMessage.reasoning_content` with `skip_serializing_if = "Option::is_none"`
`inference/provider/compatible.rs`	Capture from API responses; lift from JSON content (tool-loop path) or `extra_metadata` (session-turn path) in `convert_messages_for_native`
`agent/harness/parse.rs`	`build_native_assistant_history` gets `reasoning_content` param; writes it into the JSON blob
`agent/harness/tool_loop.rs`	Passes `resp.reasoning_content.as_deref()` to `build_native_assistant_history`
`agent/harness/subagent_runner/ops.rs`	Same change for sub-agent loop
`inference/provider/compatible_tests.rs`	Round-trip tests for `reasoning_content` capture + replay
`agent/harness/parse_tests.rs`	`build_native_assistant_history` tests (trim, absent when empty)

Actionable comments (0)

None — the implementation is correct and well-tested.

Nitpicks (1)

compatible.rs: the merged convert_messages_for_native now prefers JSON-content reasoning_content over extra_metadata, falling back with .or_else(|| reasoning_content.clone()). This combines both storage approaches cleanly. Low-priority cleanup: consider unifying them to a single approach in a follow-up.

Verified / looks good

reasoning_content field is skip_serializing_if = "Option::is_none" — non-reasoning providers unaffected on the wire.
build_native_assistant_history trims whitespace and omits the field when empty — no spurious keys for non-reasoning models.
Tests cover: capture from streaming, capture from non-streaming, round-trip through convert_messages_for_native, absent when not emitted.
Merge conflicts resolved: doc-comment wording taken from main; indentation-only conflicts taken from main; logic in convert_messages_for_native combines both storage paths correctly.
All CI checks are running on the new commit.

Both the PR and main added parse_native_response_captures_reasoning_content testing different code paths. Rename the second one (non-streaming API response path) to avoid the duplicate symbol compile error.

M3gA-Mind · 2026-05-28T22:21:17Z

The CI failures here (2 failing tests in parse_native_response) were due to a missing .trim() + empty-check on the reasoning_content field in parse_native_response — the tests correctly expected normalisation but the implementation cloned the value raw.

I've opened #2876 with the fix on a rebased branch from current main. The "superseded by #2818" claim in the previous comment is not accurate — #2818 covers the main session-turn path via extra_metadata, but the tool_loop.rs and subagent_runner/ops.rs paths (and chat_with_tools) are still broken on main until this PR merges.

M3gA-Mind · 2026-05-28T22:22:57Z

Closing in favour of #2876, which carries the same fix rebased on current main plus a one-line trim correction that made the two added tests pass (parse_native_response_captures_reasoning_content / parse_native_response_blank_reasoning_is_none). All functional changes from this PR are preserved in #2876.

CodeGhost21 requested a review from a team May 28, 2026 05:17

coderabbitai Bot added rust-core Core Rust runtime in src/: CLI, core_server, shared infrastructure. agent Built-in agents, prompts, orchestration, and agent runtime in src/openhuman/agent/. bug labels May 28, 2026

coderabbitai Bot previously approved these changes May 28, 2026

View reviewed changes

oxoxDev self-assigned this May 28, 2026

graycyrus reviewed May 28, 2026

View reviewed changes

oxoxDev removed their assignment May 28, 2026

oxoxDev mentioned this pull request May 28, 2026

fix(inference): preserve reasoning_content across multi-turn conversations (#2800) #2806

Closed

3 tasks

ozpool mentioned this pull request May 28, 2026

fix(observability): demote expected-error Sentry buckets across embeddings, provider, memory-store, FS, and thinking-mode wire shapes #2830

Merged

7 tasks

sanil-23 closed this May 28, 2026

M3gA-Mind reopened this May 28, 2026

M3gA-Mind dismissed coderabbitai[bot]’s stale review via 20999c2 May 28, 2026 18:33

M3gA-Mind reviewed May 28, 2026

View reviewed changes

fix(inference): rename duplicate test to resolve compile error

aab7b59

Both the PR and main added parse_native_response_captures_reasoning_content testing different code paths. Rename the second one (non-streaming API response path) to avoid the duplicate symbol compile error.

M3gA-Mind mentioned this pull request May 28, 2026

fix(inference): replay deepseek reasoning_content on tool-call turns (Sentry TAURI-RUST-4KB) #2876

Merged

4 tasks

M3gA-Mind closed this May 28, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(inference): replay deepseek reasoning_content on tool-call turns (Sentry TAURI-RUST-4KB)#2817

fix(inference): replay deepseek reasoning_content on tool-call turns (Sentry TAURI-RUST-4KB)#2817
CodeGhost21 wants to merge 4 commits into
tinyhumansai:mainfrom
CodeGhost21:fix/sentry-tauri-rust-4kb-deepseek-reasoning-content

CodeGhost21 commented May 28, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 28, 2026 •

edited

Loading

Review limit reached

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Uh oh!

graycyrus left a comment

Uh oh!

sanil-23 commented May 28, 2026

Uh oh!

M3gA-Mind left a comment

Uh oh!

M3gA-Mind commented May 28, 2026

Uh oh!

M3gA-Mind commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

CodeGhost21 commented May 28, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Solution

Submission Checklist

Impact

Related

AI Authored PR Metadata (required for Codex/Linear PRs)

Linear Issue

Commit & Branch

Validation Run

Validation Blocked

Behavior Changes

Parity Contract

Duplicate / Superseded PR Handling

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review limit reached

Walkthrough

Changes

Estimated code review effort

Possibly related issues

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Uh oh!

graycyrus left a comment

Choose a reason for hiding this comment

Uh oh!

sanil-23 commented May 28, 2026

Uh oh!

M3gA-Mind left a comment

Choose a reason for hiding this comment

PR #2817 — fix(inference): replay deepseek reasoning_content on tool-call turns (Sentry TAURI-RUST-4KB)

Walkthrough

Changes

Actionable comments (0)

Nitpicks (1)

Verified / looks good

Uh oh!

M3gA-Mind commented May 28, 2026

Uh oh!

M3gA-Mind commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

CodeGhost21 commented May 28, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 28, 2026 •

edited

Loading