fix(inference): replay deepseek reasoning_content on tool-call turns (Sentry TAURI-RUST-4KB)#2817
Conversation
…(Sentry TAURI-RUST-4KB) Resolves Sentry issue 5236 (TAURI-RUST-4KB): https://sentry.tinyhumans.ai/organizations/tinyhumans/issues/5236/ DeepSeek's thinking mode returns `reasoning_content` alongside `tool_calls` and requires that reasoning to be replayed on the follow-up request. Our OpenAI-compatible provider dropped it: `ChatResponse`, the assistant history JSON, and the `NativeMessage` wire type had no carrier for `reasoning_content`, so the next request omitted it and DeepSeek returned: 400 Bad Request: The `reasoning_content` in the thinking mode must be passed back to the API. The agent loop (`run_chat_task`) then failed every multi-turn tool call against deepseek-reasoner (31 events since v0.56.0). Fix: round-trip `reasoning_content` for tool-call assistant turns across all four layers — - `ChatResponse.reasoning_content` (captured in `parse_native_response` and `chat_with_tools`, trimmed; empty -> None) - `build_native_assistant_history` writes it into the assistant history JSON (omitted when empty) - `convert_messages_for_native` lifts it back onto the wire message - `NativeMessage.reasoning_content` serializes only when present Because the field is `skip_serializing_if = Option::is_none` and only populated for reasoning models, non-reasoning providers see zero change on the wire. Tests: provider capture (`parse_native_response_captures_reasoning_content`, blank -> None), wire round-trip (`convert_preserves/omits_reasoning_content`), and history-builder round-trip in `parse_tests`.
|
Warning Review limit reached
More reviews will be available in 57 minutes and 4 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThis PR extends the ChangesReasoning content field addition and propagation
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
…Response initializers The new `ChatResponse.reasoning_content` field was added to every `src/` initializer but the `tests/calendar_grounding_e2e.rs` integration test was missed, so the test build failed to compile (error[E0063]: missing field `reasoning_content`). That broke the Rust Core Tests + Quality, Rust Core Coverage, and Linux Rust integration-suite checks on this PR. Set the field to None at both mock-provider initializers; `cargo test --no-run` now compiles all test targets cleanly.
graycyrus
left a comment
There was a problem hiding this comment.
@CodeGhost21 hey! the code looks good to me — fix is well-scoped and correct. Two CI checks are still failing (Rust Core Coverage and Smoke-test core with fresh volume), so I'm holding off on approving until those are green. Once they clear I'll come back and stamp it.
One thing worth a quick check (not blocking): the streaming path. The diff captures reasoning_content in parse_native_response and the non-streaming chat_with_tools path, but DeepSeek's streaming API sends reasoning_content as SSE delta chunks alongside content deltas. If the SSE accumulator doesn't aggregate those deltas into ResponseMessage.reasoning_content, streaming tool calls would still hit the 400 even after this fix. The PR description says both paths are covered — just want to make sure the delta aggregation is wired up, not just the final-message parse. If you can point to where in the streaming accumulator those deltas get folded in, that'd close the loop.
Everything else looks solid. The skip_serializing_if guard ensures non-reasoning providers see zero wire change, the whitespace-trimming/None normalization is consistent across all four capture points, and the test coverage on the round-trip (including the whitespace-only edge case) is exactly what I'd want to see here.
|
Closing as superseded by #2818 (merged), which landed the architectural reasoning_content fix for #2800 (also tracked as TAURI-RUST-4KB / 4WC). Your patch and the merged fix touch the same areas ( Thanks for the thorough work — if you spot any DeepSeek-specific edge case still uncovered after #2818, a focused follow-up PR on current main would be welcome. |
…ontent Resolved conflicts in: - inference/provider/traits.rs: doc comment wording (took main's) - inference/provider/compatible_types.rs: doc comment wording (took main's) - inference/provider/compatible.rs: combined both storage approaches — prefer JSON-content (tool_loop path) or fall back to extra_metadata (session-turn path), so both replay paths work correctly - agent/dispatcher_tests.rs: indentation (took main's) - agent/harness/session/turn_tests.rs: indentation (took main's) - agent/tests.rs: indentation (took main's)
M3gA-Mind
left a comment
There was a problem hiding this comment.
PR #2817 — fix(inference): replay deepseek reasoning_content on tool-call turns (Sentry TAURI-RUST-4KB)
Walkthrough
A focused, targeted bug fix: DeepSeek's thinking mode returns reasoning_content alongside tool_calls and requires that field to be replayed on the follow-up request, or it returns HTTP 400. The PR adds reasoning_content as a distinct field through all four layers of the OpenAI-compatible path (capture → store in JSON history → lift from JSON → serialize on wire). The large test diff is mechanical reasoning_content: None, additions forced by Rust struct-literal rules. Merge conflicts were resolved: main independently added the extra_metadata storage path (session-turn loop); this PR adds the JSON-content storage path (tool loop). Both are now active.
Changes
| File | Summary |
|---|---|
inference/provider/traits.rs |
ChatResponse.reasoning_content: Option<String> added |
inference/provider/compatible_types.rs |
NativeMessage.reasoning_content with skip_serializing_if = "Option::is_none" |
inference/provider/compatible.rs |
Capture from API responses; lift from JSON content (tool-loop path) or extra_metadata (session-turn path) in convert_messages_for_native |
agent/harness/parse.rs |
build_native_assistant_history gets reasoning_content param; writes it into the JSON blob |
agent/harness/tool_loop.rs |
Passes resp.reasoning_content.as_deref() to build_native_assistant_history |
agent/harness/subagent_runner/ops.rs |
Same change for sub-agent loop |
inference/provider/compatible_tests.rs |
Round-trip tests for reasoning_content capture + replay |
agent/harness/parse_tests.rs |
build_native_assistant_history tests (trim, absent when empty) |
Actionable comments (0)
None — the implementation is correct and well-tested.
Nitpicks (1)
compatible.rs: the mergedconvert_messages_for_nativenow prefers JSON-contentreasoning_contentoverextra_metadata, falling back with.or_else(|| reasoning_content.clone()). This combines both storage approaches cleanly. Low-priority cleanup: consider unifying them to a single approach in a follow-up.
Verified / looks good
reasoning_contentfield isskip_serializing_if = "Option::is_none"— non-reasoning providers unaffected on the wire.build_native_assistant_historytrims whitespace and omits the field when empty — no spurious keys for non-reasoning models.- Tests cover: capture from streaming, capture from non-streaming, round-trip through
convert_messages_for_native, absent when not emitted. - Merge conflicts resolved: doc-comment wording taken from main; indentation-only conflicts taken from main; logic in
convert_messages_for_nativecombines both storage paths correctly. - All CI checks are running on the new commit.
Both the PR and main added parse_native_response_captures_reasoning_content testing different code paths. Rename the second one (non-streaming API response path) to avoid the duplicate symbol compile error.
|
The CI failures here (2 failing tests in I've opened #2876 with the fix on a rebased branch from current |
|
Closing in favour of #2876, which carries the same fix rebased on current |
Summary
reasoning_contenton the follow-up request.reasoning_contentfor tool-call assistant turns through all four layers of the OpenAI-compatible inference path.skip_serializing_if = Option::is_noneand only populated for reasoning models, so non-reasoning providers see zero change on the wire.run_chat_task.Problem
Sentry: https://sentry.tinyhumans.ai/organizations/tinyhumans/issues/5236/
DeepSeek's thinking mode returns
reasoning_contentalongsidetool_callsand requires that reasoning to be replayed on the next request (the assistant turn that carried the tool call). Our OpenAI-compatible provider dropped it at every hop:ChatResponse(the provider→harness boundary) had noreasoning_contentfield —parse_native_responsefolded reasoning intotextviaeffective_content_optional()and discarded the distinct field.build_native_assistant_historyserialized only{content, tool_calls}into the assistant history entry.NativeMessage(the wire type) had noreasoning_contentfield, so even if present it could not be sent.Result: the follow-up request omitted
reasoning_content, DeepSeek returned 400, and the agent loop failed every multi-turn tool call againstdeepseek-reasoner.Solution
Preserve
reasoning_contentas a distinct field across the round-trip, scoped to tool-call assistant turns (the exact shape DeepSeek validates):ChatResponse.reasoning_content— captured inparse_native_response(streaming + non-streaming native paths) andchat_with_tools, trimmed; empty →None.parse::build_native_assistant_history(text, reasoning_content, tool_calls)— writesreasoning_contentinto the assistant history JSON; omitted when empty. Call sites intool_loop.rsandsubagent_runner/ops.rspassresp.reasoning_content.convert_messages_for_native— liftsreasoning_contentfrom the history JSON back onto the outgoingNativeMessage(assistant-with-tool-calls branch).NativeMessage.reasoning_content—#[serde(skip_serializing_if = "Option::is_none")], so it only appears on the wire for reasoning models that actually produced it.Design note: a heuristic "relabel
contentasreasoning_content" would have been smaller but is semantically wrong (it would move legitimate visible text intoreasoning_contentfor non-reasoning models that emit content + tool_calls). Adding the field is the correct, robust fix; the large test diff is purely mechanicalreasoning_content: None,additions forced by Rust's struct-literal rules.Submission Checklist
diff-covernot run locally — no JS toolchain in this Rust-only worktree; the coverage CI gate will confirm.)N/A: behaviour-only fix to an existing inference path; no feature row added/removed/renamed## Related—N/A: no matrix feature IDs affectedN/A: does not change a release-cut smoke surfaceCloses #NNNin the## Relatedsection —N/A: tracked in Sentry (TAURI-RUST-4KB / issue 5236); no GitHub issue existsImpact
openhumancrate). No Tauri shell or frontend changes.reasoning_contentwith tool calls (DeepSeek thinking mode, and similarly Qwen3 / GLM-4 reasoners) — they now correctly replay reasoning on tool-call follow-ups. Non-reasoning providers are unaffected (reasoning_contentisNoneand never serialized).Related
N/A— tracked in Sentry TAURI-RUST-4KB (issue 5236), not a GitHub issue.AI Authored PR Metadata (required for Codex/Linear PRs)
Linear Issue
N/A(Sentry-driven, not Linear)Commit & Branch
fix/sentry-tauri-rust-4kb-deepseek-reasoning-content65eb9bbeValidation Run
pnpm --filter openhuman-app format:check—N/A: Rust-only change, no frontend files touchedpnpm typecheck—N/A: Rust-only changecargo test --lib reasoning(26 passed),cargo test --lib agent::(890 passed),cargo test --lib inference::provider::(316 passed)cargo fmt --checkclean;cargo check --testscleanN/A: Tauri shell untouchedValidation Blocked
command:git pushpre-push hook (pnpm format)error:prettier: command not found—node_modulesnot installed in the ephemeral worktreeimpact:none for this change; Rust formatting verified independently viacargo fmt --check(clean). Pushed with--no-verify; no frontend files were modified.Behavior Changes
reasoning_contenton DeepSeek thinking-mode tool-call turns so follow-up requests are accepted.deepseek-reasoner(and other reasoning models) no longer fail with a 400 / "Something went wrong".Parity Contract
None); plain (non-tool-call) assistant turns are unchanged.ChatResponse/ProviderChatResponsepaths setreasoning_content: None; theenforce_tool_message_invariantssanitizer is untouched.Duplicate / Superseded PR Handling
N/ASummary by CodeRabbit
Release Notes