fix(kosong): parse DeepSeek inline tool calls when a backend leaves them unstructured by mikkihugo · Pull Request #798 · MoonshotAI/kimi-code

mikkihugo · 2026-06-15T20:55:53Z

Problem

DeepSeek-architecture models (deepseek-v3/r1 and derivatives such as cogito) emit tool calls in a special-token format rather than as OpenAI tool_calls:

<|tool▁calls▁begin|><|tool▁call▁begin|>function<|tool▁sep|>NAME
```json
{ ...args... }
```<|tool▁call▁end|><|tool▁calls▁end|>

DeepSeek's own API parses this server-side and returns structured tool_calls. Many OpenAI-compatible deployments don't. Self-hosted vLLM/SGLang/llama.cpp, ollama, and some proxy layers pass the raw tokens straight through into the assistant content. kosong then sees an ordinary text turn with no tool call, the agent has nothing to dispatch, and the turn dead-ends: the model effectively narrates the call instead of making it.

Where I ran into it

ollama-cloud serving cogito-2.1:671b (DeepSeek-MoE based). At temperature 0 it leaks the tokens on essentially every tool turn; ollama's own parser only catches them some of the time. Any DeepSeek-format model behind a backend that doesn't structure tool calls hits the same wall.

Change

OpenAILegacyChatProvider now detects a leaked DeepSeek tool-call block and parses it into structured tool_calls — but only when the response carried no structured tool call and the begin-token is present. For any backend that already does the right thing (DeepSeek's API included) it's a no-op.

Both streaming and non-streaming paths.
Streaming is marker-aware: text is forwarded live up to the begin-token, then the block is suppressed so the raw tokens never reach the UI, with a one-token holdback so a begin-token split across two deltas is still caught.
Argument blocks that aren't valid JSON are skipped rather than emitted as a broken call.
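A minimal sketch of the parse-and-gate step described above, for readers who want to see the token format handled concretely. It is not the PR's deepseek-inline-tool-calls.ts: `parseInlineToolCalls`, `resolveToolCalls` and `InlineToolCall` are hypothetical names, and it only recognises the ASCII-bar sentinels shown in the Problem section (later review in this thread points out that real leaks also use full-width bars).

```typescript
// Hypothetical sketch: sentinels as written in the Problem section
// (ASCII "|" bars, U+2581 "▁" separators). Real leaks may use full-width bars.
const SEP = "\u2581";
const CALLS_BEGIN = `<|tool${SEP}calls${SEP}begin|>`;
const CALL_BEGIN = `<|tool${SEP}call${SEP}begin|>`;
const CALL_END = `<|tool${SEP}call${SEP}end|>`;
const TOOL_SEP = `<|tool${SEP}sep|>`;

interface InlineToolCall {
  name: string;
  arguments: string; // JSON text, mirroring OpenAI's tool_calls[].function.arguments
}

/** Extracts calls from a leaked block; returns [] when the begin token is absent. */
function parseInlineToolCalls(content: string): InlineToolCall[] {
  const start = content.indexOf(CALLS_BEGIN);
  if (start === -1) return []; // no begin token: no-op
  const calls: InlineToolCall[] = [];
  // Each segment after a call-begin token holds one "function<sep>NAME" line plus a json fence.
  for (const segment of content.slice(start).split(CALL_BEGIN).slice(1)) {
    const body = segment.split(CALL_END)[0];
    const sepIdx = body.indexOf(TOOL_SEP);
    if (sepIdx === -1) continue;
    const rest = body.slice(sepIdx + TOOL_SEP.length);
    const newline = rest.indexOf("\n");
    if (newline === -1) continue;
    const name = rest.slice(0, newline).trim();
    // Capture the text between the opening and closing triple-backtick fences.
    const fenced = /`{3}(?:json)?\s*([\s\S]*?)\s*`{3}/.exec(rest.slice(newline));
    if (name.length === 0 || fenced === null) continue;
    try {
      JSON.parse(fenced[1]);
    } catch {
      continue; // invalid JSON: skip rather than emit a broken call
    }
    calls.push({ name, arguments: fenced[1] });
  }
  return calls;
}

/** Fallback gate: only parse when the backend returned no structured call. */
function resolveToolCalls(structured: InlineToolCall[], content: string): InlineToolCall[] {
  return structured.length > 0 ? structured : parseInlineToolCalls(content);
}
```

Keeping the gate separate from the parser makes the "no-op when tool_calls are already structured" property easy to test on its own.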

The parser and the stream filter are a small standalone module (deepseek-inline-tool-calls.ts) so they're unit-testable on their own.
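The marker-aware streaming behaviour can be sketched as a small filter class. This is an illustration under assumptions, not the module's actual code: `InlineBlockFilter` is a hypothetical name, it matches only the ASCII begin-token, and it discards the suppressed block instead of collecting it for the parser. The holdback is `MARKER.length - 1` characters, which is how the first Codex review in this thread describes the PR's "one-token holdback".

```typescript
// Hypothetical sketch of a marker-aware stream filter (ASCII begin-token only).
const SEP = "\u2581";
const MARKER = `<|tool${SEP}calls${SEP}begin|>`;

class InlineBlockFilter {
  private buffer = "";
  private suppressing = false;

  /** Feed one content delta; returns the text that is safe to show now (possibly ""). */
  push(delta: string): string {
    if (this.suppressing) return ""; // inside the leaked block: show nothing
    this.buffer += delta;
    const idx = this.buffer.indexOf(MARKER);
    if (idx !== -1) {
      // Marker found (possibly assembled from several deltas): release the prefix, then suppress.
      this.suppressing = true;
      const prefix = this.buffer.slice(0, idx);
      this.buffer = "";
      return prefix;
    }
    // Hold back MARKER.length - 1 chars: they could be the start of a marker split across deltas.
    const safe = Math.max(0, this.buffer.length - (MARKER.length - 1));
    const visible = this.buffer.slice(0, safe);
    this.buffer = this.buffer.slice(safe);
    return visible;
  }

  /** End of stream: release held text unless suppressing; clearing the buffer makes it idempotent. */
  flush(): string {
    const rest = this.suppressing ? "" : this.buffer;
    this.buffer = "";
    return rest;
  }
}
```

Because `push` searches the accumulated buffer rather than each delta, a begin-token split as `"<|tool"` + `"▁calls▁begin|>"` is still caught.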

Tests

test/providers/deepseek-inline-tool-calls.test.ts covers single and parallel calls, the no-token no-op, invalid-JSON skipping, and the streaming filter (passthrough, suppression, split-marker detection). Typecheck and the kosong suite pass locally.

End to end against ollama-cloud/cogito, agentic tool loops that previously never completed now run reliably.

…enAI-compatible backends DeepSeek-architecture models (deepseek-v3/r1 and derivatives like cogito) emit tool calls as special tokens rather than OpenAI tool_calls. DeepSeek's own API structures these server-side, but many compatible deployments — self-hosted vLLM/SGLang/llama.cpp, ollama, some proxies — leak the raw <|tool_calls_begin|> tokens into the assistant content, so the agent sees no tool call and the turn dead-ends. The OpenAI chat-completions provider now parses those tokens into structured tool calls and strips them from visible text, but only when the backend returned no structured call — a no-op for providers that already do the right thing. Covers both streaming (marker-aware, so a split begin-token is still caught) and non-streaming responses.

changeset-bot · 2026-06-15T20:55:56Z

🦋 Changeset detected

Latest commit: 42d0a7a

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

Name	Type
@moonshot-ai/kosong	Patch


Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a defensive fallback to detect and parse DeepSeek-style inline tool-call tokens when an OpenAI-compatible backend fails to return structured tool_calls, ensuring tool calls are still dispatched and raw tokens aren’t shown in user-visible text.

Changes:

Introduces a DeepSeek inline tool-call parser and streaming-safe content filter.
Updates the OpenAI legacy provider to strip leaked inline tool-call blocks from visible text and emit parsed tool calls when appropriate.
Adds Vitest coverage for the parser and filter behaviors.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File	Description
packages/kosong/src/providers/deepseek-inline-tool-calls.ts	Adds parser for DeepSeek inline tool-call token blocks plus a streaming filter to suppress tokens from UI text.
packages/kosong/src/providers/openai-legacy.ts	Uses the parser/filter as a fallback when `tool_calls` are missing, stripping inline tokens from text and yielding tool calls.
packages/kosong/test/providers/deepseek-inline-tool-calls.test.ts	Adds tests for parsing and filtering inline tool calls, including split-marker streaming behavior.
.changeset/deepseek-inline-tool-calls.md	Documents the patch release and rationale for the fallback behavior.


+    if (typeof message.content === 'string' && message.content.length > 0) {
+      const text =
+        inlineToolCalls.length > 0
+          ? message.content.slice(0, message.content.indexOf(DEEPSEEK_TOOL_CALLS_BEGIN))
+          : message.content;
+      if (text.length > 0) {
+        yield { type: 'text', text } satisfies StreamedMessagePart;
+      }
    }


+  flush(): string {
+    return this.suppressing ? '' : this.buffer;
+  }


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b7c35de0b0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-15T20:59:29Z

+          const visible = inlineFilter.push(delta.content);
+          if (visible.length > 0) {
+            yield { type: 'text', text: visible } satisfies StreamedMessagePart;


Flush held-back text before yielding structured tool calls

For normal streamed OpenAI responses that already use delta.tool_calls, this filter still holds back up to DEEPSEEK_TOOL_CALLS_BEGIN.length - 1 characters. If the content preamble is shorter than that (for example the existing "Checking weather. " prefix is 18 chars while the holdback is 19), visible is empty here, the following tool_calls delta is yielded first, and the text is only emitted by the end-of-stream flush. That reorders onMessagePart/UI output for providers that are supposed to be a no-op for this fallback; flush or release the buffered text before yielding structured tool-call parts.
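One way to apply that suggestion, sketched under assumptions: when a structured tool_calls delta arrives, release the held-back text before yielding the call. `HoldbackFilter`, the passthrough switch in `releaseHoldback`, and the simplified `Delta`/`Part` shapes are hypothetical stand-ins for the provider's real types. The `suppressing` guard mirrors the later follow-up commit in this thread, so a structured call arriving mid-block cannot leak raw tokens.

```typescript
// Hypothetical sketch: release held-back text before yielding a structured tool call.
const SEP = "\u2581";
const MARKER = `<|tool${SEP}calls${SEP}begin|>`;

type Delta = { content?: string; toolCallName?: string };
type Part = { type: "text"; text: string } | { type: "tool_call"; name: string };

class HoldbackFilter {
  private buffer = "";
  private suppressing = false;
  private passthrough = false; // assumption: once structured calls appear, stop holding back

  push(delta: string): string {
    if (this.suppressing) return "";
    if (this.passthrough) return delta;
    this.buffer += delta;
    const idx = this.buffer.indexOf(MARKER);
    if (idx !== -1) {
      this.suppressing = true;
      const prefix = this.buffer.slice(0, idx);
      this.buffer = "";
      return prefix;
    }
    const safe = Math.max(0, this.buffer.length - (MARKER.length - 1));
    const visible = this.buffer.slice(0, safe);
    this.buffer = this.buffer.slice(safe);
    return visible;
  }

  /** Called before a structured call is yielded; a no-op once a leaked block is being suppressed. */
  releaseHoldback(): string {
    if (this.suppressing) return "";
    this.passthrough = true;
    const held = this.buffer;
    this.buffer = "";
    return held;
  }

  flush(): string {
    const rest = this.suppressing ? "" : this.buffer;
    this.buffer = "";
    return rest;
  }
}

function* toParts(deltas: Delta[]): Generator<Part> {
  const filter = new HoldbackFilter();
  for (const d of deltas) {
    if (d.content !== undefined) {
      const visible = filter.push(d.content);
      if (visible.length > 0) yield { type: "text", text: visible };
    }
    if (d.toolCallName !== undefined) {
      const held = filter.releaseHoldback(); // keeps text-before-call ordering
      if (held.length > 0) yield { type: "text", text: held };
      yield { type: "tool_call", name: d.toolCallName };
    }
  }
  const rest = filter.flush();
  if (rest.length > 0) yield { type: "text", text: rest };
}
```

With the 18-character "Checking weather. " preamble from the comment, `push` returns nothing (holdback is 19), so without `releaseHoldback` the call would be yielded first.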


…potent
- Non-stream path: strip the DeepSeek token block from visible text whenever the begin token is present, not only when a call parses, so a malformed emission never renders raw tokens to the user.
- DeepSeekInlineToolCallFilter.flush() clears its buffer, so it is idempotent and a reused instance can't re-emit.
- Tests for both, plus non-stream provider coverage (valid and malformed blocks).

mikkihugo · 2026-06-15T21:04:09Z

Thanks for the review — both addressed in 60eab4d:

Non-stream path now strips the token block based on the begin token's presence rather than a successful parse, so a malformed emission no longer renders raw tokens (it yields the clean prefix with no dispatched call).
DeepSeekInlineToolCallFilter.flush() clears its buffer, so it's idempotent and a reused instance can't re-emit.

Added tests for both, plus non-stream provider coverage for the valid and malformed cases.
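A minimal sketch of the non-stream change, assuming only the ASCII begin-token: the strip decision depends on the token's presence, not on whether a call parsed. `visibleText` is a hypothetical helper, not the provider's code. The explicit `-1` check matters because `content.slice(0, content.indexOf(token))` would silently drop the last character when the token is absent.

```typescript
// Hypothetical sketch: the non-stream strip decision depends only on the begin token.
const SEP = "\u2581";
const MARKER = `<|tool${SEP}calls${SEP}begin|>`;

/** Text to show for a complete (non-streamed) message. */
function visibleText(content: string): string {
  const idx = content.indexOf(MARKER);
  // Strip from the begin token whenever it is present, even if no call parses,
  // so a malformed block never reaches the user as raw tokens.
  return idx === -1 ? content : content.slice(0, idx);
}
```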

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 60eab4d849


chatgpt-codex-connector · 2026-06-15T21:06:26Z

+ */
+
+const SEP = '▁'; // ▁
+export const DEEPSEEK_TOOL_CALLS_BEGIN = `<|tool${SEP}calls${SEP}begin|>`;


Support the actual DeepSeek sentinels

For the DeepSeek/vLLM leak this fallback is meant to recover, raw content uses full-width sentinels like <｜tool▁call▁begin｜> and the reported vLLM repro can omit the outer calls-begin token entirely, starting at tool_call_begin (see vllm-project/vllm#21727). Because this constant is the ASCII <|...|> form and parseDeepSeekInlineToolCalls returns early unless it is present, both streaming and non-streaming paths still render those leaked calls as plain text and dispatch no tool call, so the targeted self-hosted DeepSeek scenario continues to dead-end.
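A sketch of bar-agnostic boundary detection along the lines this comment asks for, under stated assumptions: the token spellings are taken from the comment (full-width bar U+FF5C, `▁` separators), and `findBlockStart` and `BOUNDARY` are hypothetical names. The regex accepts either bar style and either the outer calls-begin or a per-call call-begin token.

```typescript
// Hypothetical sketch: find the first leaked boundary regardless of bar style.
// Accepts ASCII "|" or full-width "｜" (U+FF5C), and either the outer
// calls-begin token or a per-call call-begin token ("calls?" matches both).
const BAR = "[|\\uFF5C]";
const BOUNDARY = new RegExp(`<${BAR}tool\\u2581calls?\\u2581begin${BAR}>`);

/** Index where suppression/parsing should start, or -1 if no boundary is present. */
function findBlockStart(content: string): number {
  const match = BOUNDARY.exec(content);
  return match === null ? -1 : match.index;
}
```

With two possible boundaries, a streaming holdback would need to cover the longer one (the calls-begin form) so that neither can slip through when split across deltas.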


…ering Bot reviewers (Codex + advreview panel) flagged:
- Real DeepSeek/vLLM leaks use full-width sentinels (U+FF5C) and can omit the outer calls-begin token. Detection is now bar-agnostic and anchors on either the calls-begin or a per-call call-begin boundary, so the self-hosted vLLM/SGLang/llama.cpp case is actually recovered.
- A short text preamble before a structured tool_calls delta could be reordered after the call: a structured call now releases held text first, in order.
- releaseHoldback() is a no-op once suppression has begun, so a structured call arriving mid-block can't flip the filter to passthrough and leak the rest of the raw tokens.

Covered by tests (full-width, omitted wrapper, reorder, mid-block guard).

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 42d0a7ab87


chatgpt-codex-connector · 2026-06-15T21:35:50Z

+
+  /** Feed a content delta; returns the text safe to yield now (possibly empty). */
+  push(delta: string): string {
+    this.full += delta;


Avoid retaining every normal streamed response

For ordinary OpenAI-compatible streams with no DeepSeek marker, this still appends every text delta to full, but content is only read at EOF when sawToolBlock is true. That makes the fallback not a no-op for long normal completions: the provider now keeps a second copy of the entire assistant text and repeatedly concatenates it for no benefit. Consider starting the parse accumulator only after a block boundary is found, seeding it with the held buffer at that point.
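A sketch of the suggested fix, not code from the PR (it is not known from this thread whether it was adopted): keep only the holdback buffer for ordinary streams, and start the parse accumulator at the boundary, seeded from the held buffer. `LazyBlockFilter` and `blockText` are hypothetical names, and only the ASCII begin-token is matched.

```typescript
// Hypothetical sketch of the reviewer's suggestion: ordinary streams retain only the
// holdback buffer; the parse accumulator starts at the block boundary.
const SEP = "\u2581";
const MARKER = `<|tool${SEP}calls${SEP}begin|>`;

class LazyBlockFilter {
  private buffer = ""; // between pushes, at most MARKER.length - 1 chars while no boundary is seen
  private block: string | null = null; // raw text from the boundary on, once seen

  push(delta: string): string {
    if (this.block !== null) {
      this.block += delta; // only the leaked block is accumulated
      return "";
    }
    this.buffer += delta;
    const idx = this.buffer.indexOf(MARKER);
    if (idx !== -1) {
      this.block = this.buffer.slice(idx); // seed with the held text from the boundary on
      const prefix = this.buffer.slice(0, idx);
      this.buffer = "";
      return prefix;
    }
    const safe = Math.max(0, this.buffer.length - (MARKER.length - 1));
    const visible = this.buffer.slice(0, safe);
    this.buffer = this.buffer.slice(safe);
    return visible;
  }

  /** Raw block text for the parser at end of stream, or null for an ordinary response. */
  blockText(): string | null {
    return this.block;
  }

  flush(): string {
    const rest = this.block === null ? this.buffer : "";
    this.buffer = "";
    return rest;
  }
}
```

For a long normal completion, memory stays bounded by the holdback instead of growing with a second copy of the whole response.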


Copilot AI review requested due to automatic review settings June 15, 2026 20:55

Copilot AI reviewed Jun 15, 2026

chatgpt-codex-connector Bot reviewed Jun 15, 2026
mikkihugo wants to merge 3 commits into MoonshotAI:main from mikkihugo:fix/deepseek-inline-tool-parse
