From 5401c723070d0f0fb10c770c607b354c7d6c3329 Mon Sep 17 00:00:00 2001
From: Plaidmustache
Date: Mon, 16 Feb 2026 00:33:37 +0100
Subject: [PATCH 1/3] feat(moonshot): add first-class kimi provider + routing headers

Includes provider config/validation, Anthropic-format handling, providers API exposure, and investigation summary doc.
---
 .env.example                                  |  16 ++-
 ...alling-investigation-summary-2026-02-16.md | 131 ++++++++++++++++++
 src/api/openai-router.js                      |  12 ++
 src/api/providers-handler.js                  |  14 ++
 src/clients/databricks.js                     |  64 ++++++---
 src/clients/ollama-utils.js                   |   1 +
 src/config/index.js                           |  27 +++-
 src/orchestrator/index.js                     |  19 ++-
 8 files changed, 261 insertions(+), 23 deletions(-)
 create mode 100644 docs/tool-calling-investigation-summary-2026-02-16.md

diff --git a/.env.example b/.env.example
index 8f9c917..10975c7 100644
--- a/.env.example
+++ b/.env.example
@@ -6,7 +6,7 @@
 # ==============================================================================
 
 # Primary model provider to use
-# Options: databricks, azure-anthropic, azure-openai, openrouter, openai, ollama, llamacpp, lmstudio, bedrock, zai, vertex
+# Options: databricks, azure-anthropic, azure-openai, openrouter, openai, ollama, llamacpp, lmstudio, bedrock, zai, moonshot, vertex
 # Default: databricks
 MODEL_PROVIDER=ollama
 
@@ -158,6 +158,14 @@ OLLAMA_MAX_TOOLS_FOR_ROUTING=3
 # Options: GLM-4.7, GLM-4.5-Air, GLM-4-Plus
 # ZAI_MODEL=GLM-4.7
 
+# ==============================================================================
+# Moonshot (Kimi) Configuration (Anthropic-compatible endpoint)
+# ==============================================================================
+
+# MOONSHOT_API_KEY=your-moonshot-api-key
+# MOONSHOT_ENDPOINT=https://api.moonshot.ai/anthropic/v1/messages
+# MOONSHOT_MODEL=kimi-k2.5
+
 # ==============================================================================
 # Google Vertex AI Configuration (Gemini Models)
 # ==============================================================================
@@ -361,6 +369,12 @@ HOT_RELOAD_DEBOUNCE_MS=1000
 # ZAI_MODEL=GLM-4.7
 # npm start
 
+# Moonshot (Kimi):
+# MODEL_PROVIDER=moonshot
+# MOONSHOT_API_KEY=your-moonshot-api-key
+# MOONSHOT_MODEL=kimi-k2.5
+# npm start
+
 # Google Gemini (via Vertex AI):
 # MODEL_PROVIDER=vertex
 # VERTEX_API_KEY=your-google-api-key

diff --git a/docs/tool-calling-investigation-summary-2026-02-16.md b/docs/tool-calling-investigation-summary-2026-02-16.md
new file mode 100644
index 0000000..7a520a8
--- /dev/null
+++ b/docs/tool-calling-investigation-summary-2026-02-16.md
@@ -0,0 +1,131 @@
+# Lynkr Local Tool-Calling Investigation (Summary)
+
+Date: 2026-02-16 (local)
+Workspace: `/Users/malone/Lynkr`
+Current branch: `codex/ollama-qwen3-tool-ab`
+
+## 1. Why this doc exists
+This is a compact record of what we tried, what actually worked, what failed, and why we decided to move forward with API models for now.
+
+## 2. Final decision
+Use API-backed models for tool-reliable Lynkr workflows now.
+Keep local Ollama tool-calling as an experiment track and revisit after upstream fixes.
+
+## 3. What we tested
+
+### A. Phase progression already completed
+- Phases 0-5 completed and checkpointed.
+- Phase 6 stage-1 side-by-side cutover was completed and documented.
+
+Relevant checkpoints:
+- `4561b99` fix: prevent `/v1/messages` `dbPath` ReferenceError
+- `09dc725` phase4: MLX-aware routing + tests
+- `5a4ecce` phase5: launchd + health checks
+- `b5e9ea4` phase6 stage-1 report
+- `634c532` fix: MLX-aware routing headers behavior
+
+### B. Local-model strict tool-call probes (historical session results)
+- `LIQUID_STRICT_SUMMARY pass=0 raw=0 fail=5 avg_ms=2976 tool_exec_sum=0 nonzero_steps=0`
+- `QWEN3_STRICT_SUMMARY pass=0 raw=0 fail=5 avg_ms=3360 tool_exec_sum=0 nonzero_steps=0`
+- `LLAMA31_STRICT_SUMMARY pass=0 raw=0 fail=5 avg_ms=16692 tool_exec_sum=0 nonzero_steps=0`
+- A re-check on qwen3 still failed the strict probes.
+
+### C. Isolation tests that narrowed the fault domain
+1. Direct Ollama API (`/api/chat`) with tools:
+- `qwen3:1.7b`: returned tool calls
+- `llama3.1:8b`: returned tool calls
+
+2. Lynkr OpenAI-compatible path (`/v1/chat/completions`) with tools:
+- Can return `finish_reason: tool_calls` in some runs.
+- Latest repro returned a plain-text refusal to use the provided tool, with no execution loop.
+
+3. Lynkr Anthropic-compatible path (`/v1/messages`) with tools:
+- Returned `200` but ended with:
+  - `stop_reason: end_turn`
+  - empty text content
+  - no `tool_use` block
+- Routing headers show provider `ollama`.
+
+4. Lynkr runtime logs repeatedly show:
+- tools injected: `injected 12 tools`
+- `supportsTools: true`
+- `toolCallsExecuted: 0`
+
+This strongly suggests the problem is in the Lynkr `/v1/messages` orchestration path with these local-model outputs, not simply “Ollama cannot do tools.”
+
+## 4. Code and branch context
+
+### Current working branch
+- `codex/ollama-qwen3-tool-ab`
+
+### Uncommitted local change currently present
+- `/Users/malone/Lynkr/src/clients/ollama-utils.js`
+  - added `"qwen3"` to tool-capable heuristic set.
+
+### Gemini snapshot branch (preserved separately)
+Branch: `codex/liquid-gemini-snapshot`
+Commits:
+- `95ec106` (Liquid-specific tool instructions)
+- `e7df511` (Liquid tool-call parser work in `openrouter-utils`)
+- `b28826a` (MCP HTTP transport + Liquid handling changes)
+
+Rollback safety:
+- `codex/rollback-pre-gemini` points to `634c532` (pre-gemini baseline we resumed from).
+
+## 5. Why this is hard (non-hand-wavy)
+Tool execution here depends on three layers aligning:
+1. Model emits parseable tool call objects.
+2. Provider/runtime preserves tool call structure.
+3. Lynkr `/v1/messages` loop recognizes + executes + re-injects correctly.
+
+In our runs, layer (1) and sometimes (2) worked in isolation, but layer (3) was unreliable for local models in the Anthropic-compatible loop.
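+
+As a concrete reference, this is the shape layer (1) must emit and layer (3) must recognize on `/v1/messages` (a minimal sketch; the tool id and input are illustrative, not captured from a run):
+
+```js
+// A "parseable tool call" in the Anthropic-compatible loop means the assistant
+// turn carries a tool_use block and the response ends with stop_reason "tool_use".
+const assistantTurn = {
+  role: "assistant",
+  content: [
+    { type: "text", text: "I'll read that file." },
+    { type: "tool_use", id: "toolu_01", name: "Read", input: { file_path: "README.md" } },
+  ],
+};
+// Our failing local runs instead ended with stop_reason "end_turn" and empty
+// content, so the loop had nothing to recognize or execute.
+```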
+
+## 6. External signals (checked 2026-02-15/16)
+
+### Lynkr repo
+- Open PR: [#39 Improve tool calling response handling](https://github.com/Fast-Editor/Lynkr/pull/39)
+- Merged PRs we referenced:
+  - [#31 Stop wasteful tool injection in Ollama](https://github.com/Fast-Editor/Lynkr/pull/31)
+  - [#42 SUGGESTION_MODE_MODEL](https://github.com/Fast-Editor/Lynkr/pull/42)
+- Related split/umbrella context:
+  - [#45 Improve tool calling response handling](https://github.com/Fast-Editor/Lynkr/pull/45)
+
+### Ollama repo (related edge-case reports)
+- [#10976 Thinking + tools + qwen3 empty output](https://github.com/ollama/ollama/issues/10976)
+- [#11381 Qwen3 function-call / think behavior issue](https://github.com/ollama/ollama/issues/11381)
+- [#9802 tool-call/template handling issue](https://github.com/ollama/ollama/issues/9802)
+- Official baseline behavior: [Ollama tool-calling docs](https://ollama.com/blog/tool-support)
+
+### Liquid docs
+- Tool-use format behavior and structured output guidance:
+  - [Liquid tool-use docs](https://docs.liquid.ai/docs/inference/features/tool-use)
+- Ollama compatibility note:
+  - [Liquid Ollama docs](https://docs.liquid.ai/docs/inference/ollama)
+  - Includes note referencing merged Ollama support PR for LFM2, with LFM2.1 support pending.
+
+## 7. Lynkr docs evidence we used
+- `/Users/malone/Lynkr/documentation/tools.md:22` (tool flow assumptions)
+- `/Users/malone/Lynkr/documentation/providers.md:421` (recommended tool models)
+- `/Users/malone/Lynkr/documentation/providers.md:867` (provider comparison: Ollama tool-calling = Fair)
+- `/Users/malone/Lynkr/documentation/providers.md:688` (MLX section and recommended model families)
+- `/Users/malone/Lynkr/documentation/installation.md:363` (recommended Ollama models; smaller variants struggle)
+
+## 8. Re-entry checklist (when revisiting local tools)
+1. Keep this branch frozen as baseline.
+2. Re-test strict probe after Lynkr PR #39 (or equivalent) lands.
+3. Re-test with one known strong local tool model only (avoid broad matrix first).
+4. Validate on `/v1/messages` specifically (not only `/v1/chat/completions`).
+5. Only promote after observing `toolCallsExecuted > 0` in Lynkr logs for repeated requests.
+
+## 9. Immediate move-forward implementation (Moonshot/Kimi)
+Applied on branch `codex/ollama-qwen3-tool-ab`:
+1. Added first-class `moonshot` provider support in config/dispatch/provider discovery.
+2. Switched runtime to:
+   - `MODEL_PROVIDER=moonshot`
+   - `MOONSHOT_ENDPOINT=https://api.moonshot.ai/anthropic/v1/messages`
+   - `MOONSHOT_MODEL=kimi-k2.5`
+3. Fixed misleading response headers to return actual provider (`moonshot`) from orchestrator.
+4. Smoke results:
+   - `/health/live` and `/v1/health`: `200`
+   - `/v1/messages` simple prompt: model `kimi-k2.5`, text `READY`
+   - `/v1/messages` tool prompt: `stop_reason=tool_use`, tool name `Read`

diff --git a/src/api/openai-router.js b/src/api/openai-router.js
index ab283e7..86d2239 100644
--- a/src/api/openai-router.js
+++ b/src/api/openai-router.js
@@ -767,6 +767,18 @@ function getConfiguredProviders() {
     });
   }
 
+  // Check Moonshot (Kimi)
+  if (config.moonshot?.apiKey) {
+    providers.push({
+      name: "moonshot",
+      type: "moonshot-ai",
+      models: [
+        config.moonshot.model || "kimi-k2.5",
+        "kimi-k2.5"
+      ]
+    });
+  }
+
   // Check Vertex AI (Google Cloud)
   if (config.vertex?.projectId) {
     providers.push({

diff --git a/src/api/providers-handler.js b/src/api/providers-handler.js
index 9b85848..c6bb4fa 100644
--- a/src/api/providers-handler.js
+++ b/src/api/providers-handler.js
@@ -179,6 +179,20 @@ function getConfiguredProviders() {
     });
   }
 
+  // Check Moonshot (Kimi)
+  if (config.moonshot?.apiKey) {
+    providers.push({
+      name: "moonshot",
+      type: "moonshot-ai",
+      baseUrl: config.moonshot.endpoint || "https://api.moonshot.ai/anthropic/v1/messages",
+      enabled: true,
+      models: [
+        { id: config.moonshot.model || "kimi-k2.5", name: "Configured Model" },
+        { id: "kimi-k2.5", name: "Kimi K2.5" },
+      ]
+    });
+  }
+
   // Check Vertex AI (Google Cloud)
   if (config.vertex?.projectId) {
     const region = config.vertex.region || "us-east5";

diff --git a/src/clients/databricks.js b/src/clients/databricks.js
index 9b536cd..58a8b82 100644
--- a/src/clients/databricks.js
+++ b/src/clients/databricks.js
@@ -1261,16 +1261,21 @@ async function invokeBedrock(body) {
  * Z.AI offers GLM models through an Anthropic-compatible API at ~1/7 the cost.
  * Minimal transformation needed - mostly passthrough with model mapping.
  */
-async function invokeZai(body) {
-  if (!config.zai?.apiKey) {
-    throw new Error("Z.AI API key is not configured. Set ZAI_API_KEY in your .env file.");
+async function invokeZai(body, providerOptions = {}) {
+  const providerConfig = providerOptions.config || config.zai || {};
+  const providerName = providerOptions.providerName || "Z.AI";
+  const defaultEndpoint = providerOptions.defaultEndpoint || "https://api.z.ai/api/anthropic/v1/messages";
+  const defaultModel = providerOptions.defaultModel || "glm-4.7";
+
+  if (!providerConfig.apiKey) {
+    throw new Error(`${providerName} API key is not configured.`);
   }
 
-  const endpoint = config.zai.endpoint || "https://api.z.ai/api/anthropic/v1/messages";
+  const endpoint = providerConfig.endpoint || defaultEndpoint;
   const isOpenAIFormat = endpoint.includes("/chat/completions");
 
   // Model mapping: Anthropic names → Z.AI names (lowercase)
-  const modelMap = {
+  const modelMap = providerOptions.modelMap || {
     "claude-sonnet-4-5-20250929": "glm-4.7",
     "claude-sonnet-4-5": "glm-4.7",
     "claude-sonnet-4.5": "glm-4.7",
@@ -1280,8 +1285,9 @@ async function invokeZai(body) {
     "claude-3-haiku": "glm-4.5-air",
   };
 
-  const requestedModel = body.model || config.zai.model;
-  let mappedModel = modelMap[requestedModel] || config.zai.model || "glm-4.7";
+  const requestedModel = body.model || providerConfig.model;
+  // If operator explicitly sets provider model, honor it over Claude-name mapping.
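+  // e.g. ZAI_MODEL=glm-4.5-air (or MOONSHOT_MODEL=kimi-k2.5) always wins here;
+  // otherwise a claude-* request maps through modelMap, then falls back to defaultModel.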
+ let mappedModel = providerConfig.model || modelMap[requestedModel] || defaultModel; mappedModel = mappedModel.toLowerCase(); let zaiBody; @@ -1362,7 +1368,7 @@ async function invokeZai(body) { headers = { "Content-Type": "application/json", - "Authorization": `Bearer ${config.zai.apiKey}`, + "Authorization": `Bearer ${providerConfig.apiKey}`, }; } else { // Anthropic format endpoint @@ -1376,12 +1382,12 @@ async function invokeZai(body) { injectedToolCount: STANDARD_TOOLS.length, injectedToolNames: STANDARD_TOOLS.map(t => t.name), reason: "Client did not send tools (passthrough mode)" - }, "=== INJECTING STANDARD TOOLS (Z.AI Anthropic) ==="); + }, `=== INJECTING STANDARD TOOLS (${providerName} Anthropic) ===`); } headers = { "Content-Type": "application/json", - "x-api-key": config.zai.apiKey, + "x-api-key": providerConfig.apiKey, "anthropic-version": "2023-06-01", }; } @@ -1401,20 +1407,20 @@ async function invokeZai(body) { toolNames: zaiBody.tools?.map(t => t.function?.name || t.name), toolChoice: zaiBody.tool_choice, fullRequest: JSON.stringify(zaiBody).substring(0, 500), - }, "=== Z.AI REQUEST ==="); + }, `=== ${providerName} REQUEST ===`); logger.debug({ zaiBody: JSON.stringify(zaiBody).substring(0, 1000), - }, "Z.AI request body (truncated)"); + }, `${providerName} request body (truncated)`); - // Use semaphore to limit concurrent Z.AI requests (prevents rate limiting) + // Use semaphore to limit concurrent requests (prevents rate limiting) return zaiSemaphore.run(async () => { logger.debug({ queueLength: zaiSemaphore.queue.length, currentConcurrent: zaiSemaphore.current, - }, "Z.AI semaphore status"); + }, `${providerName} semaphore status`); - const response = await performJsonRequest(endpoint, { headers, body: zaiBody }, "Z.AI"); + const response = await performJsonRequest(endpoint, { headers, body: zaiBody }, providerName); logger.info({ responseOk: response?.ok, @@ -1423,14 +1429,14 @@ async function invokeZai(body) { rawContent: response?.json?.choices?.[0]?.message?.content, hasReasoning: !!response?.json?.choices?.[0]?.message?.reasoning_content, isOpenAIFormat, - }, "=== Z.AI RAW RESPONSE ==="); + }, `=== ${providerName} RAW RESPONSE ===`); // Convert OpenAI response back to Anthropic format if needed if (isOpenAIFormat && response?.ok && response?.json) { const anthropicJson = convertOpenAIToAnthropic(response.json); logger.info({ convertedContent: JSON.stringify(anthropicJson.content).substring(0, 200), - }, "=== Z.AI CONVERTED RESPONSE ==="); + }, `=== ${providerName} CONVERTED RESPONSE ===`); // Return in the same format as other providers (with ok, status, json) return { ok: response.ok, @@ -1446,6 +1452,26 @@ async function invokeZai(body) { }); } +async function invokeMoonshot(body) { + const moonshotModelMap = { + "claude-sonnet-4-5-20250929": "kimi-k2.5", + "claude-sonnet-4-5": "kimi-k2.5", + "claude-sonnet-4.5": "kimi-k2.5", + "claude-3-5-sonnet": "kimi-k2.5", + "claude-haiku-4-5-20251001": "kimi-k2.5", + "claude-haiku-4-5": "kimi-k2.5", + "claude-3-haiku": "kimi-k2.5", + }; + + return invokeZai(body, { + providerName: "Moonshot", + config: config.moonshot, + defaultEndpoint: "https://api.moonshot.ai/anthropic/v1/messages", + defaultModel: "kimi-k2.5", + modelMap: moonshotModelMap, + }); +} + /** @@ -1883,6 +1909,8 @@ async function invokeModel(body, options = {}) { return await invokeBedrock(body); } else if (initialProvider === "zai") { return await invokeZai(body); + } else if (initialProvider === "moonshot") { + return await invokeMoonshot(body); } else if 
(initialProvider === "vertex") { return await invokeVertex(body); } @@ -1972,6 +2000,8 @@ async function invokeModel(body, options = {}) { return await invokeLlamaCpp(body); } else if (fallbackProvider === "zai") { return await invokeZai(body); + } else if (fallbackProvider === "moonshot") { + return await invokeMoonshot(body); } else if (fallbackProvider === "vertex") { return await invokeVertex(body); } diff --git a/src/clients/ollama-utils.js b/src/clients/ollama-utils.js index 7582f05..1815c8f 100644 --- a/src/clients/ollama-utils.js +++ b/src/clients/ollama-utils.js @@ -11,6 +11,7 @@ const TOOL_CAPABLE_MODELS = new Set([ "llama3.1", "llama3.2", "qwen2.5", + "qwen3", "mistral", "mistral-nemo", "firefunction-v2", diff --git a/src/config/index.js b/src/config/index.js index 466585d..42ee558 100644 --- a/src/config/index.js +++ b/src/config/index.js @@ -62,7 +62,7 @@ function resolveConfigPath(targetPath) { return path.resolve(normalised); } -const SUPPORTED_MODEL_PROVIDERS = new Set(["databricks", "azure-anthropic", "ollama", "openrouter", "azure-openai", "openai", "llamacpp", "lmstudio", "bedrock", "zai", "vertex"]); +const SUPPORTED_MODEL_PROVIDERS = new Set(["databricks", "azure-anthropic", "ollama", "openrouter", "azure-openai", "openai", "llamacpp", "lmstudio", "bedrock", "zai", "moonshot", "vertex"]); const rawModelProvider = (process.env.MODEL_PROVIDER ?? "databricks").toLowerCase(); // Validate MODEL_PROVIDER early with a clear error message @@ -132,6 +132,11 @@ const zaiApiKey = process.env.ZAI_API_KEY?.trim() || null; const zaiEndpoint = process.env.ZAI_ENDPOINT?.trim() || "https://api.z.ai/api/anthropic/v1/messages"; const zaiModel = process.env.ZAI_MODEL?.trim() || "GLM-4.7"; +// Moonshot configuration - Anthropic-compatible API for Kimi models +const moonshotApiKey = process.env.MOONSHOT_API_KEY?.trim() || null; +const moonshotEndpoint = process.env.MOONSHOT_ENDPOINT?.trim() || "https://api.moonshot.ai/anthropic/v1/messages"; +const moonshotModel = process.env.MOONSHOT_MODEL?.trim() || "kimi-k2.5"; + // Vertex AI (Google Gemini) configuration const vertexApiKey = process.env.VERTEX_API_KEY?.trim() || process.env.GOOGLE_API_KEY?.trim() || null; const vertexModel = process.env.VERTEX_MODEL?.trim() || "gemini-2.0-flash"; @@ -305,6 +310,12 @@ if (modelProvider === "bedrock" && !bedrockApiKey) { ); } +if (modelProvider === "moonshot" && !moonshotApiKey) { + throw new Error( + "Set MOONSHOT_API_KEY before starting the proxy.", + ); +} + // Validate hybrid routing configuration if (preferOllama) { if (!ollamaEndpoint) { @@ -319,7 +330,7 @@ if (preferOllama) { // Prevent local providers from being used as fallback (they can fail just like Ollama) const localProviders = ["ollama", "llamacpp", "lmstudio"]; if (fallbackEnabled && localProviders.includes(fallbackProvider)) { - throw new Error(`FALLBACK_PROVIDER cannot be '${fallbackProvider}' (local providers should not be fallbacks). Use cloud providers: databricks, azure-anthropic, azure-openai, openrouter, openai, bedrock`); + throw new Error(`FALLBACK_PROVIDER cannot be '${fallbackProvider}' (local providers should not be fallbacks). Use cloud providers: databricks, azure-anthropic, azure-openai, openrouter, openai, bedrock, zai, moonshot, vertex`); } // Ensure fallback provider is properly configured (only if fallback is enabled) @@ -336,6 +347,9 @@ if (preferOllama) { if (fallbackProvider === "bedrock" && !bedrockApiKey) { throw new Error("FALLBACK_PROVIDER is set to 'bedrock' but AWS_BEDROCK_API_KEY is not configured. 
Please set this environment variable or choose a different fallback provider."); } + if (fallbackProvider === "moonshot" && !moonshotApiKey) { + throw new Error("FALLBACK_PROVIDER is set to 'moonshot' but MOONSHOT_API_KEY is not configured. Please set this environment variable or choose a different fallback provider."); + } } } @@ -589,6 +603,11 @@ var config = { endpoint: zaiEndpoint, model: zaiModel, }, + moonshot: { + apiKey: moonshotApiKey, + endpoint: moonshotEndpoint, + model: moonshotModel, + }, vertex: { apiKey: vertexApiKey, model: vertexModel, @@ -878,7 +897,11 @@ function reloadConfig() { config.openai.apiKey = process.env.OPENAI_API_KEY?.trim() || null; config.bedrock.apiKey = process.env.AWS_BEDROCK_API_KEY?.trim() || null; config.zai.apiKey = process.env.ZAI_API_KEY?.trim() || null; + config.zai.endpoint = process.env.ZAI_ENDPOINT?.trim() || "https://api.z.ai/api/anthropic/v1/messages"; config.zai.model = process.env.ZAI_MODEL?.trim() || "GLM-4.7"; + config.moonshot.apiKey = process.env.MOONSHOT_API_KEY?.trim() || null; + config.moonshot.endpoint = process.env.MOONSHOT_ENDPOINT?.trim() || "https://api.moonshot.ai/anthropic/v1/messages"; + config.moonshot.model = process.env.MOONSHOT_MODEL?.trim() || "kimi-k2.5"; config.vertex.apiKey = process.env.VERTEX_API_KEY?.trim() || process.env.GOOGLE_API_KEY?.trim() || null; config.vertex.model = process.env.VERTEX_MODEL?.trim() || "gemini-2.0-flash"; diff --git a/src/orchestrator/index.js b/src/orchestrator/index.js index 55a47a5..38efba3 100644 --- a/src/orchestrator/index.js +++ b/src/orchestrator/index.js @@ -47,6 +47,8 @@ function getDestinationUrl(providerType) { return config.bedrock?.endpoint ?? 'unknown'; case 'zai': return config.zai?.endpoint ?? 'unknown'; + case 'moonshot': + return config.moonshot?.endpoint ?? 'unknown'; case 'vertex': return config.vertex?.endpoint ?? 
'unknown'; default: @@ -1090,6 +1092,13 @@ function sanitizePayload(payload) { // Ensure tools are in Anthropic format clean.tools = ensureAnthropicToolFormat(clean.tools); } + } else if (providerType === "moonshot") { + // Moonshot uses Anthropic-compatible endpoint and tool format + if (!Array.isArray(clean.tools) || clean.tools.length === 0) { + delete clean.tools; + } else { + clean.tools = ensureAnthropicToolFormat(clean.tools); + } } else if (providerType === "vertex") { // Vertex AI supports tools - keep them in Anthropic format if (!Array.isArray(clean.tools) || clean.tools.length === 0) { @@ -2849,12 +2858,12 @@ IMPORTANT TOOL USAGE RULES: }, "=== CONVERTED ANTHROPIC RESPONSE (llama.cpp) ==="); anthropicPayload.content = policy.sanitiseContent(anthropicPayload.content); - } else if (actualProvider === "zai") { - // Z.AI responses are already converted to Anthropic format in invokeZai + } else if (actualProvider === "zai" || actualProvider === "moonshot") { + // Z.AI/Moonshot responses are already converted to Anthropic format logger.info({ hasJson: !!databricksResponse.json, jsonContent: JSON.stringify(databricksResponse.json?.content)?.substring(0, 200), - }, "=== ZAI ORCHESTRATOR DEBUG ==="); + }, "=== ANTHROPIC-COMPAT ORCHESTRATOR DEBUG ==="); anthropicPayload = databricksResponse.json; if (Array.isArray(anthropicPayload?.content)) { anthropicPayload.content = policy.sanitiseContent(anthropicPayload.content); @@ -3205,6 +3214,10 @@ IMPORTANT TOOL USAGE RULES: response: { status: 200, body: anthropicPayload, + headers: { + "X-Lynkr-Provider": databricksResponse.actualProvider || providerType, + "X-Lynkr-Routing-Method": databricksResponse.routingDecision?.method || "static", + }, terminationReason: "completion", }, steps, From 70d226e2bc87b98e1927a28aaa4baf364ceb92dc Mon Sep 17 00:00:00 2001 From: Plaidmustache Date: Mon, 16 Feb 2026 00:50:32 +0100 Subject: [PATCH 2/3] fix(moonshot): preserve anthropic tool loop semantics for /v1/messages Treat moonshot as anthropic tool-protocol provider in orchestrator, skip smart-tool filtering on tool-followup turns, and avoid standard-tool injection when tool history exists. --- src/clients/databricks.js | 18 +++++++++- src/orchestrator/index.js | 71 +++++++++++++++++++++++++++++---------- 2 files changed, 71 insertions(+), 18 deletions(-) diff --git a/src/clients/databricks.js b/src/clients/databricks.js index 58a8b82..f71d078 100644 --- a/src/clients/databricks.js +++ b/src/clients/databricks.js @@ -1375,14 +1375,30 @@ async function invokeZai(body, providerOptions = {}) { zaiBody = { ...body }; zaiBody.model = mappedModel; + const hasToolHistory = Array.isArray(zaiBody.messages) + && zaiBody.messages.some((message) => { + if (!message || !Array.isArray(message.content)) return false; + return message.content.some((block) => ( + block?.type === "tool_use" + || block?.type === "tool_result" + || block?.type === "tool_reference" + )); + }); + // Inject standard tools if client didn't send any (passthrough mode) - if (!Array.isArray(zaiBody.tools) || zaiBody.tools.length === 0) { + // IMPORTANT: do not inject on tool-followup turns, because the model + // must continue against the exact previously-declared tool schema. 
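+    // e.g. a prior assistant {type: "tool_use"} block or user {type: "tool_result"}
+    // block (detected via hasToolHistory above) marks this request as a tool-followup turn.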
+ if ((!Array.isArray(zaiBody.tools) || zaiBody.tools.length === 0) && !hasToolHistory) { zaiBody.tools = STANDARD_TOOLS; logger.info({ injectedToolCount: STANDARD_TOOLS.length, injectedToolNames: STANDARD_TOOLS.map(t => t.name), reason: "Client did not send tools (passthrough mode)" }, `=== INJECTING STANDARD TOOLS (${providerName} Anthropic) ===`); + } else if ((!Array.isArray(zaiBody.tools) || zaiBody.tools.length === 0) && hasToolHistory) { + logger.info({ + reason: "Skipped tool injection on tool-followup turn", + }, `=== TOOL INJECTION SKIPPED (${providerName} Anthropic) ===`); } headers = { diff --git a/src/orchestrator/index.js b/src/orchestrator/index.js index 38efba3..198a05a 100644 --- a/src/orchestrator/index.js +++ b/src/orchestrator/index.js @@ -167,6 +167,23 @@ function flattenBlocks(blocks) { .join(""); } +function hasToolProtocolHistory(messages) { + if (!Array.isArray(messages) || messages.length === 0) return false; + return messages.some((message) => { + if (!message) return false; + if (message.role === "tool" || Array.isArray(message.tool_calls)) return true; + if (!Array.isArray(message.content)) return false; + return message.content.some((block) => { + if (!block || typeof block !== "object") return false; + return ( + block.type === "tool_use" || + block.type === "tool_result" || + block.type === "tool_reference" + ); + }); + }); +} + function normaliseMessages(payload, options = {}) { const flattenContent = options.flattenContent !== false; const normalised = []; @@ -209,6 +226,19 @@ function normaliseTools(tools) { })); } +function endpointUsesOpenAIChatCompletions(endpoint) { + return typeof endpoint === "string" && endpoint.includes("/chat/completions"); +} + +function providerUsesAnthropicToolProtocol(providerType) { + if (providerType === "azure-anthropic") return true; + if (providerType === "moonshot") return !endpointUsesOpenAIChatCompletions(config.moonshot?.endpoint); + if (providerType === "zai") return !endpointUsesOpenAIChatCompletions(config.zai?.endpoint); + if (providerType === "bedrock") return true; + if (providerType === "vertex") return true; + return false; +} + /** * Ensure tools are in Anthropic format for Databricks/Claude API * Databricks expects: {name, description, input_schema} @@ -529,7 +559,7 @@ function parseExecutionContent(content) { } function createFallbackAssistantMessage(providerType, { text, toolCall }) { - if (providerType === "azure-anthropic") { + if (providerUsesAnthropicToolProtocol(providerType)) { const blocks = []; if (typeof text === "string" && text.trim().length > 0) { blocks.push({ type: "text", text: text.trim() }); @@ -560,7 +590,7 @@ function createFallbackAssistantMessage(providerType, { text, toolCall }) { function createFallbackToolResultMessage(providerType, { toolCall, execution }) { const toolName = execution.name ?? toolCall.function?.name ?? "tool"; const toolId = execution.id ?? toolCall.id ?? `tool_${Date.now()}`; - if (providerType === "azure-anthropic") { + if (providerUsesAnthropicToolProtocol(providerType)) { const parsed = parseExecutionContent(execution.content); let contentBlocks; if (typeof parsed === "string" || parsed === null) { @@ -854,7 +884,7 @@ function sanitizePayload(payload) { "databricks-claude-sonnet-4-5"; clean.model = requestedModel; const providerType = config.modelProvider?.type ?? 
"databricks"; - const flattenContent = providerType !== "azure-anthropic"; + const flattenContent = !providerUsesAnthropicToolProtocol(providerType); clean.messages = normaliseMessages(clean, { flattenContent }).filter((msg) => { const hasToolCalls = Array.isArray(msg?.tool_calls) && msg.tool_calls.length > 0; @@ -872,7 +902,7 @@ function sanitizePayload(payload) { } return hasToolCalls; }); - if (providerType === "azure-anthropic") { + if (providerUsesAnthropicToolProtocol(providerType)) { const cleanedMessages = []; for (const message of clean.messages) { if (isPlaceholderToolResultMessage(message)) { @@ -1130,7 +1160,13 @@ function sanitizePayload(payload) { } // Smart tool selection (universal, applies to all providers) - if (config.smartToolSelection?.enabled && Array.isArray(clean.tools) && clean.tools.length > 0) { + const hasToolHistory = hasToolProtocolHistory(clean.messages); + if ( + config.smartToolSelection?.enabled && + Array.isArray(clean.tools) && + clean.tools.length > 0 && + !hasToolHistory + ) { const classification = classifyRequestType(clean); const selectedTools = selectToolsSmartly(clean.tools, classification, { provider: providerType, @@ -1357,6 +1393,7 @@ async function runAgentLoop({ let steps = 0; let toolCallsExecuted = 0; let fallbackPerformed = false; + const anthropicToolProtocol = providerUsesAnthropicToolProtocol(providerType); const toolCallNames = new Map(); const toolCallHistory = new Map(); // Track tool calls to detect loops: signature -> count let loopWarningInjected = false; // Track if we've already warned about loops @@ -1879,7 +1916,7 @@ IMPORTANT TOOL USAGE RULES: // Detect Anthropic format: has 'content' array and 'stop_reason' at top level (no 'choices') // This handles azure-anthropic provider AND azure-openai Responses API (which we convert to Anthropic format) - const isAnthropicFormat = providerType === "azure-anthropic" || + const isAnthropicFormat = anthropicToolProtocol || (Array.isArray(databricksResponse.json?.content) && databricksResponse.json?.stop_reason !== undefined && !databricksResponse.json?.choices); if (isAnthropicFormat) { @@ -1938,7 +1975,7 @@ IMPORTANT TOOL USAGE RULES: if (toolCalls.length > 0) { // Convert OpenAI/OpenRouter format to Anthropic format for session storage let sessionContent; - if (providerType === "azure-anthropic") { + if (anthropicToolProtocol) { // Azure Anthropic already returns content in Anthropic sessionContent = databricksResponse.json?.content ?? 
[]; } else { @@ -1999,7 +2036,7 @@ IMPORTANT TOOL USAGE RULES: }); let assistantToolMessage; - if (providerType === "azure-anthropic") { + if (anthropicToolProtocol) { // For Azure Anthropic, use the content array directly from the response // It already contains both text and tool_use blocks in the correct format assistantToolMessage = { @@ -2016,7 +2053,7 @@ IMPORTANT TOOL USAGE RULES: // Only add fallback content for Databricks format (Azure already has content) if ( - providerType !== "azure-anthropic" && + !anthropicToolProtocol && (!assistantToolMessage.content || (typeof assistantToolMessage.content === "string" && assistantToolMessage.content.trim().length === 0)) && @@ -2197,7 +2234,7 @@ IMPORTANT TOOL USAGE RULES: toolCallsExecuted += 1; let toolMessage; - if (providerType === "azure-anthropic") { + if (anthropicToolProtocol) { const parsedContent = parseExecutionContent(execution.content); const serialisedContent = typeof parsedContent === "string" || parsedContent === null @@ -2236,7 +2273,7 @@ IMPORTANT TOOL USAGE RULES: // Convert to Anthropic format for session storage let sessionToolResultContent; - if (providerType === "azure-anthropic") { + if (anthropicToolProtocol) { sessionToolResultContent = toolMessage.content; } else { sessionToolResultContent = [ @@ -2330,7 +2367,7 @@ IMPORTANT TOOL USAGE RULES: ); let toolResultMessage; - if (providerType === "azure-anthropic") { + if (anthropicToolProtocol) { // Anthropic format: tool_result in user message content array toolResultMessage = { role: "user", @@ -2357,7 +2394,7 @@ IMPORTANT TOOL USAGE RULES: // Convert to Anthropic format for session storage let sessionToolResult; - if (providerType === "azure-anthropic") { + if (anthropicToolProtocol) { sessionToolResult = toolResultMessage.content; } else { // Convert OpenRouter tool message to Anthropic format @@ -2424,7 +2461,7 @@ IMPORTANT TOOL USAGE RULES: }); let toolMessage; - if (providerType === "azure-anthropic") { + if (anthropicToolProtocol) { const parsedContent = parseExecutionContent(execution.content); const serialisedContent = typeof parsedContent === "string" || parsedContent === null @@ -2484,7 +2521,7 @@ IMPORTANT TOOL USAGE RULES: // Convert to Anthropic format for session storage let sessionToolResultContent; - if (providerType === "azure-anthropic") { + if (anthropicToolProtocol) { // Azure Anthropic already has content in correct format sessionToolResultContent = toolMessage.content; } else { @@ -3011,7 +3048,7 @@ IMPORTANT TOOL USAGE RULES: // Convert to Anthropic format for session storage let sessionFallbackContent; - if (providerType === "azure-anthropic") { + if (anthropicToolProtocol) { // Already in Anthropic format sessionFallbackContent = assistantToolMessage.content; } else { @@ -3079,7 +3116,7 @@ IMPORTANT TOOL USAGE RULES: // Convert to Anthropic format for session storage let sessionFallbackToolResult; - if (providerType === "azure-anthropic") { + if (anthropicToolProtocol) { // Already in Anthropic format sessionFallbackToolResult = toolResultMessage.content; } else { From b39b4cbb0068b221d8851b970f69dbdee6dad7cd Mon Sep 17 00:00:00 2001 From: Plaidmustache Date: Mon, 16 Feb 2026 22:13:18 +0100 Subject: [PATCH 3/3] chore(pr): exclude local investigation doc from moonshot provider PR --- ...alling-investigation-summary-2026-02-16.md | 131 ------------------ 1 file changed, 131 deletions(-) delete mode 100644 docs/tool-calling-investigation-summary-2026-02-16.md diff --git a/docs/tool-calling-investigation-summary-2026-02-16.md 
b/docs/tool-calling-investigation-summary-2026-02-16.md
deleted file mode 100644
index 7a520a8..0000000
--- a/docs/tool-calling-investigation-summary-2026-02-16.md
+++ /dev/null
@@ -1,131 +0,0 @@
-# Lynkr Local Tool-Calling Investigation (Summary)
-
-Date: 2026-02-16 (local)
-Workspace: `/Users/malone/Lynkr`
-Current branch: `codex/ollama-qwen3-tool-ab`
-
-## 1. Why this doc exists
-This is a compact record of what we tried, what actually worked, what failed, and why we decided to move forward with API models for now.
-
-## 2. Final decision
-Use API-backed models for tool-reliable Lynkr workflows now.
-Keep local Ollama tool-calling as an experiment track and revisit after upstream fixes.
-
-## 3. What we tested
-
-### A. Phase progression already completed
-- Phases 0-5 completed and checkpointed.
-- Phase 6 stage-1 side-by-side cutover was completed and documented.
-
-Relevant checkpoints:
-- `4561b99` fix: prevent `/v1/messages` `dbPath` ReferenceError
-- `09dc725` phase4: MLX-aware routing + tests
-- `5a4ecce` phase5: launchd + health checks
-- `b5e9ea4` phase6 stage-1 report
-- `634c532` fix: MLX-aware routing headers behavior
-
-### B. Local-model strict tool-call probes (historical session results)
-- `LIQUID_STRICT_SUMMARY pass=0 raw=0 fail=5 avg_ms=2976 tool_exec_sum=0 nonzero_steps=0`
-- `QWEN3_STRICT_SUMMARY pass=0 raw=0 fail=5 avg_ms=3360 tool_exec_sum=0 nonzero_steps=0`
-- `LLAMA31_STRICT_SUMMARY pass=0 raw=0 fail=5 avg_ms=16692 tool_exec_sum=0 nonzero_steps=0`
-- A re-check on qwen3 still failed the strict probes.
-
-### C. Isolation tests that narrowed the fault domain
-1. Direct Ollama API (`/api/chat`) with tools:
-- `qwen3:1.7b`: returned tool calls
-- `llama3.1:8b`: returned tool calls
-
-2. Lynkr OpenAI-compatible path (`/v1/chat/completions`) with tools:
-- Can return `finish_reason: tool_calls` in some runs.
-- Latest repro returned a plain-text refusal to use the provided tool, with no execution loop.
-
-3. Lynkr Anthropic-compatible path (`/v1/messages`) with tools:
-- Returned `200` but ended with:
-  - `stop_reason: end_turn`
-  - empty text content
-  - no `tool_use` block
-- Routing headers show provider `ollama`.
-
-4. Lynkr runtime logs repeatedly show:
-- tools injected: `injected 12 tools`
-- `supportsTools: true`
-- `toolCallsExecuted: 0`
-
-This strongly suggests the problem is in the Lynkr `/v1/messages` orchestration path with these local-model outputs, not simply “Ollama cannot do tools.”
-
-## 4. Code and branch context
-
-### Current working branch
-- `codex/ollama-qwen3-tool-ab`
-
-### Uncommitted local change currently present
-- `/Users/malone/Lynkr/src/clients/ollama-utils.js`
-  - added `"qwen3"` to tool-capable heuristic set.
-
-### Gemini snapshot branch (preserved separately)
-Branch: `codex/liquid-gemini-snapshot`
-Commits:
-- `95ec106` (Liquid-specific tool instructions)
-- `e7df511` (Liquid tool-call parser work in `openrouter-utils`)
-- `b28826a` (MCP HTTP transport + Liquid handling changes)
-
-Rollback safety:
-- `codex/rollback-pre-gemini` points to `634c532` (pre-gemini baseline we resumed from).
-
-## 5. Why this is hard (non-hand-wavy)
-Tool execution here depends on three layers aligning:
-1. Model emits parseable tool call objects.
-2. Provider/runtime preserves tool call structure.
-3. Lynkr `/v1/messages` loop recognizes + executes + re-injects correctly.
-
-In our runs, layer (1) and sometimes (2) worked in isolation, but layer (3) was unreliable for local models in the Anthropic-compatible loop.
-
-## 6. External signals (checked 2026-02-15/16)
-
-### Lynkr repo
-- Open PR: [#39 Improve tool calling response handling](https://github.com/Fast-Editor/Lynkr/pull/39)
-- Merged PRs we referenced:
-  - [#31 Stop wasteful tool injection in Ollama](https://github.com/Fast-Editor/Lynkr/pull/31)
-  - [#42 SUGGESTION_MODE_MODEL](https://github.com/Fast-Editor/Lynkr/pull/42)
-- Related split/umbrella context:
-  - [#45 Improve tool calling response handling](https://github.com/Fast-Editor/Lynkr/pull/45)
-
-### Ollama repo (related edge-case reports)
-- [#10976 Thinking + tools + qwen3 empty output](https://github.com/ollama/ollama/issues/10976)
-- [#11381 Qwen3 function-call / think behavior issue](https://github.com/ollama/ollama/issues/11381)
-- [#9802 tool-call/template handling issue](https://github.com/ollama/ollama/issues/9802)
-- Official baseline behavior: [Ollama tool-calling docs](https://ollama.com/blog/tool-support)
-
-### Liquid docs
-- Tool-use format behavior and structured output guidance:
-  - [Liquid tool-use docs](https://docs.liquid.ai/docs/inference/features/tool-use)
-- Ollama compatibility note:
-  - [Liquid Ollama docs](https://docs.liquid.ai/docs/inference/ollama)
-  - Includes note referencing merged Ollama support PR for LFM2, with LFM2.1 support pending.
-
-## 7. Lynkr docs evidence we used
-- `/Users/malone/Lynkr/documentation/tools.md:22` (tool flow assumptions)
-- `/Users/malone/Lynkr/documentation/providers.md:421` (recommended tool models)
-- `/Users/malone/Lynkr/documentation/providers.md:867` (provider comparison: Ollama tool-calling = Fair)
-- `/Users/malone/Lynkr/documentation/providers.md:688` (MLX section and recommended model families)
-- `/Users/malone/Lynkr/documentation/installation.md:363` (recommended Ollama models; smaller variants struggle)
-
-## 8. Re-entry checklist (when revisiting local tools)
-1. Keep this branch frozen as baseline.
-2. Re-test strict probe after Lynkr PR #39 (or equivalent) lands.
-3. Re-test with one known strong local tool model only (avoid broad matrix first).
-4. Validate on `/v1/messages` specifically (not only `/v1/chat/completions`).
-5. Only promote after observing `toolCallsExecuted > 0` in Lynkr logs for repeated requests.
-
-## 9. Immediate move-forward implementation (Moonshot/Kimi)
-Applied on branch `codex/ollama-qwen3-tool-ab`:
-1. Added first-class `moonshot` provider support in config/dispatch/provider discovery.
-2. Switched runtime to:
-   - `MODEL_PROVIDER=moonshot`
-   - `MOONSHOT_ENDPOINT=https://api.moonshot.ai/anthropic/v1/messages`
-   - `MOONSHOT_MODEL=kimi-k2.5`
-3. Fixed misleading response headers to return actual provider (`moonshot`) from orchestrator.
-4. Smoke results:
-   - `/health/live` and `/v1/health`: `200`
-   - `/v1/messages` simple prompt: model `kimi-k2.5`, text `READY`
-   - `/v1/messages` tool prompt: `stop_reason=tool_use`, tool name `Read`
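
The `/v1/messages` smoke check recorded above can be reproduced with a short script along these lines (a minimal sketch; the local port, the `LYNKR_URL` variable, and the `Read` tool schema are illustrative assumptions, not defined by this series):

```js
// Minimal re-run of the moonshot smoke checks against a local Lynkr proxy.
// Assumes Node 18+ (global fetch) and MODEL_PROVIDER=moonshot; adjust the port.
const BASE = process.env.LYNKR_URL || "http://localhost:3000";

async function main() {
  const res = await fetch(`${BASE}/v1/messages`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "kimi-k2.5",
      max_tokens: 256,
      messages: [{ role: "user", content: "Read README.md" }],
      tools: [{
        name: "Read",
        description: "Read a file from the workspace",
        input_schema: {
          type: "object",
          properties: { file_path: { type: "string" } },
          required: ["file_path"],
        },
      }],
    }),
  });
  const json = await res.json();
  const toolUse = (json.content || []).find((b) => b.type === "tool_use");
  // Expected per the smoke results: status 200, stop_reason "tool_use",
  // tool name "Read", and the fixed X-Lynkr-Provider header reporting "moonshot".
  console.log(res.status, json.stop_reason, toolUse?.name,
    res.headers.get("x-lynkr-provider"));
}

main().catch((err) => { console.error(err); process.exit(1); });
```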