
[codex] support openai extra request params#7

Closed
Felix3322 wants to merge 3 commits into scukeqi:main from Felix3322:codex/openai-extra-request-params

Conversation

@Felix3322

Summary

This PR adds configurable passthrough request fields for the OpenAI-compatible backend so Wisdom-Weasel can talk to provider-specific OpenAI-style APIs that require extra parameters beyond the current fixed request body.

This is related to the follow-up discussion in #4 about supporting vendor-specific controls such as disabling thinking / reasoning output or limiting CoT / reasoning budgets on OpenAI-compatible services.

User impact

Before this change, the OpenAI-compatible request body was effectively hard-coded to:

  • model
  • messages
  • max_tokens
  • temperature

That worked for basic OpenAI-compatible services, but it blocked or limited compatibility with providers that expect extra JSON fields or custom HTTP headers. In practice that means users could not cleanly pass options like:

  • reasoning_effort
  • provider-specific thinking controls
  • chat_template_kwargs.enable_thinking
  • thinking_budget
  • custom routing / vendor headers

As a result, some providers would either ignore the intended behavior, return overly expensive / verbose reasoning, or require users to patch the source code for every backend variation.

Root cause

The OpenAI-compatible provider and the memory-compression request path both constructed request payloads from a fixed set of fields in C++ and had no general mechanism to:

  1. read arbitrary nested config from weasel.yaml
  2. serialize that config back into JSON
  3. merge it into the outgoing request body
  4. append custom headers without hard-coding them in source
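The missing glue described above can be sketched as follows. This is a minimal illustration, not the actual ConfigJsonUtils.h code: the names `EscapeJsonString` and `MergeExtraBody` are hypothetical, and the real helper also serializes nested maps, lists, and non-string scalars, while this sketch handles only flat string fields.

```cpp
#include <cstdio>
#include <map>
#include <sstream>
#include <string>

// Escape a string for safe embedding in a JSON document.
std::string EscapeJsonString(const std::string& in) {
    std::ostringstream out;
    for (unsigned char c : in) {
        switch (c) {
            case '"':  out << "\\\""; break;
            case '\\': out << "\\\\"; break;
            case '\b': out << "\\b"; break;
            case '\f': out << "\\f"; break;
            case '\n': out << "\\n"; break;
            case '\r': out << "\\r"; break;
            case '\t': out << "\\t"; break;
            default:
                if (c < 0x20) {  // remaining control characters -> \u00XX
                    char buf[8];
                    std::snprintf(buf, sizeof(buf), "\\u%04x", c);
                    out << buf;
                } else {
                    out << c;
                }
        }
    }
    return out.str();
}

// Merge flat string key/value pairs (stand-ins for config scalars) into a
// serialized JSON object body that ends with '}'.
std::string MergeExtraBody(const std::string& body,
                           const std::map<std::string, std::string>& extra) {
    if (extra.empty() || body.empty() || body.back() != '}') return body;
    std::ostringstream out;
    out << body.substr(0, body.size() - 1);
    bool first = (body.size() == 2);  // body was "{}": no comma before first key
    for (const auto& kv : extra) {
        out << (first ? "\"" : ",\"") << EscapeJsonString(kv.first)
            << "\":\"" << EscapeJsonString(kv.second) << "\"";
        first = false;
    }
    out << '}';
    return out.str();
}
```

With this shape, the fixed request body stays authoritative and user-supplied fields are appended after it, which is why later keys can extend but not silently corrupt the base payload.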

Fix

This PR introduces a reusable config-to-JSON helper and wires it into both OpenAI-style HTTP call paths.

Code changes

  • add WeaselServer/ConfigJsonUtils.h
    • escapes JSON strings safely
    • recursively serializes Rime config maps / lists / scalars into JSON
    • loads string-like header maps from config
  • extend OpenAICompatibleProvider with:
    • llm/openai/extra_body
    • llm/openai/extra_headers
  • extend MemoryCompressor with:
    • llm/memory/extra_body
    • llm/memory/extra_headers
  • preserve default Content-Type and Authorization behavior unless those headers are explicitly overridden in config
  • update README.md with concrete YAML examples for passing reasoning / thinking-related vendor parameters
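The header-preservation rule above can be sketched like this (`BuildHeaders` is a hypothetical name, not the PR's API; the trick is that `std::map::emplace` leaves an existing key untouched, which gives exactly the "default unless explicitly overridden" behavior):

```cpp
#include <map>
#include <string>

// Build the outgoing header set: user-supplied extra_headers (from
// weasel.yaml) win, and the stock Content-Type / Authorization defaults
// are added only when the user did not override them.
std::map<std::string, std::string> BuildHeaders(
    const std::map<std::string, std::string>& extra_headers,
    const std::string& api_key) {
    std::map<std::string, std::string> headers = extra_headers;
    // emplace is a no-op when the key already exists.
    headers.emplace("Content-Type", "application/json");
    headers.emplace("Authorization", "Bearer " + api_key);
    return headers;
}
```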

Example config enabled by this PR

llm:
  openai:
    api_url: "https://your-provider.example/v1/chat/completions"
    api_key: "your-api-key"
    model: "your-model"
    max_tokens: 20
    temperature: "0.6"
    extra_body:
      reasoning_effort: "low"
      thinking:
        type: "disabled"
      chat_template_kwargs:
        enable_thinking: false
        thinking_budget: 0
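The same passthrough mechanism covers custom headers and the memory-compression path. A sketch using the config keys listed in this PR (the header name and its values are placeholders, not required by any real provider):

```yaml
llm:
  openai:
    extra_headers:
      X-Vendor-Route: "low-latency"   # placeholder vendor header
  memory:
    extra_body:
      reasoning_effort: "low"
    extra_headers:
      X-Vendor-Route: "batch"
```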

Validation

I did not rely on CLion's project model for validation.

Checks performed:

  • verified GitHub CLI auth and repo access via gh auth status
  • verified the real MSVC toolchain environment via msvc-latest.bat x64
  • confirmed cl.exe and MSBuild.exe are available in that environment
  • re-ran IDE-level inspections on the modified C++ files after the changes

Full repository build was not completed locally because this workspace does not currently have BOOST_ROOT / env.bat configured for the project build script.

Notes

This PR is intentionally scoped to the OpenAI-compatible / memory-compressor passthrough parameter work discussed in #4 comments. It does not attempt to implement the larger llama.cpp constrained decoding / beam search request from the issue title.

@Felix3322
Author

Addressed per the latest review comment:

  • The OpenAI-compatible backend now requests JSON output by default (a built-in response_format: {"type":"json_object"}, unless the user sets their own extra_body.response_format)
  • The prompt was updated accordingly to return only a {"candidates":[...]} structure
  • The parser gained more robust JSON string and array handling: it deals with escape characters and first tries to parse the structured candidates
  • If a compatible backend does not support this JSON constraint and no user-defined response_format is present, it automatically falls back to the original plain-text request path, avoiding a compatibility regression

Rebuilt the relevant WeaselServer targets locally in a separate clean worktree with the real MSVC environment; currently passing.

@KagaJiankui

Thanks for the quick commit. You could consider swapping the order of the JSON and plain-text paths, because some providers actively reject a response_format parameter that does not match their prescribed schema:

  • When no response_format-style parameter is configured, default to the plain-text prompt and plain-text parsing
  • Enable the constrained-decoding parameter only after response_format is explicitly configured

If you think this is hard to implement, or just that I am asking for too much, feel free to ignore this suggestion ovo

@Felix3322
Author

> Thanks for the quick commit. You could consider swapping the order of the JSON and plain-text paths, because some providers actively reject a response_format parameter that does not match their prescribed schema:
>
>   • When no response_format-style parameter is configured, default to the plain-text prompt and plain-text parsing
>   • Enable the constrained-decoding parameter only after response_format is explicitly configured
>
> If you think this is hard to implement, or just that I am asking for too much, feel free to ignore this suggestion ovo

Do you see performance problems when deleting text quickly?

@Felix3322
Author

> Thanks for the quick commit. You could consider swapping the order of the JSON and plain-text paths, because some providers actively reject a response_format parameter that does not match their prescribed schema:
>
>   • When no response_format-style parameter is configured, default to the plain-text prompt and plain-text parsing
>   • Enable the constrained-decoding parameter only after response_format is explicitly configured
>
> If you think this is hard to implement, or just that I am asking for too much, feel free to ignore this suggestion ovo

If this isn't just an isolated case, we could simply skip computing candidates after a backspace.

@KagaJiankui

> Do you see performance problems when deleting text quickly?

The performance problem I hit is stutter during fast text input, especially when typing the "`" character: if the current keystroke has not been committed yet, the next keystroke is guaranteed to be dropped. Fast deletion has not caused me any performance problems.

@Felix3322
Author

Felix3322 commented Mar 19, 2026

> > Do you see performance problems when deleting text quickly?
>
> The performance problem I hit is stutter during fast text input, especially when typing the "`" character: if the current keystroke has not been committed yet, the next keystroke is guaranteed to be dropped. Fast deletion has not caused me any performance problems.

When I type ` there is no response at all on my machine.

@Felix3322
Author

Maybe my CPU is degraded; in any case, my machine lags with this installed.

@Felix3322
Author

Even my mouse cursor freezes... ridiculous.

@KagaJiankui

> Maybe my CPU is degraded; in any case, my machine lags with this installed.

The liquid metal on my laptop was applied off-center, so the CPU can't go above 3.0 GHz, but it is usually fine, just occasionally not quite responsive. With a local model, though, I frequently get boost::archive-related IPC exceptions.

@Felix3322
Author

> > Maybe my CPU is degraded; in any case, my machine lags with this installed.
>
> The liquid metal on my laptop was applied off-center, so the CPU can't go above 3.0 GHz, but it is usually fine, just occasionally not quite responsive. With a local model, though, I frequently get boost::archive-related IPC exceptions.

I just repasted mine and it still can't reach 4 GHz (power-locked at 65 W, and after-sales service in China won't cover this out-of-the-way place anyway).

@Felix3322
Author

Adjusted per this latest suggestion:

  • response_format is no longer injected by default
  • When llm/openai/extra_body/response_format is not explicitly configured, the plain-text prompt and plain-text parsing path is used
  • The structured JSON output path is taken only when the user explicitly configures response_format
  • If parsing the explicit JSON output fails, it still falls back to plain text for one more attempt, so candidates are never lost entirely
  • README now also documents that response_format is an explicit opt-in capability

Rebuilt the relevant WeaselServer targets locally in a clean worktree with the real MSVC environment; currently passing.

Corresponding commit: 887d363 (default openai output to plain text)
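Under this opt-in model, a user who wants the structured JSON path configures it explicitly; a hedged example (key path taken from this PR's description, value shown in the standard OpenAI json_object form):

```yaml
llm:
  openai:
    extra_body:
      response_format:
        type: "json_object"
```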

@Felix3322
Author

Superseded by #10.

PR #10 contains the current combined local working state and should be used as the active review target going forward. Closing this older split PR to avoid fragmented review.

@Felix3322
Author

Closed as superseded by #10.

Felix3322 closed this Mar 19, 2026