
[codex] support openai extra request params#7

Closed
Felix3322 wants to merge 3 commits into scukeqi:main from Felix3322:codex/openai-extra-request-params

Conversation

@Felix3322

Summary

This PR adds configurable passthrough request fields for the OpenAI-compatible backend so Wisdom-Weasel can talk to provider-specific OpenAI-style APIs that require extra parameters beyond the current fixed request body.

This is related to the follow-up discussion in #4 about supporting vendor-specific controls such as disabling thinking / reasoning output or limiting CoT / reasoning budgets on OpenAI-compatible services.

User impact

Before this change, the OpenAI-compatible request body was effectively hard-coded to:

  • model
  • messages
  • max_tokens
  • temperature

That worked for basic OpenAI-compatible services, but it blocked or limited compatibility with providers that expect extra JSON fields or custom HTTP headers. In practice that means users could not cleanly pass options like:

  • reasoning_effort
  • provider-specific thinking controls
  • chat_template_kwargs.enable_thinking
  • thinking_budget
  • custom routing / vendor headers

As a result, some providers would either ignore the intended behavior, return overly expensive / verbose reasoning, or require users to patch the source code for every backend variation.

Root cause

The OpenAI-compatible provider and the memory-compression request path both constructed request payloads from a fixed set of fields in C++ and had no general mechanism to:

  1. read arbitrary nested config from weasel.yaml
  2. serialize that config back into JSON
  3. merge it into the outgoing request body
  4. append custom headers without hard-coding them in source
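The missing glue described above can be sketched as follows. This is a minimal illustration, not the actual ConfigJsonUtils.h code: the names `EscapeJsonString` and `MergeExtraBody` are hypothetical, and the real helper also serializes nested maps, lists, and non-string scalars, while this sketch handles only flat string fields.

```cpp
#include <cstdio>
#include <map>
#include <sstream>
#include <string>

// Escape a string for safe embedding in a JSON document.
std::string EscapeJsonString(const std::string& in) {
    std::ostringstream out;
    for (unsigned char c : in) {
        switch (c) {
            case '"':  out << "\\\""; break;
            case '\\': out << "\\\\"; break;
            case '\b': out << "\\b"; break;
            case '\f': out << "\\f"; break;
            case '\n': out << "\\n"; break;
            case '\r': out << "\\r"; break;
            case '\t': out << "\\t"; break;
            default:
                if (c < 0x20) {  // remaining control characters -> \u00XX
                    char buf[8];
                    std::snprintf(buf, sizeof(buf), "\\u%04x", c);
                    out << buf;
                } else {
                    out << c;
                }
        }
    }
    return out.str();
}

// Merge flat string key/value pairs (stand-ins for config scalars) into a
// serialized JSON object body that ends with '}'.
std::string MergeExtraBody(const std::string& body,
                           const std::map<std::string, std::string>& extra) {
    if (extra.empty() || body.empty() || body.back() != '}') return body;
    std::ostringstream out;
    out << body.substr(0, body.size() - 1);
    bool first = (body.size() == 2);  // body was "{}": no comma before first key
    for (const auto& kv : extra) {
        out << (first ? "\"" : ",\"") << EscapeJsonString(kv.first)
            << "\":\"" << EscapeJsonString(kv.second) << "\"";
        first = false;
    }
    out << '}';
    return out.str();
}
```

With this shape, the fixed request body stays authoritative and user-supplied fields are appended after it, which is why later keys can extend but not silently corrupt the base payload.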

Fix

This PR introduces a reusable config-to-JSON helper and wires it into both OpenAI-style HTTP call paths.

Code changes

  • add WeaselServer/ConfigJsonUtils.h
    • escapes JSON strings safely
    • recursively serializes Rime config maps / lists / scalars into JSON
    • loads string-like header maps from config
  • extend OpenAICompatibleProvider with:
    • llm/openai/extra_body
    • llm/openai/extra_headers
  • extend MemoryCompressor with:
    • llm/memory/extra_body
    • llm/memory/extra_headers
  • preserve default Content-Type and Authorization behavior unless those headers are explicitly overridden in config
  • update README.md with concrete YAML examples for passing reasoning / thinking-related vendor parameters
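The header-preservation rule above can be sketched like this (`BuildHeaders` is a hypothetical name, not the PR's API; the trick is that `std::map::emplace` leaves an existing key untouched, which gives exactly the "default unless explicitly overridden" behavior):

```cpp
#include <map>
#include <string>

// Build the outgoing header set: user-supplied extra_headers (from
// weasel.yaml) win, and the stock Content-Type / Authorization defaults
// are added only when the user did not override them.
std::map<std::string, std::string> BuildHeaders(
    const std::map<std::string, std::string>& extra_headers,
    const std::string& api_key) {
    std::map<std::string, std::string> headers = extra_headers;
    // emplace is a no-op when the key already exists.
    headers.emplace("Content-Type", "application/json");
    headers.emplace("Authorization", "Bearer " + api_key);
    return headers;
}
```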

Example config enabled by this PR

llm:
  openai:
    api_url: "https://your-provider.example/v1/chat/completions"
    api_key: "your-api-key"
    model: "your-model"
    max_tokens: 20
    temperature: "0.6"
    extra_body:
      reasoning_effort: "low"
      thinking:
        type: "disabled"
      chat_template_kwargs:
        enable_thinking: false
        thinking_budget: 0
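The same passthrough mechanism covers custom headers and the memory-compression path. A sketch using the config keys listed in this PR (the header name and its values are placeholders, not required by any real provider):

```yaml
llm:
  openai:
    extra_headers:
      X-Vendor-Route: "low-latency"   # placeholder vendor header
  memory:
    extra_body:
      reasoning_effort: "low"
    extra_headers:
      X-Vendor-Route: "batch"
```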

Validation

I did not rely on CLion's project model for validation.

Checks performed:

  • verified GitHub CLI auth and repo access via gh auth status
  • verified the real MSVC toolchain environment via msvc-latest.bat x64
  • confirmed cl.exe and MSBuild.exe are available in that environment
  • re-ran IDE-level inspections on the modified C++ files after the changes

Full repository build was not completed locally because this workspace does not currently have BOOST_ROOT / env.bat configured for the project build script.

Notes

This PR is intentionally scoped to the OpenAI-compatible / memory-compressor passthrough parameter work discussed in #4 comments. It does not attempt to implement the larger llama.cpp constrained decoding / beam search request from the issue title.

@Felix3322
Author

Addressed per the latest review comment:

  • The OpenAI-compatible backend now requests JSON output by default (a built-in response_format: {"type":"json_object"}, unless the user sets their own extra_body.response_format)
  • The prompt was updated accordingly to return only a {"candidates":[...]} structure
  • The parser gained more robust JSON string and array handling: it deals with escape characters and first tries to parse the structured candidates
  • If a compatible backend does not support this JSON constraint and no user-defined response_format is present, it automatically falls back to the original plain-text request path, avoiding a compatibility regression

Rebuilt the relevant WeaselServer targets locally in a separate clean worktree with the real MSVC environment; currently passing.

@KagaJiankui

Thanks for the quick commit. You could consider swapping the order of the JSON and plain-text paths, because some providers actively reject a response_format parameter that does not match their prescribed schema:

  • When no response_format-style parameter is configured, default to the plain-text prompt and plain-text parsing
  • Enable the constrained-decoding parameter only after response_format is explicitly configured

If you think this is hard to implement, or just that I am asking for too much, feel free to ignore this suggestion ovo

@Felix3322
Author

> Thanks for the quick commit. You could consider swapping the order of the JSON and plain-text paths, because some providers actively reject a response_format parameter that does not match their prescribed schema:
>
>   • When no response_format-style parameter is configured, default to the plain-text prompt and plain-text parsing
>   • Enable the constrained-decoding parameter only after response_format is explicitly configured
>
> If you think this is hard to implement, or just that I am asking for too much, feel free to ignore this suggestion ovo

Do you see performance problems when deleting text quickly?

@Felix3322
Author

> Thanks for the quick commit. You could consider swapping the order of the JSON and plain-text paths, because some providers actively reject a response_format parameter that does not match their prescribed schema:
>
>   • When no response_format-style parameter is configured, default to the plain-text prompt and plain-text parsing
>   • Enable the constrained-decoding parameter only after response_format is explicitly configured
>
> If you think this is hard to implement, or just that I am asking for too much, feel free to ignore this suggestion ovo

If this isn't just an isolated case, we could simply skip computing candidates after a backspace.

@KagaJiankui

> Do you see performance problems when deleting text quickly?

The performance problem I hit is stutter during fast text input, especially when typing the "`" character: if the current keystroke has not been committed yet, the next keystroke is guaranteed to be dropped. Fast deletion has not caused me any performance problems.

@Felix3322
Author

Felix3322 commented Mar 19, 2026

> > Do you see performance problems when deleting text quickly?
>
> The performance problem I hit is stutter during fast text input, especially when typing the "`" character: if the current keystroke has not been committed yet, the next keystroke is guaranteed to be dropped. Fast deletion has not caused me any performance problems.

When I type ` there is no response at all on my machine.

@Felix3322
Author

Maybe my CPU is degraded; in any case, my machine lags with this installed.

@Felix3322
Author

Even my mouse cursor freezes... ridiculous.

@KagaJiankui

> Maybe my CPU is degraded; in any case, my machine lags with this installed.

The liquid metal on my laptop was applied off-center, so the CPU can't go above 3.0 GHz, but it is usually fine, just occasionally not quite responsive. With a local model, though, I frequently get boost::archive-related IPC exceptions.

@Felix3322
Author

> > Maybe my CPU is degraded; in any case, my machine lags with this installed.
>
> The liquid metal on my laptop was applied off-center, so the CPU can't go above 3.0 GHz, but it is usually fine, just occasionally not quite responsive. With a local model, though, I frequently get boost::archive-related IPC exceptions.

I just repasted mine and it still can't reach 4 GHz (power-locked at 65 W, and after-sales service in China won't cover this out-of-the-way place anyway).

@Felix3322
Author

Adjusted per this latest suggestion:

  • response_format is no longer injected by default
  • When llm/openai/extra_body/response_format is not explicitly configured, the plain-text prompt and plain-text parsing path is used
  • The structured JSON output path is taken only when the user explicitly configures response_format
  • If parsing the explicit JSON output fails, it still falls back to plain text for one more attempt, so candidates are never lost entirely
  • README now also documents that response_format is an explicit opt-in capability

Rebuilt the relevant WeaselServer targets locally in a clean worktree with the real MSVC environment; currently passing.

Corresponding commit: 887d363 (default openai output to plain text)
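Under this opt-in model, a user who wants the structured JSON path configures it explicitly; a hedged example (key path taken from this PR's description, value shown in the standard OpenAI json_object form):

```yaml
llm:
  openai:
    extra_body:
      response_format:
        type: "json_object"
```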

@Felix3322
Author

Superseded by #10.

PR #10 contains the current combined local working state and should be used as the active review target going forward. Closing this older split PR to avoid fragmented review.

@Felix3322
Author

Closed as superseded by #10.

Felix3322 closed this Mar 19, 2026