Problem
When using a reasoning model such as GLM-5 (ZhiPu / 智谱) as the VLM backend, the model spends its entire token budget on `reasoning_content` and returns an empty `content` field. This is because GLM-5 requires `enable_thinking: false` to disable chain-of-thought reasoning for structured tasks like memory extraction.
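The failure mode looks roughly like this (the response shape follows the OpenAI-compatible format GLM exposes; the field values here are invented for illustration):

```python
# Illustration only: a reasoning model's reply with thinking enabled.
reply = {
    "choices": [{
        "message": {
            # What the memory-extraction code reads -> empty result.
            "content": "",
            # Where all the tokens actually went.
            "reasoning_content": "Let me think about which facts to extract...",
        }
    }],
}

message = reply["choices"][0]["message"]
print(repr(message["content"]))  # '' -> extraction produces nothing
```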
Current Behavior
The `OpenAIVLM` backend has a `_supports_enable_thinking()` check that only recognizes DashScope (Alibaba Cloud) endpoints:

```python
_DASHSCOPE_HOSTS = {
    "dashscope.aliyuncs.com",
    "dashscope-intl.aliyuncs.com",
}
```

The LiteLLM backend does not support `enable_thinking` at all.
For ZhiPu (`open.bigmodel.cn`) and other OpenAI-compatible providers with reasoning models, there is no way to pass `enable_thinking: false` through configuration.
Proposed Solution
Add a config option in `ov.conf` to control thinking behavior, e.g.:

```json
{
  "vlm": {
    "backend": "openai",
    "model": "glm-5",
    "api_base": "https://open.bigmodel.cn/api/coding/paas/v4",
    "enable_thinking": false
  }
}
```

This would be more general than hardcoding host allowlists, and would work for any provider that supports the `enable_thinking` parameter (ZhiPu, DashScope, DeepSeek, etc.).
Alternative: support an `extra_body` config field to pass arbitrary parameters to the API call.
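To illustrate, a sketch of how the backend could fold both proposed keys into the request (the function name and config keys are hypothetical, not existing OpenViking options; openai-python's `client.chat.completions.create(..., extra_body=...)` forwards the dict verbatim into the JSON body):

```python
def build_request_kwargs(vlm_cfg: dict) -> dict:
    """Translate the proposed ov.conf keys into chat-completion kwargs."""
    kwargs = {"model": vlm_cfg["model"]}
    # Start from any user-supplied extra_body, then layer enable_thinking on top.
    extra = dict(vlm_cfg.get("extra_body", {}))
    if "enable_thinking" in vlm_cfg:
        extra["enable_thinking"] = vlm_cfg["enable_thinking"]
    if extra:
        kwargs["extra_body"] = extra
    return kwargs

print(build_request_kwargs({"model": "glm-5", "enable_thinking": False}))
# {'model': 'glm-5', 'extra_body': {'enable_thinking': False}}
```

No host allowlist is needed: the provider simply ignores unknown body fields, or the user omits the key.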
Workaround
Currently we patch `openai_vlm.py` directly to add `open.bigmodel.cn` to the host allowlist, but the patch is overwritten on every upgrade.
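Until a config option exists, an upgrade-safe variant of this workaround is to mutate the allowlist at process startup instead of editing the installed file. The pattern looks like this (a stand-in module is used below, since the real import path for `openai_vlm` depends on the install and is an assumption):

```python
import types

# Stand-in for the real backend module; in practice you would import
# the openai_vlm module from OpenViking here (exact path is an assumption).
backend = types.SimpleNamespace(_DASHSCOPE_HOSTS={"dashscope.aliyuncs.com"})

def allow_thinking_host(module, host: str) -> None:
    # Mutating the set at startup survives package upgrades,
    # unlike editing openai_vlm.py inside site-packages.
    module._DASHSCOPE_HOSTS.add(host)

allow_thinking_host(backend, "open.bigmodel.cn")
print(sorted(backend._DASHSCOPE_HOSTS))
# ['dashscope.aliyuncs.com', 'open.bigmodel.cn']
```

This still relies on a private name, so a supported config option remains the proper fix.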
Environment
- OpenViking: 0.2.12
- Model: GLM-5 (ZhiPu)
- Backend: openai (switched from litellm due to same limitation)
- Python: 3.14