Compaction fails with 400 when using third-party model with large context window
What version of Kimi Code is running?
0.16.0
Which open platform/subscription were you using?
Self-configured third-party provider (ZhiPu / GLM 5.2 via OpenAI-compatible API)
Which model were you using?
zhipu/glm-5.2 (GLM 5.2, 1,000,000 context window, 131,072 max output)
What platform is your computer?
Windows 11 x64
What issue are you seeing?
When context compaction is triggered, the LLM API returns a 400 error:
Compaction cancelled
Error: [compaction.failed] APIStatusError: 400
max_tokens参数非法:限制数值范围[1,131072] (in English: "max_tokens parameter is invalid: allowed range is [1, 131072]")
The root cause is that the compaction path in full.ts does not pass maxOutputSize to resolveCompletionBudget(), so the computed max_completion_tokens cap falls back to max_context_tokens (1,000,000 in this case), which exceeds the API's hard limit of 131,072.
What steps can reproduce the bug?
- Configure a third-party model with a large context window and a smaller max output limit:
[models."zhipu/glm-5.2"]
provider = "zhipu"
model = "glm-5.2"
max_context_size = 1000000
max_output_size = 131072
capabilities = [ "thinking" ]
[loop_control]
reserved_context_size = 50000
compaction_trigger_ratio = 0.99
- Set this model as default_model.
- Use Kimi Code normally until context compaction is triggered.
- Observe the 400 error: max_tokens exceeds the allowed range [1, 131072].
What is the expected behavior?
Compaction should respect max_output_size as a hard cap on max_completion_tokens, just like the main loop does.
Root Cause Analysis
There are two code paths that call resolveCompletionBudget():
Main loop path (correct) — packages/agent-core/src/agent/index.ts lines 221-224:
const completionBudgetConfig = resolveCompletionBudget({
  maxOutputSize: this.config.maxOutputSize, // ✅ passes maxOutputSize
  reservedContextSize: loopControl?.reservedContextSize,
});
Compaction path (buggy) — packages/agent-core/src/agent/compaction/full.ts lines 264-270:
const provider = applyCompletionBudget({
  provider: this.agent.config.provider,
  budget: resolveCompletionBudget({
    reservedContextSize: this.agent.kimiConfig?.loopControl?.reservedContextSize,
    // ❌ maxOutputSize is NOT passed
  }),
  capability: this.agent.config.modelCapabilities,
});
Because maxOutputSize is not passed, resolveCompletionBudget() skips the hardCap branch (lines 32-34 of completion-budget.ts) and only sets fallback from reservedContextSize. Then, in computeCompletionBudgetCap() (lines 62-64), since there is no hardCap and max_context_tokens is positive, the cap becomes max_context_tokens (1,000,000); the fallback is never consulted:
const cap =
  args.budget.hardCap ??
  (maxCtx > 0 ? maxCtx : args.budget.fallback ?? DEFAULT_UNKNOWN_CONTEXT_FALLBACK);
This results in max_completion_tokens = 1,000,000 being sent to the API, which rejects it.
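The fall-through can be reproduced in isolation. The following is a minimal sketch, not the actual contents of completion-budget.ts: the CompletionBudget type, the simplified function bodies, and the value given to DEFAULT_UNKNOWN_CONTEXT_FALLBACK are assumptions made for illustration (environment-variable handling is omitted entirely). Only the cap expression is taken from the snippet quoted above.

```typescript
// Simplified model of the budget logic described in this issue.
// Names mirror the issue text; the bodies are assumptions, not the real code.
interface CompletionBudget {
  hardCap?: number;  // set only when maxOutputSize is provided
  fallback?: number; // derived from reservedContextSize
}

// Assumed placeholder value; the real constant is not shown in the issue.
const DEFAULT_UNKNOWN_CONTEXT_FALLBACK = 32_000;

function resolveCompletionBudget(opts: {
  maxOutputSize?: number;
  reservedContextSize?: number;
}): CompletionBudget {
  if (opts.maxOutputSize !== undefined && opts.maxOutputSize > 0) {
    return { hardCap: opts.maxOutputSize };
  }
  return { fallback: opts.reservedContextSize };
}

function computeCompletionBudgetCap(args: {
  budget: CompletionBudget;
  maxContextTokens: number;
}): number {
  const maxCtx = args.maxContextTokens;
  // Same expression as the one quoted from computeCompletionBudgetCap().
  return (
    args.budget.hardCap ??
    (maxCtx > 0 ? maxCtx : args.budget.fallback ?? DEFAULT_UNKNOWN_CONTEXT_FALLBACK)
  );
}

// Main loop: maxOutputSize is passed, so the hard cap wins.
const mainLoopCap = computeCompletionBudgetCap({
  budget: resolveCompletionBudget({ maxOutputSize: 131_072, reservedContextSize: 50_000 }),
  maxContextTokens: 1_000_000,
});

// Compaction: maxOutputSize is omitted, so the context size becomes the cap.
const compactionCap = computeCompletionBudgetCap({
  budget: resolveCompletionBudget({ reservedContextSize: 50_000 }),
  maxContextTokens: 1_000_000,
});

console.log(mainLoopCap, compactionCap);
```

Under these assumptions the main loop yields 131072 while the compaction path yields 1000000, which matches the rejected request.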
Suggested Fix
In packages/agent-core/src/agent/compaction/full.ts, add maxOutputSize to the resolveCompletionBudget() call:
const provider = applyCompletionBudget({
  provider: this.agent.config.provider,
  budget: resolveCompletionBudget({
    maxOutputSize: this.agent.config.maxOutputSize, // ← add this line
    reservedContextSize: this.agent.kimiConfig?.loopControl?.reservedContextSize,
  }),
  capability: this.agent.config.modelCapabilities,
});
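As an extra guard against a future call site omitting maxOutputSize again, the computed cap could also be clamped just before the request is built. This is only a sketch of the idea: clampCompletionTokens is a hypothetical helper that does not exist in the codebase, and where it would be called is left open.

```typescript
// Hypothetical helper (not part of Kimi Code): enforce the model's configured
// output limit on whatever cap the budget logic produced.
function clampCompletionTokens(cap: number, maxOutputSize?: number): number {
  if (maxOutputSize === undefined || maxOutputSize <= 0) {
    return cap; // no configured limit, leave the cap unchanged
  }
  return Math.min(cap, maxOutputSize);
}

// With the values from this report, the oversized cap is reduced to the limit.
const clamped = clampCompletionTokens(1_000_000, 131_072);
// Without a configured limit, the cap passes through untouched.
const unclamped = clampCompletionTokens(1_000_000, undefined);

console.log(clamped, unclamped);
```

A clamp like this would not replace the one-line fix above; it only keeps the configured max_output_size authoritative regardless of which path computed the cap.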
Workaround
Set the environment variable KIMI_MODEL_MAX_COMPLETION_TOKENS=131072 — this env var has the highest priority in resolveCompletionBudget() and acts as a hardCap for both paths.
Additional information
- This bug affects third-party models with large context windows (e.g., 1M tokens) where max_context_size greatly exceeds max_output_size. Kimi's own models (e.g., K2.7 with 262K context) are less likely to trigger it.
- The code comment in computeCompletionBudgetCap() says "The provider backend computes the safe request-specific value from the serialized prompt", but this assumption does not hold for third-party OpenAI-compatible providers that enforce a strict max_tokens limit.