Compaction fails with 400: maxOutputSize not passed to resolveCompletionBudget in compaction path #834

@li-xiu-qi

Compaction fails with 400 when using third-party model with large context window

What version of Kimi Code is running?

0.16.0

Which open platform/subscription were you using?

Self-configured third-party provider (ZhiPu / GLM 5.2 via OpenAI-compatible API)

Which model were you using?

zhipu/glm-5.2 (GLM 5.2, 1,000,000 context window, 131,072 max output)

What platform is your computer?

Windows 11 x64

What issue are you seeing?

When context compaction is triggered, the LLM API returns a 400 error:

Compaction cancelled
Error: [compaction.failed] APIStatusError: 400
max_tokens参数非法:限制数值范围[1,131072]
(English: "Invalid max_tokens parameter: value must be in the range [1, 131072]")

The root cause is that the compaction path in full.ts does not pass maxOutputSize to resolveCompletionBudget(), so the computed max_completion_tokens cap falls back to max_context_tokens (1,000,000 in this case), which exceeds the API's hard limit of 131,072.

What steps can reproduce the bug?

  1. Configure a third-party model with a large context window and a smaller max output limit:
[models."zhipu/glm-5.2"]
provider = "zhipu"
model = "glm-5.2"
max_context_size = 1000000
max_output_size = 131072
capabilities = [ "thinking" ]

[loop_control]
reserved_context_size = 50000
compaction_trigger_ratio = 0.99
  2. Set this model as default_model.

  3. Use Kimi Code normally until context compaction is triggered.

  4. Observe the 400 error: max_tokens exceeds the allowed range [1, 131072].

What is the expected behavior?

Compaction should respect max_output_size as a hard cap on max_completion_tokens, just like the main loop does.

Root Cause Analysis

There are two code paths that call resolveCompletionBudget():

Main loop path (correct) — packages/agent-core/src/agent/index.ts lines 221-224:

const completionBudgetConfig = resolveCompletionBudget({
  maxOutputSize: this.config.maxOutputSize,      // ✅ passes maxOutputSize
  reservedContextSize: loopControl?.reservedContextSize,
});

Compaction path (buggy) — packages/agent-core/src/agent/compaction/full.ts lines 264-270:

const provider = applyCompletionBudget({
  provider: this.agent.config.provider,
  budget: resolveCompletionBudget({
    reservedContextSize: this.agent.kimiConfig?.loopControl?.reservedContextSize,
    // ❌ maxOutputSize is NOT passed
  }),
  capability: this.agent.config.modelCapabilities,
});

Because maxOutputSize is not passed, resolveCompletionBudget() skips the hardCap branch (lines 32-34 of completion-budget.ts) and falls through to using reservedContextSize as the fallback. Then, in computeCompletionBudgetCap() (lines 62-64), because there is no hardCap, the cap resolves to max_context_tokens (1,000,000) instead:

const cap =
  args.budget.hardCap ??
  (maxCtx > 0 ? maxCtx : args.budget.fallback ?? DEFAULT_UNKNOWN_CONTEXT_FALLBACK);

This results in max_completion_tokens = 1,000,000 being sent to the API, which rejects it.
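
To make the fallthrough concrete, here is a minimal standalone sketch of the cap selection. It is not the project's code: the CompletionBudget shape, the computeCap name, and the value of DEFAULT_UNKNOWN_CONTEXT_FALLBACK are assumptions for illustration. Only the cap expression itself is copied from the snippet above.

```typescript
// Simplified model of the cap selection described above.
// The interface shape and the fallback constant's value are assumptions.
interface CompletionBudget {
  hardCap?: number; // set only when a max output size is known
  fallback?: number; // derived from reservedContextSize
}

// Placeholder value; the real constant is not shown in this issue.
const DEFAULT_UNKNOWN_CONTEXT_FALLBACK = 32_000;

function computeCap(budget: CompletionBudget, maxCtx: number): number {
  // Same expression as the one quoted from completion-budget.ts.
  return (
    budget.hardCap ??
    (maxCtx > 0 ? maxCtx : budget.fallback ?? DEFAULT_UNKNOWN_CONTEXT_FALLBACK)
  );
}

const maxCtx = 1_000_000;

// Compaction path as reported: no hardCap, so the context window wins.
const buggyCap = computeCap({ fallback: 50_000 }, maxCtx);

// With maxOutputSize forwarded, hardCap short-circuits the fallthrough.
const fixedCap = computeCap({ hardCap: 131_072, fallback: 50_000 }, maxCtx);

console.log(buggyCap, fixedCap); // 1000000 131072
```

Note that the fallback from reservedContextSize is never consulted when the context size is known and positive, which is why setting reserved_context_size alone does not avoid the error.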

Suggested Fix

In packages/agent-core/src/agent/compaction/full.ts, add maxOutputSize to the resolveCompletionBudget() call:

const provider = applyCompletionBudget({
  provider: this.agent.config.provider,
  budget: resolveCompletionBudget({
    maxOutputSize: this.agent.config.maxOutputSize,  // ← add this line
    reservedContextSize: this.agent.kimiConfig?.loopControl?.reservedContextSize,
  }),
  capability: this.agent.config.modelCapabilities,
});

Workaround

Set the environment variable KIMI_MODEL_MAX_COMPLETION_TOKENS=131072. This variable has the highest priority in resolveCompletionBudget() and acts as a hardCap for both the main loop path and the compaction path.
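
The priority order that makes this workaround effective can be sketched as follows. This is a hypothetical model, not the real resolveCompletionBudget(): the resolveBudget name, its parameters, and the way the variable is parsed are all assumptions. Only the variable name and its "highest priority" behavior come from the description above.

```typescript
// Hypothetical model of the priority order; `resolveBudget` and its
// parameter shapes are illustrative, not the project's real signature.
interface BudgetInput {
  maxOutputSize?: number;
  reservedContextSize?: number;
}

interface ResolvedBudget {
  hardCap?: number;
  fallback?: number;
}

function resolveBudget(
  input: BudgetInput,
  env: Record<string, string | undefined>,
): ResolvedBudget {
  // Assumed parsing: a positive integer in the env var overrides everything.
  const fromEnv = Number.parseInt(env["KIMI_MODEL_MAX_COMPLETION_TOKENS"] ?? "", 10);
  if (Number.isFinite(fromEnv) && fromEnv > 0) {
    return { hardCap: fromEnv };
  }
  // Otherwise a configured max output size becomes the hard cap.
  if (input.maxOutputSize !== undefined && input.maxOutputSize > 0) {
    return { hardCap: input.maxOutputSize };
  }
  // With neither, only the reserved-context fallback remains.
  return { fallback: input.reservedContextSize };
}

// Compaction-style call (no maxOutputSize), with and without the override.
const withOverride = resolveBudget(
  { reservedContextSize: 50_000 },
  { KIMI_MODEL_MAX_COMPLETION_TOKENS: "131072" },
);
const withoutOverride = resolveBudget({ reservedContextSize: 50_000 }, {});

console.log(withOverride.hardCap, withoutOverride.hardCap); // 131072 undefined
```

Under these assumptions, the override supplies a hardCap even when the caller omits maxOutputSize, which is exactly the situation in the compaction path.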

Additional information

  • This bug affects third-party models with large context windows (e.g., 1M tokens) where max_context_size greatly exceeds max_output_size. Kimi's own models (e.g., K2.7 with 262K context) are less likely to trigger it.
  • The code comment in computeCompletionBudgetCap() says "The provider backend computes the safe request-specific value from the serialized prompt" — but this assumption does not hold for third-party OpenAI-compatible providers that enforce a strict max_tokens limit.
