fix(agent-core): cap compaction output tokens when maxOutputSize is undefined#841

Open
li-xiu-qi wants to merge 1 commit into
MoonshotAI:mainfrom
li-xiu-qi:fix/compaction-output-token-cap

@li-xiu-qi li-xiu-qi commented Jun 17, 2026

Related Issue

Resolve #834

Problem

The compaction worker in full.ts does not pass maxOutputSize to resolveCompletionBudget(). As a result, computeCompletionBudgetCap() always falls back to using the full max_context_tokens as max_completion_tokens — regardless of whether maxOutputSize is configured.

Since the compaction prompt itself is large (it contains the entire history being compacted), input_tokens + max_tokens exceeds the context window, and providers that do not auto-clamp max_tokens server-side reject the request with a 400 APIContextOverflowError.

Reported in #476, #794, and #834. PR #482 partially addresses this by passing maxOutputSize, but does not cover the case where maxOutputSize is undefined (the default for most models).

What changed

  • Pass maxOutputSize to resolveCompletionBudget in full.ts compactionRound(), aligning with the main loop in index.ts (same as PR #482)
  • Add a conservative fallback cap when maxOutputSize is undefined: Math.min(Math.floor(maxCtx / 4), 8192). Compaction is a summarization operation; 8192 tokens is generous for a high-quality summary while preventing overflow on any provider. This covers the gap left by PR #482.
  • Add 6 unit tests in completion-budget.test.ts covering the compaction budget resolution scenarios
  • Add a verification test (compaction-overflow-verification.test.ts) that demonstrates the overflow before and after the fix using three real model configurations
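
The fallback rule in the second bullet can be sketched as a small pure function. This is an illustration only: the helper name compactionOutputCap, its signature, and the constant name are assumptions made for this example, not the actual agent-core code, and any further processing done by resolveCompletionBudget is not shown.

```typescript
// Hypothetical helper illustrating the fallback cap described above.
// Name and signature are assumptions, not the real agent-core API.
const COMPACTION_FALLBACK_MAX_OUTPUT = 8192;

function compactionOutputCap(
  maxContextTokens: number,
  maxOutputSize?: number,
): number {
  // A configured output limit is used as-is.
  if (maxOutputSize !== undefined) {
    return maxOutputSize;
  }
  // Otherwise: a quarter of the context window, but never more than 8192.
  return Math.min(
    Math.floor(maxContextTokens / 4),
    COMPACTION_FALLBACK_MAX_OUTPUT,
  );
}
```

Under these assumptions, a 256,000-token window with no configured limit yields 8192, a 16,000-token window yields 4000 (the quarter-window term wins), and a configured limit of 131,072 is returned unchanged.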

Verification

Unit tests

  • npx vitest run packages/agent-core/test/utils/completion-budget.test.ts: 27/27 passed (21 existing + 6 new)
  • npx vitest run packages/agent-core/test/utils/compaction-overflow-verification.test.ts: 3/3 passed
  • npx oxlint packages/agent-core/src/agent/compaction/full.ts packages/agent-core/test/utils/completion-budget.test.ts packages/agent-core/test/utils/compaction-overflow-verification.test.ts --type-aware: 0 warnings, 0 errors

Overflow verification: before vs after fix

The verification test simulates the budget resolution logic from the original and patched compactionRound() using three real model configurations. With a typical compaction input of ~80K tokens:

| Model | max_context | maxOutputSize | Original max_tokens | Original overflow? | Patched max_tokens | Patched overflow? |
|---|---|---|---|---|---|---|
| step-3.7-flash | 256,000 | undefined | 256,000 | YES ❌ (336K > 256K) | 8,192 | NO ✅ (88K) |
| kimi-for-coding | 262,144 | undefined | 262,144 | YES ❌ (342K > 262K) | 8,192 | NO ✅ (88K) |
| glm-5.2 | 1,000,000 | 131,072 | 1,000,000 | YES ❌ (1.08M > 1M) | 131,072 | NO ✅ (211K) |

Key finding: the original code overflows for ALL three models — even when maxOutputSize is explicitly configured (glm-5.2), because the compaction path never passes it to resolveCompletionBudget.
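
The arithmetic behind the table can be reproduced with a short sketch. It is not the verification test from this PR: the ModelConfig shape and the function names are hypothetical, and the 80,000-token input is the "typical" figure assumed in the text above.

```typescript
// Hypothetical reproduction of the table's arithmetic; not the PR's test file.
interface ModelConfig {
  name: string;
  maxContext: number;
  maxOutputSize?: number;
}

const INPUT_TOKENS = 80000; // assumed compaction prompt size

const models: ModelConfig[] = [
  { name: "step-3.7-flash", maxContext: 256000 },
  { name: "kimi-for-coding", maxContext: 262144 },
  { name: "glm-5.2", maxContext: 1000000, maxOutputSize: 131072 },
];

// Original behavior as described: max_tokens is the full context window.
function originalMaxTokens(m: ModelConfig): number {
  return m.maxContext;
}

// Patched behavior as described: configured limit, else min(maxCtx / 4, 8192).
function patchedMaxTokens(m: ModelConfig): number {
  return m.maxOutputSize ?? Math.min(Math.floor(m.maxContext / 4), 8192);
}

// A request overflows when input plus requested output exceeds the window.
function overflows(input: number, maxTokens: number, maxContext: number): boolean {
  return input + maxTokens > maxContext;
}

const rows = models.map((m) => ({
  name: m.name,
  originalTotal: INPUT_TOKENS + originalMaxTokens(m),
  originalOverflow: overflows(INPUT_TOKENS, originalMaxTokens(m), m.maxContext),
  patchedTotal: INPUT_TOKENS + patchedMaxTokens(m),
  patchedOverflow: overflows(INPUT_TOKENS, patchedMaxTokens(m), m.maxContext),
}));
```

Each entry in rows corresponds to one table row: the original totals are 336,000, 342,144, and 1,080,000 tokens (all above the respective window), while the patched totals are 88,192, 88,192, and 211,072.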

Reproducible error example

Using stepfun/step-3.7-flash (max_context=256K, no maxOutputSize configured) with a conversation large enough to trigger compaction:

Error: [compaction.failed] APIContextOverflowError: 400
{"detail":"{\"error\":{\"message\":\"This model's maximum context length is 256000 tokens. However, you requested 176000 output tokens and your prompt contains at least 80000 input tokens, for a total of at least 256000 tokens.\",\"type\":\"BadRequestError\",\"param\":\"input_tokens\",\"code\":400}}"}

With this fix, max_completion_tokens is capped at 8,192 instead of 256,000, so 80,000 + 8,192 = 88,192 stays well within the 256K window.

Relationship to PR #482

This PR is complementary to #482. Both PRs can coexist: if #482 merges first, the fallback cap in this PR is still independently valuable. If this PR merges first, #482 becomes a no-op.

changeset-bot Bot commented Jun 17, 2026

🦋 Changeset detected

Latest commit: 6402aed

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package

| Name | Type |
|---|---|
| @moonshot-ai/agent-core | Patch |

@li-xiu-qi li-xiu-qi force-pushed the fix/compaction-output-token-cap branch from 06b785e to 388f9b2 Compare June 17, 2026 10:29
…ndefined

The compaction worker in full.ts was not passing maxOutputSize to
resolveCompletionBudget, causing computeCompletionBudgetCap to fall
back to the full context window size as max_completion_tokens. When
maxOutputSize is also undefined (the default for most models), this
results in max_tokens equal to max_context_tokens, which causes
APIContextOverflowError on providers that do not auto-clamp
max_tokens server-side.

This change:
- Passes maxOutputSize to resolveCompletionBudget (aligning with the
  main loop in index.ts, same as PR MoonshotAI#482)
- Adds a conservative fallback cap of min(maxCtx/4, 8192) when
  maxOutputSize is undefined, ensuring compaction never requests the
  full context window as output tokens
- Adds tests covering the compaction budget resolution scenarios

Resolve MoonshotAI#834
@li-xiu-qi li-xiu-qi force-pushed the fix/compaction-output-token-cap branch from 388f9b2 to 6402aed Compare June 17, 2026 11:31