feat: post-idle compact layer + aggressive idle distillation #133
Closed
Conversation
When the prompt cache goes cold after idle, reduce the cold-cache write cost by distilling aggressively and using a smaller context window:

- Force-distill ALL pending messages on idle (`force: true`), even below the normal `minMessages` threshold. The cache is expiring anyway, and distilling now means less raw content in the next context.
- Allow meta-distillation on idle unconditionally: the cache is cold, so the row ID rewrites don't cause additional cache busts.
- Add post-idle compact layer: when `onIdleResume()` fires, skip layer 0 (full-raw passthrough) and use a tighter raw budget (20% of usable instead of 40%) for layer 1. The distilled prefix covers the older history; the raw window only needs the current turn + minimal recent context. This reduces the total cold-cache write cost by up to 20% of usable (~29K tokens on a 200K-context model).
- Add `postIdleCompact` flag to SessionState (one-shot, consumed by `transformInner`). Exposed in `inspectSessionState` for test visibility.
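The one-shot flag described above can be sketched as follows. This is a minimal illustration with hypothetical shapes, not the actual `gradient.ts` source: the real SessionState and `transformInner()` carry far more state, but the set-once/consume-once pattern is the same.

```typescript
// Hypothetical minimal SessionState: only the field this PR adds.
interface SessionState {
  postIdleCompact: boolean;
}

// On resume from idle, arm the one-shot flag: the next transform
// should use the compact layer because the prompt cache is cold.
function onIdleResume(state: SessionState): void {
  state.postIdleCompact = true;
}

// Consume the flag exactly once: skip layer 0 (full-raw passthrough)
// and shrink the raw budget from 40% to 20% of usable.
function transformInner(state: SessionState, usable: number) {
  const compact = state.postIdleCompact;
  state.postIdleCompact = false; // one-shot: cleared on consumption
  return {
    skipLayer0: compact,
    rawBudget: usable * (compact ? 0.2 : 0.4),
  };
}
```

The one-shot design means only the first request after resume pays the compact-layer behavior; subsequent requests fall back to the normal 40% budget once the cache is warm again.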
…savings)

Add BatchLLMClient wrapper in gateway that accumulates non-urgent worker LLM calls (distillation, curation, consolidation, validation) and submits them via Anthropic's Message Batches API for 50% cost reduction.

Key changes:

- Add `urgent` flag to `LLMClient.prompt()` opts for batch/immediate routing
- Thread `urgent` through `distillation.run()`, `distillSegment()`, `metaDistill()`
- Mark compaction, overflow recovery, and query expansion as urgent
- Background incremental distillation, idle curation, and worker validation are batch-safe (`urgent` unset)
- Flush timer (30s) + auto-flush at queue capacity (50 items)
- Poll timer (60s) checks batch status, streams JSONL results
- Fallback to synchronous on batch API errors or missing API key
- Graceful shutdown drains queue synchronously
- Disable via `LORE_BATCH_DISABLED=1` env var
- 10 dedicated tests for batch queue behavior
6e8320a to 1aa6e9c
Superseded by #134, which includes both the post-idle compact layer and the batch API integration on a rebased branch.
auto-merge was automatically disabled May 6, 2026 19:41
Pull request was closed
Summary
Follow-up to #132. When the prompt cache goes cold after idle, reduce the cold-cache write cost by distilling aggressively and using a smaller context window on resume.
- Force-distill all pending messages on idle (`force: true`), even below the normal `minMessages` threshold. The cache is expiring anyway; distilling now means less raw content in the next context.
- Post-idle compact layer: when `onIdleResume()` fires, skip layer 0 (full-raw passthrough) and use a tighter raw budget (20% of usable instead of 40%) for layer 1. The distilled prefix covers the older history; the raw window only needs the current turn + minimal recent context.

How it works
1. `session.idle` fires → `backgroundDistill(sessionID, true)` force-distills everything + meta-distills
2. `onIdleResume()` clears caches + sets `postIdleCompact = true`
3. `transformInner()` sees `postIdleCompact` → skips layer 0, uses `rawBudget = usable * 0.20` instead of `0.40`

Cost impact
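A back-of-envelope check on the cost claim: the summary quotes savings of "up to 20% of usable (~29K tokens)" on a 200K-context model, which implies a usable window of roughly 145K tokens after reserves. That figure is inferred from the arithmetic, not stated in the PR.

```typescript
// Assumption: usable ≈ 145K tokens, inferred from the ~29K savings figure
// (the PR does not state the reserve math on a 200K context model).
const usable = 145_000;

const normalRawBudget = usable * 0.4; // layer-1 raw budget with warm cache
const postIdleRawBudget = usable * 0.2; // tighter budget after idle resume
const savings = normalRawBudget - postIdleRawBudget; // tokens not re-written cold

console.log(savings); // 29000
```

Since the cache is already cold, every raw token in the next request is a full-price cache write; halving the raw budget cuts that write cost directly.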
`minMessages - 1` (4)

Files changed
- `packages/core/src/gradient.ts`: `postIdleCompact` flag on SessionState, set by `onIdleResume()`, consumed by `transformInner()` to skip layer 0 and use the 20% raw budget
- `packages/opencode/src/index.ts`: idle handler passes `force: true` to `backgroundDistill()`
- `packages/gateway/src/idle.ts`: idle handler force-distills with `force: true`, no threshold gate
- `.lore.md`: updated knowledge entries reflecting implemented fixes

744 tests pass, 0 failures.