
feat: post-idle compact layer + aggressive idle distillation #133

Closed

BYK wants to merge 2 commits into main from cache-cost-optimization

Conversation

Owner

@BYK BYK commented May 6, 2026

Summary

Follow-up to #132. When the prompt cache goes cold after idle, reduce the cold-cache write cost by distilling aggressively and using a smaller context window on resume.

  • Force-distill ALL pending messages on idle (force: true), even below the normal minMessages threshold. The cache is expiring anyway — distilling now means less raw content in the next context.
  • Allow meta-distillation on idle unconditionally — cache is cold so row ID rewrites don't cause additional busts.
  • Post-idle compact layer: when onIdleResume() fires, skip layer 0 (full-raw passthrough) and use a tighter raw budget (20% of usable instead of 40%) for layer 1. The distilled prefix covers the older history; the raw window only needs the current turn + minimal recent context.
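The force-distill-on-idle behavior from the first bullet can be sketched as follows. This is an illustrative mock, not the actual implementation: `backgroundDistill`'s signature is taken from the PR text, but the pending-message store, `MIN_MESSAGES` value, and return type are assumptions for demonstration.

```typescript
// Simulated pending-message store keyed by session ID (hypothetical).
const pending: Record<string, string[]> = {
  s1: ["msg-a", "msg-b"], // below the normal minMessages threshold
};

const MIN_MESSAGES = 5; // assumed threshold value

// Normal background distillation skips sessions under the threshold;
// force: true (used on idle) bypasses the gate.
function backgroundDistill(sessionID: string, force: boolean): string[] {
  const msgs = pending[sessionID] ?? [];
  if (!force && msgs.length < MIN_MESSAGES) return []; // threshold gate
  pending[sessionID] = [];
  return msgs; // messages handed off to the distiller
}

// On session.idle: the cache is expiring anyway, so distill everything.
const distilled = backgroundDistill("s1", true);
```

With `force: false` the two pending messages would have stayed raw; on idle they are distilled so the next context starts from a compact prefix.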

How it works

  1. Session goes idle → session.idle fires → backgroundDistill(sessionID, true) force-distills everything + meta-distills
  2. Cache TTL expires (5m default) → user returns → onIdleResume() clears caches + sets postIdleCompact = true
  3. transformInner() sees postIdleCompact → skips layer 0, uses rawBudget = usable * 0.20 instead of 0.40
  4. Result: distilled prefix (compact history) + tight raw window (current turn) = smaller total context = cheaper cold-cache write
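The one-shot flag consumption in steps 2–3 can be sketched like this. The names `postIdleCompact`, `onIdleResume`, and the 20%/40% budgets come from the PR description; the `rawBudget` helper and `SessionState` shape here are assumptions for illustration.

```typescript
// Minimal model of the one-shot post-idle compact flag (assumed shape).
interface SessionState {
  postIdleCompact: boolean;
}

function onIdleResume(state: SessionState): void {
  // Caches are cleared elsewhere; here we just arm the one-shot flag.
  state.postIdleCompact = true;
}

// Stand-in for the budget selection inside transformInner().
function rawBudget(state: SessionState, usable: number): number {
  if (state.postIdleCompact) {
    state.postIdleCompact = false; // consume: applies to one transform only
    return Math.floor(usable * 0.2); // tight window, layer 0 skipped
  }
  return Math.floor(usable * 0.4); // normal layer-1 budget
}

const state: SessionState = { postIdleCompact: false };
onIdleResume(state);
const first = rawBudget(state, 145_000);  // compact cold-resume budget
const second = rawBudget(state, 145_000); // flag already consumed
```

The one-shot design means only the first transform after resume pays the tighter budget; subsequent turns, which benefit from a now-warm cache, go back to the normal 40% window.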

Cost impact

| Metric | Before (cold resume) | After (cold resume) |
| --- | --- | --- |
| Raw budget | 40% of usable (~58K) | 20% of usable (~29K) |
| Undistilled messages on resume | Up to minMessages - 1 (4) | 0 (force-distilled on idle) |
| Meta-distill consolidation | Skipped when cache warm | Always on idle |
| Total context written | Full raw window + prefix | Tight raw window + consolidated prefix |
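A quick back-of-envelope check of the raw-budget row, assuming ~145K usable tokens (so that 40% lands at the table's ~58K figure):

```typescript
// Assumed usable-context size chosen so 40% matches the ~58K "before" column.
const usable = 145_000;
const before = Math.floor(usable * 0.4); // normal layer-1 raw window
const after = Math.floor(usable * 0.2);  // post-idle compact raw window
const saved = before - after;            // tokens not re-written on cold resume
```

This matches the commit message's claim of roughly 29K tokens saved per cold-cache write on a 200K-context model.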

Files changed

  • packages/core/src/gradient.ts — postIdleCompact flag on SessionState, set by onIdleResume(), consumed by transformInner() to skip layer 0 and use 20% raw budget
  • packages/opencode/src/index.ts — idle handler passes force: true to backgroundDistill()
  • packages/gateway/src/idle.ts — idle handler force-distills with force: true, no threshold gate
  • .lore.md — updated knowledge entries reflecting implemented fixes

744 tests pass, 0 failures.

@BYK BYK enabled auto-merge (squash) May 6, 2026 17:29
BYK added 2 commits May 6, 2026 19:41
When the prompt cache goes cold after idle, reduce the cold-cache write
cost by distilling aggressively and using a smaller context window:

- Force-distill ALL pending messages on idle (force: true), even below
  the normal minMessages threshold. The cache is expiring anyway, and
  distilling now means less raw content in the next context.

- Allow meta-distillation on idle unconditionally — cache is cold so
  the row ID rewrites don't cause additional cache busts.

- Add post-idle compact layer: when onIdleResume() fires, skip layer 0
  (full-raw passthrough) and use a tighter raw budget (20% of usable
  instead of 40%) for layer 1. The distilled prefix covers the older
  history; the raw window only needs the current turn + minimal recent
  context. This reduces the total cold-cache write cost by up to 20%
  of usable (~29K tokens on a 200K context model).

- Add postIdleCompact flag to SessionState (one-shot, consumed by
  transformInner). Exposed in inspectSessionState for test visibility.
…savings)

Add BatchLLMClient wrapper in gateway that accumulates non-urgent worker
LLM calls (distillation, curation, consolidation, validation) and submits
them via Anthropic's Message Batches API for 50% cost reduction.

Key changes:
- Add urgent flag to LLMClient.prompt() opts for batch/immediate routing
- Thread urgent through distillation.run(), distillSegment(), metaDistill()
- Mark compaction, overflow recovery, and query expansion as urgent
- Background incremental distillation, idle curation, and worker validation
  are batch-safe (urgent unset)
- Flush timer (30s) + auto-flush at queue capacity (50 items)
- Poll timer (60s) checks batch status, streams JSONL results
- Fallback to synchronous on batch API errors or missing API key
- Graceful shutdown drains queue synchronously
- Disable via LORE_BATCH_DISABLED=1 env var
- 10 dedicated tests for batch queue behavior
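The urgent/batch routing and capacity auto-flush described in the commit message can be sketched as below. `BatchLLMClient` and the `urgent` flag are the PR's names; the queue internals, capacity handling, and the stand-in for the Message Batches API submission are assumptions for illustration (the real client also flushes on a 30s timer and polls for results).

```typescript
interface PromptOpts {
  urgent?: boolean;
}

// Simplified model of the batching queue inside a BatchLLMClient-like wrapper.
class BatchQueue {
  private queue: string[] = [];
  flushed: string[][] = []; // each entry stands in for one batch API submit
  constructor(private capacity = 50) {}

  // Urgent calls (compaction, overflow recovery, query expansion) bypass the
  // queue; non-urgent work accumulates until capacity triggers a flush.
  submit(prompt: string, opts: PromptOpts = {}): "immediate" | "queued" {
    if (opts.urgent) return "immediate";
    this.queue.push(prompt);
    if (this.queue.length >= this.capacity) this.flush();
    return "queued";
  }

  flush(): void {
    if (this.queue.length === 0) return;
    this.flushed.push(this.queue); // real client: Message Batches API call
    this.queue = [];
  }
}

const q = new BatchQueue(2); // tiny capacity for demonstration
q.submit("compact now", { urgent: true }); // routed immediately
q.submit("distill segment 1");             // queued
q.submit("distill segment 2");             // hits capacity, auto-flushes
```

The same split drives the cost win: latency-sensitive paths keep synchronous pricing, while everything batch-safe rides the 50%-discounted batch lane.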
@BYK BYK force-pushed the cache-cost-optimization branch from 6e8320a to 1aa6e9c on May 6, 2026 19:41

BYK commented May 6, 2026

Superseded by #134 which includes both post-idle compact layer and batch API integration on a rebased branch.

@BYK BYK closed this May 6, 2026
auto-merge was automatically disabled May 6, 2026 19:41

Pull request was closed

