
feat: post-idle compact layer + aggressive idle distillation #133

Closed

BYK wants to merge 2 commits into main from cache-cost-optimization

Conversation

Owner

@BYK BYK commented May 6, 2026

Summary

Follow-up to #132. When the prompt cache goes cold after idle, reduce the cold-cache write cost by distilling aggressively and using a smaller context window on resume.

  • Force-distill ALL pending messages on idle (force: true), even below the normal minMessages threshold. The cache is expiring anyway — distilling now means less raw content in the next context.
  • Allow meta-distillation on idle unconditionally — cache is cold so row ID rewrites don't cause additional busts.
  • Post-idle compact layer: when onIdleResume() fires, skip layer 0 (full-raw passthrough) and use a tighter raw budget (20% of usable instead of 40%) for layer 1. The distilled prefix covers the older history; the raw window only needs the current turn + minimal recent context.
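The force-distill-on-idle behavior from the first bullet can be sketched as follows. This is an illustrative mock, not the actual implementation: `backgroundDistill`'s signature is taken from the PR text, but the pending-message store, `MIN_MESSAGES` value, and return type are assumptions for demonstration.

```typescript
// Simulated pending-message store keyed by session ID (hypothetical).
const pending: Record<string, string[]> = {
  s1: ["msg-a", "msg-b"], // below the normal minMessages threshold
};

const MIN_MESSAGES = 5; // assumed threshold value

// Normal background distillation skips sessions under the threshold;
// force: true (used on idle) bypasses the gate.
function backgroundDistill(sessionID: string, force: boolean): string[] {
  const msgs = pending[sessionID] ?? [];
  if (!force && msgs.length < MIN_MESSAGES) return []; // threshold gate
  pending[sessionID] = [];
  return msgs; // messages handed off to the distiller
}

// On session.idle: the cache is expiring anyway, so distill everything.
const distilled = backgroundDistill("s1", true);
```

With `force: false` the two pending messages would have stayed raw; on idle they are distilled so the next context starts from a compact prefix.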

How it works

  1. Session goes idle → session.idle fires → backgroundDistill(sessionID, true) force-distills everything + meta-distills
  2. Cache TTL expires (5m default) → user returns → onIdleResume() clears caches + sets postIdleCompact = true
  3. transformInner() sees postIdleCompact → skips layer 0, uses rawBudget = usable * 0.20 instead of 0.40
  4. Result: distilled prefix (compact history) + tight raw window (current turn) = smaller total context = cheaper cold-cache write
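The one-shot flag consumption in steps 2–3 can be sketched like this. The names `postIdleCompact`, `onIdleResume`, and the 20%/40% budgets come from the PR description; the `rawBudget` helper and `SessionState` shape here are assumptions for illustration.

```typescript
// Minimal model of the one-shot post-idle compact flag (assumed shape).
interface SessionState {
  postIdleCompact: boolean;
}

function onIdleResume(state: SessionState): void {
  // Caches are cleared elsewhere; here we just arm the one-shot flag.
  state.postIdleCompact = true;
}

// Stand-in for the budget selection inside transformInner().
function rawBudget(state: SessionState, usable: number): number {
  if (state.postIdleCompact) {
    state.postIdleCompact = false; // consume: applies to one transform only
    return Math.floor(usable * 0.2); // tight window, layer 0 skipped
  }
  return Math.floor(usable * 0.4); // normal layer-1 budget
}

const state: SessionState = { postIdleCompact: false };
onIdleResume(state);
const first = rawBudget(state, 145_000);  // compact cold-resume budget
const second = rawBudget(state, 145_000); // flag already consumed
```

The one-shot design means only the first transform after resume pays the tighter budget; subsequent turns, which benefit from a now-warm cache, go back to the normal 40% window.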

Cost impact

| Metric | Before (cold resume) | After (cold resume) |
| --- | --- | --- |
| Raw budget | 40% of usable (~58K) | 20% of usable (~29K) |
| Undistilled messages on resume | Up to minMessages - 1 (4) | 0 (force-distilled on idle) |
| Meta-distill consolidation | Skipped when cache warm | Always on idle |
| Total context written | Full raw window + prefix | Tight raw window + consolidated prefix |
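A quick back-of-envelope check of the raw-budget row, assuming ~145K usable tokens (so that 40% lands at the table's ~58K figure):

```typescript
// Assumed usable-context size chosen so 40% matches the ~58K "before" column.
const usable = 145_000;
const before = Math.floor(usable * 0.4); // normal layer-1 raw window
const after = Math.floor(usable * 0.2);  // post-idle compact raw window
const saved = before - after;            // tokens not re-written on cold resume
```

This matches the commit message's claim of roughly 29K tokens saved per cold-cache write on a 200K-context model.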

Files changed

  • packages/core/src/gradient.ts — postIdleCompact flag on SessionState, set by onIdleResume(), consumed by transformInner() to skip layer 0 and use 20% raw budget
  • packages/opencode/src/index.ts — idle handler passes force: true to backgroundDistill()
  • packages/gateway/src/idle.ts — idle handler force-distills with force: true, no threshold gate
  • .lore.md — updated knowledge entries reflecting implemented fixes

744 tests pass, 0 failures.

@BYK BYK enabled auto-merge (squash) May 6, 2026 17:29
BYK added 2 commits May 6, 2026 19:41
When the prompt cache goes cold after idle, reduce the cold-cache write
cost by distilling aggressively and using a smaller context window:

- Force-distill ALL pending messages on idle (force: true), even below
  the normal minMessages threshold. The cache is expiring anyway, and
  distilling now means less raw content in the next context.

- Allow meta-distillation on idle unconditionally — cache is cold so
  the row ID rewrites don't cause additional cache busts.

- Add post-idle compact layer: when onIdleResume() fires, skip layer 0
  (full-raw passthrough) and use a tighter raw budget (20% of usable
  instead of 40%) for layer 1. The distilled prefix covers the older
  history; the raw window only needs the current turn + minimal recent
  context. This reduces the total cold-cache write cost by up to 20%
  of usable (~29K tokens on a 200K context model).

- Add postIdleCompact flag to SessionState (one-shot, consumed by
  transformInner). Exposed in inspectSessionState for test visibility.
…savings)

Add BatchLLMClient wrapper in gateway that accumulates non-urgent worker
LLM calls (distillation, curation, consolidation, validation) and submits
them via Anthropic's Message Batches API for 50% cost reduction.

Key changes:
- Add urgent flag to LLMClient.prompt() opts for batch/immediate routing
- Thread urgent through distillation.run(), distillSegment(), metaDistill()
- Mark compaction, overflow recovery, and query expansion as urgent
- Background incremental distillation, idle curation, and worker validation
  are batch-safe (urgent unset)
- Flush timer (30s) + auto-flush at queue capacity (50 items)
- Poll timer (60s) checks batch status, streams JSONL results
- Fallback to synchronous on batch API errors or missing API key
- Graceful shutdown drains queue synchronously
- Disable via LORE_BATCH_DISABLED=1 env var
- 10 dedicated tests for batch queue behavior
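The urgent/batch routing and capacity auto-flush described in the commit message can be sketched as below. `BatchLLMClient` and the `urgent` flag are the PR's names; the queue internals, capacity handling, and the stand-in for the Message Batches API submission are assumptions for illustration (the real client also flushes on a 30s timer and polls for results).

```typescript
interface PromptOpts {
  urgent?: boolean;
}

// Simplified model of the batching queue inside a BatchLLMClient-like wrapper.
class BatchQueue {
  private queue: string[] = [];
  flushed: string[][] = []; // each entry stands in for one batch API submit
  constructor(private capacity = 50) {}

  // Urgent calls (compaction, overflow recovery, query expansion) bypass the
  // queue; non-urgent work accumulates until capacity triggers a flush.
  submit(prompt: string, opts: PromptOpts = {}): "immediate" | "queued" {
    if (opts.urgent) return "immediate";
    this.queue.push(prompt);
    if (this.queue.length >= this.capacity) this.flush();
    return "queued";
  }

  flush(): void {
    if (this.queue.length === 0) return;
    this.flushed.push(this.queue); // real client: Message Batches API call
    this.queue = [];
  }
}

const q = new BatchQueue(2); // tiny capacity for demonstration
q.submit("compact now", { urgent: true }); // routed immediately
q.submit("distill segment 1");             // queued
q.submit("distill segment 2");             // hits capacity, auto-flushes
```

The same split drives the cost win: latency-sensitive paths keep synchronous pricing, while everything batch-safe rides the 50%-discounted batch lane.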
@BYK BYK force-pushed the cache-cost-optimization branch from 6e8320a to 1aa6e9c on May 6, 2026 19:41

BYK commented May 6, 2026

Superseded by #134 which includes both post-idle compact layer and batch API integration on a rebased branch.

@BYK BYK closed this May 6, 2026
auto-merge was automatically disabled May 6, 2026 19:41

Pull request was closed

