feat: batch queue for Anthropic Message Batches API (50% worker cost savings) #134
Merged
Conversation
When the prompt cache goes cold after idle, reduce the cold-cache write cost by distilling aggressively and using a smaller context window:

- Force-distill ALL pending messages on idle (`force: true`), even below the normal `minMessages` threshold. The cache is expiring anyway, and distilling now means less raw content in the next context.
- Allow meta-distillation on idle unconditionally: the cache is cold, so the row-ID rewrites don't cause additional cache busts.
- Add a post-idle compact layer: when `onIdleResume()` fires, skip layer 0 (full-raw passthrough) and use a tighter raw budget for layer 1 (20% of usable instead of 40%). The distilled prefix covers the older history; the raw window only needs the current turn plus minimal recent context. This reduces the total cold-cache write cost by up to 20% of usable (~29K tokens on a 200K-context model).
- Add a `postIdleCompact` flag to `SessionState` (one-shot, consumed by `transformInner`; see the sketch after this list). Exposed in `inspectSessionState` for test visibility.
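A minimal sketch of how the one-shot flag could be consumed; `postIdlePlan` is an invented helper name, while `SessionState`, the layer-0 skip, and the 20%/40% budgets come from the commit message:

```ts
interface SessionState {
  postIdleCompact?: boolean; // set when onIdleResume() fires, cleared on first use
}

// Called once per context build by the transform (transformInner in the
// commit); reading and clearing the flag in one place is what makes it one-shot.
function postIdlePlan(state: SessionState, usable: number) {
  const compact = state.postIdleCompact === true;
  if (compact) state.postIdleCompact = false; // one-shot: consumed here
  return {
    skipLayer0: compact, // no full-raw passthrough right after idle
    layer1RawBudget: Math.floor(usable * (compact ? 0.2 : 0.4)),
  };
}
```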
feat: batch queue for Anthropic Message Batches API (50% worker cost savings)

Add `BatchLLMClient` wrapper in the gateway that accumulates non-urgent worker LLM calls (distillation, curation, consolidation, validation) and submits them via Anthropic's Message Batches API for 50% cost reduction.

Key changes:
- Add `urgent` flag to `LLMClient.prompt()` opts for batch/immediate routing
- Thread `urgent` through `distillation.run()`, `distillSegment()`, `metaDistill()`
- Mark compaction, overflow recovery, and query expansion as urgent
- Background incremental distillation, idle curation, and worker validation are batch-safe (`urgent` unset)
- Flush timer (30s) plus auto-flush at queue capacity (50 items)
- Poll timer (60s) checks batch status and streams JSONL results (see the sketch after this list)
- Fall back to synchronous calls on batch API errors or a missing API key
- Graceful shutdown drains the queue synchronously
- Disable via the `LORE_BATCH_DISABLED=1` env var
- 10 dedicated tests for batch queue behavior
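A hedged sketch of the flush/poll mechanics against the `@anthropic-ai/sdk` Message Batches endpoints; the queue shape, `waiters` map, and `openBatches` bookkeeping are assumptions, and cancellation/expiry handling is trimmed:

```ts
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

type Waiter = { resolve: (text: string) => void; reject: (err: Error) => void };

const queue: { customId: string; params: Anthropic.MessageCreateParamsNonStreaming }[] = [];
const waiters = new Map<string, Waiter>();
const openBatches: string[] = [];

// Flush timer (30s) and the 50-item capacity check both funnel into this.
async function flush(): Promise<void> {
  if (queue.length === 0) return;
  const requests = queue.splice(0).map((q) => ({ custom_id: q.customId, params: q.params }));
  const batch = await client.messages.batches.create({ requests });
  openBatches.push(batch.id);
}

// Poll timer (60s): check each open batch, then stream its JSONL results
// back to the callers awaiting them.
async function poll(): Promise<void> {
  for (const id of openBatches.splice(0)) {
    const batch = await client.messages.batches.retrieve(id);
    if (batch.processing_status !== "ended") {
      openBatches.push(id); // still running; check again next tick
      continue;
    }
    for await (const entry of await client.messages.batches.results(id)) {
      const waiter = waiters.get(entry.custom_id);
      if (!waiter) continue;
      if (entry.result.type === "succeeded") {
        const block = entry.result.message.content[0];
        waiter.resolve(block?.type === "text" ? block.text : "");
      } else {
        waiter.reject(new Error(`batch result: ${entry.result.type}`));
      }
      waiters.delete(entry.custom_id);
    }
  }
}

setInterval(() => void flush(), 30_000);
setInterval(() => void poll(), 60_000);
```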
Summary
Gateway-only enhancement: accumulate non-urgent worker LLM calls and submit them via Anthropic's Message Batches API for 50% cost reduction on all background work.
- `BatchLLMClient` wrapper (`batch-queue.ts`) intercepts `LLMClient.prompt()` calls. Non-urgent calls queue up; urgent calls bypass to the inner (synchronous) client immediately.
- `urgent` flag on `LLMClient.prompt()` opts, threaded through `distillation.run()`, `distillSegment()`, `metaDistill()`, and all gateway pipeline callers.

How it works

Non-urgent calls accumulate until the 30s flush timer fires or the queue reaches 50 items, then go out as a single Message Batches request; a 60s poll timer checks batch status and streams the JSONL results back to the waiting callers.
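A minimal sketch of the routing decision; the `QueuedPrompt` shape, the `submitBatch` helper, and the `Message`/`PromptOpts` types are simplified stand-ins for illustration, not the real module:

```ts
type Message = { role: "user" | "assistant"; content: string };
interface PromptOpts { urgent?: boolean }
interface LLMClient {
  prompt(messages: Message[], opts?: PromptOpts): Promise<string>;
}

interface QueuedPrompt {
  messages: Message[];
  resolve: (text: string) => void;
  reject: (err: Error) => void;
}

// Stand-in for the Message Batches submission (see the flush/poll sketch in
// the commit message above); declared to keep this example self-contained.
declare function submitBatch(items: QueuedPrompt[]): Promise<void>;

class BatchLLMClient implements LLMClient {
  private queue: QueuedPrompt[] = [];
  constructor(private inner: LLMClient, private capacity = 50) {}

  prompt(messages: Message[], opts: PromptOpts = {}): Promise<string> {
    if (opts.urgent || process.env.LORE_BATCH_DISABLED === "1") {
      return this.inner.prompt(messages, opts); // bypass straight to the sync client
    }
    return new Promise((resolve, reject) => {
      this.queue.push({ messages, resolve, reject });
      if (this.queue.length >= this.capacity) void this.flush(); // auto-flush at capacity
    });
  }

  async flush(): Promise<void> {
    const items = this.queue.splice(0);
    if (items.length === 0) return;
    try {
      await submitBatch(items); // results resolve later via the poll loop
    } catch {
      // Fallback safety: on batch API errors, run each call synchronously.
      for (const it of items) this.inner.prompt(it.messages).then(it.resolve, it.reject);
    }
  }
}
```

The key design point is that callers never pick the batch or sync path themselves: they only declare urgency, and the wrapper owns routing, capacity, and fallback.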
Fallback safety
- Falls back to synchronous calls on batch API errors or a missing API key (see the `flush()` fallback in the sketch above)
- Graceful shutdown drains the queue synchronously, so no queued work is lost
- Disable entirely with the `LORE_BATCH_DISABLED=1` env var

Cost impact

Anthropic bills batched requests at 50% of the standard per-token price, so every batch-safe background call (distillation, curation, consolidation, validation) costs half as much; urgent, user-facing calls are unaffected.
Files changed
- `packages/core/src/types.ts`: add `urgent?: boolean` to `LLMClient.prompt()` opts (sketched after this list)
- `packages/core/src/distillation.ts`: thread `urgent` through `run()`, `distillSegment()`, `metaDistill()`
- `packages/core/src/search.ts`: mark `expandQuery()` as `urgent: true`
- `packages/gateway/src/batch-queue.ts`: new `BatchLLMClient` wrapper
- `packages/gateway/src/pipeline.ts`: wire batch queue into `getLLMClient()`, mark urgent callers
- `packages/gateway/src/index.ts`: graceful shutdown via `resetPipelineState()`
- `packages/gateway/test/batch-queue.test.ts`: 10 new dedicated tests
- `packages/gateway/test/helpers/harness.ts`: async teardown for batch queue cleanup

754 tests pass, 0 failures.
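A sketch of how the `urgent` flag threads through `distillSegment()` to its callers; the prompt builder and parameter shapes are assumptions based on the file list, not the real signatures:

```ts
type Message = { role: "user" | "assistant"; content: string };
interface LLMClient {
  prompt(messages: Message[], opts?: { urgent?: boolean }): Promise<string>;
}

// Hypothetical stand-in for the real distillation prompt construction.
declare function buildDistillPrompt(segment: Message[]): Message[];

// distillation.ts: pass urgent through unchanged; the BatchLLMClient, not
// this function, decides whether the call queues or bypasses.
export async function distillSegment(
  llm: LLMClient,
  segment: Message[],
  opts: { urgent?: boolean } = {},
): Promise<string> {
  return llm.prompt(buildDistillPrompt(segment), { urgent: opts.urgent });
}

// pipeline.ts callers: compaction is latency-sensitive, background work is not.
async function exampleCallers(llm: LLMClient, segment: Message[]) {
  await distillSegment(llm, segment, { urgent: true }); // compaction / overflow recovery
  await distillSegment(llm, segment);                   // background incremental distillation
}
```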