feat: compile-on-ingest pipeline with SQLite status tracking and token optimizations #58
Merged
keeganthomp merged 2 commits into main · Apr 11, 2026
Conversation
Implements caveman-style text compression (inspired by JuliusBrussee/caveman and wilpel/caveman-compression) to minimize LLM token usage during compilation.
Key changes:
Caveman compression module (compile/caveman.ts): strips articles, filler words, hedging phrases, weak verbs, verbose connectors, and redundant expressions from text while preserving code blocks, URLs, file paths, YAML frontmatter, wikilinks, and all technical terms. Zero dependencies, pure regex/string ops.
Compressed system prompt: rewritten in telegraphic style — 807 chars (202 tokens), down from ~1300 chars (325 tokens), a 38% reduction. Brevity constraints improve LLM accuracy per 2026 research.
Source content compression: raw sources are caveman-compressed before the LLM call. 16-27% savings depending on prose density. Technical content untouched.
Article context compression: existing articles sent as context are also compressed, saving ~20% on context tokens.
Compressed prompt structure: section headers shortened (CURRENT WIKI INDEX → INDEX, EXISTING ARTICLES THAT MAY NEED UPDATES → EXISTING, etc.).
Enrichment prompt compressed similarly.
Combined savings stack (all optimizations from both commits): old ~3575 tokens/compilation → new ~2582 tokens/compilation = 28% reduction, plus Anthropic prompt caching (system prompt reused server-side), fast model routing (short sources use the cheaper model), and relevant-only article context (fewer articles sent).
546 tests pass (22 new caveman tests), all lint checks pass.
https://claude.ai/code/session_01Ta23zoCERDxSnhCjqvzuQ1
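The compression pass described above can be sketched roughly as follows. This is a minimal illustration, not the actual `compile/caveman.ts` implementation: the filler-word list here is a tiny assumed sample, and only fenced code blocks are shown being preserved (the real module also protects URLs, file paths, frontmatter, and wikilinks).

```typescript
// Minimal sketch of caveman-style compression (assumed word list):
// drop filler words from prose while leaving fenced code blocks untouched.
const FILLER = /\b(?:actually|basically|simply|just|very|really)\b/gi;

function cavemanCompress(text: string): string {
  // Split on fenced code blocks (captured) so they pass through unmodified.
  return text
    .split(/(```[\s\S]*?```)/)
    .map(part =>
      part.startsWith("```")
        ? part
        : part.replace(FILLER, "").replace(/ {2,}/g, " "),
    )
    .join("");
}
```

Because the transform is pure regex/string work with no dependencies, it adds effectively zero latency compared to the LLM call it shrinks.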
Replaces the batch "ingest N sources, then compile" model with an inline
pipeline that processes each source end-to-end (extract → ingest → compile)
immediately upon arrival.
Key changes:
SQLite pipeline DB (.kb/pipeline.db) tracks source lifecycle in real-time:
queued → extracting → ingested → compiling → compiled → enriched
Uses bun:sqlite with WAL mode for concurrent reads during compilation.
Compile-on-ingest: watch daemon now compiles each source inline instead of
accumulating a batch. Cross-reference enrichment is batched separately.
Anthropic prompt caching: system prompts marked with cache_control: ephemeral
so repeated compilations reuse cached system prompts server-side.
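In the Anthropic Messages API, caching is requested by sending the system prompt as a content block with `cache_control: { type: "ephemeral" }`. A sketch of a helper that builds such a block (the helper name is hypothetical; the block shape follows the API):

```typescript
// System prompt block shape for Anthropic prompt caching.
type SystemBlock = {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral" };
};

// Hypothetical helper: mark the system prompt so the server caches it
// and reuses the cached prefix on subsequent compilations.
function cachedSystemPrompt(text: string): SystemBlock[] {
  return [{ type: "text", text, cache_control: { type: "ephemeral" } }];
}
```

The payoff compounds with compile-on-ingest: every source triggers its own compilation, so the same system prompt is resent far more often than under batch compilation.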
Compact topic map: replaces full INDEX.md context with a dense slug[tags]
representation, saving ~40-70% input tokens per compilation.
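The slug[tags] encoding might look like the following sketch (field names assumed): one line per article instead of the full INDEX.md body, which is where the ~40-70% input-token saving comes from.

```typescript
// Hypothetical serializer: render each indexed article as slug[tag1,tag2]
// on one line, replacing the full INDEX.md text as compilation context.
function compactTopicMap(index: { slug: string; tags: string[] }[]): string {
  return index.map(a => `${a.slug}[${a.tags.join(",")}]`).join("\n");
}
```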
Relevant-only article context: instead of loading all articles a source
previously produced, scores articles by tag overlap and sends only relevant
ones, reducing context waste.
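A plausible shape for the tag-overlap scoring (the threshold and helper names here are assumptions, not the PR's actual values): count shared tags between the incoming source and each existing article, and send only articles above a minimum overlap.

```typescript
// Hypothetical relevance filter: score articles by tag overlap with the
// incoming source and keep only those sharing at least minOverlap tags.
function tagOverlap(sourceTags: string[], articleTags: string[]): number {
  const s = new Set(sourceTags);
  return articleTags.filter(t => s.has(t)).length;
}

function relevantArticles<T extends { tags: string[] }>(
  sourceTags: string[],
  articles: T[],
  minOverlap = 1, // assumed default
): T[] {
  return articles.filter(a => tagOverlap(sourceTags, a.tags) >= minOverlap);
}
```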
Fast model routing: short/simple sources (< 2000 words, no complex structure)
automatically use the fast model for lower cost and latency.
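The routing decision can be sketched as below. The 2000-word threshold is from the PR text; the "complex structure" heuristic here (deeply nested headings or tables) is an assumption standing in for whatever checks the daemon actually runs.

```typescript
// Hypothetical router: short sources with no complex structure go to the
// fast model; everything else uses the default model.
function pickModel(text: string): "fast" | "default" {
  const words = text.trim().split(/\s+/).length;
  // Crude structure check (assumption): deep headings or markdown tables.
  const complex = /^#{3,}\s|^\|.*\|/m.test(text);
  return words < 2000 && !complex ? "fast" : "default";
}
```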
New `kib sources` CLI command shows pipeline status with color-coded lifecycle indicators, token usage, and article counts.
Source status field added to manifest schema for portability.
524 tests pass (27 new), all lint checks pass.
https://claude.ai/code/session_01Ta23zoCERDxSnhCjqvzuQ1