
feat: compile-on-ingest pipeline with SQLite status tracking and token optimizations#58

Merged
keeganthomp merged 2 commits into main from claude/optimize-compilation-performance-XhdcD
Apr 11, 2026

Conversation

@keeganthomp
Owner

Replaces the batch "ingest N sources, then compile" model with an inline
pipeline that processes each source end-to-end (extract → ingest → compile)
immediately upon arrival.

Key changes:

  • SQLite pipeline DB (.kb/pipeline.db) tracks source lifecycle in real-time:
    queued → extracting → ingested → compiling → compiled → enriched
    Uses bun:sqlite with WAL mode for concurrent reads during compilation.
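
The lifecycle above can be sketched as a simple state machine. This is a minimal TypeScript sketch: the status names come from the PR, but the function names are hypothetical, and the real pipeline persists status via bun:sqlite (with WAL mode) rather than in memory.

```typescript
// Source lifecycle statuses, in pipeline order (names from the PR).
const LIFECYCLE = [
  "queued", "extracting", "ingested", "compiling", "compiled", "enriched",
] as const;

type SourceStatus = (typeof LIFECYCLE)[number];

// Returns the next status in the pipeline, or null once enrichment is done.
function nextStatus(current: SourceStatus): SourceStatus | null {
  const i = LIFECYCLE.indexOf(current);
  return i >= 0 && i < LIFECYCLE.length - 1 ? LIFECYCLE[i + 1] : null;
}

// A transition is legal only if it advances exactly one step.
// (Hypothetical helper; the real DB layer may allow retries/failures too.)
function canTransition(from: SourceStatus, to: SourceStatus): boolean {
  return nextStatus(from) === to;
}
```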

  • Compile-on-ingest: watch daemon now compiles each source inline instead of
    accumulating a batch. Cross-reference enrichment is batched separately.

  • Anthropic prompt caching: system prompts marked with cache_control: ephemeral
    so repeated compilations reuse cached system prompts server-side.
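
For reference, marking a system prompt cacheable in an Anthropic Messages API request looks roughly like this. The model id and prompt text are placeholders, not values from this PR; only the cache_control placement reflects the documented API shape.

```typescript
// Sketch of a Messages API request body with a cacheable system prompt.
// Model id and prompt text are placeholders.
const request = {
  model: "claude-sonnet-4-5",
  max_tokens: 4096,
  system: [
    {
      type: "text",
      text: "You are a wiki compiler. <long, stable instructions here>",
      // Marks this prefix for server-side caching; repeated compilations
      // that send an identical prefix get a cache hit instead of paying
      // full input-token cost for the system prompt each time.
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [{ role: "user", content: "Compile this source ..." }],
};
```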

  • Compact topic map: replaces full INDEX.md context with a dense slug[tags]
    representation, saving ~40-70% input tokens per compilation.
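
A minimal sketch of what such a slug[tags] encoding could look like. The interface and function names here are hypothetical; the PR's actual encoding may differ.

```typescript
// Hypothetical shape of a topic-map entry.
interface TopicEntry {
  slug: string;
  tags: string[];
}

// Render a dense one-line topic map like "bun-sqlite[db,runtime] caching[llm]"
// instead of shipping the full INDEX.md as context.
function compactTopicMap(entries: TopicEntry[]): string {
  return entries
    .map((e) => `${e.slug}[${e.tags.join(",")}]`)
    .join(" ");
}
```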

  • Relevant-only article context: instead of loading every article a source
    previously produced, the compiler scores articles by tag overlap and sends
    only the relevant ones, reducing context waste.
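
One way to score by tag overlap, as a hypothetical sketch; the PR describes the idea but not the exact formula, so the shared-tag count used here is an assumption.

```typescript
interface Article {
  slug: string;
  tags: string[];
}

// Count how many of the incoming source's tags an article shares.
function tagOverlap(sourceTags: string[], article: Article): number {
  const tags = new Set(sourceTags);
  return article.tags.filter((t) => tags.has(t)).length;
}

// Keep only articles with at least one shared tag, most relevant first.
function relevantArticles(sourceTags: string[], articles: Article[]): Article[] {
  return articles
    .map((a) => ({ a, score: tagOverlap(sourceTags, a) }))
    .filter((x) => x.score > 0)
    .sort((x, y) => y.score - x.score)
    .map((x) => x.a);
}
```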

  • Fast model routing: short/simple sources (< 2000 words, no complex structure)
    automatically use the fast model for lower cost and latency.
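
The routing heuristic described above might look roughly like this. This is a hypothetical sketch: "complex structure" is approximated here as fenced code blocks or tables, and the real check in the PR may look at more signals.

```typescript
function wordCount(text: string): number {
  return text.split(/\s+/).filter(Boolean).length;
}

// Approximation of "complex structure": fenced code blocks or markdown tables.
function hasComplexStructure(text: string): boolean {
  return /```/.test(text) || /^\|.*\|/m.test(text);
}

// Short, simple sources route to the fast model; everything else gets the
// standard model. The 2000-word threshold comes from the PR description.
function pickModel(text: string): "fast" | "standard" {
  return wordCount(text) < 2000 && !hasComplexStructure(text)
    ? "fast"
    : "standard";
}
```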

  • New `kib sources` CLI command shows pipeline status with color-coded
    lifecycle indicators, token usage, and article counts.

  • Source status field added to manifest schema for portability.

524 tests pass (27 new), all lint checks pass.

https://claude.ai/code/session_01Ta23zoCERDxSnhCjqvzuQ1

claude added 2 commits April 11, 2026 12:05
…n optimizations

Implements caveman-style text compression (inspired by JuliusBrussee/caveman
and wilpel/caveman-compression) to minimize LLM token usage during compilation.

Key changes:

- Caveman compression module (compile/caveman.ts): Strips articles, filler
  words, hedging phrases, weak verbs, verbose connectors, and redundant
  expressions from text while preserving code blocks, URLs, file paths,
  YAML frontmatter, wikilinks, and all technical terms. Zero dependencies,
  pure regex/string ops.
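
A minimal sketch of the technique: the real compile/caveman.ts handles far more patterns (plus URLs, file paths, frontmatter, and wikilinks), and the filler list here is illustrative only. The key move is carving out fenced code blocks first so the regexes never touch them.

```typescript
// Illustrative filler list; the module in the PR strips many more categories
// (articles, hedging phrases, weak verbs, verbose connectors, ...).
const FILLER =
  /\b(?:very|really|quite|basically|actually|needless to say)\b ?/gi;

function cavemanCompress(text: string): string {
  // Capturing split: odd-indexed parts are fenced code blocks, kept verbatim.
  const parts = text.split(/(```[\s\S]*?```)/);
  return parts
    .map((part, i) =>
      i % 2 === 1 ? part : part.replace(FILLER, "").replace(/ {2,}/g, " "),
    )
    .join("");
}
```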

- Compressed system prompt: Rewritten in telegraphic style — 807 chars (202
  tokens) down from ~1300 chars (325 tokens) = 38% reduction. Brevity
  constraints improve LLM accuracy per 2026 research.

- Source content compression: Raw sources caveman-compressed before LLM call.
  16-27% savings depending on prose density. Technical content untouched.

- Article context compression: Existing articles sent as context also
  compressed, saving ~20% on context tokens.

- Compressed prompt structure: Section headers shortened (CURRENT WIKI INDEX
  → INDEX, EXISTING ARTICLES THAT MAY NEED UPDATES → EXISTING, etc.)

- Enrichment prompt compressed similarly.

Combined savings stack (all optimizations from both commits):
  Old: ~3575 tokens/compilation → New: ~2582 tokens/compilation = 28% reduction
  + Anthropic prompt caching (system prompt reused server-side)
  + Fast model routing (short sources use cheaper model)
  + Relevant-only article context (fewer articles sent)

546 tests pass (22 new caveman tests), all lint checks pass.

https://claude.ai/code/session_01Ta23zoCERDxSnhCjqvzuQ1
@vercel

vercel Bot commented Apr 11, 2026

The latest updates on your projects.

Project | Deployment | Actions | Updated (UTC)
kib | Ready | Preview, Comment | Apr 11, 2026 1:04pm


@keeganthomp keeganthomp merged commit 65deb05 into main Apr 11, 2026
3 checks passed
@keeganthomp keeganthomp deleted the claude/optimize-compilation-performance-XhdcD branch April 11, 2026 13:05
