feat(rag): Phase 1.5 — ChatRAGBuilder consumer ordering (#918)#922
Adds a PromptTier enum (INVARIANT / SEMI_STABLE / VOLATILE) and makes every RAGSource declare its tier. RAGComposer sorts collected sections deterministically by (tier, sourceName) before returning.

Why: today the composer's parallel section assembly produces a different byte order on every chat call. llama-server / DMR's prefix-KV-cache reuse never fires, so each turn reprocesses the full 14k-token prompt from scratch (~35s prompt eval at 400 tok/s). With deterministic ordering AND stable bytes within each tier, the unchanging INVARIANT prefix gets reused — only the VOLATILE suffix needs evaluation. Expected: ~70× faster prompt eval per turn for repeat-context turns.

Architecture (per docs/architecture/MULTIMODAL-WORKER-AND-PREFIX-REUSE.md):

- INVARIANT: persona identity, tool definitions, recipe rules, docs (PersonaIdentity, ToolDefinitions, CodeTool, Documentation, ToolMethodology, ProjectContext)
- SEMI_STABLE: history, memories, participants, governance — append-only (ConversationHistory, LiveRoomAwareness, Governance, OpenProposals, SentinelAwareness, GlobalAwareness, SocialMediaRAG, SemanticMemory)
- VOLATILE: latest message, audio chunks, current activity, UI state (ActivityContext, CodebaseSearch, MediaArtifact, VoiceConversation, WidgetContext)

Implementation note: tier is a class-level declaration on each RAGSource (required field, no Option<>). Sources return Omit<RAGSection, 'tier'> from load() and fromBatchResult(); RAGComposer injects the source's declared tier when wrapping the section. Single source of truth for classification per source — no per-return-statement repetition.

Phases 2 (slot pinning) and 3 (composition cache) build on this. Phase 4 (multimodal content parts) depends on #917 ModelMetadata.

tsc clean. Branch: feature/prefix-reuse-and-multimodal off main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
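A minimal sketch of the deterministic (tier, sourceName) sort described above; the type and field names mirror the PR description but are illustrative, not the repo's actual code:

```typescript
// Sketch of RAGComposer's deterministic section ordering (assumed shapes).
type PromptTier = 'invariant' | 'semi_stable' | 'volatile';

// Explicit rank table so tier precedence is not tied to string order.
const TIER_RANK: Record<PromptTier, number> = {
  invariant: 0,
  semi_stable: 1,
  volatile: 2,
};

interface RAGSection {
  sourceName: string;
  tier: PromptTier;
  content: string;
}

// Sort by tier first, then alphabetically by source name, so the
// assembled byte order is identical across calls with the same inputs.
function sortSections(sections: RAGSection[]): RAGSection[] {
  return [...sections].sort(
    (a, b) =>
      TIER_RANK[a.tier] - TIER_RANK[b.tier] ||
      a.sourceName.localeCompare(b.sourceName),
  );
}
```

The key property is that the comparator is total and input-order independent, so parallel section collection no longer affects the emitted byte order.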
…boot CodebaseIndexer ran 64-chunk embedding batches back-to-back with NO yield between batches. Each batch ~1.5s + ~80MB RSS growth. With 5000+ chunks in src/, that's 78+ batches × 1.5s = 2+ minutes of total event-loop saturation immediately after every boot. Local personas couldn't respond, voice couldn't connect, anything that needed the bus was blocked until indexing finished.

Two changes:

- Batch size 64→16 (smaller per-batch RSS hit, ~4× more chances for other IO to interleave between IPC roundtrips)
- 50ms pause between batches via setTimeout (yields the event loop so chat/voice/personas can process while indexing runs)

The throughput cost is small (16 vs 64 chunks per IPC) and the inter-batch pause is invisible at human timescales. The chat-arrival latency win is huge — the system is responsive within seconds of boot instead of minutes.

The deeper fix is querying GpuPressureWatcher / ResourcePressureWatcher before each batch and backing off when pressure is high — same principle Joel called out for InferenceCoordinator slot capacity. That's a follow-up; this is the floor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
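The throttled loop can be sketched as follows; `indexChunks` and its `embedBatch` callback are stand-ins for the real indexer and IPC call, while the batch size and pause match the commit:

```typescript
// Sketch of the throttled indexing loop: small batches plus an
// inter-batch pause so the event loop can serve chat/voice between
// embedding roundtrips. BATCH_SIZE and PAUSE_MS match the commit text.
const BATCH_SIZE = 16;
const INTER_BATCH_PAUSE_MS = 50;

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

async function indexChunks<T>(
  chunks: T[],
  embedBatch: (batch: T[]) => Promise<void>,
): Promise<number> {
  let batches = 0;
  for (let i = 0; i < chunks.length; i += BATCH_SIZE) {
    await embedBatch(chunks.slice(i, i + BATCH_SIZE));
    batches++;
    // Yield between batches so pending timers/IO run before the next batch.
    if (i + BATCH_SIZE < chunks.length) await sleep(INTER_BATCH_PAUSE_MS);
  }
  return batches;
}
```

The `setTimeout`-based sleep is what actually releases the event loop; a tight `await` chain alone would still starve other callbacks if each batch resolves quickly.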
…ability (#918)

Phase 1 (already shipped in PR #920) sorted RAGComposer's section list by (tier, sourceName). This commit makes ChatRAGBuilder respect that order when assembling the final prompt string, so the byte-prefix actually IS stable end-to-end.

Three reorderings in section 2.4 of buildContext():

1. Tool definitions injection moved from end to start (after identity). Tool defs are INVARIANT — they belong in the byte-stable prefix region, not after VOLATILE content.
2. The generic source loop already iterates the Map in insertion order, which equals tier-sorted order from extractFromComposition (which inserts in result.sections order, which Phase 1 sorted). So the loop now produces INVARIANT → SEMI_STABLE → VOLATILE content automatically — no per-section sorting needed.
3. HumanPresenceTracker injection moved from before-the-loop to after-the-loop. Presence is volatile (changes when users switch rooms) and must live in the suffix, never in the byte-stable prefix.

Final assembly order: identity (INVARIANT, from PersonaIdentitySource) → tool definitions (INVARIANT) → loop in tier order (INVARIANT remaining → SEMI_STABLE → VOLATILE) → human presence (VOLATILE) → conversation history (already separate, lives in messages array).

Net effect for prefix-reuse: with the same persona+recipe, the INVARIANT region of the prompt is byte-identical across thousands of turns. llama-server / DMR's prefix-KV-cache match fires on the INVARIANT prefix; only the VOLATILE suffix gets reprocessed. Combined with future per-persona slot pinning (Phase 2), this is the ~70× prompt-eval speedup the design doc promised.

tsc clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
This PR advances the prompt prefix-reuse work for Chat RAG by introducing tier-aware section ordering, then reordering consumer-side prompt assembly so volatile bytes stay in the suffix (enabling KV-cache prefix reuse).
Changes:
- Add `PromptTier` and thread tier metadata through `RAGSource`/`RAGSection`, injecting the tier in `RAGComposer` and sorting sections deterministically by `(tier, sourceName)`.
- Update all RAG sources to declare a tier and to return `Omit<RAGSection, 'tier'>` so the composer is the single tier authority.
- Reorder `ChatRAGBuilder` injections (tool definitions earlier; human presence later) and throttle codebase-indexing embedding batches to reduce startup event-loop starvation.
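The `Omit<RAGSection, 'tier'>` pattern above can be sketched as follows; the interfaces are simplified stand-ins for the repo's actual types:

```typescript
// Sketch: sources return sections without a tier; the composer stamps
// each source's class-level tier on when wrapping. Shapes are assumed.
type PromptTier = 'invariant' | 'semi_stable' | 'volatile';

interface RAGSection {
  sourceName: string;
  tier: PromptTier;
  content: string;
}

interface RAGSource {
  readonly name: string;
  readonly tier: PromptTier; // class-level declaration, single source of truth
  load(): Promise<Omit<RAGSection, 'tier'>>;
}

async function collect(sources: RAGSource[]): Promise<RAGSection[]> {
  const loaded = await Promise.all(sources.map((s) => s.load()));
  // Inject each source's declared tier exactly once, at the wrap site,
  // so individual return statements never repeat the classification.
  return loaded.map((section, i) => ({ ...section, tier: sources[i].tier }));
}
```

Because `load()` cannot even express a `tier` field, a source cannot drift out of sync with its declared classification.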
Reviewed changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/system/rag/builders/ChatRAGBuilder.ts | Reorders prompt injections to improve stable-prefix behavior (tool defs earlier; presence later). |
| src/system/rag/services/CodebaseIndexer.ts | Reduces embedding batch size and adds inter-batch pause to yield the event loop. |
| src/system/rag/shared/RAGComposer.ts | Injects tiers into sections and sorts sections deterministically by tier + name. |
| src/system/rag/shared/RAGSource.ts | Adds tier to RAGSource/RAGSection, changes load/fromBatchResult return types, and re-exports PromptTier. |
| src/system/rag/shared/RAGTypes.ts | Introduces PromptTier and documents tier ordering contract. |
| src/system/rag/sources/ActivityContextSource.ts | Declares tier and updates load() return type to omit tier. |
| src/system/rag/sources/CodeToolSource.ts | Declares tier and updates load()/helpers to omit tier. |
| src/system/rag/sources/CodebaseSearchSource.ts | Declares tier and updates load() return type to omit tier. |
| src/system/rag/sources/ConversationHistorySource.ts | Declares tier and updates load()/helpers to omit tier. |
| src/system/rag/sources/DocumentationSource.ts | Declares tier and updates load() return type to omit tier. |
| src/system/rag/sources/GlobalAwarenessSource.ts | Declares tier and updates load()/fromBatchResult()/helpers to omit tier. |
| src/system/rag/sources/GovernanceSource.ts | Declares tier and updates load() return type to omit tier. |
| src/system/rag/sources/LiveRoomAwarenessSource.ts | Declares tier and updates load()/helpers to omit tier. |
| src/system/rag/sources/MediaArtifactSource.ts | Declares tier and updates load() return type to omit tier. |
| src/system/rag/sources/OpenProposalsSource.ts | Declares tier and updates EMPTY_SECTION/load() to omit tier. |
| src/system/rag/sources/PersonaIdentitySource.ts | Declares tier and updates load()/helpers to omit tier. |
| src/system/rag/sources/ProjectContextSource.ts | Declares tier and updates cached section types + load()/helpers to omit tier. |
| src/system/rag/sources/SemanticMemorySource.ts | Declares tier and updates load()/fromBatchResult()/helpers to omit tier. |
| src/system/rag/sources/SentinelAwarenessSource.ts | Declares tier and updates load() return type to omit tier. |
| src/system/rag/sources/SocialMediaRAGSource.ts | Declares tier and updates load()/helpers to omit tier. |
| src/system/rag/sources/ToolDefinitionsSource.ts | Declares tier and updates load()/helpers to omit tier. |
| src/system/rag/sources/ToolMethodologySource.ts | Declares tier and updates load() return type to omit tier. |
| src/system/rag/sources/VoiceConversationSource.ts | Declares tier and updates load()/helpers to omit tier. |
| src/system/rag/sources/WidgetContextSource.ts | Declares tier and updates load()/helpers to omit tier. |
```ts
import type { RAGBuildOptions, LLMMessage, RAGArtifact, PersonaMemory, PersonaIdentity, RecipeStrategy } from './RAGTypes';
import { PromptTier } from './RAGTypes';

// Re-export so source files only need one import
export { PromptTier } from './RAGTypes';
```
The file imports and re-exports PromptTier as a runtime symbol (import { PromptTier } ... + export { PromptTier } ...), but PromptTier is a const enum (erased at emit). In ESM this can cause a hard runtime failure when the re-exported binding doesn’t exist. Safer options: (a) make PromptTier a normal enum, or (b) remove the runtime re-export and have consumers import type { PromptTier } / use string literals.
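One shape for the reviewer's "as const object + union type" alternative, sketched here rather than taken from the repo: unlike a `const enum`, this emits a real JS object, so a runtime re-export survives ESM emit.

```typescript
// Sketch: an `as const` object plus a same-named union type. The value
// exists at runtime (safe to re-export), while call sites keep the
// enum-like PromptTier.INVARIANT syntax. Names mirror the PR's enum.
const PromptTier = {
  INVARIANT: 'invariant',
  SEMI_STABLE: 'semi_stable',
  VOLATILE: 'volatile',
} as const;

type PromptTier = (typeof PromptTier)[keyof typeof PromptTier];

// Usage keeps existing call sites working unchanged:
const tier: PromptTier = PromptTier.INVARIANT;
```

TypeScript allows the value and the type to share a name because they live in separate declaration spaces, so consumers can `import { PromptTier }` and use it in both positions.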
Suggested change:

```diff
-import type { RAGBuildOptions, LLMMessage, RAGArtifact, PersonaMemory, PersonaIdentity, RecipeStrategy } from './RAGTypes';
-import { PromptTier } from './RAGTypes';
-// Re-export so source files only need one import
-export { PromptTier } from './RAGTypes';
+import type { RAGBuildOptions, LLMMessage, RAGArtifact, PersonaMemory, PersonaIdentity, RecipeStrategy, PromptTier } from './RAGTypes';
+// Re-export so source files only need one import
+export type { PromptTier } from './RAGTypes';
```
```ts
export class ToolDefinitionsSource implements RAGSource {
  readonly name = 'tool-definitions';
  readonly tier = PromptTier.INVARIANT;
  readonly priority = 45;
  readonly defaultBudgetPercent = 10;
```
ToolDefinitionsSource is marked PromptTier.INVARIANT, but its output explicitly depends on context.options.currentMessage (contextual group selection + group hints). That makes the section change turn-to-turn, which defeats the stable-prefix goal and violates the tier contract. Either make tool definitions truly invariant (no currentMessage-dependent selection) or split into an invariant “tool catalog/specs” source and a volatile “tool hints for this turn” source (and mark this one SEMI_STABLE/VOLATILE accordingly).
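The split the reviewer suggests could look roughly like this; every name and type here is illustrative, not the repo's actual interfaces:

```typescript
// Sketch: split tool definitions into a byte-stable catalog (INVARIANT)
// and a per-turn hints source (VOLATILE). All names/shapes are assumed.
type PromptTier = 'invariant' | 'volatile';

interface Section {
  sourceName: string;
  content: string;
}

class ToolCatalogSource {
  readonly name = 'tool-catalog';
  readonly tier: PromptTier = 'invariant';
  // Emits the full tool spec list with no dependence on the current
  // message, so the bytes are identical on every turn.
  load(tools: string[]): Section {
    return { sourceName: this.name, content: tools.join('\n') };
  }
}

class ToolHintsSource {
  readonly name = 'tool-hints';
  readonly tier: PromptTier = 'volatile';
  // Contextual group selection depends on the current message, so it
  // belongs in the volatile suffix where reprocessing is expected.
  load(currentMessage: string, groups: Record<string, string[]>): Section {
    const hinted = Object.entries(groups)
      .filter(([keyword]) => currentMessage.toLowerCase().includes(keyword))
      .flatMap(([, tools]) => tools);
    return { sourceName: this.name, content: hinted.join('\n') };
  }
}
```

With this split, only the small hints section churns per turn while the (typically much larger) catalog stays in the reusable prefix.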
```ts
 * Within each tier, sources are sorted by name (alphabetical) so the byte
 * order is fully deterministic.
```
The PromptTier doc says “Within each tier, sources are sorted by name (alphabetical)”, but ChatRAGBuilder now explicitly hoists tool-definitions ahead of other INVARIANT sources. Either update this doc to reflect the actual consumer-side ordering rules, or move all ordering decisions into the composer so the documented contract matches the emitted prompt order.
Suggested change:

```diff
- * Within each tier, sources are sorted by name (alphabetical) so the byte
- * order is fully deterministic.
+ * Within a tier, source order must be deterministic. Consumers may apply
+ * tier-specific ordering rules before any fallback alphabetical ordering; for
+ * example, `tool-definitions` is hoisted ahead of other INVARIANT sources.
```
```ts
export const enum PromptTier {
  /** Persona system prompt, recipe rules, role identity, tool definitions.
   * Changes ~weekly when persona/recipe is edited. Identical bytes across
   * thousands of turns for the same persona+recipe. */
  INVARIANT = 'invariant',
  /** Conversation history, active genome adapters, participants, governance
   * state. Grows monotonically — new content APPENDS to the existing
   * prefix, doesn't rewrite earlier bytes. */
  SEMI_STABLE = 'semi_stable',
  /** Latest user message, audio chunks, current timestamp, last-second
   * pressure observations. Changes every request. The only region the
   * server actually has to reprocess token-by-token. */
  VOLATILE = 'volatile',
}
```
PromptTier is declared as a const enum, but it’s also imported/re-exported as a value (export { PromptTier } ...). Since const enums are erased in JS output, this can break ESM/bundler consumers with “module does not provide an export named 'PromptTier'”. Consider switching PromptTier to a normal export enum (or as const object + union type), or make the re-export/imports type-only and stop re-exporting it as a runtime value.
Revert: Phase 1.5 ChatRAGBuilder consumer ordering (#922) — bisecting silence regression
Reverted via #926 — bisecting a silence regression observed on clean main after tonight's chain merge (#921 → #923 → #920 → #922). #922 was the most recent merge, so it was reverted first. If a retest after the revert shows main responds, this PR's logic introduced the regression and the consumer-side reordering needs rework before re-merging. Original commits remain in git history ().
Builds on PR #920 (Phase 1 — composer-side stable-first ordering). This is the consumer side that completes the prefix-reuse story end-to-end.
What changes
`ChatRAGBuilder.buildContext` section 2.4 reorders three injections:

1. Tool definitions injection moved from end → start. Tool defs are INVARIANT and belong in the byte-stable prefix.
2. The generic source loop already emits tier-sorted content (`extractFromComposition` inserts in `result.sections` order; that array was tier-sorted by `RAGComposer`).
3. `HumanPresenceTracker` injection moved from start → end. Presence is volatile (changes when users switch rooms) — must live in suffix.

Final assembly order: identity → tool definitions → tier-sorted source loop → human presence, with conversation history kept separate in the messages array.
Why
PR #920 made the section list byte-deterministic. But the consumer (this builder) was injecting volatile content (human presence) BEFORE the tier-sorted loop and INVARIANT content (tool defs) AFTER. Result: the assembled prompt string still had non-stable bytes in the prefix region — Phase 1 alone wasn't enough for actual prefix-reuse.
This commit makes the assembled string's prefix byte-identical across requests for the same persona+recipe. Combined with future Phase 2 (per-persona DMR slot pinning), llama-server/DMR's prefix-KV-cache reuse fires for real and the ~70× prompt-eval speedup materializes.
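The byte-identical-prefix claim can be spot-checked with a small hash comparison; the prompts and `invariantLen` below are hypothetical stand-ins for real consecutive turns:

```typescript
// Sketch: hash the first invariantLen bytes of two consecutive prompts.
// If the INVARIANT region is byte-stable, the hashes must match.
import { createHash } from 'node:crypto';

function prefixHash(prompt: string, invariantLen: number): string {
  return createHash('sha256')
    .update(prompt.slice(0, invariantLen))
    .digest('hex');
}

// Hypothetical consecutive turns for the same persona+recipe: they share
// the first 32 characters (the assumed invariant region) and diverge only
// in the volatile suffix.
const invariantLen = 32;
const turn1 = 'SYSTEM: you are Ada. Tools: [...] user: hello';
const turn2 = 'SYSTEM: you are Ada. Tools: [...] user: what changed?';

const stable =
  prefixHash(turn1, invariantLen) === prefixHash(turn2, invariantLen);
```

A check like this is cheap enough to run in CI against real composed prompts once the invariant length is exposed.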
Verification
- `tsc` clean.
- Single file touched: `system/rag/builders/ChatRAGBuilder.ts`, +40 −19.
- Branch: `feature/prefix-reuse-and-multimodal`. Merge after "feat(rag): Phase 1 — stable-first ordering for prefix-reuse (#918)" (#920) lands.
- `sha256(prompt[:invariant_len])` should be identical across consecutive turns of the same persona.

Sequencing
🤖 Generated with Claude Code