feat(rag): Phase 1.5 — ChatRAGBuilder consumer ordering (#918)#922
Adds a PromptTier enum (INVARIANT / SEMI_STABLE / VOLATILE) and makes every RAGSource declare its tier. RAGComposer sorts collected sections deterministically by (tier, sourceName) before returning.

Why: today the composer's parallel section assembly produces a different byte order on every chat call. llama-server / DMR's prefix-KV-cache reuse never fires, so each turn reprocesses the full 14k-token prompt from scratch (~35s prompt eval at 400 tok/s). With deterministic ordering AND stable bytes within each tier, the unchanging INVARIANT prefix gets reused — only the VOLATILE suffix needs evaluation. Expected: ~70× faster prompt eval per turn for repeat-context turns.

Architecture (per docs/architecture/MULTIMODAL-WORKER-AND-PREFIX-REUSE.md):

- INVARIANT: persona identity, tool definitions, recipe rules, docs (PersonaIdentity, ToolDefinitions, CodeTool, Documentation, ToolMethodology, ProjectContext)
- SEMI_STABLE: history, memories, participants, governance — append-only (ConversationHistory, LiveRoomAwareness, Governance, OpenProposals, SentinelAwareness, GlobalAwareness, SocialMediaRAG, SemanticMemory)
- VOLATILE: latest message, audio chunks, current activity, UI state (ActivityContext, CodebaseSearch, MediaArtifact, VoiceConversation, WidgetContext)

Implementation note: tier is a class-level declaration on each RAGSource (required field, no Option<>). Sources return Omit<RAGSection, 'tier'> from load() and fromBatchResult(); RAGComposer injects the source's declared tier when wrapping the section. Single source of truth for classification per source — no per-return-statement repetition.

Phases 2 (slot pinning) and 3 (composition cache) build on this. Phase 4 (multimodal content parts) depends on #917 ModelMetadata.

tsc clean. Branch: feature/prefix-reuse-and-multimodal off main.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
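A minimal sketch of the deterministic (tier, sourceName) sort described above; the type and field names mirror the PR description but are illustrative, not the repo's actual code:

```typescript
// Sketch of RAGComposer's deterministic section ordering (assumed shapes).
type PromptTier = 'invariant' | 'semi_stable' | 'volatile';

// Explicit rank table so tier precedence is not tied to string order.
const TIER_RANK: Record<PromptTier, number> = {
  invariant: 0,
  semi_stable: 1,
  volatile: 2,
};

interface RAGSection {
  sourceName: string;
  tier: PromptTier;
  content: string;
}

// Sort by tier first, then alphabetically by source name, so the
// assembled byte order is identical across calls with the same inputs.
function sortSections(sections: RAGSection[]): RAGSection[] {
  return [...sections].sort(
    (a, b) =>
      TIER_RANK[a.tier] - TIER_RANK[b.tier] ||
      a.sourceName.localeCompare(b.sourceName),
  );
}
```

The key property is that the comparator is total and input-order independent, so parallel section collection no longer affects the emitted byte order.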
…boot CodebaseIndexer ran 64-chunk embedding batches back-to-back with NO yield between batches. Each batch ~1.5s + ~80MB RSS growth. With 5000+ chunks in src/, that's 78+ batches × 1.5s = 2+ minutes of total event-loop saturation immediately after every boot. Local personas couldn't respond, voice couldn't connect, anything that needed the bus was blocked until indexing finished.

Two changes:

- Batch size 64→16 (smaller per-batch RSS hit, ~4× more chances for other IO to interleave between IPC roundtrips)
- 50ms pause between batches via setTimeout (yields the event loop so chat/voice/personas can process while indexing runs)

The throughput cost is small (16 vs 64 chunks per IPC) and the inter-batch pause is invisible at human timescales. The chat-arrival latency win is huge — the system is responsive within seconds of boot instead of minutes.

The deeper fix is querying GpuPressureWatcher / ResourcePressureWatcher before each batch and backing off when pressure is high — same principle Joel called out for InferenceCoordinator slot capacity. That's a follow-up; this is the floor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
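The throttled loop can be sketched as follows; `indexChunks` and its `embedBatch` callback are stand-ins for the real indexer and IPC call, while the batch size and pause match the commit:

```typescript
// Sketch of the throttled indexing loop: small batches plus an
// inter-batch pause so the event loop can serve chat/voice between
// embedding roundtrips. BATCH_SIZE and PAUSE_MS match the commit text.
const BATCH_SIZE = 16;
const INTER_BATCH_PAUSE_MS = 50;

const sleep = (ms: number) => new Promise<void>((r) => setTimeout(r, ms));

async function indexChunks<T>(
  chunks: T[],
  embedBatch: (batch: T[]) => Promise<void>,
): Promise<number> {
  let batches = 0;
  for (let i = 0; i < chunks.length; i += BATCH_SIZE) {
    await embedBatch(chunks.slice(i, i + BATCH_SIZE));
    batches++;
    // Yield between batches so pending timers/IO run before the next batch.
    if (i + BATCH_SIZE < chunks.length) await sleep(INTER_BATCH_PAUSE_MS);
  }
  return batches;
}
```

The `setTimeout`-based sleep is what actually releases the event loop; a tight `await` chain alone would still starve other callbacks if each batch resolves quickly.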
…ability (#918)

Phase 1 (already shipped in PR #920) sorted RAGComposer's section list by (tier, sourceName). This commit makes ChatRAGBuilder respect that order when assembling the final prompt string, so the byte-prefix actually IS stable end-to-end.

Three reorderings in section 2.4 of buildContext():

1. Tool definitions injection moved from end to start (after identity). Tool defs are INVARIANT — they belong in the byte-stable prefix region, not after VOLATILE content.
2. The generic source loop already iterates the Map in insertion order, which equals tier-sorted order from extractFromComposition (which inserts in result.sections order, which Phase 1 sorted). So the loop now produces INVARIANT → SEMI_STABLE → VOLATILE content automatically — no per-section sorting needed.
3. HumanPresenceTracker injection moved from before-the-loop to after-the-loop. Presence is volatile (changes when users switch rooms) and must live in the suffix, never in the byte-stable prefix.

Final assembly order: identity (INVARIANT, from PersonaIdentitySource) → tool definitions (INVARIANT) → loop in tier order (INVARIANT remaining → SEMI_STABLE → VOLATILE) → human presence (VOLATILE) → conversation history (already separate, lives in messages array).

Net effect for prefix-reuse: with the same persona+recipe, the INVARIANT region of the prompt is byte-identical across thousands of turns. llama-server / DMR's prefix-KV-cache match fires on the INVARIANT prefix; only the VOLATILE suffix gets reprocessed. Combined with future per-persona slot pinning (Phase 2), this is the ~70× prompt-eval speedup the design doc promised.

tsc clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
This PR advances the prompt prefix-reuse work for Chat RAG by introducing tier-aware section ordering, then reordering consumer-side prompt assembly so volatile bytes stay in the suffix (enabling KV-cache prefix reuse).
Changes:
- Add `PromptTier` and thread tier metadata through `RAGSource`/`RAGSection`, injecting the tier in `RAGComposer` and sorting sections deterministically by `(tier, sourceName)`.
- Update all RAG sources to declare a tier and to return `Omit<RAGSection, 'tier'>` so the composer is the single tier authority.
- Reorder `ChatRAGBuilder` injections (tool definitions earlier; human presence later) and throttle codebase-indexing embedding batches to reduce startup event-loop starvation.
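The `Omit<RAGSection, 'tier'>` pattern above can be sketched as follows; the interfaces are simplified stand-ins for the repo's actual types:

```typescript
// Sketch: sources return sections without a tier; the composer stamps
// each source's class-level tier on when wrapping. Shapes are assumed.
type PromptTier = 'invariant' | 'semi_stable' | 'volatile';

interface RAGSection {
  sourceName: string;
  tier: PromptTier;
  content: string;
}

interface RAGSource {
  readonly name: string;
  readonly tier: PromptTier; // class-level declaration, single source of truth
  load(): Promise<Omit<RAGSection, 'tier'>>;
}

async function collect(sources: RAGSource[]): Promise<RAGSection[]> {
  const loaded = await Promise.all(sources.map((s) => s.load()));
  // Inject each source's declared tier exactly once, at the wrap site,
  // so individual return statements never repeat the classification.
  return loaded.map((section, i) => ({ ...section, tier: sources[i].tier }));
}
```

Because `load()` cannot even express a `tier` field, a source cannot drift out of sync with its declared classification.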
Reviewed changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/system/rag/builders/ChatRAGBuilder.ts | Reorders prompt injections to improve stable-prefix behavior (tool defs earlier; presence later). |
| src/system/rag/services/CodebaseIndexer.ts | Reduces embedding batch size and adds inter-batch pause to yield the event loop. |
| src/system/rag/shared/RAGComposer.ts | Injects tiers into sections and sorts sections deterministically by tier + name. |
| src/system/rag/shared/RAGSource.ts | Adds tier to RAGSource/RAGSection, changes load/fromBatchResult return types, and re-exports PromptTier. |
| src/system/rag/shared/RAGTypes.ts | Introduces PromptTier and documents tier ordering contract. |
| src/system/rag/sources/ActivityContextSource.ts | Declares tier and updates load() return type to omit tier. |
| src/system/rag/sources/CodeToolSource.ts | Declares tier and updates load()/helpers to omit tier. |
| src/system/rag/sources/CodebaseSearchSource.ts | Declares tier and updates load() return type to omit tier. |
| src/system/rag/sources/ConversationHistorySource.ts | Declares tier and updates load()/helpers to omit tier. |
| src/system/rag/sources/DocumentationSource.ts | Declares tier and updates load() return type to omit tier. |
| src/system/rag/sources/GlobalAwarenessSource.ts | Declares tier and updates load()/fromBatchResult()/helpers to omit tier. |
| src/system/rag/sources/GovernanceSource.ts | Declares tier and updates load() return type to omit tier. |
| src/system/rag/sources/LiveRoomAwarenessSource.ts | Declares tier and updates load()/helpers to omit tier. |
| src/system/rag/sources/MediaArtifactSource.ts | Declares tier and updates load() return type to omit tier. |
| src/system/rag/sources/OpenProposalsSource.ts | Declares tier and updates EMPTY_SECTION/load() to omit tier. |
| src/system/rag/sources/PersonaIdentitySource.ts | Declares tier and updates load()/helpers to omit tier. |
| src/system/rag/sources/ProjectContextSource.ts | Declares tier and updates cached section types + load()/helpers to omit tier. |
| src/system/rag/sources/SemanticMemorySource.ts | Declares tier and updates load()/fromBatchResult()/helpers to omit tier. |
| src/system/rag/sources/SentinelAwarenessSource.ts | Declares tier and updates load() return type to omit tier. |
| src/system/rag/sources/SocialMediaRAGSource.ts | Declares tier and updates load()/helpers to omit tier. |
| src/system/rag/sources/ToolDefinitionsSource.ts | Declares tier and updates load()/helpers to omit tier. |
| src/system/rag/sources/ToolMethodologySource.ts | Declares tier and updates load() return type to omit tier. |
| src/system/rag/sources/VoiceConversationSource.ts | Declares tier and updates load()/helpers to omit tier. |
| src/system/rag/sources/WidgetContextSource.ts | Declares tier and updates load()/helpers to omit tier. |
```ts
import type { RAGBuildOptions, LLMMessage, RAGArtifact, PersonaMemory, PersonaIdentity, RecipeStrategy } from './RAGTypes';
import { PromptTier } from './RAGTypes';

// Re-export so source files only need one import
export { PromptTier } from './RAGTypes';
```
The file imports and re-exports PromptTier as a runtime symbol (import { PromptTier } ... + export { PromptTier } ...), but PromptTier is a const enum (erased at emit). In ESM this can cause a hard runtime failure when the re-exported binding doesn’t exist. Safer options: (a) make PromptTier a normal enum, or (b) remove the runtime re-export and have consumers import type { PromptTier } / use string literals.
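One shape for the reviewer's "as const object + union type" alternative, sketched here rather than taken from the repo: unlike a `const enum`, this emits a real JS object, so a runtime re-export survives ESM emit.

```typescript
// Sketch: an `as const` object plus a same-named union type. The value
// exists at runtime (safe to re-export), while call sites keep the
// enum-like PromptTier.INVARIANT syntax. Names mirror the PR's enum.
const PromptTier = {
  INVARIANT: 'invariant',
  SEMI_STABLE: 'semi_stable',
  VOLATILE: 'volatile',
} as const;

type PromptTier = (typeof PromptTier)[keyof typeof PromptTier];

// Usage keeps existing call sites working unchanged:
const tier: PromptTier = PromptTier.INVARIANT;
```

TypeScript allows the value and the type to share a name because they live in separate declaration spaces, so consumers can `import { PromptTier }` and use it in both positions.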
Suggested change:

```diff
-import type { RAGBuildOptions, LLMMessage, RAGArtifact, PersonaMemory, PersonaIdentity, RecipeStrategy } from './RAGTypes';
-import { PromptTier } from './RAGTypes';
-// Re-export so source files only need one import
-export { PromptTier } from './RAGTypes';
+import type { RAGBuildOptions, LLMMessage, RAGArtifact, PersonaMemory, PersonaIdentity, RecipeStrategy, PromptTier } from './RAGTypes';
+// Re-export so source files only need one import
+export type { PromptTier } from './RAGTypes';
```
```ts
export class ToolDefinitionsSource implements RAGSource {
  readonly name = 'tool-definitions';
  readonly tier = PromptTier.INVARIANT;
  readonly priority = 45;
  readonly defaultBudgetPercent = 10;
```
ToolDefinitionsSource is marked PromptTier.INVARIANT, but its output explicitly depends on context.options.currentMessage (contextual group selection + group hints). That makes the section change turn-to-turn, which defeats the stable-prefix goal and violates the tier contract. Either make tool definitions truly invariant (no currentMessage-dependent selection) or split into an invariant “tool catalog/specs” source and a volatile “tool hints for this turn” source (and mark this one SEMI_STABLE/VOLATILE accordingly).
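The split the reviewer suggests could look roughly like this; every name and type here is illustrative, not the repo's actual interfaces:

```typescript
// Sketch: split tool definitions into a byte-stable catalog (INVARIANT)
// and a per-turn hints source (VOLATILE). All names/shapes are assumed.
type PromptTier = 'invariant' | 'volatile';

interface Section {
  sourceName: string;
  content: string;
}

class ToolCatalogSource {
  readonly name = 'tool-catalog';
  readonly tier: PromptTier = 'invariant';
  // Emits the full tool spec list with no dependence on the current
  // message, so the bytes are identical on every turn.
  load(tools: string[]): Section {
    return { sourceName: this.name, content: tools.join('\n') };
  }
}

class ToolHintsSource {
  readonly name = 'tool-hints';
  readonly tier: PromptTier = 'volatile';
  // Contextual group selection depends on the current message, so it
  // belongs in the volatile suffix where reprocessing is expected.
  load(currentMessage: string, groups: Record<string, string[]>): Section {
    const hinted = Object.entries(groups)
      .filter(([keyword]) => currentMessage.toLowerCase().includes(keyword))
      .flatMap(([, tools]) => tools);
    return { sourceName: this.name, content: hinted.join('\n') };
  }
}
```

With this split, only the small hints section churns per turn while the (typically much larger) catalog stays in the reusable prefix.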
```ts
 * Within each tier, sources are sorted by name (alphabetical) so the byte
 * order is fully deterministic.
```
The PromptTier doc says “Within each tier, sources are sorted by name (alphabetical)”, but ChatRAGBuilder now explicitly hoists tool-definitions ahead of other INVARIANT sources. Either update this doc to reflect the actual consumer-side ordering rules, or move all ordering decisions into the composer so the documented contract matches the emitted prompt order.
Suggested change:

```diff
- * Within each tier, sources are sorted by name (alphabetical) so the byte
- * order is fully deterministic.
+ * Within a tier, source order must be deterministic. Consumers may apply
+ * tier-specific ordering rules before any fallback alphabetical ordering; for
+ * example, `tool-definitions` is hoisted ahead of other INVARIANT sources.
```
```ts
export const enum PromptTier {
  /** Persona system prompt, recipe rules, role identity, tool definitions.
   * Changes ~weekly when persona/recipe is edited. Identical bytes across
   * thousands of turns for the same persona+recipe. */
  INVARIANT = 'invariant',
  /** Conversation history, active genome adapters, participants, governance
   * state. Grows monotonically — new content APPENDS to the existing
   * prefix, doesn't rewrite earlier bytes. */
  SEMI_STABLE = 'semi_stable',
  /** Latest user message, audio chunks, current timestamp, last-second
   * pressure observations. Changes every request. The only region the
   * server actually has to reprocess token-by-token. */
  VOLATILE = 'volatile',
}
```
PromptTier is declared as a const enum, but it’s also imported/re-exported as a value (export { PromptTier } ...). Since const enums are erased in JS output, this can break ESM/bundler consumers with “module does not provide an export named 'PromptTier'”. Consider switching PromptTier to a normal export enum (or as const object + union type), or make the re-export/imports type-only and stop re-exporting it as a runtime value.
Revert: Phase 1.5 ChatRAGBuilder consumer ordering (#922) — bisecting silence regression
Reverted via #926 — bisecting a silence regression observed on clean main after tonight's chain merge (#921 → #923 → #920 → #922). #922 was the most recent merge, so it was reverted first. If a retest after the revert shows main responds, this PR's logic introduced the regression and the consumer-side reordering needs rework before re-merging. Original commits remain in git history ().
Builds on PR #920 (Phase 1 — composer-side stable-first ordering). This is the consumer side that completes the prefix-reuse story end-to-end.
What changes
`ChatRAGBuilder.buildContext` section 2.4 reorders three injections:

1. Tool definitions injection moved from end → start. Tool defs are INVARIANT and belong in the byte-stable prefix.
2. The generic source loop already emits tier-sorted content (`extractFromComposition` inserts in `result.sections` order; that array was tier-sorted by `RAGComposer`).
3. `HumanPresenceTracker` injection moved from start → end. Presence is volatile (changes when users switch rooms) — must live in suffix.

Final assembly order: identity → tool definitions → tier-sorted source loop → human presence, with conversation history kept separate in the messages array.
Why
PR #920 made the section list byte-deterministic. But the consumer (this builder) was injecting volatile content (human presence) BEFORE the tier-sorted loop and INVARIANT content (tool defs) AFTER. Result: the assembled prompt string still had non-stable bytes in the prefix region — Phase 1 alone wasn't enough for actual prefix-reuse.
This commit makes the assembled string's prefix byte-identical across requests for the same persona+recipe. Combined with future Phase 2 (per-persona DMR slot pinning), llama-server/DMR's prefix-KV-cache reuse fires for real and the ~70× prompt-eval speedup materializes.
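The byte-identical-prefix claim can be spot-checked with a small hash comparison; the prompts and `invariantLen` below are hypothetical stand-ins for real consecutive turns:

```typescript
// Sketch: hash the first invariantLen bytes of two consecutive prompts.
// If the INVARIANT region is byte-stable, the hashes must match.
import { createHash } from 'node:crypto';

function prefixHash(prompt: string, invariantLen: number): string {
  return createHash('sha256')
    .update(prompt.slice(0, invariantLen))
    .digest('hex');
}

// Hypothetical consecutive turns for the same persona+recipe: they share
// the first 32 characters (the assumed invariant region) and diverge only
// in the volatile suffix.
const invariantLen = 32;
const turn1 = 'SYSTEM: you are Ada. Tools: [...] user: hello';
const turn2 = 'SYSTEM: you are Ada. Tools: [...] user: what changed?';

const stable =
  prefixHash(turn1, invariantLen) === prefixHash(turn2, invariantLen);
```

A check like this is cheap enough to run in CI against real composed prompts once the invariant length is exposed.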
Verification
- `tsc` clean.
- Single file touched: `system/rag/builders/ChatRAGBuilder.ts`, +40 −19.
- Branch: `feature/prefix-reuse-and-multimodal`. Merge after "feat(rag): Phase 1 — stable-first ordering for prefix-reuse (#918)" (#920) lands.
- `sha256(prompt[:invariant_len])` should be identical across consecutive turns of the same persona.

Sequencing
🤖 Generated with Claude Code