14 changes: 14 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,20 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.9.2] - 2026-03-04

### Added

- **Budget-aware MMR selection**: The MMR greedy selection loop now tracks a running token budget. Candidates that would exceed the remaining budget are excluded from consideration at each step, preventing large chunks from consuming diversity slots when they can't fit in the response. Below the MMR threshold (< 10 candidates), budget filtering still applies.
- **Budget-aware chain formatting**: Chain output assembly now iterates through chunks in order and only includes those that fit within the remaining token budget. Chunks that would exceed the budget are dropped entirely — no partial chunks are returned. Step numbering adjusts to reflect included chunks only.
- **Oversized chunk filtering**: Chunks individually larger than the query's `maxTokens` are now excluded a priori from both search and chain pipelines:
- **Search**: Filtered out before MMR reranking so they don't waste diversity slots.
- **Chains**: Traversed through for graph connectivity (the chain doesn't break) but excluded from the path output, token count, and median score calculation. Oversized seeds are similarly traversed but not included in chain output.

### Changed

- **Length penalty token extraction**: The `chunkTokens` variable used for the length penalty is now computed unconditionally (outside the `lengthPenalty.enabled` guard) and reused for the size filter and MMR budget map, eliminating redundant `getChunkById` lookups.

## [0.9.1] - 2026-03-04

### Fixed
16 changes: 10 additions & 6 deletions docs/guides/how-it-works.md
@@ -58,19 +58,21 @@ The `search` tool finds semantically similar context:
2. **Parallel search**: Run vector search and BM25 keyword search simultaneously
3. **RRF fusion**: Merge both ranked lists using Reciprocal Rank Fusion (k=60)
4. **Cluster expansion**: Expand results through HDBSCAN cluster siblings
-5. **Rank and deduplicate**: Recency boost, deduplication
-6. **MMR reranking**: Reorder candidates using Maximal Marginal Relevance to balance relevance with diversity
-7. **Token budgeting**: Fit within response limits
5. **Rank and deduplicate**: Recency boost, length penalty, deduplication
6. **Size filter**: Exclude chunks individually larger than the response budget
7. **MMR reranking**: Budget-aware Maximal Marginal Relevance — balances relevance with diversity while tracking remaining token budget so large chunks don't consume diversity slots they can't fill
8. **Token budgeting**: Assemble within response limits (no partial chunks)
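A minimal sketch of steps 6 and 8, assuming a simplified candidate shape (`Candidate`, `sizeFilter`, and `assemble` are illustrative names, not the actual exports):

```typescript
// Illustrative sketch of the size filter (step 6) and final assembly (step 8).
// The real pipeline uses richer types and per-chunk formatting overhead.
interface Candidate {
  id: string;
  score: number;
  tokens: number;
}

// Step 6: drop chunks that could never fit within the budget on their own.
function sizeFilter(candidates: Candidate[], budget: number): Candidate[] {
  return candidates.filter((c) => c.tokens <= budget);
}

// Step 8: greedy assembly over the ranked list; whole chunks only, never truncated.
function assemble(ranked: Candidate[], budget: number): Candidate[] {
  const out: Candidate[] = [];
  let remaining = budget;
  for (const c of ranked) {
    if (c.tokens > remaining) continue; // skip, keep trying smaller chunks
    out.push(c);
    remaining -= c.tokens;
  }
  return out;
}
```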

### Recall/Predict (episodic)

The `recall` and `predict` tools reconstruct narrative chains:

1. **Seed discovery**: Same as search (embed → vector + keyword → RRF → cluster expand) to find top 5 seeds
-2. **Multi-path chain walking**: For each seed, DFS with backtracking explores all reachable paths (backward for recall, forward for predict). At branching points (agent transitions, cross-session links), all branches are explored and emitted as candidates
-3. **Chain scoring**: Each candidate chain scored by median per-node cosine similarity to the query
2. **Multi-path chain walking**: For each seed, DFS with backtracking explores all reachable paths (backward for recall, forward for predict). Oversized chunks (larger than the token budget) are traversed through for graph connectivity but excluded from path output and scoring. At branching points (agent transitions, cross-session links), all branches are explored and emitted as candidates
3. **Chain scoring**: Each candidate chain scored by median per-node cosine similarity to the query (oversized chunks excluded from median)
4. **Best chain selection**: Highest median score among candidates with ≥ 2 chunks
-5. **Fallback**: If no qualifying chain found, fall back to search-style results
5. **Budget-aware formatting**: Iterate through the selected chain, accepting only chunks that fit within the remaining token budget — no partial chunks
6. **Fallback**: If no qualifying chain found, fall back to search-style results
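The median in steps 3–4 can be sketched as follows; `chainMedianScore` is a hypothetical helper, not the actual export:

```typescript
// Hypothetical sketch of chain scoring (step 3) plus the >= 2 chunk
// qualification from step 4. Returns null for chains that don't qualify.
function chainMedianScore(nodeScores: number[]): number | null {
  if (nodeScores.length < 2) return null; // single-chunk chains don't qualify
  const sorted = [...nodeScores].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}
```

Using the median rather than the mean keeps one weakly related node from dragging down (or one strong seed from inflating) an otherwise coherent chain's score.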

### Hybrid Search

Expand All @@ -95,6 +97,8 @@ MMR(c) = λ × relevance − (1−λ) × max_similarity(c, already_selected)

The first pick is always the top relevance hit. As selected items saturate a semantic neighbourhood, candidates from different topics become competitive — including cluster siblings that cover the same topic from a different angle. This benefits all search results, not just cluster expansion: even without clusters, MMR prevents near-duplicate vector hits from monopolising the token budget.

MMR is also budget-aware: it tracks remaining token budget during selection and excludes candidates that would exceed it. This prevents large chunks from winning a diversity slot only to be truncated or dropped during final assembly.
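The objective above can be sketched per candidate as follows; all names here (`cosine`, `mmrScore`, the parameter shapes) are illustrative, not the module's actual API:

```typescript
// Sketch of the MMR objective: lambda * relevance - (1 - lambda) * maxSim.
// A budget-aware loop would skip candidates whose token count exceeds the
// remaining budget before ever computing this score.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  const denom = Math.sqrt(na) * Math.sqrt(nb);
  return denom === 0 ? 0 : dot / denom;
}

function mmrScore(
  relevance: number, // normalized relevance of the candidate
  embedding: number[] | undefined, // candidates without embeddings count as novel
  selectedEmbeddings: number[][],
  lambda: number,
): number {
  const maxSim = embedding
    ? selectedEmbeddings.reduce((m, s) => Math.max(m, cosine(embedding, s)), 0)
    : 0;
  return lambda * relevance - (1 - lambda) * maxSim;
}
```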

Controlled by `retrieval.mmrLambda` (default: 0.7). See [Configuration Reference](../reference/configuration.md#retrieval).

### Graceful Degradation
9 changes: 8 additions & 1 deletion docs/reference/mcp-tools.md
@@ -270,7 +270,14 @@ These diagnostics help distinguish between "memory is empty" and "memory exists

## Token Limits

-Response sizes are controlled by `tokens.mcpMaxResponse` in the configuration (default: 20000 tokens). The `predict` tool uses half this budget. Long responses are truncated to fit within the budget.
Response sizes are controlled by `tokens.mcpMaxResponse` in the configuration (default: 20000 tokens). The `predict` tool uses half this budget.

Budget enforcement is applied at multiple stages:

1. **Pre-filter**: Chunks individually larger than the token budget are excluded before ranking.
2. **MMR selection**: The diversity reranking loop tracks remaining budget and skips candidates that would exceed it.
3. **Chain formatting**: Chain output assembly drops chunks that would exceed the remaining budget — no partial chunks are returned.
4. **Final assembly**: Search results are assembled within the budget with overhead accounting for per-chunk formatting.

## Error Handling

23 changes: 20 additions & 3 deletions docs/reference/traversal-algorithm.md
@@ -36,9 +36,10 @@ function walkAllPaths(seedId, options):
pathVisited = Set() // Per-path, not global
pathState = { chunkIds: [], scores: [], tokens: 0 }

-// Initialize with seed
// Initialize with seed (oversized seeds traversed but excluded from path)
pathVisited.add(seedId)
-pushToPath(seedId)
if seedChunk.tokens <= tokenBudget:
pushToPath(seedId)

dfs(seedId, depth=1, consecutiveSkips=0)
return candidates
@@ -66,6 +67,12 @@ function dfs(currentId, depth, consecutiveSkips):
pathVisited.delete(edge.nextId)
continue

// Oversized chunk: pass through without adding to path
if chunk.tokens > tokenBudget:
dfs(edge.nextId, depth + 1, 0) // Continue traversal
pathVisited.delete(edge.nextId)
continue

score = scoreNode(edge.nextId) // Memoized
if tokens + chunk.tokens > tokenBudget:
emit(currentPath) // Budget hit — emit, don't extend
@@ -109,6 +116,14 @@ Path state (`chunkIds`, `chunks`, `nodeScores`, token count) uses push/pop with

When `agentFilter` is set and a non-matching chunk is encountered, the chunk is skipped (not added to output) but its edges are explored. `consecutiveSkips` is passed as a parameter to recursion — each recursive frame gets its own count, reset to 0 when a matching chunk is found. This prevents cross-frame interference during backtracking.

### Oversized chunk passthrough

Chunks individually larger than the token budget are treated as transparent nodes: traversed for graph connectivity but excluded from the path output, token count, and median score. This prevents a single large chunk from breaking an otherwise viable chain. The same applies to oversized seeds — they serve as DFS starting points but don't appear in the output.

### Budget-aware chain formatting

After chain selection, the output assembly iterates through chunks in order and only includes those that fit within the remaining token budget. Chunks that would exceed the budget are dropped entirely — no partial chunks are returned. Step numbering (`[1/N]`) reflects included chunks only.
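The selection pass described above can be sketched as follows, assuming a simplified chunk shape (`ChainChunk`, `selectWithinBudget`, and `stepLabel` are illustrative names):

```typescript
// Illustrative sketch of budget-aware chain formatting: whole chunks only,
// step numbering computed over the included set rather than the full chain.
interface ChainChunk {
  id: string;
  tokens: number;
}

function selectWithinBudget(chunks: ChainChunk[], budget: number): ChainChunk[] {
  const included: ChainChunk[] = [];
  let remaining = budget;
  for (const chunk of chunks) {
    if (chunk.tokens > remaining) continue; // dropped whole, never truncated
    included.push(chunk);
    remaining -= chunk.tokens;
  }
  return included;
}

// Step labels use the included count, so numbering reads [1/N] over kept chunks.
function stepLabel(index: number, included: ChainChunk[]): string {
  return `[${index + 1}/${included.length}]`;
}
```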

## Pipeline Integration

The chain walker is part of the episodic retrieval pipeline:
@@ -122,12 +137,14 @@
├─ 4. walkChains(seedIds, { direction, tokenBudget, queryEmbedding })
│ ├─ For each seed, DFS with backtracking explores all paths
│ ├─ Oversized chunks (> tokenBudget) passed through, not added to path
│ ├─ Emit candidate at: dead end, depth limit, or token budget
│ └─ Per-path visited set prevents cycles within a path
├─ 5. selectBestChain(candidates) → highest median per-node similarity with ≥ 2 chunks
-├─ 6. If chain found → reverse for chronological output (recall only)
├─ 6. If chain found → budget-aware formatting (drop chunks exceeding remaining budget)
│ └─ Reverse for chronological output (recall only)
└─ 7. Else → fall back to search-style ranked results
```

2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
{
"name": "causantic",
-"version": "0.9.1",
"version": "0.9.2",
"description": "Long-term memory for Claude Code — local-first, graph-augmented, self-benchmarking",
"type": "module",
"private": false,
33 changes: 25 additions & 8 deletions src/retrieval/chain-assembler.ts
@@ -15,6 +15,7 @@ import { approximateTokens } from '../utils/token-counter.js';
import { searchContext, type SearchRequest } from './search-assembler.js';
import { walkChains, selectBestChain, type Chain } from './chain-walker.js';
import { formatChainChunk } from './formatting.js';
import type { StoredChunk } from '../storage/types.js';

/**
* Request for episodic retrieval.
@@ -183,8 +184,8 @@ async function runEpisodicPipeline(
};
}

-// 4. Format chain as narrative
-const formatted = formatChain(bestChain, direction);
// 4. Format chain as narrative (budget-aware: drops chunks that exceed remaining budget)
const formatted = formatChain(bestChain, direction, maxTokens);

return {
text: formatted.text,
@@ -205,10 +206,15 @@
/**
* Format a chain as ordered narrative.
* Backward chains are reversed for chronological output (problem → solution).
*
* Budget-aware: iterates through chunks in order, accepting only those that fit
* within the remaining token budget. Chunks that would exceed the budget are
* dropped entirely (no partial chunks).
*/
function formatChain(
chain: Chain,
direction: 'forward' | 'backward',
maxTokens: number,
): {
text: string;
tokenCount: number;
@@ -223,6 +229,19 @@ function formatChain(
const orderedChunks = direction === 'backward' ? [...chain.chunks].reverse() : chain.chunks;
const orderedIds = direction === 'backward' ? [...chain.chunkIds].reverse() : chain.chunkIds;

// First pass: select chunks that fit within budget
const included: Array<{ chunk: StoredChunk; id: string; chunkTokens: number }> = [];
let budgetRemaining = maxTokens;

for (let i = 0; i < orderedChunks.length; i++) {
const chunk = orderedChunks[i];
const chunkTokens = chunk.approxTokens || approximateTokens(chunk.content);
if (chunkTokens > budgetRemaining) continue;
included.push({ chunk, id: orderedIds[i], chunkTokens });
budgetRemaining -= chunkTokens;
}

// Second pass: format included chunks with correct step numbering
const parts: string[] = [];
const resultChunks: Array<{
id: string;
@@ -232,14 +251,12 @@
}> = [];
let totalTokens = 0;

-for (let i = 0; i < orderedChunks.length; i++) {
-const chunk = orderedChunks[i];
-const chunkTokens = chunk.approxTokens || approximateTokens(chunk.content);
-
-parts.push(formatChainChunk(chunk, chunk.content, i + 1, orderedChunks.length));
for (let i = 0; i < included.length; i++) {
const { chunk, id, chunkTokens } = included[i];
parts.push(formatChainChunk(chunk, chunk.content, i + 1, included.length));
totalTokens += chunkTokens;
resultChunks.push({
-id: orderedIds[i],
id,
sessionSlug: chunk.sessionSlug,
weight: chain.medianScore,
preview: chunk.content.slice(0, 100) + (chunk.content.length > 100 ? '...' : ''),
29 changes: 21 additions & 8 deletions src/retrieval/chain-walker.ts
@@ -187,9 +187,20 @@ async function walkAllPaths(
continue;
}

-const nodeScore = await scoreMemo(nextId);
const chunkTokens = chunk.approxTokens || 100;

// Oversized chunk (exceeds total budget on its own): pass through without
// adding to path. The chain doesn't break — we continue traversing — but
// this node won't appear in the output or affect the median score.
if (chunkTokens > tokenBudget) {
await dfs(nextId, depth + 1, 0);
anyChildEmitted = true;
pathVisited.delete(nextId);
continue;
}

const nodeScore = await scoreMemo(nextId);

// Token budget: emit current path, don't extend
if (pathTokens + chunkTokens > tokenBudget && pathChunkIds.length > 0) {
emitCandidate();
@@ -223,16 +234,18 @@
}
}

-// Initialize with seed
-const seedScore = await scoreMemo(seedId);
// Initialize with seed (oversized seeds are traversed but excluded from path)
const seedTokens = seedChunk.approxTokens || 100;

pathVisited.add(seedId);
-pathChunkIds.push(seedId);
-pathChunks.push(seedChunk);
-pathScores.push(seedScore);
-pathScore = seedScore;
-pathTokens = seedTokens;
if (seedTokens <= tokenBudget) {
const seedScore = await scoreMemo(seedId);
pathChunkIds.push(seedId);
pathChunks.push(seedChunk);
pathScores.push(seedScore);
pathScore = seedScore;
pathTokens = seedTokens;
}

await dfs(seedId, 1, 0);

46 changes: 45 additions & 1 deletion src/retrieval/mmr.ts
@@ -17,6 +17,14 @@ export interface MMRConfig {
lambda: number;
}

/** Optional token budget for budget-aware MMR selection. */
export interface MMRBudgetOptions {
/** Total token budget for selected results. */
tokenBudget: number;
/** Token count per chunk ID. */
chunkTokenCounts: Map<string, number>;
}

/** Minimum candidate count to trigger MMR. Below this, diversity doesn't matter. */
const MMR_THRESHOLD = 10;

@@ -29,14 +37,21 @@
* Short-circuits when fewer than 10 candidates (too few for diversity to matter).
* Candidates without embeddings are treated as novel (diversity = 0).
* Original RRF scores are preserved — only order changes.
*
* When `budget` is provided, candidates that would exceed the remaining token
* budget are excluded from consideration at each step. This prevents large chunks
* from consuming diversity slots when they can't fit in the response.
*/
export async function reorderWithMMR(
candidates: RankedItem[],
queryEmbedding: number[],
config: MMRConfig,
budget?: MMRBudgetOptions,
): Promise<RankedItem[]> {
if (candidates.length < MMR_THRESHOLD) {
-return candidates;
if (!budget) return candidates;
// Still apply budget filtering even without MMR reranking
return applyBudgetFilter(candidates, budget);
}

const { lambda } = config;
@@ -60,13 +75,21 @@
const selected: RankedItem[] = [];
const selectedEmbeddings: number[][] = [];
const remaining = new Set(candidates.map((_, i) => i));
let budgetRemaining = budget?.tokenBudget ?? Infinity;

while (remaining.size > 0) {
let bestIdx = -1;
let bestMMR = -Infinity;

for (const idx of remaining) {
const c = candidates[idx];

// Skip candidates that exceed remaining budget
if (budget) {
const tokens = budget.chunkTokenCounts.get(c.chunkId) ?? 0;
if (tokens > budgetRemaining) continue;
}

const rel = normalizedScores.get(c.chunkId)!;

// Compute max similarity to already-selected items
@@ -87,10 +110,17 @@
}
}

// No candidate fits remaining budget
if (bestIdx === -1) break;

remaining.delete(bestIdx);
const picked = candidates[bestIdx];
selected.push(picked);

if (budget) {
budgetRemaining -= budget.chunkTokenCounts.get(picked.chunkId) ?? 0;
}

const pickedEmb = embeddings.get(picked.chunkId);
if (pickedEmb) {
selectedEmbeddings.push(pickedEmb);
@@ -99,3 +129,17 @@

return selected;
}

/** Budget-only filter for below-threshold candidate lists (no MMR reranking). */
function applyBudgetFilter(candidates: RankedItem[], budget: MMRBudgetOptions): RankedItem[] {
const result: RankedItem[] = [];
let remaining = budget.tokenBudget;
for (const c of candidates) {
const tokens = budget.chunkTokenCounts.get(c.chunkId) ?? 0;
if (tokens <= remaining) {
result.push(c);
remaining -= tokens;
}
}
return result;
}