diff --git a/CHANGELOG.md b/CHANGELOG.md index c109da3..ec0178c 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -5,6 +5,20 @@ All notable changes to this project will be documented in this file. The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). +## [0.9.2] - 2026-03-04 + +### Added + +- **Budget-aware MMR selection**: The MMR greedy selection loop now tracks a running token budget. Candidates that would exceed the remaining budget are excluded from consideration at each step, preventing large chunks from consuming diversity slots when they can't fit in the response. Below the MMR threshold (< 10 candidates), budget filtering still applies. +- **Budget-aware chain formatting**: Chain output assembly now iterates through chunks in order and only includes those that fit within the remaining token budget. Chunks that would exceed the budget are dropped entirely — no partial chunks are returned. Step numbering adjusts to reflect included chunks only. +- **Oversized chunk filtering**: Chunks individually larger than the query's `maxTokens` are now excluded a priori from both search and chain pipelines: + - **Search**: Filtered out before MMR reranking so they don't waste diversity slots. + - **Chains**: Traversed through for graph connectivity (the chain doesn't break) but excluded from the path output, token count, and median score calculation. Oversized seeds are similarly traversed but not included in chain output. + +### Changed + +- **Length penalty token extraction**: The `chunkTokens` variable used for the length penalty is now computed unconditionally (outside the `lengthPenalty.enabled` guard) and reused for the size filter and MMR budget map, eliminating redundant `getChunkById` lookups. 
+ ## [0.9.1] - 2026-03-04 ### Fixed diff --git a/docs/guides/how-it-works.md b/docs/guides/how-it-works.md index 8e3be61..162988e 100644 --- a/docs/guides/how-it-works.md +++ b/docs/guides/how-it-works.md @@ -58,19 +58,21 @@ The `search` tool finds semantically similar context: 2. **Parallel search**: Run vector search and BM25 keyword search simultaneously 3. **RRF fusion**: Merge both ranked lists using Reciprocal Rank Fusion (k=60) 4. **Cluster expansion**: Expand results through HDBSCAN cluster siblings -5. **Rank and deduplicate**: Recency boost, deduplication -6. **MMR reranking**: Reorder candidates using Maximal Marginal Relevance to balance relevance with diversity -7. **Token budgeting**: Fit within response limits +5. **Rank and deduplicate**: Recency boost, length penalty, deduplication +6. **Size filter**: Exclude chunks individually larger than the response budget +7. **MMR reranking**: Budget-aware Maximal Marginal Relevance — balances relevance with diversity while tracking remaining token budget so large chunks don't consume diversity slots they can't fill +8. **Token budgeting**: Assemble within response limits (no partial chunks) ### Recall/Predict (episodic) The `recall` and `predict` tools reconstruct narrative chains: 1. **Seed discovery**: Same as search (embed → vector + keyword → RRF → cluster expand) to find top 5 seeds -2. **Multi-path chain walking**: For each seed, DFS with backtracking explores all reachable paths (backward for recall, forward for predict). At branching points (agent transitions, cross-session links), all branches are explored and emitted as candidates -3. **Chain scoring**: Each candidate chain scored by median per-node cosine similarity to the query +2. **Multi-path chain walking**: For each seed, DFS with backtracking explores all reachable paths (backward for recall, forward for predict). 
Oversized chunks (larger than the token budget) are traversed through for graph connectivity but excluded from path output and scoring. At branching points (agent transitions, cross-session links), all branches are explored and emitted as candidates +3. **Chain scoring**: Each candidate chain scored by median per-node cosine similarity to the query (oversized chunks excluded from median) 4. **Best chain selection**: Highest median score among candidates with ≥ 2 chunks -5. **Fallback**: If no qualifying chain found, fall back to search-style results +5. **Budget-aware formatting**: Iterate through the selected chain, accepting only chunks that fit within the remaining token budget — no partial chunks +6. **Fallback**: If no qualifying chain found, fall back to search-style results ### Hybrid Search @@ -95,6 +97,8 @@ MMR(c) = λ × relevance − (1−λ) × max_similarity(c, already_selected) The first pick is always the top relevance hit. As selected items saturate a semantic neighbourhood, candidates from different topics become competitive — including cluster siblings that cover the same topic from a different angle. This benefits all search results, not just cluster expansion: even without clusters, MMR prevents near-duplicate vector hits from monopolising the token budget. +MMR is also budget-aware: it tracks remaining token budget during selection and excludes candidates that would exceed it. This prevents large chunks from winning a diversity slot only to be truncated or dropped during final assembly. + Controlled by `retrieval.mmrLambda` (default: 0.7). See [Configuration Reference](../reference/configuration.md#retrieval). 
### Graceful Degradation diff --git a/docs/reference/mcp-tools.md b/docs/reference/mcp-tools.md index 006ae30..9b5d076 100644 --- a/docs/reference/mcp-tools.md +++ b/docs/reference/mcp-tools.md @@ -270,7 +270,14 @@ These diagnostics help distinguish between "memory is empty" and "memory exists ## Token Limits -Response sizes are controlled by `tokens.mcpMaxResponse` in the configuration (default: 20000 tokens). The `predict` tool uses half this budget. Long responses are truncated to fit within the budget. +Response sizes are controlled by `tokens.mcpMaxResponse` in the configuration (default: 20000 tokens). The `predict` tool uses half this budget. + +Budget enforcement is applied at multiple stages: + +1. **Pre-filter**: Chunks individually larger than the token budget are excluded before ranking. +2. **MMR selection**: The diversity reranking loop tracks remaining budget and skips candidates that would exceed it. +3. **Chain formatting**: Chain output assembly drops chunks that would exceed the remaining budget — no partial chunks are returned. +4. **Final assembly**: Search results are assembled within the budget, accounting for per-chunk formatting overhead.
## Error Handling diff --git a/docs/reference/traversal-algorithm.md b/docs/reference/traversal-algorithm.md index 90b78c6..1478475 100644 --- a/docs/reference/traversal-algorithm.md +++ b/docs/reference/traversal-algorithm.md @@ -36,9 +36,10 @@ function walkAllPaths(seedId, options): pathVisited = Set() // Per-path, not global pathState = { chunkIds: [], scores: [], tokens: 0 } - // Initialize with seed + // Initialize with seed (oversized seeds traversed but excluded from path) pathVisited.add(seedId) - pushToPath(seedId) + if seedChunk.tokens <= tokenBudget: + pushToPath(seedId) dfs(seedId, depth=1, consecutiveSkips=0) return candidates @@ -66,6 +67,12 @@ function dfs(currentId, depth, consecutiveSkips): pathVisited.delete(edge.nextId) continue + // Oversized chunk: pass through without adding to path + if chunk.tokens > tokenBudget: + dfs(edge.nextId, depth + 1, 0) // Continue traversal + pathVisited.delete(edge.nextId) + continue + score = scoreNode(edge.nextId) // Memoized if tokens + chunk.tokens > tokenBudget: emit(currentPath) // Budget hit — emit, don't extend @@ -109,6 +116,14 @@ Path state (`chunkIds`, `chunks`, `nodeScores`, token count) uses push/pop with When `agentFilter` is set and a non-matching chunk is encountered, the chunk is skipped (not added to output) but its edges are explored. `consecutiveSkips` is passed as a parameter to recursion — each recursive frame gets its own count, reset to 0 when a matching chunk is found. This prevents cross-frame interference during backtracking. +### Oversized chunk passthrough + +Chunks individually larger than the token budget are treated as transparent nodes: traversed for graph connectivity but excluded from the path output, token count, and median score. This prevents a single large chunk from breaking an otherwise viable chain. The same applies to oversized seeds — they serve as DFS starting points but don't appear in the output. 
+ +### Budget-aware chain formatting + +After chain selection, the output assembly iterates through chunks in order and only includes those that fit within the remaining token budget. Chunks that would exceed the budget are dropped entirely — no partial chunks are returned. Step numbering (`[1/N]`) reflects included chunks only. + ## Pipeline Integration The chain walker is part of the episodic retrieval pipeline: @@ -122,12 +137,14 @@ Query │ ├─ 4. walkChains(seedIds, { direction, tokenBudget, queryEmbedding }) │ ├─ For each seed, DFS with backtracking explores all paths + │ ├─ Oversized chunks (> tokenBudget) passed through, not added to path │ ├─ Emit candidate at: dead end, depth limit, or token budget │ └─ Per-path visited set prevents cycles within a path │ ├─ 5. selectBestChain(candidates) → highest median per-node similarity with ≥ 2 chunks │ - ├─ 6. If chain found → reverse for chronological output (recall only) + ├─ 6. If chain found → budget-aware formatting (drop chunks exceeding remaining budget) + │ └─ Reverse for chronological output (recall only) └─ 7. 
Else → fall back to search-style ranked results ``` diff --git a/package.json b/package.json index 76b99c6..456f997 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "causantic", - "version": "0.9.1", + "version": "0.9.2", "description": "Long-term memory for Claude Code — local-first, graph-augmented, self-benchmarking", "type": "module", "private": false, diff --git a/src/retrieval/chain-assembler.ts b/src/retrieval/chain-assembler.ts index c2f9a94..3ba3c12 100644 --- a/src/retrieval/chain-assembler.ts +++ b/src/retrieval/chain-assembler.ts @@ -15,6 +15,7 @@ import { approximateTokens } from '../utils/token-counter.js'; import { searchContext, type SearchRequest } from './search-assembler.js'; import { walkChains, selectBestChain, type Chain } from './chain-walker.js'; import { formatChainChunk } from './formatting.js'; +import type { StoredChunk } from '../storage/types.js'; /** * Request for episodic retrieval. @@ -183,8 +184,8 @@ async function runEpisodicPipeline( }; } - // 4. Format chain as narrative - const formatted = formatChain(bestChain, direction); + // 4. Format chain as narrative (budget-aware: drops chunks that exceed remaining budget) + const formatted = formatChain(bestChain, direction, maxTokens); return { text: formatted.text, @@ -205,10 +206,15 @@ async function runEpisodicPipeline( /** * Format a chain as ordered narrative. * Backward chains are reversed for chronological output (problem → solution). + * + * Budget-aware: iterates through chunks in order, accepting only those that fit + * within the remaining token budget. Chunks that would exceed the budget are + * dropped entirely (no partial chunks). */ function formatChain( chain: Chain, direction: 'forward' | 'backward', + maxTokens: number, ): { text: string; tokenCount: number; @@ -223,6 +229,19 @@ function formatChain( const orderedChunks = direction === 'backward' ? [...chain.chunks].reverse() : chain.chunks; const orderedIds = direction === 'backward' ? 
[...chain.chunkIds].reverse() : chain.chunkIds; + // First pass: select chunks that fit within budget + const included: Array<{ chunk: StoredChunk; id: string; chunkTokens: number }> = []; + let budgetRemaining = maxTokens; + + for (let i = 0; i < orderedChunks.length; i++) { + const chunk = orderedChunks[i]; + const chunkTokens = chunk.approxTokens || approximateTokens(chunk.content); + if (chunkTokens > budgetRemaining) continue; + included.push({ chunk, id: orderedIds[i], chunkTokens }); + budgetRemaining -= chunkTokens; + } + + // Second pass: format included chunks with correct step numbering const parts: string[] = []; const resultChunks: Array<{ id: string; @@ -232,14 +251,12 @@ function formatChain( }> = []; let totalTokens = 0; - for (let i = 0; i < orderedChunks.length; i++) { - const chunk = orderedChunks[i]; - const chunkTokens = chunk.approxTokens || approximateTokens(chunk.content); - - parts.push(formatChainChunk(chunk, chunk.content, i + 1, orderedChunks.length)); + for (let i = 0; i < included.length; i++) { + const { chunk, id, chunkTokens } = included[i]; + parts.push(formatChainChunk(chunk, chunk.content, i + 1, included.length)); totalTokens += chunkTokens; resultChunks.push({ - id: orderedIds[i], + id, sessionSlug: chunk.sessionSlug, weight: chain.medianScore, preview: chunk.content.slice(0, 100) + (chunk.content.length > 100 ? '...' : ''), diff --git a/src/retrieval/chain-walker.ts b/src/retrieval/chain-walker.ts index fe0a019..f6b07fc 100644 --- a/src/retrieval/chain-walker.ts +++ b/src/retrieval/chain-walker.ts @@ -187,9 +187,20 @@ async function walkAllPaths( continue; } - const nodeScore = await scoreMemo(nextId); const chunkTokens = chunk.approxTokens || 100; + // Oversized chunk (exceeds total budget on its own): pass through without + // adding to path. The chain doesn't break — we continue traversing — but + // this node won't appear in the output or affect the median score. 
+ if (chunkTokens > tokenBudget) { + await dfs(nextId, depth + 1, 0); + anyChildEmitted = true; + pathVisited.delete(nextId); + continue; + } + + const nodeScore = await scoreMemo(nextId); + // Token budget: emit current path, don't extend if (pathTokens + chunkTokens > tokenBudget && pathChunkIds.length > 0) { emitCandidate(); @@ -223,16 +234,18 @@ } } - // Initialize with seed - const seedScore = await scoreMemo(seedId); + // Initialize with seed (oversized seeds are traversed but excluded from path) const seedTokens = seedChunk.approxTokens || 100; pathVisited.add(seedId); - pathChunkIds.push(seedId); - pathChunks.push(seedChunk); - pathScores.push(seedScore); - pathScore = seedScore; - pathTokens = seedTokens; + if (seedTokens <= tokenBudget) { + const seedScore = await scoreMemo(seedId); + pathChunkIds.push(seedId); + pathChunks.push(seedChunk); + pathScores.push(seedScore); + pathScore = seedScore; + pathTokens = seedTokens; + } await dfs(seedId, 1, 0); diff --git a/src/retrieval/mmr.ts b/src/retrieval/mmr.ts index 56f42bc..35ad173 100644 --- a/src/retrieval/mmr.ts +++ b/src/retrieval/mmr.ts @@ -17,6 +17,14 @@ export interface MMRConfig { lambda: number; } +/** Optional token budget for budget-aware MMR selection. */ +export interface MMRBudgetOptions { + /** Total token budget for selected results. */ + tokenBudget: number; + /** Token count per chunk ID. */ + chunkTokenCounts: Map<string, number>; +} + /** Minimum candidate count to trigger MMR. Below this, diversity doesn't matter. */ const MMR_THRESHOLD = 10; @@ -29,14 +37,21 @@ const MMR_THRESHOLD = 10; * Short-circuits when fewer than 10 candidates (too few for diversity to matter). * Candidates without embeddings are treated as novel (diversity = 0). * Original RRF scores are preserved — only order changes. + * + * When `budget` is provided, candidates that would exceed the remaining token + * budget are excluded from consideration at each step.
This prevents large chunks + from consuming diversity slots when they can't fit in the response. */ export async function reorderWithMMR( candidates: RankedItem[], queryEmbedding: number[], config: MMRConfig, + budget?: MMRBudgetOptions, ): Promise<RankedItem[]> { if (candidates.length < MMR_THRESHOLD) { - return candidates; + if (!budget) return candidates; + // Still apply budget filtering even without MMR reranking + return applyBudgetFilter(candidates, budget); } const { lambda } = config; @@ -60,6 +75,7 @@ export async function reorderWithMMR( const selected: RankedItem[] = []; const selectedEmbeddings: number[][] = []; const remaining = new Set(candidates.map((_, i) => i)); + let budgetRemaining = budget?.tokenBudget ?? Infinity; while (remaining.size > 0) { let bestIdx = -1; @@ -67,6 +83,13 @@ for (const idx of remaining) { const c = candidates[idx]; + + // Skip candidates that exceed remaining budget + if (budget) { + const tokens = budget.chunkTokenCounts.get(c.chunkId) ?? 0; + if (tokens > budgetRemaining) continue; + } + const rel = normalizedScores.get(c.chunkId)!; // Compute max similarity to already-selected items @@ -87,10 +110,17 @@ } } + // No candidate fits remaining budget + if (bestIdx === -1) break; + remaining.delete(bestIdx); const picked = candidates[bestIdx]; selected.push(picked); + if (budget) { + budgetRemaining -= budget.chunkTokenCounts.get(picked.chunkId) ?? 0; + } + const pickedEmb = embeddings.get(picked.chunkId); if (pickedEmb) { selectedEmbeddings.push(pickedEmb); @@ -99,3 +129,17 @@ return selected; } + +/** Budget-only filter for below-threshold candidate lists (no MMR reranking).
*/ +function applyBudgetFilter(candidates: RankedItem[], budget: MMRBudgetOptions): RankedItem[] { + const result: RankedItem[] = []; + let remaining = budget.tokenBudget; + for (const c of candidates) { + const tokens = budget.chunkTokenCounts.get(c.chunkId) ?? 0; + if (tokens <= remaining) { + result.push(c); + remaining -= tokens; + } + } + return result; +} diff --git a/src/retrieval/search-assembler.ts b/src/retrieval/search-assembler.ts index 7caae86..0fd2416 100644 --- a/src/retrieval/search-assembler.ts +++ b/src/retrieval/search-assembler.ts @@ -236,11 +236,15 @@ export async function searchContext(request: SearchRequest): Promise<SearchResult> { + const chunkTokenMap = new Map<string, number>(); for (const item of deduped) { const chunk = getChunkById(item.chunkId); if (!chunk) continue; + const chunkTokens = chunk.approxTokens || 500; + chunkTokenMap.set(item.chunkId, chunkTokens); + // Time-decay boost: 1 + decayFactor * exp(-ageHours * ln2 / halfLifeHours) const ageMs = now - new Date(chunk.startTime).getTime(); const ageHours = Math.max(0, ageMs / (1000 * 60 * 60)); @@ -252,7 +256,6 @@ export async function searchContext(request: SearchRequest): Promise<SearchResult> { deduped.sort((a, b) => b.score - a.score); - // 7.5. MMR reranking (diversity-aware ordering) - const reordered = await reorderWithMMR(deduped, queryResult.embedding, mmrReranking); + // 7.1. Exclude oversized chunks (larger than the response budget). + // These can never be fully returned and would waste MMR diversity slots. + const sizeBounded = deduped.filter((item) => { + const tokens = chunkTokenMap.get(item.chunkId); + return tokens !== undefined && tokens <= maxTokens; + }); + + // 7.5. MMR reranking (diversity-aware, budget-aware ordering) + const reordered = await reorderWithMMR(sizeBounded, queryResult.embedding, mmrReranking, { + tokenBudget: maxTokens, + chunkTokenCounts: chunkTokenMap, + }); // 7.6.
Normalize scores for display (top result = 1.0) if (reordered.length > 0) { diff --git a/test/retrieval/chain-assembler.test.ts b/test/retrieval/chain-assembler.test.ts index e6cbefd..2b37d6c 100644 --- a/test/retrieval/chain-assembler.test.ts +++ b/test/retrieval/chain-assembler.test.ts @@ -329,6 +329,75 @@ describe('chain-assembler', () => { }); }); + describe('budget-aware chain formatting', () => { + it('drops chunks that exceed remaining token budget', async () => { + // Backward chain: [C, B_large, A] — reversed for output: [A, B_large, C] + const chain: Chain = { + chunkIds: ['C', 'B', 'A'], + chunks: [ + makeChunk('C', { approxTokens: 100 }), + makeChunk('B', { approxTokens: 5000 }), + makeChunk('A', { approxTokens: 100 }), + ], + nodeScores: [0.8, 0.8, 0.8], + score: 2.4, + tokenCount: 5200, + medianScore: 0.8, + }; + mockChains = [chain]; + mockBestChain = chain; + + const result = await recallContext({ query: 'test', maxTokens: 500 }); + + expect(result.mode).toBe('chain'); + // After reversal: [A(100), B(5000), C(100)] + // Budget: A fits (100 <= 500), B doesn't (5000 > 400), C fits (100 <= 400) + expect(result.chunks.length).toBe(2); + expect(result.chunks[0].id).toBe('A'); + expect(result.chunks[1].id).toBe('C'); + expect(result.tokenCount).toBe(200); + }); + + it('includes all chunks when budget is sufficient', async () => { + const chain = makeChain(['A', 'B', 'C'], 2.0); + mockChains = [chain]; + mockBestChain = chain; + + const result = await recallContext({ query: 'test', maxTokens: 10000 }); + + expect(result.mode).toBe('chain'); + expect(result.chunks.length).toBe(3); + expect(result.tokenCount).toBe(300); + }); + + it('step numbering reflects included chunks only', async () => { + // Chain with a large middle chunk that gets dropped + const chain: Chain = { + chunkIds: ['A', 'B', 'C'], + chunks: [ + makeChunk('A', { approxTokens: 100 }), + makeChunk('B', { approxTokens: 5000 }), + makeChunk('C', { approxTokens: 100 }), + ], + nodeScores: [0.8, 
0.8, 0.8], + score: 2.4, + tokenCount: 5200, + medianScore: 0.8, + }; + mockChains = [chain]; + mockBestChain = chain; + + const result = await predictContext({ query: 'test', maxTokens: 500 }); + + expect(result.mode).toBe('chain'); + expect(result.chunks.length).toBe(2); + // Text should show "1/2" and "2/2", not "1/3" and "3/3" + expect(result.text).toContain('[1/2 |'); + expect(result.text).toContain('[2/2 |'); + expect(result.text).not.toContain('[1/3 |'); + }); + }); + describe('chain formatting', () => { it('formats chain chunks with position indicators', async () => { const chain = makeChain(['A', 'B'], 1.5); diff --git a/test/retrieval/chain-walker.test.ts b/test/retrieval/chain-walker.test.ts index af8afc1..20a2ff9 100644 --- a/test/retrieval/chain-walker.test.ts +++ b/test/retrieval/chain-walker.test.ts @@ -519,6 +519,109 @@ describe('chain-walker', () => { }); }); + describe('oversized chunk filtering', () => { + it('traverses through oversized mid-chain chunk without including it', async () => { + // A(100) → B(50000) → C(100), budget=20000 + // B exceeds budget on its own — should be skipped but chain continues to C + mockChunks.set('A', makeChunk('A', { approxTokens: 100 })); + mockChunks.set('B', makeChunk('B', { approxTokens: 50000 })); + mockChunks.set('C', makeChunk('C', { approxTokens: 100 })); + + mockForwardEdges.set('A', [makeEdge('A', 'B')]); + mockForwardEdges.set('B', [makeEdge('B', 'C')]); + + const qEmb = unitVec(1, 0, 0); + mockEmbeddings.set('A', unitVec(0.9, 0.1, 0)); + mockEmbeddings.set('B', unitVec(0.8, 0.2, 0)); + mockEmbeddings.set('C', unitVec(0.7, 0.3, 0)); + + const chains = await walkChains(['A'], { + direction: 'forward', + tokenBudget: 20000, + queryEmbedding: qEmb, + }); + + expect(chains.length).toBe(1); + // B is skipped (oversized), but chain continues: A → C + expect(chains[0].chunkIds).toEqual(['A', 'C']); + expect(chains[0].tokenCount).toBe(200); + // Only 2 scores (A and C), B excluded from median + 
expect(chains[0].nodeScores.length).toBe(2); + }); + + it('skips oversized seed but continues DFS from it', async () => { + // Seed S(50000) → A(100) → B(100), budget=20000 + // S exceeds budget — excluded from path, but DFS continues through it + mockChunks.set('S', makeChunk('S', { approxTokens: 50000 })); + mockChunks.set('A', makeChunk('A', { approxTokens: 100 })); + mockChunks.set('B', makeChunk('B', { approxTokens: 100 })); + + mockForwardEdges.set('S', [makeEdge('S', 'A')]); + mockForwardEdges.set('A', [makeEdge('A', 'B')]); + + const qEmb = unitVec(1, 0, 0); + mockEmbeddings.set('S', unitVec(0.9, 0.1, 0)); + mockEmbeddings.set('A', unitVec(0.8, 0.2, 0)); + mockEmbeddings.set('B', unitVec(0.7, 0.3, 0)); + + const chains = await walkChains(['S'], { + direction: 'forward', + tokenBudget: 20000, + queryEmbedding: qEmb, + }); + + expect(chains.length).toBe(1); + // S excluded, chain is just [A, B] + expect(chains[0].chunkIds).toEqual(['A', 'B']); + expect(chains[0].tokenCount).toBe(200); + }); + + it('oversized chunk does not affect median score', async () => { + // Without the fix, an oversized chunk at a branch point would break the chain. + // With the fix, it's traversed through and its score doesn't pollute the median. 
+ // A(100) → B(50000) → C(100), A(100) → D(100) + mockChunks.set('A', makeChunk('A', { approxTokens: 100 })); + mockChunks.set('B', makeChunk('B', { approxTokens: 50000 })); + mockChunks.set('C', makeChunk('C', { approxTokens: 100 })); + mockChunks.set('D', makeChunk('D', { approxTokens: 100 })); + + mockForwardEdges.set('A', [makeEdge('A', 'B'), makeEdge('A', 'D')]); + mockForwardEdges.set('B', [makeEdge('B', 'C')]); + + const qEmb = unitVec(1, 0, 0); + mockEmbeddings.set('A', unitVec(0.9, 0.1, 0)); + mockEmbeddings.set('B', unitVec(0.1, 0.9, 0)); // low similarity — would drag median down + mockEmbeddings.set('C', unitVec(0.85, 0.15, 0)); + mockEmbeddings.set('D', unitVec(0.5, 0.5, 0)); + + const chains = await walkChains(['A'], { + direction: 'forward', + tokenBudget: 20000, + queryEmbedding: qEmb, + }); + + // Should have two chains: [A, C] (through oversized B) and [A, D] + const pathACchain = chains.find((c) => c.chunkIds.includes('A') && c.chunkIds.includes('C')); + expect(pathACchain).toBeDefined(); + // B's low score should NOT be in nodeScores + expect(pathACchain!.nodeScores.length).toBe(2); + }); + + it('returns no candidates when only node is oversized seed with no reachable children', async () => { + mockChunks.set('S', makeChunk('S', { approxTokens: 50000 })); + mockEmbeddings.set('S', unitVec(1, 0, 0)); + + const chains = await walkChains(['S'], { + direction: 'forward', + tokenBudget: 20000, + queryEmbedding: unitVec(1, 0, 0), + }); + + // Oversized orphan seed: nothing usable + expect(chains.length).toBe(0); + }); + }); + describe('multi-path edge exploration', () => { it('explores both branches at a branching point', async () => { // A has two edges: A→B (weight 0.7) and A→C (weight 1.0) diff --git a/test/retrieval/mmr.test.ts b/test/retrieval/mmr.test.ts index f18dcd3..95cd4ff 100644 --- a/test/retrieval/mmr.test.ts +++ b/test/retrieval/mmr.test.ts @@ -203,4 +203,73 @@ describe('reorderWithMMR', () => { expect(r.source).toBe(original?.source); 
} }); + + describe('budget-aware selection', () => { + it('excludes candidates that exceed remaining budget', async () => { + // 12 candidates: first 3 are large (500 each), rest are small (100 each) + const candidates: RankedItem[] = []; + const tokenCounts = new Map(); + for (let i = 0; i < 12; i++) { + candidates.push(makeItem(`c${i}`, 1.0 - i * 0.05)); + mockEmbeddings.set(`c${i}`, makeEmbedding([1, 0.01 * i])); + tokenCounts.set(`c${i}`, i < 3 ? 500 : 100); + } + + const result = await reorderWithMMR(candidates, queryEmbedding, defaultConfig, { + tokenBudget: 1000, + chunkTokenCounts: tokenCounts, + }); + + // Total budget: 1000. Large chunks (500 each) take at most 2 slots. + // Remaining budget goes to small chunks (100 each). + const totalTokens = result.reduce((s, r) => s + (tokenCounts.get(r.chunkId) ?? 0), 0); + expect(totalTokens).toBeLessThanOrEqual(1000); + expect(result.length).toBeGreaterThan(0); + expect(result.length).toBeLessThan(12); + }); + + it('stops when no candidate fits remaining budget', async () => { + const candidates = Array.from({ length: 12 }, (_, i) => makeItem(`c${i}`, 1.0 - i * 0.05)); + const tokenCounts = new Map(); + for (const c of candidates) { + mockEmbeddings.set(c.chunkId, makeEmbedding([1, 0])); + tokenCounts.set(c.chunkId, 600); // Each chunk = 600 tokens + } + + const result = await reorderWithMMR(candidates, queryEmbedding, defaultConfig, { + tokenBudget: 1000, + chunkTokenCounts: tokenCounts, + }); + + // Budget 1000, each chunk 600 → only 1 fits + expect(result).toHaveLength(1); + }); + + it('applies budget filtering below MMR threshold', async () => { + // 5 candidates (below threshold of 10) — MMR skipped but budget still applies + const candidates = Array.from({ length: 5 }, (_, i) => makeItem(`c${i}`, 1.0 - i * 0.1)); + const tokenCounts = new Map(); + tokenCounts.set('c0', 300); + tokenCounts.set('c1', 300); + tokenCounts.set('c2', 300); + tokenCounts.set('c3', 100); + tokenCounts.set('c4', 100); + + const 
result = await reorderWithMMR(candidates, queryEmbedding, defaultConfig, { + tokenBudget: 500, + chunkTokenCounts: tokenCounts, + }); + + // c0(300) fits, c1(300) doesn't (600 > 500), c2(300) doesn't, + // c3(100) fits (400 <= 500), c4(100) fits (500 <= 500) + expect(result).toHaveLength(3); + expect(result.map((r) => r.chunkId)).toEqual(['c0', 'c3', 'c4']); + }); + + it('without budget, below-threshold candidates returned unchanged', async () => { + const candidates = Array.from({ length: 5 }, (_, i) => makeItem(`c${i}`, 1.0 - i * 0.1)); + const result = await reorderWithMMR(candidates, queryEmbedding, defaultConfig); + expect(result).toEqual(candidates); + }); + }); });