14 changes: 14 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,20 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.9.2] - 2026-03-04

### Added

- **Budget-aware MMR selection**: The MMR greedy selection loop now tracks a running token budget. Candidates that would exceed the remaining budget are excluded from consideration at each step, preventing large chunks from consuming diversity slots when they can't fit in the response. Below the MMR threshold (< 10 candidates), budget filtering still applies.
- **Budget-aware chain formatting**: Chain output assembly now iterates through chunks in order and only includes those that fit within the remaining token budget. Chunks that would exceed the budget are dropped entirely — no partial chunks are returned. Step numbering adjusts to reflect included chunks only.
- **Oversized chunk filtering**: Chunks individually larger than the query's `maxTokens` are now excluded a priori from both search and chain pipelines:
- **Search**: Filtered out before MMR reranking so they don't waste diversity slots.
- **Chains**: Traversed through for graph connectivity (the chain doesn't break) but excluded from the path output, token count, and median score calculation. Oversized seeds are similarly traversed but not included in chain output.

### Changed

- **Length penalty token extraction**: The `chunkTokens` variable used for the length penalty is now computed unconditionally (outside the `lengthPenalty.enabled` guard) and reused for the size filter and MMR budget map, eliminating redundant `getChunkById` lookups.

## [0.9.1] - 2026-03-04

### Fixed
16 changes: 10 additions & 6 deletions docs/guides/how-it-works.md
@@ -58,19 +58,21 @@ The `search` tool finds semantically similar context:
2. **Parallel search**: Run vector search and BM25 keyword search simultaneously
3. **RRF fusion**: Merge both ranked lists using Reciprocal Rank Fusion (k=60)
4. **Cluster expansion**: Expand results through HDBSCAN cluster siblings
-5. **Rank and deduplicate**: Recency boost, deduplication
-6. **MMR reranking**: Reorder candidates using Maximal Marginal Relevance to balance relevance with diversity
-7. **Token budgeting**: Fit within response limits
5. **Rank and deduplicate**: Recency boost, length penalty, deduplication
6. **Size filter**: Exclude chunks individually larger than the response budget
7. **MMR reranking**: Budget-aware Maximal Marginal Relevance — balances relevance with diversity while tracking remaining token budget so large chunks don't consume diversity slots they can't fill
8. **Token budgeting**: Assemble within response limits (no partial chunks)
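A minimal sketch of steps 6 and 8, assuming a simplified candidate shape (`Candidate`, `sizeFilter`, and `assemble` are illustrative names, not the actual exports):

```typescript
// Illustrative sketch of the size filter (step 6) and final assembly (step 8).
// The real pipeline uses richer types and per-chunk formatting overhead.
interface Candidate {
  id: string;
  score: number;
  tokens: number;
}

// Step 6: drop chunks that could never fit within the budget on their own.
function sizeFilter(candidates: Candidate[], budget: number): Candidate[] {
  return candidates.filter((c) => c.tokens <= budget);
}

// Step 8: greedy assembly over the ranked list; whole chunks only, never truncated.
function assemble(ranked: Candidate[], budget: number): Candidate[] {
  const out: Candidate[] = [];
  let remaining = budget;
  for (const c of ranked) {
    if (c.tokens > remaining) continue; // skip, keep trying smaller chunks
    out.push(c);
    remaining -= c.tokens;
  }
  return out;
}
```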

### Recall/Predict (episodic)

The `recall` and `predict` tools reconstruct narrative chains:

1. **Seed discovery**: Same as search (embed → vector + keyword → RRF → cluster expand) to find top 5 seeds
-2. **Multi-path chain walking**: For each seed, DFS with backtracking explores all reachable paths (backward for recall, forward for predict). At branching points (agent transitions, cross-session links), all branches are explored and emitted as candidates
-3. **Chain scoring**: Each candidate chain scored by median per-node cosine similarity to the query
2. **Multi-path chain walking**: For each seed, DFS with backtracking explores all reachable paths (backward for recall, forward for predict). Oversized chunks (larger than the token budget) are traversed through for graph connectivity but excluded from path output and scoring. At branching points (agent transitions, cross-session links), all branches are explored and emitted as candidates
3. **Chain scoring**: Each candidate chain scored by median per-node cosine similarity to the query (oversized chunks excluded from median)
4. **Best chain selection**: Highest median score among candidates with ≥ 2 chunks
-5. **Fallback**: If no qualifying chain found, fall back to search-style results
5. **Budget-aware formatting**: Iterate through the selected chain, accepting only chunks that fit within the remaining token budget — no partial chunks
6. **Fallback**: If no qualifying chain found, fall back to search-style results
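The median in steps 3–4 can be sketched as follows; `chainMedianScore` is a hypothetical helper, not the actual export:

```typescript
// Hypothetical sketch of chain scoring (step 3) plus the >= 2 chunk
// qualification from step 4. Returns null for chains that don't qualify.
function chainMedianScore(nodeScores: number[]): number | null {
  if (nodeScores.length < 2) return null; // single-chunk chains don't qualify
  const sorted = [...nodeScores].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}
```

Using the median rather than the mean keeps one weakly related node from dragging down (or one strong seed from inflating) an otherwise coherent chain's score.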

### Hybrid Search

Expand All @@ -95,6 +97,8 @@ MMR(c) = λ × relevance − (1−λ) × max_similarity(c, already_selected)

The first pick is always the top relevance hit. As selected items saturate a semantic neighbourhood, candidates from different topics become competitive — including cluster siblings that cover the same topic from a different angle. This benefits all search results, not just cluster expansion: even without clusters, MMR prevents near-duplicate vector hits from monopolising the token budget.

MMR is also budget-aware: it tracks remaining token budget during selection and excludes candidates that would exceed it. This prevents large chunks from winning a diversity slot only to be truncated or dropped during final assembly.
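The objective above can be sketched per candidate as follows; all names here (`cosine`, `mmrScore`, the parameter shapes) are illustrative, not the module's actual API:

```typescript
// Sketch of the MMR objective: lambda * relevance - (1 - lambda) * maxSim.
// A budget-aware loop would skip candidates whose token count exceeds the
// remaining budget before ever computing this score.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  const denom = Math.sqrt(na) * Math.sqrt(nb);
  return denom === 0 ? 0 : dot / denom;
}

function mmrScore(
  relevance: number, // normalized relevance of the candidate
  embedding: number[] | undefined, // candidates without embeddings count as novel
  selectedEmbeddings: number[][],
  lambda: number,
): number {
  const maxSim = embedding
    ? selectedEmbeddings.reduce((m, s) => Math.max(m, cosine(embedding, s)), 0)
    : 0;
  return lambda * relevance - (1 - lambda) * maxSim;
}
```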

Controlled by `retrieval.mmrLambda` (default: 0.7). See [Configuration Reference](../reference/configuration.md#retrieval).

### Graceful Degradation
9 changes: 8 additions & 1 deletion docs/reference/mcp-tools.md
@@ -270,7 +270,14 @@ These diagnostics help distinguish between "memory is empty" and "memory exists

## Token Limits

-Response sizes are controlled by `tokens.mcpMaxResponse` in the configuration (default: 20000 tokens). The `predict` tool uses half this budget. Long responses are truncated to fit within the budget.
Response sizes are controlled by `tokens.mcpMaxResponse` in the configuration (default: 20000 tokens). The `predict` tool uses half this budget.

Budget enforcement is applied at multiple stages:

1. **Pre-filter**: Chunks individually larger than the token budget are excluded before ranking.
2. **MMR selection**: The diversity reranking loop tracks remaining budget and skips candidates that would exceed it.
3. **Chain formatting**: Chain output assembly drops chunks that would exceed the remaining budget — no partial chunks are returned.
4. **Final assembly**: Search results are assembled within the budget with overhead accounting for per-chunk formatting.

## Error Handling

23 changes: 20 additions & 3 deletions docs/reference/traversal-algorithm.md
@@ -36,9 +36,10 @@ function walkAllPaths(seedId, options):
pathVisited = Set() // Per-path, not global
pathState = { chunkIds: [], scores: [], tokens: 0 }

-// Initialize with seed
// Initialize with seed (oversized seeds traversed but excluded from path)
pathVisited.add(seedId)
-pushToPath(seedId)
if seedChunk.tokens <= tokenBudget:
pushToPath(seedId)

dfs(seedId, depth=1, consecutiveSkips=0)
return candidates
@@ -66,6 +67,12 @@ function dfs(currentId, depth, consecutiveSkips):
pathVisited.delete(edge.nextId)
continue

// Oversized chunk: pass through without adding to path
if chunk.tokens > tokenBudget:
dfs(edge.nextId, depth + 1, 0) // Continue traversal
pathVisited.delete(edge.nextId)
continue

score = scoreNode(edge.nextId) // Memoized
if tokens + chunk.tokens > tokenBudget:
emit(currentPath) // Budget hit — emit, don't extend
@@ -109,6 +116,14 @@ Path state (`chunkIds`, `chunks`, `nodeScores`, token count) uses push/pop with

When `agentFilter` is set and a non-matching chunk is encountered, the chunk is skipped (not added to output) but its edges are explored. `consecutiveSkips` is passed as a parameter to recursion — each recursive frame gets its own count, reset to 0 when a matching chunk is found. This prevents cross-frame interference during backtracking.

### Oversized chunk passthrough

Chunks individually larger than the token budget are treated as transparent nodes: traversed for graph connectivity but excluded from the path output, token count, and median score. This prevents a single large chunk from breaking an otherwise viable chain. The same applies to oversized seeds — they serve as DFS starting points but don't appear in the output.

### Budget-aware chain formatting

After chain selection, the output assembly iterates through chunks in order and only includes those that fit within the remaining token budget. Chunks that would exceed the budget are dropped entirely — no partial chunks are returned. Step numbering (`[1/N]`) reflects included chunks only.
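The selection pass described above can be sketched as follows, assuming a simplified chunk shape (`ChainChunk`, `selectWithinBudget`, and `stepLabel` are illustrative names):

```typescript
// Illustrative sketch of budget-aware chain formatting: whole chunks only,
// step numbering computed over the included set rather than the full chain.
interface ChainChunk {
  id: string;
  tokens: number;
}

function selectWithinBudget(chunks: ChainChunk[], budget: number): ChainChunk[] {
  const included: ChainChunk[] = [];
  let remaining = budget;
  for (const chunk of chunks) {
    if (chunk.tokens > remaining) continue; // dropped whole, never truncated
    included.push(chunk);
    remaining -= chunk.tokens;
  }
  return included;
}

// Step labels use the included count, so numbering reads [1/N] over kept chunks.
function stepLabel(index: number, included: ChainChunk[]): string {
  return `[${index + 1}/${included.length}]`;
}
```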

## Pipeline Integration

The chain walker is part of the episodic retrieval pipeline:
@@ -122,12 +137,14 @@
├─ 4. walkChains(seedIds, { direction, tokenBudget, queryEmbedding })
│ ├─ For each seed, DFS with backtracking explores all paths
│ ├─ Oversized chunks (> tokenBudget) passed through, not added to path
│ ├─ Emit candidate at: dead end, depth limit, or token budget
│ └─ Per-path visited set prevents cycles within a path
├─ 5. selectBestChain(candidates) → highest median per-node similarity with ≥ 2 chunks
-├─ 6. If chain found → reverse for chronological output (recall only)
├─ 6. If chain found → budget-aware formatting (drop chunks exceeding remaining budget)
│ └─ Reverse for chronological output (recall only)
└─ 7. Else → fall back to search-style ranked results
```

2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
{
"name": "causantic",
-"version": "0.9.1",
"version": "0.9.2",
"description": "Long-term memory for Claude Code — local-first, graph-augmented, self-benchmarking",
"type": "module",
"private": false,
33 changes: 25 additions & 8 deletions src/retrieval/chain-assembler.ts
@@ -15,6 +15,7 @@ import { approximateTokens } from '../utils/token-counter.js';
import { searchContext, type SearchRequest } from './search-assembler.js';
import { walkChains, selectBestChain, type Chain } from './chain-walker.js';
import { formatChainChunk } from './formatting.js';
import type { StoredChunk } from '../storage/types.js';

/**
* Request for episodic retrieval.
@@ -183,8 +184,8 @@ async function runEpisodicPipeline(
};
}

-// 4. Format chain as narrative
-const formatted = formatChain(bestChain, direction);
// 4. Format chain as narrative (budget-aware: drops chunks that exceed remaining budget)
const formatted = formatChain(bestChain, direction, maxTokens);

return {
text: formatted.text,
@@ -205,10 +206,15 @@
/**
* Format a chain as ordered narrative.
* Backward chains are reversed for chronological output (problem → solution).
*
* Budget-aware: iterates through chunks in order, accepting only those that fit
* within the remaining token budget. Chunks that would exceed the budget are
* dropped entirely (no partial chunks).
*/
function formatChain(
chain: Chain,
direction: 'forward' | 'backward',
maxTokens: number,
): {
text: string;
tokenCount: number;
@@ -223,6 +229,19 @@ function formatChain(
const orderedChunks = direction === 'backward' ? [...chain.chunks].reverse() : chain.chunks;
const orderedIds = direction === 'backward' ? [...chain.chunkIds].reverse() : chain.chunkIds;

// First pass: select chunks that fit within budget
const included: Array<{ chunk: StoredChunk; id: string; chunkTokens: number }> = [];
let budgetRemaining = maxTokens;

for (let i = 0; i < orderedChunks.length; i++) {
const chunk = orderedChunks[i];
const chunkTokens = chunk.approxTokens || approximateTokens(chunk.content);
if (chunkTokens > budgetRemaining) continue;
included.push({ chunk, id: orderedIds[i], chunkTokens });
budgetRemaining -= chunkTokens;
}

// Second pass: format included chunks with correct step numbering
const parts: string[] = [];
const resultChunks: Array<{
id: string;
@@ -232,14 +251,12 @@
}> = [];
let totalTokens = 0;

-for (let i = 0; i < orderedChunks.length; i++) {
-const chunk = orderedChunks[i];
-const chunkTokens = chunk.approxTokens || approximateTokens(chunk.content);
-
-parts.push(formatChainChunk(chunk, chunk.content, i + 1, orderedChunks.length));
for (let i = 0; i < included.length; i++) {
const { chunk, id, chunkTokens } = included[i];
parts.push(formatChainChunk(chunk, chunk.content, i + 1, included.length));
totalTokens += chunkTokens;
resultChunks.push({
-id: orderedIds[i],
id,
sessionSlug: chunk.sessionSlug,
weight: chain.medianScore,
preview: chunk.content.slice(0, 100) + (chunk.content.length > 100 ? '...' : ''),
29 changes: 21 additions & 8 deletions src/retrieval/chain-walker.ts
@@ -187,9 +187,20 @@ async function walkAllPaths(
continue;
}

-const nodeScore = await scoreMemo(nextId);
const chunkTokens = chunk.approxTokens || 100;

// Oversized chunk (exceeds total budget on its own): pass through without
// adding to path. The chain doesn't break — we continue traversing — but
// this node won't appear in the output or affect the median score.
if (chunkTokens > tokenBudget) {
await dfs(nextId, depth + 1, 0);
anyChildEmitted = true;
pathVisited.delete(nextId);
continue;
}

const nodeScore = await scoreMemo(nextId);

// Token budget: emit current path, don't extend
if (pathTokens + chunkTokens > tokenBudget && pathChunkIds.length > 0) {
emitCandidate();
@@ -223,16 +234,18 @@
}
}

-// Initialize with seed
-const seedScore = await scoreMemo(seedId);
// Initialize with seed (oversized seeds are traversed but excluded from path)
const seedTokens = seedChunk.approxTokens || 100;

pathVisited.add(seedId);
-pathChunkIds.push(seedId);
-pathChunks.push(seedChunk);
-pathScores.push(seedScore);
-pathScore = seedScore;
-pathTokens = seedTokens;
if (seedTokens <= tokenBudget) {
const seedScore = await scoreMemo(seedId);
pathChunkIds.push(seedId);
pathChunks.push(seedChunk);
pathScores.push(seedScore);
pathScore = seedScore;
pathTokens = seedTokens;
}

await dfs(seedId, 1, 0);

46 changes: 45 additions & 1 deletion src/retrieval/mmr.ts
@@ -17,6 +17,14 @@ export interface MMRConfig {
lambda: number;
}

/** Optional token budget for budget-aware MMR selection. */
export interface MMRBudgetOptions {
/** Total token budget for selected results. */
tokenBudget: number;
/** Token count per chunk ID. */
chunkTokenCounts: Map<string, number>;
}

/** Minimum candidate count to trigger MMR. Below this, diversity doesn't matter. */
const MMR_THRESHOLD = 10;

@@ -29,14 +37,21 @@
* Short-circuits when fewer than 10 candidates (too few for diversity to matter).
* Candidates without embeddings are treated as novel (diversity = 0).
* Original RRF scores are preserved — only order changes.
*
* When `budget` is provided, candidates that would exceed the remaining token
* budget are excluded from consideration at each step. This prevents large chunks
* from consuming diversity slots when they can't fit in the response.
*/
export async function reorderWithMMR(
candidates: RankedItem[],
queryEmbedding: number[],
config: MMRConfig,
budget?: MMRBudgetOptions,
): Promise<RankedItem[]> {
if (candidates.length < MMR_THRESHOLD) {
-return candidates;
if (!budget) return candidates;
// Still apply budget filtering even without MMR reranking
return applyBudgetFilter(candidates, budget);
}

const { lambda } = config;
@@ -60,13 +75,21 @@
const selected: RankedItem[] = [];
const selectedEmbeddings: number[][] = [];
const remaining = new Set(candidates.map((_, i) => i));
let budgetRemaining = budget?.tokenBudget ?? Infinity;

while (remaining.size > 0) {
let bestIdx = -1;
let bestMMR = -Infinity;

for (const idx of remaining) {
const c = candidates[idx];

// Skip candidates that exceed remaining budget
if (budget) {
const tokens = budget.chunkTokenCounts.get(c.chunkId) ?? 0;
if (tokens > budgetRemaining) continue;
}

const rel = normalizedScores.get(c.chunkId)!;

// Compute max similarity to already-selected items
@@ -87,10 +110,17 @@
}
}

// No candidate fits remaining budget
if (bestIdx === -1) break;

remaining.delete(bestIdx);
const picked = candidates[bestIdx];
selected.push(picked);

if (budget) {
budgetRemaining -= budget.chunkTokenCounts.get(picked.chunkId) ?? 0;
}

const pickedEmb = embeddings.get(picked.chunkId);
if (pickedEmb) {
selectedEmbeddings.push(pickedEmb);
@@ -99,3 +129,17 @@

return selected;
}

/** Budget-only filter for below-threshold candidate lists (no MMR reranking). */
function applyBudgetFilter(candidates: RankedItem[], budget: MMRBudgetOptions): RankedItem[] {
const result: RankedItem[] = [];
let remaining = budget.tokenBudget;
for (const c of candidates) {
const tokens = budget.chunkTokenCounts.get(c.chunkId) ?? 0;
if (tokens <= remaining) {
result.push(c);
remaining -= tokens;
}
}
return result;
}