diff --git a/CLAUDE.md b/CLAUDE.md index 4f635958..1a2b8d3f 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -1,28 +1,51 @@ # CLAUDE.md -A hierarchical, reasoning-native document intelligence engine written in Rust. +Vectorless is a reasoning-native document intelligence engine written in Rust. + +## Principles + +- **Reason, don't vector.** — Every retrieval decision is an LLM decision. +- **Model fails, we fail.** — No silent degradation. No heuristic fallbacks. +- **No thought, no answer.** — Only LLM-reasoned output counts as an answer. ## Project Structure - `rust/` - Rust core engine - - `src/client/` - Client API (EngineBuilder, Engine) - - `src/config/` - Configuration types - - `src/document/` - Document parsers (Markdown, PDF) - - `src/index/` - Index building and pipeline - - `src/retrieval/` - Retrieval engine (beam search, MCTS, greedy, hybrid strategies) - - `src/storage/` - Storage layer - - `src/llm/` - LLM client abstraction + - `src/client/` - Client API (EngineBuilder, Engine) - facade layer, no business logic + - `src/document/` - Document data structures (DocumentTree, NavigationIndex, ReasoningIndex) + - `src/index/` - Compile pipeline (8-stage, checkpointing, incremental update) + - `src/retrieval/` - Retrieval dispatch layer (preprocessing, dispatch, postprocessing, cache, streaming) + - `src/query/` - Query understanding and planning (intent classification, rewrite, decomposition) + - `src/agent/` - Retrieval execution (Worker: doc navigation, Orchestrator: supervisor loop + multi-doc fusion) + - `src/rerank/` - Result reranking and answer synthesis (dedup, scoring, fusion, synthesis) + - `src/scoring/` - Scoring and ranking strategies (BM25, relevance scoring, score combination) + - `src/llm/` - LLM client (connection pool, memo/caching, throttle/rate-limiting, fallback) + - `src/storage/` - Persistence (Workspace, LRU cache, backend abstraction file/memory) - `src/graph/` - Cross-document relationship graph - - `src/memo/` - Caching and reasoning memo - - `src/metrics/` - Metrics and usage tracking + - `src/metrics/` - Metrics collection and reporting - `src/events/` - Event system for progress monitoring - - `src/throttle/` - Rate limiting - - `src/utils/` - Utility functions + - `src/config/` - Configuration types and validation + - `src/error.rs` - Unified error types + - `src/utils/` - Utility functions (token counting, fingerprinting, validation) - `examples/` - Rust examples (flow, indexing, pdf, batch, etc.) 
-- `python/` - Python SDK (PyO3 bindings) +- `python/` - Python SDK (PyO3 bindings) + CLI - `docs/` - Docusaurus documentation site - `samples/` - Sample files +### Retrieval Call Flow + +``` +Engine.query() + → retrieval/dispatcher + → query/understand() → QueryPlan (LLM intent + concepts + strategy) + → Orchestrator (always, single or multi-doc) + → analyze(QueryPlan) → dispatch plan + → supervisor loop: + dispatch Workers → evaluate() → + if insufficient → replan() → loop + → rerank/ (dedup → BM25 score → synthesis/fusion) +``` + ## Build Commands ```bash diff --git a/Cargo.toml b/Cargo.toml index 950779db..8e278a32 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -3,7 +3,7 @@ members = ["rust", "python"] resolver = "2" [workspace.package] -version = "0.1.29" +version = "0.1.30" edition = "2024" authors = ["zTgx "] license = "Apache-2.0" diff --git a/README.md b/README.md index b9a34abd..847b0540 100644 --- a/README.md +++ b/README.md @@ -13,58 +13,15 @@ -**Vectorless** is a reasoning-native document engine designed to be the foundational layer for AI applications that need structured access to documents, with the core written in Rust. It does not use vector databases, embeddings, or similarity search. Instead, it will reason through any of your structured documents — **PDFs, Markdown, reports, contracts** — and retrieve only what's relevant. Nothing more, nothing less. +**Vectorless** is a reasoning-native document engine with the core written in Rust. It will reason through any of your structured documents — **PDFs, Markdown, reports, contracts** — and retrieve only what's relevant. Nothing more, nothing less. +- **Reason, don't vector.** — Retrieval is guided by reasoning over document structure. +- **Model fails, we fail.** — No silent degradation. No heuristic fallbacks. +- **No thought, no answer.** — Only LLM-reasoned output counts as an answer. -## How It Works - -
-  <img alt="Vectorless Workflow" />
-</p>
-
-<p align="center">
-  <img alt="Vectorless Demo" />
- ## Quick Start -### Rust - -```toml -[dependencies] -vectorless = "0.1" -``` - -```rust -use vectorless::{EngineBuilder, IndexContext, QueryContext}; - -#[tokio::main] -async fn main() -> vectorless::Result<()> { - let engine = EngineBuilder::new() - .with_key("sk-...") - .with_model("gpt-4o") - .with_endpoint("https://api.openai.com/v1") - .build() - .await?; - - // Index a document - let result = engine.index(IndexContext::from_path("./report.pdf")).await?; - let doc_id = result.doc_id().unwrap(); - - // Query - let result = engine.query( - QueryContext::new("What is the total revenue?") - .with_doc_ids(vec![doc_id.to_string()]) - ).await?; - println!("{}", result.content); - - Ok(()) -} -``` - -### Python - ```bash pip install vectorless ``` @@ -89,52 +46,6 @@ async def main(): asyncio.run(main()) ``` -## Core Concepts - -### Semantic Tree Index - -When you index a document, Vectorless builds a tree structure that mirrors the document's hierarchy: - -``` -Annual Report 2024 -├── Executive Summary -│ ├── Financial Highlights -│ └── Strategic Outlook -├── Financial Statements -│ ├── Revenue Analysis ← "What is the total revenue?" lands here -│ ├── Operating Expenses -│ └── Net Income -└── Risk Factors - ├── Market Risks - └── Regulatory Risks -``` - -Each node contains a summary generated by the LLM. During retrieval, the engine uses these summaries to reason about which path to follow — just like a human would scan a table of contents. - -### Cross-Document Graph - -When multiple documents are indexed, Vectorless builds a relationship graph connecting them through shared keywords and concepts. This enables queries across your entire document collection. - -```python -# Query across all indexed documents -result = await engine.query( - QueryContext("Compare revenue trends across all reports") -) -``` - -### Workspace Persistence - -Indexed documents are stored in a workspace — there's no need to reprocess files between sessions: - -```python -engine = Engine(api_key="sk-...", model="gpt-4o", endpoint="https://api.openai.com/v1") - -# List all indexed documents -docs = await engine.list() -for doc in docs: - print(f"{doc.name} ({doc.format}) — {doc.page_count} pages") -``` - ## What It's For Vectorless is designed for applications that need **precise** document retrieval: diff --git a/docs/docs/architecture.mdx b/docs/docs/architecture.mdx index d98fa7e6..3bb37ca6 100644 --- a/docs/docs/architecture.mdx +++ b/docs/docs/architecture.mdx @@ -55,29 +55,66 @@ TreeNode ## Retrieval Pipeline -The retrieval pipeline consists of four phases: +The retrieval pipeline is a supervisor loop driven entirely by LLM reasoning. Every decision — which documents to query, how to navigate, whether evidence is sufficient — is made by the model, not by heuristics. -1. **Analyze** — Detect query complexity, extract keywords, decompose complex queries -2. **Plan** — Select retrieval strategy and search algorithm -3. **Search** — Execute tree traversal with Pilot guidance -4. **Evaluate** — Score, deduplicate, and aggregate results +### Principles -### Pilot +- **Reason, don't vector.** — Every retrieval decision is an LLM decision. +- **Model fails, we fail.** — No silent degradation. No heuristic fallbacks. +- **No thought, no answer.** — Only LLM-reasoned output counts as an answer. -The Pilot is the core intelligence component. 
It provides LLM-guided navigation at key decision points: +### Flow -- **Fork points** — When multiple children exist, Pilot evaluates which path to follow -- **Backtracking** — When a path yields insufficient results, Pilot suggests alternatives -- **Binary pruning** — Quick relevance filter for nodes with many children +```text +Engine.query() + → Dispatcher + → Query Understanding (LLM) → QueryPlan (intent, concepts, strategy) + → Orchestrator (always — single or multi-doc) + → Analyze (LLM selects documents + tasks) + → Supervisor Loop: + Dispatch Workers → Evaluate (LLM sufficiency check) + → if insufficient → Replan (LLM) → loop + → Rerank (dedup → BM25 score → synthesis/fusion) +``` + +### Query Understanding + +Every query first passes through LLM-based understanding: + +| Field | Description | +|-------|-------------| +| **Intent** | Factual, Analytical, Navigational, or Summary | +| **Complexity** | Simple, Moderate, or Complex | +| **Key Concepts** | LLM-extracted concepts (distinct from keywords) | +| **Strategy Hint** | focused, exploratory, comparative, or summary | + +### Orchestrator (Supervisor) + +The Orchestrator is the central coordinator. It always runs — even for single-document queries. Its supervisor loop: + +1. **Analyze** — LLM reviews DocCards and selects relevant documents with specific tasks +2. **Dispatch** — Fan-out Workers in parallel (one per document) +3. **Evaluate** — LLM checks if collected evidence is sufficient to answer the query +4. **Replan** (if insufficient) — LLM identifies missing information and dispatches additional Workers + +### Worker (Evidence Collector) + +Each Worker navigates a single document's tree to collect evidence: + +1. **Bird's-eye** — `ls` the root for an overview +2. **Plan** — LLM generates a navigation plan +3. **Navigate** — Loop: LLM → command → execute → repeat (with budget) +4. **Return** — Collected evidence only — no answer synthesis + +Workers use tree commands (`ls`, `cd`, `cat`, `grep`, `find`, `findtree`) and a `check` command for self-evaluation. + +### Rerank Pipeline -### Search Algorithms +After all Workers complete, the Orchestrator runs the final pipeline: -| Algorithm | Description | Use Case | -|-----------|-------------|----------| -| **Beam Search** | Explores multiple paths with backtracking | General purpose (recommended) | -| **MCTS** | Monte Carlo Tree Search with UCT selection | Complex multi-hop queries | -| **Pure Pilot** | Greedy single-path, Pilot at every level | High-accuracy, higher token cost | -| **ToC Navigator** | Table-of-contents based location | Broad queries ("what is this about?") | +1. **Dedup** — Remove duplicate and low-quality evidence +2. **BM25 Scoring** — Rank evidence by keyword relevance +3. **Answer Generation** — LLM synthesizes or fuses evidence into a final answer ## Cross-Document Graph diff --git a/docs/docs/indexing/overview.mdx b/docs/docs/indexing/overview.mdx index f50d01a1..dbc93256 100644 --- a/docs/docs/indexing/overview.mdx +++ b/docs/docs/indexing/overview.mdx @@ -2,97 +2,222 @@ sidebar_position: 1 --- -# Indexing Overview +# Indexing Pipeline -The indexing pipeline transforms documents into searchable hierarchical trees. This page describes each stage and how they work together. +The compile pipeline transforms raw documents into hierarchical tree structures with pre-computed navigation indexes, ready for Agent-driven retrieval. 
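+
+To make the stage ordering described below concrete, here is a minimal sketch of the scheduling rule the stage priorities imply. The trait and function are illustrative assumptions for exposition — not the crate's actual API:
+
+```rust
+/// Hypothetical stage contract: lower priority runs first; stages that
+/// share a priority may run in parallel when dependencies allow.
+trait CompileStage {
+    fn name(&self) -> &'static str;
+    fn priority(&self) -> u32; // Parse = 10, Build = 20, ..., Optimize = 60
+}
+
+/// Derive the execution order from priorities alone.
+fn execution_order(mut stages: Vec<Box<dyn CompileStage>>) -> Vec<&'static str> {
+    // Stable sort keeps registration order within a priority level.
+    stages.sort_by_key(|s| s.priority());
+    stages.iter().map(|s| s.name()).collect()
+}
+```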
-## Pipeline Stages +## Pipeline Overview + +``` +┌─────────────────────────────────────────────────────────────────┐ +│ Compile Pipeline │ +│ │ +│ Document ──→ Parse ──→ Build ──→ Validate ──→ Split │ +│ (md/pdf) (10) (20) (22) (25) │ +│ │ +│ ──→ Enhance ──→ Enrich ──→ ReasoningIndex (45) │ +│ (30) (40) │ +│ │ +│ ──→ NavigationIndex ──→ Optimize │ +│ (50) (60) │ +│ │ +│ Output: DocumentTree + ReasoningIndex + NavigationIndex │ +└─────────────────────────────────────────────────────────────────┘ +``` + +Numbers in parentheses are stage priorities — lower values execute first. Stages at the same priority level run in parallel when their dependency graph allows it. + +--- + +## Stage Details ### Parse (Priority 10) -Parses raw documents into a list of `RawNode` structures: +Parses raw documents into `RawNode` structures, preserving the source hierarchy. + +| Format | Strategy | +|--------|----------| +| **Markdown** | Splits by heading levels (`#`, `##`, `###`). Each heading becomes a node, content between headings becomes node content. | +| **PDF** | Extracts text per page, groups into sections by layout analysis and font-size heuristics. | -- **Markdown** — Splits by headings (`#`, `##`, `###`), preserving hierarchy -- **PDF** — Extracts text per page, groups into sections by layout analysis +Each `RawNode` carries: title, content, heading level, line range, page number, and an estimated token count. ### Build (Priority 20) -Constructs an arena-based `DocumentTree` from raw nodes: +Constructs an arena-based [`DocumentTree`](https://docs.rs/vectorless) from the parsed raw nodes. + +- Creates parent-child relationships based on heading level nesting +- Applies **thinning** — collapses single-child chains where a parent has exactly one child and no meaningful content of its own. This reduces unnecessary tree depth +- Assigns sequential node IDs and hierarchical structure indices (e.g. `"1"`, `"1.2"`, `"1.2.3"`) -- Creates parent-child relationships based on heading levels -- Applies **thinning** — merges single-child chains to reduce tree depth -- Merges content from nodes that don't add structural value +### Validate (Priority 22, optional) -### Validate (Priority 22) +Checks tree integrity before expensive LLM stages run: -Checks tree integrity: +- No orphaned nodes (every node reachable from root) +- Consistent depth values (child depth = parent depth + 1) +- Valid parent-child relationships in the arena -- No orphaned nodes -- Consistent depth values -- Valid parent-child relationships +Failures at this stage prevent broken trees from propagating downstream. -### Split (Priority 25) +### Split (Priority 25, optional) -Splits oversized leaf nodes that exceed the token threshold (default: 4000 tokens): +Splits oversized leaf nodes that exceed the token threshold (default: 4000 tokens). -- Finds paragraph or heading boundaries for clean splits -- Preserves semantic coherence within each split +- Finds paragraph boundaries or heading-like patterns for clean splits +- Each split becomes a new leaf node under the same parent +- Preserves semantic coherence — splits don't break mid-sentence ### Enhance (Priority 30) -Generates LLM summaries for tree nodes: +Generates LLM summaries for tree nodes. This is the most expensive stage (LLM API calls) and produces data that all downstream indexes consume. 
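+
+Because this stage dominates indexing cost, a useful mental model is a memoized, concurrent map over tree nodes: each node's summary is keyed by a content fingerprint, so an unchanged node never repeats an LLM call on re-index. A minimal synchronous sketch of that pattern — the fingerprint key, `llm` closure, and function name are illustrative assumptions, not the engine's real types:
+
+```rust
+use std::collections::HashMap;
+
+/// Memoized summarization: content fingerprint → cached summary.
+fn summarize_all(
+    nodes: &[(u64, String)],      // (content fingerprint, node text)
+    memo: &mut HashMap<u64, String>,
+    llm: impl Fn(&str) -> String, // stand-in for the async LLM client
+) -> Vec<String> {
+    nodes
+        .iter()
+        .map(|(fp, text)| {
+            // Only cache misses reach the model; hits are free on re-index.
+            memo.entry(*fp).or_insert_with(|| llm(text)).clone()
+        })
+        .collect()
+}
+```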
+ +**Leaf nodes** receive content-oriented summaries: + +> *What does this section say?* -- **Full** — Summarize every node (default) -- **Selective** — Only branch nodes above a token threshold -- **Lazy** — Generate summaries on-demand at query time +**Non-leaf (branch) nodes** receive structured navigation output with three components: -Branch nodes get navigation-oriented summaries ("what does this section cover"), while leaf nodes get content-oriented summaries ("what does this section say"). +| Component | Purpose | Consumed by | +|-----------|---------|-------------| +| **OVERVIEW** | 2-3 sentence routing summary ("what this branch covers") | `NavEntry.overview` | +| **QUESTIONS** | 3-5 typical questions this branch can answer | `NavEntry.question_hints` | +| **TAGS** | 2-4 topic keywords for fast-path matching | `NavEntry.topic_tags` | + +Example LLM output for a non-leaf node: + +``` +OVERVIEW: Covers the Wix payment ecosystem, spanning payment processing, account management, and financial operations. +QUESTIONS: How to set up Wix Payments?, What payout schedules are available?, How to handle chargebacks? +TAGS: payments, billing, invoices, refunds +``` + +The stage supports four strategies: + +| Strategy | Behavior | +|----------|----------| +| **Full** | Summarize every node (default) | +| **Selective** | Only branch nodes above a token threshold | +| **Lazy** | Generate summaries on-demand at query time | +| **None** | Skip LLM summaries entirely | + +All LLM calls run concurrently with configurable concurrency limits, and results are cached via the [MemoStore](../features/summary-strategies.mdx) for incremental re-indexing. ### Enrich (Priority 40) -Adds metadata to the tree: +Adds structural metadata to the tree — no LLM calls, pure computation: -- **Page ranges** — Propagates page boundaries from leaves to parents -- **Token statistics** — Calculates total tokens and node counts -- **Cross-reference resolution** — Parses "see Section 2.1", "Appendix G" references and resolves them to actual `NodeId`s in the tree +- **Page ranges** — Propagates page boundaries from leaves up to parent nodes, enabling page-level citation +- **Token statistics** — Calculates total tokens and node counts per subtree +- **Cross-reference resolution** — Parses inline references like "see Section 2.1" or "refer to Appendix G" and resolves them to actual `NodeId`s in the tree - **Document description** — Generates a description from the root summary ### Reasoning Index (Priority 45) -Builds a pre-computed index for fast retrieval: +Builds a [`ReasoningIndex`](https://docs.rs/vectorless) — a flat keyword-to-node mapping optimized for traditional retrieval: + +| Field | Description | +|-------|-------------| +| `topic_paths` | Keywords → nodes with weighted mappings. 
Title keywords get 2.0×, summary 1.5×, content 1.0× |
+| `summary_shortcut` | Pre-computed document overview for "what is this about" queries |
+| `section_map` | Depth-1 section titles → `NodeId` for fast ToC lookup |
+| `hot_nodes` | Frequently retrieved nodes tracked over time |
 
-- **Topic paths** — Maps keywords to nodes with weights (title: 2.0x, summary: 1.5x, content: 1.0x)
-- **Synonym expansion** — Expands top keywords with LLM-generated synonyms (0.6x weight)
-- **Summary shortcut** — Pre-computed document overview for "what is this about" queries
-- **Section map** — Depth-1 section titles for fast ToC lookup
+Optionally expands top keywords with **LLM-generated synonyms** at 0.6× weight, enabling fuzzy keyword matching without vector embeddings.
 
-### Optimize (Priority 60)
+### Navigation Index (Priority 50)
 
-Final tree optimization:
+Builds a [`NavigationIndex`](https://docs.rs/vectorless) — the primary data source for the retrieval Agent. This stage is pure data organization: no LLM calls, it restructures data produced by the Enhance stage.
 
-- Removes redundant metadata
-- Compacts tree structure for efficient storage
+For every non-leaf node, it creates:
 
-## Pipeline Options
+**`NavEntry`** — routing metadata for the Agent to decide "should I enter this branch?"
+
+```rust
+pub struct NavEntry {
+    pub overview: String,            // Routing summary from Enhance
+    pub question_hints: Vec<String>, // Typical questions from Enhance
+    pub topic_tags: Vec<String>,     // Topic keywords from Enhance
+    pub leaf_count: usize,           // Total leaves in this subtree
+    pub level: usize,                // Depth in tree (root = 0)
+}
+```
+
+**`ChildRoute`** — compact routing info for one child, enabling progressive disclosure:
+
+```rust
+pub struct ChildRoute {
+    pub node_id: NodeId,
+    pub title: String,
+    pub description: String, // One-sentence routing description
+    pub leaf_count: usize,   // Leaves in this child's subtree
+}
+```
+
+The Agent reads `child_routes` at each decision point to see all available sub-topics and their descriptions, then chooses where to navigate next — without accessing the content layer.
+
+This design is the in-memory equivalent of the [SKILL.md / INDEX.md files](https://arxiv.org/abs/2604.14572) described in the Corpus2Skill paper.
+
+### Optimize (Priority 60, optional)
+
+Final tree structure optimizations:
+
+- **Merge small leaves** — Adjacent sibling leaves below the token threshold are merged into a single node, with content prefixed by `## Title` to preserve boundaries
+- **Remove empty intermediates** — Non-leaf nodes with no content and exactly one child are marked for removal (collapsing the chain)
+
+---
+
+## Data Flow
+
+```
+Document (md/pdf)
+       │
+       ▼
+ ┌───────────┐     ┌──────────────┐     ┌─────────────────┐
+ │ TreeNode  │     │ReasoningIndex│     │ NavigationIndex │
+ │ (content) │     │ (keyword →   │     │ (Node → NavEntry│
+ │           │     │  node map)   │     │  + ChildRoutes) │
+ └─────┬─────┘     └──────┬───────┘     └────────┬────────┘
+       │                  │                      │
+       │ Retrieved by     │ Retrieved by         │
+       │ content          │ keyword lookup       │ Agent navigation
+       │ collection       │                      │
+       ▼                  ▼                      ▼
+ ┌──────────────────────────────────────────────────────┐
+ │                   Retrieval Phase                    │
+ │  Agent reads NavigationIndex to decide where to go,  │
+ │  then reads TreeNode.content only when needed.       │
+ └──────────────────────────────────────────────────────┘
+```
+
+---
+
+## Usage
 
 ```python
-from vectorless import IndexOptions
+from vectorless import Engine
 
-# Default options (synonym expansion enabled)
-opts = IndexOptions()
+engine = Engine.builder().build()
 
-# Force re-indexing
-opts = IndexOptions(mode="force")
+# Compile a document (runs all stages)
+result = engine.compile("./docs/")
 
-# Disable summaries for speed
-opts = IndexOptions(generate_summaries=False)
+# Access the indexes
+print(f"Tree nodes: {result.node_count()}")
+print(f"Reasoning index keywords: {result.keyword_count()}")
+print(f"Navigation entries: {result.nav_entry_count()}")
 ```
 
-## Incremental Indexing
+```rust
+use vectorless::client::EngineBuilder;
+
+let engine = EngineBuilder::new().build().await?;
+let result = engine.compile("./docs/").await?;
+
+println!("Tree nodes: {}", result.node_count());
+println!("Nav entries: {}", result.nav_entry_count());
+```
 
-When indexing with `mode="incremental"`, the pipeline:
+## Configuration
 
-1. Computes a content fingerprint (hash) of the input
-2. Compares against the previously stored fingerprint
-3. Skips reprocessing if the content hasn't changed
-4. Reuses existing summaries and reasoning index data for unchanged nodes
+See [Indexing Configuration](./configuration.mdx) for all available options including summary strategies, token thresholds, and concurrency settings.
diff --git a/docs/docusaurus.config.ts b/docs/docusaurus.config.ts
index b72d033d..b798b053 100644
--- a/docs/docusaurus.config.ts
+++ b/docs/docusaurus.config.ts
@@ -57,6 +57,7 @@ const config: Config = {
   themeConfig: {
     image: 'img/theme-logo.png',
     colorMode: {
+      defaultMode: 'dark',
       respectPrefersColorScheme: true,
     },
     navbar: {
diff --git a/docs/src/components/GitHubStar/styles.module.css b/docs/src/components/GitHubStar/styles.module.css
index 051e45c1..53f9df4c 100644
--- a/docs/src/components/GitHubStar/styles.module.css
+++ b/docs/src/components/GitHubStar/styles.module.css
@@ -10,7 +10,7 @@
   height: 24px;
   padding: 0 8px;
   background-color: #f3f3f3;
-  color: #333;
+  color: #111827;
   border: 1px solid #d5d5d5;
   border-radius: 3px;
   text-decoration: none;
@@ -23,7 +23,7 @@
 .githubStarButton:hover {
   background-color: #e6e6e6;
   text-decoration: none;
-  color: #333;
+  color: #111827;
 }
 
 .githubStarText {
@@ -41,7 +41,7 @@
   min-width: 40px;
   height: 24px;
   background-color: #f3f3f3;
-  color: #333;
+  color: #111827;
   position: relative;
   margin-left: 5px;
   padding: 0 5px;
@@ -61,7 +61,7 @@
 .githubStarCount:hover {
   background-color: #e6e6e6;
   text-decoration: none;
-  color: #333;
+  color: #111827;
 }
 
 .githubStarCount::after,
diff --git a/docs/src/css/custom.css b/docs/src/css/custom.css
index b6f2ebc6..b1b96b06 100644
--- a/docs/src/css/custom.css
+++ b/docs/src/css/custom.css
@@ -2,59 +2,65 @@
  * Any CSS included here will be global. The classic template
  * bundles Infima by default. Infima is a CSS framework designed to
  * work well for content-centric websites.
+ *
+ * Color scheme: dark-first with amber accent.
*/ :root { - --primary: #AF788B; - --primary-dark: #8B5E6F; - --primary-deeper: #6E4556; - --primary-light: #C9A0AE; - --primary-soft: rgba(175, 120, 139, 0.12); - --text: #1e293b; - --text-light: #5a4a52; - --bg: #ffffff; - --bg-secondary: #F5EBEE; - --bg-offset: #F6F8FA; - --card-bg: #FAF5F7; - --border: #E2E8F0; + --primary: #F59E0B; + --primary-dark: #D97706; + --primary-deeper: #B45309; + --primary-light: #FBBF24; + --primary-soft: rgba(245, 158, 11, 0.12); + --accent-tech: #5E6AD2; + --accent-green: #10B981; + --text: #1F2937; + --text-light: #6B7280; + --bg: #FFFFFF; + --bg-secondary: #F9FAFB; + --bg-offset: #F3F4F6; + --card-bg: #FFFFFF; + --border: #E5E7EB; --code-bg: #0F172A; --code-text: #E2E8F0; - --code-comment: #6272A4; - --code-keyword: #FF79C6; - - --ifm-color-primary: #AF788B; - --ifm-color-primary-dark: #8B5E6F; - --ifm-color-primary-darker: #7a5062; - --ifm-color-primary-darkest: #6E4556; - --ifm-color-primary-light: #C9A0AE; - --ifm-color-primary-lighter: #d3b0bb; - --ifm-color-primary-lightest: #e8d0d8; + --code-comment: #6B7280; + --code-keyword: #5E6AD2; + + --ifm-color-primary: #F59E0B; + --ifm-color-primary-dark: #D97706; + --ifm-color-primary-darker: #B45309; + --ifm-color-primary-darkest: #92400E; + --ifm-color-primary-light: #FBBF24; + --ifm-color-primary-lighter: #FCD34D; + --ifm-color-primary-lightest: #FDE68A; --ifm-code-font-size: 95%; - --docusaurus-highlighted-code-line-bg: rgba(175, 120, 139, 0.1); + --ifm-background-color: #FFFFFF; + --docusaurus-highlighted-code-line-bg: rgba(245, 158, 11, 0.08); } [data-theme='dark'] { - --text: #EEF2FF; - --text-light: #8B9AB0; - --bg: #0A0C10; - --bg-secondary: #11151A; - --bg-offset: #11151A; - --card-bg: #11151A; - --border: #1E293B; - --primary-soft: rgba(201, 160, 174, 0.15); - --code-bg: #010409; - --code-text: #E2E8F0; - --code-comment: #6272A4; - --code-keyword: #FF79C6; - - --ifm-color-primary: #C9A0AE; - --ifm-color-primary-dark: #b88d9d; - --ifm-color-primary-darker: #af7f91; - --ifm-color-primary-darkest: #96637A; - --ifm-color-primary-light: #d3b0bb; - --ifm-color-primary-lighter: #ddc0cb; - --ifm-color-primary-lightest: #ebd5dc; - --docusaurus-highlighted-code-line-bg: rgba(175, 120, 139, 0.2); + --text: #EDF2F8; + --text-light: #9AA4BF; + --bg: #141720; + --bg-secondary: #1A1F27; + --bg-offset: #1A1F27; + --card-bg: #1E2430; + --border: #2A3040; + --primary-soft: rgba(245, 158, 11, 0.12); + --code-bg: #0E1117; + --code-text: #CBD5E1; + --code-comment: #6B7280; + --code-keyword: #5E6AD2; + + --ifm-color-primary: #FBBF24; + --ifm-color-primary-dark: #F59E0B; + --ifm-color-primary-darker: #D97706; + --ifm-color-primary-darkest: #B45309; + --ifm-color-primary-light: #FCD34D; + --ifm-color-primary-lighter: #FDE68A; + --ifm-color-primary-lightest: #FEF3C7; + --docusaurus-highlighted-code-line-bg: rgba(245, 158, 11, 0.15); + --ifm-background-color: #141720; } /* ===== Navbar ===== */ @@ -68,7 +74,7 @@ .navbar__inner { height: 68px !important; - max-width: 1200px; + max-width: 1280px; margin: 0 auto; } @@ -80,7 +86,7 @@ font-size: 1.4rem !important; font-weight: 700 !important; color: var(--text) !important; - letter-spacing: -0.01em; + letter-spacing: -0.3px; } .navbar__logo { @@ -96,16 +102,16 @@ } .navbar__link:hover { - color: var(--primary-dark) !important; + color: var(--primary) !important; } .navbar__link--active { - color: var(--primary-dark) !important; + color: var(--primary) !important; } [data-theme='dark'] .navbar { background-color: var(--bg) !important; - border-bottom: none !important; + 
border-bottom-color: var(--border) !important; } [data-theme='dark'] .navbar__title { @@ -117,11 +123,11 @@ } [data-theme='dark'] .navbar__link:hover { - color: var(--primary-light) !important; + color: var(--primary) !important; } [data-theme='dark'] .navbar__link--active { - color: var(--primary-light) !important; + color: var(--primary) !important; } /* ===== Footer ===== */ diff --git a/docs/src/pages/index.module.css b/docs/src/pages/index.module.css index f2685eb8..5b963e9a 100644 --- a/docs/src/pages/index.module.css +++ b/docs/src/pages/index.module.css @@ -32,11 +32,11 @@ } .heroTitleEmphasis { - color: var(--text); + color: var(--primary); } .heroTitleLight { - color: #9a8a92; + color: #9AA4BF; font-weight: 300; } @@ -176,9 +176,9 @@ .demoCard { max-width: 1200px; margin: 0 auto; - background: #121417; + background: #161A1F; border-radius: 16px; - border: 1px solid #23262B; + border: 1px solid #252A30; overflow: hidden; box-shadow: 0 12px 40px rgba(0, 0, 0, 0.4); } @@ -188,15 +188,15 @@ align-items: center; gap: 1.5rem; padding: 0 1.25rem; - border-bottom: 1px solid #23262B; - background: #121417; + border-bottom: 1px solid #252A30; + background: #161A1F; } .demoTab { padding: 0.875rem 0 0.75rem; font-size: 0.8rem; font-weight: 500; - color: #8E95A3; + color: #9AA4BF; border: none; border-bottom: 2px solid transparent; background: transparent; @@ -207,22 +207,22 @@ } .demoTabActive { - color: #AF788B; - border-bottom-color: #AF788B; + color: #F59E0B; + border-bottom-color: #F59E0B; } .demoTab:not(.demoTabActive):hover { - color: #EBEDF0; + color: #EDF2F8; } .demoPanel { - background: #0B0D0E; + background: #111317; } .demoCodeHeader { padding: 0.75rem 1.25rem; - background: #0B0D0E; - border-bottom: 1px solid #2A2E34; + background: #111317; + border-bottom: 1px solid #252A30; display: flex; align-items: center; gap: 0.75rem; @@ -256,7 +256,7 @@ margin-left: auto; background: transparent; border: none; - color: #8E95A3; + color: #9AA4BF; font-size: 0.7rem; font-family: 'JetBrains Mono', 'Fira Code', monospace; cursor: pointer; @@ -268,7 +268,7 @@ .copyBtn:hover { background: rgba(175, 120, 139, 0.1); - color: #AF788B; + color: #F59E0B; } .demoPre { @@ -278,8 +278,8 @@ font-family: 'JetBrains Mono', 'Fira Code', 'SF Mono', Menlo, monospace; font-size: 0.85rem; line-height: 1.75; - color: #EBEDF0; - background: #0B0D0E; + color: #EDF2F8; + background: #111317; } .demoPre code { @@ -292,25 +292,25 @@ /* Syntax highlight tokens */ .hlKeyword { - color: #AF788B; + color: #F59E0B; font-weight: 500; } .hlFunction { - color: #6DCDFF; + color: #5E6AD2; } .hlString { - color: #B0E57C; + color: #10B981; } .hlComment { - color: #5E6673; + color: #6B7280; font-style: italic; } .hlType { - color: #6DCDFF; + color: #5E6AD2; } .hlAttribute { @@ -318,21 +318,21 @@ } .terminalOutput { - background: #0B0D0E; - border-top: 1px solid #2A2E34; + background: #111317; + border-top: 1px solid #252A30; padding: 1rem 2rem; font-family: 'JetBrains Mono', 'Fira Code', monospace; font-size: 0.78rem; - color: #8E95A3; + color: #9AA4BF; line-height: 1.7; } .terminalPrompt { - color: #B0E57C; + color: #10B981; } .terminalAnswer { - color: #EBEDF0; + color: #EDF2F8; } .terminalCursor { @@ -352,8 +352,8 @@ .installBar { padding: 1rem 2rem; - background: #121417; - border-top: 1px solid #23262B; + background: #161A1F; + border-top: 1px solid #252A30; display: flex; align-items: center; justify-content: space-between; @@ -364,19 +364,19 @@ .installCommand { font-family: 'JetBrains Mono', 'Fira Code', monospace; 
font-size: 0.75rem; - color: #8E95A3; - background: #0B0D0E; + color: #9AA4BF; + background: #111317; padding: 0.4rem 1rem; border-radius: 8px; - border: 1px solid #23262B; + border: 1px solid #252A30; } .installCommand span { - color: #AF788B; + color: #F59E0B; } .installBtn { - background: #AF788B; + background: #F59E0B; border: none; color: white; font-size: 0.75rem; @@ -389,23 +389,127 @@ } .installBtn:hover { - background: #9A6A7C; + background: #D97706; } -/* ===== How It Works ===== */ -.workflowWrapper { - max-width: 100%; - padding: 0; +/* ===== Navigation Theater ===== */ +.narrativeDemo { + background: var(--code-bg); + border: 1px solid var(--border); + border-radius: 16px; + padding: 2rem 2.5rem; + max-width: 780px; margin: 0 auto; - text-align: center; } -.workflowImg { - width: 100%; - height: auto; - display: block; - min-height: 520px; - object-fit: contain; +.narrativeHeader { + display: flex; + gap: 12px; + align-items: center; + margin-bottom: 28px; +} + +.narrativeTitle { + font-size: 1.25rem; + font-weight: 600; + color: #E2E8F0; + margin: 0; +} + +.narrativeBadge { + background: rgba(245, 158, 11, 0.12); + color: var(--primary); + padding: 2px 12px; + border-radius: 30px; + font-size: 0.7rem; + margin-left: auto; +} + +.navTrack { + display: flex; + flex-direction: column; + gap: 0; + position: relative; +} + +/* vertical timeline line */ +.navTrack::before { + content: ''; + position: absolute; + left: 18px; + top: 24px; + bottom: 24px; + width: 2px; + background: #2A3040; + border-radius: 1px; +} + +.trackStep { + display: flex; + flex-direction: column; + padding: 0.75rem 0 0.75rem 48px; + position: relative; +} + +.trackStep .stepBadge { + display: inline-flex; + align-items: center; + gap: 6px; + font-size: 0.78rem; + font-weight: 600; + color: var(--primary); + margin-bottom: 4px; + position: relative; +} + +/* dot on the timeline */ +.trackStep .stepBadge::before { + content: ''; + position: absolute; + left: -36px; + top: 50%; + transform: translateY(-50%); + width: 10px; + height: 10px; + border-radius: 50%; + background: var(--primary); + border: 2px solid var(--code-bg); + z-index: 1; +} + +.stepBadgeGreen { + color: var(--accent-green) !important; +} + +.stepBadgeGreen::before { + background: var(--accent-green) !important; +} + +.stepContent { + font-size: 0.9rem; + line-height: 1.6; + color: #C8D0DE; +} + +/* inline inside narrative demo — dark-friendly */ +.stepContent code { + background: #1A1F27; + color: #E2E8F0; + padding: 2px 6px; + border-radius: 4px; + font-size: 0.85em; + border: 1px solid #2A3040; +} + +.hamsterVoice { + background: rgba(245, 158, 11, 0.08); + border-left: 3px solid var(--primary); + border-radius: 8px; + padding: 1rem 1.25rem; + margin: 0.75rem 0 0.75rem 48px; + font-size: 0.85rem; + line-height: 1.6; + color: #B0B8C8; } /* ===== Use Cases Slider ===== */ @@ -424,8 +528,8 @@ .caseCard { flex: 0 0 calc(65% - 0.75rem); - background: #121417; - border: 1px solid #23262B; + background: #161A1F; + border: 1px solid #252A30; border-radius: 16px; padding: 3rem 3rem 2.5rem; opacity: 0.4; @@ -443,50 +547,50 @@ opacity: 1; transform: scale(1); filter: brightness(1); - border-color: #AF788B; - box-shadow: 0 12px 40px rgba(175, 120, 139, 0.18); + border-color: #F59E0B; + box-shadow: 0 12px 40px rgba(245, 158, 11, 0.18); } .caseTitle { font-size: 1.55rem; font-weight: 600; margin: 0 0 0.75rem; - color: #EBEDF0; + color: #EDF2F8; } .caseDesc { - color: #8E95A3; + color: #9AA4BF; font-size: 1.05rem; line-height: 1.7; margin: 0 0 1.75rem; } 
.caseQuery { - background: #0B0D0E; + background: #111317; border-radius: 12px; padding: 1.5rem 1.75rem; font-family: 'JetBrains Mono', 'Fira Code', monospace; font-size: 0.88rem; - color: #EBEDF0; - border: 1px solid #23262B; + color: #EDF2F8; + border: 1px solid #252A30; line-height: 1.7; } .caseQueryLabel { - color: #AF788B; + color: #F59E0B; font-weight: 600; margin-bottom: 0.5rem; } .caseQueryText { - color: #EBEDF0; + color: #EDF2F8; } .caseAnswer { - color: #8E95A3; + color: #9AA4BF; margin-top: 0.75rem; padding-top: 0.75rem; - border-top: 1px solid #23262B; + border-top: 1px solid #252A30; font-size: 0.75rem; } @@ -544,7 +648,7 @@ /* ===== CTA ===== */ .sectionCtaDark { - background: #0B0D0E; + background: #111317; padding: 3rem 1.5rem; } @@ -560,12 +664,12 @@ font-weight: 700; letter-spacing: -0.02em; margin: 0 0 1rem; - color: var(--text); + color: var(--primary); } .ctaDesc { font-size: 1.05rem; - color: #8E95A3; + color: #9AA4BF; max-width: 520px; margin: 0 auto 2rem; line-height: 1.6; @@ -585,8 +689,8 @@ font-weight: 600; font-size: 0.88rem; background: transparent; - border: 1px solid #23262B; - color: #EBEDF0; + border: 1px solid #252A30; + color: #EDF2F8; text-decoration: none; cursor: pointer; transition: all 0.2s; @@ -594,9 +698,9 @@ } .ctaBtnSecondary:hover { - border-color: #AF788B; - background: rgba(175, 120, 139, 0.12); - color: #AF788B; + border-color: #F59E0B; + background: rgba(245, 158, 11, 0.12); + color: #F59E0B; text-decoration: none; } @@ -609,8 +713,8 @@ } .ctaInstallItem { - background: #121417; - border: 1px solid #23262B; + background: #161A1F; + border: 1px solid #252A30; border-radius: 12px; padding: 0.65rem 1.25rem; display: flex; @@ -621,17 +725,17 @@ .ctaInstallCommand { font-family: 'JetBrains Mono', 'Fira Code', monospace; font-size: 0.82rem; - color: #EBEDF0; + color: #EDF2F8; } .ctaInstallCommand span { - color: #AF788B; + color: #F59E0B; } .ctaCopyIcon { background: transparent; border: none; - color: #8E95A3; + color: #9AA4BF; cursor: pointer; padding: 0.25rem 0.65rem; border-radius: 6px; @@ -641,13 +745,13 @@ } .ctaCopyIcon:hover { - background: rgba(175, 120, 139, 0.12); - color: #AF788B; + background: rgba(245, 158, 11, 0.12); + color: #F59E0B; } /* ===== Dark theme overrides ===== */ [data-theme='dark'] .heroTitleEmphasis { - color: var(--primary-light); + color: var(--primary); } [data-theme='dark'] .heroTitleLight { diff --git a/docs/src/pages/index.tsx b/docs/src/pages/index.tsx index cae26e92..5bad087d 100644 --- a/docs/src/pages/index.tsx +++ b/docs/src/pages/index.tsx @@ -99,11 +99,18 @@ async def main(): result = await engine.index(IndexContext.from_path("./report.pdf")) doc_id = result.doc_id - # Query + # Query with evidence and metrics result = await engine.query( - QueryContext("What is the total revenue?").with_doc_ids([doc_id]) + QueryContext("What is the total revenue?") + .with_doc_ids([doc_id]) ) - print(result.single().content) + item = result.single() + print(f"Answer: {item.content}") + print(f"Score: {item.score:.2f} Confidence: {item.confidence}") + for ev in item.evidence: + print(f" [{ev.title}] {ev.path}") + print(f"LLM calls: {item.metrics.llm_calls} " + f"Rounds: {item.metrics.rounds_used}") asyncio.run(main())`; @@ -122,13 +129,20 @@ async fn main() -> vectorless::Result<()> { let result = engine.index(IndexContext::from_path("./report.pdf")).await?; let doc_id = result.doc_id().unwrap(); - // Query + // Query with evidence and metrics let result = engine.query( QueryContext::new("What is the total revenue?") 
.with_doc_ids(vec![doc_id.to_string()]) ).await?; - println!("{}", result.content); - + let item = result.single().unwrap(); + println!("Answer: {}", item.content); + println!("Score: {:.2} Confidence: {:?}", item.score, item.confidence); + for ev in &item.evidence { + println!(" [{}] {}", ev.title, ev.path); + } + if let Some(m) = &item.metrics { + println!("LLM calls: {} Rounds: {}", m.llm_calls, m.rounds_used); + } Ok(()) }`; @@ -200,7 +214,11 @@ function SectionGetStarted() {
$ python demo.py
- → The total revenue for fiscal year 2024 was $2.3 billion, a 15% increase YoY. + Answer:  The total revenue for fiscal year 2024 was $2.3 billion, a 15% increase YoY.
+ Score:   0.91   Confidence: high
+   [Revenue Summary] Root/Financial Overview/Q3 2024
+   [Revenue Breakdown] Root/Financial Overview/Q3 2024
+ LLM calls: 4   Rounds: 3
@@ -220,7 +238,11 @@ function SectionGetStarted() {
$ cargo run
- → The total revenue for fiscal year 2024 was $2.3 billion, a 15% increase YoY. + Answer:  The total revenue for fiscal year 2024 was $2.3 billion, a 15% increase YoY.
+ Score:   0.91   Confidence: High
+   [Revenue Summary] Root/Financial Overview/Q3 2024
+   [Revenue Breakdown] Root/Financial Overview/Q3 2024
+ LLM calls: 4   Rounds: 3
@@ -249,8 +271,93 @@ function SectionHowItWorks() {

You declare a few lines of code. We do everything else.

-
- How Vectorless works +
+
+ + + +

Vectorless Reasoning Flow

+ Live Reasoning +
+
+ {/* Step 1: Index */} +
+
+ Index +
+
+ 3 documents indexed → hierarchical trees + NavigationIndex + ReasoningIndex built +
+
+ {/* Step 2: Query */} +
+
+ Query +
+
+ “How much delta-V remains after the Day 17 thruster failure, and is it enough?” +
+
+ {/* Step 3: Orchestrator analyzes DocCards */} +
+
+ Orchestrator · Analyze +
+
+ LLM understands query intent (complex, analytical) → reads DocCards → dispatches Worker to doc #1 +
+
+ {/* Step 4: Bird's-eye view */} +
+
+ Worker · Bird’s-Eye +
+
+ ls root → sees 4 top-level sections → LLM generates navigation plan targeting Orbital Mechanics + Mission Anomalies +
+
+ {/* Step 5: Navigate */} +
+
+ Navigate +
+
+ cd "Orbital Mechanics"cd "Transfer Orbit Analysis"cat "Delta-V Budget" → evidence #1 collected +
+
+ {/* Step 6: Cross-reference */} +
+
+ Cross-Reference +
+
+ find "misfire" → hit in Mission Anomalies → cd + cat "Day 17 Thruster Misfire" → evidence #2 collected +
+
+ {/* Step 7: Sufficiency check */} +
+
+ Check +
+
+ check → LLM evaluates: both delta-V budget and anomaly impact found → SUFFICIENT → done +
+
+ {/* Step 8: Rerank + Synthesize */} +
+ + Rerank pipeline: dedup → BM25 scoring (score: 0.87, confidence: high) → synthesis LLM generates cross-referenced answer. +
+ {/* Step 9: Final Answer */} +
+
+ Result +
+
+ After the B3 thruster failure, remaining reserve is 218 m/s vs. 150 m/s requirement — sufficient to complete the mission. Sources: Delta-V Budget, Day 17 Thruster Misfire. +
+
+
diff --git a/docs/src/theme/Navbar/index.tsx b/docs/src/theme/Navbar/index.tsx
index 2437b65e..a6474455 100644
--- a/docs/src/theme/Navbar/index.tsx
+++ b/docs/src/theme/Navbar/index.tsx
@@ -1,6 +1,6 @@
 import React from 'react';
 import NavbarLayout from '@theme/Navbar/Layout';
-import {useThemeConfig} from '@docusaurus/theme-common';
+import {useThemeConfig, useColorMode} from '@docusaurus/theme-common';
 import NavbarItem from '@theme/NavbarItem';
 import NavbarMobileSidebarToggle from '@theme/Navbar/MobileSidebar/Toggle';
 import useBaseUrl from '@docusaurus/useBaseUrl';
@@ -9,6 +9,36 @@ import GitHubStar from '../../components/GitHubStar';
 import type {Props as NavbarItemConfig} from '@theme/NavbarItem';
 import styles from './styles.module.css';
 
+function ColorModeToggle(): React.ReactElement {
+  const {colorMode, setColorMode} = useColorMode();
+  const isDark = colorMode === 'dark';
+  return (
+    <button
+      type="button"
+      className={styles.themeToggle}
+      aria-label={isDark ? 'Switch to light mode' : 'Switch to dark mode'}
+      onClick={() => setColorMode(isDark ? 'light' : 'dark')}
+    >
+      {isDark ? '☀' : '☾'}
+    </button>
+  );
+}
+
 export default function Navbar(): React.ReactElement {
   const {navbar: {items, logo, title}} = useThemeConfig();
   const leftItems = items.filter(item => item.position === 'left');
@@ -36,6 +66,7 @@ export default function Navbar(): React.ReactElement {
+          <ColorModeToggle />
           <GitHubStar />
diff --git a/docs/src/theme/Navbar/styles.module.css b/docs/src/theme/Navbar/styles.module.css
index 9e3c803e..a61a7604 100644
--- a/docs/src/theme/Navbar/styles.module.css
+++ b/docs/src/theme/Navbar/styles.module.css
@@ -10,6 +10,7 @@
 .navbarBrand {
   display: flex;
   align-items: center;
+  gap: 10px;
   flex-shrink: 0;
   margin-right: 24rem;
 }
@@ -27,15 +28,21 @@
 }
 
 .navbarLogo {
-  height: 40px;
+  height: 30px;
   width: auto;
 }
 
 .logo {
-  font-size: 1.6rem;
-  font-weight: 800;
+  font-size: 1.3rem;
+  font-weight: 600;
+  font-family: 'Inter', 'Libre Franklin', -apple-system, BlinkMacSystemFont, sans-serif;
   letter-spacing: -0.02em;
-  color: var(--primary);
+  color: #111827;
+  line-height: 1;
+}
+
+[data-theme='dark'] .logo {
+  color: #D0D6E4;
 }
 
 /* Center: navigation links */
@@ -50,11 +57,16 @@
 .navbarCenter :global(.navbar__link) {
   font-size: 0.875rem;
   font-weight: 400;
+  color: #374151;
   padding: 0;
   text-decoration: none;
   transition: opacity 0.15s ease;
 }
 
+[data-theme='dark'] .navbarCenter :global(.navbar__link) {
+  color: #C8D0E0;
+}
+
 .navbarCenter :global(.navbar__link:hover) {
   opacity: 0.7;
   text-decoration: none;
 }
@@ -82,6 +94,35 @@
   align-items: center;
 }
 
+/* Theme toggle */
+.themeToggle {
+  display: inline-flex;
+  align-items: center;
+  justify-content: center;
+  width: 32px;
+  height: 32px;
+  border-radius: 8px;
+  border: 1px solid var(--border);
+  background: transparent;
+  color: #374151;
+  cursor: pointer;
+  transition: all 0.15s;
+}
+
+.themeToggle:hover {
+  border-color: var(--primary);
+  color: var(--primary);
+}
+
+[data-theme='dark'] .themeToggle {
+  color: #C8D0E0;
+}
+
+[data-theme='dark'] .themeToggle:hover {
+  border-color: var(--primary);
+  color: var(--primary);
+}
+
 @media (max-width: 996px) {
   .navbarContainer {
     padding: 0 16px;
diff --git a/pyproject.toml b/pyproject.toml
index 9d83bdd6..1d8d38ea 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "maturin"
 
 [project]
 name = "vectorless"
-version = "0.1.8"
+version = "0.1.9"
description = "Reasoning-native document intelligence engine for AI"
 readme = "README.md"
 requires-python = ">=3.9"
diff --git a/python/src/lib.rs b/python/src/lib.rs
index ebee59cf..6a7eb913 100644
--- a/python/src/lib.rs
+++ b/python/src/lib.rs
@@ -24,7 +24,8 @@ use metrics::{
     PyLlmMetricsReport, PyMetricsReport, PyPilotMetricsReport, PyRetrievalMetricsReport,
 };
 use results::{
-    PyFailedItem, PyIndexItem, PyIndexMetrics, PyIndexResult, PyQueryResult, PyQueryResultItem,
+    PyEvidenceItem, PyFailedItem, PyIndexItem, PyIndexMetrics, PyIndexResult, PyQueryMetrics,
+    PyQueryResult, PyQueryResultItem,
 };
 
 /// Vectorless - Reasoning-native document intelligence engine.
@@ -48,6 +49,8 @@ fn _vectorless(m: &Bound<'_, PyModule>) -> PyResult<()> {
     m.add_class::()?;
     m.add_class::()?;
     m.add_class::()?;
+    m.add_class::<PyEvidenceItem>()?;
+    m.add_class::<PyQueryMetrics>()?;
     m.add_class::()?;
     m.add_class::()?;
     m.add_class::()?;
diff --git a/python/src/results.rs b/python/src/results.rs
index 14735e71..ba4ea776 100644
--- a/python/src/results.rs
+++ b/python/src/results.rs
@@ -6,7 +6,105 @@
 use pyo3::prelude::*;
 
 use ::vectorless::IndexMetrics;
-use ::vectorless::{FailedItem, IndexItem, IndexResult, QueryResult, QueryResultItem};
+use ::vectorless::{
+    EvidenceItem, FailedItem, IndexItem, IndexResult, QueryMetrics, QueryResult, QueryResultItem,
+};
+
+// ============================================================
+// EvidenceItem
+// ============================================================
+
+/// A single piece of evidence with source attribution.
+#[pyclass(name = "EvidenceItem")]
+pub struct PyEvidenceItem {
+    pub(crate) inner: EvidenceItem,
+}
+
+#[pymethods]
+impl PyEvidenceItem {
+    /// Section title where this evidence was found.
+    #[getter]
+    fn title(&self) -> &str {
+        &self.inner.title
+    }
+
+    /// Navigation path (e.g., "Root/Chapter 1/Section 1.2").
+    #[getter]
+    fn path(&self) -> &str {
+        &self.inner.path
+    }
+
+    /// Raw evidence content.
+    #[getter]
+    fn content(&self) -> &str {
+        &self.inner.content
+    }
+
+    /// Source document name.
+    #[getter]
+    fn doc_name(&self) -> Option<&str> {
+        self.inner.doc_name.as_deref()
+    }
+
+    fn __repr__(&self) -> String {
+        format!(
+            "EvidenceItem(title='{}', path='{}', content_len={})",
+            self.inner.title,
+            self.inner.path,
+            self.inner.content.len()
+        )
+    }
+}
+
+// ============================================================
+// QueryMetrics
+// ============================================================
+
+/// Query execution metrics.
+#[pyclass(name = "QueryMetrics")]
+pub struct PyQueryMetrics {
+    pub(crate) inner: QueryMetrics,
+}
+
+#[pymethods]
+impl PyQueryMetrics {
+    /// Number of LLM calls made.
+    #[getter]
+    fn llm_calls(&self) -> u32 {
+        self.inner.llm_calls
+    }
+
+    /// Number of navigation rounds used.
+    #[getter]
+    fn rounds_used(&self) -> u32 {
+        self.inner.rounds_used
+    }
+
+    /// Number of distinct nodes visited.
+    #[getter]
+    fn nodes_visited(&self) -> usize {
+        self.inner.nodes_visited
+    }
+
+    /// Number of evidence items collected.
+    #[getter]
+    fn evidence_count(&self) -> usize {
+        self.inner.evidence_count
+    }
+
+    /// Total characters of collected evidence.
+    #[getter]
+    fn evidence_chars(&self) -> usize {
+        self.inner.evidence_chars
+    }
+
+    fn __repr__(&self) -> String {
+        format!(
+            "QueryMetrics(llm_calls={}, rounds={}, evidence={})",
+            self.inner.llm_calls, self.inner.rounds_used, self.inner.evidence_count
+        )
+    }
+}
 
 // ============================================================
 // QueryResultItem
 // ============================================================
@@ -26,30 +124,67 @@ impl PyQueryResultItem {
         &self.inner.doc_id
     }
 
-    /// The retrieved content.
+    /// The retrieved content (synthesized answer or raw evidence).
     #[getter]
     fn content(&self) -> &str {
         &self.inner.content
     }
 
-    /// Relevance score (0.0 to 1.0).
+    /// Confidence score (0.0 to 1.0).
     #[getter]
     fn score(&self) -> f32 {
-        self.inner.score
+        self.inner.confidence
     }
 
-    /// Node IDs that matched.
+    /// Node IDs that matched (navigation paths).
     #[getter]
     fn node_ids(&self) -> Vec<String> {
         self.inner.node_ids.clone()
     }
 
+    /// Evidence items with source attribution.
+    #[getter]
+    fn evidence(&self) -> Vec<PyEvidenceItem> {
+        self.inner
+            .evidence
+            .iter()
+            .map(|e| PyEvidenceItem {
+                inner: EvidenceItem {
+                    title: e.title.clone(),
+                    path: e.path.clone(),
+                    content: e.content.clone(),
+                    doc_name: e.doc_name.clone(),
+                },
+            })
+            .collect()
+    }
+
+    /// Execution metrics for this query.
+    #[getter]
+    fn metrics(&self) -> Option<PyQueryMetrics> {
+        self.inner.metrics.as_ref().map(|m| PyQueryMetrics {
+            inner: QueryMetrics {
+                llm_calls: m.llm_calls,
+                rounds_used: m.rounds_used,
+                nodes_visited: m.nodes_visited,
+                evidence_count: m.evidence_count,
+                evidence_chars: m.evidence_chars,
+            },
+        })
+    }
+
+    /// Confidence score (0.0 to 1.0).
+    #[getter]
+    fn confidence(&self) -> f32 {
+        self.inner.confidence
+    }
+
     fn __repr__(&self) -> String {
         format!(
-            "QueryResultItem(doc_id='{}', score={:.2}, content_len={})",
+            "QueryResultItem(doc_id='{}', confidence={:.2}, evidence={})",
             self.inner.doc_id,
-            self.inner.score,
-            self.inner.content.len()
+            self.inner.confidence,
+            self.inner.evidence.len()
         )
     }
 }
@@ -104,15 +239,31 @@ impl PyQueryResult {
         self.inner
             .items
             .iter()
-            .map(|i| PyQueryResultItem { inner: i.clone() })
+            .map(|i| PyQueryResultItem {
+                inner: QueryResultItem {
+                    doc_id: i.doc_id.clone(),
+                    node_ids: i.node_ids.clone(),
+                    content: i.content.clone(),
+                    evidence: i.evidence.clone(),
+                    metrics: i.metrics.clone(),
+                    confidence: i.confidence,
+                },
+            })
             .collect()
     }
 
     /// Get the first (single-doc) result item.
     fn single(&self) -> Option<PyQueryResultItem> {
-        self.inner
-            .single()
-            .map(|i| PyQueryResultItem { inner: i.clone() })
+        self.inner.single().map(|i| PyQueryResultItem {
+            inner: QueryResultItem {
+                doc_id: i.doc_id.clone(),
+                node_ids: i.node_ids.clone(),
+                content: i.content.clone(),
+                evidence: i.evidence.clone(),
+                metrics: i.metrics.clone(),
+                confidence: i.confidence,
+            },
+        })
     }
 
     /// Number of result items.
@@ -131,7 +282,9 @@
         self.inner
             .failed
             .iter()
-            .map(|f| PyFailedItem { inner: f.clone() })
+            .map(|f| PyFailedItem {
+                inner: FailedItem::new(&f.source, &f.error),
+            })
             .collect()
     }
@@ -322,7 +475,9 @@ impl PyIndexResult {
         self.inner
             .failed
             .iter()
-            .map(|f| PyFailedItem { inner: f.clone() })
+            .map(|f| PyFailedItem {
+                inner: FailedItem::new(&f.source, &f.error),
+            })
             .collect()
     }
diff --git a/python/vectorless/cli/commands/config_cmd.py b/python/vectorless/cli/commands/config_cmd.py
index 8a119702..fbf0ac57 100644
--- a/python/vectorless/cli/commands/config_cmd.py
+++ b/python/vectorless/cli/commands/config_cmd.py
@@ -28,7 +28,7 @@ def config_cmd(
     llm.model            LLM model name
     llm.api_key          API key (or env VECTORLESS_API_KEY)
     llm.endpoint         API endpoint
-    retrieval.strategy   agent | pipeline
+    retrieval.strategy   agent
     retrieval.max_rounds navigation budget
     index.summary        full | selective | lazy | navigation
     index.compact_mode   true | false
diff --git a/rust/examples/deep_retrieval.rs b/rust/examples/deep_retrieval.rs
new file mode 100644
index 00000000..44877543
--- /dev/null
+++ b/rust/examples/deep_retrieval.rs
@@ -0,0 +1,221 @@
+// Copyright (c) 2026 vectorless developers
+// SPDX-License-Identifier: Apache-2.0
+
+//! Complex retrieval example — forces SubAgent navigation, not fast path.
+//!
+//! This example indexes a document where the answer to a tricky question
+//! is NOT directly accessible via keyword lookup in the ReasoningIndex.
+//! The SubAgent must navigate through multiple levels, collect evidence
+//! from different sections, and synthesize a cross-referenced answer.
+//!
+//! # Usage
+//!
+//! ```bash
+//! LLM_API_KEY=sk-xxx LLM_MODEL=gpt-4o \
+//!   LLM_ENDPOINT=https://api.openai.com/v1 cargo run --example deep_retrieval
+//! ```
+
+use vectorless::{EngineBuilder, IndexContext, IndexOptions, QueryContext};
+
+/// A compact but deeply nested document about a fictional space mission.
+/// +/// Structure (4 levels deep): +/// +/// Mission Atlas Report +/// ├── Launch Operations +/// │ ├── Vehicle Configuration +/// │ │ ├── Stage 1 Parameters +/// │ │ └── Stage 2 Parameters +/// │ └── Countdown Timeline +/// │ ├── T-48h to T-12h +/// │ └── T-12h to T-0 +/// ├── Orbital Mechanics +/// │ ├── Transfer Orbit Analysis +/// │ │ ├── Delta-V Budget +/// │ │ └── Gravity Assist Profile +/// │ └── Station-Keeping Schedule +/// ├── Payload Operations +/// │ ├── Satellite Alpha Deployment +/// │ │ ├── Separation Sequence +/// │ │ └── Solar Panel Extension +/// │ ├── Satellite Beta Deployment +/// │ │ ├── Antenna Calibration +/// │ │ └── Frequency Assignment +/// │ └── Re-entry Capsule +/// │ ├── Heat Shield Specs +/// │ └── Landing Zone Selection +/// └── Mission Anomalies +/// ├── Day 3 Communication Blackout +/// └── Day 17 Thruster Misfire +const MISSION_REPORT: &str = r#" +# Mission Atlas Report + +## Launch Operations + +### Vehicle Configuration + +#### Stage 1 Parameters + +The first stage utilizes a LOX/RP-1 bipropellant configuration with a sea-level thrust of 7,600 kN. Burn time is 162 seconds with a specific impulse of 282 seconds. The propellant mass fraction is 0.894. Stage separation occurs at T+162s at an altitude of approximately 68 km with a velocity of 2,340 m/s. + +#### Stage 2 Parameters + +The second stage employs a single RL-10C engine using LOX/LH2 with a vacuum thrust of 110 kN. Burn duration extends to 370 seconds with a specific impulse of 448 seconds. The stage carries 20,800 kg of propellant. Engine ignition occurs at T+165s following a 3-second coast phase after stage separation. + +### Countdown Timeline + +#### T-48h to T-12h + +During the early countdown phase, the launch team completed propellant loading verification and navigation system alignment. A minor issue was detected in the Stage 2 fuel temperature sensor at T-36h, which was resolved by recalibrating the sensor threshold from 20.1K to 19.8K. Weather briefing at T-24h indicated 85% probability of favorable conditions with upper-level winds at 45 knots. + +#### T-12h to T-0 + +Final countdown proceeded nominally. Auxiliary power unit start occurred at T-4h. Range safety checks completed at T-2h. Go/No-Go poll at T-30 minutes was unanimous across all stations. Terminal count at T-9 minutes was initiated with no holds. Liftoff occurred at 14:37:22 UTC on March 15, achieving the targeted azimuth of 72.3 degrees. + +## Orbital Mechanics + +### Transfer Orbit Analysis + +#### Delta-V Budget + +The total mission delta-V budget is 4,832 m/s, allocated as follows: ascent to parking orbit 1,890 m/s, trans-target injection 2,210 m/s, orbit insertion 510 m/s, and station-keeping reserve 222 m/s. The parking orbit was achieved at 185 km circular with an inclination of 28.5 degrees. The gravity assist maneuver at Titan contributed an effective delta-V savings of 380 m/s, which allowed the mission to carry 15% more payload than the original baseline design. + +#### Gravity Assist Profile + +The Titan flyby occurred on Day 47 at a closest approach distance of 950 km. The bending angle was 38.7 degrees with an asymptotic velocity of 4.2 km/s relative to Titan. This maneuver shifted the spacecraft trajectory from a Hohmann-type direct transfer to a gravity-assisted trajectory, reducing total flight time from 187 days to 143 days. Post-flyby trajectory correction burn of 3.4 m/s was executed on Day 49 to refine the approach corridor. 
+ +### Station-Keeping Schedule + +Station-keeping maneuvers are planned at 14-day intervals with a delta-V allocation of 2.8 m/s per maneuver. The first three maneuvers consumed 2.6, 3.1, and 2.5 m/s respectively, staying within the allocated budget. Orbital decay rate without correction is approximately 0.3 km per 14-day cycle due to atmospheric drag at the operational altitude of 420 km. + +## Payload Operations + +### Satellite Alpha Deployment + +#### Separation Sequence + +Satellite Alpha separated from the payload adapter at T+3h42m using a Marman band release mechanism. Separation velocity was 0.45 m/s with a tip-off rate of 0.02 deg/s. Initial telemetry confirmed solar panel deployment signal at T+3h58m. First ground station contact occurred over Svalbard at T+4h12m confirming nominal spacecraft health. + +#### Solar Panel Extension + +Both solar arrays deployed fully within 8 minutes of the deployment command. Array 1 generated 4,280 W and Array 2 generated 4,310 W, for a combined initial output of 8,590 W against a design target of 8,400 W. The arrays use triple-junction GaAs cells with a beginning-of-life efficiency of 30.7%. Power margin at end-of-life (7 years) is projected at 6,950 W, still above the minimum operational requirement of 6,200 W. + +### Satellite Beta Deployment + +#### Antenna Calibration + +Satellite Beta's high-gain antenna completed calibration in three phases. Phase 1 (boresight alignment) achieved a pointing accuracy of 0.023 degrees against a requirement of 0.05 degrees. Phase 2 (pattern verification) confirmed the sidelobe levels were within specification at -28 dB below main beam. Phase 3 (EIRP verification) measured 52.4 dBW against a required minimum of 51.0 dBW. + +#### Frequency Assignment + +Satellite Beta operates in Ka-band with a downlink center frequency of 20.185 GHz and an uplink at 30.050 GHz. The allocated bandwidth is 500 MHz per polarization, supporting 24 transponders with 36 MHz spacing. Cross-polarization isolation exceeds 30 dB. The link budget supports a minimum data rate of 1.2 Gbps under rain fade conditions corresponding to 99.7% availability in the primary coverage zone. + +### Re-entry Capsule + +#### Heat Shield Specs + +The re-entry capsule thermal protection system uses a phenolic-impregnated carbon ablator (PICA-X) with a thickness of 33 mm on the forebody. Maximum predicted heat flux is 185 W/cm² at the stagnation point during re-entry at 11.2 km/s. The heat shield mass is 86 kg, representing 12% of the total capsule dry mass of 717 kg. The backshell uses a lighter SLA-561V material with a 15 mm thickness rated for 45 W/cm². + +#### Landing Zone Selection + +The primary landing zone is located at 34.2°N 108.7°W in the White Sands Proving Ground, with an elliptical footprint of 15 km × 8 km at the 3-sigma confidence level. Wind drift analysis based on 10 years of upper-atmosphere data predicts a mean offset of 3.2 km northeast. The backup landing zone is at 32.5°N 106.5°W near Fort Bliss, activated only if the primary zone weather violates the surface wind constraint of 12 m/s. + +## Mission Anomalies + +### Day 3 Communication Blackout + +At approximately 07:14 UTC on Day 3, the primary S-band transponder experienced an unexpected carrier loss lasting 4 hours and 22 minutes. Root cause analysis identified a single-event upset (SEU) in the command decoder ASIC, caused by a high-energy proton from the inner Van Allen belt. The transponder recovered autonomously after a watchdog timer reset. 
No command sequences were lost as the onboard computer continued executing the stored timeline. Redundant transponder was not activated because the primary recovery occurred before the 6-hour switchover threshold.
+
+### Day 17 Thruster Misfire
+
+At 14:52 UTC on Day 17, thruster cluster B3 (one of eight attitude control clusters) fired for 2.3 seconds during a period when no thruster activity was commanded. This produced an unplanned delta-V of 0.08 m/s and an attitude perturbation of 0.3 degrees. Telemetry analysis revealed a stuck valve in the B3 propellant control valve assembly, likely caused by particulate contamination during ground processing. The flight software detected the anomaly within 500 ms and inhibited the B3 cluster. Subsequent attitude corrections were performed using the remaining seven clusters. The propellant impact of the lost cluster reduces the available delta-V for the mission by approximately 4 m/s, leaving a remaining reserve of 218 m/s against a requirement of 150 m/s.
+"#;
+
+/// Questions designed to force deep navigation:
+///
+/// 1. "How much delta-V budget remains after the Day 17 thruster failure,
+///    and is it enough to complete the mission?"
+///    → Requires finding the delta-V budget (Orbital Mechanics > Transfer > Delta-V Budget)
+///    AND the anomaly impact (Mission Anomalies > Day 17 Thruster Misfire)
+///    AND cross-referencing reserve vs requirement.
+///
+/// 2. "What is the total power generation margin at end-of-life for Satellite Alpha
+///    compared to its minimum operational requirement?"
+///    → Requires finding EOL power (Payload > Alpha > Solar Panel Extension)
+///    and computing the difference.
+///
+/// 3. "If the B3 thruster cluster had failed during the Day 3 blackout instead of
+///    Day 17, would the spacecraft have been able to recover attitude without
+///    ground intervention?"
+///    → Requires combining anomaly timelines and thruster redundancy info.
+const QUERIES: &[&str] = &[
+    "How much delta-V budget remains after the Day 17 thruster failure, and is it enough to complete the mission?",
+    "What is the total power generation margin at end-of-life for Satellite Alpha compared to its minimum operational requirement?",
+    "If the B3 thruster cluster had failed during the Day 3 blackout instead of Day 17, would the spacecraft have been able to recover attitude without ground intervention?",
+];
+
+#[tokio::main]
+async fn main() -> vectorless::Result<()> {
+    tracing_subscriber::fmt::init();
+
+    println!("=== Deep Retrieval Example ===\n");
+
+    let api_key = std::env::var("LLM_API_KEY").unwrap_or_else(|_| "sk-...".to_string());
+    let model = std::env::var("LLM_MODEL").unwrap_or_else(|_| "gpt-4o".to_string());
+    let endpoint = std::env::var("LLM_ENDPOINT").unwrap_or_else(|_| "https://api".to_string());
+
+    // Build engine
+    let engine = EngineBuilder::new()
+        .with_key(&api_key)
+        .with_model(&model)
+        .with_endpoint(&endpoint)
+        .build()
+        .await
+        .map_err(|e| vectorless::Error::Config(e.to_string()))?;
+
+    // Index document
+    let temp_dir = tempfile::tempdir()?;
+    let md_path = temp_dir.path().join("mission_atlas.md");
+    tokio::fs::write(&md_path, MISSION_REPORT).await?;
+
+    let index_result = engine
+        .index(IndexContext::from_path(&md_path).with_options(IndexOptions::new().with_summaries()))
+        .await?;
+    let doc_id = index_result.doc_id().unwrap().to_string();
+    println!("Indexed document: {}\n", doc_id);
+
+    // Query
+    for query in QUERIES {
+        println!("Q: \"{}\"", query);
+
+        match engine
+            .query(
+                QueryContext::new(*query)
+                    .with_doc_ids(vec![doc_id.clone()])
+                    .with_force_analysis(true),
+            )
+            .await
+        {
+            Ok(result) => {
+                if let Some(item) = result.single() {
+                    if item.content.is_empty() {
+                        println!("  No relevant content found");
+                    } else {
+                        println!("  A:");
+                        for line in item.content.lines().take(10) {
+                            println!("  {}", line);
+                        }
+                        if item.content.lines().count() > 10 {
+                            println!(
+                                "  ... 
({} more lines)", + item.content.lines().count() - 10 + ); + } + } + } + } + Err(e) => println!(" Error: {}", e), + } + println!(); + } + + // Cleanup + engine.remove(&doc_id).await?; + Ok(()) +} diff --git a/rust/examples/flow.rs b/rust/examples/flow.rs index fbc89423..759c0b2e 100644 --- a/rust/examples/flow.rs +++ b/rust/examples/flow.rs @@ -119,12 +119,7 @@ async fn main() -> vectorless::Result<()> { println!(" - No relevant content found"); } else { println!(" - Found relevant content:"); - let preview = if item.content.len() > 200 { - format!("{}...", &item.content) - } else { - item.content.clone() - }; - for line in preview.lines().take(5) { + for line in item.content.lines() { println!(" {}", line); } } @@ -140,9 +135,9 @@ async fn main() -> vectorless::Result<()> { } // Cleanup - for doc in engine.list().await? { - engine.remove(&doc.id).await?; - } + // for doc in engine.list().await? { + // engine.remove(&doc.id).await?; + // } Ok(()) } diff --git a/rust/examples/index_directory.rs b/rust/examples/index_directory.rs index 29f6cb08..2696df99 100644 --- a/rust/examples/index_directory.rs +++ b/rust/examples/index_directory.rs @@ -85,7 +85,7 @@ async fn main() -> vectorless::Result<()> { let answer = engine.query(vectorless::QueryContext::new(query)).await?; for item in &answer.items { - println!(" [{} score={:.2}]", item.doc_id, item.score); + println!(" [{} confidence={:.2}]", item.doc_id, item.confidence); let preview: String = item.content.chars().take(200).collect(); println!(" {preview}"); if item.content.len() > 200 { diff --git a/rust/examples/indexing_flow.rs b/rust/examples/indexing_flow.rs new file mode 100644 index 00000000..03eb3a87 --- /dev/null +++ b/rust/examples/indexing_flow.rs @@ -0,0 +1,173 @@ +// Copyright (c) 2026 vectorless developers +// SPDX-License-Identifier: Apache-2.0 + +//! Indexing pipeline flow example — demonstrates the full indexing pipeline +//! with detailed metrics breakdown. +//! +//! This example walks through: +//! 1. Creating a Vectorless engine +//! 2. Indexing a Markdown document from content +//! 3. Inspecting per-stage timing metrics +//! +//! Set `RUST_LOG=info` to see pipeline stage logs, or `RUST_LOG=debug` for +//! detailed internal progress. +//! +//! # Usage +//! +//! ```bash +//! # Using environment variables for LLM config: +//! LLM_API_KEY=sk-xxx LLM_MODEL=google/gemini-3-flash-preview \ +//! LLM_ENDPOINT=http://localhost:4000/api/v1 cargo run --example indexing_flow +//! +//! # Or with defaults (edit the code to set your key/endpoint): +//! cargo run --example indexing_flow +//! ``` + +use vectorless::{DocumentFormat, EngineBuilder, IndexContext}; + +/// Sample document with multi-level headings to exercise tree construction +/// and navigation index building. +const SAMPLE_MARKDOWN: &str = r#" +# Payment Platform Technical Guide + +## Overview + +This guide covers the architecture and implementation details of the payment processing platform. The system handles credit card payments, bank transfers, and digital wallets across multiple currencies and regions. It is designed for high availability with 99.99% uptime SLA and supports peak throughput of 10,000 transactions per second. + +## Architecture + +The platform uses a microservices architecture with event-driven communication between services. Each service owns its data store and communicates through a message broker for eventual consistency. The system is deployed on Kubernetes with automatic horizontal scaling based on request queue depth. 
+ +### Ingestion Gateway + +The ingestion gateway is the entry point for all payment requests. It handles request validation, authentication, idempotency checks, and routing to the appropriate payment processor. The gateway implements circuit breaker patterns to gracefully degrade when downstream processors experience issues. + +### Payment Processing Engine + +The payment processing engine orchestrates the lifecycle of each payment transaction. It manages state transitions from initiation through authorization, capture, settlement, and reconciliation. The engine supports both synchronous and asynchronous payment flows, depending on the payment method and processor requirements. + +### Settlement Service + +The settlement service handles batch settlement with acquiring banks and payment networks. It runs on a configurable schedule (typically end-of-day for each banking region) and groups authorized transactions into settlement batches. The service handles currency conversion, fee calculation, and split payments for marketplace scenarios. + +## Security + +All payment data is encrypted at rest using AES-256 and in transit using TLS 1.3. Cardholder data is tokenized immediately upon receipt and stored in a PCI DSS Level 1 compliant vault. The platform undergoes annual PCI DSS audits and quarterly network vulnerability scans. + +### Fraud Detection + +Real-time fraud detection uses a rules engine combined with a machine learning model that scores each transaction based on velocity checks, geolocation anomalies, device fingerprinting, and behavioral patterns. Transactions exceeding configurable risk thresholds are automatically held for manual review. + +### Compliance + +The platform complies with PCI DSS, SOC 2 Type II, GDPR, and regional payment regulations including PSD2 (Europe) and local data residency requirements. Audit logs are retained for 7 years and accessible through a dedicated compliance API. + +## Monitoring and Operations + +Real-time dashboards track transaction volumes, success rates, latency percentiles, and error rates across all payment methods and processors. Automated alerting triggers on-call rotations when key metrics deviate from baseline thresholds. +"#; + +#[tokio::main] +async fn main() -> vectorless::Result<()> { + tracing_subscriber::fmt::init(); + + println!("=== Indexing Pipeline Flow Example ===\n"); + + // Build engine with LLM configuration from environment or defaults. 
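+    // LLM_API_KEY / LLM_MODEL / LLM_ENDPOINT mirror the usage block in the module
+    // docs above; the fallback values below are placeholders, not working
+    // credentials, so set real ones before running against a live endpoint.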
+    let api_key = std::env::var("LLM_API_KEY").unwrap_or_else(|_| "sk-...".to_string());
+    let model =
+        std::env::var("LLM_MODEL").unwrap_or_else(|_| "google/gemini-3-flash-preview".to_string());
+    let endpoint = std::env::var("LLM_ENDPOINT")
+        .unwrap_or_else(|_| "http://localhost:4000/api/v1".to_string());
+
+    // Step 1: Create engine
+    println!("Step 1: Creating engine...");
+    let engine = EngineBuilder::new()
+        .with_key(&api_key)
+        .with_model(&model)
+        .with_endpoint(&endpoint)
+        .build()
+        .await
+        .map_err(|e| vectorless::Error::Config(e.to_string()))?;
+    println!("  Done.\n");
+
+    // Step 2: Index from content
+    println!("Step 2: Indexing document from content...\n");
+    let result = engine
+        .index(IndexContext::from_content(
+            SAMPLE_MARKDOWN,
+            DocumentFormat::Markdown,
+        ))
+        .await?;
+
+    println!("  Indexed {} document(s)\n", result.items.len());
+
+    // Step 3: Inspect indexing results and metrics
+    for item in &result.items {
+        println!("--- Document Info ---");
+        println!("  doc_id: {}", item.doc_id);
+        println!("  name:   {}", item.name);
+        println!("  format: {:?}", item.format);
+
+        if let Some(desc) = &item.description {
+            // Truncate on a char boundary; byte slicing can panic on multi-byte UTF-8.
+            let preview: String = desc.chars().take(120).collect();
+            println!("  summary: {}...", preview);
+        }
+
+        if let Some(ref metrics) = item.metrics {
+            println!("\n--- Pipeline Stage Metrics ---");
+            println!("  Stage              Time (ms)");
+            println!("  ─────────────────────────────");
+            println!("  Parse              {:>8}", metrics.parse_time_ms);
+            println!("  Build              {:>8}", metrics.build_time_ms);
+            println!("  Validate           {:>8}", metrics.validate_time_ms);
+            println!("  Split              {:>8}", metrics.split_time_ms);
+            println!("  Enhance            {:>8}", metrics.enhance_time_ms);
+            println!("  Enrich             {:>8}", metrics.enrich_time_ms);
+            println!(
+                "  Reasoning Index    {:>8}",
+                metrics.reasoning_index_time_ms
+            );
+            println!(
+                "  Navigation Index   {:>8}",
+                metrics.navigation_index_time_ms
+            );
+            println!("  Optimize           {:>8}", metrics.optimize_time_ms);
+            println!("  ─────────────────────────────");
+            println!("  Total              {:>8}", metrics.total_time_ms());
+
+            println!("\n--- Index Output ---");
+            println!("  Nodes processed:     {}", metrics.nodes_processed);
+            println!("  Summaries generated: {}", metrics.summaries_generated);
+            println!("  Summaries failed:    {}", metrics.summaries_failed);
+            println!("  LLM calls:           {}", metrics.llm_calls);
+            println!(
+                "  Tokens generated:    {}",
+                metrics.total_tokens_generated
+            );
+
+            println!("\n--- Navigation Index ---");
+            println!("  Nav entries:  {}", metrics.nav_entries_indexed);
+            println!("  Child routes: {}", metrics.child_routes_indexed);
+
+            println!("\n--- Reasoning Index ---");
+            println!("  Topics indexed:   {}", metrics.topics_indexed);
+            println!("  Keywords indexed: {}", metrics.keywords_indexed);
+
+            println!("\n--- Tree Optimization ---");
+            println!("  Nodes skipped: {}", metrics.nodes_skipped);
+            println!("  Nodes merged:  {}", metrics.nodes_merged);
+        }
+
+        println!();
+    }
+
+    // Step 4: Cleanup
+    println!("Step 4: Cleaning up...");
+    for doc in engine.list().await? {
+        engine.remove(&doc.id).await?;
+        println!("  Removed: {} ({})", doc.name, doc.id);
+    }
+
+    println!("\n=== Done ===");
+    Ok(())
+}
diff --git a/rust/examples/query.rs b/rust/examples/query.rs
new file mode 100644
index 00000000..8914081d
--- /dev/null
+++ b/rust/examples/query.rs
@@ -0,0 +1,83 @@
+// Copyright (c) 2026 vectorless developers
+// SPDX-License-Identifier: Apache-2.0
+
+//! Query-only example — query an already-indexed document.
+//!
+//! Assumes the workspace already contains indexed documents
+//! (e.g. 
from `cargo run --example flow` or `index_single`). +//! +//! # Usage +//! +//! ```bash +//! LLM_API_KEY=sk-xxx LLM_MODEL=gpt-4o \ +//! LLM_ENDPOINT=https://api.openai.com/v1 cargo run --example query +//! ``` + +use vectorless::{EngineBuilder, QueryContext}; + +#[tokio::main] +async fn main() -> vectorless::Result<()> { + tracing_subscriber::fmt::init(); + + let api_key = std::env::var("LLM_API_KEY").unwrap_or_else(|_| "sk-...".to_string()); + let model = std::env::var("LLM_MODEL").unwrap_or_else(|_| "gpt-4o".to_string()); + let endpoint = std::env::var("LLM_ENDPOINT").unwrap_or_else(|_| "https://api".to_string()); + + let engine = EngineBuilder::new() + .with_key(&api_key) + .with_model(&model) + .with_endpoint(&endpoint) + .build() + .await + .map_err(|e| vectorless::Error::Config(e.to_string()))?; + + // List available documents + let docs = engine.list().await?; + if docs.is_empty() { + println!("No indexed documents found. Run an indexing example first."); + return Ok(()); + } + + println!("Available documents:"); + for doc in &docs { + println!(" - {} ({})", doc.name, doc.id); + } + println!(); + + // Query a specific document + let doc_id = docs[0].id.clone(); + let queries = vec![ + "What is the system architecture?", + "How does the storage layer work?", + ]; + + for query in queries { + println!("Query: \"{}\"", query); + + match engine + .query(QueryContext::new(query).with_doc_ids(vec![doc_id.clone()])) + .await + { + Ok(result) => { + if let Some(item) = result.single() { + if item.content.is_empty() { + println!(" No relevant content found"); + } else { + println!(" Found:"); + for line in item.content.lines() { + println!(" {}", line); + } + } + } else { + println!(" No results"); + } + } + Err(e) => { + println!(" Error: {}", e); + } + } + println!(); + } + + Ok(()) +} diff --git a/rust/src/agent/command.rs b/rust/src/agent/command.rs new file mode 100644 index 00000000..6d983dc5 --- /dev/null +++ b/rust/src/agent/command.rs @@ -0,0 +1,538 @@ +// Copyright (c) 2026 vectorless developers +// SPDX-License-Identifier: Apache-2.0 + +//! Command parsing for the agent navigation loop. +//! +//! LLM output is parsed into `Command` variants. The parser is intentionally +//! simple and forgiving — unknown input falls back to `Ls` so the agent can +//! re-observe its surroundings. + +use crate::document::{NavigationIndex, NodeId}; + +/// Parsed command from LLM output. +#[derive(Debug, Clone, PartialEq)] +pub enum Command { + /// List children of the current node. + Ls, + /// Navigate into a child node by name. + Cd { target: String }, + /// Navigate back to parent. + CdUp, + /// Read node content (collects as evidence). + Cat { target: String }, + /// Search for a keyword in the ReasoningIndex. + Find { keyword: String }, + /// Regex search across node content in the current subtree. + Grep { pattern: String }, + /// Preview first N lines of a node without collecting evidence. + Head { target: String, lines: usize }, + /// Search for nodes by title pattern in the tree. + FindTree { pattern: String }, + /// Show node content size (lines, chars). + Wc { target: String }, + /// Show current navigation path. + Pwd, + /// Evaluate evidence sufficiency. + Check, + /// End navigation. + Done, +} + +/// Strip surrounding quotes from a target string. +/// +/// Handles straight quotes (`"`, `'`) and Unicode smart quotes (U+201C/U+201D, U+2018/U+2019). 
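+///
+/// A quick sketch of the behavior (mirrors the unit tests below; the function
+/// is private, so the block is marked `ignore` for rustdoc):
+///
+/// ```ignore
+/// assert_eq!(strip_quotes("\"hello\""), "hello");
+/// assert_eq!(strip_quotes("\u{201c}hello\u{201d}"), "hello");
+/// assert_eq!(strip_quotes("\"only left"), "\"only left"); // unmatched quotes are kept
+/// ```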
+fn strip_quotes(s: &str) -> String {
+    let trimmed = s.trim();
+    let chars: Vec<char> = trimmed.chars().collect();
+    if chars.len() < 2 {
+        return trimmed.to_string();
+    }
+    let (first, last) = (chars[0], chars[chars.len() - 1]);
+    let matching = (first == '"' && last == '"')
+        || (first == '\'' && last == '\'')
+        || (first == '\u{201c}' && last == '\u{201d}')
+        || (first == '\u{2018}' && last == '\u{2019}');
+    if matching {
+        trimmed[chars[0].len_utf8()..trimmed.len() - chars[chars.len() - 1].len_utf8()].to_string()
+    } else {
+        trimmed.to_string()
+    }
+}
+
+/// Parse the first non-empty line of LLM output into a Command.
+pub fn parse_command(llm_output: &str) -> Command {
+    let line = llm_output
+        .lines()
+        .find(|l| !l.trim().is_empty())
+        .unwrap_or("")
+        .trim();
+
+    // Remove common wrapping (markdown code blocks, etc.)
+    let line = line.trim_start_matches('`').trim_end_matches('`').trim();
+
+    let parts: Vec<&str> = line.split_whitespace().collect();
+
+    match parts.as_slice() {
+        ["ls"] => Command::Ls,
+        ["cat"] => Command::Cat {
+            target: ".".to_string(),
+        },
+        ["cd", ".."] => Command::CdUp,
+        ["cd", target] => Command::Cd {
+            target: strip_quotes(target),
+        },
+        ["cd", _target, ..] => Command::Cd {
+            // Handle "cd some name" by joining remaining parts
+            target: strip_quotes(&parts[1..].join(" ")),
+        },
+        ["cat", target] => Command::Cat {
+            target: strip_quotes(target),
+        },
+        ["cat", _target, ..] => Command::Cat {
+            target: strip_quotes(&parts[1..].join(" ")),
+        },
+        ["find", keyword] => Command::Find {
+            keyword: strip_quotes(keyword),
+        },
+        ["find", _keyword, ..] => Command::Find {
+            keyword: strip_quotes(&parts[1..].join(" ")),
+        },
+        ["grep", pattern] => Command::Grep {
+            pattern: strip_quotes(pattern),
+        },
+        ["grep", _pattern, ..] => Command::Grep {
+            pattern: strip_quotes(&parts[1..].join(" ")),
+        },
+        ["head", target] => Command::Head {
+            target: strip_quotes(target),
+            lines: 20, // default
+        },
+        ["head", "-n", n, target @ ..] => Command::Head {
+            target: strip_quotes(&target.join(" ")),
+            lines: n.parse().unwrap_or(20),
+        },
+        ["head", _target, ..] => Command::Head {
+            target: strip_quotes(&parts[1..].join(" ")),
+            lines: 20,
+        },
+        ["findtree", pattern] => Command::FindTree {
+            pattern: strip_quotes(pattern),
+        },
+        ["findtree", _pattern, ..] => Command::FindTree {
+            pattern: strip_quotes(&parts[1..].join(" ")),
+        },
+        ["wc", target] => Command::Wc {
+            target: strip_quotes(target),
+        },
+        ["wc", _target, ..] => Command::Wc {
+            target: strip_quotes(&parts[1..].join(" ")),
+        },
+        ["pwd"] => Command::Pwd,
+        ["check"] => Command::Check,
+        ["done"] => Command::Done,
+        _ => Command::Ls, // fallback: re-observe
+    }
+}
+
+/// Resolve a cd/cat target string to a NodeId using multi-level matching.
+///
+/// Matching priority:
+/// 1. Exact title match
+/// 2. Case-insensitive title match
+/// 3. Substring (contains) match
+/// 4. Numeric index match ("1" → first child, "2" → second, etc.)
+pub fn resolve_target(
+    target: &str,
+    nav_index: &NavigationIndex,
+    current_node: NodeId,
+) -> Option<NodeId> {
+    let target = strip_quotes(target);
+    let routes = nav_index.get_child_routes(current_node)?;
+
+    // 1. Exact match
+    if let Some(r) = routes.iter().find(|r| r.title == target) {
+        return Some(r.node_id);
+    }
+
+    // 2. Case-insensitive match
+    let target_lower = target.to_lowercase();
+    if let Some(r) = routes
+        .iter()
+        .find(|r| r.title.to_lowercase() == target_lower)
+    {
+        return Some(r.node_id);
+    }
+
+    // 3. Substring (contains) match
+    if let Some(r) = routes
+        .iter()
+        .find(|r| r.title.to_lowercase().contains(&target_lower))
+    {
+        return Some(r.node_id);
+    }
+
+    // 4. Numeric index match ("1" → first child)
+    if let Ok(idx) = target.parse::<usize>() {
+        if idx > 0 && idx <= routes.len() {
+            return Some(routes[idx - 1].node_id);
+        }
+    }
+
+    None
+}
+
+/// Resolve a cd/cat target with additional context from the tree node titles.
+///
+/// This extended resolver also checks against the actual tree node titles
+/// (in case NavEntry titles differ from TreeNode titles).
+pub fn resolve_target_extended(
+    target: &str,
+    nav_index: &NavigationIndex,
+    current_node: NodeId,
+    tree: &crate::document::DocumentTree,
+) -> Option<NodeId> {
+    let target = strip_quotes(target);
+    // Try the primary resolver first
+    if let Some(id) = resolve_target(&target, nav_index, current_node) {
+        return Some(id);
+    }
+
+    // Extended: check all children by their TreeNode titles
+    let children: Vec<NodeId> = tree.children_iter(current_node).collect();
+    let target_lower = target.to_lowercase();
+
+    for child_id in &children {
+        if let Some(node) = tree.get(*child_id) {
+            if node.title.to_lowercase().contains(&target_lower) {
+                return Some(*child_id);
+            }
+        }
+    }
+
+    None
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn test_parse_ls() {
+        assert_eq!(parse_command("ls"), Command::Ls);
+        assert_eq!(parse_command("  ls  "), Command::Ls);
+    }
+
+    #[test]
+    fn test_parse_cd() {
+        assert_eq!(parse_command("cd .."), Command::CdUp);
+        assert_eq!(
+            parse_command("cd Getting Started"),
+            Command::Cd {
+                target: "Getting Started".to_string()
+            }
+        );
+        assert_eq!(
+            parse_command("cd some long name"),
+            Command::Cd {
+                target: "some long name".to_string()
+            }
+        );
+        // Quoted multi-word targets should have quotes stripped
+        assert_eq!(
+            parse_command("cd \"Vectorless Architecture Guide\""),
+            Command::Cd {
+                target: "Vectorless Architecture Guide".to_string()
+            }
+        );
+        assert_eq!(
+            parse_command("cd 'Vectorless Architecture Guide'"),
+            Command::Cd {
+                target: "Vectorless Architecture Guide".to_string()
+            }
+        );
+        // Smart quotes
+        assert_eq!(
+            parse_command("\u{201c}Vectorless Architecture Guide\u{201d}"),
+            Command::Ls // doesn't start with a command keyword
+        );
+    }
+
+    #[test]
+    fn test_strip_quotes_straight() {
+        assert_eq!(strip_quotes("\"hello\""), "hello");
+        assert_eq!(strip_quotes("'hello'"), "hello");
+        assert_eq!(strip_quotes("hello"), "hello");
+        assert_eq!(strip_quotes("\"only left"), "\"only left");
+    }
+
+    #[test]
+    fn test_strip_quotes_smart() {
+        assert_eq!(strip_quotes("\u{201c}hello\u{201d}"), "hello");
+        assert_eq!(strip_quotes("\u{2018}hello\u{2019}"), "hello");
+    }
+
+    #[test]
+    fn test_resolve_target_quoted() {
+        use crate::document::{ChildRoute, DocumentTree};
+
+        let mut tree = DocumentTree::new("Root", "");
+        let root = tree.root();
+        let c1 = tree.add_child(root, "Vectorless Architecture Guide", "content");
+
+        let mut nav_index = NavigationIndex::new();
+        nav_index.add_child_routes(
+            root,
+            vec![ChildRoute {
+                node_id: c1,
+                title: "Vectorless Architecture Guide".to_string(),
+                description: "Main guide".to_string(),
+                leaf_count: 5,
+            }],
+        );
+
+        // Quoted target should still resolve
+        assert_eq!(
+            resolve_target("\"Vectorless Architecture Guide\"", &nav_index, root),
+            Some(c1)
+        );
+        assert_eq!(
+            resolve_target("'Vectorless Architecture Guide'", &nav_index, root),
+            Some(c1)
+        );
+    }
+
+    #[test]
+    fn test_parse_cat() {
+        assert_eq!(
+            parse_command("cat Installation"),
+            Command::Cat {
+                target: 
"Installation".to_string() + } + ); + assert_eq!( + parse_command("cat API Reference"), + Command::Cat { + target: "API Reference".to_string() + } + ); + } + + #[test] + fn test_parse_find() { + assert_eq!( + parse_command("find authentication"), + Command::Find { + keyword: "authentication".to_string() + } + ); + } + + #[test] + fn test_parse_misc() { + assert_eq!(parse_command("pwd"), Command::Pwd); + assert_eq!(parse_command("check"), Command::Check); + assert_eq!(parse_command("done"), Command::Done); + } + + #[test] + fn test_parse_fallback() { + assert_eq!(parse_command(""), Command::Ls); + assert_eq!(parse_command("unknown command"), Command::Ls); + assert_eq!(parse_command("blah blah"), Command::Ls); + } + + #[test] + fn test_parse_with_wrapping() { + assert_eq!(parse_command("`ls`"), Command::Ls); + assert_eq!(parse_command("```ls```"), Command::Ls); + } + + #[test] + fn test_parse_multiline() { + // Should parse the first non-empty line + assert_eq!(parse_command("\n\nls\n\n// listing children"), Command::Ls); + } + + #[test] + fn test_resolve_target_numeric() { + use crate::document::{ChildRoute, DocumentTree}; + + let mut tree = DocumentTree::new("Root", ""); + let root = tree.root(); + let c1 = tree.add_child(root, "Getting Started", "content"); + let c2 = tree.add_child(root, "API Reference", "content"); + + let mut nav_index = NavigationIndex::new(); + nav_index.add_child_routes( + root, + vec![ + ChildRoute { + node_id: c1, + title: "Getting Started".to_string(), + description: "Setup guide".to_string(), + leaf_count: 3, + }, + ChildRoute { + node_id: c2, + title: "API Reference".to_string(), + description: "API docs".to_string(), + leaf_count: 7, + }, + ], + ); + + assert_eq!(resolve_target("1", &nav_index, root), Some(c1)); + assert_eq!(resolve_target("2", &nav_index, root), Some(c2)); + assert_eq!(resolve_target("3", &nav_index, root), None); + } + + #[test] + fn test_resolve_target_exact() { + use crate::document::{ChildRoute, DocumentTree}; + + let mut tree = DocumentTree::new("Root", ""); + let root = tree.root(); + let c1 = tree.add_child(root, "Getting Started", "content"); + + let mut nav_index = NavigationIndex::new(); + nav_index.add_child_routes( + root, + vec![ChildRoute { + node_id: c1, + title: "Getting Started".to_string(), + description: "Setup".to_string(), + leaf_count: 3, + }], + ); + + assert_eq!( + resolve_target("Getting Started", &nav_index, root), + Some(c1) + ); + } + + #[test] + fn test_resolve_target_case_insensitive() { + use crate::document::{ChildRoute, DocumentTree}; + + let mut tree = DocumentTree::new("Root", ""); + let root = tree.root(); + let c1 = tree.add_child(root, "Getting Started", "content"); + + let mut nav_index = NavigationIndex::new(); + nav_index.add_child_routes( + root, + vec![ChildRoute { + node_id: c1, + title: "Getting Started".to_string(), + description: "Setup".to_string(), + leaf_count: 3, + }], + ); + + assert_eq!( + resolve_target("getting started", &nav_index, root), + Some(c1) + ); + assert_eq!( + resolve_target("GETTING STARTED", &nav_index, root), + Some(c1) + ); + } + + #[test] + fn test_resolve_target_contains() { + use crate::document::{ChildRoute, DocumentTree}; + + let mut tree = DocumentTree::new("Root", ""); + let root = tree.root(); + let c1 = tree.add_child(root, "API Reference", "content"); + + let mut nav_index = NavigationIndex::new(); + nav_index.add_child_routes( + root, + vec![ChildRoute { + node_id: c1, + title: "API Reference".to_string(), + description: "API docs".to_string(), + leaf_count: 7, + 
}], + ); + + assert_eq!(resolve_target("api", &nav_index, root), Some(c1)); + assert_eq!(resolve_target("reference", &nav_index, root), Some(c1)); + } + + #[test] + fn test_resolve_target_no_routes() { + let nav_index = NavigationIndex::new(); + let tree = crate::document::DocumentTree::new("Root", ""); + assert!(resolve_target("anything", &nav_index, tree.root()).is_none()); + } + + #[test] + fn test_parse_grep() { + assert_eq!( + parse_command("grep EBITDA"), + Command::Grep { + pattern: "EBITDA".to_string() + } + ); + assert_eq!( + parse_command("grep revenue.*2024"), + Command::Grep { + pattern: "revenue.*2024".to_string() + } + ); + } + + #[test] + fn test_parse_head() { + assert_eq!( + parse_command("head Installation"), + Command::Head { + target: "Installation".to_string(), + lines: 20 + } + ); + assert_eq!( + parse_command("head -n 5 API Reference"), + Command::Head { + target: "API Reference".to_string(), + lines: 5 + } + ); + } + + #[test] + fn test_parse_findtree() { + assert_eq!( + parse_command("findtree revenue"), + Command::FindTree { + pattern: "revenue".to_string() + } + ); + assert_eq!( + parse_command("findtree API Reference"), + Command::FindTree { + pattern: "API Reference".to_string() + } + ); + } + + #[test] + fn test_parse_wc() { + assert_eq!( + parse_command("wc Installation"), + Command::Wc { + target: "Installation".to_string() + } + ); + assert_eq!( + parse_command("wc API Reference"), + Command::Wc { + target: "API Reference".to_string() + } + ); + } +} diff --git a/rust/src/agent/config.rs b/rust/src/agent/config.rs new file mode 100644 index 00000000..b276d94f --- /dev/null +++ b/rust/src/agent/config.rs @@ -0,0 +1,242 @@ +// Copyright (c) 2026 vectorless developers +// SPDX-License-Identifier: Apache-2.0 + +//! Configuration and output types for the retrieval agent. + +use serde::{Deserialize, Serialize}; + +// --------------------------------------------------------------------------- +// Worker configuration +// --------------------------------------------------------------------------- + +/// Worker configuration — navigation budget settings. +#[derive(Debug, Clone)] +pub struct WorkerConfig { + /// Maximum navigation rounds per Worker loop (ls/cd/cat/grep/head/find etc.). + /// `check` does NOT count against this budget. + pub max_rounds: u32, + /// Hard cap on total LLM calls per Worker (planning + nav + check). + /// Prevents runaway costs regardless of max_rounds. 0 = no limit. + pub max_llm_calls: u32, +} + +impl Default for WorkerConfig { + fn default() -> Self { + Self { + max_rounds: 8, + max_llm_calls: 15, + } + } +} + +impl WorkerConfig { + pub fn new() -> Self { + Self::default() + } +} + +// --------------------------------------------------------------------------- +// Answer pipeline configuration +// --------------------------------------------------------------------------- + +/// Answer pipeline configuration — synthesis settings. +#[derive(Debug, Clone)] +pub struct AnswerConfig { + /// Maximum number of evidence items to feed into synthesis. + pub evidence_cap: usize, +} + +impl Default for AnswerConfig { + fn default() -> Self { + Self { evidence_cap: 20 } + } +} + +// --------------------------------------------------------------------------- +// Aggregated agent configuration +// --------------------------------------------------------------------------- + +/// Aggregated configuration for the entire retrieval agent system. 
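+///
+/// A construction sketch using the defaults defined above (8 navigation rounds
+/// and 15 LLM calls per Worker, 20 evidence items fed to synthesis):
+///
+/// ```ignore
+/// let config = AgentConfig {
+///     worker: WorkerConfig { max_rounds: 8, max_llm_calls: 15 },
+///     answer: AnswerConfig { evidence_cap: 20 },
+/// };
+/// // Equivalent to AgentConfig::new() / AgentConfig::default().
+/// ```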
+#[derive(Debug, Clone, Default)]
+pub struct AgentConfig {
+    pub worker: WorkerConfig,
+    pub answer: AnswerConfig,
+}
+
+impl AgentConfig {
+    pub fn new() -> Self {
+        Self::default()
+    }
+}
+
+// ---------------------------------------------------------------------------
+// Output types
+// ---------------------------------------------------------------------------
+
+/// Agent output — the final result of a retrieval operation.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct Output {
+    /// Final synthesized answer.
+    pub answer: String,
+    /// Collected evidence from navigation.
+    pub evidence: Vec<Evidence>,
+    /// Agent execution metrics.
+    pub metrics: Metrics,
+    /// Confidence score (0.0–1.0) — derived from LLM evaluate() result.
+    pub confidence: f32,
+}
+
+impl Output {
+    /// Create an empty output (no evidence found).
+    pub fn empty() -> Self {
+        Self {
+            answer: String::new(),
+            evidence: Vec::new(),
+            metrics: Metrics::default(),
+            confidence: 0.0,
+        }
+    }
+}
+
+/// A single piece of evidence collected during navigation.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct Evidence {
+    /// Navigation path where this evidence was found (e.g., "Root/API Reference/Auth").
+    pub source_path: String,
+    /// Title of the node.
+    pub node_title: String,
+    /// Content of the node.
+    pub content: String,
+    /// Source document name (set by Orchestrator in multi-doc scenarios).
+    pub doc_name: Option<String>,
+}
+
+/// Agent execution metrics.
+#[derive(Debug, Clone, Default, Serialize, Deserialize)]
+pub struct Metrics {
+    pub rounds_used: u32,
+    pub llm_calls: u32,
+    pub nodes_visited: usize,
+    pub budget_exhausted: bool,
+    pub plan_generated: bool,
+    pub check_count: u32,
+    pub evidence_chars: usize,
+}
+
+/// Step result from the navigation loop.
+#[derive(Debug, Clone, PartialEq)]
+pub enum Step {
+    /// Continue to the next round with the given feedback.
+    Continue,
+    /// Navigation is done, proceed to synthesis.
+    Done,
+    /// Forced done due to budget exhaustion or error.
+    ForceDone(String),
+}
+
+// ---------------------------------------------------------------------------
+// Worker output (evidence only, no answer)
+// ---------------------------------------------------------------------------
+
+/// Output from a single Worker — pure evidence, no answer synthesis.
+/// Rerank handles all answer generation.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct WorkerOutput {
+    /// Collected evidence from document navigation.
+    pub evidence: Vec<Evidence>,
+    /// Worker execution metrics.
+    pub metrics: WorkerMetrics,
+    /// Document name this Worker was assigned to.
+    pub doc_name: String,
+}
+
+/// Metrics specific to a single Worker's execution.
+#[derive(Debug, Clone, Default, Serialize, Deserialize)]
+pub struct WorkerMetrics {
+    /// Number of navigation rounds used.
+    pub rounds_used: u32,
+    /// Number of LLM calls made.
+    pub llm_calls: u32,
+    /// Number of distinct nodes visited.
+    pub nodes_visited: usize,
+    /// Whether the LLM call budget was exhausted.
+    pub budget_exhausted: bool,
+    /// Whether a navigation plan was generated.
+    pub plan_generated: bool,
+    /// Number of times `check` was called.
+    pub check_count: u32,
+    /// Total characters of collected evidence.
+    pub evidence_chars: usize,
+}
+
+impl From<WorkerOutput> for Output {
+    fn from(wo: WorkerOutput) -> Self {
+        Output {
+            answer: String::new(),
+            evidence: wo.evidence,
+            metrics: Metrics {
+                rounds_used: wo.metrics.rounds_used,
+                llm_calls: wo.metrics.llm_calls,
+                nodes_visited: wo.metrics.nodes_visited,
+                budget_exhausted: wo.metrics.budget_exhausted,
+                plan_generated: wo.metrics.plan_generated,
+                check_count: wo.metrics.check_count,
+                evidence_chars: wo.metrics.evidence_chars,
+            },
+            confidence: 0.0,
+        }
+    }
+}
+
+// ---------------------------------------------------------------------------
+// Scope types
+// ---------------------------------------------------------------------------
+
+/// Scope context — determines which path the dispatcher takes.
+///
+/// Both variants go through the Orchestrator. The difference is:
+/// - `Specified`: user chose specific documents → skip Orchestrator analysis phase
+/// - `Workspace`: user didn't specify → Orchestrator analyzes DocCards to select docs
+pub enum Scope<'a> {
+    /// User specified one or more documents (by doc_id).
+    /// Orchestrator skips analysis, spawns Workers directly.
+    Specified(Vec<DocContext<'a>>),
+    /// Workspace scope — user didn't specify documents.
+    /// Orchestrator analyzes DocCards and selects relevant ones.
+    Workspace(WorkspaceContext<'a>),
+}
+
+/// Read-only access to a single document's compile artifacts.
+pub struct DocContext<'a> {
+    /// Document content tree.
+    pub tree: &'a crate::document::DocumentTree,
+    /// Navigation index (includes DocCard).
+    pub nav_index: &'a crate::document::NavigationIndex,
+    /// Reasoning index (keyword/topic lookup).
+    pub reasoning_index: &'a crate::document::ReasoningIndex,
+    /// Document name (for evidence source attribution).
+    pub doc_name: &'a str,
+}
+
+/// Read-only access to multiple documents' compile artifacts.
+pub struct WorkspaceContext<'a> {
+    /// All available documents.
+    pub docs: Vec<DocContext<'a>>,
+}
+
+impl<'a> WorkspaceContext<'a> {
+    /// Create a workspace from a collection of DocContexts.
+    pub fn new(docs: Vec<DocContext<'a>>) -> Self {
+        Self { docs }
+    }
+
+    /// Number of documents in the workspace.
+    pub fn doc_count(&self) -> usize {
+        self.docs.len()
+    }
+
+    /// Whether the workspace has only one document.
+    pub fn is_single(&self) -> bool {
+        self.docs.len() == 1
+    }
+}
diff --git a/rust/src/agent/context.rs b/rust/src/agent/context.rs
new file mode 100644
index 00000000..c4e542bf
--- /dev/null
+++ b/rust/src/agent/context.rs
@@ -0,0 +1,120 @@
+// Copyright (c) 2026 vectorless developers
+// SPDX-License-Identifier: Apache-2.0
+
+//! Read-only data access wrappers over compile artifacts.
+//!
+//! These types provide the agent with structured access to the document's
+//! navigation index, content tree, and reasoning index — all read-only.
+
+use crate::document::{ChildRoute, NodeId, TopicEntry};
+
+// Re-export from config for convenience
+pub use super::config::{DocContext, WorkspaceContext};
+
+/// A single hit from a keyword search.
+#[derive(Debug, Clone)]
+pub struct FindHit {
+    /// The matched keyword.
+    pub keyword: String,
+    /// Topic entries matching the keyword.
+    pub entries: Vec<TopicEntry>,
+}
+
+impl<'a> DocContext<'a> {
+    /// List child routes for a given node.
+    pub fn ls(&self, node: NodeId) -> Option<&[ChildRoute]> {
+        self.nav_index.get_child_routes(node)
+    }
+
+    /// Read the full content of a node.
+    pub fn cat(&self, node: NodeId) -> Option<&str> {
+        self.tree.get(node).map(|n| n.content.as_str())
+    }
+
+    /// Get the title of a node.
+    pub fn node_title(&self, node: NodeId) -> Option<&str> {
+        self.tree.get(node).map(|n| n.title.as_str())
+    }
+
+    /// Search for a keyword in the reasoning index.
+    pub fn find(&self, keyword: &str) -> Option<FindHit> {
+        self.reasoning_index
+            .topic_entries(keyword)
+            .map(|entries| FindHit {
+                keyword: keyword.to_string(),
+                entries: entries.to_vec(),
+            })
+    }
+
+    /// Search for multiple keywords, collecting all hits.
+    pub fn find_all(&self, keywords: &[String]) -> Vec<FindHit> {
+        keywords.iter().filter_map(|kw| self.find(kw)).collect()
+    }
+
+    /// Get the root node ID.
+    pub fn root(&self) -> NodeId {
+        self.tree.root()
+    }
+
+    /// Get the document's DocCard, if available.
+    pub fn doc_card(&self) -> Option<&crate::document::DocCard> {
+        self.nav_index.doc_card()
+    }
+
+    /// Get the navigation entry for a node (overview, hints, tags).
+    pub fn nav_entry(&self, node: NodeId) -> Option<&crate::document::NavEntry> {
+        self.nav_index.get_entry(node)
+    }
+
+    /// Get the summary shortcut (pre-computed overview), if available.
+    pub fn summary_shortcut(&self) -> Option<&crate::document::SummaryShortcut> {
+        self.reasoning_index.summary_shortcut()
+    }
+
+    /// Find a top-level section by its title, returning its NodeId.
+    pub fn find_section(&self, title: &str) -> Option<NodeId> {
+        self.reasoning_index.find_section(title)
+    }
+
+    /// Get the parent of a node (by searching the tree).
+    pub fn parent(&self, node: NodeId) -> Option<NodeId> {
+        self.tree.parent(node)
+    }
+}
+
+impl<'a> WorkspaceContext<'a> {
+    /// Search for a keyword across all documents.
+    pub fn find_cross(&self, keyword: &str) -> Vec<(usize, FindHit)> {
+        self.docs
+            .iter()
+            .enumerate()
+            .filter_map(|(idx, doc)| doc.find(keyword).map(|hit| (idx, hit)))
+            .collect()
+    }
+
+    /// Search multiple keywords across all documents.
+    pub fn find_cross_all(&self, keywords: &[String]) -> Vec<(usize, Vec<FindHit>)> {
+        let mut results: Vec<(usize, Vec<FindHit>)> = Vec::new();
+        for (idx, doc) in self.docs.iter().enumerate() {
+            let hits = doc.find_all(keywords);
+            if !hits.is_empty() {
+                results.push((idx, hits));
+            }
+        }
+        results
+    }
+
+    /// Get all DocCards for documents that have them.
+    pub fn doc_cards(&self) -> Vec<(usize, &crate::document::DocCard)> {
+        self.docs
+            .iter()
+            .enumerate()
+            .filter_map(|(idx, doc)| doc.doc_card().map(|card| (idx, card)))
+            .collect()
+    }
+
+    /// Get a specific document context by index.
+    pub fn doc(&self, idx: usize) -> Option<&DocContext<'a>> {
+        self.docs.get(idx)
+    }
+}
diff --git a/rust/src/agent/events.rs b/rust/src/agent/events.rs
new file mode 100644
index 00000000..e4575c93
--- /dev/null
+++ b/rust/src/agent/events.rs
@@ -0,0 +1,537 @@
+// Copyright (c) 2026 vectorless developers
+// SPDX-License-Identifier: Apache-2.0
+
+//! Agent events — rich, structured visibility into the entire retrieval pipeline.
+//!
+//! Events are organized by pipeline stage:
+//! 1. **Query Understanding** — intent analysis, keyword extraction
+//! 2. **Orchestrator** — document selection, dispatch, evaluation, replan
+//! 3. **Worker** — navigation, evidence collection, budget management
+//! 4. **Answer** — synthesis and fusion
+//!
+//! The stream terminates with `Completed` or `Error`.
+
+use serde::Serialize;
+
+/// An event emitted during agent-based retrieval.
+///
+/// Each variant carries the data a client needs to understand what happened,
+/// not just that something happened. All events are `Clone + Serialize` so
+/// they can be broadcast or persisted.
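+///
+/// A consumption sketch (assumes a receiver obtained from this module's
+/// [`channel`]; the match arms shown are illustrative, not exhaustive):
+///
+/// ```ignore
+/// while let Some(event) = rx.recv().await {
+///     match event {
+///         AgentEvent::WorkerRound { doc_name, round, command, .. } => {
+///             println!("[{doc_name}] round {round}: {command}");
+///         }
+///         // Terminal events end the stream.
+///         AgentEvent::Completed { .. } | AgentEvent::Error { .. } => break,
+///         _ => {}
+///     }
+/// }
+/// ```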
+#[derive(Debug, Clone, Serialize)]
+pub enum AgentEvent {
+    // ── Query Understanding ──────────────────────────────────────────
+    /// Query understanding started.
+    QueryUnderstandingStarted { query: String },
+
+    /// Query understanding completed (intent, keywords, strategy decided).
+    QueryUnderstandingCompleted {
+        query: String,
+        intent: String,
+        keywords: Vec<String>,
+        strategy_hint: String,
+        complexity: String,
+    },
+
+    // ── Orchestrator ─────────────────────────────────────────────────
+    /// Orchestrator started.
+    OrchestratorStarted {
+        query: String,
+        doc_count: usize,
+        skip_analysis: bool,
+    },
+
+    /// Orchestrator is analyzing documents to select which to dispatch.
+    OrchestratorAnalyzing {
+        doc_count: usize,
+        keywords: Vec<String>,
+    },
+
+    /// A Worker was dispatched to a document.
+    WorkerDispatched {
+        doc_idx: usize,
+        doc_name: String,
+        task: String,
+        focus_keywords: Vec<String>,
+    },
+
+    /// A Worker finished its task.
+    WorkerCompleted {
+        doc_idx: usize,
+        doc_name: String,
+        evidence_count: usize,
+        rounds_used: u32,
+        llm_calls: u32,
+        success: bool,
+    },
+
+    /// Cross-doc sufficiency evaluation result.
+    OrchestratorEvaluated {
+        sufficient: bool,
+        evidence_count: usize,
+        missing_info: Option<String>,
+    },
+
+    /// Orchestrator is replanning after insufficient evidence.
+    OrchestratorReplanning {
+        reason: String,
+        evidence_count: usize,
+    },
+
+    /// Orchestrator completed.
+    OrchestratorCompleted {
+        evidence_count: usize,
+        total_llm_calls: u32,
+        dispatch_rounds: u32,
+    },
+
+    // ── Worker (per-document navigation) ─────────────────────────────
+    /// Worker started on a document.
+    WorkerStarted {
+        doc_name: String,
+        task: Option<String>,
+        max_rounds: u32,
+    },
+
+    /// Worker generated a navigation plan.
+    WorkerPlanGenerated { doc_name: String, plan_len: usize },
+
+    /// A navigation round completed.
+    WorkerRound {
+        doc_name: String,
+        round: u32,
+        command: String,
+        success: bool,
+        elapsed_ms: u64,
+    },
+
+    /// Evidence was collected from a node.
+    EvidenceCollected {
+        doc_name: String,
+        node_title: String,
+        source_path: String,
+        content_len: usize,
+        total_evidence: usize,
+    },
+
+    /// Worker sufficiency check result.
+    WorkerSufficiencyCheck {
+        doc_name: String,
+        sufficient: bool,
+        evidence_count: usize,
+        missing_info: Option<String>,
+    },
+
+    /// Worker re-planned after insufficient check.
+    WorkerReplan {
+        doc_name: String,
+        missing_info: String,
+        plan_len: usize,
+    },
+
+    /// Worker budget warning (stuck or half-budget).
+    WorkerBudgetWarning {
+        doc_name: String,
+        warning_type: String,
+        round: u32,
+    },
+
+    /// Worker completed.
+    WorkerDone {
+        doc_name: String,
+        evidence_count: usize,
+        rounds_used: u32,
+        llm_calls: u32,
+        budget_exhausted: bool,
+        plan_generated: bool,
+    },
+
+    // ── Answer Pipeline ──────────────────────────────────────────────
+    /// Answer synthesis started.
+    AnswerStarted {
+        evidence_count: usize,
+        multi_doc: bool,
+    },
+
+    /// Answer synthesis completed.
+    AnswerCompleted {
+        answer_len: usize,
+        confidence: String,
+    },
+
+    // ── Terminal ─────────────────────────────────────────────────────
+    /// Entire retrieval pipeline completed.
+    Completed {
+        evidence_count: usize,
+        llm_calls: u32,
+        answer_len: usize,
+    },
+
+    /// An error occurred.
+    Error { stage: String, message: String },
+}
+
+// ---------------------------------------------------------------------------
+// Channel + EventEmitter
+// ---------------------------------------------------------------------------
+
+/// Sender for agent events.
+pub(crate) type AgentEventSender = tokio::sync::mpsc::Sender<AgentEvent>;
+
+/// Receiver for agent events.
+pub type AgentEventReceiver = tokio::sync::mpsc::Receiver<AgentEvent>;
+
+/// Create a bounded channel for agent events.
+pub(crate) fn channel(bound: usize) -> (AgentEventSender, AgentEventReceiver) {
+    tokio::sync::mpsc::channel(bound)
+}
+
+/// Default channel bound for agent events.
+pub const DEFAULT_AGENT_EVENT_BOUND: usize = 256;
+
+/// A handle for emitting agent events.
+///
+/// Wraps an `mpsc::Sender` and silently drops events if the channel is full
+/// or the receiver is closed (no panic on send failure). Cheaply clonable.
+#[derive(Clone)]
+pub struct EventEmitter {
+    tx: Option<AgentEventSender>,
+}
+
+impl EventEmitter {
+    /// Create a new emitter with the given sender.
+    pub fn new(tx: AgentEventSender) -> Self {
+        Self { tx: Some(tx) }
+    }
+
+    /// Create a noop emitter that discards all events.
+    pub fn noop() -> Self {
+        Self { tx: None }
+    }
+
+    /// Emit an event. Silently dropped if the channel is full or the receiver is closed.
+    pub fn emit(&self, event: AgentEvent) {
+        if let Some(ref tx) = self.tx {
+            let _ = tx.try_send(event);
+        }
+    }
+
+    // ── Query Understanding ──
+
+    pub fn emit_query_understanding_started(&self, query: &str) {
+        self.emit(AgentEvent::QueryUnderstandingStarted {
+            query: query.to_string(),
+        });
+    }
+
+    pub fn emit_query_understanding_completed(
+        &self,
+        query: &str,
+        intent: &str,
+        keywords: &[String],
+        strategy_hint: &str,
+        complexity: &str,
+    ) {
+        self.emit(AgentEvent::QueryUnderstandingCompleted {
+            query: query.to_string(),
+            intent: intent.to_string(),
+            keywords: keywords.to_vec(),
+            strategy_hint: strategy_hint.to_string(),
+            complexity: complexity.to_string(),
+        });
+    }
+
+    // ── Orchestrator ──
+
+    pub fn emit_orchestrator_started(&self, query: &str, doc_count: usize, skip_analysis: bool) {
+        self.emit(AgentEvent::OrchestratorStarted {
+            query: query.to_string(),
+            doc_count,
+            skip_analysis,
+        });
+    }
+
+    pub fn emit_orchestrator_analyzing(&self, doc_count: usize, keywords: &[String]) {
+        self.emit(AgentEvent::OrchestratorAnalyzing {
+            doc_count,
+            keywords: keywords.to_vec(),
+        });
+    }
+
+    pub fn emit_worker_dispatched(
+        &self,
+        doc_idx: usize,
+        doc_name: &str,
+        task: &str,
+        focus_keywords: &[String],
+    ) {
+        self.emit(AgentEvent::WorkerDispatched {
+            doc_idx,
+            doc_name: doc_name.to_string(),
+            task: task.to_string(),
+            focus_keywords: focus_keywords.to_vec(),
+        });
+    }
+
+    pub fn emit_worker_completed(
+        &self,
+        doc_idx: usize,
+        doc_name: &str,
+        evidence_count: usize,
+        rounds_used: u32,
+        llm_calls: u32,
+        success: bool,
+    ) {
+        self.emit(AgentEvent::WorkerCompleted {
+            doc_idx,
+            doc_name: doc_name.to_string(),
+            evidence_count,
+            rounds_used,
+            llm_calls,
+            success,
+        });
+    }
+
+    pub fn emit_orchestrator_evaluated(
+        &self,
+        sufficient: bool,
+        evidence_count: usize,
+        missing_info: Option<&str>,
+    ) {
+        self.emit(AgentEvent::OrchestratorEvaluated {
+            sufficient,
+            evidence_count,
+            missing_info: missing_info.map(|s| s.to_string()),
+        });
+    }
+
+    pub fn emit_orchestrator_replanning(&self, reason: &str, evidence_count: usize) {
+        self.emit(AgentEvent::OrchestratorReplanning {
+            reason: reason.to_string(),
+            evidence_count,
+        });
+    }
+
+    pub fn emit_orchestrator_completed(
+        &self,
+        evidence_count: usize,
+        total_llm_calls: u32,
+        dispatch_rounds: u32,
+    ) {
+        self.emit(AgentEvent::OrchestratorCompleted {
+            evidence_count,
+            total_llm_calls,
+            dispatch_rounds,
+        });
+    }
+
+    // ── Worker ──
+
+    pub fn emit_worker_started(&self, doc_name: &str, task: Option<&str>, max_rounds: u32) {
+        
self.emit(AgentEvent::WorkerStarted { + doc_name: doc_name.to_string(), + task: task.map(|s| s.to_string()), + max_rounds, + }); + } + + pub fn emit_worker_plan_generated(&self, doc_name: &str, plan_len: usize) { + self.emit(AgentEvent::WorkerPlanGenerated { + doc_name: doc_name.to_string(), + plan_len, + }); + } + + pub fn emit_worker_round( + &self, + doc_name: &str, + round: u32, + command: &str, + success: bool, + elapsed_ms: u64, + ) { + self.emit(AgentEvent::WorkerRound { + doc_name: doc_name.to_string(), + round, + command: command.to_string(), + success, + elapsed_ms, + }); + } + + pub fn emit_evidence( + &self, + doc_name: &str, + node_title: &str, + source_path: &str, + content_len: usize, + total: usize, + ) { + self.emit(AgentEvent::EvidenceCollected { + doc_name: doc_name.to_string(), + node_title: node_title.to_string(), + source_path: source_path.to_string(), + content_len, + total_evidence: total, + }); + } + + pub fn emit_worker_sufficiency_check( + &self, + doc_name: &str, + sufficient: bool, + evidence_count: usize, + missing_info: Option<&str>, + ) { + self.emit(AgentEvent::WorkerSufficiencyCheck { + doc_name: doc_name.to_string(), + sufficient, + evidence_count, + missing_info: missing_info.map(|s| s.to_string()), + }); + } + + pub fn emit_worker_replan(&self, doc_name: &str, missing_info: &str, plan_len: usize) { + self.emit(AgentEvent::WorkerReplan { + doc_name: doc_name.to_string(), + missing_info: missing_info.to_string(), + plan_len, + }); + } + + pub fn emit_worker_budget_warning(&self, doc_name: &str, warning_type: &str, round: u32) { + self.emit(AgentEvent::WorkerBudgetWarning { + doc_name: doc_name.to_string(), + warning_type: warning_type.to_string(), + round, + }); + } + + pub fn emit_worker_done( + &self, + doc_name: &str, + evidence_count: usize, + rounds_used: u32, + llm_calls: u32, + budget_exhausted: bool, + plan_generated: bool, + ) { + self.emit(AgentEvent::WorkerDone { + doc_name: doc_name.to_string(), + evidence_count, + rounds_used, + llm_calls, + budget_exhausted, + plan_generated, + }); + } + + // ── Answer ── + + pub fn emit_answer_started(&self, evidence_count: usize, multi_doc: bool) { + self.emit(AgentEvent::AnswerStarted { + evidence_count, + multi_doc, + }); + } + + pub fn emit_answer_completed(&self, answer_len: usize, confidence: &str) { + self.emit(AgentEvent::AnswerCompleted { + answer_len, + confidence: confidence.to_string(), + }); + } + + // ── Terminal ── + + pub fn emit_completed(&self, evidence_count: usize, llm_calls: u32, answer_len: usize) { + self.emit(AgentEvent::Completed { + evidence_count, + llm_calls, + answer_len, + }); + } + + pub fn emit_error(&self, stage: &str, message: &str) { + self.emit(AgentEvent::Error { + stage: stage.to_string(), + message: message.to_string(), + }); + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn test_noop_emitter() { + let emitter = EventEmitter::noop(); + emitter.emit_orchestrator_started("test", 1, false); + emitter.emit_worker_started("doc.md", None, 8); + emitter.emit_worker_round("doc.md", 1, "ls", true, 50); + emitter.emit_worker_done("doc.md", 0, 1, 1, false, false); + emitter.emit_completed(0, 1, 0); + // No panic — events silently dropped + } + + #[test] + fn test_event_roundtrip() { + let (tx, mut rx) = channel(DEFAULT_AGENT_EVENT_BOUND); + let emitter = EventEmitter::new(tx); + + emitter.emit_orchestrator_started("what is X?", 1, true); + emitter.emit_worker_started("doc.md", None, 8); + emitter.emit_evidence("doc.md", "Intro", "root/Intro", 100, 1); + 
+        emitter.emit_worker_sufficiency_check("doc.md", true, 1, None);
+        emitter.emit_worker_done("doc.md", 1, 3, 5, false, true);
+        emitter.emit_completed(1, 6, 42);
+
+        let events: Vec<AgentEvent> = (0..6).map(|_| rx.blocking_recv().unwrap()).collect();
+
+        assert!(
+            matches!(&events[0], AgentEvent::OrchestratorStarted { query, .. } if query == "what is X?")
+        );
+        assert!(
+            matches!(&events[1], AgentEvent::WorkerStarted { doc_name, .. } if doc_name == "doc.md")
+        );
+        assert!(
+            matches!(&events[2], AgentEvent::EvidenceCollected { node_title, .. } if node_title == "Intro")
+        );
+        assert!(matches!(
+            &events[3],
+            AgentEvent::WorkerSufficiencyCheck {
+                sufficient: true,
+                ..
+            }
+        ));
+        assert!(matches!(
+            &events[4],
+            AgentEvent::WorkerDone {
+                evidence_count: 1,
+                plan_generated: true,
+                ..
+            }
+        ));
+        assert!(matches!(
+            &events[5],
+            AgentEvent::Completed {
+                evidence_count: 1,
+                answer_len: 42,
+                ..
+            }
+        ));
+    }
+
+    #[test]
+    fn test_serialization() {
+        let event = AgentEvent::OrchestratorStarted {
+            query: "test".to_string(),
+            doc_count: 3,
+            skip_analysis: false,
+        };
+        let json = serde_json::to_string(&event).unwrap();
+        assert!(json.contains("OrchestratorStarted"));
+        assert!(json.contains("test"));
+    }
+}
diff --git a/rust/src/agent/mod.rs b/rust/src/agent/mod.rs
new file mode 100644
index 00000000..f471258a
--- /dev/null
+++ b/rust/src/agent/mod.rs
@@ -0,0 +1,55 @@
+// Copyright (c) 2026 vectorless developers
+// SPDX-License-Identifier: Apache-2.0
+
+//! Retrieval agent — struct-based document intelligence.
+//!
+//! # Architecture
+//!
+//! The retrieval dispatcher always goes through the Orchestrator.
+//! Based on [`Scope`]:
+//!
+//! - **User specified doc_ids** → Orchestrator skips analysis, spawns Workers directly.
+//! - **Workspace / unspecified** → Orchestrator analyzes DocCards, selects docs, spawns Workers.
+//!
+//! Both paths produce the same [`Output`] type and share the same synthesis logic.
+//!
+//! ```text
+//! dispatch(query, scope)
+//!   └── Orchestrator (always)
+//!       ├── Scope::Specified(docs) → skip analysis → N × Worker → synthesis
+//!       └── Scope::Workspace(ws)   → analysis → N × Worker → fusion → synthesis
+//! ```
+//!
+//! # Agent trait
+//!
+//! All retrieval agents implement [`Agent`] with `async fn run(self)` (Edition 2024).
+//! The trait uses native async functions — no `async-trait` crate needed.
+
+pub mod command;
+pub mod config;
+pub mod context;
+pub mod events;
+pub mod state;
+pub mod tools;
+
+pub mod orchestrator;
+pub mod prompts;
+pub mod worker;
+
+pub use config::{DocContext, Evidence, Output, Scope, WorkspaceContext};
+pub use events::{AgentEvent, EventEmitter};
+
+/// Agent trait — async, consuming-self execution.
+///
+/// Each agent struct holds its own configuration and context.
+/// Calling `run(self)` consumes the agent and produces output.
+///
+/// Uses Edition 2024 native `async fn` in trait — no `async-trait` crate.
+pub trait Agent {
+    /// The output type produced by this agent.
+    type Output;
+    /// Agent name for logging and events.
+    fn name(&self) -> &str;
+    /// Execute the agent, consuming self.
+    async fn run(self) -> crate::error::Result<Self::Output>;
+}
diff --git a/rust/src/agent/orchestrator/analyze.rs b/rust/src/agent/orchestrator/analyze.rs
new file mode 100644
index 00000000..47dd58f1
--- /dev/null
+++ b/rust/src/agent/orchestrator/analyze.rs
@@ -0,0 +1,159 @@
+// Copyright (c) 2026 vectorless developers
+// SPDX-License-Identifier: Apache-2.0
+
+//! Phase 1: Analyze documents and produce a dispatch plan.
+//!
+//! Uses the [`QueryPlan`] from query understanding to inform document selection.
+//! LLM errors propagate — no silent degradation.
+
+use tracing::{debug, info};
+
+use crate::error::Error;
+use crate::llm::LlmClient;
+use crate::query::QueryPlan;
+use crate::scoring::bm25::extract_keywords;
+
+use super::super::config::WorkspaceContext;
+use super::super::prompts::{DispatchEntry, orchestrator_analysis, parse_dispatch_plan};
+use super::super::state::OrchestratorState;
+use super::super::tools::orchestrator as orch_tools;
+
+/// Outcome of the analyze phase.
+pub enum AnalyzeOutcome {
+    /// Produce dispatch entries for Phase 2.
+    Proceed {
+        dispatches: Vec<DispatchEntry>,
+        llm_calls: u32,
+    },
+    /// Cross-doc search already answered the query.
+    AlreadyAnswered { llm_calls: u32 },
+    /// No relevant documents found.
+    NoResults { llm_calls: u32 },
+}
+
+/// Analyze documents and produce a dispatch plan.
+///
+/// Uses the [`QueryPlan`] for intent-aware analysis:
+/// - Intent and key concepts inform the LLM about what to look for
+/// - Complexity hints at how many documents may be needed
+/// - Strategy hint guides the analysis approach
+///
+/// LLM failures propagate as [`Error::LlmReasoning`] — no fallback.
+pub async fn analyze(
+    query: &str,
+    ws: &WorkspaceContext<'_>,
+    state: &mut OrchestratorState,
+    emitter: &crate::agent::EventEmitter,
+    skip_analysis: bool,
+    query_plan: &QueryPlan,
+    llm: &LlmClient,
+) -> crate::error::Result<AnalyzeOutcome> {
+    if skip_analysis {
+        debug!("Phase 1: skipping (user-specified documents)");
+        let dispatches = (0..ws.doc_count())
+            .map(|idx| DispatchEntry {
+                doc_idx: idx,
+                reason: "User-specified document".to_string(),
+                task: query.to_string(),
+            })
+            .collect();
+        return Ok(AnalyzeOutcome::Proceed {
+            dispatches,
+            llm_calls: 0,
+        });
+    }
+
+    debug!(
+        intent = %query_plan.intent,
+        complexity = %query_plan.complexity,
+        strategy = query_plan.strategy_hint,
+        "Phase 1: analyzing doc cards with query understanding"
+    );
+
+    let doc_cards_text = orch_tools::ls_docs(ws).feedback;
+    let keywords = extract_keywords(query);
+    let find_text = if keywords.is_empty() {
+        "(no keywords extracted)".to_string()
+    } else {
+        orch_tools::find_cross(&keywords, ws).feedback
+    };
+
+    info!(keywords = ?keywords, "Phase 1: analyzing");
+    debug!(
+        doc_cards_len = doc_cards_text.len(),
+        find_results_len = find_text.len(),
+        "Phase 1: analysis input"
+    );
+
+    // Build analysis prompt enriched with query understanding
+    let concepts_text = if query_plan.key_concepts.is_empty() {
+        String::new()
+    } else {
+        format!("\nKey concepts: {}", query_plan.key_concepts.join(", "))
+    };
+
+    let strategy_text = if query_plan.strategy_hint.is_empty() {
+        String::new()
+    } else {
+        format!("\nRetrieval strategy: {}", query_plan.strategy_hint)
+    };
+
+    let rewritten_text = if query_plan.rewritten.is_empty() {
+        String::new()
+    } else {
+        format!(
+            "\nRewritten queries for matching: {}",
+            query_plan.rewritten.join("; ")
+        )
+    };
+
+    let intent_context = format!(
+        "\nQuery intent: {} (complexity: {}){concepts_text}{strategy_text}{rewritten_text}",
+        query_plan.intent, query_plan.complexity,
+    );
+
+    let (system, user) =
+        orchestrator_analysis(&super::super::prompts::OrchestratorAnalysisParams {
+            query,
+            doc_cards: &doc_cards_text,
+            find_results: &find_text,
+            intent_context: &intent_context,
+        });
+
+    let analysis_output = llm.complete(&system, &user).await.map_err(|e| {
+        emitter.emit_error("orchestrator/analysis", &e.to_string());
+        Error::LlmReasoning {
+            stage: "orchestrator/analysis".to_string(),
+            detail: format!("LLM call failed: {e}"),
+        }
+    })?;
+
+    info!(
+        response_len = analysis_output.len(),
+        response = %if analysis_output.len() > 500 { &analysis_output[..500] } else { &analysis_output },
+        "Phase 1: analysis LLM response"
+    );
+
+    let dispatches = match parse_dispatch_plan(&analysis_output, ws.doc_count()) {
+        Some(entries) => entries,
+        None => {
+            info!("Orchestrator: analysis indicates already answered");
+            return Ok(AnalyzeOutcome::AlreadyAnswered { llm_calls: 1 });
+        }
+    };
+
+    info!(
+        dispatches = dispatches.len(),
+        "Phase 1: parsed dispatch plan"
+    );
+
+    if dispatches.is_empty() {
+        return Ok(AnalyzeOutcome::NoResults { llm_calls: 1 });
+    }
+
+    state.analyze_done = true;
+    Ok(AnalyzeOutcome::Proceed {
+        dispatches,
+        llm_calls: 1,
+    })
+}
diff --git a/rust/src/agent/orchestrator/dispatch.rs b/rust/src/agent/orchestrator/dispatch.rs
new file mode 100644
index 00000000..f599ac1d
--- /dev/null
+++ b/rust/src/agent/orchestrator/dispatch.rs
@@ -0,0 +1,92 @@
+// Copyright (c) 2026 vectorless developers
+// SPDX-License-Identifier: Apache-2.0
+
+//! Phase 2: Dispatch Workers and collect results.
+
+use tracing::{info, warn};
+
+use crate::llm::LlmClient;
+
+use super::super::Agent;
+use super::super::config::{AgentConfig, WorkspaceContext};
+use super::super::events::EventEmitter;
+use super::super::prompts::DispatchEntry;
+use super::super::state::OrchestratorState;
+use super::super::worker::Worker;
+use crate::query::QueryPlan;
+
+/// Dispatch Workers in parallel and collect results.
+pub async fn dispatch_and_collect(
+    query: &str,
+    dispatches: &[DispatchEntry],
+    ws: &WorkspaceContext<'_>,
+    config: &AgentConfig,
+    llm: &LlmClient,
+    state: &mut OrchestratorState,
+    emitter: &EventEmitter,
+    query_plan: &QueryPlan,
+) {
+    let futures: Vec<_> = dispatches
+        .iter()
+        .filter_map(|dispatch| {
+            let doc = match ws.doc(dispatch.doc_idx) {
+                Some(d) => d,
+                None => {
+                    warn!(doc_idx = dispatch.doc_idx, "Document not found, skipping");
+                    return None;
+                }
+            };
+
+            let query = query.to_string();
+            let task = dispatch.task.clone();
+            let worker_config = config.worker.clone();
+            let doc_idx = dispatch.doc_idx;
+            let doc_name = doc.doc_name.to_string();
+            let llm = llm.clone();
+            let sub_emitter = EventEmitter::noop();
+            let worker_plan = query_plan.clone();
+
+            Some(async move {
+                emitter.emit_worker_dispatched(doc_idx, &doc_name, &task, &[]);
+                let worker = Worker::new(
+                    &query,
+                    Some(&task),
+                    doc,
+                    worker_config,
+                    llm,
+                    sub_emitter,
+                    worker_plan,
+                );
+                let result = worker.run().await;
+                (doc_idx, doc_name, result)
+            })
+        })
+        .collect();
+
+    let results: Vec<_> = futures::future::join_all(futures).await;
+
+    for (doc_idx, doc_name, result) in results {
+        match result {
+            Ok(output) => {
+                info!(
+                    doc_idx,
+                    evidence = output.evidence.len(),
+                    "Worker completed"
+                );
+                emitter.emit_worker_completed(
+                    doc_idx,
+                    &doc_name,
+                    output.evidence.len(),
+                    output.metrics.rounds_used,
+                    output.metrics.llm_calls,
+                    true,
+                );
+                state.collect_result(doc_idx, output);
+            }
+            Err(e) => {
+                warn!(doc_idx, error = %e, "Worker failed");
+                emitter.emit_worker_completed(doc_idx, &doc_name, 0, 0, 0, false);
+            }
+        }
+    }
+}
diff --git a/rust/src/agent/orchestrator/evaluate.rs b/rust/src/agent/orchestrator/evaluate.rs
new file mode 100644
index 00000000..27c8aab6
--- /dev/null
+++ b/rust/src/agent/orchestrator/evaluate.rs
@@ -0,0 +1,128 @@
+// Copyright (c) 2026 vectorless developers
+// SPDX-License-Identifier: Apache-2.0
+
+//! Evaluate cross-document evidence sufficiency via LLM.
+//!
+//! Replaces the old `integrate` module's heuristic sufficiency check.
+//! LLM errors propagate — no silent "assume sufficient" fallback.
+
+use tracing::info;
+
+use crate::error::Error;
+use crate::llm::LlmClient;
+
+use super::super::config::Evidence;
+use super::super::prompts::{check_sufficiency, parse_sufficiency_response};
+
+/// Result of the evidence sufficiency evaluation.
+pub struct EvalResult {
+    /// Whether the collected evidence is sufficient to answer the query.
+    pub sufficient: bool,
+    /// Description of what information is still missing (empty if sufficient).
+    pub missing_info: String,
+}
+
+/// Evaluate cross-document evidence sufficiency via LLM.
+///
+/// Propagates LLM errors as [`Error::LlmReasoning`].
+/// The caller decides how to handle insufficiency (replan, abort, etc.).
+pub async fn evaluate(
+    query: &str,
+    evidence: &[Evidence],
+    llm: &LlmClient,
+) -> crate::error::Result<EvalResult> {
+    let evidence_summary = format_evidence_summary(evidence);
+    let (system, user) = check_sufficiency(query, &evidence_summary);
+
+    let response = llm
+        .complete(&system, &user)
+        .await
+        .map_err(|e| Error::LlmReasoning {
+            stage: "orchestrator/evaluate".to_string(),
+            detail: format!("Sufficiency check LLM call failed: {e}"),
+        })?;
+
+    let sufficient = parse_sufficiency_response(&response);
+    let missing_info = if sufficient {
+        String::new()
+    } else {
+        // Extract the reason from the response (everything after SUFFICIENT/INSUFFICIENT)
+        let reason = response
+            .trim()
+            .strip_prefix("INSUFFICIENT")
+            .or_else(|| response.trim().strip_prefix("Insufficient"))
+            .unwrap_or("")
+            .trim_start_matches(|c: char| c == '-' || c == ' ' || c == ':');
+        if reason.is_empty() {
+            "Evidence does not fully address the query.".to_string()
+        } else {
+            reason.to_string()
+        }
+    };
+
+    info!(
+        sufficient,
+        evidence = evidence.len(),
+        missing_info_len = missing_info.len(),
+        "Cross-doc sufficiency evaluation"
+    );
+
+    Ok(EvalResult {
+        sufficient,
+        missing_info,
+    })
+}
+
+/// Format evidence summary for sufficiency check.
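+///
+/// Output shape (illustrative — the titles and sizes here are made up):
+///
+/// ```text
+/// - [Revenue] (from doc1) 850 chars
+/// - [Net Sales] (from doc2) 420 chars
+/// ```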
+pub fn format_evidence_summary(evidence: &[Evidence]) -> String {
+    if evidence.is_empty() {
+        return "(no evidence)".to_string();
+    }
+    evidence
+        .iter()
+        .map(|e| {
+            let doc = e.doc_name.as_deref().unwrap_or("unknown");
+            format!(
+                "- [{}] (from {}) {} chars",
+                e.node_title,
+                doc,
+                e.content.len()
+            )
+        })
+        .collect::<Vec<_>>()
+        .join("\n")
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn test_format_evidence_summary() {
+        let evidence = vec![
+            Evidence {
+                source_path: "root/A".to_string(),
+                node_title: "A".to_string(),
+                content: "content".to_string(),
+                doc_name: Some("doc1".to_string()),
+            },
+            Evidence {
+                source_path: "root/B".to_string(),
+                node_title: "B".to_string(),
+                content: "more content".to_string(),
+                doc_name: Some("doc2".to_string()),
+            },
+        ];
+        let summary = format_evidence_summary(&evidence);
+        assert!(summary.contains("[A]"));
+        assert!(summary.contains("doc1"));
+        assert!(summary.contains("[B]"));
+        assert!(summary.contains("doc2"));
+    }
+
+    #[test]
+    fn test_format_evidence_summary_empty() {
+        let summary = format_evidence_summary(&[]);
+        assert!(summary.contains("no evidence"));
+    }
+}
diff --git a/rust/src/agent/orchestrator/mod.rs b/rust/src/agent/orchestrator/mod.rs
new file mode 100644
index 00000000..9cf96153
--- /dev/null
+++ b/rust/src/agent/orchestrator/mod.rs
@@ -0,0 +1,297 @@
+// Copyright (c) 2026 vectorless developers
+// SPDX-License-Identifier: Apache-2.0
+
+//! Orchestrator agent — supervisor loop for multi-document retrieval.
+//!
+//! The Orchestrator is a consuming-self struct implementing [`Agent`]:
+//! 1. Analyze: LLM selects documents + tasks (informed by QueryPlan)
+//! 2. Supervisor loop: dispatch → evaluate → replan if insufficient
+//! 3. Rerank: dedup → BM25 scoring → synthesis/fusion
+
+mod analyze;
+mod dispatch;
+mod evaluate;
+mod replan;
+
+use tracing::info;
+
+use crate::llm::LlmClient;
+use crate::query::QueryPlan;
+
+use super::Agent;
+use super::config::{AgentConfig, Output, WorkspaceContext};
+use super::events::EventEmitter;
+use super::state::OrchestratorState;
+use super::tools::orchestrator as orch_tools;
+
+use analyze::{AnalyzeOutcome, analyze};
+use evaluate::evaluate;
+use replan::replan;
+
+/// Maximum supervisor loop iterations to prevent infinite loops.
+const MAX_SUPERVISOR_ITERATIONS: u32 = 3;
+
+/// Orchestrator agent — coordinates multi-document retrieval.
+///
+/// Holds all execution context. Calling [`run()`](Agent::run) consumes self.
+pub struct Orchestrator<'a> {
+    query: String,
+    ws: &'a WorkspaceContext<'a>,
+    config: AgentConfig,
+    llm: LlmClient,
+    emitter: EventEmitter,
+    skip_analysis: bool,
+    /// Query understanding plan — produced by `QueryPipeline::understand()`.
+    /// Contains intent, complexity, key concepts, and strategy hints.
+    query_plan: QueryPlan,
+}
+
+impl<'a> Orchestrator<'a> {
+    /// Create a new Orchestrator.
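+    ///
+    /// A minimal construction sketch (illustrative only — assumes a prepared
+    /// `WorkspaceContext`, `LlmClient`, `AgentConfig`, and `QueryPlan` from the
+    /// surrounding pipeline; `agent_config` and `query_plan` are placeholder names):
+    ///
+    /// ```rust,ignore
+    /// let orch = Orchestrator::new(
+    ///     "compare 2023 and 2024 revenue",
+    ///     &ws,
+    ///     agent_config,
+    ///     llm.clone(),
+    ///     EventEmitter::noop(),
+    ///     false, // skip_analysis: let the LLM select documents
+    ///     query_plan,
+    /// );
+    /// let output = orch.run().await?;
+    /// ```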
+    pub fn new(
+        query: &str,
+        ws: &'a WorkspaceContext<'a>,
+        config: AgentConfig,
+        llm: LlmClient,
+        emitter: EventEmitter,
+        skip_analysis: bool,
+        query_plan: QueryPlan,
+    ) -> Self {
+        Self {
+            query: query.to_string(),
+            ws,
+            config,
+            llm,
+            emitter,
+            skip_analysis,
+            query_plan,
+        }
+    }
+}
+
+impl<'a> Agent for Orchestrator<'a> {
+    type Output = Output;
+
+    fn name(&self) -> &str {
+        "orchestrator"
+    }
+
+    async fn run(self) -> crate::error::Result<Self::Output> {
+        let Orchestrator {
+            query,
+            ws,
+            config,
+            llm,
+            emitter,
+            skip_analysis,
+            query_plan,
+        } = self;
+
+        info!(
+            docs = ws.doc_count(),
+            skip_analysis,
+            intent = %query_plan.intent,
+            complexity = %query_plan.complexity,
+            "Orchestrator starting"
+        );
+        emitter.emit_orchestrator_started(&query, ws.doc_count(), skip_analysis);
+
+        let mut state = OrchestratorState::new();
+        let mut orch_llm_calls: u32 = 0;
+
+        // --- Phase 1: Analyze — LLM selects documents + tasks ---
+        let initial_dispatches = match analyze(
+            &query,
+            ws,
+            &mut state,
+            &emitter,
+            skip_analysis,
+            &query_plan,
+            &llm,
+        )
+        .await?
+        {
+            AnalyzeOutcome::Proceed {
+                dispatches,
+                llm_calls,
+            } => {
+                orch_llm_calls += llm_calls;
+                dispatches
+            }
+            AnalyzeOutcome::AlreadyAnswered { llm_calls } => {
+                let mut output = Output::empty();
+                output.answer = "Already answered by cross-document search.".to_string();
+                emitter.emit_orchestrator_completed(0, orch_llm_calls + llm_calls, 0);
+                return Ok(output);
+            }
+            AnalyzeOutcome::NoResults { llm_calls } => {
+                emitter.emit_orchestrator_completed(0, orch_llm_calls + llm_calls, 0);
+                return Ok(Output::empty());
+            }
+        };
+
+        // --- Phase 2: Supervisor loop ---
+        let mut current_dispatches = initial_dispatches;
+        let mut iteration: u32 = 0;
+        let mut eval_sufficient = false;
+
+        loop {
+            if iteration >= MAX_SUPERVISOR_ITERATIONS {
+                info!(iteration, "Supervisor loop budget exhausted");
+                break;
+            }
+
+            // Dispatch current plan
+            if !current_dispatches.is_empty() {
+                info!(
+                    docs = current_dispatches.len(),
+                    docs_list = ?current_dispatches.iter().map(|d| d.doc_idx).collect::<Vec<_>>(),
+                    iteration,
+                    "Dispatching Workers"
+                );
+                dispatch::dispatch_and_collect(
+                    &query,
+                    &current_dispatches,
+                    ws,
+                    &config,
+                    &llm,
+                    &mut state,
+                    &emitter,
+                    &query_plan,
+                )
+                .await;
+            }
+
+            // No evidence at all — nothing to evaluate
+            if state.all_evidence.is_empty() {
+                info!("No evidence collected from any Worker");
+                break;
+            }
+
+            // Skip evaluation for user-specified documents (no replan needed)
+            if skip_analysis {
+                eval_sufficient = !state.all_evidence.is_empty();
+                break;
+            }
+
+            // Evaluate sufficiency
+            let eval_result = evaluate(&query, &state.all_evidence, &llm).await?;
+            orch_llm_calls += 1;
+
+            if eval_result.sufficient {
+                eval_sufficient = true;
+                info!(
+                    evidence = state.all_evidence.len(),
+                    iteration, "Evidence sufficient — exiting supervisor loop"
+                );
+                break;
+            }
+
+            // Insufficient — replan
+            info!(
+                evidence = state.all_evidence.len(),
+                missing = eval_result.missing_info.len(),
+                iteration,
+                "Evidence insufficient — replanning"
+            );
+
+            let doc_cards_text = orch_tools::ls_docs(ws).feedback;
+            let replan_result = replan(
+                &query,
+                &eval_result.missing_info,
+                &state.all_evidence,
+                &state.dispatched,
+                ws.doc_count(),
+                &doc_cards_text,
+                &llm,
+            )
+            .await?;
+            orch_llm_calls += 1;
+
+            if replan_result.dispatches.is_empty() {
+                info!("Replan produced no new dispatches — exiting supervisor loop");
+                break;
+            }
+
+            current_dispatches = replan_result.dispatches;
+            iteration += 1;
+        }
+
+        // Derive confidence from supervisor loop outcome:
+        // - LLM evaluated sufficient on first try → high confidence
+        // - Needed replan rounds → lower confidence
+        // - No evaluation ran (skip_analysis / no evidence) → moderate
+        let confidence =
+            compute_confidence(eval_sufficient, iteration, state.all_evidence.is_empty());
+
+        // --- Phase 3: Finalize — rerank + synthesize ---
+        if state.all_evidence.is_empty() {
+            emitter.emit_orchestrator_completed(0, orch_llm_calls, 0);
+            return Ok(state.into_output(String::new()));
+        }
+
+        let multi_doc = ws.doc_count() > 1;
+        finalize_output(
+            &query,
+            &state,
+            &emitter,
+            orch_llm_calls,
+            multi_doc,
+            query_plan.intent,
+            confidence,
+        )
+        .await
+    }
+}
+
+/// Compute confidence from LLM evaluate() outcome.
+fn compute_confidence(eval_sufficient: bool, replan_rounds: u32, no_evidence: bool) -> f32 {
+    if no_evidence {
+        return 0.0;
+    }
+    if eval_sufficient {
+        // LLM said sufficient: first round = 0.95, each replan round drops 0.15
+        (0.95 - replan_rounds as f32 * 0.15).max(0.5)
+    } else {
+        // LLM never said sufficient (budget exhausted or no more docs)
+        (0.4 - replan_rounds as f32 * 0.1).max(0.1)
+    }
+}
+
+/// Rerank evidence and emit completion events.
+pub async fn finalize_output(
+    query: &str,
+    state: &OrchestratorState,
+    emitter: &EventEmitter,
+    orch_llm_calls: u32,
+    multi_doc: bool,
+    intent: crate::query::QueryIntent,
+    confidence: f32,
+) -> crate::error::Result<Output> {
+    let rerank_result =
+        crate::rerank::process(query, &state.all_evidence, multi_doc, intent, confidence).await?;
+
+    let total_llm_calls = orch_llm_calls + rerank_result.llm_calls;
+    if !rerank_result.answer.is_empty() {
+        emitter.emit_answer_completed(rerank_result.answer.len(), "medium");
+    }
+
+    let mut output = state.clone_results_into_output(rerank_result.answer);
+    output.metrics.llm_calls += total_llm_calls;
+    output.confidence = rerank_result.confidence;
+
+    emitter.emit_orchestrator_completed(
+        output.evidence.len(),
+        output.metrics.llm_calls,
+        output.metrics.rounds_used,
+    );
+
+    info!(
+        evidence = output.evidence.len(),
+        llm_calls = output.metrics.llm_calls,
+        confidence = output.confidence,
+        "Orchestrator complete"
+    );
+
+    Ok(output)
+}
diff --git a/rust/src/agent/orchestrator/replan.rs b/rust/src/agent/orchestrator/replan.rs
new file mode 100644
index 00000000..57d5e248
--- /dev/null
+++ b/rust/src/agent/orchestrator/replan.rs
@@ -0,0 +1,249 @@
+// Copyright (c) 2026 vectorless developers
+// SPDX-License-Identifier: Apache-2.0
+
+//! Replan: LLM-driven re-dispatch after insufficient evidence.
+//!
+//! After evaluate() returns insufficient, the Orchestrator replans:
+//! the LLM analyzes what's missing and decides which documents to query next.
+//! This replaces the old heuristic supplement logic.
+
+use tracing::info;
+
+use crate::error::Error;
+use crate::llm::LlmClient;
+use crate::scoring::bm25::extract_keywords;
+
+use super::super::config::Evidence;
+use super::super::prompts::DispatchEntry;
+
+/// Result of the replan phase.
+pub struct ReplanResult {
+    /// New dispatch targets for the next round.
+    pub dispatches: Vec<DispatchEntry>,
+    /// The LLM's reasoning about what was missing.
+    pub reasoning: String,
+}
+
+/// Replan dispatch targets based on missing information.
+///
+/// The LLM reviews:
+/// - The original query
+/// - What evidence has been collected so far
+/// - What information is still missing
+/// - Available documents that haven't been dispatched yet
+///
+/// Returns new dispatch targets. LLM errors propagate.
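+///
+/// Expected response shape (one block per additional document; the layout
+/// mirrors what `parse_replan_response` below accepts):
+///
+/// ```text
+/// - doc: 3
+///   reason: May contain the missing financial data
+///   task: Find Q4 revenue figures
+/// ```
+///
+/// or the literal `NO_ADDITIONAL_DOCS` when nothing else is likely to help.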
+pub async fn replan(
+    query: &str,
+    missing_info: &str,
+    collected_evidence: &[Evidence],
+    dispatched_indices: &[usize],
+    total_docs: usize,
+    doc_cards_text: &str,
+    llm: &LlmClient,
+) -> crate::error::Result<ReplanResult> {
+    let evidence_summary = format_evidence_context(collected_evidence);
+    let keywords = extract_keywords(query);
+    let find_text = if keywords.is_empty() {
+        String::new()
+    } else {
+        format!("\nExtracted keywords: {}", keywords.join(", "))
+    };
+
+    let (system, user) = replan_prompt(
+        query,
+        missing_info,
+        &evidence_summary,
+        dispatched_indices,
+        doc_cards_text,
+        &find_text,
+    );
+
+    let response = llm
+        .complete(&system, &user)
+        .await
+        .map_err(|e| Error::LlmReasoning {
+            stage: "orchestrator/replan".to_string(),
+            detail: format!("Replan LLM call failed: {e}"),
+        })?;
+
+    info!(
+        response_len = response.len(),
+        "Replan LLM response received"
+    );
+
+    let dispatches = parse_replan_response(&response, total_docs, dispatched_indices);
+    let reasoning = response.lines().take(3).collect::<Vec<_>>().join(" ");
+
+    info!(
+        new_dispatches = dispatches.len(),
+        "Replan produced new dispatch targets"
+    );
+
+    Ok(ReplanResult {
+        dispatches,
+        reasoning,
+    })
+}
+
+/// Format collected evidence for the replan prompt.
+fn format_evidence_context(evidence: &[Evidence]) -> String {
+    if evidence.is_empty() {
+        return "(no evidence collected)".to_string();
+    }
+    evidence
+        .iter()
+        .map(|e| {
+            let doc = e.doc_name.as_deref().unwrap_or("unknown");
+            format!(
+                "- [{}] (from {}) {} chars",
+                e.node_title,
+                doc,
+                e.content.len()
+            )
+        })
+        .collect::<Vec<_>>()
+        .join("\n")
+}
+
+/// Build the replan prompt.
+fn replan_prompt(
+    query: &str,
+    missing_info: &str,
+    evidence_summary: &str,
+    dispatched: &[usize],
+    doc_cards: &str,
+    keywords_text: &str,
+) -> (String, String) {
+    let dispatched_set: Vec<String> = dispatched
+        .iter()
+        .map(|&i| format!("doc {}", i + 1))
+        .collect();
+    let dispatched_text = if dispatched_set.is_empty() {
+        "None".to_string()
+    } else {
+        dispatched_set.join(", ")
+    };
+
+    let system = "You are a multi-document retrieval coordinator. The first round of evidence \
+        collection was insufficient to fully answer the query. Review what was collected, \
+        what's missing, and decide which additional documents to query.
+
+Output format — for each additional document to query, output a block:
+- doc: <document number>
+  reason: <why this document may fill the gap>
+  task: <what to find in this document>
+
+Only include documents not yet dispatched. If no additional documents are likely to help, \
+respond with: NO_ADDITIONAL_DOCS"
+        .to_string();
+
+    let user = format!(
+        "Original question: {query}
+
+Missing information: {missing_info}
+
+Collected evidence so far:
+{evidence_summary}
+
+Already dispatched documents: {dispatched_text}
+
+Available documents (all):
+{doc_cards}{keywords_text}
+
+Additional documents to query:"
+    );
+
+    (system, user)
+}
+
+/// Parse the replan response into dispatch entries.
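+///
+/// Entries that point at already-dispatched documents are dropped; e.g. with
+/// `dispatched = &[0]`, a response of `- doc: 1 ...` parses to no entries
+/// (doc numbers are 1-based in the response, 0-based internally — see
+/// `test_parse_replan_response_already_dispatched` below).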
+fn parse_replan_response(
+    response: &str,
+    total_docs: usize,
+    dispatched: &[usize],
+) -> Vec<DispatchEntry> {
+    let trimmed = response.trim();
+
+    if trimmed.starts_with("NO_ADDITIONAL_DOCS") {
+        return Vec::new();
+    }
+
+    let mut entries = Vec::new();
+    let mut current_doc_idx: Option<usize> = None;
+    let mut current_reason = String::new();
+    let mut current_task = String::new();
+
+    for line in trimmed.lines() {
+        let line = line.trim();
+
+        if let Some(rest) = line.strip_prefix("- doc:") {
+            // Flush previous
+            if let Some(idx) = current_doc_idx.take() {
+                entries.push(DispatchEntry {
+                    doc_idx: idx,
+                    reason: std::mem::take(&mut current_reason),
+                    task: std::mem::take(&mut current_task),
+                });
+            }
+
+            let doc_num: usize = rest.trim().trim_end_matches(',').parse().unwrap_or(0);
+            if doc_num > 0 && doc_num <= total_docs {
+                let idx = doc_num - 1;
+                // Only include if not already dispatched
+                if !dispatched.contains(&idx) {
+                    current_doc_idx = Some(idx);
+                }
+            }
+        } else if let Some(rest) = line.strip_prefix("reason:") {
+            current_reason = rest.trim().to_string();
+        } else if let Some(rest) = line.strip_prefix("task:") {
+            current_task = rest.trim().to_string();
+        }
+    }
+
+    // Flush last
+    if let Some(idx) = current_doc_idx {
+        entries.push(DispatchEntry {
+            doc_idx: idx,
+            reason: current_reason,
+            task: current_task,
+        });
+    }
+
+    entries
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn test_parse_replan_response_basic() {
+        let response = "\
+- doc: 3
+  reason: May contain the missing financial data
+  task: Find Q4 revenue figures";
+        let entries = parse_replan_response(response, 5, &[0, 1]);
+        assert_eq!(entries.len(), 1);
+        assert_eq!(entries[0].doc_idx, 2);
+        assert_eq!(entries[0].task, "Find Q4 revenue figures");
+    }
+
+    #[test]
+    fn test_parse_replan_response_already_dispatched() {
+        let response = "\
+- doc: 1
+  reason: Already queried
+  task: test";
+        let entries = parse_replan_response(response, 3, &[0]);
+        assert!(entries.is_empty()); // doc 1 (idx 0) already dispatched
+    }
+
+    #[test]
+    fn test_parse_replan_response_no_additional() {
+        let response = "NO_ADDITIONAL_DOCS";
+        let entries = parse_replan_response(response, 3, &[0, 1]);
+        assert!(entries.is_empty());
+    }
+}
diff --git a/rust/src/agent/prompts.rs b/rust/src/agent/prompts.rs
new file mode 100644
index 00000000..42699cc6
--- /dev/null
+++ b/rust/src/agent/prompts.rs
@@ -0,0 +1,558 @@
+// Copyright (c) 2026 vectorless developers
+// SPDX-License-Identifier: Apache-2.0
+
+//! Prompt templates for the retrieval agent.
+//!
+//! Prompts for agent-level operations:
+//! 1. `worker_navigation` — Worker nav loop, every round
+//! 2. `orchestrator_analysis` — Orchestrator Phase 1
+//! 3. `worker_dispatch` — Worker first round (when dispatched by Orchestrator)
+//! 4. `check_sufficiency` — evidence sufficiency evaluation
+//!
+//! Post-processing prompts (answer synthesis, multi-doc fusion) have been
+//! moved to `rerank/synthesis.rs` and `rerank/fusion.rs`.
+
+// ---------------------------------------------------------------------------
+// Prompt 1: Worker Navigation (used every round in the nav loop)
+// ---------------------------------------------------------------------------
+
+/// Parameters for the sub-agent navigation prompt.
+pub struct NavigationParams<'a> {
+    pub query: &'a str,
+    /// Sub-task description (None when Worker is called directly).
+    pub task: Option<&'a str>,
+    /// Current breadcrumb path.
+    pub breadcrumb: &'a str,
+    /// Summary of collected evidence.
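+    /// (One line per item, e.g. "- [Intro] 500 chars"; "(none)" when empty.)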
+    pub evidence_summary: &'a str,
+    /// Description of what's still missing (empty string if nothing).
+    pub missing_info: &'a str,
+    /// Feedback from the last command execution.
+    pub last_feedback: &'a str,
+    /// Remaining rounds.
+    pub remaining: u32,
+    /// Maximum rounds.
+    pub max_rounds: u32,
+    /// ReAct history of recent rounds.
+    pub history: &'a str,
+    /// Titles of already-visited nodes.
+    pub visited_titles: &'a str,
+    /// Navigation plan from bird's-eye analysis (empty if no plan).
+    pub plan: &'a str,
+    /// Query intent context from QueryPlan (e.g. "factual — find specific answer").
+    /// Empty string if not available.
+    pub intent_context: &'a str,
+    /// Formatted keyword index matches (empty if none).
+    pub keyword_hints: &'a str,
+}
+
+pub fn worker_navigation(params: &NavigationParams) -> (String, String) {
+    let query = params.query;
+    let breadcrumb = params.breadcrumb;
+    let evidence_summary = params.evidence_summary;
+    let remaining = params.remaining;
+    let max_rounds = params.max_rounds;
+
+    let task_section = match params.task {
+        Some(task) => format!(
+            "\nYour specific task: {}\n(This is a sub-task for the original query.)",
+            task
+        ),
+        None => String::new(),
+    };
+
+    let missing_section = if params.missing_info.is_empty() {
+        String::new()
+    } else {
+        format!("\nPotentially missing info: {}", params.missing_info)
+    };
+
+    let last_feedback_section = if params.last_feedback.is_empty() {
+        String::new()
+    } else {
+        format!("\nLast command result:\n{}\n", params.last_feedback)
+    };
+
+    let history_section = if params.history == "(no history yet)" {
+        String::new()
+    } else {
+        format!("\nPrevious rounds:\n{}\n", params.history)
+    };
+
+    let visited_section = if params.visited_titles == "(none)" {
+        String::new()
+    } else {
+        format!(
+            "\nAlready visited (do not re-read these): {}",
+            params.visited_titles
+        )
+    };
+
+    let plan_section = if params.plan.is_empty() {
+        String::new()
+    } else {
+        format!(
+            "\nNavigation plan (follow this as guidance, adapt if needed):\n{}\n",
+            params.plan
+        )
+    };
+
+    let keyword_section = if params.keyword_hints.is_empty() {
+        String::new()
+    } else {
+        format!("\n{}", params.keyword_hints)
+    };
+
+    let intent_section = if params.intent_context.is_empty() {
+        String::new()
+    } else {
+        format!("\nQuery context: {}", params.intent_context)
+    };
+
+    let system = format!(
+        "You are a document navigation assistant. You navigate inside a document to find \
+         information that answers the user's question.
+
+Available commands:
+- ls                  List children at current position (with summaries and leaf counts)
+- cd <target>         Enter a child node (supports absolute paths like /root/Section)
+- cd ..               Go back to parent node
+- cat <target>        Read a child node's content (automatically collected as evidence)
+- cat                 Read the current node's content (useful at leaf nodes)
+- head <target>       Preview first 20 lines of a node (does NOT collect evidence)
+- find <keyword>      Search for a keyword in the document index
+- findtree <pattern>  Search for nodes by title pattern (case-insensitive)
+- grep <regex>        Regex search across all content in current subtree
+- wc <target>         Show content size (lines, words, chars)
+- pwd                 Show current navigation path
+- check               Evaluate if collected evidence is sufficient
+- done                End navigation
+
+SEARCH STRATEGY (important — follow this priority order):
+- When keyword matches are shown below, use find with the EXACT keyword from the list (single word, \
+not multi-word phrases). Example: if hint shows keyword 'performance' pointing to Performance section, \
+use find performance, NOT find \"performance guide\".
+- Use ls when you have no keyword hints or need to discover the structure of an unknown section.
+- Use findtree when you know a section title pattern but not the exact name.
+
+Rules:
+- Output exactly ONE command per response, nothing else.
+- Content from cat is automatically saved as evidence — don't re-cat the same node.
+- Do not cat or cd into nodes you have already visited.
+- If the current branch has nothing relevant, use cd .. to go back.
+- If you're at the root and no children seem relevant, use done.
+
+STOPPING RULES (critical — follow these strictly):
+- After cat collects evidence, immediately check: does the collected text contain information \
+  that answers or relates to the user's question? If YES, output done. Do NOT continue searching.
+- Do NOT run grep after cat — cat already collected the full content. grep is for locating \
+  content BEFORE cat, not after.
+- If ls shows '(no navigation data)' or no children, you are at a leaf node. Use cat to read it \
+  or cd .. to go back. Do NOT ls again.
+- When remaining rounds are low (≤2), prefer done over exploring new branches."
+    );
+
+    let user = format!(
+        "{last_feedback_section}\
+User question: {query}{task_section}{intent_section}
+
+Current position: /{breadcrumb}
+Collected evidence:
+{evidence_summary}{missing_section}{keyword_section}{visited_section}{plan_section}
+{history_section}
+Remaining rounds: {remaining}/{max_rounds}
+
+Command:"
+    );
+
+    (system, user)
+}
+
+// ---------------------------------------------------------------------------
+// Prompt 2: Orchestrator Analysis (multi-doc Phase 1)
+// ---------------------------------------------------------------------------
+
+/// Parameters for the orchestrator analysis prompt.
+pub struct OrchestratorAnalysisParams<'a> {
+    pub query: &'a str,
+    /// Formatted DocCard listing from ls_docs.
+    pub doc_cards: &'a str,
+    /// Formatted cross-document search results.
+    pub find_results: &'a str,
+    /// Query understanding context (intent, concepts, strategy, complexity).
+    pub intent_context: &'a str,
+}
+
+pub fn orchestrator_analysis(params: &OrchestratorAnalysisParams) -> (String, String) {
+    let doc_cards = params.doc_cards;
+    let find_results = params.find_results;
+    let query = params.query;
+    let intent_context = params.intent_context;
+
+    let system =
+        "You are a multi-document retrieval coordinator. Analyze the user's question, \
+         review the available documents, and decide which documents to search and what to look for in each.
+
+Output format — for each relevant document, output a block:
+- doc: <document number>
+  reason: <why this document is relevant>
+  task: <what to look for in this document>
+
+Only include documents that are likely to contain relevant information.
+If the cross-document search results already fully answer the question, respond with just: ALREADY_ANSWERED".to_string();
+
+    let user = format!(
+        "Available documents:
+{doc_cards}
+
+Cross-document search results:
+{find_results}
+{intent_context}
+
+User question: {query}
+
+Relevant documents:"
+    );
+
+    (system, user)
+}
+
+// ---------------------------------------------------------------------------
+// Prompt 3: Worker Dispatch (first-round prompt when Orchestrator dispatches)
+// ---------------------------------------------------------------------------
+
+/// Parameters for the dispatch prompt.
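+///
+/// A minimal construction sketch (values are illustrative, mirroring the
+/// `test_worker_dispatch` test at the bottom of this file):
+///
+/// ```rust,ignore
+/// let (system, user) = worker_dispatch(&WorkerDispatchParams {
+///     original_query: "Compare revenue",
+///     task: "Find 2024 revenue figures",
+///     doc_name: "2024 Annual Report",
+///     breadcrumb: "root",
+/// });
+/// ```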
+pub struct WorkerDispatchParams<'a> {
+    pub original_query: &'a str,
+    pub task: &'a str,
+    pub doc_name: &'a str,
+    pub breadcrumb: &'a str,
+}
+
+pub fn worker_dispatch(params: &WorkerDispatchParams) -> (String, String) {
+    let doc_name = params.doc_name;
+    let original_query = params.original_query;
+    let task = params.task;
+    let breadcrumb = params.breadcrumb;
+
+    let system = format!(
+        "You are a document navigation assistant. You are searching inside the document \
+         \"{doc_name}\" for specific information.
+
+Available commands: ls, cd <target>, cd .., cat, cat <target>, head <target>, find <keyword>, \
+findtree <pattern>, grep <regex>, wc <target>, pwd, check, done
+
+SEARCH STRATEGY:
+- Prefer find <keyword> to jump directly to relevant sections over manual ls→cd exploration. \
+Use single-word keywords, not multi-word phrases.
+- Use ls when you need to discover the structure of an unknown section.
+- Use findtree when you know a section title pattern but not the exact name.
+
+Rules:
+- Output exactly ONE command per response.
+- Content from cat is automatically saved as evidence.
+- After cat collects evidence, if it relates to your task, use done immediately.
+- Do NOT grep after cat — cat already collected the full content.
+- If ls shows no children, use cat to read the current node or cd .. to go back.
+- When evidence is sufficient, use done."
+    );
+
+    let user = format!(
+        "Original question: {original_query}
+Your task: {task}
+Document: {doc_name}
+Current position: /{breadcrumb}
+
+Command:"
+    );
+
+    (system, user)
+}
+
+// ---------------------------------------------------------------------------
+// Prompt 4: Check (evidence sufficiency evaluation)
+// ---------------------------------------------------------------------------
+
+/// Build the check prompt for LLM-based sufficiency evaluation.
+pub fn check_sufficiency(query: &str, evidence_summary: &str) -> (String, String) {
+    let system = "You evaluate whether collected evidence contains information that can answer or \
+        relate to the user's question. The evidence is raw document text — it does not need to be \
+        a complete or perfect answer. If the evidence mentions or addresses the key concepts from \
+        the question, it is sufficient.
+
+Respond with ONLY 'SUFFICIENT' or 'INSUFFICIENT' followed by a one-line reason.
+
+Guidelines:
+- If the evidence text contains any information directly related to the question's key terms, \
+respond SUFFICIENT.
+- If the evidence is completely unrelated or empty, respond INSUFFICIENT.
+- Default to SUFFICIENT unless the evidence is clearly irrelevant."
+        .to_string();
+
+    let user = format!(
+        "Question: {query}\n\n\
+         Collected evidence:\n\
+         {evidence_summary}\n\n\
+         Is this sufficient?"
+    );
+
+    (system, user)
+}
+
+// ---------------------------------------------------------------------------
+// Dispatch plan parsing
+// ---------------------------------------------------------------------------
+
+/// A single dispatch entry parsed from orchestrator analysis.
+#[derive(Debug, Clone)]
+pub struct DispatchEntry {
+    /// Document index (0-based).
+    pub doc_idx: usize,
+    /// Why this document was selected.
+    pub reason: String,
+    /// What to search for in this document.
+    pub task: String,
+}
+
+/// Parse the LLM output from orchestrator analysis into dispatch entries.
+///
+/// Returns `None` if the response is "ALREADY_ANSWERED".
+/// Returns an empty vec if no valid dispatch entries were found.
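+///
+/// Example input, as produced by `orchestrator_analysis` (doc numbers are
+/// 1-based in the LLM output and converted to 0-based indices here):
+///
+/// ```text
+/// - doc: 1
+///   reason: Contains revenue data
+///   task: Find 2024 revenue figures
+/// ```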
+pub fn parse_dispatch_plan(llm_output: &str, total_docs: usize) -> Option<Vec<DispatchEntry>> {
+    let trimmed = llm_output.trim();
+
+    if trimmed.starts_with("ALREADY_ANSWERED") {
+        return None;
+    }
+
+    let mut entries = Vec::new();
+    let mut current_doc_idx: Option<usize> = None;
+    let mut current_reason = String::new();
+    let mut current_task = String::new();
+
+    for line in trimmed.lines() {
+        let line = line.trim();
+
+        if let Some(rest) = line.strip_prefix("- doc:") {
+            // Flush previous entry
+            if let Some(idx) = current_doc_idx.take() {
+                entries.push(DispatchEntry {
+                    doc_idx: idx,
+                    reason: std::mem::take(&mut current_reason),
+                    task: std::mem::take(&mut current_task),
+                });
+            }
+
+            let doc_num: usize = rest.trim().trim_end_matches(',').parse().unwrap_or(0);
+            if doc_num > 0 && doc_num <= total_docs {
+                current_doc_idx = Some(doc_num - 1); // Convert to 0-based
+            } else if doc_num > 0 {
+                tracing::warn!(
+                    requested_doc = doc_num,
+                    total_docs,
+                    "Dispatch plan references out-of-range document, skipping"
+                );
+            }
+        } else if let Some(rest) = line.strip_prefix("reason:") {
+            current_reason = rest.trim().to_string();
+        } else if let Some(rest) = line.strip_prefix("task:") {
+            current_task = rest.trim().to_string();
+        }
+    }
+
+    // Flush last entry
+    if let Some(idx) = current_doc_idx {
+        entries.push(DispatchEntry {
+            doc_idx: idx,
+            reason: current_reason,
+            task: current_task,
+        });
+    }
+
+    Some(entries)
+}
+
+/// Parse the sufficiency check response.
+pub fn parse_sufficiency_response(response: &str) -> bool {
+    let upper = response.trim().to_uppercase();
+    upper.starts_with("SUFFICIENT") && !upper.starts_with("INSUFFICIENT")
+}
+
+// ---------------------------------------------------------------------------
+// Tests
+// ---------------------------------------------------------------------------
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn test_worker_navigation_without_task() {
+        let params = NavigationParams {
+            query: "What is the revenue?",
+            task: None,
+            breadcrumb: "root/Financial Statements",
+            evidence_summary: "- [Revenue] 200 chars",
+            missing_info: "2024 comparison",
+            last_feedback: "[1] Q1 Report — Q1 data (5 leaves)\n[2] Q2 Report — Q2 data (5 leaves)",
+            remaining: 5,
+            max_rounds: 8,
+            history: "(no history yet)",
+            visited_titles: "(none)",
+            plan: "",
+            intent_context: "",
+            keyword_hints: "",
+        };
+
+        let (system, user) = worker_navigation(&params);
+        assert!(system.contains("document navigation"));
+        assert!(system.contains("SEARCH STRATEGY"));
+        assert!(user.contains("What is the revenue?"));
+        assert!(user.contains("root/Financial Statements"));
+        assert!(user.contains("200 chars"));
+        assert!(user.contains("2024 comparison"));
+        assert!(user.contains("5/8"));
+        assert!(!user.contains("sub-task"));
+    }
+
+    #[test]
+    fn test_worker_navigation_with_keyword_hints() {
+        let params = NavigationParams {
+            query: "What is the revenue?",
+            task: None,
+            breadcrumb: "root",
+            evidence_summary: "(none)",
+            missing_info: "",
+            last_feedback: "",
+            remaining: 8,
+            max_rounds: 8,
+            history: "(no history yet)",
+            visited_titles: "(none)",
+            plan: "",
+            intent_context: "",
+            keyword_hints: "Keyword matches (use find to jump directly):\n  - 'revenue' → root > Revenue (weight 0.85)\n",
+        };
+
+        let (_, user) = worker_navigation(&params);
+        assert!(user.contains("revenue"));
+        assert!(user.contains("find"));
+    }
+
+    #[test]
+    fn test_worker_navigation_with_task() {
+        let params = NavigationParams {
+            query: "Compare 2024 and 2023 revenue",
+            task: Some("Find revenue data in this document"),
+            breadcrumb: "root",
+            evidence_summary: "(none)",
+            missing_info: "",
+            last_feedback: "",
+            remaining: 8,
+            max_rounds: 8,
+            history: "(no history yet)",
+            visited_titles: "(none)",
+            plan: "",
+            intent_context: "analytical — comparative analysis",
+            keyword_hints: "",
+        };
+
+        let (_, user) = worker_navigation(&params);
+        assert!(user.contains("Find revenue data"));
+        assert!(user.contains("sub-task"));
+    }
+
+    #[test]
+    fn test_orchestrator_analysis() {
+        let params = OrchestratorAnalysisParams {
+            query: "Compare 2024 and 2023 revenue",
+            doc_cards: "[1] 2024 Report\n[2] 2023 Report",
+            find_results: "doc 1: keyword 'revenue' matched",
+            intent_context: "\nQuery intent: analytical (complexity: moderate)",
+        };
+
+        let (system, user) = orchestrator_analysis(&params);
+        assert!(system.contains("multi-document"));
+        assert!(user.contains("2024 Report"));
+        assert!(user.contains("revenue"));
+        assert!(user.contains("analytical"));
+    }
+
+    #[test]
+    fn test_worker_dispatch() {
+        let params = WorkerDispatchParams {
+            original_query: "Compare revenue",
+            task: "Find 2024 revenue figures",
+            doc_name: "2024 Annual Report",
+            breadcrumb: "root",
+        };
+
+        let (system, user) = worker_dispatch(&params);
+        assert!(system.contains("2024 Annual Report"));
+        assert!(user.contains("Compare revenue"));
+        assert!(user.contains("Find 2024 revenue"));
+    }
+
+    #[test]
+    fn test_check_sufficiency() {
+        let (system, user) = check_sufficiency("What is X?", "- [A] some data");
+        assert!(system.contains("SUFFICIENT"));
+        assert!(user.contains("What is X?"));
+    }
+
+    // --- Dispatch plan parsing ---
+
+    #[test]
+    fn test_parse_dispatch_plan_basic() {
+        let output = "\
+- doc: 1
+  reason: Contains revenue data
+  task: Find 2024 revenue figures
+- doc: 2
+  reason: Contains comparison data
+  task: Find 2023 revenue figures";
+
+        let entries = parse_dispatch_plan(output, 3).unwrap();
+        assert_eq!(entries.len(), 2);
+        assert_eq!(entries[0].doc_idx, 0);
+        assert_eq!(entries[0].task, "Find 2024 revenue figures");
+        assert_eq!(entries[1].doc_idx, 1);
+        assert_eq!(entries[1].reason, "Contains comparison data");
+    }
+
+    #[test]
+    fn test_parse_dispatch_plan_already_answered() {
+        let output = "ALREADY_ANSWERED";
+        assert!(parse_dispatch_plan(output, 3).is_none());
+    }
+
+    #[test]
+    fn test_parse_dispatch_plan_empty() {
+        let entries = parse_dispatch_plan("no relevant documents", 3).unwrap();
+        assert!(entries.is_empty());
+    }
+
+    #[test]
+    fn test_parse_dispatch_plan_out_of_range() {
+        let output = "\
+- doc: 99
+  reason: test
+  task: test";
+
+        let entries = parse_dispatch_plan(output, 3).unwrap();
+        assert!(entries.is_empty()); // doc 99 is out of range, skipped
+    }
+
+    // --- Sufficiency parsing ---
+
+    #[test]
+    fn test_parse_sufficiency_sufficient() {
+        assert!(parse_sufficiency_response("SUFFICIENT - we have all data"));
+        assert!(parse_sufficiency_response("Sufficient"));
+    }
+
+    #[test]
+    fn test_parse_sufficiency_insufficient() {
+        assert!(!parse_sufficiency_response("INSUFFICIENT - missing data"));
+        assert!(!parse_sufficiency_response("Insufficient"));
+    }
+}
diff --git a/rust/src/agent/state.rs b/rust/src/agent/state.rs
new file mode 100644
index 00000000..a4c1e674
--- /dev/null
+++ b/rust/src/agent/state.rs
@@ -0,0 +1,289 @@
+// Copyright (c) 2026 vectorless developers
+// SPDX-License-Identifier: Apache-2.0
+
+//! Agent state types — mutable state that lives within a single retrieve() call.
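+//!
+//! Lifecycle sketch (illustrative): a Worker creates a `WorkerState` at the top
+//! of its navigation loop, mutates it each round, and converts it into a
+//! `WorkerOutput` when the loop ends (names below are placeholders):
+//!
+//! ```rust,ignore
+//! let mut state = WorkerState::new(root, 8);
+//! state.cd(child, "Getting Started");
+//! state.dec_round();
+//! let output = state.into_worker_output(llm_calls, false, "doc.md");
+//! ```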
+
+use std::collections::HashSet;
+
+use crate::document::NodeId;
+
+use super::config::{Evidence, Output};
+
+// ---------------------------------------------------------------------------
+// Worker state
+// ---------------------------------------------------------------------------
+
+/// Mutable navigation state for a Worker loop.
+///
+/// Created at loop start, destroyed at loop end. Never escapes the call.
+pub struct WorkerState {
+    /// Navigation breadcrumb (path from root to current node).
+    pub breadcrumb: Vec<String>,
+    /// Current position in the document tree.
+    pub current_node: NodeId,
+    /// Collected evidence so far.
+    pub evidence: Vec<Evidence>,
+    /// Nodes already visited (prevents redundant reads).
+    pub visited: HashSet<NodeId>,
+    /// Remaining navigation rounds.
+    pub remaining: u32,
+    /// Maximum rounds (for display in prompts).
+    pub max_rounds: u32,
+    /// Feedback from the last executed command (injected into next prompt).
+    pub last_feedback: String,
+    /// Structured description of what information is still missing.
+    /// Updated after `check` returns "insufficient".
+    pub missing_info: String,
+    /// ReAct history: summary of each round's command + result.
+    /// Keeps last N entries for prompt injection.
+    pub history: Vec<String>,
+    /// Navigation plan generated after bird's-eye view (Phase 1.5).
+    /// Injected into subsequent prompts as guidance (non-binding).
+    pub plan: String,
+    /// Number of times `check` has been called.
+    pub check_count: u32,
+    /// Whether a navigation plan was generated in Phase 1.5.
+    pub plan_generated: bool,
+}
+
+/// Maximum number of history entries to keep for prompt injection.
+const MAX_HISTORY_ENTRIES: usize = 6;
+
+/// Maximum characters for `last_feedback` before truncation.
+/// Prevents large cat/grep outputs from bloating subsequent prompts.
+const MAX_FEEDBACK_CHARS: usize = 500;
+
+impl WorkerState {
+    /// Create a new state starting at the given root node.
+    pub fn new(root: NodeId, max_rounds: u32) -> Self {
+        Self {
+            breadcrumb: vec!["root".to_string()],
+            current_node: root,
+            evidence: Vec::new(),
+            visited: HashSet::new(),
+            remaining: max_rounds,
+            max_rounds,
+            last_feedback: String::new(),
+            missing_info: String::new(),
+            history: Vec::new(),
+            plan: String::new(),
+            check_count: 0,
+            plan_generated: false,
+        }
+    }
+
+    /// Consume one navigation round (saturating at zero).
+    pub fn dec_round(&mut self) {
+        if self.remaining > 0 {
+            self.remaining -= 1;
+        }
+    }
+
+    /// Set feedback with automatic truncation to prevent prompt bloat.
+    pub fn set_feedback(&mut self, feedback: String) {
+        if feedback.len() <= MAX_FEEDBACK_CHARS {
+            self.last_feedback = feedback;
+        } else {
+            // Find a clean truncation point (line boundary if possible)
+            let truncated = &feedback[..MAX_FEEDBACK_CHARS];
+            let end = truncated.rfind('\n').unwrap_or(MAX_FEEDBACK_CHARS);
+            self.last_feedback = format!(
+                "{}...\n(truncated, {} chars total)",
+                &feedback[..end.min(MAX_FEEDBACK_CHARS)],
+                feedback.len()
+            );
+        }
+    }
+
+    /// Navigate into a child node.
+    pub fn cd(&mut self, node: NodeId, title: &str) {
+        self.breadcrumb.push(title.to_string());
+        self.current_node = node;
+    }
+
+    /// Navigate back to parent.
+    ///
+    /// Returns `false` if already at root.
+    pub fn cd_up(&mut self, parent: NodeId) -> bool {
+        if self.breadcrumb.len() <= 1 {
+            return false;
+        }
+        self.breadcrumb.pop();
+        self.current_node = parent;
+        true
+    }
+
+    /// Add a piece of evidence.
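+    /// (Callers that read content, such as the `cat` tool, also mark the node
+    /// as visited after collecting — see `tools/worker/cat.rs`.)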
+    pub fn add_evidence(&mut self, evidence: Evidence) {
+        self.evidence.push(evidence);
+    }
+
+    /// Push a history entry (command + result summary).
+    /// Keeps only the last `MAX_HISTORY_ENTRIES` entries.
+    pub fn push_history(&mut self, entry: String) {
+        if self.history.len() >= MAX_HISTORY_ENTRIES {
+            self.history.remove(0);
+        }
+        self.history.push(entry);
+    }
+
+    /// Format history as text for prompt injection.
+    pub fn history_text(&self) -> String {
+        if self.history.is_empty() {
+            return "(no history yet)".to_string();
+        }
+        self.history
+            .iter()
+            .enumerate()
+            .map(|(i, h)| format!("{}. {}", i + 1, h))
+            .collect::<Vec<_>>()
+            .join("\n")
+    }
+
+    /// Format the breadcrumb as a path string (e.g., "root/Chapter 1/Section 1.2").
+    pub fn path_str(&self) -> String {
+        self.breadcrumb.join("/")
+    }
+
+    /// Summary of collected evidence for prompts.
+    pub fn evidence_summary(&self) -> String {
+        if self.evidence.is_empty() {
+            return "(none)".to_string();
+        }
+        self.evidence
+            .iter()
+            .map(|e| format!("- [{}] {} chars", e.node_title, e.content.len()))
+            .collect::<Vec<_>>()
+            .join("\n")
+    }
+
+    /// Convert this state into a WorkerOutput (consuming the state), with budget flag.
+    /// Worker returns evidence only — no answer synthesis.
+    pub fn into_worker_output(
+        self,
+        llm_calls: u32,
+        budget_exhausted: bool,
+        doc_name: &str,
+    ) -> super::config::WorkerOutput {
+        let evidence_chars: usize = self.evidence.iter().map(|e| e.content.len()).sum();
+        super::config::WorkerOutput {
+            evidence: self.evidence,
+            metrics: super::config::WorkerMetrics {
+                rounds_used: self.max_rounds.saturating_sub(self.remaining),
+                llm_calls,
+                nodes_visited: self.visited.len(),
+                budget_exhausted,
+                plan_generated: self.plan_generated,
+                check_count: self.check_count,
+                evidence_chars,
+            },
+            doc_name: doc_name.to_string(),
+        }
+    }
+}
+
+// ---------------------------------------------------------------------------
+// Orchestrator state
+// ---------------------------------------------------------------------------
+
+/// Mutable state for the Orchestrator loop.
+///
+/// Tracks which documents have been dispatched and collects Worker results.
+pub struct OrchestratorState {
+    /// Indices of documents that have been dispatched.
+    pub dispatched: Vec<usize>,
+    /// Results returned by dispatched Workers.
+    pub sub_results: Vec<Output>,
+    /// All evidence merged from sub-results.
+    pub all_evidence: Vec<Evidence>,
+    /// Whether the analysis phase is complete.
+    pub analyze_done: bool,
+    /// Total LLM calls across orchestrator + sub-agents.
+    pub total_llm_calls: u32,
+}
+
+impl OrchestratorState {
+    /// Create a new orchestrator state.
+    pub fn new() -> Self {
+        Self {
+            dispatched: Vec::new(),
+            sub_results: Vec::new(),
+            all_evidence: Vec::new(),
+            analyze_done: false,
+            total_llm_calls: 0,
+        }
+    }
+
+    /// Record a dispatch to the document at the given index.
+    pub fn record_dispatch(&mut self, doc_idx: usize) {
+        if !self.dispatched.contains(&doc_idx) {
+            self.dispatched.push(doc_idx);
+        }
+    }
+
+    /// Collect a Worker result, converting WorkerOutput to Output for internal tracking.
+    pub fn collect_result(&mut self, doc_idx: usize, result: super::config::WorkerOutput) {
+        self.total_llm_calls += result.metrics.llm_calls;
+        self.all_evidence.extend(result.evidence.iter().cloned());
+        self.sub_results.push(result.into());
+        self.record_dispatch(doc_idx);
+    }
+
+    /// Clone results into an Output without consuming self.
+    ///
+    /// Used by `finalize_output` which needs to borrow state for rerank.
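+    /// Prefer `into_output` when the state is no longer needed — it moves the
+    /// evidence instead of cloning it.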
+    pub fn clone_results_into_output(&self, answer: String) -> Output {
+        Output {
+            answer,
+            evidence: self.all_evidence.clone(),
+            metrics: super::config::Metrics {
+                llm_calls: self.total_llm_calls,
+                nodes_visited: self
+                    .sub_results
+                    .iter()
+                    .map(|r| r.metrics.nodes_visited)
+                    .sum(),
+                plan_generated: self.sub_results.iter().any(|r| r.metrics.plan_generated),
+                check_count: self.sub_results.iter().map(|r| r.metrics.check_count).sum(),
+                evidence_chars: self
+                    .sub_results
+                    .iter()
+                    .map(|r| r.metrics.evidence_chars)
+                    .sum(),
+                ..Default::default()
+            },
+            confidence: 0.0,
+        }
+    }
+
+    /// Merge all sub-results into a single Output (consuming self).
+    pub fn into_output(self, answer: String) -> Output {
+        Output {
+            answer,
+            evidence: self.all_evidence,
+            metrics: super::config::Metrics {
+                llm_calls: self.total_llm_calls,
+                nodes_visited: self
+                    .sub_results
+                    .iter()
+                    .map(|r| r.metrics.nodes_visited)
+                    .sum(),
+                plan_generated: self.sub_results.iter().any(|r| r.metrics.plan_generated),
+                check_count: self.sub_results.iter().map(|r| r.metrics.check_count).sum(),
+                evidence_chars: self
+                    .sub_results
+                    .iter()
+                    .map(|r| r.metrics.evidence_chars)
+                    .sum(),
+                ..Default::default()
+            },
+            confidence: 0.0,
+        }
+    }
+}
+
+impl Default for OrchestratorState {
+    fn default() -> Self {
+        Self::new()
+    }
+}
diff --git a/rust/src/agent/tools/common.rs b/rust/src/agent/tools/common.rs
new file mode 100644
index 00000000..740510de
--- /dev/null
+++ b/rust/src/agent/tools/common.rs
@@ -0,0 +1,69 @@
+// Copyright (c) 2026 vectorless developers
+// SPDX-License-Identifier: Apache-2.0
+
+//! Common tools shared between Orchestrator and Worker (find, check, done).
+
+use super::ToolResult;
+
+/// Execute a `find` command — search for a keyword.
+///
+/// Returns formatted search results as feedback text.
+pub fn format_find_result(keyword: &str, hits: &[super::super::context::FindHit]) -> String {
+    if hits.is_empty() {
+        return format!("No results found for '{}'", keyword);
+    }
+
+    let mut output = format!("Results for '{}':\n", keyword);
+    for hit in hits {
+        for entry in &hit.entries {
+            output.push_str(&format!(
+                "  - node (depth {}, weight {:.2})\n",
+                entry.depth, entry.weight
+            ));
+        }
+    }
+    output
+}
+
+/// Execute a `check` command — evaluate evidence sufficiency.
+///
+/// Returns a formatted summary of current evidence for the LLM to evaluate.
+pub fn format_check_prompt(evidence_summary: &str, query: &str) -> String {
+    format!(
+        "Please evaluate whether the collected evidence is sufficient to answer the query.\n\n\
+         Query: {}\n\n\
+         Evidence:\n{}\n\n\
+         Is this sufficient? Answer YES or NO and briefly explain.",
+        query, evidence_summary
+    )
+}
+
+/// Execute a `done` command — signal loop termination.
+pub fn format_done() -> ToolResult {
+    ToolResult::done("Navigation complete.")
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn test_format_find_result_empty() {
+        let result = format_find_result("nonexistent", &[]);
+        assert!(result.contains("No results"));
+    }
+
+    #[test]
+    fn test_format_check_prompt() {
+        let prompt = format_check_prompt("- [Intro] 500 chars", "What is X?");
+        assert!(prompt.contains("What is X?"));
+        assert!(prompt.contains("500 chars"));
+    }
+
+    #[test]
+    fn test_format_done() {
+        let result = format_done();
+        assert!(result.should_stop);
+        assert!(result.success);
+    }
+}
diff --git a/rust/src/agent/tools/mod.rs b/rust/src/agent/tools/mod.rs
new file mode 100644
index 00000000..af900137
--- /dev/null
+++ b/rust/src/agent/tools/mod.rs
@@ -0,0 +1,53 @@
+// Copyright (c) 2026 vectorless developers
+// SPDX-License-Identifier: Apache-2.0
+
+//! Tool definitions for the retrieval agent.
+//!
+//! Tools are organized by role:
+//! - `common` — shared between Orchestrator and Worker (find, check, done)
+//! - `worker` — Worker-specific (ls, cd, cd_up, cat, pwd)
+//! - `orchestrator` — Orchestrator-specific (ls_docs, find_cross, dispatch)
+
+pub mod common;
+pub mod orchestrator;
+pub mod worker;
+
+/// Result of executing a tool command.
+#[derive(Debug, Clone)]
+pub struct ToolResult {
+    /// Text feedback to include in the next LLM prompt.
+    pub feedback: String,
+    /// Whether the loop should stop.
+    pub should_stop: bool,
+    /// Whether the command executed successfully.
+    pub success: bool,
+}
+
+impl ToolResult {
+    /// Create a successful result with feedback.
+    pub fn ok(feedback: impl Into<String>) -> Self {
+        Self {
+            feedback: feedback.into(),
+            should_stop: false,
+            success: true,
+        }
+    }
+
+    /// Create a result that signals loop termination.
+    pub fn done(feedback: impl Into<String>) -> Self {
+        Self {
+            feedback: feedback.into(),
+            should_stop: true,
+            success: true,
+        }
+    }
+
+    /// Create a failed result (parse error, invalid target, etc.).
+    pub fn fail(feedback: impl Into<String>) -> Self {
+        Self {
+            feedback: feedback.into(),
+            should_stop: false,
+            success: false,
+        }
+    }
+}
diff --git a/rust/src/agent/tools/orchestrator.rs b/rust/src/agent/tools/orchestrator.rs
new file mode 100644
index 00000000..73c78aca
--- /dev/null
+++ b/rust/src/agent/tools/orchestrator.rs
@@ -0,0 +1,204 @@
+// Copyright (c) 2026 vectorless developers
+// SPDX-License-Identifier: Apache-2.0
+
+//! Orchestrator tools: ls_docs, find_cross, dispatch.
+
+use super::ToolResult;
+use crate::agent::config::WorkspaceContext;
+
+/// Execute `ls_docs` — list all document cards.
+///
+/// Returns a formatted view of all DocCards for the Orchestrator's Bird's-Eye View.
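+///
+/// Output shape (illustrative — titles and counts mirror the test fixtures below):
+///
+/// ```text
+/// Available documents (2 total):
+///
+/// [1] 2024 Financial Report — Annual financial statements
+///   → Revenue (5 leaves)
+///   Can answer: Revenue?
+///   Topics: finance, 2024
+/// ```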
+pub fn ls_docs(ctx: &WorkspaceContext) -> ToolResult { + let cards = ctx.doc_cards(); + + if cards.is_empty() { + return ToolResult::ok("No documents with DocCards available."); + } + + let mut output = format!("Available documents ({} total):\n\n", ctx.doc_count()); + + for (idx, card) in &cards { + output.push_str(&format!( + "[{}] {} — {}\n", + idx + 1, + card.title, + card.overview + )); + + for sec in &card.sections { + output.push_str(&format!( + " → {} ({} leaves)\n", + sec.title, sec.leaf_count + )); + } + + if !card.question_hints.is_empty() { + output.push_str(&format!( + " Can answer: {}\n", + card.question_hints.join(", ") + )); + } + + if !card.topic_tags.is_empty() { + output.push_str(&format!(" Topics: {}\n", card.topic_tags.join(", "))); + } + + output.push('\n'); + } + + // Also mention docs without cards + let with_cards: Vec = cards.iter().map(|(idx, _)| *idx).collect(); + let without_cards: Vec = (0..ctx.doc_count()) + .filter(|i| !with_cards.contains(i)) + .collect(); + + if !without_cards.is_empty() { + output.push_str(&format!( + "Documents without DocCards: {:?}\n", + without_cards + .iter() + .map(|i| format!("doc_{}", i)) + .collect::>() + )); + } + + ToolResult::ok(output) +} + +/// Execute `find_cross` — search keywords across all documents. +/// +/// Returns formatted results showing which documents matched. +pub fn find_cross(keywords: &[String], ctx: &WorkspaceContext) -> ToolResult { + let results = ctx.find_cross_all(keywords); + + if results.is_empty() { + return ToolResult::ok(format!( + "No matches found for keywords: {}", + keywords.join(", ") + )); + } + + let mut output = String::new(); + for (doc_idx, hits) in &results { + let doc = ctx.doc(*doc_idx); + let doc_name = doc.map(|d| d.doc_name).unwrap_or("unknown"); + output.push_str(&format!("Document [{}] {}:\n", doc_idx + 1, doc_name)); + + for hit in hits { + for entry in &hit.entries { + let title = doc + .and_then(|d| d.node_title(entry.node_id)) + .unwrap_or("unknown"); + let summary = doc + .and_then(|d| d.nav_entry(entry.node_id)) + .map(|e| e.overview.as_str()) + .unwrap_or(""); + output.push_str(&format!( + " keyword '{}' → {} (depth {}, weight {:.2})", + hit.keyword, title, entry.depth, entry.weight + )); + if !summary.is_empty() { + output.push_str(&format!(" — {}", summary)); + } + output.push('\n'); + } + } + output.push('\n'); + } + + ToolResult::ok(output) +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::document::{DocCard, NavigationIndex, ReasoningIndex, SectionCard}; + + fn build_workspace() -> ( + Vec, + Vec, + Vec, + ) { + let tree1 = crate::document::DocumentTree::new("2024 Report", "content"); + let mut nav1 = NavigationIndex::new(); + nav1.set_doc_card(DocCard { + title: "2024 Financial Report".to_string(), + overview: "Annual financial statements".to_string(), + question_hints: vec!["Revenue?".to_string()], + topic_tags: vec!["finance".to_string(), "2024".to_string()], + sections: vec![SectionCard { + title: "Revenue".to_string(), + description: "Revenue breakdown".to_string(), + leaf_count: 5, + }], + total_leaves: 10, + }); + + let tree2 = crate::document::DocumentTree::new("2023 Report", "content"); + let mut nav2 = NavigationIndex::new(); + nav2.set_doc_card(DocCard { + title: "2023 Financial Report".to_string(), + overview: "Previous year financial statements".to_string(), + question_hints: vec!["Sales?".to_string()], + topic_tags: vec!["finance".to_string(), "2023".to_string()], + sections: vec![SectionCard { + title: "Net Sales".to_string(), + description: 
"Net sales figures".to_string(), + leaf_count: 4, + }], + total_leaves: 8, + }); + + ( + vec![tree1, tree2], + vec![nav1, nav2], + vec![ReasoningIndex::default(), ReasoningIndex::default()], + ) + } + + #[test] + fn test_ls_docs_shows_cards() { + let (trees, navs, ridxs) = build_workspace(); + let docs = vec![ + crate::agent::config::DocContext { + tree: &trees[0], + nav_index: &navs[0], + reasoning_index: &ridxs[0], + doc_name: "2024", + }, + crate::agent::config::DocContext { + tree: &trees[1], + nav_index: &navs[1], + reasoning_index: &ridxs[1], + doc_name: "2023", + }, + ]; + let ctx = WorkspaceContext::new(docs); + + let result = ls_docs(&ctx); + assert!(result.success); + assert!(result.feedback.contains("2024 Financial Report")); + assert!(result.feedback.contains("2023 Financial Report")); + assert!(result.feedback.contains("Revenue")); + assert!(result.feedback.contains("finance")); + } + + #[test] + fn test_ls_docs_empty() { + let tree = crate::document::DocumentTree::new("Empty", ""); + let nav = NavigationIndex::new(); + let ridx = ReasoningIndex::default(); + let docs = vec![crate::agent::config::DocContext { + tree: &tree, + nav_index: &nav, + reasoning_index: &ridx, + doc_name: "empty", + }]; + let ctx = WorkspaceContext::new(docs); + + let result = ls_docs(&ctx); + assert!(result.success); + assert!(result.feedback.contains("No documents with DocCards")); + } +} diff --git a/rust/src/agent/tools/worker/cat.rs b/rust/src/agent/tools/worker/cat.rs new file mode 100644 index 00000000..0e13257f --- /dev/null +++ b/rust/src/agent/tools/worker/cat.rs @@ -0,0 +1,121 @@ +// Copyright (c) 2026 vectorless developers +// SPDX-License-Identifier: Apache-2.0 + +//! `cat` — read node content and collect as evidence. + +use crate::agent::command; +use crate::agent::config::{DocContext, Evidence}; +use crate::agent::state::WorkerState; + +use super::super::ToolResult; + +/// Execute `cat ` — read node content and collect as evidence. +/// +/// Special targets: +/// - `cat .` or `cat` (no arg) reads the current node's content. +/// - Otherwise resolves the target to a child node by name. +pub fn cat(target: &str, ctx: &DocContext, state: &mut WorkerState) -> ToolResult { + let node_id = if target == "." || target.is_empty() { + state.current_node + } else { + match command::resolve_target_extended(target, ctx.nav_index, state.current_node, ctx.tree) + { + Some(id) => id, + None => { + return ToolResult::fail(format!( + "Target '{}' not found. Use 'ls' to see children, or 'cat .' to read current node.", + target + )); + } + } + }; + + if state.visited.contains(&node_id) { + let title = ctx.node_title(node_id).unwrap_or("unknown"); + return ToolResult::ok(format!( + "[Already collected: {}]. 
Use a different target or cd to another branch.", + title + )); + } + + match ctx.cat(node_id) { + Some(content) => { + let title = ctx.node_title(node_id).unwrap_or("unknown").to_string(); + let content_string = content.to_string(); + + state.add_evidence(Evidence { + source_path: format!("{}/{}", state.path_str(), title), + node_title: title.clone(), + content: content_string.clone(), + doc_name: Some(ctx.doc_name.to_string()), + }); + + state.visited.insert(node_id); + + let preview = if content_string.len() > 500 { + format!( + "{}...(truncated, {} chars total)", + &content_string[..500], + content_string.len() + ) + } else { + content_string + }; + + ToolResult::ok(format!("[Evidence collected: {}]\n{}", title, preview)) + } + None => ToolResult::fail(format!("No content available for '{}'.", target)), + } +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::document::{ChildRoute, DocumentTree, NavigationIndex, NodeId}; + + fn build_test_tree() -> (DocumentTree, NavigationIndex, NodeId, NodeId, NodeId) { + let mut tree = DocumentTree::new("Root", "root content"); + let root = tree.root(); + let c1 = tree.add_child(root, "Getting Started", "gs content"); + let c2 = tree.add_child(root, "API Reference", "api content"); + + let mut nav = NavigationIndex::new(); + nav.add_child_routes( + root, + vec![ + ChildRoute { + node_id: c1, + title: "Getting Started".to_string(), + description: "Setup guide".to_string(), + leaf_count: 3, + }, + ChildRoute { + node_id: c2, + title: "API Reference".to_string(), + description: "API docs".to_string(), + leaf_count: 7, + }, + ], + ); + + (tree, nav, root, c1, c2) + } + + #[test] + fn test_cat_collects_evidence() { + let (tree, nav, root, _, _) = build_test_tree(); + let ctx = DocContext { + tree: &tree, + nav_index: &nav, + reasoning_index: &crate::document::ReasoningIndex::default(), + doc_name: "test", + }; + let mut state = WorkerState::new(root, 8); + + let result = cat("Getting Started", &ctx, &mut state); + assert!(result.success); + assert!(result.feedback.contains("Evidence collected")); + assert_eq!(state.evidence.len(), 1); + assert_eq!(state.evidence[0].content, "gs content"); + } +} diff --git a/rust/src/agent/tools/worker/cd.rs b/rust/src/agent/tools/worker/cd.rs new file mode 100644 index 00000000..765d6e29 --- /dev/null +++ b/rust/src/agent/tools/worker/cd.rs @@ -0,0 +1,158 @@ +// Copyright (c) 2026 vectorless developers +// SPDX-License-Identifier: Apache-2.0 + +//! `cd`, `cd_absolute`, `cd_up` — navigation commands. + +use crate::agent::command; +use crate::agent::config::DocContext; +use crate::agent::state::WorkerState; + +use super::super::ToolResult; + +/// Execute `cd <target>` — navigate into a child node. +/// +/// Supports: +/// - Relative names (child of current node): `cd "Getting Started"` +/// - Absolute paths starting with `/`: `cd /root/Chapter 1/Section 1.2` +pub fn cd(target: &str, ctx: &DocContext, state: &mut WorkerState) -> ToolResult { + if target.starts_with('/') { + return cd_absolute(target, ctx, state); + } + + match command::resolve_target_extended(target, ctx.nav_index, state.current_node, ctx.tree) { + Some(node_id) => { + let title = ctx.node_title(node_id).unwrap_or(target).to_string(); + state.cd(node_id, &title); + ToolResult::ok(format!("Entered: {}", state.path_str())) + } + None => ToolResult::fail(format!( + "Target '{}' not found. Use ls to see available children.", + target + )), + } +} + +/// Navigate using an absolute path (e.g., `/root/Chapter 1/Section 1.2`). 
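+///
+/// A minimal usage sketch (section names are illustrative, mirroring the tests below):
+///
+/// ```ignore
+/// // A leading '/' routes `cd` through this resolver, segment by segment.
+/// let result = cd("/root/Getting Started", &ctx, &mut state);
+/// assert!(result.success);
+/// assert!(state.path_str().contains("Getting Started"));
+/// ```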
+fn cd_absolute(path: &str, ctx: &DocContext, state: &mut WorkerState) -> ToolResult { + let segments: Vec<&str> = path.split('/').filter(|s| !s.is_empty()).collect(); + + if segments.is_empty() { + return ToolResult::fail("Empty absolute path.".to_string()); + } + + let root = ctx.root(); + let mut current = root; + + let start_idx = if !segments.is_empty() && segments[0].eq_ignore_ascii_case("root") { + 1 + } else { + 0 + }; + + let mut breadcrumb = vec!["root".to_string()]; + + for segment in &segments[start_idx..] { + match command::resolve_target_extended(segment, ctx.nav_index, current, ctx.tree) { + Some(node_id) => { + let title = ctx.node_title(node_id).unwrap_or(*segment).to_string(); + breadcrumb.push(title); + current = node_id; + } + None => { + return ToolResult::fail(format!( + "Path segment '{}' not found. Stopped at: /{}", + segment, + breadcrumb.join("/") + )); + } + } + } + + state.breadcrumb = breadcrumb; + state.current_node = current; + state.visited.insert(current); + + ToolResult::ok(format!("Entered: {}", state.path_str())) +} + +/// Execute `cd ..` — navigate back to parent. +pub fn cd_up(ctx: &DocContext, state: &mut WorkerState) -> ToolResult { + match ctx.parent(state.current_node) { + Some(parent) => { + if state.cd_up(parent) { + ToolResult::ok(format!("Back to: {}", state.path_str())) + } else { + ToolResult::ok("Already at root.".to_string()) + } + } + None => ToolResult::ok("Already at root (no parent).".to_string()), + } +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::document::{ChildRoute, DocumentTree, NavigationIndex, NodeId}; + + fn build_test_tree() -> (DocumentTree, NavigationIndex, NodeId, NodeId, NodeId) { + let mut tree = DocumentTree::new("Root", "root content"); + let root = tree.root(); + let c1 = tree.add_child(root, "Getting Started", "gs content"); + let c2 = tree.add_child(root, "API Reference", "api content"); + + let mut nav = NavigationIndex::new(); + nav.add_child_routes( + root, + vec![ + ChildRoute { + node_id: c1, + title: "Getting Started".to_string(), + description: "Setup guide".to_string(), + leaf_count: 3, + }, + ChildRoute { + node_id: c2, + title: "API Reference".to_string(), + description: "API docs".to_string(), + leaf_count: 7, + }, + ], + ); + + (tree, nav, root, c1, c2) + } + + #[test] + fn test_cd_navigates() { + let (tree, nav, root, c1, _) = build_test_tree(); + let ctx = DocContext { + tree: &tree, + nav_index: &nav, + reasoning_index: &crate::document::ReasoningIndex::default(), + doc_name: "test", + }; + let mut state = WorkerState::new(root, 8); + + let result = cd("Getting Started", &ctx, &mut state); + assert!(result.success); + assert_eq!(state.current_node, c1); + assert!(state.path_str().contains("Getting Started")); + } + + #[test] + fn test_cd_up_goes_back() { + let (tree, nav, root, _c1, _) = build_test_tree(); + let ctx = DocContext { + tree: &tree, + nav_index: &nav, + reasoning_index: &crate::document::ReasoningIndex::default(), + doc_name: "test", + }; + let mut state = WorkerState::new(root, 8); + + cd("Getting Started", &ctx, &mut state); + let result = cd_up(&ctx, &mut state); + assert!(result.success); + assert_eq!(state.current_node, root); + } +} diff --git a/rust/src/agent/tools/worker/find.rs b/rust/src/agent/tools/worker/find.rs new file mode 100644 index 00000000..47912b01 --- /dev/null +++ b/rust/src/agent/tools/worker/find.rs @@ -0,0 +1,128 @@ +// Copyright (c) 2026 vectorless developers +// SPDX-License-Identifier: Apache-2.0 + +//! 
`find_tree` — search for nodes by title pattern across the entire tree. + +use crate::agent::config::DocContext; + +use super::super::ToolResult; + +/// Execute `findtree <pattern>` — search for nodes by title pattern across the entire tree. +/// +/// Returns all nodes whose title contains the pattern (case-insensitive). +pub fn find_tree(pattern: &str, ctx: &DocContext) -> ToolResult { + let pattern_lower = pattern.to_lowercase(); + let all_nodes = ctx.tree.traverse(); + + let mut results = Vec::new(); + for node_id in &all_nodes { + if let Some(node) = ctx.tree.get(*node_id) { + if node.title.to_lowercase().contains(&pattern_lower) { + let depth = ctx.tree.depth(*node_id); + let leaf_count = ctx.nav_entry(*node_id).map(|e| e.leaf_count).unwrap_or(0); + results.push((node.title.clone(), depth, leaf_count)); + } + } + } + + if results.is_empty() { + return ToolResult::ok(format!("No nodes matching '{}'.", pattern)); + } + + let mut output = format!("Nodes matching '{}' ({} found):\n", pattern, results.len()); + for (title, depth, leaves) in &results { + output.push_str(&format!( + " - {} (depth {}, {} leaves)\n", + title, depth, leaves + )); + } + + ToolResult::ok(output) +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::agent::config::DocContext; + use crate::document::{ChildRoute, DocumentTree, NavigationIndex, NodeId}; + + fn build_rich_tree() -> (DocumentTree, NavigationIndex, NodeId) { + let mut tree = DocumentTree::new( + "Root", + "Welcome to the financial report.\nThis document covers 2024 and 2023 figures.", + ); + let root = tree.root(); + let c1 = tree.add_child( + root, + "Revenue", + "Total revenue in 2024 was $10.2M.\nQ1 revenue: $2.5M\nQ2 revenue: $2.8M\nEBITDA margin: 32%", + ); + let c2 = tree.add_child( + root, + "Expenses", + "Operating expenses totaled $6.8M.\nR&D spending: $3.1M\nMarketing: $1.2M", + ); + + let mut nav = NavigationIndex::new(); + nav.add_child_routes( + root, + vec![ + ChildRoute { + node_id: c1, + title: "Revenue".to_string(), + description: "Revenue breakdown".to_string(), + leaf_count: 2, + }, + ChildRoute { + node_id: c2, + title: "Expenses".to_string(), + description: "Cost analysis".to_string(), + leaf_count: 2, + }, + ], + ); + + (tree, nav, root) + } + + macro_rules! rich_ctx { + ($tree:expr, $nav:expr) => { + DocContext { + tree: &$tree, + nav_index: &$nav, + reasoning_index: &crate::document::ReasoningIndex::default(), + doc_name: "test", + } + }; + } + + #[test] + fn test_find_tree() { + let (tree, nav, _root) = build_rich_tree(); + let ctx = rich_ctx!(tree, nav); + + let result = find_tree("revenue", &ctx); + assert!(result.success); + assert!(result.feedback.contains("Revenue")); + } + + #[test] + fn test_find_tree_case_insensitive() { + let (tree, nav, _root) = build_rich_tree(); + let ctx = rich_ctx!(tree, nav); + + let result = find_tree("EXPENSE", &ctx); + assert!(result.success); + assert!(result.feedback.contains("Expenses")); + } + + #[test] + fn test_find_tree_no_match() { + let (tree, nav, _root) = build_rich_tree(); + let ctx = rich_ctx!(tree, nav); + + let result = find_tree("nonexistent_xyz", &ctx); + assert!(result.success); + assert!(result.feedback.contains("No nodes matching")); + } +} diff --git a/rust/src/agent/tools/worker/grep.rs b/rust/src/agent/tools/worker/grep.rs new file mode 100644 index 00000000..6dc5c2c7 --- /dev/null +++ b/rust/src/agent/tools/worker/grep.rs @@ -0,0 +1,180 @@ +// Copyright (c) 2026 vectorless developers +// SPDX-License-Identifier: Apache-2.0 + +//! 
`grep` — regex search across all node content in the current subtree. + +use crate::agent::config::DocContext; +use crate::agent::state::WorkerState; + +use super::super::ToolResult; +use super::collect_subtree; + +/// Execute `grep <pattern>` — regex search across all node content in the current subtree. +/// +/// Searches content of the current node and all descendants. Returns matching lines +/// with their node titles, capped at 30 matches to avoid overwhelming feedback. +pub fn grep(pattern: &str, ctx: &DocContext, state: &WorkerState) -> ToolResult { + let re = match regex::Regex::new(pattern) { + Ok(re) => re, + Err(e) => return ToolResult::fail(format!("Invalid regex '{}': {}", pattern, e)), + }; + + let subtree = collect_subtree(state.current_node, ctx.tree); + let mut matches_found = 0; + let mut output = String::new(); + let max_matches = 30; + + for node_id in &subtree { + if matches_found >= max_matches { + output.push_str("\n... (truncated, more matches available)"); + break; + } + + let content = match ctx.cat(*node_id) { + Some(c) if !c.is_empty() => c, + _ => continue, + }; + + let title = ctx.node_title(*node_id).unwrap_or("?"); + + for line in content.lines() { + if matches_found >= max_matches { + break; + } + if re.is_match(line) { + let preview = if line.len() > 120 { + format!("{}...", &line[..120]) + } else { + line.to_string() + }; + output.push_str(&format!("[{}] {}\n", title, preview)); + matches_found += 1; + } + } + } + + if matches_found == 0 { + ToolResult::ok(format!("No matches for /{}/ in subtree.", pattern)) + } else { + ToolResult::ok(format!( + "Found {} match(es) for /{}/:\n{}", + matches_found, pattern, output + )) + } +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::agent::config::DocContext; + use crate::agent::state::WorkerState; + use crate::document::{ChildRoute, DocumentTree, NavigationIndex, NodeId}; + + fn build_rich_tree() -> (DocumentTree, NavigationIndex, NodeId) { + let mut tree = DocumentTree::new( + "Root", + "Welcome to the financial report.\nThis document covers 2024 and 2023 figures.", + ); + let root = tree.root(); + let c1 = tree.add_child( + root, + "Revenue", + "Total revenue in 2024 was $10.2M.\nQ1 revenue: $2.5M\nQ2 revenue: $2.8M\nEBITDA margin: 32%", + ); + let c2 = tree.add_child( + root, + "Expenses", + "Operating expenses totaled $6.8M.\nR&D spending: $3.1M\nMarketing: $1.2M", + ); + + let mut nav = NavigationIndex::new(); + nav.add_child_routes( + root, + vec![ + ChildRoute { + node_id: c1, + title: "Revenue".to_string(), + description: "Revenue breakdown".to_string(), + leaf_count: 2, + }, + ChildRoute { + node_id: c2, + title: "Expenses".to_string(), + description: "Cost analysis".to_string(), + leaf_count: 2, + }, + ], + ); + + (tree, nav, root) + } + + macro_rules! 
rich_ctx { + ($tree:expr, $nav:expr) => { + DocContext { + tree: &$tree, + nav_index: &$nav, + reasoning_index: &crate::document::ReasoningIndex::default(), + doc_name: "test", + } + }; + } + + #[test] + fn test_grep_finds_matches() { + let (tree, nav, root) = build_rich_tree(); + let ctx = rich_ctx!(tree, nav); + let state = WorkerState::new(root, 8); + + let result = grep("revenue", &ctx, &state); + assert!(result.success); + assert!(result.feedback.contains("revenue")); + assert!(result.feedback.contains("[Revenue]")); + } + + #[test] + fn test_grep_regex() { + let (tree, nav, root) = build_rich_tree(); + let ctx = rich_ctx!(tree, nav); + let state = WorkerState::new(root, 8); + + let result = grep("EBITDA|\\$\\d+", &ctx, &state); + assert!(result.success); + assert!(result.feedback.contains("EBITDA")); + assert!(result.feedback.contains("$10")); + } + + #[test] + fn test_grep_no_matches() { + let (tree, nav, root) = build_rich_tree(); + let ctx = rich_ctx!(tree, nav); + let state = WorkerState::new(root, 8); + + let result = grep("nonexistent_term_xyz", &ctx, &state); + assert!(result.success); + assert!(result.feedback.contains("No matches")); + } + + #[test] + fn test_grep_invalid_regex() { + let (tree, nav, root) = build_rich_tree(); + let ctx = rich_ctx!(tree, nav); + let state = WorkerState::new(root, 8); + + let result = grep("[invalid", &ctx, &state); + assert!(!result.success); + assert!(result.feedback.contains("Invalid regex")); + } + + #[test] + fn test_grep_subtree_only() { + let (tree, nav, root) = build_rich_tree(); + let ctx = rich_ctx!(tree, nav); + let mut state = WorkerState::new(root, 8); + + crate::agent::tools::worker::cd::cd("Expenses", &ctx, &mut state); + let result = grep("revenue", &ctx, &state); + assert!(result.success); + assert!(result.feedback.contains("No matches")); + } +} diff --git a/rust/src/agent/tools/worker/head.rs b/rust/src/agent/tools/worker/head.rs new file mode 100644 index 00000000..06dd4432 --- /dev/null +++ b/rust/src/agent/tools/worker/head.rs @@ -0,0 +1,119 @@ +// Copyright (c) 2026 vectorless developers +// SPDX-License-Identifier: Apache-2.0 + +//! `head` — preview first N lines of a node without collecting evidence. + +use crate::agent::command; +use crate::agent::config::DocContext; +use crate::agent::state::WorkerState; + +use super::super::ToolResult; + +/// Execute `head ` — preview first N lines of a node without collecting evidence. +pub fn head(target: &str, lines: usize, ctx: &DocContext, state: &WorkerState) -> ToolResult { + let node_id = + match command::resolve_target_extended(target, ctx.nav_index, state.current_node, ctx.tree) + { + Some(id) => id, + None => { + return ToolResult::fail(format!( + "Target '{}' not found. Use ls to see available children.", + target + )); + } + }; + + let content = match ctx.cat(node_id) { + Some(c) => c, + None => return ToolResult::fail(format!("No content for '{}'.", target)), + }; + + let title = ctx.node_title(node_id).unwrap_or("unknown"); + let total_lines = content.lines().count(); + let preview: Vec<&str> = content.lines().take(lines).collect(); + + let mut output = format!( + "[Preview: {} — showing {}/{} lines]\n", + title, + preview.len().min(lines), + total_lines + ); + output.push_str(&preview.join("\n")); + + if total_lines > lines { + output.push_str(&format!( + "\n... 
({} more lines, use cat to read all)", + total_lines - lines + )); + } + + ToolResult::ok(output) +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::agent::config::DocContext; + use crate::agent::state::WorkerState; + use crate::document::{ChildRoute, DocumentTree, NavigationIndex, NodeId}; + + fn build_rich_tree() -> (DocumentTree, NavigationIndex, NodeId) { + let mut tree = DocumentTree::new( + "Root", + "Welcome to the financial report.\nThis document covers 2024 and 2023 figures.", + ); + let root = tree.root(); + let c1 = tree.add_child( + root, + "Revenue", + "Total revenue in 2024 was $10.2M.\nQ1 revenue: $2.5M\nQ2 revenue: $2.8M\nEBITDA margin: 32%", + ); + + let mut nav = NavigationIndex::new(); + nav.add_child_routes( + root, + vec![ChildRoute { + node_id: c1, + title: "Revenue".to_string(), + description: "Revenue breakdown".to_string(), + leaf_count: 2, + }], + ); + + (tree, nav, root) + } + + macro_rules! rich_ctx { + ($tree:expr, $nav:expr) => { + DocContext { + tree: &$tree, + nav_index: &$nav, + reasoning_index: &crate::document::ReasoningIndex::default(), + doc_name: "test", + } + }; + } + + #[test] + fn test_head_preview() { + let (tree, nav, root) = build_rich_tree(); + let ctx = rich_ctx!(tree, nav); + let state = WorkerState::new(root, 8); + + let result = head("Revenue", 2, &ctx, &state); + assert!(result.success); + assert!(result.feedback.contains("Preview")); + assert!(result.feedback.contains("$10.2M")); + assert!(result.feedback.contains("2/4 lines")); + } + + #[test] + fn test_head_not_found() { + let (tree, nav, root) = build_rich_tree(); + let ctx = rich_ctx!(tree, nav); + let state = WorkerState::new(root, 8); + + let result = head("NonExistent", 10, &ctx, &state); + assert!(!result.success); + } +} diff --git a/rust/src/agent/tools/worker/ls.rs b/rust/src/agent/tools/worker/ls.rs new file mode 100644 index 00000000..256762ef --- /dev/null +++ b/rust/src/agent/tools/worker/ls.rs @@ -0,0 +1,113 @@ +// Copyright (c) 2026 vectorless developers +// SPDX-License-Identifier: Apache-2.0 + +//! `ls` — list children of the current node. + +use crate::agent::config::DocContext; +use crate::agent::state::WorkerState; + +use super::super::ToolResult; + +/// Execute `ls` — list children of the current node. +pub fn ls(ctx: &DocContext, state: &WorkerState) -> ToolResult { + let mut output = String::new(); + + if let Some(entry) = ctx.nav_entry(state.current_node) { + output.push_str(&format!("Current section: {}\n", entry.overview)); + if !entry.question_hints.is_empty() { + output.push_str(&format!( + "Can answer: {}\n", + entry.question_hints.join(", ") + )); + } + output.push('\n'); + } + + match ctx.ls(state.current_node) { + Some(routes) => { + if routes.is_empty() { + output + .push_str("(leaf node — no children)\nUse cd .. to go back or done to finish."); + return ToolResult::ok(output); + } + + for (i, route) in routes.iter().enumerate() { + output.push_str(&format!( + "[{}] {} — {} ({} leaves)", + i + 1, + route.title, + route.description, + route.leaf_count + )); + if let Some(nav) = ctx.nav_entry(route.node_id) { + if !nav.question_hints.is_empty() { + output.push_str(&format!( + "\n Can answer: {}", + nav.question_hints.join(", ") + )); + } + if !nav.topic_tags.is_empty() { + output.push_str(&format!("\n Topics: {}", nav.topic_tags.join(", "))); + } + } + output.push('\n'); + } + ToolResult::ok(output) + } + None => { + output.push_str("(no navigation data for this node)\nUse cat to read content or cd .. 
to go back."); + ToolResult::ok(output) + } + } +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::document::{ChildRoute, DocumentTree, NavigationIndex, NodeId}; + + fn build_test_tree() -> (DocumentTree, NavigationIndex, NodeId, NodeId, NodeId) { + let mut tree = DocumentTree::new("Root", "root content"); + let root = tree.root(); + let c1 = tree.add_child(root, "Getting Started", "gs content"); + let c2 = tree.add_child(root, "API Reference", "api content"); + + let mut nav = NavigationIndex::new(); + nav.add_child_routes( + root, + vec![ + ChildRoute { + node_id: c1, + title: "Getting Started".to_string(), + description: "Setup guide".to_string(), + leaf_count: 3, + }, + ChildRoute { + node_id: c2, + title: "API Reference".to_string(), + description: "API docs".to_string(), + leaf_count: 7, + }, + ], + ); + + (tree, nav, root, c1, c2) + } + + #[test] + fn test_ls_shows_children() { + let (tree, nav, root, _, _) = build_test_tree(); + let ctx = DocContext { + tree: &tree, + nav_index: &nav, + reasoning_index: &crate::document::ReasoningIndex::default(), + doc_name: "test", + }; + let state = WorkerState::new(root, 8); + + let result = ls(&ctx, &state); + assert!(result.success); + assert!(result.feedback.contains("Getting Started")); + assert!(result.feedback.contains("API Reference")); + } +} diff --git a/rust/src/agent/tools/worker/mod.rs b/rust/src/agent/tools/worker/mod.rs new file mode 100644 index 00000000..eb73d34f --- /dev/null +++ b/rust/src/agent/tools/worker/mod.rs @@ -0,0 +1,39 @@ +// Copyright (c) 2026 vectorless developers +// SPDX-License-Identifier: Apache-2.0 + +//! Worker tools: ls, cd, cd_up, cat, pwd, grep, head, find_tree, wc. + +mod cat; +mod cd; +mod find; +mod grep; +mod head; +mod ls; +mod pwd; +mod wc; + +pub use cat::cat; +pub use cd::{cd, cd_up}; +pub use find::find_tree; +pub use grep::grep; +pub use head::head; +pub use ls::ls; +pub use pwd::pwd; +pub use wc::wc; + +use crate::document::{DocumentTree, NodeId}; + +/// Collect all NodeIds in the subtree rooted at `node` (inclusive). +pub(super) fn collect_subtree(node: NodeId, tree: &DocumentTree) -> Vec { + let mut result = vec![node]; + let mut stack = vec![node]; + + while let Some(current) = stack.pop() { + for child in tree.children_iter(current) { + result.push(child); + stack.push(child); + } + } + + result +} diff --git a/rust/src/agent/tools/worker/pwd.rs b/rust/src/agent/tools/worker/pwd.rs new file mode 100644 index 00000000..4f71a7ee --- /dev/null +++ b/rust/src/agent/tools/worker/pwd.rs @@ -0,0 +1,58 @@ +// Copyright (c) 2026 vectorless developers +// SPDX-License-Identifier: Apache-2.0 + +//! `pwd` — show current navigation path. + +use crate::agent::state::WorkerState; + +use super::super::ToolResult; + +/// Execute `pwd` — show current navigation path. 
+pub fn pwd(state: &WorkerState) -> ToolResult { + ToolResult::ok(format!("Current path: {}", state.path_str())) +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::agent::config::DocContext; + use crate::agent::tools::worker::cd::cd; + use crate::document::{ChildRoute, DocumentTree, NavigationIndex}; + + fn build_test_tree() -> (DocumentTree, NavigationIndex) { + let mut tree = DocumentTree::new("Root", "root content"); + let root = tree.root(); + let c1 = tree.add_child(root, "API Reference", "api content"); + + let mut nav = NavigationIndex::new(); + nav.add_child_routes( + root, + vec![ChildRoute { + node_id: c1, + title: "API Reference".to_string(), + description: "API docs".to_string(), + leaf_count: 7, + }], + ); + + (tree, nav) + } + + #[test] + fn test_pwd() { + let (tree, nav) = build_test_tree(); + let root = tree.root(); + let ctx = DocContext { + tree: &tree, + nav_index: &nav, + reasoning_index: &crate::document::ReasoningIndex::default(), + doc_name: "test", + }; + let mut state = WorkerState::new(root, 8); + cd("API Reference", &ctx, &mut state); + + let result = pwd(&state); + assert!(result.success); + assert!(result.feedback.contains("API Reference")); + } +} diff --git a/rust/src/agent/tools/worker/wc.rs b/rust/src/agent/tools/worker/wc.rs new file mode 100644 index 00000000..ac37f298 --- /dev/null +++ b/rust/src/agent/tools/worker/wc.rs @@ -0,0 +1,109 @@ +// Copyright (c) 2026 vectorless developers +// SPDX-License-Identifier: Apache-2.0 + +//! `wc` — show node content statistics. + +use crate::agent::command; +use crate::agent::config::DocContext; +use crate::agent::state::WorkerState; + +use super::super::ToolResult; + +/// Execute `wc <target>` — show node content statistics. +pub fn wc(target: &str, ctx: &DocContext, state: &WorkerState) -> ToolResult { + let node_id = + match command::resolve_target_extended(target, ctx.nav_index, state.current_node, ctx.tree) + { + Some(id) => id, + None => { + return ToolResult::fail(format!( + "Target '{}' not found. Use ls to see available children.", + target + )); + } + }; + + let content = match ctx.cat(node_id) { + Some(c) => c, + None => return ToolResult::fail(format!("No content for '{}'.", target)), + }; + + let title = ctx.node_title(node_id).unwrap_or("unknown"); + let lines = content.lines().count(); + let words = content.split_whitespace().count(); + let chars = content.len(); + + ToolResult::ok(format!( + "[{}] {} lines, {} words, {} chars", + title, lines, words, chars + )) +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::agent::config::DocContext; + use crate::agent::state::WorkerState; + use crate::document::{ChildRoute, DocumentTree, NavigationIndex, NodeId}; + + fn build_rich_tree() -> (DocumentTree, NavigationIndex, NodeId) { + let mut tree = DocumentTree::new( + "Root", + "Welcome to the financial report.\nThis document covers 2024 and 2023 figures.", + ); + let root = tree.root(); + let c1 = tree.add_child( + root, + "Revenue", + "Total revenue in 2024 was $10.2M.\nQ1 revenue: $2.5M\nQ2 revenue: $2.8M\nEBITDA margin: 32%", + ); + + let mut nav = NavigationIndex::new(); + nav.add_child_routes( + root, + vec![ChildRoute { + node_id: c1, + title: "Revenue".to_string(), + description: "Revenue breakdown".to_string(), + leaf_count: 2, + }], + ); + + (tree, nav, root) + } + + macro_rules! 
rich_ctx { + ($tree:expr, $nav:expr) => { + DocContext { + tree: &$tree, + nav_index: &$nav, + reasoning_index: &crate::document::ReasoningIndex::default(), + doc_name: "test", + } + }; + } + + #[test] + fn test_wc_stats() { + let (tree, nav, root) = build_rich_tree(); + let ctx = rich_ctx!(tree, nav); + let state = WorkerState::new(root, 8); + + let result = wc("Revenue", &ctx, &state); + assert!(result.success); + assert!(result.feedback.contains("Revenue")); + assert!(result.feedback.contains("lines")); + assert!(result.feedback.contains("words")); + assert!(result.feedback.contains("chars")); + } + + #[test] + fn test_wc_not_found() { + let (tree, nav, root) = build_rich_tree(); + let ctx = rich_ctx!(tree, nav); + let state = WorkerState::new(root, 8); + + let result = wc("NonExistent", &ctx, &state); + assert!(!result.success); + } +} diff --git a/rust/src/agent/worker/execute.rs b/rust/src/agent/worker/execute.rs new file mode 100644 index 00000000..be767299 --- /dev/null +++ b/rust/src/agent/worker/execute.rs @@ -0,0 +1,240 @@ +// Copyright (c) 2026 vectorless developers +// SPDX-License-Identifier: Apache-2.0 + +//! Command execution — dispatch parsed Command to tool functions. + +use tracing::{info, warn}; + +use crate::llm::LlmClient; + +use super::super::command::{Command, parse_command}; +use super::super::config::{DocContext, Step}; +use super::super::events::EventEmitter; +use super::super::prompts::{check_sufficiency, parse_sufficiency_response}; +use super::super::state::WorkerState; +use super::super::tools::worker as tools; + +/// Execute a single parsed command, mutating state. +/// +/// Returns a `Step` indicating whether to continue or stop. +pub async fn execute_command( + command: &Command, + ctx: &DocContext<'_>, + state: &mut WorkerState, + query: &str, + llm: &LlmClient, + llm_calls: &mut u32, + emitter: &EventEmitter, +) -> Step { + info!( + doc = ctx.doc_name, + command = ?command, + "Executing tool" + ); + match command { + Command::Ls => { + let result = tools::ls(ctx, state); + info!(doc = ctx.doc_name, feedback = %truncate_log(&result.feedback), "ls result"); + state.set_feedback(result.feedback); + Step::Continue + } + + Command::Cd { target } => { + let result = tools::cd(target, ctx, state); + info!(doc = ctx.doc_name, target, feedback = %truncate_log(&result.feedback), "cd result"); + state.set_feedback(result.feedback); + Step::Continue + } + + Command::CdUp => { + let result = tools::cd_up(ctx, state); + info!(doc = ctx.doc_name, feedback = %truncate_log(&result.feedback), "cd_up result"); + state.set_feedback(result.feedback); + Step::Continue + } + + Command::Cat { target } => { + let evidence_before = state.evidence.len(); + let result = tools::cat(target, ctx, state); + info!(doc = ctx.doc_name, target, feedback = %truncate_log(&result.feedback), "cat result"); + state.set_feedback(result.feedback); + if state.evidence.len() > evidence_before { + if let Some(ev) = state.evidence.last() { + info!( + doc = ctx.doc_name, + node = %ev.node_title, + path = %ev.source_path, + len = ev.content.len(), + total = state.evidence.len(), + "Evidence collected" + ); + emitter.emit_evidence( + ctx.doc_name, + &ev.node_title, + &ev.source_path, + ev.content.len(), + state.evidence.len(), + ); + } + } + Step::Continue + } + + Command::Find { keyword } => { + let feedback = match ctx.find(keyword) { + Some(hit) => { + let mut entries = hit.entries.clone(); + entries.sort_by(|a, b| { + b.weight + .partial_cmp(&a.weight) + .unwrap_or(std::cmp::Ordering::Equal) + }); + 
let mut seen_nodes = std::collections::HashSet::new(); + let mut output = format!("Results for '{}':\n", keyword); + for entry in &entries { + if !seen_nodes.insert(entry.node_id) { + continue; + } + let title = ctx.node_title(entry.node_id).unwrap_or("unknown"); + let summary = ctx + .nav_entry(entry.node_id) + .map(|e| e.overview.as_str()) + .unwrap_or(""); + output.push_str(&format!( + " - {} (depth {}, weight {:.2})", + title, entry.depth, entry.weight + )); + if !summary.is_empty() { + output.push_str(&format!(" — {}", summary)); + } + output.push('\n'); + } + output + } + None => format!("No results for '{}'", keyword), + }; + info!(doc = ctx.doc_name, keyword, feedback = %truncate_log(&feedback), "find result"); + state.set_feedback(feedback); + Step::Continue + } + + Command::Pwd => { + let result = tools::pwd(state); + state.set_feedback(result.feedback); + Step::Continue + } + + Command::Check => { + let evidence_summary = state.evidence_summary(); + + let (system, user) = check_sufficiency(query, &evidence_summary); + + info!( + doc = ctx.doc_name, + system = %system, + user = %user, + "Check prompt" + ); + + match llm.complete(&system, &user).await { + Ok(response) => { + *llm_calls += 1; + state.check_count += 1; + let sufficient = parse_sufficiency_response(&response); + info!( + doc = ctx.doc_name, + sufficient, + evidence = state.evidence.len(), + response = %response, + "Sufficiency check" + ); + emitter.emit_worker_sufficiency_check( + ctx.doc_name, + sufficient, + state.evidence.len(), + None, + ); + if sufficient { + state.last_feedback = + "Evidence is sufficient. Use done to finish.".to_string(); + Step::Done + } else { + let reason = response + .trim() + .strip_prefix("INSUFFICIENT") + .unwrap_or(response.trim()) + .trim() + .trim_start_matches(|c: char| c == '-' || c == ' '); + if !reason.is_empty() { + state.missing_info = reason.to_string(); + } + state.set_feedback(format!( + "Evidence not yet sufficient: {}", + response.trim() + )); + Step::Continue + } + } + Err(e) => { + warn!(error = %e, "Check LLM call failed"); + state.last_feedback = "Could not evaluate sufficiency.".to_string(); + Step::Continue + } + } + } + + Command::Done => { + state.last_feedback = "Navigation complete.".to_string(); + Step::Done + } + + Command::Grep { pattern } => { + let result = tools::grep(pattern, ctx, state); + info!(doc = ctx.doc_name, pattern, feedback = %truncate_log(&result.feedback), "grep result"); + state.set_feedback(result.feedback); + Step::Continue + } + + Command::Head { target, lines } => { + let result = tools::head(target, *lines, ctx, state); + info!(doc = ctx.doc_name, target, lines, feedback = %truncate_log(&result.feedback), "head result"); + state.set_feedback(result.feedback); + Step::Continue + } + + Command::FindTree { pattern } => { + let result = tools::find_tree(pattern, ctx); + info!(doc = ctx.doc_name, pattern, feedback = %truncate_log(&result.feedback), "find_tree result"); + state.set_feedback(result.feedback); + Step::Continue + } + + Command::Wc { target } => { + let result = tools::wc(target, ctx, state); + info!(doc = ctx.doc_name, target, feedback = %truncate_log(&result.feedback), "wc result"); + state.set_feedback(result.feedback); + Step::Continue + } + } +} + +/// Truncate feedback for log output — keep first 300 chars to avoid noisy logs. 
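+///
+/// A small sketch of the two paths (borrowed when short, owned when truncated):
+///
+/// ```ignore
+/// assert_eq!(truncate_log("short"), "short");
+/// assert!(truncate_log(&"x".repeat(400)).ends_with("chars total)"));
+/// ```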
+fn truncate_log(s: &str) -> std::borrow::Cow<'_, str> { + const MAX: usize = 300; + if s.len() <= MAX { + std::borrow::Cow::Borrowed(s) + } else { + std::borrow::Cow::Owned(format!("{}...(truncated, {} chars total)", &s[..MAX], s.len())) + } +} + +/// Parse the LLM output and detect parse failures. +/// +/// Returns `(command, is_parse_failure)`. +pub fn parse_and_detect_failure(llm_output: &str) -> (Command, bool) { + let command = parse_command(llm_output); + let trimmed = llm_output.trim(); + let is_parse_failure = + matches!(command, Command::Ls) && !trimmed.starts_with("ls") && !trimmed.is_empty(); + (command, is_parse_failure) +} diff --git a/rust/src/agent/worker/format.rs b/rust/src/agent/worker/format.rs new file mode 100644 index 00000000..be9e029f --- /dev/null +++ b/rust/src/agent/worker/format.rs @@ -0,0 +1,20 @@ +// Copyright (c) 2026 vectorless developers +// SPDX-License-Identifier: Apache-2.0 + +//! Formatting helpers for Worker prompts. + +use super::super::config::DocContext; +use super::super::state::WorkerState; + +/// Resolve visited NodeIds to their titles for prompt injection. +pub fn format_visited_titles(state: &WorkerState, ctx: &DocContext<'_>) -> String { + if state.visited.is_empty() { + return "(none)".to_string(); + } + state + .visited + .iter() + .filter_map(|&node_id| ctx.node_title(node_id).map(|t| t.to_string())) + .collect::<Vec<_>>() + .join(", ") +} diff --git a/rust/src/agent/worker/mod.rs b/rust/src/agent/worker/mod.rs new file mode 100644 index 00000000..8d913a3f --- /dev/null +++ b/rust/src/agent/worker/mod.rs @@ -0,0 +1,352 @@ +// Copyright (c) 2026 vectorless developers +// SPDX-License-Identifier: Apache-2.0 + +//! Worker agent — document navigation and evidence collection. +//! +//! The Worker is a consuming-self struct implementing [`Agent`]: +//! 1. Bird's-eye: ls(root) for initial overview +//! 2. Navigation planning: LLM generates a plan (keyword hits as context) +//! 3. Navigation loop: LLM → parse → execute → repeat (max N rounds) +//! +//! Dispatched by the Orchestrator, one per document. +//! Returns raw evidence — no answer synthesis. Rerank owns all answer generation. + +mod execute; +mod format; +mod planning; + +use tracing::{debug, info}; + +use super::Agent; +use super::command::Command; +use super::config::{DocContext, Step, WorkerConfig, WorkerOutput}; +use super::context::FindHit; +use super::events::EventEmitter; +use super::prompts::{NavigationParams, worker_dispatch, worker_navigation}; +use super::state::WorkerState; +use super::tools::worker as tools; +use crate::error::Error; +use crate::llm::LlmClient; +use crate::query::QueryPlan; +use crate::scoring::bm25::extract_keywords; + +use execute::{execute_command, parse_and_detect_failure}; +use format::format_visited_titles; +use planning::{build_plan_prompt, build_replan_prompt, format_keyword_hints}; + +/// Worker agent — navigates a single document to collect evidence. +/// +/// Holds all execution context. Calling [`run()`](Agent::run) consumes self. +pub struct Worker<'a> { + query: String, + task: Option<String>, + ctx: &'a DocContext<'a>, + config: WorkerConfig, + llm: LlmClient, + emitter: EventEmitter, + query_plan: QueryPlan, +} + +impl<'a> Worker<'a> { + /// Create a new Worker. 
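+ ///
+ /// A construction sketch (all bindings hypothetical; the Orchestrator does the real wiring):
+ ///
+ /// ```ignore
+ /// let worker = Worker::new(&query, Some(&task), &doc_ctx, config, llm, emitter, query_plan);
+ /// let output = worker.run().await?; // consumes the Worker, yields WorkerOutput
+ /// ```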
+ pub fn new( + query: &str, + task: Option<&str>, + ctx: &'a DocContext<'a>, + config: WorkerConfig, + llm: LlmClient, + emitter: EventEmitter, + query_plan: QueryPlan, + ) -> Self { + Self { + query: query.to_string(), + task: task.map(|s| s.to_string()), + ctx, + config, + llm, + emitter, + query_plan, + } + } +} + +impl<'a> Agent for Worker<'a> { + type Output = WorkerOutput; + + fn name(&self) -> &str { + "worker" + } + + async fn run(self) -> crate::error::Result<Self::Output> { + let Worker { + query, + task, + ctx, + config, + llm, + emitter, + query_plan, + } = self; + let task_ref = task.as_deref(); + + let intent_context = format!("{} — {}", query_plan.intent, query_plan.strategy_hint); + + emitter.emit_worker_started(ctx.doc_name, task_ref, config.max_rounds); + + info!( + doc = ctx.doc_name, + task = task_ref.unwrap_or("(full query)"), + max_rounds = config.max_rounds, + max_llm_calls = config.max_llm_calls, + "Worker starting" + ); + + let mut llm_calls: u32 = 0; + let max_llm = config.max_llm_calls; + + macro_rules! llm_budget_exhausted { + () => { + max_llm > 0 && llm_calls >= max_llm + }; + } + + // Gather keyword hits as context for LLM planning (not routing rules) + let keywords = extract_keywords(&query); + let index_hits: Vec<FindHit> = ctx.find_all(&keywords); + if !index_hits.is_empty() { + debug!( + doc = ctx.doc_name, + hit_count = index_hits.len(), + "ReasoningIndex keyword hits available for planning" + ); + } + + // --- Phase 1: Bird's-eye view --- + let mut state = WorkerState::new(ctx.root(), config.max_rounds); + let ls_result = tools::ls(ctx, &state); + state.set_feedback(ls_result.feedback); + + // --- Phase 1.5: Navigation planning --- + if state.remaining > 0 && !llm_budget_exhausted!() { + let plan_prompt = build_plan_prompt( + &query, + task_ref, + &state.last_feedback, + ctx.doc_name, + &index_hits, + ctx, + query_plan.intent, + ); + let plan_output = llm + .complete(&plan_prompt.0, &plan_prompt.1) + .await + .map_err(|e| Error::LlmReasoning { + stage: "worker/plan".to_string(), + detail: format!("Navigation plan LLM call failed: {e}"), + })?; + llm_calls += 1; + let plan_text = plan_output.trim().to_string(); + if !plan_text.is_empty() { + info!( + doc = ctx.doc_name, + plan = %plan_text, + "Navigation plan generated" + ); + emitter.emit_worker_plan_generated(ctx.doc_name, plan_text.len()); + state.plan = plan_text; + state.plan_generated = true; + } + } + + // --- Phase 2: Navigation loop --- + let use_dispatch_prompt = task_ref.is_some(); + let keyword_hints = format_keyword_hints(&index_hits, ctx); + + loop { + if state.remaining == 0 { + info!(doc = ctx.doc_name, "Navigation budget exhausted"); + break; + } + if llm_budget_exhausted!() { + info!( + doc = ctx.doc_name, + llm_calls, max_llm, "LLM call budget exhausted" + ); + break; + } + + // Build prompt + let (system, user) = if use_dispatch_prompt && state.remaining == config.max_rounds { + worker_dispatch(&super::prompts::WorkerDispatchParams { + original_query: &query, + task: task_ref.unwrap_or(&query), + doc_name: ctx.doc_name, + breadcrumb: &state.path_str(), + }) + } else { + let visited_titles = format_visited_titles(&state, ctx); + worker_navigation(&NavigationParams { + query: &query, + task: task_ref, + breadcrumb: &state.path_str(), + evidence_summary: &state.evidence_summary(), + missing_info: &state.missing_info, + last_feedback: &state.last_feedback, + remaining: state.remaining, + max_rounds: state.max_rounds, + history: &state.history_text(), + visited_titles: &visited_titles, + plan: &state.plan, 
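+ // The remaining fields surface the query-plan intent and keyword-index hints computed above.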
intent_context: &intent_context, + keyword_hints: &keyword_hints, + }) + }; + + // LLM decision + let round_num = config.max_rounds - state.remaining + 1; + let round_start = std::time::Instant::now(); + let llm_output = + llm.complete(&system, &user) + .await + .map_err(|e| Error::LlmReasoning { + stage: "worker/navigation".to_string(), + detail: format!( + "Nav loop LLM call failed (round {round_num}): {e}" + ), + })?; + llm_calls += 1; + + // Parse command + if llm_output.trim().len() < 2 { + tracing::warn!( + doc = ctx.doc_name, + round = config.max_rounds - state.remaining + 1, + response = llm_output.trim(), + "LLM response unusually short" + ); + } + let (command, is_parse_failure) = parse_and_detect_failure(&llm_output); + if is_parse_failure { + let raw_preview = if llm_output.trim().len() > 200 { + format!("{}...", &llm_output.trim()[..200]) + } else { + llm_output.trim().to_string() + }; + state.last_feedback = format!( + "Your output was not recognized as a valid command:\n\"{}\"\n\n\ + Please output exactly one command (ls, cd, cat, head, find, findtree, grep, wc, pwd, check, or done).", + raw_preview + ); + state.push_history("(unrecognized) → parse failure".to_string()); + continue; + } + + debug!(doc = ctx.doc_name, ?command, "Parsed command"); + + let is_check = matches!(command, Command::Check); + + // Execute + let step = execute_command( + &command, + ctx, + &mut state, + &query, + &llm, + &mut llm_calls, + &emitter, + ) + .await; + + // Dynamic re-planning after insufficient check + if is_check + && !state.missing_info.is_empty() + && state.remaining >= 3 + && !llm_budget_exhausted!() + { + let missing = state.missing_info.clone(); + let replan = build_replan_prompt(&query, task_ref, &state, ctx); + let new_plan = + llm.complete(&replan.0, &replan.1) + .await + .map_err(|e| Error::LlmReasoning { + stage: "worker/replan".to_string(), + detail: format!("Re-plan LLM call failed: {e}"), + })?; + llm_calls += 1; + let plan_text = new_plan.trim().to_string(); + if !plan_text.is_empty() { + info!( + doc = ctx.doc_name, + plan = %plan_text, + "Re-plan generated" + ); + emitter.emit_worker_replan(ctx.doc_name, &missing, plan_text.len()); + state.plan = plan_text; + } + state.missing_info.clear(); + } else if is_check && !state.missing_info.is_empty() { + state.plan.clear(); + state.missing_info.clear(); + } + + // Emit round event + let cmd_str = format!("{:?}", command); + let success = !matches!(step, Step::ForceDone(_)); + let round_elapsed = round_start.elapsed().as_millis() as u64; + emitter.emit_worker_round(ctx.doc_name, round_num, &cmd_str, success, round_elapsed); + + let feedback_preview = if state.last_feedback.len() > 120 { + format!("{}...", &state.last_feedback[..120]) + } else { + state.last_feedback.clone() + }; + state.push_history(format!("{} → {}", cmd_str, feedback_preview)); + + // Check termination + match step { + Step::Done => { + info!( + doc = ctx.doc_name, + evidence = state.evidence.len(), + "Navigation done" + ); + break; + } + Step::ForceDone(reason) => { + info!(doc = ctx.doc_name, reason = %reason, "Forced done"); + break; + } + Step::Continue => { + if !is_check { + state.dec_round(); + } + } + } + } + + let budget_exhausted = state.remaining == 0 || llm_budget_exhausted!(); + + // Worker returns raw evidence — no synthesis. + // The Orchestrator owns the single synthesis/fusion point via rerank::process. 
+ let output = state.into_worker_output(llm_calls, budget_exhausted, ctx.doc_name); + + emitter.emit_worker_done( + ctx.doc_name, + output.evidence.len(), + output.metrics.rounds_used, + output.metrics.llm_calls, + output.metrics.budget_exhausted, + output.metrics.plan_generated, + ); + + info!( + doc = ctx.doc_name, + evidence = output.evidence.len(), + rounds = output.metrics.rounds_used, + llm_calls = output.metrics.llm_calls, + "Worker complete" + ); + + Ok(output) + } +} diff --git a/rust/src/agent/worker/planning.rs b/rust/src/agent/worker/planning.rs new file mode 100644 index 00000000..eb989991 --- /dev/null +++ b/rust/src/agent/worker/planning.rs @@ -0,0 +1,686 @@ +// Copyright (c) 2026 vectorless developers +// SPDX-License-Identifier: Apache-2.0 + +//! Navigation planning prompts — initial plan, re-plan, semantic hints, deep expansion. + +use std::collections::HashSet; + +use crate::query::QueryIntent; +use crate::scoring::bm25::{Bm25Engine, FieldDocument, extract_keywords}; + +use super::super::config::DocContext; +use super::super::context::FindHit; +use super::super::state::WorkerState; +use super::format::format_visited_titles; + +/// Maximum total chars for keyword + semantic sections in planning prompt. +const PLAN_CONTEXT_BUDGET: usize = 1500; + +/// Build the navigation planning prompt (Phase 1.5). +pub fn build_plan_prompt( + query: &str, + task: Option<&str>, + ls_output: &str, + doc_name: &str, + keyword_hits: &[FindHit], + ctx: &DocContext<'_>, + intent: QueryIntent, +) -> (String, String) { + let task_section = match task { + Some(t) => format!("\nYour specific task: {}", t), + None => String::new(), + }; + + let query_keywords = extract_keywords(query); + let query_lower = query.to_lowercase(); + + let mut keyword_section = if keyword_hits.is_empty() { + String::new() + } else { + let mut section = + String::from("\nKeyword index matches (use these to prioritize navigation):\n"); + for hit in keyword_hits { + let mut entries = hit.entries.clone(); + entries.sort_by(|a, b| { + b.weight + .partial_cmp(&a.weight) + .unwrap_or(std::cmp::Ordering::Equal) + }); + let mut seen = HashSet::new(); + for entry in &entries { + if !seen.insert(entry.node_id) { + continue; + } + let ancestor_path = build_ancestor_path(entry.node_id, ctx); + section.push_str(&format!( + " - keyword '{}' → {} (depth {}, weight {:.2})\n", + hit.keyword, ancestor_path, entry.depth, entry.weight + )); + if section.len() > PLAN_CONTEXT_BUDGET { + section.push_str(" ... (more hits truncated)\n"); + break; + } + } + if section.len() > PLAN_CONTEXT_BUDGET { + break; + } + } + section + }; + + let deep_expansion = build_deep_expansion(keyword_hits, ctx); + if !deep_expansion.is_empty() { + if keyword_section.len() + deep_expansion.len() <= PLAN_CONTEXT_BUDGET { + keyword_section.push_str(&deep_expansion); + } + } + + let semantic_section = build_semantic_hints(&query_keywords, &query_lower, ctx); + + let intent_section = build_intent_signals(intent, ctx); + + let system = "You are a document navigation planner. Given a user question, the top-level \ + document structure, keyword index matches, and semantic hints, output a brief navigation \ + plan: which sections to visit and in what order. Prioritize sections that matched keywords \ + or semantic hints. The plan should be 2-5 steps. Each step should be a specific action \ + like \"cd to X, then cat Y\" or \"grep for Z in current subtree\". 
\ Pay attention to 'Can answer' and 'Topics' annotations in the structure listing — \ they indicate what questions each section addresses. \ Output only the plan, nothing else.\n\n\ Example plan for \"What is the Q1 revenue?\":\n\ 1. cd to Revenue (matched keyword 'revenue')\n\ 2. ls to see sub-sections\n\ 3. cat Q1 Report\n\ 4. check\n\ 5. done".to_string(); + + let user = format!( + "Document: {doc_name}\n\ + Top-level structure:\n{ls_output}{keyword_section}{semantic_section}{intent_section}\ + User question: {query}{task_section}\n\n\ + Navigation plan:" + ); + + (system, user) +} + +/// Build a focused re-planning prompt when check returns INSUFFICIENT. +pub fn build_replan_prompt( + query: &str, + task: Option<&str>, + state: &WorkerState, + ctx: &DocContext<'_>, +) -> (String, String) { + let task_section = match task { + Some(t) => format!("\nOriginal sub-task: {}", t), + None => String::new(), + }; + + let visited = format_visited_titles(state, ctx); + let evidence_summary = state.evidence_summary(); + + let current_children = match ctx.ls(state.current_node) { + Some(routes) if !routes.is_empty() => { + let items: Vec<String> = routes + .iter() + .map(|r| format!(" - {} ({} leaves)", r.title, r.leaf_count)) + .collect(); + format!("Children at current position:\n{}\n", items.join("\n")) + } + _ => "Current position is a leaf node — consider cd .. to go back.\n".to_string(), + }; + + let sibling_hints = build_sibling_hints(state, ctx); + + let system = "You are re-planning a document navigation strategy. The previous plan did not \ find sufficient evidence. Given what's been found and what's still missing, generate a \ focused 2-3 step plan. Each step should be a specific action like \ \"cd to X, then cat Y\" or \"grep for Z in current subtree\". \ Prefer exploring unvisited branches. If current branch is exhausted, cd .. and try \ a different path. Output only the plan, nothing else." + .to_string(); + + let user = format!( + "Original question: {query}{task_section}\n\ + Current position: /{}\n\ + Evidence collected so far:\n{evidence_summary}\n\ + What's missing: {}\n\ + Already visited: {visited}\n\ + {current_children}\ + {sibling_hints}\ + Remaining rounds: {}/{}\n\n\ + Revised navigation plan:", + state.path_str(), + state.missing_info, + state.remaining, + state.max_rounds, + ); + + (system, user) +} + +/// Format keyword index hits into a compact string for LLM context. +/// +/// Returns a string like: +/// ```text +/// Keyword matches (use find to jump directly): +/// - 'complex' → Performance (weight 0.85) +/// - 'latency' → Performance (weight 0.72) +/// ``` +pub fn format_keyword_hints(keyword_hits: &[FindHit], ctx: &DocContext<'_>) -> String { + if keyword_hits.is_empty() { + return String::new(); + } + + let mut section = String::from("Keyword matches (use find to jump directly):\n"); + for hit in keyword_hits { + let mut entries = hit.entries.clone(); + entries.sort_by(|a, b| { + b.weight + .partial_cmp(&a.weight) + .unwrap_or(std::cmp::Ordering::Equal) + }); + let mut seen = HashSet::new(); + for entry in &entries { + if !seen.insert(entry.node_id) { + continue; + } + let title = ctx.node_title(entry.node_id).unwrap_or("unknown"); + section.push_str(&format!( + " - '{}' → {} (weight {:.2})\n", + hit.keyword, title, entry.weight + )); + if section.len() > 800 { + section.push_str(" ... (more)\n"); + return section; + } + } + } + section +} + +/// Build the ancestor path string for a node (e.g., "root/Chapter 1/Section 1.2"). 
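+///
+/// Mirrors the assertion in the tests below:
+///
+/// ```ignore
+/// assert_eq!(build_ancestor_path(revenue, &ctx), "Root/Revenue");
+/// ```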
+pub fn build_ancestor_path(node_id: crate::document::NodeId, ctx: &DocContext<'_>) -> String { + let mut path: Vec<crate::document::NodeId> = ctx.tree.ancestors_iter(node_id).collect(); + path.reverse(); + path.iter() + .filter_map(|&id| ctx.node_title(id)) + .collect::<Vec<_>>() + .join("/") +} + +/// Build intent-specific index signals for the planning prompt. +/// +/// Injects pre-computed ReasoningIndex data as context for the LLM: +/// - Summary intent → summary_shortcut (document overview + section summaries) +/// - Navigational intent → section_map matches from query keywords +/// - Factual/Analytical → no additional signals (keyword hits already injected) +fn build_intent_signals(intent: QueryIntent, ctx: &DocContext<'_>) -> String { + match intent { + QueryIntent::Summary => { + let shortcut = match ctx.summary_shortcut() { + Some(s) => s, + None => return String::new(), + }; + let mut section = String::from( + "\nPre-computed document overview (use this to plan breadth-first scan):\n", + ); + if !shortcut.document_summary.is_empty() { + section.push_str(&format!( + "Document summary: {}\n", + &shortcut.document_summary[..shortcut.document_summary.len().min(500)] + )); + } + for ss in &shortcut.section_summaries { + let summary_preview = if ss.summary.len() > 200 { + format!("{}...", &ss.summary[..200]) + } else { + ss.summary.clone() + }; + section.push_str(&format!( + " - Section '{}' (depth {}): {}\n", + ss.title, ss.depth, summary_preview + )); + if section.len() > PLAN_CONTEXT_BUDGET { + section.push_str(" ... (more sections truncated)\n"); + break; + } + } + section + } + QueryIntent::Navigational => { + let root = ctx.root(); + let routes = match ctx.ls(root) { + Some(r) => r, + None => return String::new(), + }; + let mut section = + String::from("\nSection map (known top-level sections for direct navigation):\n"); + for route in routes { + section.push_str(&format!( + " - {} ({} leaves)\n", + route.title, route.leaf_count + )); + } + section + } + _ => String::new(), + } +} + +/// Build semantic hints section using BM25 scoring over child routes. 
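+///
+/// Produces a block shaped like this (scores and annotations illustrative):
+///
+/// ```text
+/// Semantic hints (BM25-scored sections, higher = more relevant):
+///  - Section 'Revenue' — BM25: 1.42, topic "revenue"
+/// ```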
+fn build_semantic_hints( + query_keywords: &[String], + query_lower: &str, + ctx: &DocContext<'_>, +) -> String { + let root = ctx.root(); + let routes = match ctx.ls(root) { + Some(r) => r, + None => return String::new(), + }; + + if routes.is_empty() { + return String::new(); + } + + let field_docs: Vec<FieldDocument<String>> = routes + .iter() + .map(|route| { + let nav = ctx.nav_entry(route.node_id); + let overview = nav.map(|n| n.overview.as_str()).unwrap_or(""); + let hints_text = nav.map(|n| n.question_hints.join(" ")).unwrap_or_default(); + let tags_text = nav.map(|n| n.topic_tags.join(" ")).unwrap_or_default(); + let content = if overview.is_empty() && hints_text.is_empty() && tags_text.is_empty() { + String::new() + } else { + format!("{} {} {}", overview, hints_text, tags_text) + }; + FieldDocument::new( + route.title.clone(), + route.title.clone(), + route.description.clone(), + content, + ) + }) + .collect(); + + let engine = Bm25Engine::fit_to_corpus(&field_docs); + let bm25_results: std::collections::HashMap<String, f32> = engine + .search_weighted(query_lower, routes.len()) + .into_iter() + .collect(); + + let mut section = String::new(); + let budget_remaining = PLAN_CONTEXT_BUDGET.saturating_sub(section.len()); + + for route in routes { + let nav = match ctx.nav_entry(route.node_id) { + Some(n) => n, + None => continue, + }; + + let bm25_score = bm25_results.get(&route.title).copied().unwrap_or(0.0); + if bm25_score <= 0.0 { + continue; + } + + let mut annotations = Vec::new(); + + for hint in &nav.question_hints { + let hint_lower = hint.to_lowercase(); + for kw in query_keywords { + if hint_lower.contains(&kw.to_lowercase()) { + annotations.push(format!("question \"{}\"", hint)); + break; + } + } + if !annotations.iter().any(|a| a.contains(&hint.clone())) { + for word in hint_lower.split_whitespace() { + if word.len() > 3 && query_lower.contains(word) { + annotations.push(format!("question \"{}\"", hint)); + break; + } + } + } + } + + for tag in &nav.topic_tags { + let tag_lower = tag.to_lowercase(); + for kw in query_keywords { + if tag_lower.contains(&kw.to_lowercase()) || kw.to_lowercase().contains(&tag_lower) + { + annotations.push(format!("topic \"{}\"", tag)); + break; + } + } + if !annotations + .iter() + .any(|a| a.contains(&format!("topic \"{}\"", tag))) + { + if query_lower.contains(&tag_lower) && tag.len() > 2 { + annotations.push(format!("topic \"{}\"", tag)); + } + } + } + + let annotation_str = if annotations.is_empty() { + String::new() + } else { + format!(", {}", annotations.join(", ")) + }; + + let line = format!( + " - Section '{}' — BM25: {:.2}{}\n", + route.title, bm25_score, annotation_str + ); + if section.len() + line.len() > budget_remaining { + break; + } + section.push_str(&line); + } + + if section.is_empty() { + String::new() + } else { + format!( + "\nSemantic hints (BM25-scored sections, higher = more relevant):\n{}", + section + ) + } +} + +/// For keyword hits that land in deep nodes (depth >= 2), expand the parent node's children. 
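+///
+/// Emits sibling context shaped like this (names reuse the illustrative 'latency' hit above):
+///
+/// ```text
+/// Siblings near keyword hit 'latency' (under Performance):
+///  - Benchmarks (3 leaves) ← keyword hit
+///  - Tuning (2 leaves)
+/// ```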
+fn build_deep_expansion(keyword_hits: &[FindHit], ctx: &DocContext<'_>) -> String { + if keyword_hits.is_empty() { + return String::new(); + } + + let mut seen_parents = HashSet::new(); + let mut expansion = String::new(); + + for hit in keyword_hits { + for entry in &hit.entries { + if entry.depth < 2 { + continue; + } + let parent = match ctx.parent(entry.node_id) { + Some(p) => p, + None => continue, + }; + if !seen_parents.insert(parent) { + continue; + } + let routes = match ctx.ls(parent) { + Some(r) => r, + None => continue, + }; + let parent_title = ctx.node_title(parent).unwrap_or("unknown"); + expansion.push_str(&format!( + "Siblings near keyword hit '{}' (under {}):\n", + hit.keyword, parent_title + )); + for route in routes { + let marker = if ctx.node_title(entry.node_id) == Some(&route.title) { + " ← keyword hit" + } else { + "" + }; + expansion.push_str(&format!( + " - {} ({} leaves){}\n", + route.title, route.leaf_count, marker + )); + } + expansion.push('\n'); + if expansion.len() > 500 { + expansion.push_str(" ... (more expansions truncated)\n"); + break; + } + } + if expansion.len() > 500 { + break; + } + } + + expansion +} + +/// Build unvisited sibling branch hints for structured backtracking. +fn build_sibling_hints(state: &WorkerState, ctx: &DocContext<'_>) -> String { + let mut hints = String::new(); + + if let Some(parent) = ctx.parent(state.current_node) { + if let Some(routes) = ctx.ls(parent) { + let unvisited: Vec<&crate::document::ChildRoute> = routes + .iter() + .filter(|r| !state.visited.contains(&r.node_id)) + .collect(); + if !unvisited.is_empty() { + hints.push_str("Unvisited sibling branches at current level:\n"); + for route in &unvisited { + hints.push_str(&format!( + " - {} ({} leaves)\n", + route.title, route.leaf_count + )); + } + } + } + + if let Some(grandparent) = ctx.parent(parent) { + if let Some(routes) = ctx.ls(grandparent) { + let unvisited_parent_siblings: Vec<&crate::document::ChildRoute> = routes + .iter() + .filter(|r| !state.visited.contains(&r.node_id) && r.node_id != parent) + .collect(); + if !unvisited_parent_siblings.is_empty() { + hints.push_str("Unvisited branches at parent level (cd .. 
then explore):\n"); + for route in &unvisited_parent_siblings { + hints.push_str(&format!( + " - {} ({} leaves)\n", + route.title, route.leaf_count + )); + } + } + } + } + } + + if hints.is_empty() { + String::new() + } else { + format!("\n{}", hints) + } +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::agent::config::DocContext; + use crate::agent::config::Evidence; + use crate::agent::state::WorkerState; + use crate::document::{ChildRoute, NavEntry, NodeId}; + use crate::scoring::bm25::extract_keywords; + + fn build_semantic_test_tree() -> ( + crate::document::DocumentTree, + crate::document::NavigationIndex, + NodeId, + NodeId, + NodeId, + ) { + let mut tree = crate::document::DocumentTree::new("Root", "root content"); + let root = tree.root(); + let revenue = tree.add_child(root, "Revenue", "revenue content"); + let expenses = tree.add_child(root, "Expenses", "expense content"); + + let mut nav = crate::document::NavigationIndex::new(); + nav.add_entry( + root, + NavEntry { + overview: "Annual financial report".to_string(), + question_hints: vec!["What is the financial overview?".to_string()], + topic_tags: vec!["finance".to_string()], + leaf_count: 4, + level: 0, + }, + ); + nav.add_child_routes( + root, + vec![ + ChildRoute { + node_id: revenue, + title: "Revenue".to_string(), + description: "Revenue breakdown".to_string(), + leaf_count: 2, + }, + ChildRoute { + node_id: expenses, + title: "Expenses".to_string(), + description: "Cost analysis".to_string(), + leaf_count: 2, + }, + ], + ); + nav.add_entry( + revenue, + NavEntry { + overview: "Revenue figures for 2024".to_string(), + question_hints: vec![ + "What is the total revenue?".to_string(), + "What was the Q1 revenue?".to_string(), + ], + topic_tags: vec![ + "revenue".to_string(), + "sales".to_string(), + "income".to_string(), + ], + leaf_count: 2, + level: 1, + }, + ); + nav.add_entry( + expenses, + NavEntry { + overview: "Operating expenses".to_string(), + question_hints: vec!["What are the operating costs?".to_string()], + topic_tags: vec!["expenses".to_string(), "costs".to_string()], + leaf_count: 2, + level: 1, + }, + ); + + (tree, nav, root, revenue, expenses) + } + + #[test] + fn test_build_ancestor_path() { + let (tree, nav, root, revenue, _) = build_semantic_test_tree(); + let ctx = DocContext { + tree: &tree, + nav_index: &nav, + reasoning_index: &crate::document::ReasoningIndex::default(), + doc_name: "test", + }; + assert_eq!(build_ancestor_path(revenue, &ctx), "Root/Revenue"); + assert_eq!(build_ancestor_path(root, &ctx), "Root"); + } + + #[test] + fn test_semantic_hints_keyword_match() { + let (tree, nav, _, _, _) = build_semantic_test_tree(); + let ctx = DocContext { + tree: &tree, + nav_index: &nav, + reasoning_index: &crate::document::ReasoningIndex::default(), + doc_name: "test", + }; + let keywords = extract_keywords("What is the revenue?"); + let hints = build_semantic_hints(&keywords, &"what is the revenue".to_lowercase(), &ctx); + assert!( + hints.contains("Revenue"), + "Should match Revenue section, got: {}", + hints + ); + assert!(hints.contains("BM25")); + } + + #[test] + fn test_semantic_hints_topic_match() { + let (tree, nav, _, _, _) = build_semantic_test_tree(); + let ctx = DocContext { + tree: &tree, + nav_index: &nav, + reasoning_index: &crate::document::ReasoningIndex::default(), + doc_name: "test", + }; + let keywords = extract_keywords("operating costs analysis"); + let hints = + build_semantic_hints(&keywords, &"operating costs analysis".to_lowercase(), &ctx); + assert!( + 
hints.contains("Expenses"), + "Should match Expenses via topic 'costs', got: {}", + hints + ); + } + + #[test] + fn test_semantic_hints_no_match() { + let (tree, nav, _, _, _) = build_semantic_test_tree(); + let ctx = DocContext { + tree: &tree, + nav_index: &nav, + reasoning_index: &crate::document::ReasoningIndex::default(), + doc_name: "test", + }; + let keywords = extract_keywords("xyzzy foobar"); + let hints = build_semantic_hints(&keywords, &"xyzzy foobar".to_lowercase(), &ctx); + assert!(hints.is_empty(), "Should not match, got: {}", hints); + } + + #[test] + fn test_build_replan_prompt() { + let (tree, nav, root, _, _) = build_semantic_test_tree(); + let mut state = WorkerState::new(root, 8); + state.missing_info = "Need Q2 revenue figures".to_string(); + state.add_evidence(Evidence { + source_path: "root/Revenue".to_string(), + node_title: "Revenue".to_string(), + content: "Q1 revenue was $2.5M".to_string(), + doc_name: None, + }); + let ctx = DocContext { + tree: &tree, + nav_index: &nav, + reasoning_index: &crate::document::ReasoningIndex::default(), + doc_name: "test", + }; + let (system, user) = build_replan_prompt("What is total revenue?", None, &state, &ctx); + assert!(system.contains("re-planning")); + assert!(user.contains("What is total revenue?")); + assert!(user.contains("Q2 revenue")); + } + + #[test] + fn test_build_plan_prompt_with_semantic_hints() { + let (tree, nav, _, _, _) = build_semantic_test_tree(); + let ctx = DocContext { + tree: &tree, + nav_index: &nav, + reasoning_index: &crate::document::ReasoningIndex::default(), + doc_name: "Financial Report", + }; + let ls_output = + "[1] Revenue — Revenue breakdown (2 leaves)\n[2] Expenses — Cost analysis (2 leaves)\n"; + let (system, user) = build_plan_prompt( + "What is the revenue?", + None, + ls_output, + "Financial Report", + &[], + &ctx, + QueryIntent::Factual, + ); + assert!(system.contains("semantic hints")); + assert!(user.contains("What is the revenue?")); + } +} diff --git a/rust/src/client/builder.rs b/rust/src/client/builder.rs index 5f08af95..b3ccc6ea 100644 --- a/rust/src/client/builder.rs +++ b/rust/src/client/builder.rs @@ -7,8 +7,8 @@ //! [`Engine`] instances with sensible defaults. use crate::{ - client::engine::Engine, config::Config, events::EventEmitter, metrics::MetricsHub, - retrieval::PipelineRetriever, storage::Workspace, + client::engine::Engine, client::retriever::RetrieverClient, config::Config, + events::EventEmitter, metrics::MetricsHub, storage::Workspace, }; /// Builder for creating a [`Engine`] client. 
@@ -195,17 +192,8 @@ impl EngineBuilder {
         // Indexer uses pool.index()
         let indexer = crate::client::indexer::IndexerClient::with_llm(pool.index().clone());

-        // Retriever uses pool.retrieval()
-        let retrieval_config = config.retrieval.clone();
-        let mut retriever =
-            PipelineRetriever::new().with_max_iterations(retrieval_config.search.max_iterations);
-        retriever = retriever.with_llm_client(pool.retrieval().clone());
-
-        // Configure content aggregator if enabled
-        if retrieval_config.content.enabled {
-            retriever =
-                retriever.with_content_config(retrieval_config.content.to_aggregator_config());
-        }
+        // Retriever uses pool.retrieval() via agent system
+        let retriever = RetrieverClient::new(pool.retrieval().clone());

         // Build engine
         let events = self.events.unwrap_or_default();
diff --git a/rust/src/client/engine.rs b/rust/src/client/engine.rs
index a607263c..3658ceb2 100644
--- a/rust/src/client/engine.rs
+++ b/rust/src/client/engine.rs
@@ -41,7 +41,7 @@ use std::{
     collections::HashMap,
     sync::Arc,
     sync::Mutex,
-    sync::atomic::{AtomicBool, AtomicU32, Ordering},
+    sync::atomic::{AtomicBool, Ordering},
 };

 use futures::StreamExt;
@@ -57,7 +57,7 @@ use crate::{
         incremental::{self, IndexAction},
     },
     metrics::MetricsHub,
-    retrieval::{PipelineRetriever, RetrieveEventReceiver},
+    retrieval::RetrieveEventReceiver,
     storage::{PersistedDocument, Workspace},
 };

@@ -66,18 +66,13 @@ use super::{
     indexer::IndexerClient,
     query_context::{QueryContext, QueryScope},
     retriever::RetrieverClient,
-    types::{
-        DocumentInfo, FailedItem, IndexItem, IndexMode, IndexResult, QueryResult, QueryResultItem,
-    },
+    types::{DocumentInfo, FailedItem, IndexItem, IndexMode, IndexResult, QueryResult},
     workspace::WorkspaceClient,
 };

 /// Shared cancel state: `true` means cancelled.
 type CancelFlag = Arc<AtomicBool>;

-/// Max consecutive graph rebuild failures before giving up.
-const GRAPH_REBUILD_MAX_FAILURES: u32 = 3;
-
 /// The main Engine client.
 ///
 /// Provides high-level operations for document indexing and retrieval.
@@ -107,12 +102,6 @@ pub struct Engine {
     /// Central metrics hub for unified collection.
     metrics_hub: Arc<MetricsHub>,

-    /// Whether the document graph needs rebuilding (set after index, consumed in query).
-    graph_dirty: Arc<AtomicBool>,
-
-    /// Consecutive graph rebuild failures — skip rebuild after threshold.
-    graph_fail_count: Arc<AtomicU32>,
-
     /// Shared cancel flag — set by `cancel()`, checked by long-running operations.
     cancelled: CancelFlag,

@@ -129,7 +118,7 @@ impl Engine {
     pub(crate) async fn with_components(
         config: Config,
         workspace: Workspace,
-        retriever: PipelineRetriever,
+        retriever: RetrieverClient,
         indexer: IndexerClient,
         events: EventEmitter,
         metrics_hub: Arc<MetricsHub>,
@@ -139,8 +128,8 @@
         // Attach event emitter to indexer
         let indexer = indexer.with_events(events.clone());

-        // Create retriever client
-        let retriever = RetrieverClient::new(retriever).with_events(events.clone());
+        // Attach event emitter to retriever
+        let retriever = retriever.with_events(events.clone());

         // Create workspace client
         let workspace_client = WorkspaceClient::new(workspace)
@@ -153,8 +142,6 @@
             retriever,
             workspace: workspace_client,
             metrics_hub,
-            graph_dirty: Arc::new(AtomicBool::new(false)),
-            graph_fail_count: Arc::new(AtomicU32::new(0)),
             cancelled: Arc::new(AtomicBool::new(false)),
             active_ops: Arc::new(Mutex::new(0)),
         })
     }
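The next hunk replaces the dirty-flag bookkeeping with a detached rebuild task: `Engine` is `Clone` (all shared state sits behind `Arc`s), so a clone can be moved into `tokio::spawn` and any rebuild failure is logged rather than surfaced to the caller. A generic sketch of this fire-and-forget pattern, with a hypothetical `Service` standing in for `Engine` (must run inside a tokio runtime):

```rust
use std::sync::Arc;

// Hypothetical stand-in for Engine: cheap to clone, shared state in an Arc.
#[derive(Clone)]
struct Service {
    state: Arc<String>,
}

impl Service {
    async fn rebuild(&self) -> Result<(), String> {
        Ok(()) // expensive work elided
    }

    /// Fire-and-forget: the caller returns immediately; errors are only logged.
    fn schedule_rebuild(&self) {
        let this = self.clone();
        tokio::spawn(async move {
            if let Err(e) = this.rebuild().await {
                tracing::warn!("background rebuild failed: {e}");
            }
        });
    }
}
```

The trade-off, as the hunk below notes, is that `index()` returns before the cross-document graph is up to date, so a query racing the rebuild may briefly see the previous graph.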
@@ -205,11 +192,15 @@
         )));
         }

-        // Mark graph as dirty — will be lazily rebuilt on next query()
-        // Also reset failure count so the new data gets a fresh rebuild attempt.
+        // Rebuild cross-document graph in the background so index returns immediately.
         if !items.is_empty() && self.config.graph.enabled {
-            self.graph_dirty.store(true, Ordering::Relaxed);
-            self.graph_fail_count.store(0, Ordering::Relaxed);
+            let engine = self.clone();
+            tokio::spawn(async move {
+                info!("Rebuilding document graph in background...");
+                if let Err(e) = engine.rebuild_graph().await {
+                    tracing::warn!("Background graph rebuild failed: {e}");
+                }
+            });
         }

         Ok(IndexResult::with_partial(items, failed))
@@ -460,108 +451,22 @@
         self.with_timeout(timeout_secs, async move {
             let doc_ids = self.resolve_scope(&ctx.scope).await?;

-            let mut options = ctx.to_retrieve_options(&self.config);
-
-            // Lazy graph rebuild: only rebuild if index() marked it dirty
-            if self.config.graph.enabled {
-                let fail_count = self.graph_fail_count.load(Ordering::Relaxed);
-                let should_try = fail_count < GRAPH_REBUILD_MAX_FAILURES;
-
-                if self.graph_dirty.swap(false, Ordering::Relaxed) {
-                    if should_try {
-                        if let Err(e) = self.rebuild_graph().await {
-                            let count = self.graph_fail_count.fetch_add(1, Ordering::Relaxed) + 1;
-                            tracing::warn!(count, "Graph rebuild failed: {e}");
-                            // Re-mark dirty so next query retries
-                            self.graph_dirty.store(true, Ordering::Relaxed);
-                        } else {
-                            // Reset failure count on success
-                            self.graph_fail_count.store(0, Ordering::Relaxed);
-                        }
-                    } else {
-                        tracing::warn!(
-                            count = fail_count,
-                            "Skipping graph rebuild after {} consecutive failures",
-                            fail_count
-                        );
-                    }
-                }
-                // Load (now up-to-date) graph for retrieval
-                if let Ok(Some(graph)) = self.workspace.get_graph().await {
-                    options = options.with_document_graph(Arc::new(graph));
-                }
-            }
-
-            // Query documents in parallel (with concurrency limit)
-            let concurrency = self.config.llm.throttle.max_concurrent_requests;
-            let query = ctx.query.clone();
-            let cancelled = Arc::clone(&self.cancelled);
-
-            let results: Vec<(String, std::result::Result<QueryResultItem, String>)> =
-                futures::stream::iter(doc_ids.into_iter())
-                    .map(|doc_id| {
-                        let engine = self.clone();
-                        let options = options.clone();
-                        let query = query.clone();
-                        let cancelled = Arc::clone(&cancelled);
-                        async move {
-                            if cancelled.load(Ordering::Relaxed) {
-                                return (doc_id, Err("Operation cancelled".to_string()));
-                            }
-
-                            let (tree, reasoning_index) = match engine.get_structure(&doc_id).await
-                            {
-                                Ok(t) => t,
-                                Err(e) => return (doc_id, Err(e.to_string())),
-                            };
-
-                            match engine
-                                .retriever
-                                .query_with_reasoning_index(
-                                    &tree,
-                                    &query,
-                                    &options,
-                                    reasoning_index,
-                                )
-                                .await
-                            {
-                                Ok(mut result) => {
-                                    result.doc_id = doc_id.clone();
-                                    (doc_id, Ok(result))
-                                }
-                                Err(e) => (doc_id, Err(e.to_string())),
-                            }
-                        }
-                    })
-                    .buffer_unordered(concurrency)
-                    .collect()
-                    .await;
-            let mut items = Vec::new();
-            let mut failed = Vec::new();
-            for (doc_id, result) in results {
-                match result {
-                    Ok(item) => items.push(item),
-                    Err(e) => {
-                        tracing::warn!("Query failed for {}: {}", doc_id, e);
-                        failed.push(FailedItem::new(&doc_id, e));
-                    }
-                }
-            }
-
-            if items.is_empty() && !failed.is_empty() {
+            let (documents, failed) = self.load_documents(&doc_ids).await?;
+            if documents.is_empty() {
                 return Err(Error::Config(format!(
-                    "Query failed for all {} document(s): {}",
-                    failed.len(),
-                    failed
-                        .iter()
-                        .map(|f| format!("{} ({})", f.source, f.error))
-                        .collect::<Vec<_>>()
-                        .join("; ")
+                    "No documents available for query: {} failures",
+                    failed.len()
                 )));
             }

-            Ok(QueryResult::with_partial(items, failed))
+            let skip_analysis = !ctx.force_analysis;
+            let mut result = self
+                .retriever
+                .query(&documents, &ctx.query, skip_analysis)
+                .await?;
+            result.failed.extend(failed);
+            Ok(result)
         })
         .await
     }

@@ -569,28 +474,337 @@
     /// Query a document with streaming results.
     ///
     /// Returns a receiver that yields retrieval events
-    /// as the retrieval pipeline progresses through each stage.
+    /// as the retrieval agent progresses through navigation.
     ///
-    /// Only supports single-document scope (via `with_doc_ids` with one ID).
+    /// Supports single-document and multi-document scope.
+    /// Events are translated from the agent's internal [`AgentEvent`](crate::agent::AgentEvent)
+    /// into the public [`RetrieveEvent`] stream.
     pub async fn query_stream(&self, ctx: QueryContext) -> Result<RetrieveEventReceiver> {
-        let doc_id = match &ctx.scope {
-            QueryScope::Documents(ids) if ids.len() == 1 => ids[0].clone(),
-            _ => {
-                return Err(Error::Config(
-                    "query_stream requires a single doc_id via with_doc_ids".to_string(),
-                ));
-            }
-        };
+        self.check_cancel()?;
+        let _guard = self.inc_active();
+
+        let doc_ids = self.resolve_scope(&ctx.scope).await?;
+        let query = ctx.query.clone();
+
+        // Load all requested documents (need owned PersistedDocument for spawned task)
+        let mut docs = Vec::new();
+        for doc_id in &doc_ids {
+            let doc = match self.workspace.load(doc_id).await? {
+                Some(d) => d,
+                None => return Err(Error::Config(format!("Document not found: {}", doc_id))),
+            };
+            docs.push((doc_id.clone(), doc));
+        }
+
+        // Create agent event channel
+        let (agent_tx, mut agent_rx) =
+            crate::agent::events::channel(crate::agent::events::DEFAULT_AGENT_EVENT_BOUND);
+        let (retrieve_tx, retrieve_rx) =
+            crate::retrieval::stream::channel(crate::retrieval::stream::DEFAULT_STREAM_BOUND);
+
+        // Spawn a task that translates AgentEvents → RetrieveEvents
+        tokio::spawn(async move {
+            use crate::agent::AgentEvent;
+            use crate::retrieval::stream::RetrieveEvent;
+
+            while let Some(event) = agent_rx.recv().await {
+                let translated = match event {
+                    // ── Query Understanding ──
+                    AgentEvent::QueryUnderstandingStarted { query } => RetrieveEvent::Started {
+                        query,
+                        strategy: "query_understanding".to_string(),
+                    },
+                    AgentEvent::QueryUnderstandingCompleted { query, .. } => {
+                        RetrieveEvent::StageCompleted {
+                            stage: format!("query_understanding: {}", query),
+                            elapsed_ms: 0,
+                        }
+                    }

-        let (tree, _reasoning_index) = self.get_structure(&doc_id).await?;
-        let options = ctx.to_retrieve_options(&self.config);
+                    // ── Orchestrator ──
+                    AgentEvent::OrchestratorStarted {
+                        query,
+                        doc_count,
+                        skip_analysis,
+                    } => RetrieveEvent::Started {
+                        query,
+                        strategy: if skip_analysis {
+                            "orchestrator_skip_analysis".to_string()
+                        } else {
+                            format!("orchestrator({}_docs)", doc_count)
+                        },
+                    },
+                    AgentEvent::OrchestratorAnalyzing {
+                        doc_count,
+                        keywords,
+                    } => RetrieveEvent::StageCompleted {
+                        stage: format!(
+                            "orchestrator_analyzing_{}_docs_kw_{}",
+                            doc_count,
+                            keywords.len()
+                        ),
+                        elapsed_ms: 0,
+                    },
+                    AgentEvent::WorkerDispatched {
+                        doc_idx,
+                        doc_name,
+                        task,
+                        ..
+ } => RetrieveEvent::StageCompleted { + stage: format!("dispatch_{}_{}_{}", doc_idx, doc_name, task.len().min(30)), + elapsed_ms: 0, + }, + AgentEvent::WorkerCompleted { + doc_idx, + doc_name, + evidence_count, + rounds_used, + llm_calls, + success, + } => RetrieveEvent::StageCompleted { + stage: format!( + "worker_{}_{}_done_e{}_r{}_l{}_{}", + doc_idx, doc_name, evidence_count, rounds_used, llm_calls, success + ), + elapsed_ms: 0, + }, + AgentEvent::OrchestratorEvaluated { + sufficient, + evidence_count, + missing_info: _, + } => RetrieveEvent::SufficiencyCheck { + level: if sufficient { + crate::retrieval::SufficiencyLevel::Sufficient + } else { + crate::retrieval::SufficiencyLevel::Insufficient + }, + tokens: evidence_count, + }, + AgentEvent::OrchestratorReplanning { + reason, + evidence_count, + } => RetrieveEvent::StageCompleted { + stage: format!( + "orchestrator_replan_{}_e{}", + &reason[..reason.len().min(30)], + evidence_count + ), + elapsed_ms: 0, + }, + AgentEvent::OrchestratorCompleted { + evidence_count, + total_llm_calls, + dispatch_rounds, + } => RetrieveEvent::StageCompleted { + stage: format!( + "orchestrator_done_e{}_l{}_r{}", + evidence_count, total_llm_calls, dispatch_rounds + ), + elapsed_ms: 0, + }, + + // ── Worker ── + AgentEvent::WorkerStarted { + doc_name, + task: _, + max_rounds, + } => RetrieveEvent::StageCompleted { + stage: format!("worker_started_{}_r{}", doc_name, max_rounds), + elapsed_ms: 0, + }, + AgentEvent::WorkerPlanGenerated { doc_name, plan_len } => { + RetrieveEvent::StageCompleted { + stage: format!("plan_{}_{}chars", doc_name, plan_len), + elapsed_ms: 0, + } + } + AgentEvent::WorkerRound { + doc_name, + round, + command, + success: _, + elapsed_ms, + } => RetrieveEvent::StageCompleted { + stage: format!("round_{}_{}_{}", doc_name, round, command), + elapsed_ms, + }, + AgentEvent::EvidenceCollected { + doc_name, + node_title, + source_path, + content_len, + total_evidence: _, + } => RetrieveEvent::ContentFound { + node_id: source_path, + title: format!("[{}] {}", doc_name, node_title), + preview: String::new(), + score: if content_len > 0 { 0.8 } else { 0.0 }, + }, + AgentEvent::WorkerSufficiencyCheck { + doc_name: _, + sufficient, + evidence_count, + .. 
+ } => RetrieveEvent::SufficiencyCheck { + level: if sufficient { + crate::retrieval::SufficiencyLevel::Sufficient + } else { + crate::retrieval::SufficiencyLevel::Insufficient + }, + tokens: evidence_count, + }, + AgentEvent::WorkerReplan { + doc_name, + missing_info, + plan_len, + } => RetrieveEvent::StageCompleted { + stage: format!( + "replan_{}_{}_{}chars", + doc_name, + &missing_info[..missing_info.len().min(30)], + plan_len + ), + elapsed_ms: 0, + }, + AgentEvent::WorkerBudgetWarning { + doc_name, + warning_type, + round, + } => RetrieveEvent::StageCompleted { + stage: format!( + "budget_warning_{}_{}_round_{}", + doc_name, warning_type, round + ), + elapsed_ms: 0, + }, + AgentEvent::WorkerDone { + doc_name, + evidence_count, + rounds_used, + llm_calls, + budget_exhausted: _, + plan_generated: _, + } => RetrieveEvent::StageCompleted { + stage: format!( + "worker_done_{}_e{}_r{}_l{}", + doc_name, evidence_count, rounds_used, llm_calls + ), + elapsed_ms: 0, + }, + + // ── Answer Pipeline ── + AgentEvent::AnswerStarted { + evidence_count, + multi_doc, + } => RetrieveEvent::StageCompleted { + stage: format!( + "answer_start_{}_e{}", + if multi_doc { "multi" } else { "single" }, + evidence_count + ), + elapsed_ms: 0, + }, + AgentEvent::AnswerCompleted { + answer_len, + confidence, + } => RetrieveEvent::StageCompleted { + stage: format!("synthesis_{}_{}chars", confidence, answer_len), + elapsed_ms: 0, + }, + + // ── Terminal ── + AgentEvent::Completed { + evidence_count, + llm_calls, + answer_len, + } => { + let response = crate::retrieval::RetrieveResponse { + results: Vec::new(), + content: String::new(), + confidence: if evidence_count > 0 { 0.8 } else { 0.0 }, + is_sufficient: true, + strategy_used: format!("agent(l={},a={})", llm_calls, answer_len), + reasoning_chain: crate::retrieval::ReasoningChain::default(), + tokens_used: answer_len, + }; + let _ = retrieve_tx + .send(RetrieveEvent::Completed { response }) + .await; + break; // Completed is terminal + } + AgentEvent::Error { stage, message } => { + let _ = retrieve_tx + .send(RetrieveEvent::Error { + message: format!("[{}] {}", stage, message), + }) + .await; + break; // Error is terminal + } + }; + + // For non-terminal events, send the translated event + if !matches!( + translated, + RetrieveEvent::Completed { .. } | RetrieveEvent::Error { .. 
}
+                    ) {
+                        if retrieve_tx.send(translated).await.is_err() {
+                            break; // Receiver dropped
+                        }
+                    }
+                }
+            });
+
+        // Run the agent in a background task
+        let config = self.retriever.config().clone();
+        let llm = self.retriever.llm().clone();
+        let emitter = crate::agent::EventEmitter::new(agent_tx);
+        let metrics_hub = Arc::clone(&self.metrics_hub);
+        let start = std::time::Instant::now();
+
+        tokio::spawn(async move {
+            // Prepare owned indices (fill defaults for missing)
+            let owned_docs: Vec<(
+                String,
+                crate::storage::PersistedDocument,
+                crate::document::NavigationIndex,
+                crate::document::ReasoningIndex,
+            )> = docs
+                .into_iter()
+                .map(|(id, doc)| {
+                    let nav = doc.navigation_index.clone().unwrap_or_default();
+                    let ridx = doc.reasoning_index.clone().unwrap_or_default();
+                    (id, doc, nav, ridx)
+                })
+                .collect();
+
+            // All streaming queries are user-specified docs → always use Scope::Specified
+            let doc_contexts: Vec<crate::agent::DocContext<'_>> = owned_docs
+                .iter()
+                .map(|(id, doc, nav, ridx)| crate::agent::DocContext {
+                    tree: &doc.tree,
+                    nav_index: nav,
+                    reasoning_index: ridx,
+                    doc_name: id.as_str(),
+                })
+                .collect();
+            let scope = crate::agent::Scope::Specified(doc_contexts);
+            let result =
+                crate::retrieval::dispatcher::dispatch(&query, scope, &config, &llm, &emitter)
+                    .await;

-        let rx = self
-            .retriever
-            .query_stream(&tree, &ctx.query, &options)
-            .await?;
+            // Bridge agent metrics into global MetricsHub
+            if let Ok(output) = result {
+                let m = &output.metrics;
+                let elapsed = start.elapsed();
+                metrics_hub.record_retrieval_query(
+                    m.rounds_used as u64,
+                    m.nodes_visited as u64,
+                    elapsed.as_millis() as u64,
+                );
+            }
+        });

-        Ok(rx)
+        Ok(retrieve_rx)
     }

 // ============================================================
@@ -660,6 +874,39 @@
     // Internal
     // ============================================================

+    /// Load documents by ID, returning loaded artifacts and failures.
+    async fn load_documents(
+        &self,
+        doc_ids: &[String],
+    ) -> Result<(
+        Vec<(
+            crate::document::DocumentTree,
+            crate::document::NavigationIndex,
+            crate::document::ReasoningIndex,
+            String,
+        )>,
+        Vec<FailedItem>,
+    )> {
+        let mut documents = Vec::new();
+        let mut failed = Vec::new();
+        for doc_id in doc_ids {
+            match self.workspace.load(doc_id).await {
+                Ok(Some(doc)) => {
+                    let nav_index = doc.navigation_index.unwrap_or_default();
+                    let reasoning_index = doc.reasoning_index.unwrap_or_default();
+                    documents.push((doc.tree, nav_index, reasoning_index, doc_id.clone()));
+                }
+                Ok(None) => {
+                    failed.push(FailedItem::new(doc_id, "Document not found"));
+                }
+                Err(e) => {
+                    failed.push(FailedItem::new(doc_id, &e.to_string()));
+                }
+            }
+        }
+        Ok((documents, failed))
+    }
+
     /// Check cancel flag, returning an error if cancelled.
     fn check_cancel(&self) -> Result<()> {
         if self.cancelled.load(Ordering::Relaxed) {
@@ -677,11 +924,6 @@
         }
     }

-    /// Get current active operation count.
-    pub fn active_operations(&self) -> usize {
-        *self.active_ops.lock().unwrap()
-    }
-
     /// Run a future with an optional timeout.
     /// If `timeout_secs` is `Some`, wraps the future in `tokio::time::timeout`.
     async fn with_timeout<F, T>(&self, timeout_secs: Option<u64>, fut: F) -> Result<T>
@@ -699,19 +941,6 @@
     }
     }
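With the translation loop above, callers drain a single channel and key off the two terminal variants. A sketch of the consumer side, mirroring the example that the old `RetrieverClient::query_stream` docs carried (`engine` and `ctx` are assumed in scope; variant and field names are the ones constructed above):

```rust
let mut rx = engine.query_stream(ctx).await?;
while let Some(event) = rx.recv().await {
    match event {
        RetrieveEvent::StageCompleted { stage, .. } => println!("stage: {stage}"),
        RetrieveEvent::ContentFound { title, score, .. } => {
            println!("evidence: {title} ({score:.2})")
        }
        RetrieveEvent::Completed { response } => {
            println!("confidence: {}", response.confidence);
            break; // terminal
        }
        RetrieveEvent::Error { message } => {
            eprintln!("stream failed: {message}");
            break; // terminal
        }
        _ => {}
    }
}
```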
-    /// Get document structure (tree) and optional reasoning index. Internal use only.
-    pub(crate) async fn get_structure(
-        &self,
-        doc_id: &str,
-    ) -> Result<(DocumentTree, Option<ReasoningIndex>)> {
-        let doc =
-            self.workspace.load(doc_id).await?.ok_or_else(|| {
-                Error::DocumentNotFound(format!("Document not found: {}", doc_id))
-            })?;
-
-        Ok((doc.tree, doc.reasoning_index))
-    }
-
     /// Resolve QueryScope into a list of document IDs.
     async fn resolve_scope(&self, scope: &QueryScope) -> Result<Vec<String>> {
         match scope {
@@ -844,6 +1073,7 @@
         // Load all documents in parallel and extract keyword profiles
         let doc_ids = self.workspace.inner().list_documents().await;
+        info!(doc_count = doc_ids.len(), "Loading documents for graph rebuild");
         let concurrency = self.config.llm.throttle.max_concurrent_requests;
         let loaded: Vec<Option<PersistedDocument>> =
             futures::stream::iter(doc_ids.iter().cloned())
             .collect()
             .await;

+        let loaded_count = loaded.iter().filter(|d| d.is_some()).count();
+        info!(loaded_count, "Documents loaded, building graph");
+
         let mut builder = crate::graph::DocumentGraphBuilder::new(self.config.graph.clone());
         for doc in loaded.into_iter().flatten() {
             let keywords = Self::extract_keywords_from_doc(&doc);
@@ -868,6 +1101,7 @@
         }

         let graph = builder.build();
+        info!(nodes = graph.node_count(), edges = graph.edge_count(), "Graph built, persisting");
         self.workspace.set_graph(&graph).await?;
         Ok(())
     }
@@ -894,8 +1128,6 @@ impl Clone for Engine {
             retriever: self.retriever.clone(),
             workspace: self.workspace.clone(),
             metrics_hub: Arc::clone(&self.metrics_hub),
-            graph_dirty: Arc::clone(&self.graph_dirty),
-            graph_fail_count: Arc::clone(&self.graph_fail_count),
             cancelled: Arc::clone(&self.cancelled),
             active_ops: Arc::clone(&self.active_ops),
         }
     }
@@ -941,20 +1173,6 @@ mod tests {
         assert!(!flag.load(Ordering::Relaxed));
     }

-    #[test]
-    fn test_graph_dirty_flag() {
-        let dirty = Arc::new(AtomicBool::new(false));
-        assert!(!dirty.load(Ordering::Relaxed));
-
-        // Simulate: index marks dirty
-        dirty.store(true, Ordering::Relaxed);
-
-        // Simulate: query swaps to false and rebuilds
-        let was_dirty = dirty.swap(false, Ordering::Relaxed);
-        assert!(was_dirty);
-        assert!(!dirty.load(Ordering::Relaxed));
-    }
-
     #[test]
     fn test_active_guard_decrement() {
         let active_ops: Arc<Mutex<usize>> = Arc::new(Mutex::new(0));
diff --git a/rust/src/client/indexed_document.rs b/rust/src/client/indexed_document.rs
index 58560644..3aa78f65 100644
--- a/rust/src/client/indexed_document.rs
+++ b/rust/src/client/indexed_document.rs
@@ -49,6 +49,9 @@ pub(crate) struct IndexedDocument {
     /// Pre-computed reasoning index for retrieval acceleration.
     pub reasoning_index: Option<ReasoningIndex>,
+
+    /// Pre-computed navigation index for agent-based retrieval.
+    pub navigation_index: Option<NavigationIndex>,
 }

 impl IndexedDocument {
@@ -65,6 +68,7 @@
             pages: Vec::new(),
             metrics: None,
             reasoning_index: None,
+            navigation_index: None,
         }
     }
diff --git a/rust/src/client/indexer.rs b/rust/src/client/indexer.rs
index fa6a314f..2c598382 100644
--- a/rust/src/client/indexer.rs
+++ b/rust/src/client/indexer.rs
@@ -279,6 +279,7 @@ impl IndexerClient {
             .with_metrics(result.metrics);

         doc.reasoning_index = result.reasoning_index;
+        doc.navigation_index = result.navigation_index;

         if let Some(p) = path {
             doc = doc.with_source_path(p);
@@ -365,6 +366,7 @@
         }

         persisted.reasoning_index = doc.reasoning_index;
+        persisted.navigation_index = doc.navigation_index;
         persisted
             .meta
             .update_processing_stats(node_count, summary_tokens, duration_ms);
diff --git a/rust/src/client/mod.rs b/rust/src/client/mod.rs
index 903316fa..8a370e57 100644
--- a/rust/src/client/mod.rs
+++ b/rust/src/client/mod.rs
@@ -95,8 +95,8 @@ pub use query_context::QueryContext;
 // ============================================================

 pub use types::{
-    DocumentInfo, FailedItem, IndexItem, IndexMode, IndexOptions, IndexResult, QueryResult,
-    QueryResultItem,
+    Confidence, DocumentInfo, EvidenceItem, FailedItem, IndexItem, IndexMode, IndexOptions,
+    IndexResult, QueryMetrics, QueryResult, QueryResultItem,
 };

 // ============================================================
diff --git a/rust/src/client/query_context.rs b/rust/src/client/query_context.rs
index 3b8f0726..e9513315 100644
--- a/rust/src/client/query_context.rs
+++ b/rust/src/client/query_context.rs
@@ -19,9 +19,6 @@
 //! let ctx = QueryContext::new("Explain the algorithm");
 //! ```

-use crate::config::Config;
-use crate::retrieval::{RetrieveOptions, StrategyPreference};
-
 /// Query scope — determines which documents to search.
 #[derive(Debug, Clone)]
 pub(crate) enum QueryScope {
@@ -54,14 +51,19 @@ pub struct QueryContext {
     pub(crate) scope: QueryScope,
     /// Maximum tokens for the result content.
     pub(crate) max_tokens: Option<usize>,
-    /// Retrieval strategy override.
-    pub(crate) strategy: Option<StrategyPreference>,
     /// Whether to include the pilot reasoning chain in the result.
     pub(crate) include_reasoning: bool,
     /// Maximum tree traversal depth for the pilot.
     pub(crate) depth_limit: Option<usize>,
     /// Per-operation timeout (seconds). `None` means no timeout.
     pub(crate) timeout_secs: Option<u64>,
+    /// Force Orchestrator analysis even when documents are specified.
+    ///
+    /// When `true`, the Orchestrator analyzes DocCards to select relevant
+    /// documents instead of dispatching all specified docs directly.
+    /// Useful when the user wants the system to decide which documents
+    /// (or sections) are most relevant to the query.
+    pub(crate) force_analysis: bool,
 }

 impl QueryContext {
@@ -71,10 +73,10 @@
             query: query.into(),
             scope: QueryScope::Workspace,
             max_tokens: None,
-            strategy: None,
             include_reasoning: true,
             depth_limit: None,
             timeout_secs: None,
+            force_analysis: false,
         }
     }

@@ -99,12 +101,6 @@
         self
     }

-    /// Set the retrieval strategy.
-    pub fn with_strategy(mut self, strategy: StrategyPreference) -> Self {
-        self.strategy = Some(strategy);
-        self
-    }
-
     /// Set whether to include the pilot reasoning chain.
     pub fn with_include_reasoning(mut self, include: bool) -> Self {
         self.include_reasoning = include;
         self
     }
@@ -123,22 +119,19 @@
         self
     }

-    /// Convert to internal `RetrieveOptions`, merging with engine config.
-    pub(crate) fn to_retrieve_options(&self, config: &Config) -> RetrieveOptions {
-        let mut opts = RetrieveOptions::new()
-            .with_top_k(config.retrieval.top_k)
-            .with_include_content(true)
-            .with_include_summaries(true);
-
-        if let Some(max_tokens) = self.max_tokens {
-            opts = opts.with_max_tokens(max_tokens);
-        }
-
-        if let Some(strategy) = &self.strategy {
-            opts = opts.with_strategy(strategy.clone());
-        }
-
-        opts
+    /// Force the Orchestrator to analyze documents before dispatching Workers.
+    ///
+    /// By default, when documents are specified via `with_doc_ids()`, the
+    /// Orchestrator skips its analysis phase and dispatches Workers to all
+    /// specified documents directly. Setting this to `true` forces the
+    /// Orchestrator to analyze DocCards and decide which documents are
+    /// relevant, even when the user specified documents explicitly.
+    ///
+    /// This is useful when querying across many documents where only a subset
+    /// is likely relevant to the specific question.
+    pub fn with_force_analysis(mut self, force: bool) -> Self {
+        self.force_analysis = force;
+        self
     }
 }
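A sketch of the intended call pattern for the new builder method (the builder methods are the ones defined in this file; `with_doc_ids` is referenced elsewhere in this PR, and the `engine.query(ctx)` entry point is assumed, so treat the exact signatures as illustrative):

```rust
// Scope the query to two known reports, but let the Orchestrator
// read their DocCards and pick the relevant subset itself.
let ctx = QueryContext::new("Compare Q1 and Q2 revenue")
    .with_doc_ids(vec!["report_q1".into(), "report_q2".into()])
    .with_force_analysis(true);
let result = engine.query(ctx).await?;
```

Without `with_force_analysis(true)`, the same context dispatches Workers to both documents unconditionally, which is cheaper but scales poorly when many documents are specified.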
diff --git a/rust/src/client/retriever.rs b/rust/src/client/retriever.rs
index 6e612571..67f53f6b 100644
--- a/rust/src/client/retriever.rs
+++ b/rust/src/client/retriever.rs
@@ -3,47 +3,39 @@

 //! Document retrieval client.
 //!
-//! This module provides query and retrieval operations for document content.
-//!
-//! # Example
-//!
-//! ```rust,ignore
-//! let retriever = RetrieverClient::new(pipeline_retriever);
-//!
-//! let result = retriever
-//!     .query(&tree, "What is this?", RetrieveOptions::default())
-//!     .await?;
-//!
-//! println!("Found {} results", result.results.len());
-//! ```
-
-use std::sync::Arc;
+//! This module provides query and retrieval operations for document content,
+//! dispatching through the retrieval layer to the agent-based system.

 use tracing::info;

-use super::types::QueryResultItem;
-use crate::document::{DocumentTree, ReasoningIndex};
-use crate::error::{Error, Result};
+use crate::agent::{self, config::AgentConfig, events::EventEmitter as AgentEventEmitter};
+use crate::client::types::QueryResult;
+use crate::document::{DocumentTree, NavigationIndex, ReasoningIndex};
+use crate::error::Result;
 use crate::events::{EventEmitter, QueryEvent};
-use crate::retrieval::stream::RetrieveEventReceiver;
-use crate::retrieval::{RetrieveOptions, RetrieveResponse};
+use crate::llm::LlmClient;
+use crate::retrieval::{dispatcher, postprocessor};

 /// Document retrieval client.
 ///
-/// Provides operations for querying document content.
+/// Delegates to the agent-based retrieval system.
 pub(crate) struct RetrieverClient {
-    /// Pipeline retriever.
-    retriever: Arc<PipelineRetriever>,
+    /// LLM client for agent navigation decisions.
+    llm: LlmClient,
+
+    /// Agent configuration.
+    config: AgentConfig,

     /// Event emitter.
     events: EventEmitter,
 }

 impl RetrieverClient {
-    /// Create a new retriever client.
-    pub fn new(retriever: crate::retrieval::PipelineRetriever) -> Self {
+    /// Create a new retriever client with an LLM client.
+    pub fn new(llm: LlmClient) -> Self {
         Self {
-            retriever: Arc::new(retriever),
+            llm,
+            config: AgentConfig::default(),
             events: EventEmitter::new(),
         }
     }
@@ -54,141 +46,83 @@
         self
     }

-    /// Query a document tree with optional reasoning index for fast-path lookup.
-    #[tracing::instrument(skip_all, fields(question = %question))]
-    ///
-    /// # Errors
-    ///
-    /// Returns an error if the retrieval pipeline fails.
-    pub async fn query_with_reasoning_index(
-        &self,
-        tree: &DocumentTree,
-        question: &str,
-        options: &RetrieveOptions,
-        reasoning_index: Option<ReasoningIndex>,
-    ) -> Result<QueryResultItem> {
-        self.events.emit_query(QueryEvent::Started {
-            query: question.to_string(),
-        });
-
-        info!("Querying: {:?}", question);
-
-        // Execute retrieval with reasoning index
-        let response = self
-            .retriever
-            .retrieve_with_reasoning_index(tree, question, options, reasoning_index)
-            .await
-            .map_err(|e| Error::Retrieval(e.to_string()))?;
-
-        // Build result
-        let result = self.build_query_result(&response);
+    /// Set custom agent configuration.
+    pub fn with_config(mut self, config: AgentConfig) -> Self {
+        self.config = config;
+        self
+    }

-        self.events.emit_query(QueryEvent::Complete {
-            total_results: result.node_ids.len(),
-            confidence: result.score,
-        });
+    /// Get a reference to the agent configuration.
+    pub fn config(&self) -> &AgentConfig {
+        &self.config
+    }

-        Ok(result)
+    /// Get a reference to the LLM client.
+    pub fn llm(&self) -> &LlmClient {
+        &self.llm
     }

-    /// Query a document tree with streaming results.
+    /// Query documents through the agent-based retrieval system.
     ///
-    /// Returns a channel receiver that yields [`RetrieveEvent`]s
-    /// incrementally as the pipeline progresses through its stages.
-    /// The stream always terminates with either `Completed` or `Error`.
-    ///
-    /// Also emits events through the [`EventEmitter`] (configured via
-    /// [`with_events`](Self::with_events)), so existing `on_query()` handlers
-    /// receive streaming events too.
-    ///
-    /// This is the streaming counterpart of [`query`](Self::query).
-    /// The non-streaming path is completely unaffected.
-    ///
-    /// # Example
-    ///
-    /// ```rust,ignore
-    /// let options = RetrieveOptions::new().with_streaming(true);
-    /// let mut rx = client.query_stream(&tree, "query", &options).await?;
-    ///
-    /// while let Some(event) = rx.recv().await {
-    ///     match event {
-    ///         RetrieveEvent::StageCompleted { stage, .. } => println!("{stage} done"),
-    ///         RetrieveEvent::Completed { response } => {
-    ///             println!("Confidence: {}", response.confidence);
-    ///             break;
-    ///         }
-    ///         RetrieveEvent::Error { message } => { eprintln!("{message}"); break; }
-    ///         _ => {}
-    ///     }
-    /// }
-    /// ```
-    ///
-    /// # Errors
-    ///
-    /// Returns an error if the retriever cannot be cloned for streaming.
-    pub async fn query_stream(
+    /// - `skip_analysis = true` → `Scope::Specified` (user-specified docs, skip Orchestrator analysis)
+    /// - `skip_analysis = false` → `Scope::Workspace` (full Orchestrator analysis flow)
+    #[tracing::instrument(skip_all, fields(question = %question, docs = documents.len()))]
+    pub async fn query(
         &self,
-        tree: &DocumentTree,
+        documents: &[(DocumentTree, NavigationIndex, ReasoningIndex, String)],
         question: &str,
-        options: &RetrieveOptions,
-    ) -> Result<RetrieveEventReceiver> {
+        skip_analysis: bool,
+    ) -> Result<QueryResult> {
         self.events.emit_query(QueryEvent::Started {
             query: question.to_string(),
         });

-        info!("Streaming query: {:?}", question);
-
-        let (_handle, rx) = self.retriever.retrieve_streaming(tree, question, options);
-
-        // Note: The Complete event is NOT emitted via EventEmitter here because
-        // the streaming handle returns () — the actual result flows through the
-        // rx channel as RetrieveEvent::Completed { response }. Callers who need
-        // completion metrics should consume the channel.
-
-        Ok(rx)
-    }
+        info!(
+            docs = documents.len(),
+            skip_analysis, "Querying: {:?}", question
+        );

-    /// Build QueryResultItem from RetrieveResponse.
-    fn build_query_result(&self, response: &RetrieveResponse) -> QueryResultItem {
-        // Extract node IDs
-        let node_ids: Vec<String> = response
-            .results
+        let doc_contexts: Vec<agent::DocContext<'_>> = documents
             .iter()
-            .filter_map(|r| r.node_id.clone())
-            .collect();
-
-        // Build content
-        let content_parts: Vec<String> = response
-            .results
-            .iter()
-            .map(|r| {
-                let mut parts = vec![format!("## {}", r.title)];
-                if let Some(ref content) = r.content {
-                    parts.push(content.clone());
-                }
-                parts.join("\n\n")
+            .map(|(tree, nav, ridx, id)| agent::DocContext {
+                tree,
+                nav_index: nav,
+                reasoning_index: ridx,
+                doc_name: id.as_str(),
             })
             .collect();

-        let content = if content_parts.is_empty() {
-            response.content.clone()
+        let scope = if skip_analysis {
+            agent::Scope::Specified(doc_contexts)
         } else {
-            content_parts.join("\n\n---\n\n")
+            agent::Scope::Workspace(agent::WorkspaceContext::new(doc_contexts))
         };

-        QueryResultItem {
-            doc_id: String::new(), // Will be set by caller
-            node_ids,
-            content,
-            score: response.confidence,
-        }
+        let emitter = AgentEventEmitter::noop();
+        let output =
+            dispatcher::dispatch(question, scope, &self.config, &self.llm, &emitter).await?;
+
+        let fallback_id = documents
+            .first()
+            .map(|(_, _, _, id)| id.as_str())
+            .unwrap_or("");
+        let items = postprocessor::to_results(&output, fallback_id);
+        let result = QueryResult::new_with_items(items);
+
+        self.events.emit_query(QueryEvent::Complete {
+            total_results: result.len(),
+            confidence: result.single().map(|i| i.confidence).unwrap_or(0.0),
+        });
+
+        Ok(result)
     }
 }

 impl Clone for RetrieverClient {
     fn clone(&self) -> Self {
         Self {
-            retriever: Arc::clone(&self.retriever),
+            llm: self.llm.clone(),
+            config: self.config.clone(),
             events: self.events.clone(),
         }
     }
@@ -200,7 +134,7 @@ mod tests {

     #[test]
     fn test_retriever_client_creation() {
-        let retriever = crate::retrieval::PipelineRetriever::new();
-        let _client = RetrieverClient::new(retriever);
+        let _client =
+            RetrieverClient::new(LlmClient::new(crate::llm::config::LlmConfig::default()));
     }
 }
diff --git a/rust/src/client/test_support.rs b/rust/src/client/test_support.rs
index 6b936024..dd443da8 100644
--- a/rust/src/client/test_support.rs
+++ b/rust/src/client/test_support.rs
@@ -10,11 +10,13 @@ use std::sync::Arc;

 use crate::client::engine::Engine;
 use crate::client::indexer::IndexerClient;
+use crate::client::retriever::RetrieverClient;
 use crate::config::Config;
 use crate::events::EventEmitter;
 use crate::index::PipelineExecutor;
+use crate::llm::LlmClient;
+use crate::llm::config::LlmConfig;
 use crate::metrics::MetricsHub;
-use crate::retrieval::PipelineRetriever;
 use crate::storage::Workspace;

 /// Build an `Engine` with a no-LLM pipeline for integration testing.
@@ -37,7 +39,7 @@
     let indexer = IndexerClient::with_factory(executor_factory);
     let workspace = Workspace::new(workspace_dir).await.unwrap();

-    let retriever = PipelineRetriever::new();
+    let retriever = RetrieverClient::new(LlmClient::new(LlmConfig::default()));

     Engine::with_components(
         config,
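A sketch of how the harness above is meant to be used from an integration test. `build_test_engine` and its signature come from `test_support.rs` as shown; `tempfile` is an assumed dev-dependency used here only to get a throwaway workspace directory:

```rust
#[tokio::test]
async fn engine_smoke_test() {
    let dir = tempfile::tempdir().unwrap();
    // No-LLM pipeline: safe to run in CI without credentials.
    let engine = build_test_engine(dir.path()).await;
    // index + query calls against `engine` go here...
    let _ = engine;
}
```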
diff --git a/rust/src/client/types.rs b/rust/src/client/types.rs
index 16503054..7da62176 100644
--- a/rust/src/client/types.rs
+++ b/rust/src/client/types.rs
@@ -256,20 +256,60 @@ impl IndexItem {
 // ============================================================
 // Query Types
 // ============================================================

+/// A single piece of evidence with source attribution.
+#[derive(Debug, Clone)]
+pub struct EvidenceItem {
+    /// Section title where this evidence was found.
+    pub title: String,
+    /// Navigation path (e.g., "Root/Chapter 1/Section 1.2").
+    pub path: String,
+    /// Raw evidence content.
+    pub content: String,
+    /// Source document name (set in multi-doc scenarios).
+    pub doc_name: Option<String>,
+}
+
+/// Query execution metrics.
+#[derive(Debug, Clone, Default)]
+pub struct QueryMetrics {
+    /// Number of LLM calls made.
+    pub llm_calls: u32,
+    /// Number of navigation rounds used.
+    pub rounds_used: u32,
+    /// Number of distinct nodes visited.
+    pub nodes_visited: usize,
+    /// Number of evidence items collected.
+    pub evidence_count: usize,
+    /// Total characters of collected evidence.
+    pub evidence_chars: usize,
+}
+
+/// Confidence score of the query result (0.0–1.0).
+///
+/// Derived from LLM evaluate() — whether evidence was deemed sufficient
+/// and how many replan rounds were needed.
+pub type Confidence = f32;
+
 /// A single document's query result.
 #[derive(Debug, Clone)]
 pub struct QueryResultItem {
     /// The document ID.
     pub doc_id: String,
-    /// Matching node IDs.
+    /// Matching node IDs (navigation paths).
     pub node_ids: Vec<String>,
-    /// Retrieved content.
+    /// Synthesized answer or raw evidence content.
     pub content: String,
-    /// Relevance score.
-    pub score: f32,
+    /// Evidence items that contributed to this result, with source attribution.
+    pub evidence: Vec<EvidenceItem>,
+
+    /// Execution metrics for this query.
+    pub metrics: Option<QueryMetrics>,
+
+    /// Confidence score (0.0–1.0) — derived from LLM evaluation.
+    pub confidence: Confidence,
 }

 /// Result of a document query.
@@ -295,6 +335,14 @@ impl QueryResult {
         }
     }

+    /// Create a query result with items.
+    pub fn new_with_items(items: Vec<QueryResultItem>) -> Self {
+        Self {
+            items,
+            failed: Vec::new(),
+        }
+    }
+
     /// Create a query result with a single item.
     pub fn from_single(item: QueryResultItem) -> Self {
         Self {
@@ -421,7 +469,9 @@ mod tests {
             doc_id: "doc-1".into(),
             node_ids: vec!["n1".into()],
             content: "content".into(),
-            score: 0.9,
+            evidence: vec![],
+            metrics: None,
+            confidence: 0.9,
         };
         let result = QueryResult::from_single(item);
         assert!(!result.is_empty());
diff --git a/rust/src/config/mod.rs b/rust/src/config/mod.rs
index f90f2af9..26c73ac3 100644
--- a/rust/src/config/mod.rs
+++ b/rust/src/config/mod.rs
@@ -11,7 +11,7 @@ mod validator;

 pub use types::Config;
 pub(crate) use types::{
-    CacheConfig, CompressionAlgorithm, FallbackBehavior, FallbackConfig, IndexerConfig, LlmConfig,
+    CompressionAlgorithm, FallbackBehavior, FallbackConfig, IndexerConfig, LlmConfig,
     LlmMetricsConfig, MetricsConfig, OnAllFailedBehavior, PilotMetricsConfig,
-    RetrievalMetricsConfig, SlotConfig, SufficiencyConfig,
+    RetrievalMetricsConfig, SlotConfig,
 };
diff --git a/rust/src/config/types/content.rs b/rust/src/config/types/content.rs
deleted file mode 100644
index 62741cd7..00000000
--- a/rust/src/config/types/content.rs
+++ /dev/null
@@ -1,222 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Content aggregator configuration types.
-
-use serde::{Deserialize, Serialize};
-
-/// Content aggregator configuration.
-#[derive(Debug, Clone, Serialize, Deserialize)]
-pub struct ContentAggregatorConfig {
-    /// Whether content aggregator is enabled.
-    /// When disabled, uses simple content collection (legacy behavior).
-    #[serde(default = "default_true")]
-    pub enabled: bool,
-
-    /// Maximum tokens for aggregated content.
-    #[serde(default = "default_token_budget")]
-    pub token_budget: usize,
-
-    /// Minimum relevance score threshold (0.0 - 1.0).
-    /// Content below this threshold will be filtered out.
-    #[serde(default = "default_min_relevance_score")]
-    pub min_relevance_score: f32,
-
-    /// Scoring strategy: "keyword_only" | "keyword_bm25" | "hybrid"
-    #[serde(default = "default_scoring_strategy")]
-    pub scoring_strategy: String,
-
-    /// Output format: "markdown" | "json" | "tree" | "flat"
-    #[serde(default = "default_output_format")]
-    pub output_format: String,
-
-    /// Include relevance scores in output.
-    #[serde(default)]
-    pub include_scores: bool,
-
-    /// Minimum budget allocation per depth level (0.0 - 1.0).
-    /// Ensures each tree level gets representation.
-    #[serde(default = "default_hierarchical_min_per_level")]
-    pub hierarchical_min_per_level: f32,
-
-    /// Enable content deduplication.
-    #[serde(default = "default_true")]
-    pub deduplicate: bool,
-
-    /// Similarity threshold for deduplication (0.0 - 1.0).
-    /// Higher = more aggressive deduplication.
-    #[serde(default = "default_dedup_threshold")]
-    pub dedup_threshold: f32,
-}
-
-fn default_true() -> bool {
-    true
-}
-
-fn default_token_budget() -> usize {
-    4000
-}
-
-fn default_min_relevance_score() -> f32 {
-    0.2
-}
-
-fn default_scoring_strategy() -> String {
-    "keyword_bm25".to_string()
-}
-
-fn default_output_format() -> String {
-    "markdown".to_string()
-}
-
-fn default_hierarchical_min_per_level() -> f32 {
-    0.1
-}
-
-fn default_dedup_threshold() -> f32 {
-    0.9
-}
-
-impl Default for ContentAggregatorConfig {
-    fn default() -> Self {
-        Self {
-            enabled: default_true(),
-            token_budget: default_token_budget(),
-            min_relevance_score: default_min_relevance_score(),
-            scoring_strategy: default_scoring_strategy(),
-            output_format: default_output_format(),
-            include_scores: false,
-            hierarchical_min_per_level: default_hierarchical_min_per_level(),
-            deduplicate: default_true(),
-            dedup_threshold: default_dedup_threshold(),
-        }
-    }
-}
-
-impl ContentAggregatorConfig {
-    /// Create a new config with defaults.
-    pub fn new() -> Self {
-        Self::default()
-    }
-
-    /// Disable content aggregator (use legacy behavior).
-    pub fn disabled() -> Self {
-        Self {
-            enabled: false,
-            ..Self::default()
-        }
-    }
-
-    /// Set the token budget.
-    pub fn with_token_budget(mut self, budget: usize) -> Self {
-        self.token_budget = budget;
-        self
-    }
-
-    /// Set the minimum relevance score.
-    pub fn with_min_relevance(mut self, score: f32) -> Self {
-        self.min_relevance_score = score.clamp(0.0, 1.0);
-        self
-    }
-
-    /// Set the scoring strategy.
-    pub fn with_scoring_strategy(mut self, strategy: impl Into<String>) -> Self {
-        self.scoring_strategy = strategy.into();
-        self
-    }
-
-    /// Set the output format.
-    pub fn with_output_format(mut self, format: impl Into<String>) -> Self {
-        self.output_format = format.into();
-        self
-    }
-
-    /// Enable/disable score inclusion.
-    pub fn with_include_scores(mut self, include: bool) -> Self {
-        self.include_scores = include;
-        self
-    }
-
-    /// Enable/disable deduplication.
-    pub fn with_deduplicate(mut self, dedupe: bool) -> Self {
-        self.deduplicate = dedupe;
-        self
-    }
-
-    /// Convert to the retrieval content aggregator config.
- pub fn to_aggregator_config(&self) -> crate::retrieval::content::ContentAggregatorConfig { - use crate::retrieval::content::{ - ContentAggregatorConfig as RetrievalContentConfig, OutputFormatConfig, - ScoringStrategyConfig, - }; - - let scoring_strategy = match self.scoring_strategy.as_str() { - "keyword_only" => ScoringStrategyConfig::KeywordOnly, - "hybrid" => ScoringStrategyConfig::Hybrid, - _ => ScoringStrategyConfig::KeywordWithBM25, - }; - - let output_format = match self.output_format.as_str() { - "json" => OutputFormatConfig::Json, - "tree" => OutputFormatConfig::Tree, - "flat" => OutputFormatConfig::Flat, - _ => OutputFormatConfig::Markdown, - }; - - RetrievalContentConfig { - token_budget: self.token_budget, - min_relevance_score: self.min_relevance_score, - scoring_strategy, - output_format, - include_scores: self.include_scores, - hierarchical_min_per_level: self.hierarchical_min_per_level, - deduplicate: self.deduplicate, - dedup_threshold: self.dedup_threshold, - } - } -} - -#[cfg(test)] -mod tests { - use super::*; - - #[test] - fn test_content_aggregator_config_defaults() { - let config = ContentAggregatorConfig::default(); - assert!(config.enabled); - assert_eq!(config.token_budget, 4000); - assert_eq!(config.min_relevance_score, 0.2); - assert_eq!(config.scoring_strategy, "keyword_bm25"); - assert_eq!(config.output_format, "markdown"); - assert!(config.deduplicate); - } - - #[test] - fn test_content_aggregator_config_disabled() { - let config = ContentAggregatorConfig::disabled(); - assert!(!config.enabled); - } - - #[test] - fn test_content_aggregator_config_builder() { - let config = ContentAggregatorConfig::new() - .with_token_budget(8000) - .with_min_relevance(0.5) - .with_scoring_strategy("hybrid") - .with_output_format("json"); - - assert_eq!(config.token_budget, 8000); - assert_eq!(config.min_relevance_score, 0.5); - assert_eq!(config.scoring_strategy, "hybrid"); - assert_eq!(config.output_format, "json"); - } - - #[test] - fn test_min_relevance_clamping() { - let config = ContentAggregatorConfig::new().with_min_relevance(1.5); - assert_eq!(config.min_relevance_score, 1.0); - - let config = ContentAggregatorConfig::new().with_min_relevance(-0.5); - assert_eq!(config.min_relevance_score, 0.0); - } -} diff --git a/rust/src/config/types/mod.rs b/rust/src/config/types/mod.rs index da53b1f3..b4421110 100644 --- a/rust/src/config/types/mod.rs +++ b/rust/src/config/types/mod.rs @@ -3,7 +3,6 @@ //! Configuration type definitions. -mod content; mod indexer; mod llm_pool; mod metrics; @@ -20,7 +19,7 @@ pub(crate) use metrics::{ LlmMetricsConfig, MetricsConfig, PilotMetricsConfig, RetrievalMetricsConfig, }; pub(crate) use retrieval::RetrievalConfig; -pub(crate) use storage::{CacheConfig, CompressionAlgorithm, StorageConfig, SufficiencyConfig}; +pub(crate) use storage::{CompressionAlgorithm, StorageConfig}; /// Main configuration for vectorless. 
/// @@ -169,27 +168,6 @@ impl Config { )); } - // Validate content aggregator - if self.retrieval.content.token_budget == 0 { - errors.push(ValidationError::error( - "retrieval.content.token_budget", - "Token budget must be greater than 0", - )); - } - - if self.retrieval.content.min_relevance_score < 0.0 - || self.retrieval.content.min_relevance_score > 1.0 - { - errors.push( - ValidationError::error( - "retrieval.content.min_relevance_score", - "Min relevance score must be between 0.0 and 1.0", - ) - .with_expected("0.0 - 1.0") - .with_actual(self.retrieval.content.min_relevance_score.to_string()), - ); - } - // Validate throttle if self.llm.throttle.max_concurrent_requests == 0 { errors.push(ValidationError::error( @@ -363,8 +341,7 @@ mod tests { #[test] fn test_config_validation_errors() { let mut config = Config::default(); - config.retrieval.content.token_budget = 0; - config.retrieval.content.min_relevance_score = 1.5; + config.retrieval.top_k = 0; let result = config.validate(); assert!(result.is_err()); diff --git a/rust/src/config/types/retrieval.rs b/rust/src/config/types/retrieval.rs index 18df7cc8..c4300987 100644 --- a/rust/src/config/types/retrieval.rs +++ b/rust/src/config/types/retrieval.rs @@ -9,7 +9,6 @@ use serde::{Deserialize, Serialize}; -use super::content::ContentAggregatorConfig; use super::storage::{CacheConfig, StrategyConfig, SufficiencyConfig}; /// Retrieval strategy configuration. @@ -37,10 +36,6 @@ pub struct RetrievalConfig { /// Strategy-specific configuration. #[serde(default)] pub strategy: StrategyConfig, - - /// Content aggregator configuration. - #[serde(default)] - pub content: ContentAggregatorConfig, } fn default_top_k() -> usize { @@ -55,7 +50,6 @@ impl Default for RetrievalConfig { sufficiency: SufficiencyConfig::default(), cache: CacheConfig::default(), strategy: StrategyConfig::default(), - content: ContentAggregatorConfig::default(), } } } diff --git a/rust/src/config/validator.rs b/rust/src/config/validator.rs index 7f7d01f6..5d08c89a 100644 --- a/rust/src/config/validator.rs +++ b/rust/src/config/validator.rs @@ -119,36 +119,6 @@ impl ValidationRule for RangeValidator { )); } - // Content aggregator ranges - if config.retrieval.content.token_budget == 0 { - errors.push(ValidationError::error( - "retrieval.content.token_budget", - "Token budget must be greater than 0", - )); - } - - if config.retrieval.content.min_relevance_score < 0.0 - || config.retrieval.content.min_relevance_score > 1.0 - { - errors.push( - ValidationError::error( - "retrieval.content.min_relevance_score", - "Min relevance score must be between 0.0 and 1.0", - ) - .with_expected("0.0 - 1.0") - .with_actual(config.retrieval.content.min_relevance_score.to_string()), - ); - } - - if config.retrieval.content.hierarchical_min_per_level < 0.0 - || config.retrieval.content.hierarchical_min_per_level > 1.0 - { - errors.push(ValidationError::error( - "retrieval.content.hierarchical_min_per_level", - "Hierarchical min per level must be between 0.0 and 1.0", - )); - } - // Throttle ranges if config.llm.throttle.max_concurrent_requests == 0 { errors.push(ValidationError::error( @@ -192,17 +162,6 @@ impl ValidationRule for ConsistencyValidator { ); } - // Check if content token budget is reasonable - if config.retrieval.content.token_budget > 100000 { - errors.push( - ValidationError::warning( - "retrieval.content.token_budget", - "Token budget is very high, may cause performance issues", - ) - .with_actual(config.retrieval.content.token_budget.to_string()), - ); - } - // Check if 
sufficiency thresholds are consistent if config.retrieval.sufficiency.min_tokens > config.retrieval.sufficiency.target_tokens { errors.push( @@ -225,29 +184,6 @@ impl ValidationRule for ConsistencyValidator { .with_actual(config.retrieval.sufficiency.target_tokens.to_string()), ); } - - // Check scoring strategy validity - let valid_strategies = ["keyword_only", "keyword_bm25", "hybrid"]; - if !valid_strategies.contains(&config.retrieval.content.scoring_strategy.as_str()) { - errors.push( - ValidationError::error( - "retrieval.content.scoring_strategy", - "Invalid scoring strategy", - ) - .with_expected(format!("one of: {:?}", valid_strategies)) - .with_actual(config.retrieval.content.scoring_strategy.clone()), - ); - } - - // Check output format validity - let valid_formats = ["markdown", "json", "tree", "flat"]; - if !valid_formats.contains(&config.retrieval.content.output_format.as_str()) { - errors.push( - ValidationError::error("retrieval.content.output_format", "Invalid output format") - .with_expected(format!("one of: {:?}", valid_formats)) - .with_actual(config.retrieval.content.output_format.clone()), - ); - } } } @@ -349,15 +285,14 @@ mod tests { #[test] fn test_validator_catches_range_errors() { let mut config = Config::default(); - config.retrieval.content.token_budget = 0; - config.retrieval.content.min_relevance_score = 1.5; + config.retrieval.top_k = 0; let validator = ConfigValidator::new(); let result = validator.validate(&config); assert!(result.is_err()); let err = result.unwrap_err(); - assert!(err.errors.iter().any(|e| e.path.contains("token_budget"))); + assert!(err.errors.iter().any(|e| e.path.contains("top_k"))); } #[test] diff --git a/rust/src/document/mod.rs b/rust/src/document/mod.rs index 7f00a84b..d3de3bfc 100644 --- a/rust/src/document/mod.rs +++ b/rust/src/document/mod.rs @@ -16,19 +16,22 @@ //! - [`NodeReference`] - In-document reference (e.g., "see Appendix G") //! - [`RefType`] - Type of reference (Section, Appendix, Table, etc.) +mod navigation; mod node; mod reasoning; mod reference; +mod serde_helpers; mod structure; mod toc; mod tree; +pub use navigation::{ChildRoute, DocCard, NavEntry, NavigationIndex, SectionCard}; pub use node::{NodeId, TreeNode}; pub use reasoning::{ - HotNodeEntry, ReasoningIndex, ReasoningIndexBuilder, ReasoningIndexConfig, SectionSummary, - SummaryShortcut, TopicEntry, + ReasoningIndex, ReasoningIndexBuilder, ReasoningIndexConfig, SectionSummary, SummaryShortcut, + TopicEntry, }; -pub use reference::{NodeReference, RefType, ReferenceExtractor}; +pub use reference::ReferenceExtractor; pub use structure::{DocumentStructure, StructureNode}; pub use toc::{TocConfig, TocEntry, TocNode, TocView}; pub use tree::{DocumentTree, RetrievalIndex}; diff --git a/rust/src/document/navigation.rs b/rust/src/document/navigation.rs new file mode 100644 index 00000000..dbfeadd4 --- /dev/null +++ b/rust/src/document/navigation.rs @@ -0,0 +1,626 @@ +// Copyright (c) 2026 vectorless developers +// SPDX-License-Identifier: Apache-2.0 + +//! Navigation index for Agent-based retrieval. +//! +//! This is the primary data source for the Agent during the query phase. +//! It provides a compact, pre-computed view of the document tree optimized +//! for navigation decisions — the Agent can decide where to descend without +//! reading the actual content. +//! +//! # Design +//! +//! Based on the Corpus2Skill paper (2604.14572v1), this is the in-memory +//! equivalent of SKILL.md / INDEX.md. The Agent reads `child_routes` at +//! 
+//! each decision point to see all available sub-topics and their
+//! descriptions, then chooses where to navigate next.
+//!
+//! # Data Flow
+//!
+//! ```text
+//! Enhance stage (writes to TreeNode):
+//!     summary, description, routing_keywords, leaf_count
+//!         │
+//!         └──→ Navigation stage (reads TreeNode fields)
+//!              Builds: NavigationIndex (NavEntry + ChildRoute)
+//! ```
+//!
+//! No LLM calls are made during Navigation stage construction.
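+//!
+//! # Example
+//!
+//! A minimal sketch of a descent loop over a built index. The
+//! `choose_route` closure stands in for the Agent's LLM routing decision
+//! and is not part of this API:
+//!
+//! ```rust,ignore
+//! let mut current = tree.root();
+//! while let Some(routes) = nav.get_child_routes(current) {
+//!     // Route on titles/descriptions alone; no content is read.
+//!     match choose_route(routes) {
+//!         Some(route) => current = route.node_id,
+//!         None => break, // no relevant child under this node
+//!     }
+//! }
+//! ```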
+
+use std::collections::HashMap;
+
+use serde::{Deserialize, Serialize};
+
+use super::node::NodeId;
+
+/// Navigation index — Agent's primary data source during the query phase.
+///
+/// Contains pre-computed navigation metadata for every non-leaf node,
+/// allowing the Agent to make routing decisions without accessing the
+/// content layer (DocumentTree).
+///
+/// `HashMap<NodeId, _>` fields use `serde_helpers` (a `Vec` of pairs)
+/// because serde_json cannot deserialize integer-keyed maps.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct NavigationIndex {
+    /// Navigation entry for each non-leaf node.
+    #[serde(with = "super::serde_helpers")]
+    nav_entries: HashMap<NodeId, NavEntry>,
+
+    /// Child routes for each non-leaf node.
+    #[serde(with = "super::serde_helpers")]
+    child_routes: HashMap<NodeId, Vec<ChildRoute>>,
+
+    /// Pre-computed document card for multi-document Orchestrator.
+    /// Built during compile phase by NavigationIndexStage.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    doc_card: Option<DocCard>,
+}
+
+impl NavigationIndex {
+    /// Create a new empty navigation index.
+    pub fn new() -> Self {
+        Self {
+            nav_entries: HashMap::new(),
+            child_routes: HashMap::new(),
+            doc_card: None,
+        }
+    }
+
+    /// Add a navigation entry for a non-leaf node.
+    pub fn add_entry(&mut self, node_id: NodeId, entry: NavEntry) {
+        self.nav_entries.insert(node_id, entry);
+    }
+
+    /// Add child routes for a non-leaf node.
+    pub fn add_child_routes(&mut self, parent_id: NodeId, routes: Vec<ChildRoute>) {
+        self.child_routes.insert(parent_id, routes);
+    }
+
+    /// Get the navigation entry for a node.
+    pub fn get_entry(&self, node_id: NodeId) -> Option<&NavEntry> {
+        self.nav_entries.get(&node_id)
+    }
+
+    /// Get the child routes for a node.
+    pub fn get_child_routes(&self, node_id: NodeId) -> Option<&[ChildRoute]> {
+        self.child_routes.get(&node_id).map(|v| v.as_slice())
+    }
+
+    /// Get the number of navigation entries.
+    pub fn entry_count(&self) -> usize {
+        self.nav_entries.len()
+    }
+
+    /// Get the total number of child route records.
+    pub fn total_child_routes(&self) -> usize {
+        self.child_routes.values().map(|v| v.len()).sum()
+    }
+
+    /// Get the root node's navigation entry.
+    pub fn root_entry(&self) -> Option<&NavEntry> {
+        // The root should always be present if the index is non-empty.
+        // Return the first entry with level 0.
+        self.nav_entries.values().find(|e| e.level == 0)
+    }
+
+    /// Iterate over all navigation entries.
+    pub fn entries(&self) -> impl Iterator<Item = (&NodeId, &NavEntry)> {
+        self.nav_entries.iter()
+    }
+
+    /// Iterate over all child route sets.
+    pub fn all_child_routes(&self) -> impl Iterator<Item = (&NodeId, &[ChildRoute])> {
+        self.child_routes.iter().map(|(k, v)| (k, v.as_slice()))
+    }
+
+    /// Check if the index is empty.
+    pub fn is_empty(&self) -> bool {
+        self.nav_entries.is_empty()
+    }
+
+    /// Get the pre-computed document card.
+    pub fn doc_card(&self) -> Option<&DocCard> {
+        self.doc_card.as_ref()
+    }
+
+    /// Set the document card.
+    pub fn set_doc_card(&mut self, card: DocCard) {
+        self.doc_card = Some(card);
+    }
+}
+
+impl Default for NavigationIndex {
+    fn default() -> Self {
+        Self::new()
+    }
+}
+
+/// Navigation entry for a non-leaf node.
+///
+/// Provides the Agent with enough context to decide whether this subtree
+/// is relevant to the current query, without needing to read the node's
+/// actual content.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct NavEntry {
+    /// Routing summary describing what this subtree covers.
+    /// Comes from Enhance stage's `summary` (routing-oriented).
+    pub overview: String,
+
+    /// Typical questions this subtree can answer.
+    /// Extracted from content/summary during Enhance stage.
+    pub question_hints: Vec<String>,
+
+    /// Topic tags for keyword-based matching.
+    /// Comes from Enhance stage's `routing_keywords`.
+    pub topic_tags: Vec<String>,
+
+    /// Total number of leaf nodes in this subtree.
+    /// Equivalent to the paper's `num_documents`.
+    pub leaf_count: usize,
+
+    /// Depth of this node in the tree.
+    /// Equivalent to the paper's `level`.
+    pub level: usize,
+}
+
+/// Child route — compact routing info for one child node.
+///
+/// The Agent sees a list of `ChildRoute`s when deciding which child
+/// to descend into. This provides progressive disclosure: the Agent
+/// doesn't need to enter the child node to understand what it contains.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct ChildRoute {
+    /// The child node's ID (for the Agent to navigate to).
+    pub node_id: NodeId,
+
+    /// Child node's title.
+    pub title: String,
+
+    /// One-sentence description of what this child covers.
+    /// Comes from Enhance stage's `description` field.
+    pub description: String,
+
+    /// Number of leaf nodes in this child's subtree.
+    pub leaf_count: usize,
+}
+
+/// Pre-computed document card for multi-document Orchestrator Agent.
+///
+/// Built during the compile phase by `NavigationIndexStage`, this provides
+/// a compact summary of the entire document — enough for the Orchestrator
+/// to decide whether a document is relevant to a query without entering it.
+///
+/// All fields come from data already computed in earlier phases of the
+/// NavigationIndexStage (root NavEntry + root child_routes). No LLM calls.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct DocCard {
+    /// Document title (root node title).
+    pub title: String,
+
+    /// Document overview (root NavEntry.overview).
+    pub overview: String,
+
+    /// Questions this document can answer (root NavEntry.question_hints).
+    pub question_hints: Vec<String>,
+
+    /// Topic keywords (root NavEntry.topic_tags).
+    pub topic_tags: Vec<String>,
+
+    /// Top-level section summaries (from root child_routes).
+    pub sections: Vec<SectionCard>,
+
+    /// Total leaf nodes in the document.
+    pub total_leaves: usize,
+}
+
+/// One top-level section in a [`DocCard`].
+///
+/// Provides a compact view of a single top-level section,
+/// allowing the Orchestrator to scan section titles and descriptions
+/// to assess document relevance.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct SectionCard {
+    /// Section title.
+    pub title: String,
+
+    /// One-sentence description of this section.
+    pub description: String,
+
+    /// Number of leaf nodes in this section's subtree.
+    pub leaf_count: usize,
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::document::DocumentTree;
+
+    fn build_small_tree() -> DocumentTree {
+        // Root -> [Child1 (leaf), Child2 -> [Grandchild (leaf)]]
+        let mut tree = DocumentTree::new("Root", "");
+        let root = tree.root();
+        let _child1 = tree.add_child(root, "Child1", "leaf content");
+        let child2 = tree.add_child(root, "Child2", "");
+        let _grandchild = tree.add_child(child2, "Grandchild", "leaf content");
+        tree
+    }
+
+    #[test]
+    fn test_empty_navigation_index() {
+        let index = NavigationIndex::new();
+        assert!(index.is_empty());
+        assert_eq!(index.entry_count(), 0);
+        assert_eq!(index.total_child_routes(), 0);
+        assert!(index.root_entry().is_none());
+    }
+
+    #[test]
+    fn test_add_and_retrieve_entry() {
+        let tree = build_small_tree();
+        let root = tree.root();
+
+        let entry = NavEntry {
+            overview: "Payment integration guide".to_string(),
+            question_hints: vec!["How to set up Stripe?".to_string()],
+            topic_tags: vec!["payment".to_string(), "stripe".to_string()],
+            leaf_count: 5,
+            level: 0,
+        };
+
+        let mut index = NavigationIndex::new();
+        index.add_entry(root, entry);
+
+        assert!(!index.is_empty());
+        assert_eq!(index.entry_count(), 1);
+
+        let retrieved = index.get_entry(root).unwrap();
+        assert_eq!(retrieved.overview, "Payment integration guide");
+        assert_eq!(retrieved.leaf_count, 5);
+    }
+
+    #[test]
+    fn test_add_and_retrieve_child_routes() {
+        let tree = build_small_tree();
+        let root = tree.root();
+        let children: Vec<NodeId> = tree.children_iter(root).collect();
+
+        let routes = vec![
+            ChildRoute {
+                node_id: children[0],
+                title: "Getting Started".to_string(),
+                description: "Setup and installation".to_string(),
+                leaf_count: 3,
+            },
+            ChildRoute {
+                node_id: children[1],
+                title: "API Reference".to_string(),
+                description: "REST API endpoints".to_string(),
+                leaf_count: 7,
+            },
+        ];
+
+        let mut index = NavigationIndex::new();
+        index.add_child_routes(root, routes);
+
+        let retrieved = index.get_child_routes(root).unwrap();
+        assert_eq!(retrieved.len(), 2);
+        assert_eq!(retrieved[0].title, "Getting Started");
+        assert_eq!(retrieved[1].leaf_count, 7);
+        assert_eq!(index.total_child_routes(), 2);
+    }
+
+    #[test]
+    fn test_root_entry() {
+        let tree = build_small_tree();
+        let root = tree.root();
+        let children: Vec<NodeId> = tree.children_iter(root).collect();
+
+        let mut index = NavigationIndex::new();
+        index.add_entry(
+            root,
+            NavEntry {
+                overview: "Root".to_string(),
+                question_hints: vec![],
+                topic_tags: vec![],
+                leaf_count: 10,
+                level: 0,
+            },
+        );
+        index.add_entry(
+            children[1],
+            NavEntry {
+                overview: "Child".to_string(),
+                question_hints: vec![],
+                topic_tags: vec![],
+                leaf_count: 5,
+                level: 1,
+            },
+        );
+
+        let root_entry = index.root_entry().unwrap();
+        assert_eq!(root_entry.level, 0);
+        assert_eq!(root_entry.leaf_count, 10);
+    }
+
+    #[test]
+    fn test_get_entry_nonexistent() {
+        let index = NavigationIndex::new();
+        let tree = build_small_tree();
+        // Leaf node should never have an entry
+        let children: Vec<NodeId> = tree.children_iter(tree.root()).collect();
+        assert!(index.get_entry(children[0]).is_none());
+    }
+
+    #[test]
+    fn test_get_child_routes_nonexistent() {
+        let index = NavigationIndex::new();
+        let tree = build_small_tree();
+        assert!(index.get_child_routes(tree.root()).is_none());
+    }
+
+    #[test]
+    fn test_default_trait() {
+        let index = NavigationIndex::default();
+        assert!(index.is_empty());
+    }
+
+    #[test]
+    fn test_entries_iterator() {
+        let tree = build_small_tree();
+        let root = tree.root();
+        let children: Vec<NodeId> = tree.children_iter(root).collect();
+
+        let mut index = NavigationIndex::new();
+        index.add_entry(
+            root,
+            NavEntry {
+                overview: "Root".to_string(),
+                question_hints: vec![],
+                topic_tags: vec![],
+                leaf_count: 2,
+                level: 0,
+            },
+        );
+        index.add_entry(
+            children[1], // Child2 is non-leaf
+            NavEntry {
+                overview: "Child2".to_string(),
+                question_hints: vec![],
+                topic_tags: vec![],
+                leaf_count: 1,
+                level: 1,
+            },
+        );
+
+        let all_entries: Vec<_> = index.entries().collect();
+        assert_eq!(all_entries.len(), 2);
+    }
+
+    #[test]
+    fn test_all_child_routes_iterator() {
+        let tree = build_small_tree();
+        let root = tree.root();
+        let children: Vec<NodeId> = tree.children_iter(root).collect();
+
+        let mut index = NavigationIndex::new();
+        index.add_child_routes(
+            root,
+            vec![ChildRoute {
+                node_id: children[0],
+                title: "C1".to_string(),
+                description: "d".to_string(),
+                leaf_count: 1,
+            }],
+        );
+
+        let all_routes: Vec<_> = index.all_child_routes().collect();
+        assert_eq!(all_routes.len(), 1);
+        assert_eq!(all_routes[0].1.len(), 1);
+    }
+
+    #[test]
+    fn test_serialization_roundtrip() {
+        let tree = build_small_tree();
+        let root = tree.root();
+        let children: Vec<NodeId> = tree.children_iter(root).collect();
+
+        let mut index = NavigationIndex::new();
+        index.add_entry(
+            root,
+            NavEntry {
+                overview: "Root overview".to_string(),
+                question_hints: vec!["What is this?".to_string()],
+                topic_tags: vec!["intro".to_string(), "guide".to_string()],
+                leaf_count: 2,
+                level: 0,
+            },
+        );
+        index.add_child_routes(
+            root,
+            vec![
+                ChildRoute {
+                    node_id: children[0],
+                    title: "Child1".to_string(),
+                    description: "First child desc".to_string(),
+                    leaf_count: 1,
+                },
+                ChildRoute {
+                    node_id: children[1],
+                    title: "Child2".to_string(),
+                    description: "Second child desc".to_string(),
+                    leaf_count: 1,
+                },
+            ],
+        );
+
+        // Serialize
+        let json = serde_json::to_string(&index).expect("serialization failed");
+
+        // Deserialize
+        let deserialized: NavigationIndex =
+            serde_json::from_str(&json).expect("deserialization failed");
+
+        // Verify data survived round-trip
+        assert_eq!(deserialized.entry_count(), 1);
+        assert_eq!(deserialized.total_child_routes(), 2);
+
+        let entry = deserialized.get_entry(root).unwrap();
+        assert_eq!(entry.overview, "Root overview");
+        assert_eq!(entry.question_hints.len(), 1);
+        assert_eq!(entry.topic_tags.len(), 2);
+        assert_eq!(entry.leaf_count, 2);
+        assert_eq!(entry.level, 0);
+
+        let routes = deserialized.get_child_routes(root).unwrap();
+        assert_eq!(routes[0].title, "Child1");
+        assert_eq!(routes[1].title, "Child2");
+    }
+
+    #[test]
+    fn test_doc_card_default_none() {
+        let index = NavigationIndex::new();
+        assert!(index.doc_card().is_none());
+    }
+
+    #[test]
+    fn test_doc_card_set_and_get() {
+        let card = DocCard {
+            title: "Test Doc".to_string(),
+            overview: "A test document".to_string(),
+            question_hints: vec!["What?".to_string()],
+            topic_tags: vec!["test".to_string()],
+            sections: vec![SectionCard {
+                title: "Section 1".to_string(),
+                description: "First section".to_string(),
+                leaf_count: 5,
+            }],
+            total_leaves: 5,
+        };
+
+        let mut index = NavigationIndex::new();
+        index.set_doc_card(card);
+
+        let retrieved = index.doc_card().unwrap();
+        assert_eq!(retrieved.title, "Test Doc");
+        assert_eq!(retrieved.overview, "A test document");
+        assert_eq!(retrieved.question_hints.len(), 1);
+        assert_eq!(retrieved.topic_tags.len(), 1);
+        assert_eq!(retrieved.sections.len(), 1);
+        assert_eq!(retrieved.sections[0].title, "Section 1");
+        assert_eq!(retrieved.sections[0].leaf_count, 5);
+        assert_eq!(retrieved.total_leaves, 5);
+    }
+
+    #[test]
+    fn test_doc_card_serialization_roundtrip() {
+        let tree = build_small_tree();
+        let root = tree.root();
+        let children: Vec<NodeId> = tree.children_iter(root).collect();
+
+        let mut index = NavigationIndex::new();
+        index.add_entry(
+            root,
+            NavEntry {
+                overview: "Root overview".to_string(),
+                question_hints: vec!["What is this?".to_string()],
+                topic_tags: vec!["intro".to_string()],
+                leaf_count: 2,
+                level: 0,
+            },
+        );
+        index.add_child_routes(
+            root,
+            vec![
+                ChildRoute {
+                    node_id: children[0],
+                    title: "Child1".to_string(),
+                    description: "First".to_string(),
+                    leaf_count: 1,
+                },
+                ChildRoute {
+                    node_id: children[1],
+                    title: "Child2".to_string(),
+                    description: "Second".to_string(),
+                    leaf_count: 1,
+                },
+            ],
+        );
+
+        // Build DocCard from index data
+        let root_entry = index.get_entry(root).unwrap();
+        let sections: Vec<SectionCard> = index
+            .get_child_routes(root)
+            .unwrap()
+            .iter()
+            .map(|r| SectionCard {
+                title: r.title.clone(),
+                description: r.description.clone(),
+                leaf_count: r.leaf_count,
+            })
+            .collect();
+        index.set_doc_card(DocCard {
+            title: "Root".to_string(),
+            overview: root_entry.overview.clone(),
+            question_hints: root_entry.question_hints.clone(),
+            topic_tags: root_entry.topic_tags.clone(),
+            sections,
+            total_leaves: root_entry.leaf_count,
+        });
+
+        // Serialize + deserialize
+        let json = serde_json::to_string(&index).expect("serialization failed");
+        let deserialized: NavigationIndex =
+            serde_json::from_str(&json).expect("deserialization failed");
+
+        // Verify DocCard survived round-trip
+        let card = deserialized.doc_card().unwrap();
+        assert_eq!(card.title, "Root");
+        assert_eq!(card.overview, "Root overview");
+        assert_eq!(card.question_hints, vec!["What is this?"]);
+        assert_eq!(card.topic_tags, vec!["intro"]);
+        assert_eq!(card.sections.len(), 2);
+        assert_eq!(card.sections[0].title, "Child1");
+        assert_eq!(card.sections[1].leaf_count, 1);
+        assert_eq!(card.total_leaves, 2);
+    }
+
+    #[test]
+    fn test_doc_card_backward_compat_deserialize_without_card() {
+        // JSON from an older version that doesn't have doc_card
+        let tree = build_small_tree();
+        let root = tree.root();
+
+        let mut index = NavigationIndex::new();
+        index.add_entry(
+            root,
+            NavEntry {
+                overview: "Old index".to_string(),
+                question_hints: vec![],
+                topic_tags: vec![],
+                leaf_count: 2,
+                level: 0,
+            },
+        );
+        // No doc_card set
+
+        let json = serde_json::to_string(&index).expect("serialization failed");
+        let deserialized: NavigationIndex =
+            serde_json::from_str(&json).expect("deserialization failed");
+
+        assert!(deserialized.doc_card().is_none());
+        assert_eq!(deserialized.entry_count(), 1);
+    }
+
+    #[test]
+    fn test_section_card_fields() {
+        let card = SectionCard {
+            title: "Getting Started".to_string(),
+            description: "Quick setup guide".to_string(),
+            leaf_count: 3,
+        };
+        assert_eq!(card.title, "Getting Started");
+        assert_eq!(card.description, "Quick setup guide");
+        assert_eq!(card.leaf_count, 3);
+    }
+}
diff --git a/rust/src/document/node.rs b/rust/src/document/node.rs
index a62a92ff..c0a6ffe6 100644
--- a/rust/src/document/node.rs
+++ b/rust/src/document/node.rs
@@ -105,6 +105,20 @@ pub struct TreeNode {
     /// "refer to Table 5.3" that can be followed during retrieval.
     #[serde(default)]
     pub references: Vec<NodeReference>,
+
+    /// Routing keywords for navigation (non-leaf nodes).
+    ///
+    /// Populated by EnhanceStage with LLM-extracted topic tags.
+    /// Used by NavigationIndexStage to populate `NavEntry::topic_tags`.
+    #[serde(default)]
+    pub routing_keywords: Vec<String>,
+
+    /// Typical questions this subtree can answer (non-leaf nodes).
+    ///
+    /// Populated by EnhanceStage with LLM-extracted question hints.
+    /// Used by NavigationIndexStage to populate `NavEntry::question_hints`.
+    #[serde(default)]
+    pub question_hints: Vec<String>,
 }
 
 impl Default for TreeNode {
@@ -123,6 +137,8 @@
             physical_index: None,
             token_count: None,
             references: Vec::new(),
+            routing_keywords: Vec::new(),
+            question_hints: Vec::new(),
         }
     }
 }
diff --git a/rust/src/document/reasoning.rs b/rust/src/document/reasoning.rs
index f673ce58..2c4ab01b 100644
--- a/rust/src/document/reasoning.rs
+++ b/rust/src/document/reasoning.rs
@@ -29,12 +29,16 @@ pub struct ReasoningIndex {
     /// Nodes marked as hot (frequently retrieved).
     /// NodeId → cumulative hit count and rolling average score.
+    /// Uses `serde_helpers` because serde_json cannot deserialize
+    /// `HashMap<NodeId, _>` (integer keys are incompatible with JSON).
+    #[serde(with = "super::serde_helpers")]
     hot_nodes: HashMap<NodeId, HotNodeEntry>,
 
     /// Depth-1 section title → NodeId mapping for fast ToC lookup.
     section_map: HashMap<String, NodeId>,
 
     /// Configuration used to build this index (for cache invalidation).
+    #[serde(default)]
     config_hash: u64,
 }
@@ -363,4 +367,78 @@ mod tests {
         let config = ReasoningIndexConfig::disabled();
         assert!(!config.enabled);
     }
+
+    #[test]
+    fn test_serialization_roundtrip_empty() {
+        let mut tree = crate::document::DocumentTree::new("Root", "content");
+        let child = tree.add_child(tree.root(), "Section 1", "s1 content");
+
+        let mut builder = ReasoningIndexBuilder::new();
+        builder.add_section("Section 1", child);
+        builder.add_topic_entry(
+            "section",
+            TopicEntry {
+                node_id: child,
+                weight: 0.8,
+                depth: 1,
+            },
+        );
+        let index = builder.build();
+
+        let json = serde_json::to_string(&index).unwrap();
+        let restored: ReasoningIndex = serde_json::from_str(&json).unwrap();
+
+        assert_eq!(restored.topic_count(), 1);
+        assert_eq!(restored.section_count(), 1);
+        assert_eq!(restored.hot_node_count(), 0);
+    }
+
+    #[test]
+    fn test_serialization_roundtrip_with_hot_nodes() {
+        let mut tree = crate::document::DocumentTree::new("Root", "");
+        let root = tree.root();
+        let c1 = tree.add_child(root, "S1", "content 1");
+        let c2 = tree.add_child(root, "S2", "content 2");
+
+        let mut index = ReasoningIndex::new();
+        index.update_hot_nodes(&[(c1, 0.9), (c2, 0.7), (c1, 0.8)], 2);
+
+        // c1 should be hot (2 hits >= threshold 2)
+        assert!(index.is_hot(c1));
+        // c2 should not be hot (1 hit < threshold 2)
+        assert!(!index.is_hot(c2));
+
+        let json = serde_json::to_string(&index).unwrap();
+
+        // hot_nodes should serialize as array of pairs, not as object
+        assert!(!json.contains("\"hot_nodes\":{}"));
+        assert!(json.contains("\"hot_nodes\":["));
+
+        let restored: ReasoningIndex = serde_json::from_str(&json).unwrap();
+        assert!(restored.is_hot(c1));
+        assert!(!restored.is_hot(c2));
+
+        let entry = restored.hot_entry(c1).unwrap();
+        assert_eq!(entry.hit_count, 2);
+        assert!(entry.avg_score > 0.0);
+    }
+
+    #[test]
+    fn test_backward_compat_hot_nodes_empty_object() {
+        // Simulate old JSON where hot_nodes was serialized as {} by derive.
+        let mut tree = crate::document::DocumentTree::new("Root", "");
+        let child = tree.add_child(tree.root(), "S1", "c");
+
+        let mut builder = ReasoningIndexBuilder::new();
+        builder.add_section("s1", child);
+        let index = builder.build();
+
+        // Serialize normally (produces "hot_nodes":[]), then replace with
+        // the old format to test backward compat
+        let json = serde_json::to_string(&index).unwrap();
+        let old_json = json.replace("\"hot_nodes\":[]", "\"hot_nodes\":{}");
+
+        let restored: ReasoningIndex = serde_json::from_str(&old_json).unwrap();
+        assert_eq!(restored.hot_node_count(), 0);
+    }
+}
diff --git a/rust/src/document/serde_helpers.rs b/rust/src/document/serde_helpers.rs
new file mode 100644
index 00000000..cb658c35
--- /dev/null
+++ b/rust/src/document/serde_helpers.rs
@@ -0,0 +1,241 @@
+// Copyright (c) 2026 vectorless developers
+// SPDX-License-Identifier: Apache-2.0
+
+//! Serde helpers for types that contain `HashMap<NodeId, V>`.
+//!
+//! JSON requires object keys to be strings, but `NodeId` (wrapping `indextree::NodeId`)
+//! serializes as an integer. When `serde_json` serializes a `HashMap<NodeId, V>`,
+//! it converts the integer key to a string (e.g., `42` → `"42"`), but on deserialization
+//! it cannot parse the string back to `NodeId` because the deserializer expects a number.
+//!
+//! This module provides a `#[serde(with = "serde_helpers")]` adapter that serializes
+//! `HashMap<NodeId, V>` as a `Vec<(NodeId, V)>` instead, which is JSON-safe.
+//!
+//! # Usage
+//!
+//! ```rust,ignore
+//! use serde::{Serialize, Deserialize};
+//! use std::collections::HashMap;
+//!
+//! #[derive(Serialize, Deserialize)]
+//! struct MyStruct {
+//!     #[serde(with = "crate::document::serde_helpers")]
+//!     entries: HashMap<NodeId, String>,
+//! }
+//! ```
+
+use std::collections::HashMap;
+
+use serde::de::DeserializeOwned;
+use serde::{Deserialize, Deserializer, Serialize, Serializer};
+
+use super::node::NodeId;
+
+/// Serialize `HashMap<NodeId, V>` as `Vec<(NodeId, V)>` (sorted by key for determinism).
+pub fn serialize<V, S>(map: &HashMap<NodeId, V>, serializer: S) -> Result<S::Ok, S::Error>
+where
+    V: Serialize,
+    S: Serializer,
+{
+    let mut pairs: Vec<_> = map.iter().map(|(k, v)| (*k, v)).collect();
+    pairs.sort_by_key(|(id, _)| usize::from(id.0));
+    pairs.serialize(serializer)
+}
+
+/// Deserialize `Vec<(NodeId, V)>` back into `HashMap<NodeId, V>`.
+///
+/// Also accepts `{}` (empty JSON object) for backward compatibility with
+/// data serialized before this helper was introduced, when `hot_nodes` etc.
+/// were empty and serialized as `{}`.
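+///
+/// # Accepted JSON shapes
+///
+/// An illustrative pair list (keys are `NodeId` integers; string values
+/// chosen for brevity):
+///
+/// ```json
+/// [[0, "a"], [1, "b"]]
+/// ```
+///
+/// or, from pre-helper data, the legacy empty object `{}`.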
+pub fn deserialize<'de, V, D>(deserializer: D) -> Result<HashMap<NodeId, V>, D::Error>
+where
+    V: DeserializeOwned,
+    D: Deserializer<'de>,
+{
+    use serde::de;
+
+    // Try to deserialize as either a Vec of pairs or an empty object.
+    struct VecOrEmptyMap<V>(std::marker::PhantomData<V>);
+
+    impl<'de, V> de::Visitor<'de> for VecOrEmptyMap<V>
+    where
+        V: DeserializeOwned,
+    {
+        type Value = HashMap<NodeId, V>;
+
+        fn expecting(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+            f.write_str("a list of (NodeId, value) pairs or an empty object")
+        }
+
+        fn visit_seq<A>(self, seq: A) -> Result<Self::Value, A::Error>
+        where
+            A: de::SeqAccess<'de>,
+        {
+            let pairs: Vec<(NodeId, V)> =
+                Deserialize::deserialize(de::value::SeqAccessDeserializer::new(seq))?;
+            Ok(pairs.into_iter().collect())
+        }
+
+        fn visit_map<A>(self, map: A) -> Result<Self::Value, A::Error>
+        where
+            A: de::MapAccess<'de>,
+        {
+            // Consume the map (should be empty for backward compat)
+            let _: de::value::MapAccessDeserializer<A> = de::value::MapAccessDeserializer::new(map);
+            Ok(HashMap::new())
+        }
+    }
+
+    deserializer.deserialize_any(VecOrEmptyMap(std::marker::PhantomData))
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+    use crate::document::DocumentTree;
+
+    /// Wrapper struct to test `#[serde(with)]` through serde_json round-trip.
+    /// Generic over `V` so tests can exercise String and u32 values; the
+    /// explicit bound keeps the derive compatible with `DeserializeOwned`
+    /// required by the helper.
+    #[derive(Serialize, Deserialize, Debug)]
+    #[serde(bound(serialize = "V: Serialize", deserialize = "V: DeserializeOwned"))]
+    struct Wrap<V> {
+        #[serde(with = "super")]
+        map: HashMap<NodeId, V>,
+    }
+
+    #[test]
+    fn test_empty_map_roundtrip() {
+        let original = Wrap {
+            map: HashMap::<NodeId, String>::new(),
+        };
+        let json = serde_json::to_string(&original).unwrap();
+        assert!(json.contains("\"map\":[]"));
+
+        let restored: Wrap<String> = serde_json::from_str(&json).unwrap();
+        assert!(restored.map.is_empty());
+    }
+
+    #[test]
+    fn test_single_entry_roundtrip() {
+        let tree = DocumentTree::new("Root", "content");
+        let root = tree.root();
+
+        let original = Wrap {
+            map: {
+                let mut m = HashMap::new();
+                m.insert(root, "root data".to_string());
+                m
+            },
+        };
+
+        let json = serde_json::to_string(&original).unwrap();
+        let restored: Wrap<String> = serde_json::from_str(&json).unwrap();
+        assert_eq!(restored.map.get(&root), Some(&"root data".to_string()));
+    }
+
+    #[test]
+    fn test_multiple_entries_roundtrip() {
+        let mut tree = DocumentTree::new("Root", "");
+        let root = tree.root();
+        let c1 = tree.add_child(root, "C1", "c1");
+        let c2 = tree.add_child(root, "C2", "c2");
+
+        let original = Wrap {
+            map: {
+                let mut m = HashMap::new();
+                m.insert(root, 0u32);
+                m.insert(c1, 1u32);
+                m.insert(c2, 2u32);
+                m
+            },
+        };
+
+        let json = serde_json::to_string(&original).unwrap();
+        let restored: Wrap<u32> = serde_json::from_str(&json).unwrap();
+
+        assert_eq!(restored.map.len(), 3);
+        assert_eq!(restored.map[&root], 0);
+        assert_eq!(restored.map[&c1], 1);
+        assert_eq!(restored.map[&c2], 2);
+    }
+
+    #[test]
+    fn test_backward_compat_empty_object() {
+        // Old data serialized hot_nodes as {} before serde_helpers was used.
+        let json = r#"{"map": {}}"#;
+        let restored: Wrap<String> = serde_json::from_str(json).unwrap();
+        assert!(restored.map.is_empty());
+    }
+
+    #[test]
+    fn test_backward_compat_nonempty_object_rejected() {
+        // A non-empty JSON object with string keys like {"1": "data"} should
+        // fail because the string key "1" cannot be deserialized as NodeId.
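+    ///
+    /// A small illustration using the tree-building API exercised by this
+    /// module's tests:
+    ///
+    /// ```rust,ignore
+    /// let mut tree = DocumentTree::new("Root", "");
+    /// let child = tree.add_child(tree.root(), "Child", "");
+    /// tree.add_child(child, "Grandchild", "");
+    /// assert_eq!(tree.max_depth(), 2);
+    /// ```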
+        let json = r#"{"map": {"1": "data"}}"#;
+        let result: Result<Wrap<String>, _> = serde_json::from_str(json);
+        assert!(result.is_err());
+    }
+
+    #[test]
+    fn test_serialized_json_shape() {
+        let mut tree = DocumentTree::new("Root", "");
+        let root = tree.root();
+        let child = tree.add_child(root, "Child", "c");
+
+        let original = Wrap {
+            map: {
+                let mut m = HashMap::new();
+                m.insert(root, "a".to_string());
+                m.insert(child, "b".to_string());
+                m
+            },
+        };
+
+        let json = serde_json::to_string(&original).unwrap();
+        // Verify deterministic ordering: root (id 0) before child (id 1)
+        let root_pos = json.find("\"a\"").unwrap_or(usize::MAX);
+        let child_pos = json.find("\"b\"").unwrap_or(usize::MAX);
+        assert!(
+            root_pos < child_pos,
+            "root entry should come first: {}",
+            json
+        );
+    }
+
+    #[test]
+    fn test_roundtrip_with_complex_value() {
+        // Test with a non-trivial value type (not just String/u32)
+        let tree = DocumentTree::new("Root", "");
+        let root = tree.root();
+
+        #[derive(Serialize, Deserialize, Debug, PartialEq)]
+        struct Entry {
+            count: u32,
+            label: String,
+        }
+
+        #[derive(Serialize, Deserialize, Debug)]
+        struct ComplexWrap {
+            #[serde(with = "super")]
+            data: HashMap<NodeId, Entry>,
+        }
+
+        let original = ComplexWrap {
+            data: {
+                let mut m = HashMap::new();
+                m.insert(
+                    root,
+                    Entry {
+                        count: 42,
+                        label: "test".to_string(),
+                    },
+                );
+                m
+            },
+        };
+
+        let json = serde_json::to_string(&original).unwrap();
+        let restored: ComplexWrap = serde_json::from_str(&json).unwrap();
+        assert_eq!(restored.data[&root].count, 42);
+        assert_eq!(restored.data[&root].label, "test");
+    }
+}
diff --git a/rust/src/document/tree.rs b/rust/src/document/tree.rs
index a23b8c0e..1659471b 100644
--- a/rust/src/document/tree.rs
+++ b/rust/src/document/tree.rs
@@ -213,6 +213,8 @@ impl DocumentTree {
             physical_index: None,
             token_count: None,
             references: Vec::new(),
+            routing_keywords: Vec::new(),
+            question_hints: Vec::new(),
         };
 
         let root_id = arena.new_node(root_data);
@@ -297,6 +299,8 @@ impl DocumentTree {
             physical_index: None,
             token_count: None,
             references: Vec::new(),
+            routing_keywords: Vec::new(),
+            question_hints: Vec::new(),
         };
         let child_id = self.arena.new_node(child_data);
         parent.0.append(child_id, &mut self.arena);
@@ -417,6 +421,21 @@ impl DocumentTree {
         self.get(id).map(|n| n.depth).unwrap_or(0)
     }
 
+    /// Get the maximum depth of any node in the tree (root = 0).
+    ///
+    /// Uses a single depth-first pass. Returns 0 for a single-node tree.
+    pub fn max_depth(&self) -> usize {
+        let mut max_d = 0;
+        let mut stack = vec![(self.root_id, 0usize)];
+        while let Some((id, d)) = stack.pop() {
+            max_d = max_d.max(d);
+            for child in self.children_iter(id) {
+                stack.push((child, d + 1));
+            }
+        }
+        max_d
+    }
+
     /// Get the first child of a node.
     ///
     /// Returns None if the node has no children.
diff --git a/rust/src/error.rs b/rust/src/error.rs
index 42e10adf..36acf0e5 100644
--- a/rust/src/error.rs
+++ b/rust/src/error.rs
@@ -63,7 +63,7 @@ pub enum Error {
     // =========================================================================
     // LLM Errors
     // =========================================================================
-    /// An error occurred during LLM call.
+    /// An error occurred during an LLM call (transient: network, timeout).
     #[error("LLM error: {0}")]
     Llm(String),
 
@@ -75,6 +75,16 @@ pub enum Error {
     #[error("LLM quota exceeded")]
     QuotaExceeded,
 
+    /// LLM reasoning failure — the model responded but its output is unusable.
+    /// Not transient. Do not retry the same prompt.
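+    ///
+    /// A sketch of how a pipeline stage might surface this (values are
+    /// illustrative):
+    ///
+    /// ```rust,ignore
+    /// return Err(Error::LlmReasoning {
+    ///     stage: "enhance".to_string(),
+    ///     detail: "response carried no parsable OVERVIEW section".to_string(),
+    /// });
+    /// ```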
+ #[error("LLM reasoning failure at '{stage}': {detail}")] + LlmReasoning { + /// The pipeline stage where reasoning failed. + stage: String, + /// Why the output was unusable. + detail: String, + }, + // ========================================================================= // Summary Errors // ========================================================================= diff --git a/rust/src/index/mod.rs b/rust/src/index/mod.rs index 89e23403..051f5326 100644 --- a/rust/src/index/mod.rs +++ b/rust/src/index/mod.rs @@ -29,6 +29,9 @@ //! Priority 45: ┌────▼──────────┐ //! │ Reasoning Idx │ Pre-computed reasoning index //! └────┬──────────┘ +//! Priority 50: ┌────▼──────────┐ +//! │ Navigation Idx│ Agent navigation index +//! └────┬──────────┘ //! Priority 60: ┌────▼──────┐ //! │ Optimize │ Final tree optimization //! └───────────┘ diff --git a/rust/src/index/pipeline/context.rs b/rust/src/index/pipeline/context.rs index 4f38d0a6..f34876b9 100644 --- a/rust/src/index/pipeline/context.rs +++ b/rust/src/index/pipeline/context.rs @@ -6,7 +6,7 @@ use std::collections::HashMap; use std::path::PathBuf; -use crate::document::{DocumentTree, NodeId, ReasoningIndex}; +use crate::document::{DocumentTree, NavigationIndex, NodeId, ReasoningIndex}; use crate::index::parse::{DocumentFormat, RawNode}; use crate::llm::LlmClient; @@ -248,6 +248,9 @@ pub struct IndexContext { /// Pre-computed reasoning index (built by ReasoningIndexStage). pub reasoning_index: Option, + /// Navigation index for Agent-based retrieval (built by NavigationIndexStage). + pub navigation_index: Option, + /// Existing tree from previous indexing (for incremental updates). /// When set, the enhance and reasoning stages can reuse data from unchanged nodes. pub existing_tree: Option, @@ -285,6 +288,7 @@ impl IndexContext { llm_client: None, summary_cache: SummaryCache::default(), reasoning_index: None, + navigation_index: None, existing_tree: None, stage_results: HashMap::new(), metrics: IndexMetrics::default(), @@ -382,6 +386,7 @@ impl IndexContext { metrics: self.metrics, summary_cache: self.summary_cache, reasoning_index: self.reasoning_index, + navigation_index: self.navigation_index, } } } @@ -421,6 +426,9 @@ pub struct PipelineResult { /// Pre-computed reasoning index for retrieval acceleration. pub reasoning_index: Option, + + /// Navigation index for Agent-based retrieval. + pub navigation_index: Option, } impl PipelineResult { @@ -443,6 +451,7 @@ impl PipelineResult { + self.metrics.enhance_time_ms + self.metrics.enrich_time_ms + self.metrics.reasoning_index_time_ms + + self.metrics.navigation_index_time_ms + self.metrics.optimize_time_ms } } diff --git a/rust/src/index/pipeline/executor.rs b/rust/src/index/pipeline/executor.rs index e54750f4..34c1f43a 100644 --- a/rust/src/index/pipeline/executor.rs +++ b/rust/src/index/pipeline/executor.rs @@ -13,8 +13,8 @@ use crate::llm::LlmClient; use super::super::PipelineOptions; use super::super::stages::{ - BuildStage, EnhanceStage, EnrichStage, IndexStage, OptimizeStage, ParseStage, - ReasoningIndexStage, SplitStage, ValidateStage, + BuildStage, EnhanceStage, EnrichStage, IndexStage, NavigationIndexStage, OptimizeStage, + ParseStage, ReasoningIndexStage, SplitStage, ValidateStage, }; use super::context::{IndexInput, PipelineResult}; use super::orchestrator::PipelineOrchestrator; @@ -55,7 +55,8 @@ impl PipelineExecutor { /// 4. `split` - Split oversized leaf nodes (optional) /// 5. `enrich` - Add metadata and cross-references /// 6. 
     /// 6. `reasoning_index` - Build pre-computed reasoning index
-    /// 7. `optimize` - Optimize tree structure
+    /// 7. `navigation_index` - Build Agent navigation index
+    /// 8. `optimize` - Optimize tree structure
     pub fn new() -> Self {
         let orchestrator = PipelineOrchestrator::new()
             .stage_with_priority(ParseStage::new(), 10)
@@ -64,6 +65,7 @@ impl PipelineExecutor {
             .stage_with_priority(SplitStage::new(), 25)
             .stage_with_priority(EnrichStage::new(), 40)
             .stage_with_priority(ReasoningIndexStage::new(), 45)
+            .stage_with_priority(NavigationIndexStage::new(), 50)
             .stage_with_priority(OptimizeStage::new(), 60);
 
         Self { orchestrator }
@@ -79,7 +81,8 @@ impl PipelineExecutor {
     /// 5. `enhance` - LLM-based enhancement (summaries)
     /// 6. `enrich` - Add metadata
     /// 7. `reasoning_index` - Build pre-computed reasoning index
-    /// 8. `optimize` - Optimize tree
+    /// 8. `navigation_index` - Build Agent navigation index
+    /// 9. `optimize` - Optimize tree
     pub fn with_llm(client: LlmClient) -> Self {
         tracing::info!(
            "PipelineExecutor::with_llm — cloning client to ParseStage + EnhanceStage + context"
         );
@@ -93,6 +96,7 @@ impl PipelineExecutor {
             .stage_with_priority(EnhanceStage::with_llm_client(client), 30)
             .stage_with_priority(EnrichStage::new(), 40)
             .stage_with_priority(ReasoningIndexStage::new(), 45)
+            .stage_with_priority(NavigationIndexStage::new(), 50)
             .stage_with_priority(OptimizeStage::new(), 60);
 
         Self { orchestrator }
diff --git a/rust/src/index/pipeline/orchestrator.rs b/rust/src/index/pipeline/orchestrator.rs
index fc77dbc9..10f1f3ad 100644
--- a/rust/src/index/pipeline/orchestrator.rs
+++ b/rust/src/index/pipeline/orchestrator.rs
@@ -25,7 +25,7 @@
 use std::collections::HashMap;
 use std::time::Instant;
 
-use tracing::{error, info, warn};
+use tracing::{debug, error, info, warn};
 
 use crate::error::Result;
 
@@ -450,15 +450,23 @@ impl PipelineOrchestrator {
         // Resolve execution order
         let order = self.resolve_order()?;
         let stage_names: Vec<&str> = order.iter().map(|&i| self.stages[i].stage.name()).collect();
-        info!("Execution order: {:?}", stage_names);
+        info!("[pipeline] Execution order: {:?}", stage_names);
 
         // Compute execution groups for potential parallelization
         let groups = self.compute_execution_groups(&order);
-        info!(
-            "Execution groups: {} ({} parallelizable)",
-            groups.len(),
-            groups.iter().filter(|g| g.parallel).count()
-        );
+        let parallel_count = groups.iter().filter(|g| g.parallel).count();
+        if parallel_count > 0 {
+            info!(
+                "[pipeline] {} execution groups ({} parallelizable)",
+                groups.len(),
+                parallel_count
+            );
+        } else {
+            debug!(
+                "[pipeline] {} execution groups (all sequential)",
+                groups.len()
+            );
+        }
 
         // Create context
         let mut opts = options;
@@ -509,19 +517,15 @@ impl PipelineOrchestrator {
         // Execute each group
         for (group_idx, group) in groups.iter().enumerate() {
             if group.parallel {
-                info!(
-                    "Executing parallel group {} with {} stages: {:?}",
-                    group_idx,
-                    group.stage_indices.len(),
-                    group
-                        .stage_indices
-                        .iter()
-                        .map(|&i| self.stages[i].stage.name())
-                        .collect::<Vec<_>>()
-                );
+                let names: Vec<&str> = group
+                    .stage_indices
+                    .iter()
+                    .map(|&i| self.stages[i].stage.name())
+                    .collect();
+                info!("[pipeline] Parallel group {}: {:?}", group_idx, names);
             }
 
-            if group.parallel && group.stage_indices.len() == 2 {
+            if group.parallel && !group.stage_indices.is_empty() {
                 // Check if all stages in this group are already completed (from checkpoint)
                 let all_completed = group.stage_indices.iter().all(|&idx| {
                     let name = self.stages[idx].stage.name();
@@ -533,99 +537,137 @@ impl PipelineOrchestrator {
                         .iter()
                         .map(|&i| self.stages[i].stage.name())
                         .collect();
-                    info!("Skipping already completed parallel group: {:?}", names);
+                    info!("[pipeline] Skipping completed parallel group: {:?}", names);
                     continue;
                 }
 
-                // === Parallel execution for 2-stage groups ===
-                // One stage gets the main ctx (mutates tree), the other
-                // gets a cloned snapshot (read-only). Results are merged back.
-                let idx_a = group.stage_indices[0];
-                let idx_b = group.stage_indices[1];
-
-                // Determine which stage reads tree (gets snapshot) vs writes tree (gets ctx)
-                // using AccessPattern instead of hardcoded name checks.
-                let (writer_idx, reader_idx) = {
-                    let ap_a = self.stages[idx_a].stage.access_pattern();
-                    let ap_b = self.stages[idx_b].stage.access_pattern();
-                    // The stage that writes tree gets the main ctx;
-                    // the other (read-only on tree) gets a clone.
-                    if ap_b.writes_tree && !ap_a.writes_tree {
-                        (idx_b, idx_a) // b writes tree, a is reader
-                    } else {
-                        (idx_a, idx_b) // a writes tree (or both/neither write), b is reader
-                    }
-                };
-
-                // Clone tree snapshot for the reader stage
-                let tree_snapshot = ctx.tree.clone();
-                let options_snapshot = ctx.options.clone();
-                let existing_tree_snapshot = ctx.existing_tree.clone();
-
-                // Take both stages out to avoid double &mut self
-                let mut stage_writer =
-                    std::mem::replace(&mut self.stages[writer_idx].stage, Box::new(NopStage));
-                let mut stage_reader =
-                    std::mem::replace(&mut self.stages[reader_idx].stage, Box::new(NopStage));
-
-                let writer_name = stage_writer.name().to_string();
-                let reader_name = stage_reader.name().to_string();
-                let writer_policy = stage_writer.failure_policy();
-                let reader_policy = stage_reader.failure_policy();
-
-                info!("Parallel: executing {} ∥ {}", writer_name, reader_name);
-
-                // Build a minimal context clone for the reader stage
-                let mut reader_ctx = IndexContext::new(IndexInput::content(""), options_snapshot);
-                reader_ctx.tree = tree_snapshot;
-                reader_ctx.existing_tree = existing_tree_snapshot;
-                reader_ctx.doc_id = ctx.doc_id.clone();
-                reader_ctx.name = ctx.name.clone();
-                reader_ctx.format = ctx.format;
-                reader_ctx.source_path = ctx.source_path.clone();
-
-                // Execute both stages concurrently
-                let (writer_result, reader_result) = tokio::join!(
-                    Self::execute_stage_with_policy(&mut stage_writer, &mut ctx),
-                    Self::execute_stage_with_policy(&mut stage_reader, &mut reader_ctx),
-                );
-
-                // Put stages back
-                self.stages[writer_idx].stage = stage_writer;
-                self.stages[reader_idx].stage = stage_reader;
+                // === N-stage parallel execution ===
+                //
+                // At most one stage may write_tree — it gets the main ctx.
+                // All other stages get cloned contexts with tree snapshots.
+                // All stages run concurrently via futures::future::join_all.
+                // After all complete, outputs are merged back by AccessPattern.
 
-                // Handle writer result
-                Self::handle_stage_result(writer_result, &writer_name, &writer_policy, &mut ctx)?;
+                // Identify the tree writer (if any)
+                let tree_writer_idx: Option<usize> = group
+                    .stage_indices
+                    .iter()
+                    .find(|&&idx| self.stages[idx].stage.access_pattern().writes_tree)
+                    .copied();
 
-                // Handle reader result
-                Self::handle_stage_result(reader_result, &reader_name, &reader_policy, &mut ctx)?;
+                // For each stage, prepare (stage, context) pair.
+                // Swap out stages from self.stages to get owned Box<dyn IndexStage>.
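+                // NopStage is a do-nothing placeholder that keeps self.stages
+                // structurally intact while the real stage is owned by its
+                // future; each stage is moved back into self.stages[idx] once
+                // the group finishes.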
+                let mut entries: Vec<ParallelEntry> = Vec::with_capacity(group.stage_indices.len());
 
-                // Merge reader's outputs back based on its AccessPattern
-                let reader_ap = self.stages[reader_idx].stage.access_pattern();
-                if reader_ap.writes_reasoning_index {
-                    ctx.reasoning_index = reader_ctx.reasoning_index;
+                for &idx in &group.stage_indices {
+                    let stage = std::mem::replace(&mut self.stages[idx].stage, Box::new(NopStage));
+                    let name = stage.name().to_string();
+                    let policy = stage.failure_policy();
+                    let access = stage.access_pattern();
+
+                    let stage_ctx = if Some(idx) == tree_writer_idx {
+                        // Tree writer gets a placeholder; we'll use &mut ctx directly
+                        None
+                    } else {
+                        // Reader gets a cloned context
+                        let mut clone =
+                            IndexContext::new(IndexInput::content(""), ctx.options.clone());
+                        clone.tree = ctx.tree.clone();
+                        clone.existing_tree = ctx.existing_tree.clone();
+                        clone.doc_id = ctx.doc_id.clone();
+                        clone.name = ctx.name.clone();
+                        clone.format = ctx.format;
+                        clone.source_path = ctx.source_path.clone();
+                        if let Some(ref llm) = ctx.llm_client {
+                            clone.llm_client = Some(llm.clone());
+                        }
+                        Some(clone)
+                    };
+
+                    entries.push(ParallelEntry {
+                        idx,
+                        stage,
+                        ctx: stage_ctx,
+                        name,
+                        policy,
+                        access,
+                    });
                 }
-                if reader_ap.writes_description {
-                    ctx.description = reader_ctx.description;
+
+                let parallel_names: Vec<&str> = entries.iter().map(|e| e.name.as_str()).collect();
+                info!("[pipeline] Executing in parallel: {:?}", parallel_names);
+
+                // Split into writer and readers
+                let mut writer_entry: Option<ParallelEntry> = None;
+                let mut reader_entries: Vec<ParallelEntry> = Vec::new();
+                for entry in entries {
+                    if entry.ctx.is_none() {
+                        writer_entry = Some(entry);
+                    } else {
+                        reader_entries.push(entry);
+                    }
                 }
-                // Merge additive metrics
-                ctx.metrics.llm_calls += reader_ctx.metrics.llm_calls;
-                ctx.metrics.summaries_generated += reader_ctx.metrics.summaries_generated;
-                ctx.metrics.total_tokens_generated += reader_ctx.metrics.total_tokens_generated;
-                ctx.metrics.nodes_processed += reader_ctx.metrics.nodes_processed;
-                if reader_ctx.metrics.reasoning_index_time_ms > 0 {
-                    ctx.metrics.record_reasoning_index(
-                        reader_ctx.metrics.reasoning_index_time_ms,
-                        reader_ctx.metrics.topics_indexed,
-                        reader_ctx.metrics.keywords_indexed,
+
+                // Execute writer on main ctx concurrently with readers.
+                // Move each reader's stage+ctx into an owned async block;
+                // each future is boxed and pinned so join_all can drive the
+                // heterogeneous set concurrently.
+
+                let reader_futs: Vec<
+                    std::pin::Pin<
+                        Box<
+                            dyn std::future::Future<
+                                    Output = (
+                                        ParallelEntry,
+                                        std::result::Result<StageResult, crate::Error>,
+                                    ),
+                                > + Send,
+                        >,
+                    >,
+                > = reader_entries
+                    .into_iter()
+                    .map(|mut entry| {
+                        Box::pin(async move {
+                            let res = Self::execute_stage_with_policy(
+                                &mut entry.stage,
+                                entry.ctx.as_mut().unwrap(),
+                            )
+                            .await;
+                            (entry, res)
+                        })
+                            as std::pin::Pin<Box<dyn std::future::Future<Output = (ParallelEntry, std::result::Result<StageResult, crate::Error>)> + Send>>
+                    })
+                    .collect();
+
+                // If there's a tree writer, run it concurrently with readers.
+                // If no tree writer (all readers), just run readers.
+                if let Some(mut we) = writer_entry {
+                    // Run writer + readers concurrently.
+                    // The writer borrows &mut ctx; readers use their own cloned ctxs.
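+                    // tokio::join! drives both branches concurrently on this
+                    // task: the writer keeps the only `&mut ctx` borrow while
+                    // each reader works against its own cloned context.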
+                    let (writer_res, completed_readers) = tokio::join!(
+                        Self::execute_stage_with_policy(&mut we.stage, &mut ctx),
+                        futures::future::join_all(reader_futs),
+                    );
+
+                    // Put writer stage back and handle result
+                    self.stages[we.idx].stage = we.stage;
+                    Self::handle_stage_result(writer_res, &we.name, &we.policy, &mut ctx)?;
+
+                    // Process reader results
+                    for (re, reader_res) in completed_readers {
+                        Self::merge_reader_outputs(&mut ctx, &re);
+                        self.stages[re.idx].stage = re.stage;
+                        Self::handle_stage_result(reader_res, &re.name, &re.policy, &mut ctx)?;
+                    }
+                } else {
+                    // All readers, no writer
+                    let completed_readers = futures::future::join_all(reader_futs).await;
+                    for (re, reader_res) in completed_readers {
+                        Self::merge_reader_outputs(&mut ctx, &re);
+                        self.stages[re.idx].stage = re.stage;
+                        Self::handle_stage_result(reader_res, &re.name, &re.policy, &mut ctx)?;
+                    }
+                }
-                    );
-                }
-                if reader_ctx.metrics.optimize_time_ms > 0 {
-                    ctx.metrics
-                        .record_optimize(reader_ctx.metrics.optimize_time_ms);
-                }
-                ctx.metrics.nodes_merged += reader_ctx.metrics.nodes_merged;
-                ctx.metrics.nodes_skipped += reader_ctx.metrics.nodes_skipped;
             } else {
                 // === Sequential execution (single stage or non-parallel group) ===
                 for &idx in &group.stage_indices {
@@ -676,8 +718,10 @@ impl PipelineOrchestrator {
         let total_duration = total_start.elapsed().as_millis() as u64;
         info!(
-            "Orchestrated pipeline completed in {}ms for document {}",
-            total_duration, ctx.name
+            "[pipeline] Complete: {} stages in {}ms for '{}'",
+            ctx.stage_results.len(),
+            total_duration,
+            ctx.name,
         );
 
         // Clear checkpoint on successful completion
@@ -692,6 +736,60 @@ impl PipelineOrchestrator {
         Ok(ctx.finalize())
     }
 
+    /// Merge a reader stage's outputs back into the main context.
+    ///
+    /// Reads the reader's AccessPattern to know which fields to copy,
+    /// and merges additive metrics (LLM calls, tokens, etc.).
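+    ///
+    /// For example, a reader whose `AccessPattern` sets
+    /// `writes_navigation_index` contributes its `navigation_index` plus its
+    /// additive counters; the rest of its cloned context is discarded.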
+    fn merge_reader_outputs(ctx: &mut IndexContext, reader: &ParallelEntry) {
+        if reader.access.writes_reasoning_index {
+            if let Some(ref rctx) = reader.ctx {
+                ctx.reasoning_index = rctx.reasoning_index.clone();
+            }
+        }
+        if reader.access.writes_navigation_index {
+            if let Some(ref rctx) = reader.ctx {
+                ctx.navigation_index = rctx.navigation_index.clone();
+            }
+        }
+        if reader.access.writes_description {
+            if let Some(ref rctx) = reader.ctx {
+                ctx.description = rctx.description.clone();
+            }
+        }
+        // Merge additive metrics from reader
+        if let Some(ref rctx) = reader.ctx {
+            ctx.metrics.llm_calls += rctx.metrics.llm_calls;
+            ctx.metrics.summaries_generated += rctx.metrics.summaries_generated;
+            ctx.metrics.total_tokens_generated += rctx.metrics.total_tokens_generated;
+            ctx.metrics.nodes_processed += rctx.metrics.nodes_processed;
+            ctx.metrics.nodes_merged += rctx.metrics.nodes_merged;
+            ctx.metrics.nodes_skipped += rctx.metrics.nodes_skipped;
+            if rctx.metrics.reasoning_index_time_ms > 0 {
+                ctx.metrics.record_reasoning_index(
+                    rctx.metrics.reasoning_index_time_ms,
+                    rctx.metrics.topics_indexed,
+                    rctx.metrics.keywords_indexed,
+                );
+            }
+            if rctx.metrics.optimize_time_ms > 0 {
+                ctx.metrics.record_optimize(rctx.metrics.optimize_time_ms);
+            }
+            if rctx.metrics.navigation_index_time_ms > 0 {
+                ctx.metrics.record_navigation_index(
+                    rctx.metrics.navigation_index_time_ms,
+                    rctx.metrics.nav_entries_indexed,
+                    rctx.metrics.child_routes_indexed,
+                );
+            }
+            if rctx.metrics.enhance_time_ms > 0 {
+                ctx.metrics.record_enhance(rctx.metrics.enhance_time_ms);
+            }
+            if rctx.metrics.enrich_time_ms > 0 {
+                ctx.metrics.record_enrich(rctx.metrics.enrich_time_ms);
+            }
+        }
+    }
+
     /// Save a checkpoint of the current pipeline state.
     fn save_checkpoint(ctx: &IndexContext) {
         let checkpoint_dir = match ctx.options.checkpoint_dir {
@@ -753,6 +851,27 @@ impl IndexStage for NopStage {
     }
 }
 
+/// Owned entry for parallel stage execution.
+///
+/// Each stage in a parallel group is swapped out from the orchestrator's
+/// stages vec into this struct, along with its own cloned context.
+/// After execution, the stage is swapped back and outputs are merged.
+struct ParallelEntry {
+    /// Index into orchestrator's stages vec (for swapping back).
+    idx: usize,
+    /// The owned stage implementation.
+    stage: Box<dyn IndexStage>,
+    /// Cloned context for reader stages; None for the tree writer
+    /// (which uses the main ctx directly).
+    ctx: Option<IndexContext>,
+    /// Stage name (captured before swap).
+    name: String,
+    /// Failure policy (captured before swap).
+    policy: FailurePolicy,
+    /// Access pattern (captured before swap).
+    access: crate::index::stages::AccessPattern,
+}
+
 /// Builder for creating custom stage configurations.
 ///
 /// This is a convenience type for configuring custom stages
diff --git a/rust/src/index/stages/build.rs b/rust/src/index/stages/build.rs
index aee0b51c..02b5eda8 100644
--- a/rust/src/index/stages/build.rs
+++ b/rust/src/index/stages/build.rs
@@ -5,7 +5,7 @@
 use super::async_trait;
 use std::time::Instant;
 
-use tracing::info;
+use tracing::{debug, info};
 
 use crate::document::{DocumentTree, NodeId};
 use crate::error::Result;
@@ -259,13 +259,22 @@ impl IndexStage for BuildStage {
         let mut raw_nodes = std::mem::take(&mut ctx.raw_nodes);
 
         if raw_nodes.is_empty() {
+            info!("[build] No raw nodes, skipping");
             return Ok(StageResult::success("build"));
         }
 
-        info!("Building tree from {} raw nodes", raw_nodes.len());
+        info!(
+            "[build] Starting: {} raw nodes, thinning={}",
+            raw_nodes.len(),
+            ctx.options.thinning.enabled
+        );
 
         // Step 1: Calculate total tokens
         Self::calculate_total_tokens(&mut raw_nodes);
+        debug!(
+            "[build] Calculated total tokens for {} nodes",
+            raw_nodes.len()
+        );
 
         // Step 2: Apply thinning if enabled
         let _original_count = raw_nodes.len();
@@ -280,6 +289,14 @@ impl IndexStage for BuildStage {
 
         let skipped = nodes_before_merge - raw_nodes.len();
         ctx.metrics.nodes_skipped += skipped;
+        if skipped > 0 {
+            debug!(
+                "[build] Thinning removed {} nodes ({} → {})",
+                skipped,
+                nodes_before_merge,
+                raw_nodes.len()
+            );
+        }
 
         // Step 3: Build tree
         let mut tree = self.build_tree(raw_nodes, ctx);
@@ -289,6 +306,8 @@ impl IndexStage for BuildStage {
             self.assign_node_ids(&mut tree);
         }
 
+        let node_count = tree.node_count();
+
         // Store tree in context
         ctx.tree = Some(tree);
 
@@ -296,10 +315,8 @@ impl IndexStage for BuildStage {
         ctx.metrics.record_build(duration);
 
         info!(
-            "Built tree with {} nodes (skipped {} via thinning) in {}ms",
-            ctx.tree.as_ref().map(|t| t.node_count()).unwrap_or(0),
-            skipped,
-            duration
+            "[build] Complete: {} nodes (skipped {} via thinning) in {}ms",
+            node_count, skipped, duration
         );
 
         let mut stage_result = StageResult::success("build");
diff --git a/rust/src/index/stages/enhance.rs b/rust/src/index/stages/enhance.rs
index 770fddf9..0223d572 100644
--- a/rust/src/index/stages/enhance.rs
+++ b/rust/src/index/stages/enhance.rs
@@ -67,6 +67,49 @@ impl EnhanceStage {
         self
     }
 
+    /// Parse structured navigation response from LLM.
+    ///
+    /// Expected format:
+    /// ```text
+    /// OVERVIEW: <one-sentence overview>
+    /// QUESTIONS: q1, q2, q3
+    /// TAGS: tag1, tag2, tag3
+    /// ```
+    ///
+    /// Falls back gracefully: if markers are missing, the entire response
+    /// becomes the overview and questions/tags remain empty.
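+    ///
+    /// A doc-style sketch of the happy path:
+    ///
+    /// ```rust,ignore
+    /// let resp = "OVERVIEW: Billing setup.\nQUESTIONS: How to pay?\nTAGS: billing, setup";
+    /// let (overview, questions, tags) = EnhanceStage::parse_structured_nav_response(resp);
+    /// assert_eq!(overview, "Billing setup.");
+    /// assert_eq!(questions, vec!["How to pay?"]);
+    /// assert_eq!(tags, vec!["billing", "setup"]);
+    /// ```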
+    fn parse_structured_nav_response(response: &str) -> (String, Vec<String>, Vec<String>) {
+        let mut overview = String::new();
+        let mut questions: Vec<String> = Vec::new();
+        let mut tags: Vec<String> = Vec::new();
+
+        for line in response.lines() {
+            let line = line.trim();
+            if let Some(rest) = line.strip_prefix("OVERVIEW:") {
+                overview = rest.trim().to_string();
+            } else if let Some(rest) = line.strip_prefix("QUESTIONS:") {
+                questions = rest
+                    .split(',')
+                    .map(|s| s.trim().to_string())
+                    .filter(|s| !s.is_empty())
+                    .collect();
+            } else if let Some(rest) = line.strip_prefix("TAGS:") {
+                tags = rest
+                    .split(',')
+                    .map(|s| s.trim().to_string())
+                    .filter(|s| !s.is_empty())
+                    .collect();
+            }
+        }
+
+        // Fallback: if no OVERVIEW marker found, use entire response as overview
+        if overview.is_empty() {
+            overview = response.trim().to_string();
+        }
+
+        (overview, questions, tags)
+    }
+
     /// Check if summary generation is needed based on strategy.
     fn needs_summaries(&self, ctx: &IndexContext) -> bool {
         match &ctx.options.summary_strategy {
@@ -110,7 +153,7 @@ impl IndexStage for EnhanceStage {
         let start = Instant::now();
 
         info!(
-            "EnhanceStage: llm_client={}, strategy={:?}",
+            "[enhance] Starting: llm_client={}, strategy={:?}",
             self.llm_client.is_some(),
             ctx.options.summary_strategy
         );
@@ -118,7 +161,7 @@ impl IndexStage for EnhanceStage {
         // Check if we need summaries
         if !self.needs_summaries(ctx) {
             info!(
-                "Summary generation skipped (strategy: {:?})",
+                "[enhance] Skipped: strategy={:?}",
                 ctx.options.summary_strategy
             );
             return Ok(StageResult::success("enhance"));
         }
@@ -128,7 +171,7 @@ impl IndexStage for EnhanceStage {
         let llm_client = match &self.llm_client {
             Some(client) => client,
             None => {
-                warn!("No LLM client configured, skipping summary generation");
+                warn!("[enhance] No LLM client, skipping summary generation");
                 return Ok(StageResult::success("enhance"));
             }
         };
@@ -137,13 +180,11 @@ impl IndexStage for EnhanceStage {
         let tree = match ctx.tree.as_mut() {
             Some(t) => t,
             None => {
-                warn!("No tree built, skipping enhance stage");
+                warn!("[enhance] No tree built, skipping");
                 return Ok(StageResult::success("enhance"));
             }
         };
 
-        info!("Using summary strategy: {:?}", ctx.options.summary_strategy);
-
         // Create summary generator (shared via Arc for concurrent use)
         let generator = Arc::new(
             LlmSummaryGenerator::new((*llm_client).as_ref().clone())
@@ -168,12 +209,15 @@ impl IndexStage for EnhanceStage {
                 ctx.metrics.increment_summaries();
             }
             info!(
-                "Incremental: {} of {} nodes unchanged, reusing summaries",
+                "[enhance] Incremental: {} of {} nodes unchanged, reusing summaries",
                 applied, total_nodes,
             );
         }
 
-        info!("Processing {} nodes for summary generation", total_nodes);
+        info!(
+            "[enhance] Processing {} nodes for summary generation",
+            total_nodes
+        );
 
         // === Phase 1: Collect pending nodes (cache hits applied immediately) ===
         let strategy = ctx.options.summary_strategy.clone();
@@ -219,7 +263,7 @@ impl IndexStage for EnhanceStage {
                 if !cached.is_empty() {
                     tree.set_summary(node_id, &cached);
                     debug!(
-                        "Using cached summary for node: {} ({} chars)",
+                        "[enhance] Cache hit: '{}' ({} chars)",
                         node.title,
                         cached.len()
                     );
@@ -237,7 +281,7 @@ impl IndexStage for EnhanceStage {
             if shortcut_threshold > 0 && token_count > 0 && token_count <= shortcut_threshold {
                 tree.set_summary(node_id, &node.content);
                 debug!(
-                    "Shortcut: using original content as summary for '{}' ({} tokens)",
+                    "[enhance] Shortcut: '{}' ({} tokens, using original content)",
                     node.title, token_count
                 );
                 ctx.metrics.increment_summaries();
@@ -262,13 +306,13 @@ impl IndexStage for EnhanceStage {
 
         if !pending_llm.is_empty() {
             info!(
-                "Generating summaries for {} nodes (concurrency: {})",
+                "[enhance] Generating summaries for {} nodes (concurrency: {})",
                 pending_llm.len(),
                 concurrency
             );
 
-            // Collect results: (NodeId, Result<String, String>)
-            let results: Vec<(NodeId, std::result::Result<String, String>)> =
+            // Collect results: (NodeId, is_leaf, Result<String, String>)
+            let results: Vec<(NodeId, bool, std::result::Result<String, String>)> =
                 futures::stream::iter(pending_llm)
                     .map(|pending| {
                         let generator = Arc::clone(&generator);
@@ -280,7 +324,11 @@ impl IndexStage for EnhanceStage {
                                 pending.is_leaf,
                             )
                             .await;
-                            (pending.node_id, result.map_err(|e| e.to_string()))
+                            (
+                                pending.node_id,
+                                pending.is_leaf,
+                                result.map_err(|e| e.to_string()),
+                            )
                         }
                     })
                     .buffer_unordered(concurrency)
                    .collect()
                    .await;
 
             // Write results back to tree
-            for (node_id, result) in results {
+            for (node_id, is_leaf, result) in results {
                 ctx.metrics.increment_llm_calls();
                 match result {
-                    Ok(summary) => {
-                        if summary.is_empty() {
+                    Ok(response) => {
+                        if response.is_empty() {
                             failed += 1;
                         } else {
                             ctx.metrics
-                                .add_tokens_generated(crate::utils::estimate_tokens(&summary));
-                            tree.set_summary(node_id, &summary);
+                                .add_tokens_generated(crate::utils::estimate_tokens(&response));
+
+                            if is_leaf {
+                                // Leaf node: response is a plain content summary
+                                tree.set_summary(node_id, &response);
+                            } else {
+                                // Non-leaf node: response is structured (OVERVIEW/QUESTIONS/TAGS)
+                                let (overview, questions, tags) =
+                                    Self::parse_structured_nav_response(&response);
+                                tree.set_summary(node_id, &overview);
+
+                                if let Some(node) = tree.get_mut(node_id) {
+                                    node.question_hints = questions;
+                                    node.routing_keywords = tags;
+                                }
+                            }
                             generated += 1;
                             ctx.metrics.increment_summaries();
                         }
                     }
                     Err(e) => {
-                        warn!("Failed to generate summary: {}", e);
+                        warn!("[enhance] LLM summary failed: {}", e);
                         failed += 1;
                     }
                 }
@@ -317,7 +379,7 @@ impl IndexStage for EnhanceStage {
         }
 
         info!(
-            "Generated {} summaries ({} shortcut, {} failed, {} skipped no content, {} skipped tokens) in {}ms",
+            "[enhance] Complete: {} summaries ({} shortcut, {} failed, {} no-content, {} skipped-tokens) in {}ms",
             generated, shortcut_used, failed, skipped_no_content, skipped_tokens, duration
         );
 
@@ -334,3 +396,54 @@ impl IndexStage for EnhanceStage {
         Ok(stage_result)
     }
 }
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn test_parse_structured_nav_response_full() {
+        let response = "\
+OVERVIEW: This section covers payment integration and billing configuration.
+QUESTIONS: How to set up payments?, What currencies are supported?, How to configure invoices?
+TAGS: payments, billing, invoices, currency";
+
+        let (overview, questions, tags) = EnhanceStage::parse_structured_nav_response(response);
+
+        assert!(overview.contains("payment integration"));
+        assert_eq!(questions.len(), 3);
+        assert!(questions[0].contains("set up payments"));
+        assert_eq!(tags.len(), 4);
+        assert_eq!(tags[0], "payments");
+    }
+
+    #[test]
+    fn test_parse_structured_nav_response_partial() {
+        // Only overview, no questions or tags
+        let response = "OVERVIEW: A general introduction to the system.";
+        let (overview, questions, tags) = EnhanceStage::parse_structured_nav_response(response);
+
+        assert!(overview.contains("general introduction"));
+        assert!(questions.is_empty());
+        assert!(tags.is_empty());
+    }
+
+    #[test]
+    fn test_parse_structured_nav_response_fallback() {
+        // No markers at all — fallback to entire response as overview
+        let response = "This is just a plain summary without any markers.";
+        let (overview, questions, tags) = EnhanceStage::parse_structured_nav_response(response);
+
+        assert_eq!(overview, response.trim());
+        assert!(questions.is_empty());
+        assert!(tags.is_empty());
+    }
+
+    #[test]
+    fn test_parse_structured_nav_response_empty() {
+        let (overview, questions, tags) = EnhanceStage::parse_structured_nav_response("");
+        assert!(overview.is_empty());
+        assert!(questions.is_empty());
+        assert!(tags.is_empty());
+    }
+}
diff --git a/rust/src/index/stages/enrich.rs b/rust/src/index/stages/enrich.rs
index 1f6456ad..88ea8cc1 100644
--- a/rust/src/index/stages/enrich.rs
+++ b/rust/src/index/stages/enrich.rs
@@ -5,7 +5,7 @@
 use super::async_trait;
 use std::time::Instant;
 
-use tracing::info;
+use tracing::{debug, info};
 
 use crate::document::{DocumentTree, NodeId, ReferenceExtractor, TocView};
 use crate::error::Result;
@@ -88,7 +88,7 @@ impl EnrichStage {
tree.get(tree.root()) { if !root.summary.is_empty() { ctx.description = Some(root.summary.clone()); - info!("Using root summary as document description"); + debug!("[enrich] Using root summary as document description"); } } } @@ -171,24 +171,30 @@ impl IndexStage for EnrichStage { .as_mut() .ok_or_else(|| crate::Error::IndexBuild("Tree not built".to_string()))?; + let node_count = tree.node_count(); + info!("[enrich] Starting: {} nodes", node_count); + // 1. Calculate page ranges Self::calculate_page_ranges(tree); - info!("Calculated page ranges for all nodes"); + debug!("[enrich] Calculated page ranges"); // 2. Generate ToC view (cached in context) let toc_view = TocView::new(); let toc = toc_view.generate(tree); let _toc_markdown = toc_view.format_markdown(&toc); - // Could store ToC in context if needed + debug!("[enrich] Generated ToC ({} children)", toc.children.len()); // 3. Calculate token statistics - let (total_tokens, node_count) = Self::calculate_token_stats(tree); - info!("Total tokens: {}, nodes: {}", total_tokens, node_count); + let (total_tokens, stat_node_count) = Self::calculate_token_stats(tree); + debug!( + "[enrich] Token stats: {} total tokens across {} nodes", + total_tokens, stat_node_count + ); // 4. Extract and resolve cross-references let resolved_refs = Self::resolve_references(tree); if resolved_refs > 0 { - info!("Resolved {} cross-references", resolved_refs); + info!("[enrich] Resolved {} cross-references", resolved_refs); } // 5. Generate document description @@ -197,7 +203,10 @@ impl IndexStage for EnrichStage { let duration = start.elapsed().as_millis() as u64; ctx.metrics.record_enrich(duration); - info!("Enriched tree metadata in {}ms", duration); + info!( + "[enrich] Complete: {} tokens, {} refs resolved in {}ms", + total_tokens, resolved_refs, duration + ); let mut stage_result = StageResult::success("enrich"); stage_result.duration_ms = duration; @@ -219,29 +228,6 @@ impl IndexStage for EnrichStage { #[cfg(test)] mod tests { use super::*; - use crate::document::RefType; - - #[test] - fn test_resolve_references_section_ref() { - let mut tree = DocumentTree::new("Root", "root content"); - let s1 = tree.add_child(tree.root(), "Introduction", "Introduction text."); - tree.set_structure(s1, "1"); - let s2 = tree.add_child( - tree.root(), - "Details", - "For details, see Section 1 for more info", - ); - tree.set_structure(s2, "2"); - - let resolved = EnrichStage::resolve_references(&mut tree); - assert_eq!(resolved, 1); - - // Verify the reference was stored on s2 and resolved to s1 - let refs = tree.get(s2).unwrap().references.clone(); - assert_eq!(refs.len(), 1); - assert_eq!(refs[0].ref_type, RefType::Section); - assert_eq!(refs[0].target_node, Some(s1)); - } #[test] fn test_resolve_references_no_refs() { diff --git a/rust/src/index/stages/mod.rs b/rust/src/index/stages/mod.rs index f320a525..9a3c405f 100644 --- a/rust/src/index/stages/mod.rs +++ b/rust/src/index/stages/mod.rs @@ -6,6 +6,7 @@ mod build; mod enhance; mod enrich; +mod navigation; mod optimize; mod parse; mod reasoning; @@ -15,6 +16,7 @@ mod validate; pub use build::BuildStage; pub use enhance::EnhanceStage; pub use enrich::EnrichStage; +pub use navigation::NavigationIndexStage; pub use optimize::OptimizeStage; pub use parse::ParseStage; pub use reasoning::ReasoningIndexStage; @@ -35,6 +37,8 @@ pub struct AccessPattern { pub writes_tree: bool, /// Whether this stage writes to `reasoning_index`. pub writes_reasoning_index: bool, + /// Whether this stage writes to `navigation_index`. 
+ pub writes_navigation_index: bool, /// Whether this stage writes to `description`. pub writes_description: bool, } diff --git a/rust/src/index/stages/navigation.rs b/rust/src/index/stages/navigation.rs new file mode 100644 index 00000000..0a41517f --- /dev/null +++ b/rust/src/index/stages/navigation.rs @@ -0,0 +1,563 @@ +// Copyright (c) 2026 vectorless developers +// SPDX-License-Identifier: Apache-2.0 + +//! Navigation Index Stage — Build the Agent navigation index from the document tree. +//! +//! This stage runs after EnrichStage and ReasoningIndexStage. It reads the +//! enhanced TreeNode fields (summary, description, routing_keywords, leaf_count) +//! and builds a [`NavigationIndex`] containing compact [`NavEntry`] and +//! [`ChildRoute`] records for every non-leaf node. +//! +//! # No LLM Calls +//! +//! This stage performs pure data organization. All LLM-generated content +//! (summaries, descriptions, keywords) is already on the tree from the +//! Enhance stage. This stage only reads and restructures that data. + +use std::time::Instant; +use tracing::{debug, info, warn}; + +use crate::document::{ChildRoute, DocumentTree, NavEntry, NavigationIndex, NodeId}; +use crate::error::Result; + +use super::async_trait; +use super::{AccessPattern, IndexStage, StageResult}; +use crate::index::pipeline::IndexContext; + +/// Navigation Index Stage — builds the Agent navigation index. +/// +/// For every non-leaf node in the tree, this stage creates: +/// - A [`NavEntry`] with overview, question hints, topic tags, leaf count, and level. +/// - A list of [`ChildRoute`] entries, one per child, with title, description, and leaf count. +/// +/// The resulting [`NavigationIndex`] is stored in `ctx.navigation_index` and +/// serialized as part of [`PersistedDocument`](crate::storage::persistence::PersistedDocument). +pub struct NavigationIndexStage; + +impl NavigationIndexStage { + /// Create a new navigation index stage. + pub fn new() -> Self { + Self + } + + /// Count the number of leaf nodes in a subtree rooted at `node_id`. + fn count_leaves(tree: &DocumentTree, node_id: NodeId) -> usize { + if tree.is_leaf(node_id) { + return 1; + } + let mut count = 0; + let mut stack = vec![node_id]; + while let Some(id) = stack.pop() { + if tree.is_leaf(id) { + count += 1; + } else { + for child in tree.children_iter(id) { + stack.push(child); + } + } + } + count + } + + /// Build a NavEntry for a non-leaf node. + fn build_nav_entry(tree: &DocumentTree, node_id: NodeId, leaf_count: usize) -> NavEntry { + let node = match tree.get(node_id) { + Some(n) => n, + None => { + return NavEntry { + overview: String::new(), + question_hints: Vec::new(), + topic_tags: Vec::new(), + leaf_count: 0, + level: 0, + }; + } + }; + + // Overview: use summary if available, otherwise title + let overview = if !node.summary.is_empty() { + node.summary.clone() + } else { + node.title.clone() + }; + + NavEntry { + overview, + question_hints: node.question_hints.clone(), + topic_tags: node.routing_keywords.clone(), + leaf_count, + level: node.depth, + } + } + + /// Build a ChildRoute for a single child node. 
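+ /// The description falls back in order: summary, then the first 100 chars of content, then the title.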
+ fn build_child_route(tree: &DocumentTree, child_id: NodeId, leaf_count: usize) -> ChildRoute { + let node = tree.get(child_id); + let title = node.map(|n| n.title.clone()).unwrap_or_default(); + let description = node + .and_then(|n| { + // Use summary as description if available; otherwise fall back to truncated content, then the title + if !n.summary.is_empty() { + Some(n.summary.clone()) + } else if !n.content.is_empty() { + // Truncate content as fallback description + let s: String = n.content.chars().take(100).collect(); + Some(s) + } else { + None + } + }) + .unwrap_or_else(|| title.clone()); + + ChildRoute { + node_id: child_id, + title, + description, + leaf_count, + } + } +} + +impl Default for NavigationIndexStage { + fn default() -> Self { + Self::new() + } +} + +#[async_trait] +impl IndexStage for NavigationIndexStage { + fn name(&self) -> &'static str { + "navigation_index" + } + + fn depends_on(&self) -> Vec<&'static str> { + vec!["enrich"] + } + + fn is_optional(&self) -> bool { + true + } + + fn access_pattern(&self) -> AccessPattern { + AccessPattern { + reads_tree: true, + writes_navigation_index: true, + ..Default::default() + } + } + + async fn execute(&mut self, ctx: &mut IndexContext) -> Result<StageResult> { + let start = Instant::now(); + + let tree = match ctx.tree.as_ref() { + Some(t) => t, + None => { + warn!("[navigation_index] No tree, cannot build index"); + return Ok(StageResult::failure("navigation_index", "Tree not built")); + } + }; + + let all_nodes = tree.traverse(); + let leaf_count = all_nodes.iter().filter(|&&id| tree.is_leaf(id)).count(); + let non_leaf_count = all_nodes.len() - leaf_count; + + info!( + "[navigation_index] Starting: {} total nodes ({} leaves, {} non-leaf)", + all_nodes.len(), + leaf_count, + non_leaf_count, + ); + + let mut nav_entries_count = 0usize; + let mut child_routes_count = 0usize; + + // Phase 1: Pre-compute leaf counts for all nodes. + // We compute once per node to avoid repeated traversals. + debug!( + "[navigation_index] Phase 1: Pre-computing leaf counts for {} nodes", + all_nodes.len() + ); + let mut leaf_counts: std::collections::HashMap<NodeId, usize> = + std::collections::HashMap::with_capacity(all_nodes.len()); + for &node_id in &all_nodes { + leaf_counts.insert(node_id, Self::count_leaves(tree, node_id)); + } + + // Phase 2: Build NavEntry + ChildRoutes for each non-leaf node.
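+ // Leaf counts are read from the Phase 1 map rather than recomputed per child.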
+ debug!( + "[navigation_index] Phase 2: Building NavEntry + ChildRoutes for {} non-leaf nodes", + non_leaf_count + ); + let mut nav_index = NavigationIndex::new(); + + for &node_id in &all_nodes { + // Skip leaf nodes — they have no children to navigate to + if tree.is_leaf(node_id) { + continue; + } + + let lc = *leaf_counts.get(&node_id).unwrap_or(&0); + + // Build navigation entry for this non-leaf node + let nav_entry = Self::build_nav_entry(tree, node_id, lc); + nav_index.add_entry(node_id, nav_entry); + nav_entries_count += 1; + + // Build child routes for this node's children + let child_ids: Vec<NodeId> = tree.children_iter(node_id).collect(); + let mut routes = Vec::with_capacity(child_ids.len()); + + for child_id in child_ids { + let child_lc = *leaf_counts.get(&child_id).unwrap_or(&0); + let route = Self::build_child_route(tree, child_id, child_lc); + routes.push(route); + child_routes_count += 1; + } + + debug!( + "[navigation_index] node '{}' → {} child routes ({} leaves in subtree)", + tree.get(node_id).map(|n| n.title.as_str()).unwrap_or("?"), + routes.len(), + lc, + ); + + nav_index.add_child_routes(node_id, routes); + } + + // Phase 3: Build DocCard from root-level data (already computed, zero LLM). + // Provides a compact document summary for multi-document Orchestrator Agent. + if let Some(root_entry) = nav_index.get_entry(tree.root()) { + let sections: Vec<crate::document::SectionCard> = nav_index + .get_child_routes(tree.root()) + .map(|routes| { + routes + .iter() + .map(|r| crate::document::SectionCard { + title: r.title.clone(), + description: r.description.clone(), + leaf_count: r.leaf_count, + }) + .collect() + }) + .unwrap_or_default(); + + let doc_card = crate::document::DocCard { + title: tree + .get(tree.root()) + .map(|n| n.title.clone()) + .unwrap_or_default(), + overview: root_entry.overview.clone(), + question_hints: root_entry.question_hints.clone(), + topic_tags: root_entry.topic_tags.clone(), + sections, + total_leaves: root_entry.leaf_count, + }; + nav_index.set_doc_card(doc_card); + + debug!( + "[navigation_index] Phase 3: Built DocCard — {} sections, {} total leaves", + nav_index.doc_card().map(|c| c.sections.len()).unwrap_or(0), + nav_index.doc_card().map(|c| c.total_leaves).unwrap_or(0), + ); + } else { + debug!("[navigation_index] Phase 3: Skipped DocCard (no root entry)"); + } + + let duration = start.elapsed().as_millis() as u64; + + ctx.metrics + .record_navigation_index(duration, nav_entries_count, child_routes_count); + + info!( + "[navigation_index] Complete: {} nav entries, {} child routes in {}ms", + nav_entries_count, child_routes_count, duration, + ); + + ctx.navigation_index = Some(nav_index); + + let mut stage_result = StageResult::success("navigation_index"); + stage_result.duration_ms = duration; + stage_result.metadata.insert( + "nav_entries".to_string(), + serde_json::json!(nav_entries_count), + ); + stage_result.metadata.insert( + "child_routes".to_string(), + serde_json::json!(child_routes_count), + ); + + Ok(stage_result) + } +} + +#[cfg(test)] +mod tests { + use super::*; + use crate::document::DocumentTree; + + fn build_test_tree() -> DocumentTree { + let mut tree = DocumentTree::new("Root", "root content"); + let root = tree.root(); + + let sec1 = tree.add_child(root, "Section 1", "section 1 content"); + let _sec1_1 = tree.add_child(sec1, "Section 1.1", "s1.1 content"); + let _sec1_2 = tree.add_child(sec1, "Section 1.2", "s1.2 content"); + + let sec2 = tree.add_child(root, "Section 2", "section 2 content"); + let _sec2_1 = tree.add_child(sec2, "Section 2.1",
"s2.1 content"); + + // Set some summaries + tree.set_summary(root, "A comprehensive guide"); + tree.set_summary(sec1, "Getting started with setup"); + tree.set_summary(sec2, "Advanced configuration"); + + tree + } + + #[test] + fn test_count_leaves() { + let tree = build_test_tree(); + let root = tree.root(); + + // Root has 3 leaves: 1.1, 1.2, 2.1 + assert_eq!(NavigationIndexStage::count_leaves(&tree, root), 3); + } + + #[test] + fn test_count_leaves_single_node() { + let tree = DocumentTree::new("Root", "content"); + let root = tree.root(); + + assert_eq!(NavigationIndexStage::count_leaves(&tree, root), 1); + } + + #[test] + fn test_build_nav_entry_with_summary() { + let tree = build_test_tree(); + let root = tree.root(); + + let entry = NavigationIndexStage::build_nav_entry(&tree, root, 3); + assert_eq!(entry.overview, "A comprehensive guide"); + assert_eq!(entry.leaf_count, 3); + assert_eq!(entry.level, 0); + } + + #[test] + fn test_build_nav_entry_without_summary() { + let tree = DocumentTree::new("Root", "content"); + let root = tree.root(); + + let entry = NavigationIndexStage::build_nav_entry(&tree, root, 1); + assert_eq!(entry.overview, "Root"); + } + + #[test] + fn test_build_child_route() { + let tree = build_test_tree(); + let root = tree.root(); + let children: Vec<_> = tree.children_iter(root).collect(); + + let route = NavigationIndexStage::build_child_route(&tree, children[0], 2); + assert_eq!(route.title, "Section 1"); + assert_eq!(route.leaf_count, 2); + } + + #[test] + fn test_stage_config() { + let stage = NavigationIndexStage::new(); + assert_eq!(stage.name(), "navigation_index"); + assert!(stage.is_optional()); + assert_eq!(stage.depends_on(), vec!["enrich"]); + + let ap = stage.access_pattern(); + assert!(ap.reads_tree); + assert!(ap.writes_navigation_index); + assert!(!ap.writes_tree); + assert!(!ap.writes_reasoning_index); + } + + #[tokio::test] + async fn test_execute_end_to_end() { + // Build a 3-level tree: Root -> [Sec1 -> [1.1, 1.2], Sec2 -> [2.1]] + let mut tree = DocumentTree::new("Root", "root content"); + let root = tree.root(); + let sec1 = tree.add_child(root, "Section 1", "s1 content"); + let _sec1_1 = tree.add_child(sec1, "Section 1.1", "s1.1 content"); + let _sec1_2 = tree.add_child(sec1, "Section 1.2", "s1.2 content"); + let sec2 = tree.add_child(root, "Section 2", "s2 content"); + let _sec2_1 = tree.add_child(sec2, "Section 2.1", "s2.1 content"); + + tree.set_summary(root, "A comprehensive guide"); + tree.set_summary(sec1, "Getting started"); + + // Build context with the tree + let mut ctx = IndexContext::new( + crate::index::pipeline::IndexInput::content("test"), + crate::index::config::PipelineOptions::default(), + ); + ctx.tree = Some(tree); + + // Execute the stage + let mut stage = NavigationIndexStage::new(); + let result = stage.execute(&mut ctx).await; + + assert!(result.is_ok()); + let stage_result = result.unwrap(); + assert!(stage_result.success); + assert_eq!( + stage_result.metadata["nav_entries"], + serde_json::json!(3) // root, sec1, sec2 + ); + assert_eq!( + stage_result.metadata["child_routes"], + serde_json::json!(5) // root→2 + sec1→2 + sec2→1 + ); + + // Verify the index structure + let nav_index = ctx.navigation_index.unwrap(); + assert_eq!(nav_index.entry_count(), 3); // 3 non-leaf nodes + assert_eq!(nav_index.total_child_routes(), 5); + + // Root entry + let root_id = ctx.tree.as_ref().unwrap().root(); + let root_entry = nav_index.get_entry(root_id).unwrap(); + assert_eq!(root_entry.overview, "A comprehensive guide"); + 
assert_eq!(root_entry.leaf_count, 3); + assert_eq!(root_entry.level, 0); + + // Root child routes + let root_routes = nav_index.get_child_routes(root_id).unwrap(); + assert_eq!(root_routes.len(), 2); + assert_eq!(root_routes[0].title, "Section 1"); + assert_eq!(root_routes[0].leaf_count, 2); + assert_eq!(root_routes[1].title, "Section 2"); + assert_eq!(root_routes[1].leaf_count, 1); + } + + #[tokio::test] + async fn test_execute_single_leaf_tree() { + // Single node = root is leaf → no non-leaf nodes → empty index + let tree = DocumentTree::new("Root", "content"); + + let mut ctx = IndexContext::new( + crate::index::pipeline::IndexInput::content("test"), + crate::index::config::PipelineOptions::default(), + ); + ctx.tree = Some(tree); + + let mut stage = NavigationIndexStage::new(); + let result = stage.execute(&mut ctx).await; + + assert!(result.is_ok()); + assert!(stage_result_is_success(&result)); + + let nav_index = ctx.navigation_index.unwrap(); + assert_eq!(nav_index.entry_count(), 0); + assert_eq!(nav_index.total_child_routes(), 0); + } + + #[tokio::test] + async fn test_execute_no_tree() { + let mut ctx = IndexContext::new( + crate::index::pipeline::IndexInput::content("test"), + crate::index::config::PipelineOptions::default(), + ); + // Run the stage without a tree + ctx.tree = None; + + let mut stage = NavigationIndexStage::new(); + let result = stage.execute(&mut ctx).await.unwrap(); + assert!(!result.success); + assert!(ctx.navigation_index.is_none()); + } + + #[test] + fn test_build_child_route_no_summary_has_content() { + // Node with content but no summary → description = truncated content + let mut tree = DocumentTree::new("Root", ""); + let root = tree.root(); + let child = tree.add_child(root, "Child", "this is a long content string that exceeds 100 characters and should be truncated when used as a fallback description for the child route"); + + let route = NavigationIndexStage::build_child_route(&tree, child, 1); + assert_eq!(route.title, "Child"); + // description should be truncated content, not the full string + assert!(route.description.len() <= 100); + assert!(route.description.starts_with("this is a long")); + } + + #[test] + fn test_build_child_route_no_summary_no_content() { + // Node with neither summary nor content → description = title + let mut tree = DocumentTree::new("Root", ""); + let root = tree.root(); + let child = tree.add_child(root, "Orphan Section", ""); + // Make sure the child has no summary either + tree.set_summary(child, ""); + + let route = NavigationIndexStage::build_child_route(&tree, child, 1); + assert_eq!(route.title, "Orphan Section"); + // Fallback: description = title when no summary and no content + assert_eq!(route.description, "Orphan Section"); + } + + #[test] + fn test_build_child_route_with_summary() { + let mut tree = DocumentTree::new("Root", ""); + let root = tree.root(); + let child = tree.add_child(root, "Child", "some content"); + tree.set_summary(child, "A concise summary"); + + let route = NavigationIndexStage::build_child_route(&tree, child, 1); + assert_eq!(route.description, "A concise summary"); + } + + #[test] + fn test_build_nav_entry_depth_tracking() { + // Verify that depth/level is correctly captured from the tree + let mut tree = DocumentTree::new("Root", ""); + let root = tree.root(); + let sec1 = tree.add_child(root, "S1", ""); + let sec1_1 = tree.add_child(sec1, "S1.1", "leaf"); + tree.set_summary(root, "Root overview"); + tree.set_summary(sec1,
"Section overview"); + + let root_entry = NavigationIndexStage::build_nav_entry(&tree, root, 3); + assert_eq!(root_entry.level, 0); + + let sec1_entry = NavigationIndexStage::build_nav_entry(&tree, sec1, 1); + assert_eq!(sec1_entry.level, 1); + + // Leaf node should still return valid NavEntry if called + let leaf_entry = NavigationIndexStage::build_nav_entry(&tree, sec1_1, 1); + assert_eq!(leaf_entry.level, 2); + assert_eq!(leaf_entry.overview, "S1.1"); // no summary → fallback to title + } + + #[test] + fn test_count_leaves_subtree() { + // Verify leaf count is correct for a subtree, not the entire tree + let mut tree = DocumentTree::new("Root", ""); + let root = tree.root(); + let sec1 = tree.add_child(root, "S1", ""); + let _s1a = tree.add_child(sec1, "S1.A", "leaf"); + let _s1b = tree.add_child(sec1, "S1.B", "leaf"); + let _s1c = tree.add_child(sec1, "S1.C", "leaf"); + let sec2 = tree.add_child(root, "S2", ""); + let _s2a = tree.add_child(sec2, "S2.A", "leaf"); + + // sec1 subtree has 3 leaves + assert_eq!(NavigationIndexStage::count_leaves(&tree, sec1), 3); + // sec2 subtree has 1 leaf + assert_eq!(NavigationIndexStage::count_leaves(&tree, sec2), 1); + // root has 4 leaves total + assert_eq!(NavigationIndexStage::count_leaves(&tree, root), 4); + } + + /// Helper to check success without destructuring. + fn stage_result_is_success(result: &Result) -> bool { + result.as_ref().map(|r| r.success).unwrap_or(false) + } +} diff --git a/rust/src/index/stages/optimize.rs b/rust/src/index/stages/optimize.rs index 209de7e2..8186d494 100644 --- a/rust/src/index/stages/optimize.rs +++ b/rust/src/index/stages/optimize.rs @@ -5,7 +5,7 @@ use super::{AccessPattern, async_trait}; use std::time::Instant; -use tracing::info; +use tracing::{debug, info}; use crate::document::NodeId; use crate::error::Result; @@ -166,7 +166,7 @@ impl IndexStage for OptimizeStage { } fn depends_on(&self) -> Vec<&'static str> { - vec!["enrich"] + vec!["enrich", "navigation_index"] } fn access_pattern(&self) -> AccessPattern { @@ -182,7 +182,7 @@ impl IndexStage for OptimizeStage { let config = &ctx.options.optimization; if !config.enabled { - info!("Tree optimization disabled, skipping"); + debug!("[optimize] Disabled, skipping"); return Ok(StageResult::success("optimize")); } @@ -191,25 +191,39 @@ impl IndexStage for OptimizeStage { .as_mut() .ok_or_else(|| crate::Error::IndexBuild("Tree not built".to_string()))?; + let node_count = tree.node_count(); + info!( + "[optimize] Starting: {} nodes, merge_threshold={}", + node_count, config.merge_leaf_threshold, + ); + let mut merged_count = 0; // 1. Merge small leaves if config.merge_leaf_threshold > 0 { merged_count = Self::merge_small_leaves(tree, config.merge_leaf_threshold, &mut ctx.metrics); - info!("Merged {} small leaf nodes", merged_count); + if merged_count > 0 { + debug!("[optimize] Merged {} small leaf nodes", merged_count); + } } // 2. 
Remove empty intermediate nodes let removed_count = Self::remove_empty_nodes(tree); if removed_count > 0 { - info!("Marked {} empty intermediate nodes", removed_count); + debug!( + "[optimize] Marked {} empty intermediate nodes", + removed_count + ); } let duration = start.elapsed().as_millis() as u64; ctx.metrics.record_optimize(duration); - info!("Optimized tree in {}ms", duration); + info!( + "[optimize] Complete: {} merged, {} emptied in {}ms", + merged_count, removed_count, duration + ); let mut stage_result = StageResult::success("optimize"); stage_result.duration_ms = duration; @@ -224,3 +238,218 @@ impl IndexStage for OptimizeStage { Ok(stage_result) } } + +#[cfg(test)] +mod tests { + use super::*; + use crate::document::DocumentTree; + use crate::index::PipelineOptions; + use crate::index::pipeline::IndexContext; + use crate::index::pipeline::IndexInput; + + /// Create a tree with small leaf children under root for merge tests. + /// + /// ```text + /// Root + /// ├── Leaf A (50 tokens) + /// ├── Leaf B (30 tokens) ← should merge with Leaf A + /// ├── Leaf C (200 tokens) ← too large, not merged + /// └── Leaf D (40 tokens) ← no adjacent small sibling + /// ``` + fn make_merge_test_tree() -> DocumentTree { + let mut tree = DocumentTree::new("Root", ""); + let root = tree.root(); + + let a = tree.add_child(root, "Leaf A", "content A"); + let b = tree.add_child(root, "Leaf B", "content B"); + let c = tree.add_child(root, "Leaf C", "content C long"); + let d = tree.add_child(root, "Leaf D", "content D"); + + // Set token counts + if let Some(n) = tree.get_mut(a) { + n.token_count = Some(50); + } + if let Some(n) = tree.get_mut(b) { + n.token_count = Some(30); + } + if let Some(n) = tree.get_mut(c) { + n.token_count = Some(200); + } + if let Some(n) = tree.get_mut(d) { + n.token_count = Some(40); + } + + tree + } + + #[test] + fn test_merge_small_leaves_merges_adjacent_pair() { + let mut tree = make_merge_test_tree(); + let root = tree.root(); + let mut metrics = crate::index::pipeline::IndexMetrics::new(); + + // Threshold 100: Leaf A (50) and Leaf B (30) should merge + let merged = OptimizeStage::merge_small_leaves(&mut tree, 100, &mut metrics); + + assert_eq!(merged, 1); + assert_eq!(metrics.nodes_merged, 1); + + // Leaf B should be marked as merged + let children = tree.children(root); + let leaf_b = children.iter().find(|&&id| { + tree.get(id) + .map(|n| n.title.starts_with("[MERGED")) + .unwrap_or(false) + }); + assert!(leaf_b.is_some(), "Leaf B should be marked as merged"); + } + + #[test] + fn test_merge_small_leaves_nothing_above_threshold() { + let mut tree = make_merge_test_tree(); + let mut metrics = crate::index::pipeline::IndexMetrics::new(); + + // Threshold 10: all leaves are above this, nothing merges + let merged = OptimizeStage::merge_small_leaves(&mut tree, 10, &mut metrics); + assert_eq!(merged, 0); + } + + #[test] + fn test_merge_small_leaves_preserves_content() { + let mut tree = DocumentTree::new("Root", ""); + let root = tree.root(); + let a = tree.add_child(root, "A", "hello"); + let b = tree.add_child(root, "B", "world"); + if let Some(n) = tree.get_mut(a) { + n.token_count = Some(5); + } + if let Some(n) = tree.get_mut(b) { + n.token_count = Some(5); + } + + let mut metrics = crate::index::pipeline::IndexMetrics::new(); + let _ = OptimizeStage::merge_small_leaves(&mut tree, 100, &mut metrics); + + // Leaf A should now contain both contents with heading prefix + let a_node = tree.get(a).unwrap(); + assert!(a_node.content.contains("hello")); + 
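+ // Leaf B should be appended under a "## B" heading, with token counts summed.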
assert!(a_node.content.contains("## B")); + assert!(a_node.content.contains("world")); + assert_eq!(a_node.token_count, Some(10)); + } + + #[test] + fn test_merge_small_leaves_skips_non_leaf() { + let mut tree = DocumentTree::new("Root", ""); + let root = tree.root(); + + // Section is a non-leaf (has a child), should not be merged even if small + let section = tree.add_child(root, "Section", "section content"); + let _sub = tree.add_child(section, "Sub", "sub content"); + let leaf = tree.add_child(root, "Leaf", "leaf content"); + + if let Some(n) = tree.get_mut(section) { + n.token_count = Some(5); + } + if let Some(n) = tree.get_mut(leaf) { + n.token_count = Some(5); + } + + let mut metrics = crate::index::pipeline::IndexMetrics::new(); + let merged = OptimizeStage::merge_small_leaves(&mut tree, 100, &mut metrics); + + // Section is non-leaf, only Leaf is a leaf — no adjacent pair of leaves + assert_eq!(merged, 0); + } + + #[test] + fn test_remove_empty_nodes_marks_single_child_empty() { + let mut tree = DocumentTree::new("Root", ""); + let root = tree.root(); + + // Empty intermediate with single child + let section = tree.add_child(root, "Section", ""); + let _leaf = tree.add_child(section, "Leaf", "content"); + + let removed = OptimizeStage::remove_empty_nodes(&mut tree); + assert_eq!(removed, 1); + + let section_node = tree.get(section).unwrap(); + assert!(section_node.title.starts_with("[EMPTY")); + } + + #[test] + fn test_remove_empty_nodes_skips_root() { + let mut tree = DocumentTree::new("Root", ""); + let _child = tree.add_child(tree.root(), "Child", "content"); + + let removed = OptimizeStage::remove_empty_nodes(&mut tree); + assert_eq!(removed, 0); + } + + #[test] + fn test_remove_empty_nodes_skips_leaves() { + let mut tree = DocumentTree::new("Root", ""); + let root = tree.root(); + let leaf = tree.add_child(root, "Leaf", ""); + + let removed = OptimizeStage::remove_empty_nodes(&mut tree); + assert_eq!(removed, 0, "Leaves should not be removed"); + + // Verify the leaf is indeed a leaf + assert!(tree.is_leaf(leaf)); + } + + #[test] + fn test_remove_empty_nodes_skips_multi_child() { + let mut tree = DocumentTree::new("Root", ""); + let root = tree.root(); + let section = tree.add_child(root, "Section", ""); + let _c1 = tree.add_child(section, "C1", "a"); + let _c2 = tree.add_child(section, "C2", "b"); + + let removed = OptimizeStage::remove_empty_nodes(&mut tree); + assert_eq!( + removed, 0, + "Nodes with multiple children should not be removed" + ); + } + + #[test] + fn test_remove_empty_nodes_skips_non_empty() { + let mut tree = DocumentTree::new("Root", ""); + let root = tree.root(); + let section = tree.add_child(root, "Section", "has content"); + let _leaf = tree.add_child(section, "Leaf", "content"); + + let removed = OptimizeStage::remove_empty_nodes(&mut tree); + assert_eq!(removed, 0); + } + + #[tokio::test] + async fn test_optimize_disabled_skips() { + let mut stage = OptimizeStage::new(); + assert_eq!(stage.name(), "optimize"); + assert!(stage.is_optional()); + assert_eq!(stage.depends_on(), vec!["enrich", "navigation_index"]); + + let mut options = PipelineOptions::default(); + options.optimization.enabled = false; + + let input = IndexInput::content("# Test\nHello"); + let mut ctx = IndexContext::new(input, options); + ctx.tree = Some(DocumentTree::new("Root", "content")); + + let result = stage.execute(&mut ctx).await.unwrap(); + assert!(result.success); + } + + #[test] + fn test_merge_small_leaves_empty_tree() { + let mut tree = DocumentTree::new("Root", ""); + 
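+ // Root has no children, so there is nothing to merge.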
let mut metrics = crate::index::pipeline::IndexMetrics::new(); + + let merged = OptimizeStage::merge_small_leaves(&mut tree, 100, &mut metrics); + assert_eq!(merged, 0, "Root with no children should merge nothing"); + } +} diff --git a/rust/src/index/stages/parse.rs b/rust/src/index/stages/parse.rs index 2ca30a14..b0e542f6 100644 --- a/rust/src/index/stages/parse.rs +++ b/rust/src/index/stages/parse.rs @@ -5,7 +5,7 @@ use super::async_trait; use std::time::Instant; -use tracing::info; +use tracing::{debug, info}; use crate::error::Result; use crate::index::parse::DocumentFormat; @@ -70,9 +70,15 @@ impl IndexStage for ParseStage { let format = self.detect_format(ctx)?; ctx.format = format; - info!("Parsing document with format: {:?}", format); + let input_type = match &ctx.input { + IndexInput::File(_) => "file", + IndexInput::Content { .. } => "content", + IndexInput::Bytes { .. } => "bytes", + }; info!( - "ParseStage llm_client present: {}", + "[parse] Starting: format={:?}, input={}, llm={}", + format, + input_type, self.llm_client.is_some() ); @@ -90,6 +96,8 @@ impl IndexStage for ParseStage { .unwrap_or("document") .to_string(); + debug!("[parse] Reading file: {:?}", ctx.source_path); + // Parse directly crate::index::parse::parse_file(&path, format, self.llm_client.clone()).await? } @@ -101,6 +109,8 @@ impl IndexStage for ParseStage { // Set name ctx.name = name.clone(); + debug!("[parse] Parsing inline content ({} chars)", content.len()); + // Parse content directly crate::index::parse::parse_content(content, *format, self.llm_client.clone()) .await? @@ -109,6 +119,8 @@ impl IndexStage for ParseStage { // Set name ctx.name = name.clone(); + debug!("[parse] Parsing bytes ({} bytes)", data.len()); + // Parse bytes crate::index::parse::parse_bytes(data, *format, self.llm_client.clone()).await? } @@ -121,6 +133,7 @@ impl IndexStage for ParseStage { // Store metadata if let Some(page_count) = result.meta.page_count { ctx.page_count = Some(page_count); + debug!("[parse] Document has {} pages", page_count); } ctx.line_count = Some(result.meta.line_count); @@ -132,7 +145,7 @@ impl IndexStage for ParseStage { ctx.metrics.record_parse(duration); info!( - "Parsed {} nodes from {} ({}ms)", + "[parse] Complete: {} nodes from '{}' ({}ms)", ctx.raw_nodes.len(), ctx.name, duration diff --git a/rust/src/index/stages/reasoning.rs b/rust/src/index/stages/reasoning.rs index c85a175b..612c0b38 100644 --- a/rust/src/index/stages/reasoning.rs +++ b/rust/src/index/stages/reasoning.rs @@ -7,8 +7,9 @@ //! calculates metadata) and before OptimizeStage. It builds a //! [`ReasoningIndex`] from the document tree's TOC, summaries, and keywords. 
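+//!
+//! A minimal sketch of the keyword-to-node map this stage produces. `TopicEntry`
+//! is simplified to a tuple here, and the ids and weights are illustrative only:
+//!
+//! ```
+//! use std::collections::HashMap;
+//!
+//! // keyword → (node id, weight, depth); weights are normalized to [0, 1]
+//! let mut topic_paths: HashMap<String, Vec<(usize, f32, usize)>> = HashMap::new();
+//! topic_paths.insert("learning".into(), vec![(1, 1.0, 1), (2, 0.8, 2)]);
+//! assert_eq!(topic_paths["learning"].len(), 2);
+//! ```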
+use std::collections::HashMap; use std::time::Instant; -use tracing::info; +use tracing::{debug, info, warn}; use crate::document::{ NodeId, ReasoningIndexBuilder, ReasoningIndexConfig, SectionSummary, SummaryShortcut, @@ -16,7 +17,7 @@ use crate::document::{ }; use crate::error::Result; use crate::llm::LlmClient; -use crate::retrieval::scoring::extract_keywords; +use crate::scoring::extract_keywords; use super::async_trait; use super::{AccessPattern, IndexStage, StageResult}; @@ -57,9 +58,8 @@ impl ReasoningIndexStage { fn build_topic_paths( tree: &crate::document::DocumentTree, config: &ReasoningIndexConfig, - ) -> (std::collections::HashMap<String, Vec<TopicEntry>>, usize) { - let mut keyword_nodes: std::collections::HashMap<String, Vec<(NodeId, f32, usize)>> = - std::collections::HashMap::new(); + ) -> (HashMap<String, Vec<TopicEntry>>, usize) { + let mut keyword_nodes: HashMap<String, Vec<(NodeId, f32, usize)>> = HashMap::new(); // Walk all nodes and extract keywords from title + summary for node_id in tree.traverse() { @@ -106,13 +106,11 @@ let keyword_count = sorted_keywords.len(); // Build topic_paths: merge duplicate (keyword, node) pairs - let mut topic_paths: std::collections::HashMap<String, Vec<TopicEntry>> = - std::collections::HashMap::new(); + let mut topic_paths: HashMap<String, Vec<TopicEntry>> = HashMap::new(); for (keyword, entries) in sorted_keywords { // Merge duplicate node entries by summing weights - let mut merged: std::collections::HashMap<NodeId, (f32, usize)> = - std::collections::HashMap::new(); + let mut merged: HashMap<NodeId, (f32, usize)> = HashMap::new(); for (node_id, weight, depth) in entries { let entry = merged.entry(node_id).or_insert((0.0, depth)); entry.0 += weight; @@ -149,10 +147,8 @@ } /// Build section map from depth-1 nodes. - fn build_section_map( - tree: &crate::document::DocumentTree, - ) -> std::collections::HashMap<String, NodeId> { - let mut section_map = std::collections::HashMap::new(); + fn build_section_map(tree: &crate::document::DocumentTree) -> HashMap<String, NodeId> { + let mut section_map = HashMap::new(); let root = tree.root(); for child_id in tree.children(root) { if let Some(node) = tree.get(child_id) { @@ -166,13 +162,13 @@ section_map } - /// Expand keywords with LLM-generated synonyms. + /// Expand keywords with LLM-generated synonyms (single batch request). /// - /// For each existing keyword in `topic_paths`, ask the LLM for synonymous - /// search terms. Synonym entries inherit the same node mappings but with + /// Sends all keywords to the LLM in one request and maps each to its + /// synonyms. Synonym entries inherit the same node mappings but with /// a reduced weight (0.6x) to reflect the indirect match. async fn expand_synonyms( - topic_paths: &mut std::collections::HashMap<String, Vec<TopicEntry>>, + topic_paths: &mut HashMap<String, Vec<TopicEntry>>, llm_client: &LlmClient, max_keywords: usize, ) -> usize { @@ -182,54 +178,78 @@ // Pick top keywords by entry count for synonym expansion let mut ranked: Vec<(String, usize)> = topic_paths .iter() - .map(|(k, v)| (k.clone(), v.len())) + .map(|(k, v): (&String, &Vec<TopicEntry>)| (k.clone(), v.len())) .collect(); ranked.sort_by(|a, b| b.1.cmp(&a.1)); ranked.truncate(max_keywords); - let mut synonym_count = 0; + let keyword_count = ranked.len(); + if keyword_count == 0 { + return 0; + } + + tracing::info!( + "[reasoning_index] Expanding synonyms for {} keywords (single request)", + keyword_count, + ); + + // Snapshot the source entries for each keyword.
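+ // Cloned up front: the write-back loop below inserts into `topic_paths` while reading these.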
+ let source_entries: HashMap<String, Vec<TopicEntry>> = ranked + .iter() + .map(|(kw, _): &(String, usize)| { + (kw.clone(), topic_paths.get(kw).cloned().unwrap_or_default()) + }) + .collect(); + + let keywords: Vec<String> = ranked.into_iter().map(|(kw, _)| kw).collect(); + + let system = "You are a thesaurus assistant. For each keyword, provide up to 5 synonyms \ + or related search terms. Return ONLY a valid JSON object mapping each keyword to an \ + array of synonym strings. No explanation, no markdown."; + let user_prompt = format!( + "Keywords: {}\n\nReturn a JSON object: {{\"keyword\": [\"syn1\", \"syn2\"], ...}}", + keywords.join(", ") + ); + + let synonym_map: HashMap<String, Vec<String>> = match llm_client + .complete_json::<HashMap<String, Vec<String>>>(system, &user_prompt) + .await + { + Ok(map) => map + .into_iter() + .map(|(k, v): (String, Vec<String>)| (k.to_lowercase(), v)) + .collect(), + Err(e) => { + tracing::warn!("[reasoning_index] Batch synonym expansion failed: {}", e); + return 0; + } + }; - for (keyword, _) in &ranked { - let prompt = format!( - "List up to 5 synonyms or related search terms for \"{}\". \ - Return only the terms separated by commas, no numbering, no explanation.", - keyword - ); - - match llm_client - .complete( - "You are a thesaurus assistant. Return only comma-separated synonyms.", - &prompt, - ) - .await - { - Ok(response) => { - let synonyms: Vec<String> = response - .to_lowercase() - .split(',') - .map(|s| s.trim().to_string()) - .filter(|s| !s.is_empty() && s.len() >= 2 && !existing_keys.contains(s)) - .collect(); - - if let Some(entries) = topic_paths.get(keyword) { - let source_entries = entries.clone(); - for syn in synonyms { - let synonym_entries: Vec<TopicEntry> = source_entries - .iter() - .map(|e| TopicEntry { - node_id: e.node_id, - weight: e.weight * 0.6, - depth: e.depth, - }) - .collect(); - topic_paths.insert(syn, synonym_entries); - synonym_count += 1; + // Write results back + let mut synonym_count = 0; + for keyword in &keywords { + if let Some(synonyms) = synonym_map.get(keyword) { + if let Some(entries) = source_entries.get(keyword) { + for syn in synonyms { + let syn_clean = syn.trim().to_lowercase(); + if syn_clean.is_empty() + || syn_clean.len() < 2 + || existing_keys.contains(&syn_clean) + { + continue; } + let synonym_entries: Vec<TopicEntry> = entries + .iter() + .map(|e| TopicEntry { + node_id: e.node_id, + weight: e.weight * 0.6, + depth: e.depth, + }) + .collect(); + topic_paths.insert(syn_clean, synonym_entries); + synonym_count += 1; } } - Err(e) => { - tracing::warn!("Synonym expansion failed for '{}': {}", keyword, e); - } } } @@ -311,7 +331,7 @@ impl IndexStage for ReasoningIndexStage { // Check if enabled via pipeline options if !ctx.options.reasoning_index.enabled { - info!("Reasoning index stage disabled, skipping"); + info!("[reasoning_index] Disabled, skipping"); return Ok(StageResult::success("reasoning_index")); } @@ -321,17 +341,26 @@ let tree = match ctx.tree.as_ref() { Some(t) => t, None => { + warn!("[reasoning_index] No tree, cannot build index"); return Ok(StageResult::failure("reasoning_index", "Tree not built")); } }; - info!("Building reasoning index..."); + info!( + "[reasoning_index] Starting: synonyms={}, summary_shortcut={}, max_keywords={}", + config.enable_synonym_expansion, + config.build_summary_shortcut, + config.max_keyword_entries, + ); // 1.
Build topic-to-path mapping let (mut topic_paths, keyword_count) = Self::build_topic_paths(tree, config); - let topic_count: usize = topic_paths.values().map(|v| v.len()).sum(); - info!( - "Built topic paths: {} keywords, {} topic entries", + let topic_count: usize = topic_paths + .values() + .map(|v: &Vec<TopicEntry>| v.len()) + .sum(); + debug!( + "[reasoning_index] Topic paths: {} keywords, {} entries", keyword_count, topic_count ); @@ -340,10 +369,12 @@ if let Some(ref llm_client) = ctx.llm_client { let max_kw = (keyword_count / 4).max(20).min(100); let count = Self::expand_synonyms(&mut topic_paths, llm_client, max_kw).await; - info!("Expanded {} synonym keywords", count); + if count > 0 { + info!("[reasoning_index] Expanded {} synonym keywords", count); + } count } else { - info!("Synonym expansion enabled but no LLM client available"); + debug!("[reasoning_index] Synonym expansion enabled but no LLM client"); 0 } } else { @@ -352,13 +383,16 @@ // 2. Build section map let section_map = Self::build_section_map(tree); - info!("Built section map with {} entries", section_map.len()); + debug!( + "[reasoning_index] Section map: {} entries", + section_map.len() + ); // 3. Build summary shortcut let summary_shortcut = if config.build_summary_shortcut { let shortcut = Self::build_summary_shortcut(tree); if shortcut.is_some() { - info!("Built summary shortcut"); + debug!("[reasoning_index] Built summary shortcut"); } shortcut } else { @@ -387,12 +421,12 @@ .record_reasoning_index(duration, topic_count, keyword_count); info!( - "Reasoning index built in {}ms ({} keywords, {} topic entries, {} sections, {} synonyms)", - duration, + "[reasoning_index] Complete: {} keywords, {} topics, {} sections, {} synonyms in {}ms", keyword_count, topic_count, reasoning_index.section_count(), synonym_count, + duration, ); ctx.reasoning_index = Some(reasoning_index); @@ -444,4 +478,166 @@ mod tests { assert!(stage.is_optional()); assert_eq!(stage.depends_on(), vec!["enrich"]); } + + #[test] + fn test_build_topic_paths_basic() { + use crate::document::ReasoningIndexConfig; + + let mut tree = crate::document::DocumentTree::new("Root", ""); + let root = tree.root(); + let c1 = tree.add_child(root, "Machine Learning Introduction", ""); + let c2 = tree.add_child(root, "Deep Learning Methods", ""); + + // Set summaries for keyword extraction + if let Some(n) = tree.get_mut(c1) { + n.summary = "An overview of machine learning algorithms".to_string(); + } + if let Some(n) = tree.get_mut(c2) { + n.summary = "Advanced deep learning techniques".to_string(); + } + + let config = ReasoningIndexConfig::default(); + let (topic_paths, keyword_count) = ReasoningIndexStage::build_topic_paths(&tree, &config); + + assert!( + keyword_count > 0, + "Should extract keywords from title + summary" + ); + assert!(!topic_paths.is_empty(), "Should build topic paths"); + + // "learning" appears in both titles → should be a keyword + assert!( + topic_paths.contains_key("learning"), + "Expected 'learning' in topic paths, got: {:?}", + topic_paths.keys().collect::<Vec<_>>() + ); + } + + #[test] + fn test_build_topic_paths_weight_normalization() { + use crate::document::ReasoningIndexConfig; + + let mut tree = crate::document::DocumentTree::new("Root", ""); + let root = tree.root(); + let _c1 = tree.add_child(root, "rust ownership", "rust borrowing rules"); + + let config = ReasoningIndexConfig::default(); + let (topic_paths, _) =
ReasoningIndexStage::build_topic_paths(&tree, &config); + + // All weights should be in 0.0-1.0 range + for entries in topic_paths.values() { + for entry in entries { + assert!( + entry.weight >= 0.0 && entry.weight <= 1.0, + "Weight {} out of [0, 1] range", + entry.weight + ); + } + } + } + + #[test] + fn test_build_topic_paths_respects_max_keyword_entries() { + use crate::document::ReasoningIndexConfig; + + let mut tree = crate::document::DocumentTree::new("Root", ""); + let root = tree.root(); + + // Create many children with unique keywords + for i in 0..50 { + let c = tree.add_child(root, &format!("Section {} Alpha Beta Gamma Delta", i), ""); + if let Some(n) = tree.get_mut(c) { + n.summary = format!("keywords unique{} special{} terms{}", i, i, i); + } + } + + let mut config = ReasoningIndexConfig::default(); + config.max_keyword_entries = 5; + let (topic_paths, keyword_count) = ReasoningIndexStage::build_topic_paths(&tree, &config); + + assert!( + keyword_count <= 5, + "Should respect max_keyword_entries, got {}", + keyword_count + ); + assert_eq!(topic_paths.len(), keyword_count); + } + + #[test] + fn test_build_section_map() { + let mut tree = crate::document::DocumentTree::new("Root", ""); + let root = tree.root(); + let c1 = tree.add_child(root, "Introduction", "content"); + let c2 = tree.add_child(root, "Methods", "content"); + + // Set structure indices + if let Some(n) = tree.get_mut(c1) { + n.structure = "1".to_string(); + } + if let Some(n) = tree.get_mut(c2) { + n.structure = "2".to_string(); + } + + let section_map = ReasoningIndexStage::build_section_map(&tree); + + // Should index by title (lowercase) and structure index + assert!(section_map.contains_key("introduction")); + assert!(section_map.contains_key("methods")); + assert!(section_map.contains_key("1")); + assert!(section_map.contains_key("2")); + assert_eq!(section_map.len(), 4); + } + + #[test] + fn test_build_summary_shortcut() { + let mut tree = crate::document::DocumentTree::new("Root", ""); + let root = tree.root(); + let c1 = tree.add_child(root, "S1", "summary 1"); + let c2 = tree.add_child(root, "S2", "summary 2"); + + // Set root summary (not content — build_summary_shortcut reads summary field) + if let Some(n) = tree.get_mut(root) { + n.summary = "root summary text".to_string(); + } + if let Some(n) = tree.get_mut(c1) { + n.summary = "first section summary".to_string(); + } + if let Some(n) = tree.get_mut(c2) { + n.summary = "second section summary".to_string(); + } + + let shortcut = ReasoningIndexStage::build_summary_shortcut(&tree); + assert!(shortcut.is_some()); + + let sc = shortcut.unwrap(); + assert_eq!(sc.root_node, root); + assert_eq!(sc.document_summary, "root summary text"); + assert_eq!(sc.section_summaries.len(), 2); + } + + #[test] + fn test_build_summary_shortcut_fallback_to_children() { + // Root has no summary → fallback to concatenating children + let mut tree = crate::document::DocumentTree::new("Root", ""); + let root = tree.root(); + let c1 = tree.add_child(root, "S1", ""); + let c2 = tree.add_child(root, "S2", ""); + + if let Some(n) = tree.get_mut(c1) { + n.summary = "child summary 1".to_string(); + } + if let Some(n) = tree.get_mut(c2) { + n.summary = "child summary 2".to_string(); + } + + let shortcut = ReasoningIndexStage::build_summary_shortcut(&tree); + assert!(shortcut.is_some()); + + let sc = shortcut.unwrap(); + assert!( + sc.document_summary.contains("child summary 1"), + "Fallback should include child summaries" + ); + assert!(sc.document_summary.contains("S1")); + } } 
diff --git a/rust/src/index/stages/split.rs b/rust/src/index/stages/split.rs index 2bb54d4a..245688b8 100644 --- a/rust/src/index/stages/split.rs +++ b/rust/src/index/stages/split.rs @@ -4,7 +4,7 @@ //! Split stage - Break large leaf nodes into smaller ones. use std::time::Instant; -use tracing::info; +use tracing::{debug, info}; use crate::document::{DocumentTree, NodeId}; use crate::error::Result; @@ -226,6 +226,7 @@ impl IndexStage for SplitStage { reads_tree: true, writes_tree: true, writes_reasoning_index: false, + writes_navigation_index: false, writes_description: false, } } @@ -236,15 +237,22 @@ impl IndexStage for SplitStage { let tree = match ctx.tree.as_mut() { Some(t) => t, None => { + info!("[split] No tree, skipping"); return Ok(StageResult::success("split")); } }; let config = &ctx.options.split; if !config.enabled { + debug!("[split] Disabled, skipping"); return Ok(StageResult::success("split")); } + info!( + "[split] Starting: max_tokens_per_node={}", + config.max_tokens_per_node + ); + let node_count_before = tree.node_count(); let split_count = Self::split_tree(tree, config); let node_count_after = tree.node_count(); @@ -254,7 +262,7 @@ impl IndexStage for SplitStage { ctx.metrics.nodes_merged += split_count; info!( - "Split {} oversized nodes ({} → {} total nodes) in {}ms", + "[split] Complete: {} nodes split ({} → {} total) in {}ms", split_count, node_count_before, node_count_after, duration ); diff --git a/rust/src/index/stages/validate.rs b/rust/src/index/stages/validate.rs index f07c32c2..312ff18a 100644 --- a/rust/src/index/stages/validate.rs +++ b/rust/src/index/stages/validate.rs @@ -5,7 +5,7 @@ use std::collections::HashSet; use std::time::Instant; -use tracing::{info, warn}; +use tracing::{debug, info, warn}; use crate::error::Result; @@ -237,6 +237,7 @@ impl IndexStage for ValidateStage { reads_tree: true, writes_tree: false, writes_reasoning_index: false, + writes_navigation_index: false, writes_description: false, } } @@ -244,6 +245,9 @@ impl IndexStage for ValidateStage { async fn execute(&mut self, ctx: &mut IndexContext) -> Result { let start = Instant::now(); + let node_count = ctx.tree.as_ref().map(|t| t.node_count()).unwrap_or(0); + info!("[validate] Starting: {} nodes", node_count); + let issues = self.validate_tree(ctx); let warnings = issues @@ -263,11 +267,15 @@ impl IndexStage for ValidateStage { } } + if warnings == 0 && errors == 0 { + debug!("[validate] No issues found"); + } + let duration = start.elapsed().as_millis() as u64; ctx.metrics.record_validate(duration); info!( - "Validated tree: {} warnings, {} errors in {}ms", + "[validate] Complete: {} warnings, {} errors in {}ms", warnings, errors, duration ); diff --git a/rust/src/index/summary/strategy.rs b/rust/src/index/summary/strategy.rs index fb4bb7b0..7937aa74 100644 --- a/rust/src/index/summary/strategy.rs +++ b/rust/src/index/summary/strategy.rs @@ -284,14 +284,22 @@ impl SummaryGenerator for LlmSummaryGenerator { Focus on the key information and facts presented. \ Respond with only the summary, no additional text." } else { - // Non-leaf (branch) nodes: navigation-oriented — "what does this section cover" - "You are a document summarization assistant. \ - Generate a concise overview (2-3 sentences) describing what topics and subtopics \ - this section covers. This summary will be used as a navigation guide. \ - Respond with only the summary, no additional text." + // Non-leaf (branch) nodes: navigation-oriented with structured output. 
+ // Produces OVERVIEW, QUESTIONS, and TAGS sections that EnhanceStage parses. + "You are a document navigation assistant. \ + Generate a structured overview of this section for navigation purposes. \ + Respond in EXACTLY this format (one section per line):\n\ + OVERVIEW: <2-3 sentence description of what topics this section covers>\n\ + QUESTIONS: <comma-separated questions this section can answer>\n\ + TAGS: <comma-separated topic keywords>" }; - let user_prompt = format!("Title: {}\n\nContent:\n{}", title, content); + let user_prompt = if is_leaf { + format!("Title: {}\n\nContent:\n{}", title, content) + } else { + // Non-leaf nodes currently use the same prompt body as leaves + format!("Title: {}\n\nContent:\n{}", title, content) + }; let summary = self .client diff --git a/rust/src/lib.rs b/rust/src/lib.rs index 6cc4f91c..9ddc9b35 100644 --- a/rust/src/lib.rs +++ b/rust/src/lib.rs @@ -40,6 +40,7 @@ // ── Modules ────────────────────────────────────────────────────────────────── +mod agent; mod client; mod config; mod document; @@ -50,7 +51,10 @@ mod metrics; mod index; mod llm; +mod query; +mod rerank; mod retrieval; +mod scoring; mod storage; mod utils; @@ -58,8 +62,9 @@ mod utils; // Client pub use client::{ - BuildError, DocumentFormat, DocumentInfo, Engine, EngineBuilder, FailedItem, IndexContext, - IndexItem, IndexMode, IndexOptions, IndexResult, QueryContext, QueryResult, QueryResultItem, + BuildError, Confidence, DocumentFormat, DocumentInfo, Engine, EngineBuilder, EvidenceItem, + FailedItem, IndexContext, IndexItem, IndexMode, IndexOptions, IndexResult, QueryContext, + QueryMetrics, QueryResult, QueryResultItem, }; // Config diff --git a/rust/src/llm/config.rs b/rust/src/llm/config.rs index 429cd971..32685e36 100644 --- a/rust/src/llm/config.rs +++ b/rust/src/llm/config.rs @@ -36,6 +36,10 @@ pub struct LlmConfig { /// Retry configuration. #[serde(default)] pub retry: RetryConfig, + + /// Per-request timeout. 0 means no timeout (wait indefinitely). + #[serde(default)] + pub request_timeout_secs: u64, } fn default_max_tokens() -> usize { @@ -55,6 +59,7 @@ impl Default for LlmConfig { max_tokens: default_max_tokens(), temperature: default_temperature(), retry: RetryConfig::default(), + request_timeout_secs: 0, } } } diff --git a/rust/src/llm/executor.rs b/rust/src/llm/executor.rs index 77447ba4..409b474e 100644 --- a/rust/src/llm/executor.rs +++ b/rust/src/llm/executor.rs @@ -246,10 +246,30 @@ impl LlmExecutor { "Executing LLM request" ); - // Step 2: Execute the request - let result = self - .do_request(&current_model, system, user, max_tokens) - .await; + // Step 2: Execute the request (with optional timeout) + let request_future = self.do_request(&current_model, system, user, max_tokens); + let result = if self.config.request_timeout_secs > 0 { + let timeout = Duration::from_secs(self.config.request_timeout_secs); + match tokio::time::timeout(timeout, request_future).await { + Ok(r) => r, + Err(_) => { + warn!( + timeout_secs = self.config.request_timeout_secs, + model = %current_model, + "LLM request timed out" + ); + if let Some(ref metrics) = self.metrics { + metrics.record_llm_timeout(); + } + Err(LlmError::Timeout(format!( + "Request timed out after {}s", + self.config.request_timeout_secs + ))) + } + } + } else { + request_future.await + }; match result { Ok(response) => { @@ -372,9 +392,10 @@ impl LlmExecutor { model: &str, system: &str, user: &str, - max_tokens: Option<u16>, + _max_tokens: Option<u16>, ) -> LlmResult<String> { - // Build request + // Build request; max_tokens is deliberately left unset so the API default + // applies (the explicit override below is disabled for now).
let request = CreateChatCompletionRequestArgs::default() .model(model) .messages([ ChatCompletionRequestSystemMessage::from(system).into(), ChatCompletionRequestUserMessage::from(user).into(), ]) .temperature(self.config.temperature) - .max_tokens(max_tokens.unwrap_or(self.config.max_tokens as u16)) .build() .map_err(|e| LlmError::Request(format!("Failed to build request: {}", e)))?; + // if let Some(mt) = max_tokens { + // request.max_tokens = Some(mt as u32); + // } + info!( "LLM request → endpoint: {}, model: {}, system: {} chars, user: {} chars", self.config.endpoint, @@ -412,11 +436,39 @@ let prompt_tokens = usage.map(|u| u.prompt_tokens).unwrap_or(0); let completion_tokens = usage.map(|u| u.completion_tokens).unwrap_or(0); - let content = response - .choices - .first() - .and_then(|choice| choice.message.content.clone()) - .ok_or(LlmError::NoContent)?; + let first_choice = response.choices.first(); + + if first_choice.is_none() { + if let Some(ref metrics) = self.metrics { + metrics.record_llm_call( + prompt_tokens as u64, + completion_tokens as u64, + request_elapsed.as_millis() as u64, + false, + ); + } + return Err(LlmError::NoContent); + } + + let choice = first_choice.unwrap(); + let content = choice.message.content.clone().unwrap_or_default(); + + if content.is_empty() { + let has_tool_calls = choice + .message + .tool_calls + .as_ref() + .map_or(false, |t| !t.is_empty()); + let finish_reason = format!("{:?}", choice.finish_reason); + warn!( + elapsed_ms = request_elapsed.as_millis(), + prompt_tokens, + completion_tokens, + has_tool_calls, + finish_reason, + "LLM returned empty content field" + ); + } if let Some(ref metrics) = self.metrics { metrics.record_llm_call( @@ -427,13 +479,20 @@ ); } - info!( - "LLM response ← {}ms, tokens: {} prompt + {} completion, content: {} chars", - request_elapsed.as_millis(), - prompt_tokens, - completion_tokens, - content.len() - ); + // The empty case was already logged above with finish_reason detail. + if !content.is_empty() { + info!( + "LLM response ← {}ms, tokens: {} prompt + {} completion, content: {} chars", + request_elapsed.as_millis(), + prompt_tokens, + completion_tokens, + content.len() + ); + } Ok(content) } diff --git a/rust/src/llm/memo/mod.rs b/rust/src/llm/memo/mod.rs index fff44e65..79c9ae78 100644 --- a/rust/src/llm/memo/mod.rs +++ b/rust/src/llm/memo/mod.rs @@ -11,4 +11,4 @@ mod store; mod types; pub use store::MemoStore; -pub use types::{MemoKey, MemoOpType, MemoValue, PilotDecisionValue}; +pub use types::{MemoKey, MemoValue}; diff --git a/rust/src/llm/mod.rs b/rust/src/llm/mod.rs index 215aba7f..bd65e58a 100644 --- a/rust/src/llm/mod.rs +++ b/rust/src/llm/mod.rs @@ -39,5 +39,4 @@ pub(crate) mod throttle; pub use client::LlmClient; pub use error::LlmResult; -pub use executor::LlmExecutor; pub use pool::LlmPool; diff --git a/rust/src/llm/pool.rs b/rust/src/llm/pool.rs index 10d9041b..09bb4acd 100644 --- a/rust/src/llm/pool.rs +++ b/rust/src/llm/pool.rs @@ -65,6 +65,7 @@ impl LlmPool { max_tokens: slot.max_tokens, temperature: slot.temperature, retry: retry.clone(), + request_timeout_secs: 0, } }; diff --git a/rust/src/metrics/index.rs b/rust/src/metrics/index.rs index 4432e32f..3d1e5569 100644 --- a/rust/src/metrics/index.rs +++ b/rust/src/metrics/index.rs @@ -40,6 +40,18 @@ pub struct IndexMetrics { #[serde(default)] pub reasoning_index_time_ms: u64, + /// Navigation index build duration (ms).
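+ /// Stays 0 when the navigation index stage is skipped.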
+ #[serde(default)] + pub navigation_index_time_ms: u64, + + /// Number of nav entries in navigation index. + #[serde(default)] + pub nav_entries_indexed: usize, + + /// Number of child routes in navigation index. + #[serde(default)] + pub child_routes_indexed: usize, + /// Number of topics indexed in reasoning index. #[serde(default)] pub topics_indexed: usize, @@ -125,6 +137,18 @@ impl IndexMetrics { self.keywords_indexed = keywords; } + /// Record navigation index build time. + pub fn record_navigation_index( + &mut self, + duration_ms: u64, + nav_entries: usize, + child_routes: usize, + ) { + self.navigation_index_time_ms = duration_ms; + self.nav_entries_indexed = nav_entries; + self.child_routes_indexed = child_routes; + } + /// Increment LLM calls. pub fn increment_llm_calls(&mut self) { self.llm_calls += 1; @@ -169,6 +193,7 @@ + self.enhance_time_ms + self.enrich_time_ms + self.reasoning_index_time_ms + + self.navigation_index_time_ms + self.optimize_time_ms } } diff --git a/rust/src/query/mod.rs b/rust/src/query/mod.rs new file mode 100644 index 00000000..bbe6806d --- /dev/null +++ b/rust/src/query/mod.rs @@ -0,0 +1,45 @@ +// Copyright (c) 2026 vectorless developers +// SPDX-License-Identifier: Apache-2.0 + +//! Query understanding and planning. +//! +//! Analyzes a user's raw query and produces a structured [`QueryPlan`] +//! for downstream modules (Orchestrator, Worker). +//! +//! # Pipeline +//! +//! ```text +//! raw query string +//! → extract keywords (from scoring/bm25) +//! → LLM query understanding (intent, concepts, complexity) +//! → QueryPlan +//! ``` +//! +//! LLM understanding is required — this is a pure reasoning engine. +//! Errors are propagated, not silently degraded. + +mod types; +mod understand; + +pub use types::{QueryIntent, QueryPlan}; + +use crate::llm::LlmClient; +use crate::scoring::bm25::extract_keywords; + +/// Query understanding pipeline. +/// +/// Produces a [`QueryPlan`] from a raw query string via LLM analysis. +pub struct QueryPipeline; + +impl QueryPipeline { + /// Analyze a query and produce a structured plan. + /// + /// 1. Extract keywords (zero-cost, no LLM) + /// 2. LLM deep understanding (intent, concepts, complexity, strategy) + /// + /// Errors propagate — the caller handles retries or failure. + pub async fn understand(query: &str, llm: &LlmClient) -> crate::error::Result<QueryPlan> { + let keywords = extract_keywords(query); + understand::understand(query, &keywords, llm).await + } +} diff --git a/rust/src/query/types.rs b/rust/src/query/types.rs new file mode 100644 index 00000000..f8e025e8 --- /dev/null +++ b/rust/src/query/types.rs @@ -0,0 +1,114 @@ +// Copyright (c) 2026 vectorless developers +// SPDX-License-Identifier: Apache-2.0 + +//! Core types for query understanding. + +use serde::{Deserialize, Serialize}; + +/// Query intent classification. +#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize)] +pub enum QueryIntent { + /// Factoid: "What is the Q3 2024 revenue?"
diff --git a/rust/src/query/types.rs b/rust/src/query/types.rs
new file mode 100644
index 00000000..f8e025e8
--- /dev/null
+++ b/rust/src/query/types.rs
@@ -0,0 +1,114 @@
+// Copyright (c) 2026 vectorless developers
+// SPDX-License-Identifier: Apache-2.0
+
+//! Core types for query understanding.
+
+use serde::{Deserialize, Serialize};
+
+/// Query intent classification.
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize)]
+pub enum QueryIntent {
+    /// Factoid: "What is the Q3 2024 revenue?"
+    Factual,
+    /// Analytical: "Compare market risk vs operational risk"
+    Analytical,
+    /// Navigation: "Find the section on compliance policy"
+    Navigational,
+    /// Summary: "Summarize the main points of this document"
+    Summary,
+}
+
+impl Default for QueryIntent {
+    fn default() -> Self {
+        Self::Factual
+    }
+}
+
+impl std::fmt::Display for QueryIntent {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        match self {
+            QueryIntent::Factual => write!(f, "factual"),
+            QueryIntent::Analytical => write!(f, "analytical"),
+            QueryIntent::Navigational => write!(f, "navigational"),
+            QueryIntent::Summary => write!(f, "summary"),
+        }
+    }
+}
+
+/// Query complexity estimation.
+#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize)]
+pub enum Complexity {
+    /// Single keyword, simple factoid.
+    Simple,
+    /// Multi-concept, requires synthesis.
+    Moderate,
+    /// Cross-document, comparative, or multi-faceted.
+    Complex,
+}
+
+impl Default for Complexity {
+    fn default() -> Self {
+        Self::Simple
+    }
+}
+
+impl std::fmt::Display for Complexity {
+    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
+        match self {
+            Complexity::Simple => write!(f, "simple"),
+            Complexity::Moderate => write!(f, "moderate"),
+            Complexity::Complex => write!(f, "complex"),
+        }
+    }
+}
+
+/// A sub-query produced by decomposition.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct SubQuery {
+    /// The sub-query text.
+    pub query: String,
+    /// Intent of this sub-query.
+    pub intent: QueryIntent,
+    /// Pre-identified target documents (if any).
+    pub target_docs: Option<Vec<String>>,
+}
+
+/// A structured query plan — the output of the query understanding pipeline.
+///
+/// Produced by `QueryPipeline::understand()`. Consumed by the Orchestrator
+/// and Worker agents for strategy selection.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct QueryPlan {
+    /// The original raw query string.
+    pub original: String,
+    /// Detected intent.
+    pub intent: QueryIntent,
+    /// Extracted keywords.
+    pub keywords: Vec<String>,
+    /// Key concepts identified by LLM (distinct from keywords).
+    pub key_concepts: Vec<String>,
+    /// Strategy hint for navigation agents.
+    pub strategy_hint: String,
+    /// Estimated complexity.
+    pub complexity: Complexity,
+    /// Rewritten queries (produced by LLM for better matching).
+    pub rewritten: Vec<String>,
+    /// Decomposed sub-queries (for complex/multi-faceted queries).
+    pub sub_queries: Vec<SubQuery>,
+}
+
+impl QueryPlan {
+    /// Produce a minimal default plan when LLM understanding has failed.
+    pub fn default_for(query: &str, keywords: Vec<String>) -> Self {
+        Self {
+            original: query.to_string(),
+            intent: QueryIntent::Factual,
+            keywords,
+            key_concepts: Vec::new(),
+            strategy_hint: "focused".to_string(),
+            complexity: Complexity::Simple,
+            rewritten: Vec::new(),
+            sub_queries: Vec::new(),
+        }
+    }
+}
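The defaults are easy to sanity-check. A tiny standalone example (assuming the crate is imported as `vectorless` and these types are re-exported as shown in `query/mod.rs`):

```rust
use vectorless::query::{QueryIntent, QueryPlan};

fn main() {
    let plan = QueryPlan::default_for("what is BM25?", vec!["bm25".to_string()]);
    assert_eq!(plan.intent, QueryIntent::Factual);  // default intent
    assert_eq!(plan.strategy_hint, "focused");      // conservative hint
    assert_eq!(plan.intent.to_string(), "factual"); // Display impl
    assert!(plan.sub_queries.is_empty());           // no decomposition
}
```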
diff --git a/rust/src/query/understand.rs b/rust/src/query/understand.rs
new file mode 100644
index 00000000..66a18570
--- /dev/null
+++ b/rust/src/query/understand.rs
@@ -0,0 +1,201 @@
+// Copyright (c) 2026 vectorless developers
+// SPDX-License-Identifier: Apache-2.0
+
+//! LLM-driven query understanding.
+//!
+//! Uses an LLM call to analyze the query and produce a structured [`QueryPlan`].
+//! Errors are propagated to the caller; there is no silent fallback.
+
+use serde::Deserialize;
+use tracing::{info, warn};
+
+use crate::llm::LlmClient;
+
+use super::types::{Complexity, QueryIntent, QueryPlan, SubQuery};
+
+/// Structured analysis returned by the LLM.
+#[derive(Deserialize)]
+struct QueryAnalysis {
+    intent: String,
+    key_concepts: Vec<String>,
+    strategy_hint: String,
+    complexity: String,
+    rewritten: Option<String>,
+    sub_queries: Vec<String>,
+}
+
+/// Use LLM to understand the query and produce a QueryPlan.
+///
+/// Propagates LLM errors — no silent degradation. The caller decides
+/// how to handle failure (retry, abort, etc.).
+pub async fn understand(
+    query: &str,
+    keywords: &[String],
+    llm: &LlmClient,
+) -> crate::error::Result<QueryPlan> {
+    let (system, user) = understand_prompt(query, keywords);
+    let response = llm.complete(&system, &user).await?;
+
+    if response.trim().is_empty() {
+        warn!("Query understanding: LLM returned empty response");
+        return Err(crate::error::Error::Config(
+            "Query understanding failed: LLM returned an empty response. \
+             Check your API key, model, and endpoint configuration."
+                .to_string(),
+        ));
+    }
+
+    let analysis = parse_analysis(&response).ok_or_else(|| {
+        warn!(
+            response = &response[..response.len().min(500)],
+            "Query understanding: failed to parse LLM response as JSON"
+        );
+        crate::error::Error::Config(format!(
+            "Query understanding returned unparseable response ({} chars): {}",
+            response.len(),
+            &response[..response.len().min(300)]
+        ))
+    })?;
+
+    info!(
+        intent = %analysis.intent,
+        complexity = %analysis.complexity,
+        concepts = analysis.key_concepts.len(),
+        "Query understanding complete"
+    );
+    Ok(analysis.into_plan(query, keywords))
+}
+
+/// Parse the LLM's JSON response into a QueryAnalysis.
+fn parse_analysis(response: &str) -> Option<QueryAnalysis> {
+    let trimmed = response.trim();
+    // Try to extract JSON from the response (LLM may wrap it in markdown)
+    let json_str = if trimmed.starts_with("```") {
+        // Strip markdown code fences
+        let without_start = trimmed
+            .trim_start_matches(|c| c == '`' || c == 'j' || c == 's' || c == 'o' || c == 'n');
+        let without_end = without_start.trim_end_matches(|c| c == '`');
+        without_end.trim()
+    } else {
+        trimmed
+    };
+
+    serde_json::from_str(json_str).ok()
+}
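The character-set `trim_start_matches` above works because a JSON object always begins with `{`, so stripping stops at the opening brace. An equivalent, more explicit approach is to locate the outermost braces directly; a sketch (not the project's code):

```rust
/// Extract the outermost JSON object from an LLM response, ignoring any
/// markdown fences or chatter around it. Sketch only; assumes the payload
/// is a single top-level object.
fn extract_json(response: &str) -> Option<&str> {
    let start = response.find('{')?;
    let end = response.rfind('}')?;
    (end >= start).then(|| &response[start..=end])
}
```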
+impl QueryAnalysis {
+    fn into_plan(self, query: &str, keywords: &[String]) -> QueryPlan {
+        QueryPlan {
+            original: query.to_string(),
+            intent: parse_intent(&self.intent),
+            keywords: keywords.to_vec(),
+            key_concepts: self.key_concepts,
+            strategy_hint: self.strategy_hint,
+            complexity: parse_complexity(&self.complexity),
+            rewritten: self.rewritten.into_iter().collect(),
+            sub_queries: self
+                .sub_queries
+                .into_iter()
+                .map(|sq| SubQuery {
+                    query: sq,
+                    intent: QueryIntent::Factual,
+                    target_docs: None,
+                })
+                .collect(),
+        }
+    }
+}
+
+fn parse_intent(s: &str) -> QueryIntent {
+    match s.to_lowercase().as_str() {
+        "analytical" | "analysis" | "compare" | "comparison" => QueryIntent::Analytical,
+        "navigational" | "navigation" | "find" | "locate" => QueryIntent::Navigational,
+        "summary" | "summarize" | "overview" => QueryIntent::Summary,
+        _ => QueryIntent::Factual,
+    }
+}
+
+fn parse_complexity(s: &str) -> Complexity {
+    match s.to_lowercase().as_str() {
+        "complex" | "high" => Complexity::Complex,
+        "moderate" | "medium" => Complexity::Moderate,
+        _ => Complexity::Simple,
+    }
+}
+
+/// Build the LLM prompt for query understanding.
+fn understand_prompt(query: &str, keywords: &[String]) -> (String, String) {
+    let system = r#"You are a query analysis engine.
+Analyze the user's query and respond with a JSON object containing:
+
+- "intent": one of "factual", "analytical", "navigational", "summary"
+- "key_concepts": array of the main concepts/entities in the query (distinct from keywords)
+- "strategy_hint": one of "focused" (single-topic), "exploratory" (broad scan), "comparative" (cross-reference), or "summary" (aggregate)
+- "complexity": one of "simple", "moderate", "complex"
+- "rewritten": optional rewritten version of the query for better retrieval (null if not needed)
+- "sub_queries": array of sub-query strings if the query can be decomposed (empty array if not)
+
+Respond with ONLY the JSON object, no additional text."#;
+
+    let user = format!(
+        "Query: {}\nExtracted keywords: [{}]",
+        query,
+        keywords.join(", ")
+    );
+
+    (system.to_string(), user)
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    #[test]
+    fn test_parse_intent() {
+        assert_eq!(parse_intent("factual"), QueryIntent::Factual);
+        assert_eq!(parse_intent("analytical"), QueryIntent::Analytical);
+        assert_eq!(parse_intent("analysis"), QueryIntent::Analytical);
+        assert_eq!(parse_intent("navigational"), QueryIntent::Navigational);
+        assert_eq!(parse_intent("summary"), QueryIntent::Summary);
+        assert_eq!(parse_intent("unknown"), QueryIntent::Factual);
+    }
+
+    #[test]
+    fn test_parse_complexity() {
+        assert_eq!(parse_complexity("simple"), Complexity::Simple);
+        assert_eq!(parse_complexity("moderate"), Complexity::Moderate);
+        assert_eq!(parse_complexity("complex"), Complexity::Complex);
+        assert_eq!(parse_complexity("high"), Complexity::Complex);
+        assert_eq!(parse_complexity("unknown"), Complexity::Simple);
+    }
+
+    #[test]
+    fn test_parse_analysis_json() {
+        let response = r#"{"intent":"factual","key_concepts":["revenue","Q3"],"strategy_hint":"focused","complexity":"simple","rewritten":null,"sub_queries":[]}"#;
+        let analysis = parse_analysis(response).unwrap();
+        assert_eq!(analysis.intent, "factual");
+        assert_eq!(analysis.key_concepts.len(), 2);
+        assert!(analysis.rewritten.is_none());
+    }
+
+    #[test]
+    fn test_parse_analysis_markdown_wrapped() {
+        let response = "```json\n{\"intent\":\"analytical\",\"key_concepts\":[\"risk\"],\"strategy_hint\":\"comparative\",\"complexity\":\"moderate\",\"rewritten\":\"compare risks\",\"sub_queries\":[]}\n```";
+        let analysis = parse_analysis(response).unwrap();
+        assert_eq!(analysis.intent, "analytical");
+    }
+
+    #[test]
+    fn test_parse_analysis_invalid() {
+        assert!(parse_analysis("not json").is_none());
+    }
+
+    #[test]
+    fn test_default_plan() {
+        let plan = QueryPlan::default_for("test query", vec!["test".to_string()]);
+        assert_eq!(plan.original, "test query");
+        assert_eq!(plan.intent, QueryIntent::Factual);
+        assert_eq!(plan.keywords.len(), 1);
+        assert!(plan.key_concepts.is_empty());
+        assert!(plan.sub_queries.is_empty());
+    }
+}
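`understand()` deliberately pushes failure handling to the caller. One possible caller-side policy, sketched under the same path assumptions as above (the retry wrapper and its name are hypothetical, not part of the crate):

```rust
use vectorless::llm::LlmClient;
use vectorless::query::{QueryPipeline, QueryPlan};

/// Hypothetical caller-side policy: retry understanding a few times,
/// then surface the last error. The engine itself never degrades silently.
async fn understand_with_retry(
    query: &str,
    llm: &LlmClient,
    attempts: usize,
) -> vectorless::error::Result<QueryPlan> {
    let mut last_err = None;
    for _ in 0..attempts {
        match QueryPipeline::understand(query, llm).await {
            Ok(plan) => return Ok(plan),
            Err(e) => last_err = Some(e),
        }
    }
    Err(last_err.expect("attempts must be > 0"))
}
```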
diff --git a/rust/src/rerank/dedup.rs b/rust/src/rerank/dedup.rs
new file mode 100644
index 00000000..9b30f75e
--- /dev/null
+++ b/rust/src/rerank/dedup.rs
@@ -0,0 +1,147 @@
+// Copyright (c) 2026 vectorless developers
+// SPDX-License-Identifier: Apache-2.0
+
+//! Evidence deduplication and quality filtering.
+
+use std::collections::HashSet;
+
+use crate::agent::Evidence;
+
+/// Minimum characters for an evidence item to be considered meaningful.
+const MIN_EVIDENCE_CHARS: usize = 50;
+
+/// Jaccard similarity threshold for content dedup.
+const SIMILARITY_THRESHOLD: f64 = 0.8;
+
+/// Filter low-quality and duplicate evidence.
+///
+/// Steps:
+/// 1. Drop evidence with no meaningful content (< MIN_EVIDENCE_CHARS)
+/// 2. Deduplicate by source overlap (same path in same doc)
+/// 3. Deduplicate by content similarity (Jaccard on token sets)
+pub fn dedup(evidence: &[Evidence]) -> Vec<Evidence> {
+    // Step 1: Quality filter
+    let quality: Vec<&Evidence> = evidence
+        .iter()
+        .filter(|e| e.content.len() >= MIN_EVIDENCE_CHARS)
+        .collect();
+
+    // Step 2: Deduplicate by source overlap
+    let mut seen_sources: HashSet<String> = HashSet::new();
+    let source_deduped: Vec<&Evidence> = quality
+        .into_iter()
+        .filter(|e| {
+            let key = format!("{}:{}", e.doc_name.as_deref().unwrap_or(""), e.source_path);
+            seen_sources.insert(key)
+        })
+        .collect();
+
+    // Step 3: Deduplicate by content similarity
+    let mut deduped: Vec<Evidence> = Vec::new();
+    for ev in source_deduped {
+        let tokens = tokenize(&ev.content);
+        let dominated = deduped
+            .iter()
+            .any(|existing| jaccard(&tokens, &tokenize(&existing.content)) >= SIMILARITY_THRESHOLD);
+        if !dominated {
+            deduped.push(ev.clone());
+        }
+    }
+
+    deduped
+}
+
+/// Tokenize text into a set of lowercase words.
+fn tokenize(text: &str) -> HashSet<String> {
+    text.to_lowercase()
+        .split_whitespace()
+        .map(|s| s.to_string())
+        .collect()
+}
+
+/// Compute Jaccard similarity between two sets.
+fn jaccard(a: &HashSet<String>, b: &HashSet<String>) -> f64 {
+    if a.is_empty() && b.is_empty() {
+        return 1.0;
+    }
+    let intersection = a.intersection(b).count() as f64;
+    let union = a.union(b).count() as f64;
+    intersection / union
+}
+
+#[cfg(test)]
+mod tests {
+    use super::*;
+
+    fn make_evidence(title: &str, content: &str) -> Evidence {
+        Evidence {
+            source_path: format!("root/{}", title),
+            node_title: title.to_string(),
+            content: content.to_string(),
+            doc_name: Some("doc".to_string()),
+        }
+    }
+
+    #[test]
+    fn test_quality_filter() {
+        let evidence = vec![
+            make_evidence("A", "short"),          // < 50 chars, filtered
+            make_evidence("B", &"x".repeat(60)),  // kept
+        ];
+        let result = dedup(&evidence);
+        assert_eq!(result.len(), 1);
+        assert_eq!(result[0].node_title, "B");
+    }
+
+    #[test]
+    fn test_source_dedup() {
+        let evidence = vec![
+            make_evidence(
+                "A",
+                &"content A with enough text to pass the quality filter threshold".to_string(),
+            ),
+            make_evidence(
+                "A",
+                &"different content A but same source path that is long enough".to_string(),
+            ),
+        ];
+        let result = dedup(&evidence);
+        assert_eq!(result.len(), 1);
+    }
+
+    #[test]
+    fn test_content_similarity_dedup() {
+        let base = "This is a piece of evidence about machine learning algorithms and their applications in real world scenarios".to_string();
+        let similar = "This is a piece of evidence about machine learning algorithms and their applications in real world".to_string();
+        let different =
+            "Completely unrelated content about quantum physics and particle accelerators at CERN"
+                .to_string();
+        let evidence = vec![
+            make_evidence("A", &base),
+            make_evidence("B", &similar),   // high similarity, should be deduped
+            make_evidence("C", &different), // different, kept
+        ];
+        let result = dedup(&evidence);
+        assert!(result.len() >= 2); // at least A and C
+    }
+
+    #[test]
+    fn test_empty_input() {
+        let result = dedup(&[]);
+        assert!(result.is_empty());
+    }
+
+    #[test]
+    fn test_jaccard_identical() {
+        let a = tokenize("hello world foo");
+        let b = tokenize("hello world foo");
+        assert!((jaccard(&a, &b) - 1.0).abs() < 0.001);
+    }
+
+    #[test]
+    fn test_jaccard_disjoint() {
+        let a = tokenize("aaa bbb");
+        let b = tokenize("ccc ddd");
+        assert!((jaccard(&a, &b)).abs() < 0.001);
+    }
+}
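A worked example of the similarity step, re-implemented standalone so the threshold decision is concrete:

```rust
use std::collections::HashSet;

fn main() {
    let a: HashSet<&str> = "the contract renews every year".split_whitespace().collect();
    let b: HashSet<&str> = "the contract renews each year".split_whitespace().collect();
    let inter = a.intersection(&b).count() as f64; // 4 shared tokens
    let union = a.union(&b).count() as f64;        // 6 distinct tokens
    let jaccard = inter / union;                   // 4/6 ≈ 0.67
    // 0.67 < 0.8 threshold, so both pieces of evidence would be kept.
    assert!(jaccard < 0.8);
}
```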
diff --git a/rust/src/rerank/mod.rs b/rust/src/rerank/mod.rs
new file mode 100644
index 00000000..bc179ec3
--- /dev/null
+++ b/rust/src/rerank/mod.rs
@@ -0,0 +1,104 @@
+// Copyright (c) 2026 vectorless developers
+// SPDX-License-Identifier: Apache-2.0
+
+//! Result reranking — dedup + format.
+//!
+//! Post-processing pipeline that runs after the agent collects raw evidence:
+//!
+//! ```text
+//! agent (collect evidence)
+//!   → rerank::process()
+//!     → dedup (quality filter + dedup)
+//!     → format as answer (no LLM — return original text)
+//!   → Output with final answer
+//! ```
+//!
+//! This is a document retrieval engine. The answer IS the evidence.
+//! No LLM synthesis, no rewriting. Find what you find, return what you find.
+
+pub mod dedup;
+pub mod types;
+
+use tracing::info;
+
+use crate::agent::Evidence;
+use crate::query::QueryIntent;
+use types::RerankOutput;
+
+/// Process agent output through the rerank pipeline.
+///
+/// Deduplicates evidence, then returns the original text as the answer.
+/// No LLM calls — the Worker already retrieved the exact passages.
+pub async fn process(
+    _query: &str,
+    evidence: &[Evidence],
+    _multi_doc: bool,
+    intent: QueryIntent,
+    confidence: f32,
+) -> crate::error::Result<RerankOutput> {
+    let deduped = dedup::dedup(evidence);
+    if deduped.is_empty() {
+        info!("No evidence after dedup");
+        return Ok(RerankOutput {
+            answer: String::new(),
+            llm_calls: 0,
+            confidence: 0.0,
+        });
+    }
+
+    info!(
+        evidence = deduped.len(),
+        intent = %intent,
+        "Evidence after dedup"
+    );
+
+    let answer = match intent {
+        QueryIntent::Navigational => format_locations(&deduped),
+        _ => format_evidence_as_answer(&deduped),
+    };
+
+    info!(
+        evidence = deduped.len(),
+        answer_len = answer.len(),
+        confidence,
+        "Rerank complete"
+    );
+
+    Ok(RerankOutput {
+        answer,
+        llm_calls: 0,
+        confidence,
+    })
+}
+
+/// Format evidence as a location listing for navigational queries.
+fn format_locations(evidence: &[Evidence]) -> String {
+    if evidence.is_empty() {
+        return "No matching locations found.".to_string();
+    }
+    let mut result = "Found at:\n".to_string();
+    for e in evidence {
+        let doc = e.doc_name.as_deref().unwrap_or("unknown");
+        result.push_str(&format!(
+            "- **{}** in {} at {}\n",
+            e.node_title, doc, e.source_path
+        ));
+    }
+    result
+}
+
+/// Format collected evidence directly as the answer.
+fn format_evidence_as_answer(evidence: &[Evidence]) -> String {
+    evidence
+        .iter()
+        .map(|e| {
+            let doc = e.doc_name.as_deref().unwrap_or("");
+            if doc.is_empty() {
+                format!("[{}]\n{}", e.node_title, e.content)
+            } else {
+                format!("[{} — {}]\n{}", e.node_title, doc, e.content)
+            }
+        })
+        .collect::<Vec<_>>()
+        .join("\n\n")
+}
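For non-navigational intents, `format_evidence_as_answer` simply concatenates titled passages separated by blank lines. Illustrative output shape for two evidence items (the passages below are invented placeholders, not real sample data):

```text
[Termination Clause — contract.pdf]
Either party may terminate this agreement with thirty (30) days written notice...

[Renewal Terms — contract.pdf]
The agreement renews automatically for successive one-year terms unless...
```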
diff --git a/rust/src/rerank/types.rs b/rust/src/rerank/types.rs
new file mode 100644
index 00000000..4b42f351
--- /dev/null
+++ b/rust/src/rerank/types.rs
@@ -0,0 +1,14 @@
+// Copyright (c) 2026 vectorless developers
+// SPDX-License-Identifier: Apache-2.0
+
+//! Rerank result types.
+
+/// Output from the rerank pipeline.
+pub struct RerankOutput {
+    /// Final answer text (formatted evidence; no LLM synthesis).
+    pub answer: String,
+    /// Number of LLM calls made by the rerank stage (currently always 0).
+    pub llm_calls: u32,
+    /// Confidence score (0.0–1.0), passed through from the agent's evaluate() result.
+    pub confidence: f32,
+}
diff --git a/rust/src/retrieval/cache/reasoning_cache.rs b/rust/src/retrieval/cache.rs
similarity index 100%
rename from rust/src/retrieval/cache/reasoning_cache.rs
rename to rust/src/retrieval/cache.rs
diff --git a/rust/src/retrieval/cache/hot_tracker.rs b/rust/src/retrieval/cache/hot_tracker.rs
deleted file mode 100644
index a284e065..00000000
--- a/rust/src/retrieval/cache/hot_tracker.rs
+++ /dev/null
@@ -1,185 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Hot node tracker for recording retrieval frequency.
-//!
-//! Thread-safe tracker that records which nodes are frequently retrieved.
-//! Nodes that exceed a configured hit-count threshold are marked as "hot",
-//! which can boost their scores in future retrieval operations.
-
-use std::collections::HashMap;
-use std::sync::RwLock;
-
-use crate::document::HotNodeEntry;
-use crate::document::NodeId;
-
-/// Thread-safe tracker for hot (frequently retrieved) nodes.
-pub struct HotNodeTracker {
-    inner: RwLock<HotNodeTrackerInner>,
-    hot_threshold: u32,
-}
-
-struct HotNodeTrackerInner {
-    hits: HashMap<NodeId, u32>,
-    scores: HashMap<NodeId, f32>,
-}
-
-impl HotNodeTracker {
-    /// Create a new tracker with the given hot threshold.
-    pub fn new(hot_threshold: u32) -> Self {
-        Self {
-            inner: RwLock::new(HotNodeTrackerInner {
-                hits: HashMap::new(),
-                scores: HashMap::new(),
-            }),
-            hot_threshold,
-        }
-    }
-
-    /// Record that a node was retrieved with a given score.
-    pub fn record_hit(&self, node_id: NodeId, score: f32) {
-        if let Ok(mut inner) = self.inner.write() {
-            let hits = *inner.hits.entry(node_id).or_insert(0) + 1;
-            inner.hits.insert(node_id, hits);
-
-            // Update running average score
-            let prev_avg = *inner.scores.entry(node_id).or_insert(0.0);
-            let new_avg = prev_avg + (score - prev_avg) / hits as f32;
-            inner.scores.insert(node_id, new_avg);
-        }
-    }
-
-    /// Record multiple hits at once.
-    pub fn record_hits(&self, hits: &[(NodeId, f32)]) {
-        for &(node_id, score) in hits {
-            self.record_hit(node_id, score);
-        }
-    }
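`record_hit` keeps a per-node average without storing score history, using the standard incremental-mean update `avg_n = avg_(n-1) + (x_n - avg_(n-1)) / n`. A quick standalone check that this matches the arithmetic mean:

```rust
fn main() {
    let scores = [0.8_f32, 0.9, 0.7];
    let mut avg = 0.0_f32;
    for (i, s) in scores.iter().enumerate() {
        // Same update rule as record_hit: fold each new score into the average.
        avg += (*s - avg) / (i as f32 + 1.0);
    }
    let mean: f32 = scores.iter().sum::<f32>() / scores.len() as f32; // 2.4 / 3 = 0.8
    assert!((avg - mean).abs() < 1e-5);
}
```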
-    /// Check if a node is considered "hot".
-    pub fn is_hot(&self, node_id: NodeId) -> bool {
-        self.inner
-            .read()
-            .map(|inner| inner.hits.get(&node_id).copied().unwrap_or(0) >= self.hot_threshold)
-            .unwrap_or(false)
-    }
-
-    /// Get the hit count for a node.
-    pub fn hit_count(&self, node_id: NodeId) -> u32 {
-        self.inner
-            .read()
-            .map(|inner| inner.hits.get(&node_id).copied().unwrap_or(0))
-            .unwrap_or(0)
-    }
-
-    /// Get all hot nodes with their stats.
-    pub fn hot_nodes(&self) -> Vec<(NodeId, u32, f32)> {
-        self.inner
-            .read()
-            .map(|inner| {
-                inner
-                    .hits
-                    .iter()
-                    .filter(|(_, count)| **count >= self.hot_threshold)
-                    .map(|(node_id, count)| {
-                        (
-                            *node_id,
-                            *count,
-                            inner.scores.get(node_id).copied().unwrap_or(0.0),
-                        )
-                    })
-                    .collect()
-            })
-            .unwrap_or_default()
-    }
-
-    /// Export hot node data into HotNodeEntry map for persistence.
-    pub fn export(&self) -> HashMap<NodeId, HotNodeEntry> {
-        self.inner
-            .read()
-            .map(|inner| {
-                inner
-                    .hits
-                    .iter()
-                    .map(|(node_id, hit_count)| {
-                        let avg_score = inner.scores.get(node_id).copied().unwrap_or(0.0);
-                        let is_hot = *hit_count >= self.hot_threshold;
-                        (
-                            *node_id,
-                            HotNodeEntry {
-                                hit_count: *hit_count,
-                                avg_score,
-                                is_hot,
-                            },
-                        )
-                    })
-                    .collect()
-            })
-            .unwrap_or_default()
-    }
-
-    /// Get the hot threshold.
-    pub fn hot_threshold(&self) -> u32 {
-        self.hot_threshold
-    }
-}
-
-#[cfg(test)]
-mod tests {
-    use super::*;
-
-    fn make_node_ids() -> (NodeId, NodeId, NodeId) {
-        let mut tree = crate::document::DocumentTree::new("Root", "content");
-        let a = tree.add_child(tree.root(), "A", "a");
-        let b = tree.add_child(tree.root(), "B", "b");
-        let c = tree.add_child(tree.root(), "C", "c");
-        (a, b, c)
-    }
-
-    #[test]
-    fn test_hot_tracker_basic() {
-        let tracker = HotNodeTracker::new(3);
-
-        let (node, _, _) = make_node_ids();
-        tracker.record_hit(node, 0.8);
-        tracker.record_hit(node, 0.9);
-        assert!(!tracker.is_hot(node));
-        assert_eq!(tracker.hit_count(node), 2);
-
-        tracker.record_hit(node, 0.7);
-        assert!(tracker.is_hot(node));
-        assert_eq!(tracker.hit_count(node), 3);
-    }
-
-    #[test]
-    fn test_hot_tracker_export() {
-        let tracker = HotNodeTracker::new(2);
-
-        let (node_a, node_b, _) = make_node_ids();
-
-        tracker.record_hit(node_a, 0.8);
-        tracker.record_hit(node_a, 0.9);
-        tracker.record_hit(node_b, 0.5);
-
-        let exported = tracker.export();
-        assert!(exported[&node_a].is_hot);
-        assert!(!exported[&node_b].is_hot);
-    }
-
-    #[test]
-    fn test_hot_tracker_multiple_hits() {
-        let tracker = HotNodeTracker::new(1);
-
-        let (node_a, node_b, node_c) = make_node_ids();
-
-        let hits = vec![(node_a, 0.9), (node_b, 0.8), (node_c, 0.7)];
-        tracker.record_hits(&hits);
-
-        assert!(tracker.is_hot(node_a));
-        assert!(tracker.is_hot(node_b));
-        assert!(tracker.is_hot(node_c));
-
-        let hot = tracker.hot_nodes();
-        assert_eq!(hot.len(), 3);
-    }
-}
diff --git a/rust/src/retrieval/cache/mod.rs b/rust/src/retrieval/cache/mod.rs
deleted file mode 100644
index 60eabb56..00000000
--- a/rust/src/retrieval/cache/mod.rs
+++ /dev/null
@@ -1,18 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Caching for retrieval operations.
-//!
-//! Three-tier reasoning cache:
-//! - **L1**: Exact query match — instant cache hit for repeated queries
-//! - **L2**: Path pattern cache — reuse navigation decisions across queries
-//! - **L3**: Strategy score cache — share keyword/BM25 scores across queries
-//!
-//! Legacy `PathCache` remains for backward compatibility.
-
-mod hot_tracker;
-mod path_cache;
-mod reasoning_cache;
-
-pub use hot_tracker::HotNodeTracker;
-pub use reasoning_cache::{CachedCandidate, ReasoningCache};
diff --git a/rust/src/retrieval/cache/path_cache.rs b/rust/src/retrieval/cache/path_cache.rs
deleted file mode 100644
index a394fa1f..00000000
--- a/rust/src/retrieval/cache/path_cache.rs
+++ /dev/null
@@ -1,277 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Path cache implementation for retrieval optimization.
-
-use std::collections::HashMap;
-use std::sync::{Arc, RwLock};
-use std::time::{Duration, Instant};
-
-use super::super::types::SearchPath;
-use crate::config::CacheConfig as AppConfig;
-use crate::document::NodeId;
-
-/// Cache entry for a search path.
-#[derive(Debug, Clone)]
-struct CacheEntry<T> {
-    /// The cached value.
-    value: T,
-    /// When this entry was created.
-    created_at: Instant,
-    /// Number of times this entry has been accessed.
-    access_count: usize,
-}
-
-impl<T> CacheEntry<T> {
-    fn new(value: T) -> Self {
-        Self {
-            value,
-            created_at: Instant::now(),
-            access_count: 0,
-        }
-    }
-
-    fn access(&mut self) -> &T {
-        self.access_count += 1;
-        &self.value
-    }
-}
-
-/// Configuration for the path cache.
-#[derive(Debug, Clone)]
-pub struct CacheConfig {
-    /// Maximum number of entries.
-    pub max_entries: usize,
-    /// Time-to-live for entries.
-    pub ttl: Duration,
-    /// Whether to use LRU eviction.
-    pub use_lru: bool,
-}
-
-impl Default for CacheConfig {
-    fn default() -> Self {
-        Self::from_app_config(&AppConfig::default())
-    }
-}
-
-impl CacheConfig {
-    /// Create from application config.
-    pub fn from_app_config(config: &AppConfig) -> Self {
-        Self {
-            max_entries: config.max_entries,
-            ttl: Duration::from_secs(config.ttl_secs),
-            use_lru: true,
-        }
-    }
-}
-
-/// Path cache for retrieval optimization.
-///
-/// Caches search paths and node scores to avoid redundant
-/// computation for similar queries.
-pub struct PathCache {
-    /// Cached paths by query hash.
-    paths: Arc<RwLock<HashMap<u64, CacheEntry<Vec<SearchPath>>>>>,
-    /// Cached node scores.
-    scores: Arc<RwLock<HashMap<(u64, NodeId), CacheEntry<f32>>>>,
-    /// Configuration.
-    config: CacheConfig,
-}
-
-impl PathCache {
-    /// Create a new path cache.
-    pub fn new() -> Self {
-        Self {
-            paths: Arc::new(RwLock::new(HashMap::new())),
-            scores: Arc::new(RwLock::new(HashMap::new())),
-            config: CacheConfig::default(),
-        }
-    }
-
-    /// Create a path cache with custom config.
-    pub fn with_config(config: CacheConfig) -> Self {
-        Self {
-            paths: Arc::new(RwLock::new(HashMap::new())),
-            scores: Arc::new(RwLock::new(HashMap::new())),
-            config,
-        }
-    }
-
-    /// Create from application config.
-    pub fn from_app_config(config: &AppConfig) -> Self {
-        Self::with_config(CacheConfig::from_app_config(config))
-    }
-
-    /// Hash a query string.
-    fn hash_query(query: &str) -> u64 {
-        use std::collections::hash_map::DefaultHasher;
-        use std::hash::{Hash, Hasher};
-
-        let mut hasher = DefaultHasher::new();
-        query.to_lowercase().hash(&mut hasher);
-        hasher.finish()
-    }
-
-    /// Get cached paths for a query.
-    pub fn get_paths(&self, query: &str) -> Option<Vec<SearchPath>> {
-        let hash = Self::hash_query(query);
-        let mut paths = self.paths.write().ok()?;
-
-        if let Some(entry) = paths.get_mut(&hash) {
-            // Check TTL
-            if entry.created_at.elapsed() > self.config.ttl {
-                paths.remove(&hash);
-                return None;
-            }
-            return Some(entry.access().clone());
-        }
-        None
-    }
-
-    /// Store paths for a query.
-    pub fn store_paths(&self, query: &str, paths: Vec<SearchPath>) {
-        let hash = Self::hash_query(query);
-
-        if let Ok(mut cache) = self.paths.write() {
-            // Evict if at capacity
-            if cache.len() >= self.config.max_entries {
-                self.evict(&mut cache);
-            }
-            cache.insert(hash, CacheEntry::new(paths));
-        }
-    }
-
-    /// Get cached score for a node.
-    pub fn get_score(&self, query: &str, node_id: NodeId) -> Option<f32> {
-        let hash = Self::hash_query(query);
-        let mut scores = self.scores.write().ok()?;
-
-        let key = (hash, node_id);
-        if let Some(entry) = scores.get_mut(&key) {
-            // Check TTL
-            if entry.created_at.elapsed() > self.config.ttl {
-                scores.remove(&key);
-                return None;
-            }
-            return Some(*entry.access());
-        }
-        None
-    }
-
-    /// Store a node score.
-    pub fn store_score(&self, query: &str, node_id: NodeId, score: f32) {
-        let hash = Self::hash_query(query);
-
-        if let Ok(mut cache) = self.scores.write() {
-            let key = (hash, node_id);
-
-            // Evict if at capacity
-            if cache.len() >= self.config.max_entries {
-                self.evict_scores(&mut cache);
-            }
-            cache.insert(key, CacheEntry::new(score));
-        }
-    }
-    /// Evict entries using LRU or random strategy.
-    fn evict<K, T>(&self, cache: &mut HashMap<K, CacheEntry<T>>)
-    where
-        K: std::hash::Hash + Eq + Clone,
-    {
-        if self.config.use_lru {
-            // Find entry with lowest access count
-            if let Some((min_key, _)) = cache.iter().min_by_key(|(_, e)| e.access_count) {
-                let key = min_key.clone();
-                cache.remove(&key);
-            }
-        } else {
-            // Remove oldest entry
-            if let Some((oldest_key, _)) = cache.iter().min_by_key(|(_, e)| e.created_at) {
-                let key = oldest_key.clone();
-                cache.remove(&key);
-            }
-        }
-    }
-
-    fn evict_scores<K, T>(&self, cache: &mut HashMap<K, CacheEntry<T>>)
-    where
-        K: std::hash::Hash + Eq + Clone,
-    {
-        self.evict(cache)
-    }
-
-    /// Clear all cached data.
-    pub fn clear(&self) {
-        if let Ok(mut paths) = self.paths.write() {
-            paths.clear();
-        }
-        if let Ok(mut scores) = self.scores.write() {
-            scores.clear();
-        }
-    }
-
-    /// Get cache statistics.
-    pub fn stats(&self) -> CacheStats {
-        let path_count = self.paths.read().map(|p| p.len()).unwrap_or(0);
-        let score_count = self.scores.read().map(|s| s.len()).unwrap_or(0);
-
-        CacheStats {
-            path_entries: path_count,
-            score_entries: score_count,
-            max_entries: self.config.max_entries,
-        }
-    }
-}
-
-impl Default for PathCache {
-    fn default() -> Self {
-        Self::new()
-    }
-}
-
-/// Cache statistics.
-#[derive(Debug, Clone)]
-pub struct CacheStats {
-    /// Number of cached paths.
-    pub path_entries: usize,
-    /// Number of cached scores.
-    pub score_entries: usize,
-    /// Maximum entries allowed.
-    pub max_entries: usize,
-}
-
-#[cfg(test)]
-mod tests {
-    use super::*;
-
-    #[test]
-    fn test_cache_paths() {
-        let cache = PathCache::new();
-
-        let arena = &mut indextree::Arena::new();
-        let path = SearchPath::from_node(NodeId(arena.new_node(0)), 0.8);
-        let paths = vec![path];
-
-        cache.store_paths("test query", paths.clone());
-
-        let cached = cache.get_paths("test query");
-        assert!(cached.is_some());
-        assert_eq!(cached.unwrap().len(), 1);
-    }
-
-    #[test]
-    fn test_cache_case_insensitive() {
-        let cache = PathCache::new();
-
-        let arena = &mut indextree::Arena::new();
-        let path = SearchPath::from_node(NodeId(arena.new_node(0)), 0.8);
-
-        let paths = vec![path];
-
-        cache.store_paths("Test Query", paths);
-
-        // Should find with different case
-        assert!(cache.get_paths("test query").is_some());
-        assert!(cache.get_paths("TEST QUERY").is_some());
-    }
-}
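Both `get_paths` and `get_score` used lazy expiration: an entry's TTL is only checked, and the entry dropped, at read time. The pattern in isolation (a self-contained sketch; the names are illustrative, not the removed API):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

struct Entry<T> {
    value: T,
    created_at: Instant,
}

/// Return the value for `key` unless its TTL has elapsed,
/// evicting the entry on the first expired read.
fn get<T>(map: &mut HashMap<u64, Entry<T>>, key: u64, ttl: Duration) -> Option<&T> {
    if map.get(&key).is_some_and(|e| e.created_at.elapsed() > ttl) {
        map.remove(&key); // expired: evict on access
        return None;
    }
    map.get(&key).map(|e| &e.value)
}

fn main() {
    let mut cache: HashMap<u64, Entry<u32>> = HashMap::new();
    cache.insert(1, Entry { value: 42, created_at: Instant::now() });
    assert_eq!(get(&mut cache, 1, Duration::from_secs(60)), Some(&42));
}
```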
diff --git a/rust/src/retrieval/complexity/detector.rs b/rust/src/retrieval/complexity/detector.rs
deleted file mode 100644
index b93f2795..00000000
--- a/rust/src/retrieval/complexity/detector.rs
+++ /dev/null
@@ -1,328 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Query complexity detector implementation.
-//!
-//! Uses Pilot's LLM client for accurate complexity classification when available.
-//! Falls back to heuristic rules (keyword + word count) when no LLM client.
-
-use std::collections::HashSet;
-
-use super::QueryComplexity;
-use crate::llm::memo::{MemoKey, MemoOpType, MemoStore, MemoValue};
-use crate::utils::fingerprint::Fingerprint;
-
-/// Query complexity detector.
-///
-/// Uses LLM for classification when available; falls back to heuristic rules.
-pub struct ComplexityDetector {
-    /// Optional LLM client for LLM-based detection.
-    llm_client: Option<crate::llm::LlmClient>,
-    /// Memo store for caching complexity detection results.
-    memo_store: Option<MemoStore>,
-}
-
-impl ComplexityDetector {
-    /// Create a new complexity detector (heuristic only).
-    pub fn new() -> Self {
-        Self {
-            llm_client: None,
-            memo_store: None,
-        }
-    }
-
-    /// Create with LLM client for accurate detection.
-    pub fn with_llm_client(client: crate::llm::LlmClient) -> Self {
-        Self {
-            llm_client: Some(client),
-            memo_store: None,
-        }
-    }
-
-    /// Add memo store for caching complexity detection results.
-    pub fn with_memo_store(mut self, store: MemoStore) -> Self {
-        self.memo_store = Some(store);
-        self
-    }
-
-    /// Detect the complexity of a query.
-    ///
-    /// Uses LLM when available; falls back to heuristic rules.
-    pub async fn detect(&self, query: &str) -> QueryComplexity {
-        // Check memo cache
-        if let Some(ref store) = self.memo_store {
-            let cache_key = Self::build_cache_key(query);
-            if let Some(cached) = store.get(&cache_key) {
-                if let Some(complexity) = Self::deserialize_complexity(&cached) {
-                    return complexity;
-                }
-            }
-        }
-
-        let result = if let Some(ref client) = self.llm_client {
-            if let Some(complexity) = crate::retrieval::pilot::detect_with_llm(client, query).await
-            {
-                complexity
-            } else {
-                tracing::warn!("LLM complexity detection failed, falling back to heuristic");
-                self.detect_heuristic(query)
-            }
-        } else {
-            self.detect_heuristic(query)
-        };
-
-        // Cache the result
-        if let Some(ref store) = self.memo_store {
-            let cache_key = Self::build_cache_key(query);
-            store.put_with_tokens(
-                cache_key,
-                MemoValue::Text(format!("{:?}", result)),
-                (query.len() / 4) as u64,
-            );
-        }
-
-        result
-    }
-
-    /// Build a cache key for complexity detection.
-    fn build_cache_key(query: &str) -> MemoKey {
-        let fp = Fingerprint::from_str(query);
-        MemoKey {
-            op_type: MemoOpType::ComplexityDetection,
-            input_fp: fp,
-            model_id: None,
-            version: 1,
-            context_fp: Fingerprint::zero(),
-        }
-    }
-
-    /// Deserialize a QueryComplexity from a MemoValue.
-    fn deserialize_complexity(value: &MemoValue) -> Option<QueryComplexity> {
-        match value {
-            MemoValue::Text(s) => match s.as_str() {
-                "Simple" => Some(QueryComplexity::Simple),
-                "Medium" => Some(QueryComplexity::Medium),
-                "Complex" => Some(QueryComplexity::Complex),
-                _ => None,
-            },
-            _ => None,
-        }
-    }
-
-    /// Heuristic-based fallback: keyword matching + word count.
-    fn detect_heuristic(&self, query: &str) -> QueryComplexity {
-        let query_lower = query.to_lowercase();
-        let word_count = estimate_word_count(query);
-
-        // Complex indicators (English + Chinese)
-        let complex_indicators = [
-            "compare",
-            "contrast",
-            "analyze",
-            "evaluate",
-            "synthesize",
-            "explain why",
-            "how does",
-            "relationship between",
-            "cause and effect",
-            "对比",
-            "分析",
-            "评估",
-            "综合",
-            "为什么",
-            "原因",
-            "关系",
-            "影响",
-            "区别",
-            "异同",
-        ];
-
-        for indicator in &complex_indicators {
-            if query_lower.contains(indicator) {
-                return QueryComplexity::Complex;
-            }
-        }
-
-        // Simple indicators
-        let simple_indicators = [
-            "what is",
-            "define",
-            "list",
-            "who",
-            "when",
-            "where",
-            "什么是",
-            "定义",
-            "列表",
-            "谁",
-            "何时",
-            "哪里",
-            "在哪",
-        ];
-
-        for indicator in &simple_indicators {
-            if query_lower.contains(indicator) && word_count <= 15 {
-                return QueryComplexity::Simple;
-            }
-        }
-
-        // Multiple questions (ASCII or full-width question marks)
-        let question_marks = query.matches('?').count() + query.matches('？').count();
-        if question_marks > 1 {
-            return QueryComplexity::Complex;
-        }
-
-        // Word count classification
-        if word_count <= 5 {
-            QueryComplexity::Simple
-        } else if word_count <= 15 {
-            QueryComplexity::Medium
-        } else {
-            QueryComplexity::Complex
-        }
-    }
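The word counter used by `detect_heuristic` (defined further down in this file) counts each CJK character as one word and each contiguous Latin run as one word. A minimal re-implementation of that counting rule, for illustration only:

```rust
/// Each CJK char counts as one word; each contiguous ASCII-alphanumeric
/// run counts as one word. (Simplified: checks only the main CJK block.)
fn count(text: &str) -> usize {
    let mut n = 0;
    let mut in_word = false;
    for ch in text.chars() {
        let cjk = ('\u{4E00}'..='\u{9FFF}').contains(&ch);
        if ch.is_ascii_alphanumeric() {
            in_word = true;
        } else {
            if in_word {
                n += 1;
                in_word = false;
            }
            if cjk {
                n += 1;
            }
        }
    }
    if in_word {
        n += 1;
    }
    n
}

fn main() {
    assert_eq!(count("what is rust"), 3);
    assert_eq!(count("Rust的所有权"), 5); // "Rust" + 4 CJK chars
}
```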
-    /// Get complexity score (0.0 - 1.0).
-    pub fn complexity_score(&self, complexity: QueryComplexity) -> f32 {
-        match complexity {
-            QueryComplexity::Simple => 0.2,
-            QueryComplexity::Medium => 0.5,
-            QueryComplexity::Complex => 0.8,
-        }
-    }
-
-    /// Analyze query features (heuristic only, no LLM call).
-    pub fn analyze(&self, query: &str) -> QueryAnalysis {
-        let words: Vec<&str> = query.split_whitespace().collect();
-        let unique_words: HashSet<&str> = words.iter().copied().collect();
-
-        QueryAnalysis {
-            word_count: words.len(),
-            unique_word_ratio: if words.is_empty() {
-                0.0
-            } else {
-                unique_words.len() as f32 / words.len() as f32
-            },
-            has_question_mark: query.contains('?') || query.contains('？'),
-            question_count: query.matches('?').count() + query.matches('？').count(),
-            complexity: self.detect_heuristic(query),
-            complexity_score: self.complexity_score(self.detect_heuristic(query)),
-        }
-    }
-}
-
-impl Default for ComplexityDetector {
-    fn default() -> Self {
-        Self::new()
-    }
-}
-
-/// Estimate word count, handling both CJK and Latin text.
-fn estimate_word_count(text: &str) -> usize {
-    let mut count = 0usize;
-    let mut in_latin_word = false;
-
-    for ch in text.chars() {
-        if ch.is_whitespace() {
-            if in_latin_word {
-                count += 1;
-                in_latin_word = false;
-            }
-        } else if ch.is_ascii_alphanumeric() {
-            in_latin_word = true;
-        } else if is_cjk_char(ch) {
-            if in_latin_word {
-                count += 1;
-                in_latin_word = false;
-            }
-            count += 1;
-        } else {
-            if in_latin_word {
-                count += 1;
-                in_latin_word = false;
-            }
-        }
-    }
-    if in_latin_word {
-        count += 1;
-    }
-    count
-}
-
-/// Check if a character is CJK (Chinese/Japanese/Korean).
-fn is_cjk_char(ch: char) -> bool {
-    let cp = ch as u32;
-    (0x4E00..=0x9FFF).contains(&cp)
-        || (0x3400..=0x4DBF).contains(&cp)
-        || (0x20000..=0x2A6DF).contains(&cp)
-        || (0x2A700..=0x2B73F).contains(&cp)
-        || (0xF900..=0xFAFF).contains(&cp)
-        || (0x2F800..=0x2FA1F).contains(&cp)
-        || (0x3000..=0x303F).contains(&cp)
-        || (0x3040..=0x309F).contains(&cp)
-        || (0x30A0..=0x30FF).contains(&cp)
-}
-
-/// Analysis result for a query.
-#[derive(Debug, Clone)]
-pub struct QueryAnalysis {
-    /// Total word count.
-    pub word_count: usize,
-    /// Ratio of unique words.
-    pub unique_word_ratio: f32,
-    /// Whether query contains question mark.
-    pub has_question_mark: bool,
-    /// Number of question marks.
-    pub question_count: usize,
-    /// Detected complexity level.
-    pub complexity: QueryComplexity,
-    /// Complexity score (0.0 - 1.0).
-    pub complexity_score: f32,
-}
-
-#[cfg(test)]
-mod tests {
-    use super::*;
-
-    #[test]
-    fn test_simple_queries() {
-        let detector = ComplexityDetector::new();
-
-        assert_eq!(
-            detector.detect_heuristic("What is Rust?"),
-            QueryComplexity::Simple
-        );
-        assert_eq!(
-            detector.detect_heuristic("Define async"),
-            QueryComplexity::Simple
-        );
-        assert_eq!(
-            detector.detect_heuristic("什么是向量检索"),
-            QueryComplexity::Simple
-        );
-    }
-
-    #[test]
-    fn test_complex_queries() {
-        let detector = ComplexityDetector::new();
-
-        assert_eq!(
-            detector.detect_heuristic(
-                "Compare and contrast the different approaches to async programming"
-            ),
-            QueryComplexity::Complex
-        );
-        assert_eq!(
-            detector.detect_heuristic("What is the relationship between ownership and borrowing?"),
-            QueryComplexity::Complex
-        );
-        assert_eq!(
-            detector.detect_heuristic("对比A和B的区别"),
-            QueryComplexity::Complex
-        );
-        assert_eq!(
-            detector.detect_heuristic("分析索引和检索的关系"),
-            QueryComplexity::Complex
-        );
-    }
-}
diff --git a/rust/src/retrieval/complexity/mod.rs b/rust/src/retrieval/complexity/mod.rs
deleted file mode 100644
index 628a1896..00000000
--- a/rust/src/retrieval/complexity/mod.rs
+++ /dev/null
@@ -1,11 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Query complexity detection.
-//!
-//! Determines the complexity level of a query for adaptive strategy selection.
-
-mod detector;
-
-pub use super::types::QueryComplexity;
-pub use detector::ComplexityDetector;
diff --git a/rust/src/retrieval/content/aggregator.rs b/rust/src/retrieval/content/aggregator.rs
deleted file mode 100644
index 7e1ace2d..00000000
--- a/rust/src/retrieval/content/aggregator.rs
+++ /dev/null
@@ -1,403 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Main content aggregator combining all components.
-//!
-//! This module provides the main [`ContentAggregator`] that orchestrates
-//! scoring, budget allocation, and structure building.
-
-use std::collections::HashMap;
-
-use tracing::{debug, info};
-
-use crate::document::{DocumentTree, NodeId};
-use crate::utils::estimate_tokens;
-
-use super::budget::{AllocationStrategy, BudgetAllocator};
-use super::builder::{ContentMetadata, StructureBuilder};
-use super::config::ContentAggregatorConfig;
-use super::scorer::{ContentChunk, RelevanceScorer, ScoringContext};
-
-/// Candidate node from retrieval.
-#[derive(Debug, Clone)]
-pub struct CandidateNode {
-    /// Node ID.
-    pub node_id: NodeId,
-    /// Relevance score from search.
-    pub score: f32,
-    /// Depth in tree.
-    pub depth: usize,
-}
-
-impl CandidateNode {
-    /// Create a new candidate.
-    #[must_use]
-    pub fn new(node_id: NodeId, score: f32, depth: usize) -> Self {
-        Self {
-            node_id,
-            score,
-            depth,
-        }
-    }
-}
-
-/// Result of content aggregation.
-#[derive(Debug, Clone)]
-pub struct AggregationResult {
-    /// Aggregated content string.
-    pub content: String,
-    /// Total tokens used.
-    pub tokens_used: usize,
-    /// Number of nodes included.
-    pub nodes_included: usize,
-    /// Average relevance score.
-    pub avg_score: f32,
-    /// Whether content was truncated due to budget.
-    pub was_truncated: bool,
-    /// Metadata about the aggregation.
-    pub metadata: ContentMetadata,
-}
-
-impl AggregationResult {
-    /// Check if result is empty.
-    #[must_use]
-    pub fn is_empty(&self) -> bool {
-        self.content.is_empty()
-    }
-}
-
-/// Content aggregator combining scoring, allocation, and building.
-#[derive(Debug)]
-pub struct ContentAggregator {
-    /// Configuration.
-    config: ContentAggregatorConfig,
-}
-
-impl ContentAggregator {
-    /// Create a new content aggregator.
-    #[must_use]
-    pub fn new(config: ContentAggregatorConfig) -> Self {
-        Self { config }
-    }
-
-    /// Create aggregator with default configuration.
-    #[must_use]
-    pub fn with_defaults() -> Self {
-        Self::new(ContentAggregatorConfig::default())
-    }
-
-    /// Aggregate content from candidate nodes.
-    ///
-    /// # Arguments
-    ///
-    /// * `candidates` - Candidate nodes from retrieval
-    /// * `tree` - Document tree
-    /// * `query` - Query string for relevance scoring
-    ///
-    /// # Returns
-    ///
-    /// Aggregated content within token budget.
-    #[must_use]
-    pub fn aggregate(
-        &self,
-        candidates: &[CandidateNode],
-        tree: &DocumentTree,
-        query: &str,
-    ) -> AggregationResult {
-        let _start = std::time::Instant::now();
-
-        // Step 1: Collect all content chunks from candidates and their descendants
-        let chunks = self.collect_chunks(candidates, tree);
-        debug!(
-            "Collected {} content chunks from {} candidates",
-            chunks.len(),
-            candidates.len()
-        );
-
-        if chunks.is_empty() {
-            return AggregationResult {
-                content: String::new(),
-                tokens_used: 0,
-                nodes_included: 0,
-                avg_score: 0.0,
-                was_truncated: false,
-                metadata: ContentMetadata::default(),
-            };
-        }
-
-        // Step 2: Score all chunks for relevance
-        let scorer = RelevanceScorer::new(query, self.config.scoring_strategy);
-        let scoring_ctx = self.build_scoring_context(&chunks);
-        let scored = scorer.score_chunks(&chunks, &scoring_ctx);
-
-        // Filter by minimum score
-        let filtered: Vec<_> = scored
-            .into_iter()
-            .filter(|r| r.score >= self.config.min_relevance_score)
-            .collect();
-
-        debug!(
-            "Scored {} chunks, {} passed threshold {:.2}",
-            chunks.len(),
-            filtered.len(),
-            self.config.min_relevance_score
-        );
-
-        if filtered.is_empty() {
-            // Fall back to returning best candidate content
-            return self.fallback_result(candidates, tree);
-        }
-
-        // Step 3: Allocate token budget
-        let max_depth = filtered.iter().map(|r| r.chunk.depth).max().unwrap_or(0);
-        let strategy = self.get_allocation_strategy();
-        let allocator = BudgetAllocator::new(self.config.token_budget).with_strategy(strategy);
-
-        let allocation = allocator.allocate(filtered, max_depth);
-
-        info!(
-            "Allocated {} tokens to {} items (strategy: {:?})",
-            allocation.tokens_used,
-            allocation.selected.len(),
-            self.config.scoring_strategy
-        );
-
-        // Step 4: Build structured output
-        let builder =
-            StructureBuilder::from_config(self.config.output_format, self.config.include_scores);
-
-        let structured = builder.build(allocation.selected.clone(), tree);
-
-        // Build result
-        let was_truncated = allocation.selected.iter().any(|s| s.is_truncated());
-
-        AggregationResult {
-            content: structured.content,
-            tokens_used: allocation.tokens_used,
-            nodes_included: allocation.selected.len(),
-            avg_score: allocation.stats.avg_score,
-            was_truncated,
-            metadata: structured.metadata,
-        }
-    }
-
-    /// Collect content chunks from candidates and descendants.
-    fn collect_chunks(
-        &self,
-        candidates: &[CandidateNode],
-        tree: &DocumentTree,
-    ) -> Vec<ContentChunk> {
-        let mut chunks = Vec::new();
-        let mut visited: HashMap<NodeId, bool> = HashMap::new();
-
-        for candidate in candidates {
-            // Add candidate's own content
-            if let Some(node) = tree.get(candidate.node_id) {
-                if !node.content.is_empty() {
-                    chunks.push(ContentChunk::new(
-                        candidate.node_id,
-                        node.title.clone(),
-                        node.content.clone(),
-                        candidate.depth,
-                    ));
-                    visited.insert(candidate.node_id, true);
-                }
-
-                // Collect leaf descendants
-                self.collect_descendant_chunks(
-                    candidate.node_id,
-                    tree,
-                    candidate.depth,
-                    &mut chunks,
-                    &mut visited,
-                );
-            }
-        }
-
-        chunks
-    }
-
-    /// Collect chunks from descendant nodes.
-    fn collect_descendant_chunks(
-        &self,
-        parent_id: NodeId,
-        tree: &DocumentTree,
-        parent_depth: usize,
-        chunks: &mut Vec<ContentChunk>,
-        visited: &mut HashMap<NodeId, bool>,
-    ) {
-        let children = tree.children(parent_id);
-
-        for child_id in children {
-            if visited.contains_key(&child_id) {
-                continue;
-            }
-            visited.insert(child_id, true);
-
-            if let Some(node) = tree.get(child_id) {
-                let child_depth = parent_depth + 1;
-
-                if tree.is_leaf(child_id) {
-                    // Leaf node - add its content
-                    if !node.content.is_empty() {
-                        chunks.push(ContentChunk::new(
-                            child_id,
-                            node.title.clone(),
-                            node.content.clone(),
-                            child_depth,
-                        ));
-                    }
-                } else {
-                    // Non-leaf - recurse
-                    self.collect_descendant_chunks(child_id, tree, child_depth, chunks, visited);
-                }
-            }
-        }
-    }
-
-    /// Build scoring context from chunks.
-    fn build_scoring_context(&self, chunks: &[ContentChunk]) -> ScoringContext {
-        let total_len: usize = chunks.iter().map(|c| c.content.len()).sum();
-        let avg_len = if chunks.is_empty() {
-            100.0
-        } else {
-            total_len as f32 / chunks.len() as f32
-        };
-
-        // Build document frequency map
-        let mut doc_freq: HashMap<String, usize> = HashMap::new();
-        for chunk in chunks {
-            let mut seen_in_doc = std::collections::HashSet::new();
-            for word in chunk.content.to_lowercase().split_whitespace() {
-                if !seen_in_doc.contains(word) {
-                    *doc_freq.entry(word.to_string()).or_insert(0) += 1;
-                    seen_in_doc.insert(word);
-                }
-            }
-        }
-
-        ScoringContext {
-            avg_doc_len: avg_len,
-            doc_count: chunks.len(),
-            doc_freq,
-            parent_score: None,
-        }
-    }
-
-    /// Get allocation strategy from config.
-    fn get_allocation_strategy(&self) -> AllocationStrategy {
-        AllocationStrategy::Hierarchical {
-            min_per_level: self.config.hierarchical_min_per_level,
-        }
-    }
-
-    /// Fallback result when no content passes threshold.
-    fn fallback_result(
-        &self,
-        candidates: &[CandidateNode],
-        tree: &DocumentTree,
-    ) -> AggregationResult {
-        // Return best candidate's content
-        if let Some(best) = candidates.first() {
-            if let Some(node) = tree.get(best.node_id) {
-                let content = if !node.content.is_empty() {
-                    node.content.clone()
-                } else if !node.summary.is_empty() {
-                    node.summary.clone()
-                } else {
-                    String::new()
-                };
-
-                let tokens = estimate_tokens(&content);
-
-                return AggregationResult {
-                    content: format!("## {}\n\n{}", node.title, content),
-                    tokens_used: tokens,
-                    nodes_included: 1,
-                    avg_score: best.score,
-                    was_truncated: false,
-                    metadata: ContentMetadata {
-                        total_tokens: tokens,
-                        node_count: 1,
-                        avg_score: best.score,
-                        max_depth: best.depth,
-                    },
-                };
-            }
-        }
-
-        AggregationResult {
-            content: String::new(),
-            tokens_used: 0,
-            nodes_included: 0,
-            avg_score: 0.0,
-            was_truncated: false,
-            metadata: ContentMetadata::default(),
-        }
-    }
-}
-
-impl Default for ContentAggregator {
-    fn default() -> Self {
-        Self::with_defaults()
-    }
-}
-
-#[cfg(test)]
-mod tests {
-    use super::*;
-    use indextree::Arena;
-
-    fn make_test_node_id() -> NodeId {
-        let mut arena = Arena::new();
-        let node = crate::document::TreeNode {
-            title: "Test".to_string(),
-            structure: String::new(),
-            content: String::new(),
-            summary: String::new(),
-            depth: 0,
-            start_index: 0,
-            end_index: 0,
-            start_page: None,
-            end_page: None,
-            node_id: None,
-            physical_index: None,
-            token_count: None,
-            references: Vec::new(),
-        };
-        NodeId(arena.new_node(node))
-    }
-
-    #[test]
-    fn test_aggregator_creation() {
-        let config = ContentAggregatorConfig::default();
-        let aggregator = ContentAggregator::new(config);
-        assert_eq!(aggregator.config.token_budget, 4000);
-    }
-
-    #[test]
-    fn test_aggregator_with_defaults() {
-        let aggregator = ContentAggregator::with_defaults();
-        assert_eq!(aggregator.config.token_budget, 4000);
-    }
-
-    #[test]
-    fn test_empty_candidates() {
-        let aggregator = ContentAggregator::with_defaults();
-        let tree = DocumentTree::new("Test", "");
-
-        let result = aggregator.aggregate(&[], &tree, "test query");
-
-        assert!(result.is_empty());
-        assert_eq!(result.tokens_used, 0);
-    }
-
-    #[test]
-    fn test_candidate_node_creation() {
-        let node_id = make_test_node_id();
-        let candidate = CandidateNode::new(node_id, 0.8, 2);
-
-        assert_eq!(candidate.score, 0.8);
-        assert_eq!(candidate.depth, 2);
-    }
-}
diff --git a/rust/src/retrieval/content/budget.rs b/rust/src/retrieval/content/budget.rs
deleted file mode 100644
index 82831603..00000000
--- a/rust/src/retrieval/content/budget.rs
+++ /dev/null
@@ -1,524 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Token budget allocation for content aggregation.
-//!
-//! This module provides budget-aware content selection that optimizes
-//! token usage while maximizing relevance.
-
-use std::collections::HashMap;
-
-use crate::document::NodeId;
-use crate::utils::estimate_tokens;
-
-use super::scorer::ContentRelevance;
-
-/// Allocation strategy for distributing token budget.
-#[derive(Debug, Clone, Copy, PartialEq)]
-pub enum AllocationStrategy {
-    /// Select highest-scoring content first until budget exhausted.
-    Greedy,
-    /// Distribute budget proportionally to relevance scores.
-    Proportional,
-    /// Ensure each depth level has minimum representation.
-    Hierarchical {
-        /// Minimum fraction of budget per level (0.0 - 1.0)
-        min_per_level: f32,
-    },
-}
-
-impl Default for AllocationStrategy {
-    fn default() -> Self {
-        Self::Hierarchical { min_per_level: 0.1 }
-    }
-}
-
-/// Information about content truncation.
-#[derive(Debug, Clone)]
-pub struct TruncationInfo {
-    /// Original content length in characters.
-    pub original_len: usize,
-    /// Truncated content length in characters.
-    pub truncated_len: usize,
-    /// Reason for truncation.
-    pub reason: TruncationReason,
-}
-
-/// Reason for content truncation.
-#[derive(Debug, Clone, Copy, PartialEq, Eq)]
-pub enum TruncationReason {
-    /// Content exceeded remaining budget.
-    BudgetExceeded,
-    /// Content tail had low relevance.
-    LowRelevanceTail,
-}
-
-/// A selected content item after budget allocation.
-#[derive(Debug, Clone)]
-pub struct SelectedContent {
-    /// Node ID.
-    pub node_id: NodeId,
-    /// Node title.
-    pub title: String,
-    /// Selected content text.
-    pub content: String,
-    /// Token count of selected content.
-    pub tokens: usize,
-    /// Relevance score.
-    pub score: f32,
-    /// Depth in tree.
-    pub depth: usize,
-    /// Truncation info if content was truncated.
-    pub truncation: Option<TruncationInfo>,
-}
-
-impl SelectedContent {
-    /// Check if content was truncated.
-    #[must_use]
-    pub fn is_truncated(&self) -> bool {
-        self.truncation.is_some()
-    }
-}
-
-/// Statistics about the allocation process.
-#[derive(Debug, Clone, Default)]
-pub struct AllocationStats {
-    /// Total content items considered.
-    pub items_considered: usize,
-    /// Items selected for output.
-    pub items_selected: usize,
-    /// Items truncated.
-    pub items_truncated: usize,
-    /// Items filtered (below threshold).
-    pub items_filtered: usize,
-    /// Average score of selected items.
-    pub avg_score: f32,
-}
-
-/// Result of budget allocation.
-#[derive(Debug, Clone)]
-pub struct AllocationResult {
-    /// Selected content items.
-    pub selected: Vec<SelectedContent>,
-    /// Total tokens used.
-    pub tokens_used: usize,
-    /// Remaining token budget.
-    pub remaining_budget: usize,
-    /// Allocation statistics.
-    pub stats: AllocationStats,
-}
-
-impl AllocationResult {
-    /// Check if any content was selected.
-    #[must_use]
-    pub fn is_empty(&self) -> bool {
-        self.selected.is_empty()
-    }
-
-    /// Get number of selected items.
-    #[must_use]
-    pub fn len(&self) -> usize {
-        self.selected.len()
-    }
-}
-
-/// Token budget allocator.
-#[derive(Debug)]
-pub struct BudgetAllocator {
-    /// Total token budget.
-    total_budget: usize,
-    /// Minimum reserve budget (for fallback).
-    min_reserve: usize,
-    /// Allocation strategy.
-    strategy: AllocationStrategy,
-    /// Minimum relevance score threshold.
-    min_score: f32,
-}
-
-impl BudgetAllocator {
-    /// Create a new allocator with the specified budget.
-    #[must_use]
-    pub fn new(budget: usize) -> Self {
-        Self {
-            total_budget: budget,
-            min_reserve: budget / 10,
-            strategy: AllocationStrategy::default(),
-            min_score: 0.0,
-        }
-    }
-
-    /// Set the allocation strategy.
-    #[must_use]
-    pub fn with_strategy(mut self, strategy: AllocationStrategy) -> Self {
-        self.strategy = strategy;
-        self
-    }
-
-    /// Set minimum relevance score threshold.
-    #[must_use]
-    pub fn with_min_score(mut self, min_score: f32) -> Self {
-        self.min_score = min_score;
-        self
-    }
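As it existed before this removal, the allocator was configured through its builder methods; a fragment using only the API shown above (not runnable on its own, since the module is deleted by this change). Note that with `min_per_level: 0.1` and a 4000-token budget, each depth level was guaranteed up to 400 tokens before the fill pass:

```rust
// Fragment: configure a 4000-token allocator with the hierarchical strategy.
let allocator = BudgetAllocator::new(4000)
    .with_strategy(AllocationStrategy::Hierarchical { min_per_level: 0.1 })
    .with_min_score(0.2); // drop chunks scoring below 0.2
```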
-    /// Allocate budget to scored content.
-    #[must_use]
-    pub fn allocate(
-        &self,
-        scored_content: Vec<ContentRelevance>,
-        max_depth: usize,
-    ) -> AllocationResult {
-        // Filter by minimum score
-        let filtered: Vec<_> = scored_content
-            .into_iter()
-            .filter(|c| c.score >= self.min_score)
-            .collect();
-
-        let stats = AllocationStats {
-            items_considered: filtered.len(),
-            ..Default::default()
-        };
-
-        match &self.strategy {
-            AllocationStrategy::Greedy => self.allocate_greedy(filtered, stats),
-            AllocationStrategy::Proportional => self.allocate_proportional(filtered, stats),
-            AllocationStrategy::Hierarchical { min_per_level } => {
-                self.allocate_hierarchical(filtered, max_depth, *min_per_level, stats)
-            }
-        }
-    }
-
-    /// Greedy allocation: select highest-scoring content first.
-    fn allocate_greedy(
-        &self,
-        mut content: Vec<ContentRelevance>,
-        mut stats: AllocationStats,
-    ) -> AllocationResult {
-        // Sort by score descending
-        content.sort_by(|a, b| {
-            b.score
-                .partial_cmp(&a.score)
-                .unwrap_or(std::cmp::Ordering::Equal)
-        });
-
-        let mut selected = Vec::new();
-        let mut tokens_used = 0;
-
-        for relevance in content {
-            let tokens = relevance.chunk.token_count();
-
-            if tokens_used + tokens <= self.total_budget {
-                selected.push(SelectedContent {
-                    node_id: relevance.chunk.node_id,
-                    title: relevance.chunk.title,
-                    content: relevance.chunk.content,
-                    tokens,
-                    score: relevance.score,
-                    depth: relevance.chunk.depth,
-                    truncation: None,
-                });
-                tokens_used += tokens;
-            } else {
-                // Try to fit truncated content
-                let remaining = self.total_budget - tokens_used;
-                if remaining >= 50 {
-                    // Minimum useful content
-                    if let Some(truncated) =
-                        self.truncate_content(&relevance.chunk.content, remaining)
-                    {
-                        let truncated_tokens = estimate_tokens(&truncated);
-                        selected.push(SelectedContent {
-                            node_id: relevance.chunk.node_id,
-                            title: relevance.chunk.title,
-                            content: truncated,
-                            tokens: truncated_tokens,
-                            score: relevance.score,
-                            depth: relevance.chunk.depth,
-                            truncation: Some(TruncationInfo {
-                                original_len: relevance.chunk.content.len(),
-                                truncated_len: remaining,
-                                reason: TruncationReason::BudgetExceeded,
-                            }),
-                        });
-                        tokens_used += truncated_tokens;
-                        stats.items_truncated += 1;
-                    }
-                }
-                break;
-            }
-        }
-
-        stats.items_selected = selected.len();
-        stats.avg_score = if selected.is_empty() {
-            0.0
-        } else {
-            selected.iter().map(|s| s.score).sum::<f32>() / selected.len() as f32
-        };
-
-        AllocationResult {
-            selected,
-            tokens_used,
-            remaining_budget: self.total_budget - tokens_used,
-            stats,
-        }
-    }
-
-    /// Proportional allocation: distribute budget by score ratio.
-    fn allocate_proportional(
-        &self,
-        content: Vec<ContentRelevance>,
-        mut stats: AllocationStats,
-    ) -> AllocationResult {
-        let total_score: f32 = content.iter().map(|c| c.score).sum();
-        if total_score == 0.0 {
-            return AllocationResult {
-                selected: Vec::new(),
-                tokens_used: 0,
-                remaining_budget: self.total_budget,
-                stats,
-            };
-        }
-
-        let mut selected = Vec::new();
-        let mut tokens_used = 0;
-
-        for relevance in content {
-            // Calculate proportional budget
-            let proportion = relevance.score / total_score;
-            let allocated_budget = ((self.total_budget as f32 * proportion) as usize).max(50);
-
-            let content_tokens = relevance.chunk.token_count();
-
-            if content_tokens <= allocated_budget {
-                // Full content fits
-                if tokens_used + content_tokens <= self.total_budget {
-                    selected.push(SelectedContent {
-                        node_id: relevance.chunk.node_id,
-                        title: relevance.chunk.title,
-                        content: relevance.chunk.content,
-                        tokens: content_tokens,
-                        score: relevance.score,
-                        depth: relevance.chunk.depth,
-                        truncation: None,
-                    });
-                    tokens_used += content_tokens;
-                }
-            } else {
-                // Truncate to allocated budget
-                let remaining = self.total_budget - tokens_used;
-                if remaining >= 50 && remaining >= allocated_budget / 2 {
-                    if let Some(truncated) = self
-                        .truncate_content(&relevance.chunk.content, remaining.min(allocated_budget))
-                    {
-                        let truncated_tokens = estimate_tokens(&truncated);
-                        let truncated_len = truncated.len();
-                        selected.push(SelectedContent {
-                            node_id: relevance.chunk.node_id,
-                            title: relevance.chunk.title,
-                            content: truncated,
-                            tokens: truncated_tokens,
-                            score: relevance.score,
-                            depth: relevance.chunk.depth,
-                            truncation: Some(TruncationInfo {
-                                original_len: relevance.chunk.content.len(),
-                                truncated_len,
-                                reason: TruncationReason::BudgetExceeded,
-                            }),
-                        });
-                        tokens_used += truncated_tokens;
-                        stats.items_truncated += 1;
-                    }
-                }
-            }
-        }
-
-        stats.items_selected = selected.len();
-        stats.avg_score = if selected.is_empty() {
-            0.0
-        } else {
-            selected.iter().map(|s| s.score).sum::<f32>() / selected.len() as f32
-        };
-
-        AllocationResult {
-            selected,
-            tokens_used,
-            remaining_budget: self.total_budget - tokens_used,
-            stats,
-        }
-    }
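The proportional split is just `budget * (score / total_score)` with a 50-token floor. A standalone arithmetic check (scores chosen to be exactly representable in f32 so the assertion is stable):

```rust
fn main() {
    let budget = 4000_f32;
    let scores = [0.5_f32, 0.25, 0.25];
    let total: f32 = scores.iter().sum(); // 1.0
    let alloc: Vec<usize> = scores
        .iter()
        .map(|s| ((budget * (s / total)) as usize).max(50)) // 50-token floor
        .collect();
    assert_eq!(alloc, vec![2000, 1000, 1000]); // proportional to relevance
}
```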
-    fn allocate_hierarchical(
-        &self,
-        content: Vec<ContentRelevance>,
-        max_depth: usize,
-        min_per_level: f32,
-        mut stats: AllocationStats,
-    ) -> AllocationResult {
-        // Group content by depth
-        let mut by_depth: HashMap<usize, Vec<ContentRelevance>> = HashMap::new();
-        for c in content {
-            by_depth.entry(c.chunk.depth).or_default().push(c);
-        }
-
-        // Sort each level by score
-        for (_depth, items) in by_depth.iter_mut() {
-            items.sort_by(|a, b| {
-                b.score
-                    .partial_cmp(&a.score)
-                    .unwrap_or(std::cmp::Ordering::Equal)
-            });
-        }
-
-        let per_level_budget = (self.total_budget as f32 * min_per_level) as usize;
-        let mut selected = Vec::new();
-        let mut tokens_used = 0;
-
-        // Process from shallow to deep
-        for depth in 0..=max_depth {
-            if tokens_used >= self.total_budget {
-                break;
-            }
-
-            if let Some(level_content) = by_depth.get(&depth) {
-                let mut level_used = 0;
-
-                for relevance in level_content {
-                    if tokens_used >= self.total_budget {
-                        break;
-                    }
-
-                    let tokens = relevance.chunk.token_count();
-
-                    // Check if we should include this content
-                    let can_include_full = tokens_used + tokens <= self.total_budget;
-                    let level_budget_ok = level_used < per_level_budget || depth == 0;
-
-                    if can_include_full && level_budget_ok {
-                        selected.push(SelectedContent {
-                            node_id: relevance.chunk.node_id,
-                            title: relevance.chunk.title.clone(),
-                            content: relevance.chunk.content.clone(),
-                            tokens,
-                            score: relevance.score,
-                            depth,
-                            truncation: None,
-                        });
-                        tokens_used += tokens;
-                        level_used += tokens;
-                    } else if level_used < per_level_budget {
-                        // Try truncated version
-                        let remaining =
-                            (self.total_budget - tokens_used).min(per_level_budget - level_used);
-                        if remaining >= 50 {
-                            if let Some(truncated) =
-                                self.truncate_content(&relevance.chunk.content, remaining)
-                            {
-                                let truncated_tokens = estimate_tokens(&truncated);
-                                let truncated_len = truncated.len();
-                                selected.push(SelectedContent {
-                                    node_id: relevance.chunk.node_id,
-                                    title: relevance.chunk.title.clone(),
-                                    content: truncated,
-                                    tokens: truncated_tokens,
-                                    score: relevance.score,
-                                    depth,
-                                    truncation: Some(TruncationInfo {
-                                        original_len: relevance.chunk.content.len(),
-                                        truncated_len,
-                                        reason: TruncationReason::BudgetExceeded,
-                                    }),
-                                });
-                                tokens_used += truncated_tokens;
-                                level_used += truncated_tokens;
-                                stats.items_truncated += 1;
-                            }
-                        }
-                    }
-                }
-            }
-        }
-
-        // Second pass: fill remaining budget with highest-scoring content
-        if tokens_used < self.total_budget.saturating_sub(self.min_reserve) {
-            let mut all_remaining: Vec<_> = by_depth
-                .values()
-                .flat_map(|v| v.iter())
-                .filter(|c| !selected.iter().any(|s| s.node_id == c.chunk.node_id))
-                .collect();
-
-            all_remaining.sort_by(|a, b| {
-                b.score
-                    .partial_cmp(&a.score)
-                    .unwrap_or(std::cmp::Ordering::Equal)
-            });
-
-            for relevance in all_remaining {
-                if tokens_used >= self.total_budget.saturating_sub(self.min_reserve) {
-                    break;
-                }
-
-                let tokens = relevance.chunk.token_count();
-                if tokens_used + tokens <= self.total_budget {
-                    selected.push(SelectedContent {
-                        node_id: relevance.chunk.node_id,
-                        title: relevance.chunk.title.clone(),
-                        content: relevance.chunk.content.clone(),
-                        tokens,
-                        score: relevance.score,
-                        depth: relevance.chunk.depth,
-                        truncation: None,
-                    });
-                    tokens_used += tokens;
-                }
-            }
-        }
-
-        stats.items_selected = selected.len();
-        stats.avg_score = if selected.is_empty() {
-            0.0
-        } else {
-            selected.iter().map(|s| s.score).sum::<f32>() / selected.len() as f32
-        };
-
-        AllocationResult {
-            selected,
-            tokens_used,
-            remaining_budget: self.total_budget - tokens_used,
-            stats,
-        }
-    }
-
-    /// Truncate content to fit within token budget.
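// [editor note] Behavior notes for truncate_content below: budgets under 20
// tokens return None, the token-to-char conversion uses this module's rough
// "1 token ~ 4 chars" heuristic, and the cut prefers the last sentence
// boundary, then the last space, before hard-truncating with "...". The
// char-boundary backup in the body below keeps the byte slice from
// panicking on multi-byte UTF-8.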
-    fn truncate_content(&self, content: &str, max_tokens: usize) -> Option<String> {
-        if max_tokens < 20 {
-            return None;
-        }
-
-        // Approximate: 1 token ≈ 4 characters (for English)
-        let max_chars = max_tokens * 4;
-
-        if content.len() <= max_chars {
-            return Some(content.to_string());
-        }
-
-        // Clamp to a char boundary so the byte slice below cannot panic
-        let mut cut = max_chars;
-        while !content.is_char_boundary(cut) {
-            cut -= 1;
-        }
-
-        // Try to break at sentence boundary
-        let truncated = &content[..cut];
-
-        // Find last sentence boundary
-        if let Some(pos) = truncated.rfind(|c| c == '.' || c == '!' || c == '?') {
-            Some(format!("{}...", &truncated[..=pos]))
-        } else if let Some(pos) = truncated.rfind(' ') {
-            // Fall back to word boundary
-            Some(format!("{}...", &truncated[..pos]))
-        } else {
-            // Hard truncate
-            Some(format!("{}...", truncated))
-        }
-    }
-}
-
-impl Default for BudgetAllocator {
-    fn default() -> Self {
-        Self::new(4000)
-    }
-}
diff --git a/rust/src/retrieval/content/builder.rs b/rust/src/retrieval/content/builder.rs
deleted file mode 100644
index bf652c76..00000000
--- a/rust/src/retrieval/content/builder.rs
+++ /dev/null
@@ -1,514 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Structure builder for aggregated content.
-//!
-//! This module transforms selected content into structured output formats.
-
-use serde::{Deserialize, Serialize};
-
-use crate::document::DocumentTree;
-
-use super::budget::SelectedContent;
-use super::config::OutputFormatConfig;
-
-/// Output format for structured content.
-#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
-pub enum OutputFormat {
-    /// Markdown format with headers.
-    #[default]
-    Markdown,
-    /// JSON format.
-    Json,
-    /// Tree format.
-    Tree,
-    /// Flat text format.
-    Flat,
-}
-
-impl From<OutputFormatConfig> for OutputFormat {
-    fn from(config: OutputFormatConfig) -> Self {
-        match config {
-            OutputFormatConfig::Markdown => Self::Markdown,
-            OutputFormatConfig::Json => Self::Json,
-            OutputFormatConfig::Tree => Self::Tree,
-            OutputFormatConfig::Flat => Self::Flat,
-        }
-    }
-}
-
-/// Tree node in the content structure.
-#[derive(Debug, Clone, Serialize, Deserialize)]
-pub struct ContentTreeNode {
-    /// Node title.
-    pub title: String,
-    /// Node content (if any).
-    pub content: Option<String>,
-    /// Relevance score.
-    pub score: f32,
-    /// Child nodes.
-    pub children: Vec<ContentTreeNode>,
-}
-
-impl ContentTreeNode {
-    /// Create a new tree node.
-    #[must_use]
-    pub fn new(title: String) -> Self {
-        Self {
-            title,
-            content: None,
-            score: 0.0,
-            children: Vec::new(),
-        }
-    }
-
-    /// Add content to this node.
-    #[must_use]
-    pub fn with_content(mut self, content: String, score: f32) -> Self {
-        self.content = Some(content);
-        self.score = score;
-        self
-    }
-
-    /// Add a child node.
-    pub fn add_child(&mut self, child: ContentTreeNode) {
-        self.children.push(child);
-    }
-}
-
-/// Content tree structure.
-#[derive(Debug, Clone, Serialize, Deserialize)]
-pub struct ContentTree {
-    /// Root node.
-    pub root: ContentTreeNode,
-    /// Total nodes in tree.
-    pub total_nodes: usize,
-}
-
-impl ContentTree {
-    /// Create a new content tree.
-    #[must_use]
-    pub fn new(root: ContentTreeNode) -> Self {
-        Self {
-            total_nodes: 1,
-            root,
-        }
-    }
-}
-
-/// Metadata about aggregated content.
-#[derive(Debug, Clone, Default, Serialize, Deserialize)]
-pub struct ContentMetadata {
-    /// Total tokens in content.
-    pub total_tokens: usize,
-    /// Number of nodes included.
-    pub node_count: usize,
-    /// Average relevance score.
-    pub avg_score: f32,
-    /// Maximum depth included.
-    pub max_depth: usize,
-}
-
-/// Structured content result.
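// [editor note] Small sketch of the ContentTreeNode builder API defined
// above, mirroring the unit tests at the end of this file:
//
//     let mut root = ContentTreeNode::new("Intro".to_string())
//         .with_content("Overview text".to_string(), 0.9);
//     root.add_child(ContentTreeNode::new("Details".to_string()));
//     let tree = ContentTree::new(root); // total_nodes starts at 1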
-#[derive(Debug, Clone)]
-pub struct StructuredContent {
-    /// Formatted content string.
-    pub content: String,
-    /// Optional tree structure.
-    pub structure: Option<ContentTree>,
-    /// Content metadata.
-    pub metadata: ContentMetadata,
-}
-
-impl StructuredContent {
-    /// Check if content is empty.
-    #[must_use]
-    pub fn is_empty(&self) -> bool {
-        self.content.is_empty()
-    }
-
-    /// Get content length in characters.
-    #[must_use]
-    pub fn len(&self) -> usize {
-        self.content.len()
-    }
-}
-
-/// Builder for creating structured content output.
-#[derive(Debug)]
-pub struct StructureBuilder {
-    /// Output format.
-    format: OutputFormat,
-    /// Include metadata in output.
-    include_metadata: bool,
-    /// Include scores in output.
-    include_scores: bool,
-}
-
-impl StructureBuilder {
-    /// Create a new structure builder.
-    #[must_use]
-    pub fn new(format: OutputFormat) -> Self {
-        Self {
-            format,
-            include_metadata: false,
-            include_scores: false,
-        }
-    }
-
-    /// Create builder from config.
-    #[must_use]
-    pub fn from_config(format: OutputFormatConfig, include_scores: bool) -> Self {
-        Self {
-            format: OutputFormat::from(format),
-            include_metadata: false,
-            include_scores,
-        }
-    }
-
-    /// Enable metadata in output.
-    #[must_use]
-    pub fn with_metadata(mut self) -> Self {
-        self.include_metadata = true;
-        self
-    }
-
-    /// Enable scores in output.
-    #[must_use]
-    pub fn with_scores(mut self) -> Self {
-        self.include_scores = true;
-        self
-    }
-
-    /// Build structured content from selected items.
-    #[must_use]
-    pub fn build(&self, selected: Vec<SelectedContent>, tree: &DocumentTree) -> StructuredContent {
-        if selected.is_empty() {
-            return StructuredContent {
-                content: String::new(),
-                structure: None,
-                metadata: ContentMetadata::default(),
-            };
-        }
-
-        // Calculate metadata
-        let total_tokens: usize = selected.iter().map(|s| s.tokens).sum();
-        let avg_score = selected.iter().map(|s| s.score).sum::<f32>() / selected.len() as f32;
-        let max_depth = selected.iter().map(|s| s.depth).max().unwrap_or(0);
-
-        let metadata = ContentMetadata {
-            total_tokens,
-            node_count: selected.len(),
-            avg_score,
-            max_depth,
-        };
-
-        // Build based on format
-        let (content, structure) = match &self.format {
-            OutputFormat::Markdown => self.build_markdown(selected, tree),
-            OutputFormat::Json => self.build_json(selected, tree),
-            OutputFormat::Tree => self.build_tree_format(selected, tree),
-            OutputFormat::Flat => self.build_flat(selected),
-        };
-
-        StructuredContent {
-            content,
-            structure,
-            metadata,
-        }
-    }
-
-    /// Build Markdown format output.
-    fn build_markdown(
-        &self,
-        selected: Vec<SelectedContent>,
-        _tree: &DocumentTree,
-    ) -> (String, Option<ContentTree>) {
-        let mut sections = Vec::new();
-        let mut current_depth = 0;
-
-        // Sort by depth to maintain hierarchy
-        let mut sorted = selected;
-        sorted.sort_by(|a, b| a.depth.cmp(&b.depth));
-
-        for content in sorted {
-            // Adjust heading level based on depth
-            let heading_level = (content.depth + 1).min(6);
-            let heading = "#".repeat(heading_level);
-
-            let mut section = format!("{} {}", heading, content.title);
-
-            if self.include_scores {
-                section.push_str(&format!(" *(score: {:.2})*", content.score));
-            }
-
-            section.push_str("\n\n");
-            section.push_str(&content.content);
-
-            if content.is_truncated() {
-                section.push_str("\n\n*[content truncated]*");
-            }
-
-            sections.push(section);
-            current_depth = current_depth.max(content.depth);
-        }
-
-        (sections.join("\n\n---\n\n"), None)
-    }
-
-    /// Build JSON format output.
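// [editor note] Shape of the JSON output produced by build_json below
// (field names from its JsonSection struct; values illustrative):
//
//     {
//       "sections": [
//         { "title": "Intro", "content": "...", "score": 0.9,
//           "depth": 0, "truncated": false }
//       ]
//     }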
-    fn build_json(
-        &self,
-        selected: Vec<SelectedContent>,
-        _tree: &DocumentTree,
-    ) -> (String, Option<ContentTree>) {
-        #[derive(Serialize)]
-        struct JsonOutput<'a> {
-            sections: Vec<JsonSection<'a>>,
-        }
-
-        #[derive(Serialize)]
-        struct JsonSection<'a> {
-            title: &'a str,
-            content: &'a str,
-            score: f32,
-            depth: usize,
-            truncated: bool,
-        }
-
-        let sections: Vec<_> = selected
-            .iter()
-            .map(|s| JsonSection {
-                title: &s.title,
-                content: &s.content,
-                score: s.score,
-                depth: s.depth,
-                truncated: s.is_truncated(),
-            })
-            .collect();
-
-        let output = JsonOutput { sections };
-        let content = serde_json::to_string_pretty(&output).unwrap_or_default();
-
-        (content, None)
-    }
-
-    /// Build tree format output.
-    fn build_tree_format(
-        &self,
-        selected: Vec<SelectedContent>,
-        tree: &DocumentTree,
-    ) -> (String, Option<ContentTree>) {
-        // Build tree structure
-        let mut root = ContentTreeNode::new("Content".to_string());
-        let mut node_count = 0;
-
-        // Group by parent
-        use std::collections::HashMap;
-        let mut by_parent: HashMap<Option<crate::document::NodeId>, Vec<&SelectedContent>> =
-            HashMap::new();
-
-        for content in &selected {
-            let parent = tree.get(content.node_id).and_then(|_| {
-                // Find parent in selected
-                selected
-                    .iter()
-                    .find(|s| s.depth < content.depth)
-                    .map(|s| Some(s.node_id))
-                    .unwrap_or(None)
-            });
-            by_parent.entry(parent).or_default().push(content);
-        }
-
-        // Build tree recursively
-        fn build_node(
-            content: &SelectedContent,
-            all_by_parent: &std::collections::HashMap<
-                Option<crate::document::NodeId>,
-                Vec<&SelectedContent>,
-            >,
-        ) -> ContentTreeNode {
-            let mut node = ContentTreeNode::new(content.title.clone())
-                .with_content(content.content.clone(), content.score);
-
-            if let Some(children) = all_by_parent.get(&Some(content.node_id)) {
-                for child in children {
-                    node.add_child(build_node(child, all_by_parent));
-                }
-            }
-
-            node
-        }
-
-        // Add top-level items
-        if let Some(top_level) = by_parent.get(&None) {
-            for content in top_level {
-                let node = build_node(content, &by_parent);
-                node_count += count_nodes(&node);
-                root.add_child(node);
-            }
-        }
-
-        // Build string representation
-        let content = render_tree(&root, 0);
-
-        let tree_structure = ContentTree {
-            root,
-            total_nodes: node_count,
-        };
-
-        (content, Some(tree_structure))
-    }
-
-    /// Build flat format output.
-    fn build_flat(&self, selected: Vec<SelectedContent>) -> (String, Option<ContentTree>) {
-        let parts: Vec<_> = selected
-            .iter()
-            .map(|c| {
-                let mut part = format!("[{}] {}", c.title, c.content);
-                if self.include_scores {
-                    part = format!("[{}] (score: {:.2}) {}", c.title, c.score, c.content);
-                }
-                part
-            })
-            .collect();
-
-        (parts.join("\n\n"), None)
-    }
-}
-
-impl Default for StructureBuilder {
-    fn default() -> Self {
-        Self::new(OutputFormat::default())
-    }
-}
-
-/// Count nodes in a tree.
-fn count_nodes(node: &ContentTreeNode) -> usize {
-    1 + node.children.iter().map(count_nodes).sum::<usize>()
-}
-
-/// Render tree as string.
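// [editor note] Two caveats observed in the code above and below: the
// parent lookup in build_tree_format takes the first selected item with a
// smaller depth, not the true tree parent, so sibling grouping is only
// approximate; and render_tree below slices content previews at a fixed 100
// bytes, which can panic on multi-byte UTF-8 and deserves the same
// char-boundary guard used in truncate_content. Rendered shape, titles
// illustrative:
//
//     ├─ Intro (score: 0.90)
//     │ Overview text
//      ├─ Details (score: 0.80)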
-fn render_tree(node: &ContentTreeNode, depth: usize) -> String { - let indent = " ".repeat(depth); - let mut result = format!("{}├─ {} (score: {:.2})\n", indent, node.title, node.score); - - if let Some(ref content) = node.content { - let preview = if content.len() > 100 { - format!("{}...", &content[..100]) - } else { - content.clone() - }; - result.push_str(&format!("{}│ {}\n", indent, preview.replace('\n', " "))); - } - - for child in &node.children { - result.push_str(&render_tree(child, depth + 1)); - } - - result -} - -#[cfg(test)] -mod tests { - use super::*; - use crate::document::NodeId; - use indextree::Arena; - - fn make_test_node_id() -> NodeId { - let mut arena = Arena::new(); - let node = crate::document::TreeNode { - title: "Test".to_string(), - structure: String::new(), - content: String::new(), - summary: String::new(), - depth: 0, - start_index: 0, - end_index: 0, - start_page: None, - end_page: None, - node_id: None, - physical_index: None, - token_count: None, - references: Vec::new(), - }; - NodeId(arena.new_node(node)) - } - - fn make_selected(title: &str, content: &str, score: f32, depth: usize) -> SelectedContent { - SelectedContent { - node_id: make_test_node_id(), - title: title.to_string(), - content: content.to_string(), - tokens: 50, - score, - depth, - truncation: None, - } - } - - #[test] - fn test_markdown_builder() { - let builder = StructureBuilder::new(OutputFormat::Markdown); - let selected = vec![ - make_selected("Section 1", "Content 1", 0.9, 0), - make_selected("Section 2", "Content 2", 0.8, 1), - ]; - - // Create a minimal tree for testing - let tree = DocumentTree::new("Test", ""); - - let result = builder.build(selected, &tree); - - assert!(!result.is_empty()); - assert!(result.content.contains("Section 1")); - assert!(result.content.contains("Section 2")); - assert!(result.content.contains("# Section 1")); - assert!(result.content.contains("## Section 2")); - } - - #[test] - fn test_flat_builder() { - let builder = StructureBuilder::new(OutputFormat::Flat); - let selected = vec![make_selected("Section 1", "Content 1", 0.9, 0)]; - - let tree = DocumentTree::new("Test", ""); - let result = builder.build(selected, &tree); - - assert!(result.content.contains("[Section 1]")); - assert!(result.content.contains("Content 1")); - } - - #[test] - fn test_builder_with_scores() { - let builder = StructureBuilder::new(OutputFormat::Markdown).with_scores(); - - let selected = vec![make_selected("Section 1", "Content 1", 0.95, 0)]; - - let tree = DocumentTree::new("Test", ""); - let result = builder.build(selected, &tree); - - assert!(result.content.contains("score: 0.95")); - } - - #[test] - fn test_empty_selected() { - let builder = StructureBuilder::new(OutputFormat::Markdown); - let tree = DocumentTree::new("Test", ""); - let result = builder.build(Vec::new(), &tree); - - assert!(result.is_empty()); - assert_eq!(result.metadata.node_count, 0); - } - - #[test] - fn test_content_tree_node() { - let mut root = - ContentTreeNode::new("Root".to_string()).with_content("Root content".to_string(), 0.9); - - let child = ContentTreeNode::new("Child".to_string()) - .with_content("Child content".to_string(), 0.8); - - root.add_child(child); - - assert_eq!(root.children.len(), 1); - assert_eq!(root.score, 0.9); - } -} diff --git a/rust/src/retrieval/content/config.rs b/rust/src/retrieval/content/config.rs deleted file mode 100644 index aa40bc8a..00000000 --- a/rust/src/retrieval/content/config.rs +++ /dev/null @@ -1,156 +0,0 @@ -// Copyright (c) 2026 vectorless 
developers -// SPDX-License-Identifier: Apache-2.0 - -//! Configuration types for content aggregation. - -use serde::{Deserialize, Serialize}; - -/// Configuration for content aggregation. -#[derive(Debug, Clone, Serialize, Deserialize)] -pub struct ContentAggregatorConfig { - /// Maximum tokens to return in aggregated content. - pub token_budget: usize, - - /// Minimum relevance score threshold (0.0 - 1.0). - /// Content below this threshold will be filtered out. - pub min_relevance_score: f32, - - /// Scoring strategy for relevance computation. - pub scoring_strategy: ScoringStrategyConfig, - - /// Output format for aggregated content. - pub output_format: OutputFormatConfig, - - /// Include relevance scores in output metadata. - pub include_scores: bool, - - /// Minimum budget allocation per depth level (for hierarchical strategy). - /// Value between 0.0 and 1.0, representing fraction of total budget. - pub hierarchical_min_per_level: f32, - - /// Enable content deduplication. - pub deduplicate: bool, - - /// Similarity threshold for deduplication (0.0 - 1.0). - pub dedup_threshold: f32, -} - -impl Default for ContentAggregatorConfig { - fn default() -> Self { - Self { - token_budget: 4000, - min_relevance_score: 0.2, - scoring_strategy: ScoringStrategyConfig::KeywordWithBM25, - output_format: OutputFormatConfig::Markdown, - include_scores: false, - hierarchical_min_per_level: 0.1, - deduplicate: true, - dedup_threshold: 0.9, - } - } -} - -impl ContentAggregatorConfig { - /// Create a new config with default values. - #[must_use] - pub fn new() -> Self { - Self::default() - } - - /// Set the token budget. - #[must_use] - pub fn with_token_budget(mut self, budget: usize) -> Self { - self.token_budget = budget; - self - } - - /// Set the minimum relevance score. - #[must_use] - pub fn with_min_relevance(mut self, score: f32) -> Self { - self.min_relevance_score = score.clamp(0.0, 1.0); - self - } - - /// Set the scoring strategy. - #[must_use] - pub fn with_scoring_strategy(mut self, strategy: ScoringStrategyConfig) -> Self { - self.scoring_strategy = strategy; - self - } - - /// Set the output format. - #[must_use] - pub fn with_output_format(mut self, format: OutputFormatConfig) -> Self { - self.output_format = format; - self - } -} - -/// Scoring strategy configuration. -#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)] -#[serde(rename_all = "snake_case")] -pub enum ScoringStrategyConfig { - /// Fast keyword matching only. - KeywordOnly, - /// Keyword matching with BM25 scoring. - KeywordWithBM25, - /// Hybrid: keyword + LLM reranking for top candidates. - Hybrid, -} - -impl Default for ScoringStrategyConfig { - fn default() -> Self { - Self::KeywordWithBM25 - } -} - -/// Output format configuration. -#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)] -#[serde(rename_all = "snake_case")] -pub enum OutputFormatConfig { - /// Markdown format with headers. - Markdown, - /// JSON format. - Json, - /// Tree format. - Tree, - /// Flat text format. 
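// [editor note] The builder methods above compose as in the tests below;
// with_min_relevance clamps its argument into [0.0, 1.0]:
//
//     let config = ContentAggregatorConfig::new()
//         .with_token_budget(2000)
//         .with_min_relevance(0.5)
//         .with_output_format(OutputFormatConfig::Json);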
- Flat, -} - -impl Default for OutputFormatConfig { - fn default() -> Self { - Self::Markdown - } -} - -#[cfg(test)] -mod tests { - use super::*; - - #[test] - fn test_default_config() { - let config = ContentAggregatorConfig::default(); - assert_eq!(config.token_budget, 4000); - assert_eq!(config.min_relevance_score, 0.2); - } - - #[test] - fn test_config_builder() { - let config = ContentAggregatorConfig::new() - .with_token_budget(2000) - .with_min_relevance(0.5); - - assert_eq!(config.token_budget, 2000); - assert_eq!(config.min_relevance_score, 0.5); - } - - #[test] - fn test_min_relevance_clamped() { - let config = ContentAggregatorConfig::new().with_min_relevance(1.5); - assert_eq!(config.min_relevance_score, 1.0); - - let config = ContentAggregatorConfig::new().with_min_relevance(-0.5); - assert_eq!(config.min_relevance_score, 0.0); - } -} diff --git a/rust/src/retrieval/content/mod.rs b/rust/src/retrieval/content/mod.rs deleted file mode 100644 index f339f182..00000000 --- a/rust/src/retrieval/content/mod.rs +++ /dev/null @@ -1,41 +0,0 @@ -// Copyright (c) 2026 vectorless developers -// SPDX-License-Identifier: Apache-2.0 - -//! Content aggregation module for retrieval results. -//! -//! This module provides precision-focused, budget-aware content aggregation -//! that transforms candidate nodes into structured, relevant content. -//! -//! # Architecture -//! -//! ```text -//! ┌─────────────────────────────────────────────────────────────┐ -//! │ Content Aggregator │ -//! ├─────────────────────────────────────────────────────────────┤ -//! │ RelevanceScorer → BudgetAllocator → StructureBuilder │ -//! └─────────────────────────────────────────────────────────────┘ -//! ``` -//! -//! # Example -//! -//! ```rust,ignore -//! use vectorless::retrieval::content::{ContentAggregator, ContentAggregatorConfig}; -//! -//! let config = ContentAggregatorConfig { -//! token_budget: 4000, -//! min_relevance_score: 0.3, -//! ..Default::default() -//! }; -//! -//! let aggregator = ContentAggregator::new(config); -//! let result = aggregator.aggregate(&candidates, &tree, &query); -//! ``` - -mod aggregator; -mod budget; -mod builder; -mod config; -mod scorer; - -pub use aggregator::{CandidateNode, ContentAggregator}; -pub use config::{ContentAggregatorConfig, OutputFormatConfig, ScoringStrategyConfig}; diff --git a/rust/src/retrieval/content/scorer.rs b/rust/src/retrieval/content/scorer.rs deleted file mode 100644 index 2f0e66e3..00000000 --- a/rust/src/retrieval/content/scorer.rs +++ /dev/null @@ -1,379 +0,0 @@ -// Copyright (c) 2026 vectorless developers -// SPDX-License-Identifier: Apache-2.0 - -//! Relevance scoring for content chunks. -//! -//! This module provides fine-grained relevance scoring for content, -//! combining keyword matching, BM25, and optional LLM reranking. - -use std::collections::HashMap; - -use crate::document::NodeId; -use crate::retrieval::scoring::{Bm25Params, STOPWORDS, extract_keywords}; -use crate::utils::estimate_tokens; - -use super::config::ScoringStrategyConfig; - -/// Content chunk for scoring. -#[derive(Debug, Clone)] -pub struct ContentChunk { - /// Node ID this chunk belongs to. - pub node_id: NodeId, - /// Title of the node. - pub title: String, - /// Content text. - pub content: String, - /// Depth in tree (0 = root level). - pub depth: usize, -} - -impl ContentChunk { - /// Create a new content chunk. 
- #[must_use] - pub fn new(node_id: NodeId, title: String, content: String, depth: usize) -> Self { - Self { - node_id, - title, - content, - depth, - } - } - - /// Estimate token count for this chunk. - #[must_use] - pub fn token_count(&self) -> usize { - estimate_tokens(&self.content) - } -} - -/// Relevance score components. -#[derive(Debug, Clone, Default)] -pub struct ScoreComponents { - /// Keyword match score (0.0 - 1.0). - pub keyword_score: f32, - /// BM25 score (normalized). - pub bm25_score: f32, - /// Depth penalty (deeper = lower score). - pub depth_penalty: f32, - /// Path bonus from parent relevance. - pub path_bonus: f32, - /// Information density score. - pub density_score: f32, -} - -impl ScoreComponents { - /// Compute final weighted score. - #[must_use] - pub fn final_score(&self) -> f32 { - // Weight formula from design doc - let score = self.keyword_score * 0.35 - + self.bm25_score * 0.25 - + self.depth_penalty * 0.15 - + self.path_bonus * 0.10 - + self.density_score * 0.15; - - score.clamp(0.0, 1.0) - } -} - -/// Relevance score result for a content chunk. -#[derive(Debug, Clone)] -pub struct ContentRelevance { - /// The content chunk that was scored. - pub chunk: ContentChunk, - /// Final relevance score (0.0 - 1.0). - pub score: f32, - /// Score breakdown by component. - pub components: ScoreComponents, -} - -impl ContentRelevance { - /// Create a new relevance result. - #[must_use] - pub fn new(chunk: ContentChunk, score: f32, components: ScoreComponents) -> Self { - Self { - chunk, - score, - components, - } - } -} - -/// Context for scoring operations. -#[derive(Debug, Clone)] -pub struct ScoringContext { - /// Average document length for BM25. - pub avg_doc_len: f32, - /// Total document count for IDF. - pub doc_count: usize, - /// Document frequency for terms. - pub doc_freq: HashMap, - /// Parent node score (for path bonus). - pub parent_score: Option, -} - -impl Default for ScoringContext { - fn default() -> Self { - Self { - avg_doc_len: 100.0, - doc_count: 1, - doc_freq: HashMap::new(), - parent_score: None, - } - } -} - -/// Relevance scorer for content chunks. -#[derive(Debug)] -pub struct RelevanceScorer { - /// Query keywords extracted from the query. - query_keywords: Vec, - /// Scoring strategy to use. - strategy: ScoringStrategyConfig, - /// BM25 parameters. - params: Bm25Params, -} - -impl RelevanceScorer { - /// Create a new scorer with keywords. - #[must_use] - pub fn new(query: &str, strategy: ScoringStrategyConfig) -> Self { - let query_keywords = extract_keywords(query); - Self { - query_keywords, - strategy, - params: Bm25Params::default(), - } - } - - /// Create a scorer with pre-extracted keywords. - #[must_use] - pub fn with_keywords(keywords: Vec, strategy: ScoringStrategyConfig) -> Self { - Self { - query_keywords: keywords, - strategy, - params: Bm25Params::default(), - } - } - - /// Score a content chunk. - #[must_use] - pub fn score_chunk(&self, chunk: &ContentChunk, ctx: &ScoringContext) -> ContentRelevance { - let mut components = ScoreComponents::default(); - - // 1. Keyword score (content + title + summary combined) - components.keyword_score = - self.compute_keyword_score(&format!("{} {}", chunk.title, chunk.content)); - - // 2. BM25 score (if enabled) - if matches!( - self.strategy, - ScoringStrategyConfig::KeywordWithBM25 | ScoringStrategyConfig::Hybrid - ) { - components.bm25_score = self.compute_bm25_score(&chunk.content, ctx); - } - - // 3. 
Depth penalty (10% per level) - components.depth_penalty = 0.9_f32.powi(chunk.depth as i32); - - // 4. Path bonus - components.path_bonus = ctx.parent_score.map(|s| s * 0.2).unwrap_or(0.0); - - // 5. Density score - components.density_score = compute_density(&chunk.content); - - let final_score = components.final_score(); - - ContentRelevance::new(chunk.clone(), final_score, components) - } - - /// Score multiple chunks. - pub fn score_chunks<'a>( - &self, - chunks: &'a [ContentChunk], - ctx: &ScoringContext, - ) -> Vec { - chunks - .iter() - .map(|chunk| self.score_chunk(chunk, ctx)) - .collect() - } - - /// Compute keyword overlap score. - fn compute_keyword_score(&self, content: &str) -> f32 { - if self.query_keywords.is_empty() { - return 0.5; // Neutral score if no keywords - } - - let content_lower = content.to_lowercase(); - let content_words: std::collections::HashSet<&str> = - content_lower.split_whitespace().collect(); - - let matches = self - .query_keywords - .iter() - .filter(|kw| { - let kw_lower = kw.to_lowercase(); - content_words.iter().any(|&w| w.contains(&kw_lower)) - || content_lower.contains(&kw_lower) - }) - .count(); - - matches as f32 / self.query_keywords.len() as f32 - } - - /// Compute BM25 score. - fn compute_bm25_score(&self, content: &str, ctx: &ScoringContext) -> f32 { - if self.query_keywords.is_empty() { - return 0.0; - } - - let doc_len = content.split_whitespace().count() as f32; - let mut score = 0.0; - - for term in &self.query_keywords { - let term_lower = term.to_lowercase(); - let tf = content.to_lowercase().matches(&term_lower).count() as f32; - - if tf == 0.0 { - continue; - } - - // IDF calculation using BM25L variant - let df = ctx.doc_freq.get(&term_lower).copied().unwrap_or(1) as f32; - let idf = ((ctx.doc_count as f32 - df + 0.5) / (df + 0.5) + 1.0).ln(); - - // BM25 formula - let k1 = self.params.k1; - let b = self.params.b; - let numerator = tf * (k1 + 1.0); - let denominator = tf + k1 * (1.0 - b + b * doc_len / ctx.avg_doc_len); - - score += idf * numerator / denominator; - } - - // Normalize to [0, 1] - let max_possible_score = self.query_keywords.len() as f32 * 5.0; // Rough upper bound - (score / max_possible_score).clamp(0.0, 1.0) - } - - /// Get the query keywords. - #[must_use] - pub fn keywords(&self) -> &[String] { - &self.query_keywords - } -} - -/// Compute information density of content. 
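// [editor note] One-term walk-through of compute_bm25_score above, assuming
// the common defaults k1 = 1.2 and b = 0.75 (Bm25Params' actual defaults
// live in retrieval::scoring and are not shown in this diff). With tf = 2,
// doc_len = avg_doc_len = 100, doc_count = 10, df = 1:
//
//     idf  = ln((10 - 1 + 0.5) / (1 + 0.5) + 1.0)            ~ 1.99
//     term = idf * tf*(k1 + 1) / (tf + k1*(1 - b + b*1.0))
//          = 1.99 * 4.4 / 3.2                                ~ 2.74
//
// The per-term sum is then divided by keywords.len() * 5.0 and clamped to
// [0, 1]; that divisor is a rough normalization, not a true BM25 bound.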
-fn compute_density(content: &str) -> f32 { - let words: Vec<&str> = content.split_whitespace().collect(); - if words.is_empty() { - return 0.0; - } - - // Use shared STOPWORDS from bm25 module - let stopword_count = words - .iter() - .filter(|w| STOPWORDS.contains(&w.to_lowercase().as_str())) - .count(); - - let stopword_ratio = stopword_count as f32 / words.len() as f32; - - // Entity-like ratio (capitalized, numbers, special terms) - let entity_count = words - .iter() - .filter(|w| w.chars().any(|c| c.is_numeric() || c.is_uppercase())) - .count(); - - let entity_ratio = entity_count as f32 / words.len() as f32; - - // Combined density score - (1.0 - stopword_ratio) * 0.7 + entity_ratio * 0.3 -} - -#[cfg(test)] -mod tests { - use super::*; - use indextree::Arena; - - fn make_test_node_id() -> NodeId { - let mut arena = Arena::new(); - let node = crate::document::TreeNode { - title: "Test".to_string(), - structure: String::new(), - content: String::new(), - summary: String::new(), - depth: 0, - start_index: 0, - end_index: 0, - start_page: None, - end_page: None, - node_id: None, - physical_index: None, - token_count: None, - references: Vec::new(), - }; - NodeId(arena.new_node(node)) - } - - #[test] - fn test_keyword_extraction() { - let keywords = extract_keywords("What is the architecture of vectorless?"); - assert!(keywords.contains(&"architecture".to_string())); - assert!(keywords.contains(&"vectorless".to_string())); - assert!(!keywords.contains(&"what".to_string())); // stopword - assert!(!keywords.contains(&"the".to_string())); // stopword - } - - #[test] - fn test_density_score() { - // High density content - let high_density = "Rust 1.85+ requires Cargo.toml configuration with [dependencies]"; - let score = compute_density(high_density); - assert!(score > 0.5); - - // Low density content (many stopwords) - let low_density = "This is a test of the system with some words in it"; - let score = compute_density(low_density); - assert!(score < 0.7); - } - - #[test] - fn test_depth_penalty() { - let shallow = ContentChunk::new( - make_test_node_id(), - "Test".to_string(), - "Content".to_string(), - 0, - ); - - let deep = ContentChunk::new( - make_test_node_id(), - "Test".to_string(), - "Content".to_string(), - 5, - ); - - let scorer = RelevanceScorer::new("test", ScoringStrategyConfig::KeywordOnly); - let ctx = ScoringContext::default(); - - let shallow_score = scorer.score_chunk(&shallow, &ctx); - let deep_score = scorer.score_chunk(&deep, &ctx); - - assert!(shallow_score.components.depth_penalty > deep_score.components.depth_penalty); - } - - #[test] - fn test_score_components_final_score() { - let components = ScoreComponents { - keyword_score: 0.8, - bm25_score: 0.6, - depth_penalty: 0.9, - path_bonus: 0.1, - density_score: 0.5, - }; - - let final_score = components.final_score(); - assert!(final_score > 0.0 && final_score <= 1.0); - } -} diff --git a/rust/src/retrieval/context.rs b/rust/src/retrieval/context.rs deleted file mode 100644 index 0c9ecc4b..00000000 --- a/rust/src/retrieval/context.rs +++ /dev/null @@ -1,636 +0,0 @@ -// Copyright (c) 2026 vectorless developers -// SPDX-License-Identifier: Apache-2.0 - -//! Context building for retrieval results. -//! -//! This module provides utilities for building context strings -//! from retrieval results for LLM consumption. -//! -//! # Features -//! -//! - Multiple pruning strategies (token-based, relevance-based, diversity) -//! - Configurable token estimation (fast or accurate) -//! - Async support for large documents -//! -//! 
# Example -//! -//! ```rust,ignore -//! // Synchronous -//! let context = ContextBuilder::new() -//! .with_max_tokens(4000) -//! .with_pruning_strategy(PruningStrategy::Hybrid { min_relevance: 0.5 }) -//! .build(&results); -//! -//! // Asynchronous (for large documents) -//! let context = ContextBuilder::new() -//! .with_max_tokens(4000) -//! .build_async(&results).await?; -//! ``` - -use super::types::RetrievalResult; -use crate::document::{DocumentTree, NodeId}; -use crate::utils::estimate_tokens; -use std::collections::HashSet; - -/// Pruning strategy for context building. -#[derive(Debug, Clone, Copy, PartialEq)] -pub enum PruningStrategy { - /// Stop when token limit is reached (default). - TokenLimit, - /// Keep only results above relevance threshold. - RelevanceThreshold(f32), - /// Diversity-based: avoid redundant content. - Diversity { - /// Maximum keyword overlap ratio (0.0-1.0). - max_overlap: f32, - }, - /// Combined: token limit with relevance filtering. - Hybrid { - /// Minimum relevance score to include. - min_relevance: f32, - }, -} - -impl Default for PruningStrategy { - fn default() -> Self { - Self::TokenLimit - } -} - -/// Token estimation mode. -#[derive(Debug, Clone, Copy, PartialEq, Default)] -pub enum TokenEstimation { - /// Fast estimation: ~4 chars per token. - #[default] - Fast, - /// Accurate estimation using tiktoken. - Accurate, -} - -/// Context builder for assembling retrieval results. -#[derive(Debug)] -pub struct ContextBuilder { - /// Maximum tokens for the context. - max_tokens: usize, - - /// Whether to include titles. - include_titles: bool, - - /// Whether to include summaries. - include_summaries: bool, - - /// Whether to include content. - include_content: bool, - - /// Separator between sections. - separator: String, - - /// Pruning strategy. - pruning_strategy: PruningStrategy, - - /// Token estimation mode. - token_estimation: TokenEstimation, - - /// Chunk size for async processing. - async_chunk_size: usize, -} - -impl Default for ContextBuilder { - fn default() -> Self { - Self::new() - } -} - -impl ContextBuilder { - /// Create a new context builder. - pub fn new() -> Self { - Self { - max_tokens: 4000, - include_titles: true, - include_summaries: true, - include_content: true, - separator: "\n\n---\n\n".to_string(), - pruning_strategy: PruningStrategy::TokenLimit, - token_estimation: TokenEstimation::Fast, - async_chunk_size: 100, - } - } - - /// Set the maximum tokens. - pub fn with_max_tokens(mut self, tokens: usize) -> Self { - self.max_tokens = tokens; - self - } - - /// Set whether to include titles. - pub fn with_titles(mut self, include: bool) -> Self { - self.include_titles = include; - self - } - - /// Set whether to include summaries. - pub fn with_summaries(mut self, include: bool) -> Self { - self.include_summaries = include; - self - } - - /// Set whether to include content. - pub fn with_content(mut self, include: bool) -> Self { - self.include_content = include; - self - } - - /// Set the separator. - pub fn with_separator(mut self, separator: impl Into) -> Self { - self.separator = separator.into(); - self - } - - /// Set the pruning strategy. - pub fn with_pruning_strategy(mut self, strategy: PruningStrategy) -> Self { - self.pruning_strategy = strategy; - self - } - - /// Set token estimation mode. - pub fn with_token_estimation(mut self, mode: TokenEstimation) -> Self { - self.token_estimation = mode; - self - } - - /// Set chunk size for async processing. 
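// [editor note] Defaults set in new() above: 4000 max tokens; titles,
// summaries, and content all included; "\n\n---\n\n" separator; TokenLimit
// pruning; Fast estimation (len / 4). A typical override:
//
//     let context = ContextBuilder::new()
//         .with_max_tokens(2000)
//         .with_pruning_strategy(PruningStrategy::Hybrid { min_relevance: 0.5 })
//         .with_token_estimation(TokenEstimation::Accurate)
//         .build(&results);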
- pub fn with_async_chunk_size(mut self, size: usize) -> Self { - self.async_chunk_size = size; - self - } - - /// Estimate tokens for a string. - fn estimate_tokens(&self, text: &str) -> usize { - match self.token_estimation { - TokenEstimation::Fast => text.len() / 4, - TokenEstimation::Accurate => estimate_tokens(text), - } - } - - /// Build context from retrieval results (synchronous). - pub fn build(&self, results: &[RetrievalResult]) -> String { - match self.pruning_strategy { - PruningStrategy::TokenLimit => self.build_token_limit(results), - PruningStrategy::RelevanceThreshold(min) => self.build_relevance(results, min), - PruningStrategy::Diversity { max_overlap } => { - self.build_diversity(results, max_overlap) - } - PruningStrategy::Hybrid { min_relevance } => self.build_hybrid(results, min_relevance), - } - } - - /// Build context asynchronously for large documents. - /// - /// Processes results in chunks to avoid blocking. - pub async fn build_async(&self, results: &[RetrievalResult]) -> String { - // For small result sets, just use sync - if results.len() < self.async_chunk_size { - return self.build(results); - } - - // Process in chunks with yield points - let mut sections = Vec::new(); - let mut estimated_tokens = 0; - let separator_tokens = self.estimate_tokens(&self.separator); - let mut included_keywords: HashSet = HashSet::new(); - - for (i, chunk) in results.chunks(self.async_chunk_size).enumerate() { - // Yield to the runtime every chunk - if i > 0 { - tokio::task::yield_now().await; - } - - for result in chunk { - // Apply pruning strategy - match self.pruning_strategy { - PruningStrategy::RelevanceThreshold(min) => { - if result.score < min { - continue; - } - } - PruningStrategy::Diversity { max_overlap } => { - let keywords = self.extract_keywords(result); - if self.calculate_overlap(&keywords, &included_keywords) > max_overlap { - continue; - } - included_keywords.extend(keywords); - } - PruningStrategy::Hybrid { min_relevance } => { - if result.score < min_relevance { - continue; - } - } - PruningStrategy::TokenLimit => {} - } - - let section = self.format_section(result); - let section_tokens = self.estimate_tokens(§ion); - - if estimated_tokens + section_tokens + separator_tokens > self.max_tokens { - break; - } - - estimated_tokens += section_tokens + separator_tokens; - sections.push(section); - } - - // Early exit if we've hit the token limit - if estimated_tokens >= self.max_tokens { - break; - } - } - - sections.join(&self.separator) - } - - /// Build with simple token limit. - fn build_token_limit(&self, results: &[RetrievalResult]) -> String { - let mut sections = Vec::new(); - let mut estimated_tokens = 0; - let separator_tokens = self.estimate_tokens(&self.separator); - - for result in results { - let section = self.format_section(result); - let section_tokens = self.estimate_tokens(§ion); - - if estimated_tokens + section_tokens + separator_tokens > self.max_tokens { - break; - } - - estimated_tokens += section_tokens + separator_tokens; - sections.push(section); - } - - sections.join(&self.separator) - } - - /// Build with relevance threshold. 
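// [editor note] build_async above short-circuits to the synchronous build()
// for fewer than async_chunk_size (default 100) results; past that it calls
// tokio::task::yield_now() between chunks so a large context build cannot
// monopolize a runtime worker. Token accounting matches the sync paths
// below.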
- fn build_relevance(&self, results: &[RetrievalResult], min_score: f32) -> String { - let mut sections = Vec::new(); - let mut estimated_tokens = 0; - let separator_tokens = self.estimate_tokens(&self.separator); - - for result in results { - if result.score < min_score { - continue; - } - - let section = self.format_section(result); - let section_tokens = self.estimate_tokens(§ion); - - if estimated_tokens + section_tokens + separator_tokens > self.max_tokens { - break; - } - - estimated_tokens += section_tokens + separator_tokens; - sections.push(section); - } - - sections.join(&self.separator) - } - - /// Build with diversity-based pruning. - fn build_diversity(&self, results: &[RetrievalResult], max_overlap: f32) -> String { - let mut sections = Vec::new(); - let mut estimated_tokens = 0; - let separator_tokens = self.estimate_tokens(&self.separator); - let mut included_keywords: HashSet = HashSet::new(); - - for result in results { - let keywords = self.extract_keywords(result); - - if self.calculate_overlap(&keywords, &included_keywords) > max_overlap { - continue; - } - - let section = self.format_section(result); - let section_tokens = self.estimate_tokens(§ion); - - if estimated_tokens + section_tokens + separator_tokens > self.max_tokens { - break; - } - - estimated_tokens += section_tokens + separator_tokens; - included_keywords.extend(keywords); - sections.push(section); - } - - sections.join(&self.separator) - } - - /// Build with hybrid strategy (relevance + token limit). - fn build_hybrid(&self, results: &[RetrievalResult], min_relevance: f32) -> String { - let mut sections = Vec::new(); - let mut estimated_tokens = 0; - let separator_tokens = self.estimate_tokens(&self.separator); - - for result in results { - if result.score < min_relevance { - continue; - } - - let section = self.format_section(result); - let section_tokens = self.estimate_tokens(§ion); - - if estimated_tokens + section_tokens + separator_tokens > self.max_tokens { - break; - } - - estimated_tokens += section_tokens + separator_tokens; - sections.push(section); - } - - sections.join(&self.separator) - } - - /// Extract keywords from a result for diversity checking. - fn extract_keywords(&self, result: &RetrievalResult) -> Vec { - let mut words = Vec::new(); - - // Collect from title - words.extend( - result - .title - .to_lowercase() - .split_whitespace() - .filter(|w| w.len() > 3) - .map(|w| w.to_string()), - ); - - // Collect from summary - if let Some(summary) = &result.summary { - words.extend( - summary - .to_lowercase() - .split_whitespace() - .filter(|w| w.len() > 3) - .map(|w| w.to_string()), - ); - } - - // Limit keywords - words.truncate(20); - words - } - - /// Calculate overlap between keyword sets. - fn calculate_overlap(&self, new_keywords: &[String], existing: &HashSet) -> f32 { - if new_keywords.is_empty() || existing.is_empty() { - return 0.0; - } - - let matches = new_keywords - .iter() - .filter(|k| existing.contains(*k)) - .count(); - - matches as f32 / new_keywords.len() as f32 - } - - /// Build context from a document tree starting at a node (synchronous). - pub fn build_from_tree( - &self, - tree: &DocumentTree, - node_id: NodeId, - max_depth: usize, - ) -> String { - let mut sections = Vec::new(); - self.collect_sections(tree, node_id, 0, max_depth, &mut sections); - sections.join(&self.separator) - } - - /// Build context from a document tree asynchronously. 
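// [editor note] Overlap math behind the Diversity strategy above: keywords
// are lowercased words longer than 3 chars, capped at 20 per result, and
// overlap = |new ∩ already_included| / |new|. With max_overlap = 0.3, a
// result sharing 2 of its 5 keywords with prior results (overlap 0.4) is
// skipped.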
- pub async fn build_from_tree_async( - &self, - tree: &DocumentTree, - node_id: NodeId, - max_depth: usize, - ) -> String { - let mut sections = Vec::new(); - self.collect_sections_async(tree, node_id, 0, max_depth, &mut sections) - .await; - sections.join(&self.separator) - } - - fn collect_sections( - &self, - tree: &DocumentTree, - node_id: NodeId, - current_depth: usize, - max_depth: usize, - sections: &mut Vec, - ) { - if current_depth > max_depth { - return; - } - - if let Some(node) = tree.get(node_id) { - let section = self.format_node_section(node, current_depth); - if !section.is_empty() { - sections.push(section); - } - - for child_id in tree.children_iter(node_id) { - self.collect_sections(tree, child_id, current_depth + 1, max_depth, sections); - } - } - } - - async fn collect_sections_async( - &self, - tree: &DocumentTree, - node_id: NodeId, - current_depth: usize, - max_depth: usize, - sections: &mut Vec, - ) { - if current_depth > max_depth { - return; - } - - // Yield every few levels to avoid blocking - if current_depth > 0 && current_depth.is_multiple_of(3) { - tokio::task::yield_now().await; - } - - if let Some(node) = tree.get(node_id) { - let section = self.format_node_section(node, current_depth); - if !section.is_empty() { - sections.push(section); - } - - for child_id in tree.children_iter(node_id) { - Box::pin(self.collect_sections_async( - tree, - child_id, - current_depth + 1, - max_depth, - sections, - )) - .await; - } - } - } - - fn format_node_section(&self, node: &crate::document::TreeNode, depth: usize) -> String { - let mut section = String::new(); - - if self.include_titles { - let indent = " ".repeat(depth); - section.push_str(&format!("{}# {}\n", indent, node.title)); - } - - if self.include_summaries && !node.summary.is_empty() { - section.push_str(&format!("Summary: {}\n", node.summary)); - } - - if self.include_content && !node.content.is_empty() { - section.push_str(&format!("\n{}\n", node.content)); - } - - section - } - - fn format_section(&self, result: &RetrievalResult) -> String { - let mut section = String::new(); - - if self.include_titles { - section.push_str(&format!("## {}\n", result.title)); - } - - if self.include_summaries { - if let Some(summary) = &result.summary { - section.push_str(&format!("Summary: {}\n", summary)); - } - } - - if self.include_content { - if let Some(content) = &result.content { - section.push_str(&format!("\n{}\n", content)); - } - } - - section - } -} - -/// Format retrieval results for LLM consumption. -pub fn format_for_llm(results: &[RetrievalResult], max_tokens: usize) -> String { - ContextBuilder::new() - .with_max_tokens(max_tokens) - .build(results) -} - -/// Format retrieval results asynchronously. -pub async fn format_for_llm_async(results: &[RetrievalResult], max_tokens: usize) -> String { - ContextBuilder::new() - .with_max_tokens(max_tokens) - .build_async(results) - .await -} - -/// Format a document tree for LLM consumption. -pub fn format_tree_for_llm(tree: &DocumentTree, max_depth: usize, max_tokens: usize) -> String { - ContextBuilder::new() - .with_max_tokens(max_tokens) - .build_from_tree(tree, tree.root(), max_depth) -} - -/// Format a document tree asynchronously. 
-pub async fn format_tree_for_llm_async( - tree: &DocumentTree, - max_depth: usize, - max_tokens: usize, -) -> String { - ContextBuilder::new() - .with_max_tokens(max_tokens) - .build_from_tree_async(tree, tree.root(), max_depth) - .await -} - -#[cfg(test)] -mod tests { - use super::*; - - #[test] - fn test_context_builder() { - let results = vec![ - RetrievalResult::new("Section 1").with_content("Content 1"), - RetrievalResult::new("Section 2").with_content("Content 2"), - ]; - - let context = ContextBuilder::new().with_max_tokens(1000).build(&results); - - assert!(context.contains("Section 1")); - assert!(context.contains("Content 1")); - } - - #[test] - fn test_pruning_strategy_relevance() { - let results = vec![ - RetrievalResult::new("High relevance").with_score(0.9), - RetrievalResult::new("Low relevance").with_score(0.1), - ]; - - let context = ContextBuilder::new() - .with_max_tokens(1000) - .with_pruning_strategy(PruningStrategy::RelevanceThreshold(0.5)) - .build(&results); - - assert!(context.contains("High relevance")); - assert!(!context.contains("Low relevance")); - } - - #[test] - fn test_token_estimation_modes() { - let fast_builder = ContextBuilder::new().with_token_estimation(TokenEstimation::Fast); - let accurate_builder = - ContextBuilder::new().with_token_estimation(TokenEstimation::Accurate); - - let fast_tokens = fast_builder.estimate_tokens("Hello world test"); - let accurate_tokens = accurate_builder.estimate_tokens("Hello world test"); - - assert!(fast_tokens > 0); - assert!(accurate_tokens > 0); - } - - #[test] - fn test_diversity_pruning() { - let results = vec![ - RetrievalResult::new("Unique topic alpha").with_score(0.9), - RetrievalResult::new("Unique topic alpha beta").with_score(0.8), // Similar - RetrievalResult::new("Different gamma delta").with_score(0.7), - ]; - - let context = ContextBuilder::new() - .with_max_tokens(1000) - .with_pruning_strategy(PruningStrategy::Diversity { max_overlap: 0.3 }) - .build(&results); - - // Should include first and third, skip second (too similar to first) - assert!(context.contains("alpha")); - assert!(context.contains("gamma")); - } - - #[tokio::test] - async fn test_async_build() { - let results: Vec<_> = (0..200) - .map(|i| { - RetrievalResult::new(&format!("Section {}", i)) - .with_content(&format!("Content {}", i)) - }) - .collect(); - - let context = ContextBuilder::new() - .with_max_tokens(10000) - .build_async(&results) - .await; - - assert!(!context.is_empty()); - } -} diff --git a/rust/src/retrieval/decompose.rs b/rust/src/retrieval/decompose.rs deleted file mode 100644 index ce711f87..00000000 --- a/rust/src/retrieval/decompose.rs +++ /dev/null @@ -1,877 +0,0 @@ -// Copyright (c) 2026 vectorless developers -// SPDX-License-Identifier: Apache-2.0 - -//! Query decomposition for multi-turn retrieval. -//! -//! Complex queries are broken down into simpler sub-queries -//! that can be processed independently and then combined. -//! -//! # Architecture -//! -//! ```text -//! ┌─────────────────────────────────────────────────────────────────┐ -//! │ Query Decomposition │ -//! ├─────────────────────────────────────────────────────────────────┤ -//! │ │ -//! │ Complex Query ──▶ [Decomposer] ──▶ [Sub-queries] │ -//! │ │ │ │ -//! │ │ ▼ │ -//! │ │ ┌───────────────┐ │ -//! │ │ │ Sub-query 1 │ │ -//! │ │ │ Sub-query 2 │ │ -//! │ │ │ Sub-query 3 │ │ -//! │ │ └───────┬───────┘ │ -//! │ │ │ │ -//! │ └──────────────────────────────────┼─────────────────────┘ -//! │ ▼ │ -//! │ [Result Aggregator] │ -//! │ │ │ -//! │ ▼ │ -//! 
│ [Final Result] │ -//! └─────────────────────────────────────────────────────────────────┘ -//! ``` -//! -//! # Example -//! -//! ```rust,ignore -//! use vectorless::retrieval::decompose::{QueryDecomposer, DecompositionConfig}; -//! -//! let decomposer = QueryDecomposer::new(config); -//! let result = decomposer.decompose("What is the architecture and how does caching work?").await?; -//! -//! for sub_query in &result.sub_queries { -//! println!("Sub-query: {}", sub_query.text); -//! } -//! ``` - -use serde::{Deserialize, Serialize}; -use tracing::{debug, info}; - -use crate::llm::memo::{MemoKey, MemoOpType, MemoStore, MemoValue}; -use crate::llm::{LlmClient, LlmExecutor}; -use crate::utils::fingerprint::Fingerprint; - -/// Sub-query resulting from decomposition. -#[derive(Debug, Clone, Serialize, Deserialize)] -pub struct SubQuery { - /// The sub-query text. - pub text: String, - /// Estimated complexity of this sub-query. - pub complexity: SubQueryComplexity, - /// Order of execution (lower = higher priority). - pub priority: u8, - /// Dependencies on other sub-queries (indices). - pub depends_on: Vec, - /// Type of sub-query. - pub query_type: SubQueryType, - /// Optional structural path constraint extracted from the query - /// (e.g. "3.2", "Chapter 5"). When set, the search should start - /// from the corresponding tree node instead of searching broadly. - pub path_constraint: Option, -} - -/// Complexity level for a sub-query. -#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)] -pub enum SubQueryComplexity { - /// Simple keyword lookup. - Simple, - /// Requires understanding context. - Medium, - /// Requires synthesis or reasoning. - Complex, -} - -impl Default for SubQueryComplexity { - fn default() -> Self { - Self::Simple - } -} - -/// Type of sub-query. -#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)] -pub enum SubQueryType { - /// Fact lookup (who, what, when). - Fact, - /// Explanation (why, how). - Explanation, - /// Comparison (difference between). - Comparison, - /// Synthesis (summarize, combine). - Synthesis, - /// Navigation (where to find). - Navigation, -} - -impl Default for SubQueryType { - fn default() -> Self { - Self::Fact - } -} - -/// Result of query decomposition. -#[derive(Debug, Clone)] -pub struct DecompositionResult { - /// Original query. - pub original: String, - /// Decomposed sub-queries. - pub sub_queries: Vec, - /// Whether decomposition was needed. - pub was_decomposed: bool, - /// Reason for decomposition decision. - pub reason: String, - /// Estimated total complexity. - pub total_complexity: f32, -} - -impl DecompositionResult { - /// Create a result without decomposition (query is simple enough). - pub fn no_decomposition(query: &str, reason: &str) -> Self { - Self { - original: query.to_string(), - sub_queries: vec![SubQuery { - text: query.to_string(), - complexity: SubQueryComplexity::Simple, - priority: 0, - depends_on: vec![], - query_type: SubQueryType::Fact, - path_constraint: None, - }], - was_decomposed: false, - reason: reason.to_string(), - total_complexity: 0.5, - } - } - - /// Check if decomposition produced multiple queries. - pub fn is_multi_turn(&self) -> bool { - self.sub_queries.len() > 1 - } - - /// Get execution order (topologically sorted). 
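// [editor note] An illustrative decomposition these types can represent
// (field sketch only, not the output of any real run), for the query
// "Compare the cache and the throttle, then summarize":
//
//     SubQuery { text: "How does the cache work?",    priority: 0, depends_on: vec![], .. }
//     SubQuery { text: "How does the throttle work?", priority: 1, depends_on: vec![], .. }
//     SubQuery { text: "Summarize the comparison",    priority: 2, depends_on: vec![0, 1],
//                query_type: SubQueryType::Synthesis, .. }
//
// Note that execution_order below sorts by dependency count and then
// priority; it is a cheap approximation rather than a full topological sort
// over the depends_on edges.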
-    pub fn execution_order(&self) -> Vec<usize> {
-        if self.sub_queries.len() <= 1 {
-            return vec![0];
-        }
-
-        // Approximate topological order: sort by dependency count, then by priority
-        let mut order: Vec<usize> = (0..self.sub_queries.len()).collect();
-        order.sort_by(|&a, &b| {
-            // First sort by dependencies (fewer dependencies first)
-            let a_deps = self.sub_queries[a].depends_on.len();
-            let b_deps = self.sub_queries[b].depends_on.len();
-            if a_deps != b_deps {
-                return a_deps.cmp(&b_deps);
-            }
-            // Then by priority (lower priority value first)
-            self.sub_queries[a]
-                .priority
-                .cmp(&self.sub_queries[b].priority)
-        });
-        order
-    }
-}
-
-/// Configuration for query decomposition.
-#[derive(Debug, Clone)]
-pub struct DecompositionConfig {
-    /// Maximum sub-queries to generate.
-    pub max_sub_queries: usize,
-    /// Minimum query length to consider for decomposition.
-    pub min_query_length: usize,
-    /// Enable LLM-based decomposition.
-    pub use_llm: bool,
-    /// Threshold for decomposing (complexity score).
-    pub complexity_threshold: f32,
-    /// Enable dependency detection.
-    pub detect_dependencies: bool,
-}
-
-impl Default for DecompositionConfig {
-    fn default() -> Self {
-        Self {
-            max_sub_queries: 5,
-            min_query_length: 20,
-            use_llm: true,
-            complexity_threshold: 0.7,
-            detect_dependencies: true,
-        }
-    }
-}
-
-/// Query decomposer for multi-turn retrieval.
-pub struct QueryDecomposer {
-    /// Configuration.
-    config: DecompositionConfig,
-    /// LLM client for decomposition (optional).
-    llm_client: Option<LlmClient>,
-    /// LLM executor for unified execution (optional).
-    llm_executor: Option<LlmExecutor>,
-    /// Memo store for caching decomposition results.
-    memo_store: Option<MemoStore>,
-}
-
-impl Default for QueryDecomposer {
-    fn default() -> Self {
-        Self::new(DecompositionConfig::default())
-    }
-}
-
-impl QueryDecomposer {
-    /// Create a new query decomposer.
-    pub fn new(config: DecompositionConfig) -> Self {
-        Self {
-            config,
-            llm_client: None,
-            llm_executor: None,
-            memo_store: None,
-        }
-    }
-
-    /// Add LLM client for enhanced decomposition.
-    pub fn with_llm_client(mut self, client: LlmClient) -> Self {
-        self.llm_client = Some(client);
-        self
-    }
-
-    /// Add LLM executor for unified throttle/retry/fallback.
-    pub fn with_llm_executor(mut self, executor: LlmExecutor) -> Self {
-        self.llm_executor = Some(executor);
-        self
-    }
-
-    /// Add memo store for caching decomposition results.
-    pub fn with_memo_store(mut self, store: MemoStore) -> Self {
-        self.memo_store = Some(store);
-        self
-    }
-
-    /// Decompose a query into sub-queries.
-    pub async fn decompose(&self, query: &str) -> crate::error::Result<DecompositionResult> {
-        // Check if decomposition is needed
-        if !self.should_decompose(query) {
-            return Ok(DecompositionResult::no_decomposition(
-                query,
-                "Query is simple enough, no decomposition needed",
-            ));
-        }
-
-        // Check memo cache
-        if let Some(ref store) = self.memo_store {
-            let cache_key = Self::build_cache_key(query);
-            if let Some(cached) = store.get(&cache_key) {
-                if let Some(result) = Self::deserialize_decomposition(&cached) {
-                    tracing::debug!("Memo cache hit for query decomposition");
-                    return Ok(result);
-                }
-            }
-        }
-
-        info!("Decomposing complex query: '{}'", query);
-
-        // Try LLM-based decomposition if available
-        let result =
-            if self.config.use_llm && (self.llm_client.is_some() || self.llm_executor.is_some()) {
-                match self.llm_decompose(query).await {
-                    Ok(result) => result,
-                    Err(e) => {
-                        debug!(
-                            "LLM decomposition failed, falling back to rule-based: {}",
-                            e
-                        );
-                        self.rule_based_decompose(query)?
-                    }
-                }
-            } else {
-                self.rule_based_decompose(query)?
-            };
-
-        // Cache the result
-        if let Some(ref store) = self.memo_store {
-            let cache_key = Self::build_cache_key(query);
-            if let Ok(json) = serde_json::to_value(&CachedDecomposition::from_result(&result)) {
-                store.put_with_tokens(cache_key, MemoValue::Json(json), (query.len() / 4) as u64);
-            }
-        }
-
-        Ok(result)
-    }
-
-    /// Build a cache key for query decomposition.
-    fn build_cache_key(query: &str) -> MemoKey {
-        let fp = Fingerprint::from_str(query);
-        MemoKey {
-            op_type: MemoOpType::QueryDecomposition,
-            input_fp: fp,
-            model_id: None,
-            version: 1,
-            context_fp: Fingerprint::zero(),
-        }
-    }
-
-    /// Deserialize a DecompositionResult from a MemoValue.
-    fn deserialize_decomposition(value: &MemoValue) -> Option<DecompositionResult> {
-        match value {
-            MemoValue::Json(json) => {
-                let cached: CachedDecomposition = serde_json::from_value(json.clone()).ok()?;
-                Some(cached.into_result())
-            }
-            _ => None,
-        }
-    }
-
-    /// Check if a query should be decomposed.
-    fn should_decompose(&self, query: &str) -> bool {
-        // Skip short queries
-        if query.len() < self.config.min_query_length {
-            return false;
-        }
-
-        // Calculate complexity score
-        let complexity = self.calculate_complexity(query);
-        complexity >= self.config.complexity_threshold
-    }
-
-    /// Calculate complexity score for a query.
-    fn calculate_complexity(&self, query: &str) -> f32 {
-        let mut score = 0.0;
-        let query_lower = query.to_lowercase();
-
-        // 1. Multiple questions (question marks or "and" between questions)
-        let question_count = query.matches('?').count();
-        score += (question_count as f32 * 0.3).min(1.0);
-
-        // 2. Multiple clauses (indicated by conjunctions)
-        let conjunctions = [" and ", " or ", " but ", " also ", " plus "];
-        let conjunction_count = conjunctions
-            .iter()
-            .filter(|c| query_lower.contains(*c))
-            .count();
-        score += (conjunction_count as f32 * 0.2).min(0.6);
-
-        // 3. Complex question words
-        let complex_indicators = [
-            "compare",
-            "contrast",
-            "difference between",
-            "relationship between",
-            "how does",
-            "why does",
-            "explain how",
-            "analyze",
-            "evaluate",
-            "synthesize",
-        ];
-        for indicator in &complex_indicators {
-            if query_lower.contains(indicator) {
-                score += 0.2;
-            }
-        }
-
-        // 4. Length factor
-        let word_count = query.split_whitespace().count();
-        if word_count > 15 {
-            score += 0.1 * ((word_count - 15) as f32 / 10.0).min(1.0);
-        }
-
-        score.min(1.0)
-    }
-
-    /// Rule-based decomposition (no LLM).
-    fn rule_based_decompose(&self, query: &str) -> crate::error::Result<DecompositionResult> {
-        let mut sub_queries = Vec::new();
-        let query_lower = query.to_lowercase();
-
-        // Split on common patterns
-        let patterns = [
-            (" and ", " and "),
-            ("? ", "? "),
-            (" also ", " also "),
-            (" as well as ", " as well as "),
-        ];
-
-        // Check for question splits
-        if query.contains('?') {
-            let parts: Vec<&str> = query.split('?').filter(|s| !s.trim().is_empty()).collect();
-            for (i, part) in parts.iter().enumerate() {
-                let text = format!("{}?", part.trim());
-                sub_queries.push(SubQuery {
-                    text,
-                    complexity: self.estimate_sub_query_complexity(part),
-                    priority: i as u8,
-                    depends_on: vec![],
-                    query_type: self.detect_query_type(part),
-                    path_constraint: None,
-                });
-            }
-        }
-
-        // If no questions found, try conjunction split
-        if sub_queries.is_empty() {
-            for (pattern, _) in &patterns {
-                if query_lower.contains(pattern) {
-                    let parts: Vec<&str> = query
-                        .split(pattern)
-                        .filter(|s| !s.trim().is_empty())
-                        .collect();
-                    if parts.len() > 1 {
-                        for (i, part) in parts.iter().enumerate() {
-                            sub_queries.push(SubQuery {
-                                text: part.trim().to_string(),
-                                complexity: self.estimate_sub_query_complexity(part),
-                                priority: i as u8,
-                                depends_on: if i > 0 && self.config.detect_dependencies {
-                                    vec![i - 1]
-                                } else {
-                                    vec![]
-                                },
-                                query_type: self.detect_query_type(part),
-                                path_constraint: None,
-                            });
-                        }
-                        break;
-                    }
-                }
-            }
-        }
-
-        // If still no decomposition, return original
-        if sub_queries.is_empty() || sub_queries.len() > self.config.max_sub_queries {
-            return Ok(DecompositionResult::no_decomposition(
-                query,
-                "No clear decomposition patterns found",
-            ));
-        }
-
-        Ok(DecompositionResult {
-            original: query.to_string(),
-            sub_queries,
-            was_decomposed: true,
-            reason: "Rule-based decomposition".to_string(),
-            total_complexity: self.calculate_complexity(query),
-        })
-    }
-
-    /// LLM-based decomposition.
-    async fn llm_decompose(&self, query: &str) -> crate::error::Result<DecompositionResult> {
-        let system = r#"You are a query decomposition expert. Break down complex queries into simpler sub-queries.
-
-Rules:
-1. Each sub-query should be answerable independently when possible
-2. Preserve the original intent
-3. Maximum 5 sub-queries
-4. Return JSON format: {"sub_queries": [{"text": "...", "complexity": "simple|medium|complex", "priority": 0-4, "depends_on": [], "query_type": "fact|explanation|comparison|synthesis|navigation"}], "reason": "..."}
-
-If the query is simple enough, return just one sub-query."#;
-
-        let user = format!("Decompose this query: {}", query);
-
-        let response = if let Some(ref executor) = self.llm_executor {
-            executor
-                .complete(system, &user)
-                .await
-                .map_err(|e| crate::error::Error::Llm(format!("LLM executor error: {}", e)))?
-        } else if let Some(ref client) = self.llm_client {
-            client
-                .complete(system, &user)
-                .await
-                .map_err(|e| crate::error::Error::Llm(format!("LLM client error: {}", e)))?
-        } else {
-            return Err(crate::error::Error::Config(
-                "No LLM client or executor configured".to_string(),
-            ));
-        };
-
-        // Parse the JSON response
-        #[derive(Deserialize)]
-        struct DecompositionResponse {
-            sub_queries: Vec<SubQuery>,
-            reason: String,
-        }
-
-        let parsed: DecompositionResponse = serde_json::from_str(&extract_json(&response))
-            .map_err(|e| {
-                crate::error::Error::Llm(format!("Failed to parse decomposition: {}", e))
-            })?;
-
-        if parsed.sub_queries.is_empty() {
-            return Ok(DecompositionResult::no_decomposition(
-                query,
-                "LLM returned empty decomposition",
-            ));
-        }
-
-        let sub_queries: Vec<SubQuery> = parsed
-            .sub_queries
-            .into_iter()
-            .take(self.config.max_sub_queries)
-            .collect();
-
-        Ok(DecompositionResult {
-            original: query.to_string(),
-            sub_queries,
-            was_decomposed: true,
-            reason: parsed.reason,
-            total_complexity: self.calculate_complexity(query),
-        })
-    }
-
-    /// Estimate complexity for a sub-query.
-    fn estimate_sub_query_complexity(&self, text: &str) -> SubQueryComplexity {
-        let text_lower = text.to_lowercase();
-
-        // Check for complex indicators
-        if text_lower.contains("compare")
-            || text_lower.contains("contrast")
-            || text_lower.contains("analyze")
-            || text_lower.contains("evaluate")
-            || text_lower.contains("synthesize")
-        {
-            return SubQueryComplexity::Complex;
-        }
-
-        // Check for medium complexity
-        if text_lower.contains("how")
-            || text_lower.contains("why")
-            || text_lower.contains("explain")
-            || text_lower.contains("describe")
-        {
-            return SubQueryComplexity::Medium;
-        }
-
-        SubQueryComplexity::Simple
-    }
-
-    /// Detect the type of a sub-query.
-    fn detect_query_type(&self, text: &str) -> SubQueryType {
-        let text_lower = text.to_lowercase();
-
-        if text_lower.contains("compare")
-            || text_lower.contains("difference")
-            || text_lower.contains("versus")
-            || text_lower.contains(" vs ")
-        {
-            return SubQueryType::Comparison;
-        }
-
-        if text_lower.contains("why")
-            || text_lower.contains("how")
-            || text_lower.contains("explain")
-        {
-            return SubQueryType::Explanation;
-        }
-
-        if text_lower.contains("summarize")
-            || text_lower.contains("combine")
-            || text_lower.contains("synthesize")
-            || text_lower.contains("overall")
-        {
-            return SubQueryType::Synthesis;
-        }
-
-        if text_lower.contains("where")
-            || text_lower.contains("which section")
-            || text_lower.contains("find")
-        {
-            return SubQueryType::Navigation;
-        }
-
-        SubQueryType::Fact
-    }
-}
-
-/// Extract JSON from a potentially verbose LLM response.
-fn extract_json(text: &str) -> String {
-    // Try to find JSON object
-    if let Some(start) = text.find('{') {
-        if let Some(end) = text.rfind('}') {
-            if end > start {
-                return text[start..=end].to_string();
-            }
-        }
-    }
-    text.to_string()
-}
-
-/// Serializable decomposition result for caching.
-///
-/// Only caches the essential fields needed to reconstruct a DecompositionResult.
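The cache round-trip implemented by the structs below is deliberately lossy, which matters at the call site: `text`, `priority`, and `query_type` survive a memo hit, while `complexity`, `depends_on`, and `total_complexity` reset to defaults on reload. A minimal sketch, assuming a hypothetical `result: DecompositionResult` already in scope:

```rust
// Sketch of the memo round-trip; `result` is a hypothetical DecompositionResult.
let cached = CachedDecomposition::from_result(&result);
let restored = cached.into_result();
assert_eq!(restored.original, result.original);
// Dependencies are not cached, so restored sub-queries come back independent.
assert!(restored.sub_queries.iter().all(|sq| sq.depends_on.is_empty()));
```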
-#[derive(Debug, Clone, Serialize, Deserialize)]
-struct CachedSubQuery {
-    text: String,
-    priority: u8,
-    query_type: String,
-}
-
-#[derive(Debug, Clone, Serialize, Deserialize)]
-struct CachedDecomposition {
-    original: String,
-    sub_queries: Vec<CachedSubQuery>,
-    was_decomposed: bool,
-    reason: String,
-}
-
-impl CachedDecomposition {
-    fn from_result(result: &DecompositionResult) -> Self {
-        Self {
-            original: result.original.clone(),
-            sub_queries: result
-                .sub_queries
-                .iter()
-                .map(|sq| CachedSubQuery {
-                    text: sq.text.clone(),
-                    priority: sq.priority,
-                    query_type: format!("{:?}", sq.query_type),
-                })
-                .collect(),
-            was_decomposed: result.was_decomposed,
-            reason: result.reason.clone(),
-        }
-    }
-
-    fn into_result(self) -> DecompositionResult {
-        let sub_queries: Vec<SubQuery> = self
-            .sub_queries
-            .into_iter()
-            .map(|csq| SubQuery {
-                text: csq.text,
-                priority: csq.priority,
-                query_type: match csq.query_type.as_str() {
-                    "Fact" => SubQueryType::Fact,
-                    "Explanation" => SubQueryType::Explanation,
-                    "Comparison" => SubQueryType::Comparison,
-                    "Synthesis" => SubQueryType::Synthesis,
-                    "Navigation" => SubQueryType::Navigation,
-                    _ => SubQueryType::Fact,
-                },
-                complexity: SubQueryComplexity::Simple,
-                depends_on: vec![],
-                path_constraint: None,
-            })
-            .collect();
-        DecompositionResult {
-            original: self.original,
-            sub_queries,
-            was_decomposed: self.was_decomposed,
-            reason: self.reason,
-            total_complexity: 0.5,
-        }
-    }
-}
-
-/// Result of executing a single sub-query.
-#[derive(Debug, Clone)]
-pub struct SubQueryResult {
-    /// The sub-query.
-    pub query: SubQuery,
-    /// Retrieved content.
-    pub content: String,
-    /// Relevance score.
-    pub score: f32,
-    /// Nodes that contributed to the result.
-    pub source_nodes: Vec<NodeId>,
-}
-
-/// Aggregator for combining sub-query results.
-pub struct ResultAggregator {
-    /// Maximum tokens in final result.
-    pub max_tokens: usize,
-    /// Weight by query priority.
-    pub priority_weight: f32,
-}
-
-impl Default for ResultAggregator {
-    fn default() -> Self {
-        Self {
-            max_tokens: 4000,
-            priority_weight: 0.3,
-        }
-    }
-}
-
-impl ResultAggregator {
-    /// Create a new result aggregator.
-    pub fn new() -> Self {
-        Self::default()
-    }
-
-    /// Aggregate results from multiple sub-queries.
-    pub fn aggregate(
-        &self,
-        results: &[SubQueryResult],
-        decomposition: &DecompositionResult,
-    ) -> String {
-        if results.is_empty() {
-            return String::new();
-        }
-
-        if results.len() == 1 {
-            return results[0].content.clone();
-        }
-
-        // Sort by execution order and priority
-        let order = decomposition.execution_order();
-        let sorted_results: Vec<_> = order
-            .iter()
-            .filter_map(|&i| {
-                results
-                    .iter()
-                    .find(|r| r.query.text == decomposition.sub_queries[i].text)
-            })
-            .collect();
-
-        // Combine results with section headers
-        let mut combined = String::new();
-        let mut total_tokens = 0;
-
-        for result in sorted_results {
-            let section = format!("\n### {}\n\n{}\n", result.query.text, result.content);
-
-            let section_tokens = section.len() / 4; // Rough estimate
-            if total_tokens + section_tokens > self.max_tokens {
-                // Truncate if needed
-                let remaining = self.max_tokens - total_tokens;
-                if remaining > 100 {
-                    let end_pos = (remaining * 4).min(result.content.len());
-                    combined.push_str(&format!(
-                        "\n### {}\n\n{}\n",
-                        result.query.text,
-                        &result.content[..end_pos]
-                    ));
-                }
-                break;
-            }
-
-            combined.push_str(&section);
-            total_tokens += section_tokens;
-        }
-
-        combined
-    }
-}
-
-#[cfg(test)]
-mod tests {
-    use super::*;
-
-    #[test]
-    fn test_complexity_calculation() {
-        let decomposer = QueryDecomposer::default();
-
-        // Simple query
-        let simple = "What is the architecture?";
-        let simple_score = decomposer.calculate_complexity(simple);
-        assert!(simple_score < 0.5);
-
-        // Complex query
-        let complex = "What is the architecture and how does it compare to other systems?";
-        let complex_score = decomposer.calculate_complexity(complex);
-        assert!(complex_score > simple_score);
-    }
-
-    #[test]
-    fn test_rule_based_decomposition() {
-        let decomposer = QueryDecomposer::default();
-
-        let result = decomposer
-            .rule_based_decompose("What is the architecture?
How does caching work?") - .unwrap(); - - assert!(result.was_decomposed); - assert_eq!(result.sub_queries.len(), 2); - } - - #[test] - fn test_no_decomposition() { - let result = DecompositionResult::no_decomposition("What is this?", "Query is simple"); - - assert!(!result.was_decomposed); - assert!(!result.is_multi_turn()); - } - - #[test] - fn test_execution_order() { - let mut result = DecompositionResult::no_decomposition("test", "test"); - result.sub_queries = vec![ - SubQuery { - text: "First".to_string(), - priority: 2, - depends_on: vec![], - query_type: SubQueryType::Fact, - complexity: SubQueryComplexity::Simple, - path_constraint: None, - }, - SubQuery { - text: "Second".to_string(), - priority: 1, - depends_on: vec![0], - query_type: SubQueryType::Fact, - complexity: SubQueryComplexity::Simple, - path_constraint: None, - }, - ]; - result.was_decomposed = true; - - let order = result.execution_order(); - assert_eq!(order, vec![0, 1]); // First should come before Second - } - - #[test] - fn test_query_type_detection() { - let decomposer = QueryDecomposer::default(); - - assert_eq!( - decomposer.detect_query_type("Compare A and B"), - SubQueryType::Comparison - ); - assert_eq!( - decomposer.detect_query_type("Why does this happen?"), - SubQueryType::Explanation - ); - assert_eq!( - decomposer.detect_query_type("Where is the config?"), - SubQueryType::Navigation - ); - } - - #[test] - fn test_result_aggregator() { - let aggregator = ResultAggregator::new(); - - let results = vec![ - SubQueryResult { - query: SubQuery { - text: "First question?".to_string(), - priority: 0, - depends_on: vec![], - query_type: SubQueryType::Fact, - complexity: SubQueryComplexity::Simple, - path_constraint: None, - }, - content: "Answer 1".to_string(), - score: 0.9, - source_nodes: vec![], - }, - SubQueryResult { - query: SubQuery { - text: "Second question?".to_string(), - priority: 1, - depends_on: vec![0], - query_type: SubQueryType::Fact, - complexity: SubQueryComplexity::Simple, - path_constraint: None, - }, - content: "Answer 2".to_string(), - score: 0.8, - source_nodes: vec![], - }, - ]; - - let mut decomposition = DecompositionResult::no_decomposition("test", "test"); - decomposition.sub_queries = results.iter().map(|r| r.query.clone()).collect(); - decomposition.was_decomposed = true; - - let combined = aggregator.aggregate(&results, &decomposition); - assert!(combined.contains("First question")); - assert!(combined.contains("Answer 1")); - } -} diff --git a/rust/src/retrieval/dispatcher.rs b/rust/src/retrieval/dispatcher.rs new file mode 100644 index 00000000..0ea3a2ea --- /dev/null +++ b/rust/src/retrieval/dispatcher.rs @@ -0,0 +1,83 @@ +// Copyright (c) 2026 vectorless developers +// SPDX-License-Identifier: Apache-2.0 + +//! Retrieval dispatcher — the single entry point for all query operations. +//! +//! All queries go through the Orchestrator. There is no separate Worker path. +//! The Orchestrator internally decides whether to run the full analysis phase +//! based on user intent: +//! +//! - **User specified doc_ids** → Orchestrator skips analysis, spawns N Workers +//! directly (N=1 is a normal case, not special). +//! - **User unspecified (workspace)** → Orchestrator analyzes DocCards, selects +//! relevant docs, then spawns Workers. +//! +//! Post-processing (synthesis, dedup, rerank) is always unified through the +//! Orchestrator's output — never duplicated in Worker. 
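Since `dispatch` below is the single seam between the client facade and the agent layer, a call-site sketch helps pin down the contract. The `Scope` variants, the `dispatch` signature, and `Result<Output>` are taken from this file; how the workspace handle, config, client, and emitter are constructed is assumed for illustration:

```rust
// Hypothetical caller; only `dispatch` and its parameter types are from this diff.
async fn run_query(
    query: &str,
    ws: WorkspaceContext<'_>, // assumed shape of the workspace handle
    config: &AgentConfig,
    llm: &LlmClient,
    emitter: &EventEmitter,
) -> Result<Output> {
    // Workspace scope → Orchestrator runs analyze → dispatch → fuse → synthesize.
    dispatch(query, Scope::Workspace(ws), config, llm, emitter).await
}
```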
+
+use tracing::info;
+
+use crate::agent::config::{AgentConfig, Scope, WorkspaceContext};
+use crate::agent::orchestrator::Orchestrator;
+use crate::agent::{Agent, EventEmitter, Output};
+use crate::error::{Error, Result};
+use crate::llm::LlmClient;
+use crate::query::QueryPipeline;
+
+/// Dispatch a query to the Orchestrator.
+///
+/// This is the single entry point from the client layer into the retrieval system.
+/// It always goes through the Orchestrator — never directly to Worker.
+///
+/// Flow:
+/// 1. Query understanding via LLM (produces [`QueryPlan`])
+/// 2. Orchestrator dispatch (uses QueryPlan for strategy)
+///
+/// - `Scope::Specified(docs)` → Orchestrator skips analysis, dispatches all docs directly.
+/// - `Scope::Workspace(ws)` → Orchestrator runs full flow (analyze → dispatch → fuse → synthesize).
+pub async fn dispatch(
+    query: &str,
+    scope: Scope<'_>,
+    config: &AgentConfig,
+    llm: &LlmClient,
+    emitter: &EventEmitter,
+) -> Result<Output> {
+    let (ws, skip_analysis) = match scope {
+        Scope::Specified(docs) => {
+            info!(
+                docs = docs.len(),
+                "Dispatch (user-specified, skip analysis)"
+            );
+            (WorkspaceContext::new(docs), true)
+        }
+        Scope::Workspace(ws) => {
+            info!(docs = ws.doc_count(), "Dispatch (workspace, full flow)");
+            (ws, false)
+        }
+    };
+
+    // Step 1: Query understanding — LLM analyzes intent, concepts, complexity.
+    // This is required. "Model fails, we fail." — errors propagate.
+    let query_plan = QueryPipeline::understand(query, llm).await?;
+    info!(
+        intent = %query_plan.intent,
+        complexity = %query_plan.complexity,
+        concepts = query_plan.key_concepts.len(),
+        "Query understanding complete"
+    );
+
+    // Step 2: Dispatch to Orchestrator with the query plan.
+    let orchestrator = Orchestrator::new(
+        query,
+        &ws,
+        config.clone(),
+        llm.clone(),
+        emitter.clone(),
+        skip_analysis,
+        query_plan,
+    );
+    orchestrator
+        .run()
+        .await
+        .map_err(|e| Error::Retrieval(e.to_string()))
+}
diff --git a/rust/src/retrieval/mod.rs b/rust/src/retrieval/mod.rs
index 3a8865fe..e2454d4f 100644
--- a/rust/src/retrieval/mod.rs
+++ b/rust/src/retrieval/mod.rs
@@ -1,78 +1,28 @@
 // Copyright (c) 2026 vectorless developers
 // SPDX-License-Identifier: Apache-2.0
 
-//! Retrieval system for Vectorless document trees.
+//! Retrieval dispatch layer — the entry point for all query operations.
 //!
-//! This module implements a hybrid retrieval architecture combining:
-//! - **Adaptive Strategy Selection**: Automatically chooses between keyword, semantic, and LLM strategies
-//! - **Multi-Path Search**: Beam search and MCTS for exploring multiple tree paths
-//! - **Incremental Retrieval**: Stops early when sufficient information is collected
+//! This module sits between the client API and the agent execution layer.
+//! It is responsible for:
 //!
-//! # Architecture
+//! - **Dispatching** queries to the Orchestrator (which spawns per-document Workers)
+//! - **Post-processing** agent output into client-facing results
+//! - **Caching** query results (L1 exact, L2 path patterns, L3 strategy scores)
+//! - **Streaming** retrieval events for async progress monitoring
 //!
+//! Call flow:
 //! ```text
-//! ┌─────────────────────────────────────────────────────────────────┐
-//! │                     RetrievalOrchestrator                       │
-//! │                                                                 │
-//! │  ┌─────────┐   ┌─────────┐   ┌─────────┐   ┌──────────┐         │
-//! │  │ Analyze │──►│  Plan   │──►│ Search  │──►│ Evaluate │         │
-//! │  └─────────┘   └─────────┘   └─────────┘   └──────────┘         │
-//! │                    ▲                            │               │
-//! │                    └────────────────────────────┘               │
-//! │                         (NeedMoreData)                          │
-//! └─────────────────────────────────────────────────────────────────┘
-//! ```
-//!
-//! # Pipeline Stages
-//!
-//! | Stage | Description |
-//! |-------|-------------|
-//! | [`AnalyzeStage`] | Query analysis (complexity, keywords, targets) |
-//! | [`PlanStage`] | Strategy and algorithm selection |
-//! | [`SearchStage`] | Execute tree search |
-//! | [`EvaluateStage`] | Sufficiency checking |
-//!
-//! # Quick Start
-//!
-//! ```rust,ignore
-//! use vectorless::retrieval::pipeline::{RetrievalOrchestrator, RetrievalStage};
-//! use vectorless::retrieval::stages::{AnalyzeStage, PlanStage, SearchStage, EvaluateStage};
-//!
-//! let orchestrator = RetrievalOrchestrator::new()
-//!     .stage(AnalyzeStage::new())
-//!     .stage(PlanStage::new())
-//!     .stage(SearchStage::new())
-//!     .stage(EvaluateStage::new());
-//!
-//! let response = orchestrator.execute(tree, query, options).await?;
+//! client → retrieval::dispatch()
+//!     ├── User-specified doc_ids → Orchestrator (skips analysis, spawns Workers directly)
+//!     └── Workspace scope        → Orchestrator (analyze → dispatch → fuse → synthesize)
 //! ```
 
-mod context;
-mod decompose;
-mod pipeline_retriever;
-mod reference;
-mod retriever;
+mod cache;
+pub mod dispatcher;
+pub mod postprocessor;
 pub mod stream;
 mod types;
 
-pub mod cache;
-pub mod complexity;
-pub mod content;
-pub mod pilot;
-pub mod pipeline;
-pub mod scoring;
-pub mod search;
-pub mod stages;
-pub mod strategy;
-pub mod sufficiency;
-
-pub use context::{PruningStrategy, TokenEstimation};
-pub use pipeline_retriever::PipelineRetriever;
-pub use retriever::RetrievalContext;
-pub use types::*;
-
-// Sufficiency exports
-pub use sufficiency::SufficiencyLevel;
-
-// Streaming exports
 pub use stream::RetrieveEventReceiver;
+pub use types::*;
diff --git a/rust/src/retrieval/pilot/budget.rs b/rust/src/retrieval/pilot/budget.rs
deleted file mode 100644
index 4776d931..00000000
--- a/rust/src/retrieval/pilot/budget.rs
+++ /dev/null
@@ -1,358 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Budget controller for Pilot LLM calls.
-//!
-//! Tracks token consumption and call counts to enforce budget limits
-//! and control costs during retrieval.
-
-use std::collections::HashMap;
-use std::sync::RwLock;
-use std::sync::atomic::{AtomicUsize, Ordering};
-
-use super::config::BudgetConfig;
-
-/// Budget usage statistics.
-#[derive(Debug, Clone, Default)]
-pub struct BudgetUsage {
-    /// Total input tokens used.
-    pub input_tokens: usize,
-    /// Total output tokens used.
-    pub output_tokens: usize,
-    /// Total LLM calls made.
-    pub calls_made: usize,
-    /// Maximum tokens allowed.
-    pub max_tokens: usize,
-    /// Maximum calls allowed.
-    pub max_calls: usize,
-}
-
-impl BudgetUsage {
-    /// Get total tokens used (input + output).
-    pub fn total_tokens(&self) -> usize {
-        self.input_tokens + self.output_tokens
-    }
-
-    /// Get token utilization (0.0 - 1.0).
-    pub fn token_utilization(&self) -> f32 {
-        if self.max_tokens == 0 {
-            0.0
-        } else {
-            (self.total_tokens() as f32 / self.max_tokens as f32).min(1.0)
-        }
-    }
-
-    /// Get call utilization (0.0 - 1.0).
-    pub fn call_utilization(&self) -> f32 {
-        if self.max_calls == 0 {
-            0.0
-        } else {
-            (self.calls_made as f32 / self.max_calls as f32).min(1.0)
-        }
-    }
-
-    /// Check if budget is exhausted.
-    pub fn is_exhausted(&self) -> bool {
-        self.total_tokens() >= self.max_tokens || self.calls_made >= self.max_calls
-    }
-}
-
-/// Controller for Pilot budget management.
-///
-/// Tracks token usage and call counts per query, enforcing limits
-/// to control costs. Thread-safe for concurrent access.
-///
-/// # Example
-///
-/// ```rust,ignore
-/// use vectorless::retrieval::pilot::{BudgetController, BudgetConfig};
-///
-/// let config = BudgetConfig::default();
-/// let budget = BudgetController::new(config);
-///
-/// // Check if we can make a call
-/// if budget.can_call() {
-///     // Estimate cost first
-///     let estimated = budget.estimate_cost(context);
-///     if budget.can_afford(estimated) {
-///         // Make the call...
-///         budget.record_usage(150, 50, 0);
-///     }
-/// }
-/// ```
-pub struct BudgetController {
-    config: BudgetConfig,
-    /// Total input tokens used.
-    input_tokens: AtomicUsize,
-    /// Total output tokens used.
-    output_tokens: AtomicUsize,
-    /// Total calls made.
-    calls_made: AtomicUsize,
-    /// Calls per level (for level-based limits).
-    level_calls: RwLock<HashMap<usize, usize>>,
-}
-
-impl BudgetController {
-    /// Create a new budget controller with the given config.
-    pub fn new(config: BudgetConfig) -> Self {
-        Self {
-            config,
-            input_tokens: AtomicUsize::new(0),
-            output_tokens: AtomicUsize::new(0),
-            calls_made: AtomicUsize::new(0),
-            level_calls: RwLock::new(HashMap::new()),
-        }
-    }
-
-    /// Create with default configuration.
-    pub fn with_defaults() -> Self {
-        Self::new(BudgetConfig::default())
-    }
-
-    /// Check if a new LLM call is allowed.
-    ///
-    /// Returns `true` if:
-    /// - Token budget not exhausted
-    /// - Call count not exceeded
-    pub fn can_call(&self) -> bool {
-        let tokens = self.total_tokens();
-        let calls = self.calls_made.load(Ordering::Relaxed);
-
-        tokens < self.config.max_tokens_per_query && calls < self.config.max_calls_per_query
-    }
-
-    /// Check if a call is allowed at a specific tree level.
-    pub fn can_call_at_level(&self, level: usize) -> bool {
-        if !self.can_call() {
-            return false;
-        }
-
-        let level_calls = self.level_calls.read().unwrap();
-        let calls = level_calls.get(&level).copied().unwrap_or(0);
-        calls < self.config.max_calls_per_level
-    }
-
-    /// Estimate token cost for a context string.
-    ///
-    /// Uses a simple heuristic:
-    /// - 1 token ≈ 4 chars (English)
-    /// - 1 token ≈ 1.5 chars (Chinese)
-    /// - Plus output reserve (100 tokens)
-    pub fn estimate_cost(&self, context: &str) -> usize {
-        let char_count = context.chars().count();
-
-        // Count Chinese characters
-        let chinese_count = context
-            .chars()
-            .filter(|c| ('\u{4E00}'..='\u{9FFF}').contains(c))
-            .count();
-
-        let english_count = char_count - chinese_count;
-
-        // Estimate tokens
-        let input_tokens =
-            (chinese_count as f32 / 1.5 + english_count as f32 / 4.0).ceil() as usize;
-
-        // Add output reserve
-        input_tokens + 100
-    }
-
-    /// Check if we can afford an estimated cost.
-    pub fn can_afford(&self, estimated_cost: usize) -> bool {
-        let remaining = self.remaining_tokens();
-
-        estimated_cost <= remaining && estimated_cost <= self.config.max_tokens_per_call
-    }
-
-    /// Get remaining token budget.
-    pub fn remaining_tokens(&self) -> usize {
-        self.config
-            .max_tokens_per_query
-            .saturating_sub(self.total_tokens())
-    }
-
-    /// Get remaining call budget.
-    pub fn remaining_calls(&self) -> usize {
-        self.config
-            .max_calls_per_query
-            .saturating_sub(self.calls_made.load(Ordering::Relaxed))
-    }
-
-    /// Record token usage after an LLM call.
- /// - /// # Arguments - /// - /// * `input_tokens` - Tokens in the prompt - /// * `output_tokens` - Tokens in the response - /// * `level` - Tree level where call was made - pub fn record_usage(&self, input_tokens: usize, output_tokens: usize, level: usize) { - self.input_tokens.fetch_add(input_tokens, Ordering::Relaxed); - self.output_tokens - .fetch_add(output_tokens, Ordering::Relaxed); - self.calls_made.fetch_add(1, Ordering::Relaxed); - - // Track level calls - { - let mut level_calls = self.level_calls.write().unwrap(); - *level_calls.entry(level).or_insert(0) += 1; - } - } - - /// Get total tokens used. - pub fn total_tokens(&self) -> usize { - self.input_tokens.load(Ordering::Relaxed) + self.output_tokens.load(Ordering::Relaxed) - } - - /// Get current usage statistics. - pub fn usage(&self) -> BudgetUsage { - BudgetUsage { - input_tokens: self.input_tokens.load(Ordering::Relaxed), - output_tokens: self.output_tokens.load(Ordering::Relaxed), - calls_made: self.calls_made.load(Ordering::Relaxed), - max_tokens: self.config.max_tokens_per_query, - max_calls: self.config.max_calls_per_query, - } - } - - /// Get calls made at a specific level. - pub fn calls_at_level(&self, level: usize) -> usize { - let level_calls = self.level_calls.read().unwrap(); - level_calls.get(&level).copied().unwrap_or(0) - } - - /// Reset budget state for a new query. - pub fn reset(&self) { - self.input_tokens.store(0, Ordering::Relaxed); - self.output_tokens.store(0, Ordering::Relaxed); - self.calls_made.store(0, Ordering::Relaxed); - self.level_calls.write().unwrap().clear(); - } - - /// Get the configuration. - pub fn config(&self) -> &BudgetConfig { - &self.config - } - - /// Check if hard limit is enforced. - pub fn is_hard_limit(&self) -> bool { - self.config.hard_limit - } -} - -#[cfg(test)] -mod tests { - use super::*; - - #[test] - fn test_budget_controller_new() { - let config = BudgetConfig::default(); - let max_calls = config.max_calls_per_query; - let budget = BudgetController::new(config); - - assert!(budget.can_call()); - assert_eq!(budget.remaining_calls(), max_calls); - } - - #[test] - fn test_budget_can_call() { - let config = BudgetConfig { - max_tokens_per_query: 100, - max_calls_per_query: 2, - ..Default::default() - }; - let budget = BudgetController::new(config); - - assert!(budget.can_call()); - - budget.record_usage(50, 30, 0); - assert!(budget.can_call()); // 80 tokens, 1 call - - budget.record_usage(50, 30, 0); - assert!(!budget.can_call()); // 160 tokens, 2 calls - exceeded - } - - #[test] - fn test_budget_level_limit() { - let config = BudgetConfig { - max_calls_per_query: 10, - max_calls_per_level: 2, - ..Default::default() - }; - let budget = BudgetController::new(config); - - assert!(budget.can_call_at_level(0)); - - budget.record_usage(10, 10, 0); - budget.record_usage(10, 10, 0); - assert!(!budget.can_call_at_level(0)); // 2 calls at level 0 - assert!(budget.can_call_at_level(1)); // Can still call at level 1 - } - - #[test] - fn test_budget_estimate_cost() { - let budget = BudgetController::with_defaults(); - - // English text - 26 chars ≈ 7 tokens + 100 output reserve = ~107 - let english = "Hello world this is a test"; - let cost = budget.estimate_cost(english); - assert!( - cost > 100 && cost < 150, - "Expected cost between 100-150, got {}", - cost - ); - - // Chinese text - 6 chars ≈ 4 tokens + 100 output reserve = ~104 - let chinese = "这是一个测试"; - let cost_chinese = budget.estimate_cost(chinese); - // Both have ~100 token base from output reserve, so just check it's 
reasonable - assert!( - cost_chinese > 100, - "Expected Chinese cost > 100, got {}", - cost_chinese - ); - } - - #[test] - fn test_budget_can_afford() { - let config = BudgetConfig { - max_tokens_per_query: 200, - max_tokens_per_call: 100, - ..Default::default() - }; - let budget = BudgetController::new(config); - - assert!(budget.can_afford(50)); - assert!(budget.can_afford(100)); - assert!(!budget.can_afford(150)); // Exceeds max_tokens_per_call - - budget.record_usage(100, 50, 0); // 150 tokens used - assert!(budget.can_afford(50)); // 50 remaining - assert!(!budget.can_afford(60)); // Only 50 remaining - } - - #[test] - fn test_budget_reset() { - let budget = BudgetController::with_defaults(); - - budget.record_usage(100, 50, 0); - assert_eq!(budget.total_tokens(), 150); - assert_eq!(budget.calls_made.load(Ordering::Relaxed), 1); - - budget.reset(); - assert_eq!(budget.total_tokens(), 0); - assert_eq!(budget.calls_made.load(Ordering::Relaxed), 0); - } - - #[test] - fn test_budget_usage_stats() { - let budget = BudgetController::with_defaults(); - - budget.record_usage(100, 50, 0); - let usage = budget.usage(); - - assert_eq!(usage.input_tokens, 100); - assert_eq!(usage.output_tokens, 50); - assert_eq!(usage.calls_made, 1); - assert_eq!(usage.total_tokens(), 150); - } -} diff --git a/rust/src/retrieval/pilot/builder.rs b/rust/src/retrieval/pilot/builder.rs deleted file mode 100644 index 0be2d338..00000000 --- a/rust/src/retrieval/pilot/builder.rs +++ /dev/null @@ -1,854 +0,0 @@ -// Copyright (c) 2026 vectorless developers -// SPDX-License-Identifier: Apache-2.0 - -//! Context builder for Pilot LLM calls. -//! -//! Constructs the context information sent to the LLM, including: -//! - Current path in the document tree -//! - Candidate nodes with their summaries -//! - TOC view for navigation context -//! -//! Token budget is distributed across components: -//! - Query: 30% -//! - Current path: 20% -//! - Candidates: 40% -//! - Sibling context: 10% -//! -//! # Context Modes -//! -//! The builder supports different verbosity levels: -//! - [`Full`](ContextMode::Full): Complete context with all details -//! - [`Summary`](ContextMode::Summary): Titles and summaries only (default) -//! - [`Minimal`](ContextMode::Minimal): Minimal context for token efficiency -//! -//! # Example -//! -//! ```rust,ignore -//! use vectorless::retrieval::pilot::builder::{ContextBuilder, ContextMode}; -//! -//! // Summary mode (default) - token efficient -//! let builder = ContextBuilder::new(500) -//! .with_mode(ContextMode::Summary); -//! -//! // Full mode - maximum context -//! let builder = ContextBuilder::new(1000) -//! .with_mode(ContextMode::Full); -//! -//! // Minimal mode - ultra efficient -//! let builder = ContextBuilder::new(200) -//! .with_mode(ContextMode::Minimal); -//! ``` - -use std::collections::HashSet; - -use super::SearchState; -use crate::document::{DocumentTree, NodeId}; - -/// Context verbosity mode for LLM calls. -/// -/// Controls how much detail is included in the context sent to the LLM. -#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)] -pub enum ContextMode { - /// Full context with all details. - /// - /// - Includes complete content for current node - /// - Full summaries for all candidates - /// - Complete TOC with summaries - /// - /// Use when accuracy is more important than token cost. - Full, - - /// Summary mode with titles and summaries only (default). 
- /// - /// - Only titles for path - /// - Titles + short summaries for candidates - /// - TOC with titles only - /// - /// Best balance of context and token efficiency. - #[default] - Summary, - - /// Minimal context for maximum token efficiency. - /// - /// - Only essential path info - /// - Top candidates with titles only - /// - Abbreviated TOC - /// - /// Use when token budget is very tight. - Minimal, -} - -impl ContextMode { - /// Get the default token budget for this mode. - pub fn default_token_budget(&self) -> usize { - match self { - ContextMode::Full => 1000, - ContextMode::Summary => 500, - ContextMode::Minimal => 200, - } - } - - /// Get the maximum depth for TOC traversal. - pub fn max_toc_depth(&self) -> usize { - match self { - ContextMode::Full => 5, - ContextMode::Summary => 3, - ContextMode::Minimal => 2, - } - } - - /// Get the maximum number of candidates to include. - pub fn max_candidates(&self) -> usize { - match self { - ContextMode::Full => 15, - ContextMode::Summary => 10, - ContextMode::Minimal => 5, - } - } - - /// Check if summaries should be included for candidates. - pub fn include_summaries(&self) -> bool { - match self { - ContextMode::Full => true, - ContextMode::Summary => true, - ContextMode::Minimal => false, - } - } - - /// Get the summary truncation length (in characters). - pub fn summary_truncation(&self) -> usize { - match self { - ContextMode::Full => 500, - ContextMode::Summary => 150, - ContextMode::Minimal => 50, - } - } -} - -/// Token budget distribution for context building. -#[derive(Debug, Clone)] -pub struct TokenBudget { - /// Total tokens available. - pub total: usize, - /// Tokens for query section. - pub query: usize, - /// Tokens for current path. - pub path: usize, - /// Tokens for candidates. - pub candidates: usize, - /// Tokens for sibling context. - pub siblings: usize, -} - -impl TokenBudget { - /// Create a new token budget with the given total. - pub fn new(total: usize) -> Self { - Self { - total, - query: (total as f32 * 0.30) as usize, - path: (total as f32 * 0.20) as usize, - candidates: (total as f32 * 0.40) as usize, - siblings: (total as f32 * 0.10) as usize, - } - } - - /// Create budget with custom distribution. - pub fn with_distribution( - total: usize, - query_pct: f32, - path_pct: f32, - candidates_pct: f32, - siblings_pct: f32, - ) -> Self { - let sum = query_pct + path_pct + candidates_pct + siblings_pct; - Self { - total, - query: (total as f32 * query_pct / sum) as usize, - path: (total as f32 * path_pct / sum) as usize, - candidates: (total as f32 * candidates_pct / sum) as usize, - siblings: (total as f32 * siblings_pct / sum) as usize, - } - } -} - -impl Default for TokenBudget { - fn default() -> Self { - Self::new(500) - } -} - -/// Built context for LLM call. -#[derive(Debug, Clone, Default)] -pub struct PilotContext { - /// Formatted query section. - pub query_section: String, - /// Formatted current path. - pub path_section: String, - /// Formatted candidates section. - pub candidates_section: String, - /// Formatted TOC/sibling context. - pub toc_section: String, - /// Estimated total tokens. - pub estimated_tokens: usize, -} - -impl PilotContext { - /// Get the full context as a single string. - pub fn to_string(&self) -> String { - format!( - "{}\n{}\n{}\n{}", - self.query_section, self.path_section, self.candidates_section, self.toc_section - ) - } - - /// Check if context is empty. 
-    pub fn is_empty(&self) -> bool {
-        self.query_section.is_empty()
-            && self.path_section.is_empty()
-            && self.candidates_section.is_empty()
-    }
-
-    /// Get a hash of the query for feedback learning.
-    pub fn query_hash(&self) -> u64 {
-        use std::collections::hash_map::DefaultHasher;
-        use std::hash::{Hash, Hasher};
-        let mut hasher = DefaultHasher::new();
-        self.query_section.hash(&mut hasher);
-        hasher.finish()
-    }
-
-    /// Get a hash of the path for feedback learning.
-    pub fn path_hash(&self) -> u64 {
-        use std::collections::hash_map::DefaultHasher;
-        use std::hash::{Hash, Hasher};
-        let mut hasher = DefaultHasher::new();
-        self.path_section.hash(&mut hasher);
-        hasher.finish()
-    }
-}
-
-/// Context builder for Pilot LLM calls.
-///
-/// Builds structured context from search state, optimized for
-/// token efficiency while providing enough information for
-/// good LLM decisions.
-///
-/// # Context Modes
-///
-/// The builder supports different verbosity levels:
-/// - [`ContextMode::Full`]: Complete context with all details
-/// - [`ContextMode::Summary`]: Titles and summaries only (default)
-/// - [`ContextMode::Minimal`]: Minimal context for token efficiency
-///
-/// # Example
-///
-/// ```rust,ignore
-/// use vectorless::retrieval::pilot::builder::{ContextBuilder, ContextMode};
-///
-/// // Default summary mode
-/// let builder = ContextBuilder::new(500);
-/// let context = builder.build(&state);
-///
-/// // Full mode for maximum context
-/// let builder = ContextBuilder::new(1000).with_mode(ContextMode::Full);
-///
-/// // Minimal mode for tight token budgets
-/// let builder = ContextBuilder::new(200).with_mode(ContextMode::Minimal);
-/// ```
-pub struct ContextBuilder {
-    /// Token budget for context.
-    budget: TokenBudget,
-    /// Context verbosity mode.
-    mode: ContextMode,
-    /// Maximum candidates to include (overrides mode default).
-    max_candidates: Option<usize>,
-    /// Maximum path depth to show (overrides mode default).
-    max_path_depth: Option<usize>,
-    /// Whether to include summaries for candidates (overrides mode default).
-    include_summaries: Option<bool>,
-    /// Maximum TOC depth (overrides mode default).
-    max_toc_depth: Option<usize>,
-    /// Summary truncation length (overrides mode default).
-    summary_truncation: Option<usize>,
-}
-
-impl Default for ContextBuilder {
-    fn default() -> Self {
-        Self::new(500)
-    }
-}
-
-impl ContextBuilder {
-    /// Create a new context builder with the given token budget.
-    ///
-    /// Uses [`ContextMode::Summary`] by default.
-    pub fn new(token_budget: usize) -> Self {
-        Self {
-            budget: TokenBudget::new(token_budget),
-            mode: ContextMode::default(),
-            max_candidates: None,
-            max_path_depth: None,
-            include_summaries: None,
-            max_toc_depth: None,
-            summary_truncation: None,
-        }
-    }
-
-    /// Create with custom budget object.
-    pub fn with_budget(budget: TokenBudget) -> Self {
-        Self {
-            budget,
-            mode: ContextMode::default(),
-            max_candidates: None,
-            max_path_depth: None,
-            include_summaries: None,
-            max_toc_depth: None,
-            summary_truncation: None,
-        }
-    }
-
-    /// Set the context mode.
- /// - /// This controls the verbosity of the context: - /// - `Full`: Complete context with all details - /// - `Summary`: Titles and summaries only (default) - /// - `Minimal`: Minimal context for token efficiency - pub fn with_mode(mut self, mode: ContextMode) -> Self { - self.mode = mode; - // Update budget if not explicitly set - if self.budget.total < mode.default_token_budget() { - self.budget = TokenBudget::new(mode.default_token_budget()); - } - self - } - - /// Set maximum candidates to include (overrides mode default). - pub fn with_max_candidates(mut self, max: usize) -> Self { - self.max_candidates = Some(max); - self - } - - /// Set maximum path depth to show (overrides mode default). - pub fn with_max_path_depth(mut self, max: usize) -> Self { - self.max_path_depth = Some(max); - self - } - - /// Set whether to include summaries for candidates (overrides mode default). - pub fn with_summaries(mut self, include: bool) -> Self { - self.include_summaries = Some(include); - self - } - - /// Set maximum TOC depth (overrides mode default). - pub fn with_max_toc_depth(mut self, depth: usize) -> Self { - self.max_toc_depth = Some(depth); - self - } - - /// Set summary truncation length (overrides mode default). - pub fn with_summary_truncation(mut self, len: usize) -> Self { - self.summary_truncation = Some(len); - self - } - - /// Get the effective max candidates (mode default or override). - fn effective_max_candidates(&self) -> usize { - self.max_candidates - .unwrap_or_else(|| self.mode.max_candidates()) - } - - /// Get the effective max path depth (mode default or override). - fn effective_max_path_depth(&self) -> usize { - self.max_path_depth.unwrap_or(5) - } - - /// Get the effective include summaries setting (mode default or override). - fn effective_include_summaries(&self) -> bool { - self.include_summaries - .unwrap_or_else(|| self.mode.include_summaries()) - } - - /// Get the effective max TOC depth (mode default or override). - fn effective_max_toc_depth(&self) -> usize { - self.max_toc_depth - .unwrap_or_else(|| self.mode.max_toc_depth()) - } - - /// Get the effective summary truncation length (mode default or override). - fn effective_summary_truncation(&self) -> usize { - self.summary_truncation - .unwrap_or_else(|| self.mode.summary_truncation()) - } - - /// Get the current mode. - pub fn mode(&self) -> ContextMode { - self.mode - } - - /// Build context from search state. - pub fn build(&self, state: &SearchState<'_>) -> PilotContext { - let mut ctx = PilotContext::default(); - - // Build query section - ctx.query_section = self.build_query_section(state.query); - ctx.estimated_tokens += self.estimate_tokens(&ctx.query_section); - - // Build path section - ctx.path_section = self.build_path_section(state.tree, state.path, state.step_reasons); - ctx.estimated_tokens += self.estimate_tokens(&ctx.path_section); - - // Build candidates section - ctx.candidates_section = self.build_candidates_section(state.tree, state.candidates); - ctx.estimated_tokens += self.estimate_tokens(&ctx.candidates_section); - - // Build TOC section (siblings context) - ctx.toc_section = self.build_toc_section(state.tree, state.path); - ctx.estimated_tokens += self.estimate_tokens(&ctx.toc_section); - - ctx - } - - /// Build context for START intervention point. 
-    pub fn build_start_context(&self, tree: &DocumentTree, query: &str) -> PilotContext {
-        let mut ctx = PilotContext::default();
-
-        // Build query section
-        ctx.query_section = self.build_query_section(query);
-        ctx.estimated_tokens += self.estimate_tokens(&ctx.query_section);
-
-        // Build full TOC for start
-        ctx.toc_section = self.build_full_toc(tree);
-        ctx.estimated_tokens += self.estimate_tokens(&ctx.toc_section);
-
-        ctx
-    }
-
-    /// Build context for BACKTRACK intervention point.
-    pub fn build_backtrack_context(
-        &self,
-        state: &SearchState<'_>,
-        failed_path: &[NodeId],
-    ) -> PilotContext {
-        let mut ctx = PilotContext::default();
-
-        // Build query section
-        ctx.query_section = self.build_query_section(state.query);
-        ctx.estimated_tokens += self.estimate_tokens(&ctx.query_section);
-
-        // Show failed path
-        ctx.path_section = format!(
-            "Failed path:\n{}",
-            self.build_path_section(state.tree, failed_path, None)
-        );
-        ctx.estimated_tokens += self.estimate_tokens(&ctx.path_section);
-
-        // Show unvisited alternatives
-        ctx.candidates_section = self.build_unvisited_section(state.tree, state.visited);
-        ctx.estimated_tokens += self.estimate_tokens(&ctx.candidates_section);
-
-        ctx
-    }
-
-    /// Build query section.
-    fn build_query_section(&self, query: &str) -> String {
-        // Truncate if needed
-        let truncated = if query.chars().count() > self.budget.query * 4 {
-            let chars: Vec<char> = query.chars().take(self.budget.query * 4).collect();
-            format!("{}...", chars.into_iter().collect::<String>())
-        } else {
-            query.to_string()
-        };
-
-        format!("User Query:\n{}\n", truncated)
-    }
-
-    /// Build current path section with optional per-step reasoning.
-    fn build_path_section(
-        &self,
-        tree: &DocumentTree,
-        path: &[NodeId],
-        step_reasons: Option<&[Option<String>]>,
-    ) -> String {
-        if path.is_empty() {
-            return "Current Position: Root\n".to_string();
-        }
-
-        let has_reasons = step_reasons
-            .map(|r| r.iter().any(|x| x.is_some()))
-            .unwrap_or(false);
-
-        if !has_reasons {
-            // Original breadcrumb format when no reasoning available
-            let mut result = String::from("Current Path:\n");
-            result.push_str("Root");
-
-            let max_depth = self.effective_max_path_depth();
-            let start = if path.len() > max_depth {
-                path.len() - max_depth
-            } else {
-                0
-            };
-
-            if start > 0 {
-                result.push_str(" → ...");
-            }
-
-            for node_id in path.iter().skip(start) {
-                if let Some(node) = tree.get(*node_id) {
-                    result.push_str(" → ");
-                    result.push_str(&node.title);
-                }
-            }
-
-            result.push('\n');
-            return result;
-        }
-
-        // Enhanced format with per-step reasoning
-        let mut result = String::from("Navigation History:\n");
-        let reasons = step_reasons.unwrap();
-
-        for (i, node_id) in path.iter().enumerate() {
-            if let Some(node) = tree.get(*node_id) {
-                let reason = reasons
-                    .get(i)
-                    .and_then(|r| r.as_deref())
-                    .unwrap_or("(automatic selection)");
-                result.push_str(&format!(
-                    "  Step {}: {} — because: {}\n",
-                    i + 1,
-                    node.title,
-                    reason
-                ));
-            }
-        }
-
-        result
-    }
-
-    /// Build candidates section with dynamic truncation.
- fn build_candidates_section(&self, tree: &DocumentTree, candidates: &[NodeId]) -> String { - if candidates.is_empty() { - return "Candidates: (none)\n".to_string(); - } - - let mut result = String::from("Candidate Nodes:\n"); - let mut tokens_used = 0; - let max_tokens = self.budget.candidates; - let max_candidates = self.effective_max_candidates(); - let include_summaries = self.effective_include_summaries(); - let summary_trunc = self.effective_summary_truncation(); - - for (i, node_id) in candidates.iter().take(max_candidates).enumerate() { - if tokens_used >= max_tokens { - result.push_str("... (more candidates omitted)\n"); - break; - } - - if let Some(node) = tree.get(*node_id) { - let entry = if include_summaries && !node.summary.is_empty() { - let truncated_summary = self.truncate_text(&node.summary, summary_trunc); - format!("{}. {} [{}]\n", i + 1, node.title, truncated_summary) - } else { - format!("{}. {}\n", i + 1, node.title) - }; - - tokens_used += self.estimate_tokens(&entry); - result.push_str(&entry); - } - } - - result - } - - /// Build TOC section showing siblings. - fn build_toc_section(&self, tree: &DocumentTree, path: &[NodeId]) -> String { - if path.is_empty() { - return String::new(); - } - - // Get parent of current node - let parent_id = if path.len() >= 2 { - path[path.len() - 2] - } else { - tree.root() - }; - - let siblings = tree.children(parent_id); - if siblings.len() <= 1 { - return String::new(); - } - - let current_id = path[path.len() - 1]; - let mut result = String::from("Sibling Context:\n"); - - for sibling_id in siblings.iter().take(8) { - if let Some(node) = tree.get(*sibling_id) { - let marker = if *sibling_id == current_id { - "⭐ " - } else { - "" - }; - result.push_str(&format!(" {}{}\n", marker, node.title)); - } - } - - result - } - - /// Build full TOC for start context. - fn build_full_toc(&self, tree: &DocumentTree) -> String { - let mut result = String::from("Document Structure:\n"); - let mut tokens_used = 0; - let max_tokens = self.budget.siblings + self.budget.candidates; - let max_depth = self.effective_max_toc_depth(); - let include_summaries = self.effective_include_summaries(); - let summary_trunc = self.effective_summary_truncation(); - - self.build_toc_recursive( - tree, - tree.root(), - 0, - &mut result, - &mut tokens_used, - max_tokens, - max_depth, - include_summaries, - summary_trunc, - ); - - result - } - - /// Recursive helper for building TOC. - fn build_toc_recursive( - &self, - tree: &DocumentTree, - node_id: NodeId, - depth: usize, - result: &mut String, - tokens_used: &mut usize, - max_tokens: usize, - max_depth: usize, - include_summaries: bool, - summary_trunc: usize, - ) { - if *tokens_used >= max_tokens || depth > max_depth { - return; - } - - if let Some(node) = tree.get(node_id) { - let indent = " ".repeat(depth); - let entry = if include_summaries && !node.summary.is_empty() && depth < 2 { - let truncated = self.truncate_text(&node.summary, summary_trunc); - format!("{}{} [{}]\n", indent, node.title, truncated) - } else { - format!("{}{}\n", indent, node.title) - }; - *tokens_used += entry.len() / 4; // Rough estimate - result.push_str(&entry); - - // Only show children for first few levels - if depth < max_depth { - for child_id in tree.children(node_id) { - self.build_toc_recursive( - tree, - child_id, - depth + 1, - result, - tokens_used, - max_tokens, - max_depth, - include_summaries, - summary_trunc, - ); - } - } - } - } - - /// Build section showing unvisited nodes. 
-    fn build_unvisited_section(&self, tree: &DocumentTree, visited: &HashSet<NodeId>) -> String {
-        let mut result = String::from("Unvisited Alternatives:\n");
-        let mut count = 0;
-
-        // Find unvisited nodes from root's children
-        for child_id in tree.children(tree.root()) {
-            if !visited.contains(&child_id) {
-                if let Some(node) = tree.get(child_id) {
-                    result.push_str(&format!("• {} [{}]\n", node.title, node.summary));
-                    count += 1;
-                    if count >= 5 {
-                        break;
-                    }
-                }
-            }
-        }
-
-        if count == 0 {
-            result.push_str("(all branches explored)\n");
-        }
-
-        result
-    }
-
-    /// Truncate text to a maximum character length.
-    ///
-    /// Adds "..." if truncation occurs.
-    fn truncate_text(&self, text: &str, max_chars: usize) -> String {
-        if text.chars().count() <= max_chars {
-            text.to_string()
-        } else {
-            let truncated: String = text.chars().take(max_chars).collect();
-            // Try to break at word boundary
-            if let Some(last_space) = truncated.rfind(' ') {
-                if last_space > max_chars / 2 {
-                    format!("{}...", &truncated[..last_space])
-                } else {
-                    format!("{}...", truncated)
-                }
-            } else {
-                format!("{}...", truncated)
-            }
-        }
-    }
-
-    /// Estimate token count for a string.
-    fn estimate_tokens(&self, text: &str) -> usize {
-        // Rough estimation: 1 token ≈ 4 chars (English) or 1.5 chars (Chinese)
-        let char_count = text.chars().count();
-        let chinese_count = text
-            .chars()
-            .filter(|c| ('\u{4E00}'..='\u{9FFF}').contains(c))
-            .count();
-        let english_count = char_count - chinese_count;
-
-        (chinese_count as f32 / 1.5 + english_count as f32 / 4.0).ceil() as usize
-    }
-
-    /// Get the token budget.
-    pub fn budget(&self) -> &TokenBudget {
-        &self.budget
-    }
-}
-
-#[cfg(test)]
-mod tests {
-    use super::*;
-    use indextree::Arena;
-
-    fn create_test_tree() -> DocumentTree {
-        let mut arena = Arena::new();
-        let root = arena.new_node(crate::document::TreeNode {
-            title: "Root".to_string(),
-            content: "Root content".to_string(),
-            summary: "Root summary".to_string(),
-            depth: 0,
-            ..Default::default()
-        });
-
-        let child1 = arena.new_node(crate::document::TreeNode {
-            title: "Configuration".to_string(),
-            content: "Config content".to_string(),
-            summary: "Configuration options".to_string(),
-            depth: 1,
-            ..Default::default()
-        });
-
-        let child2 = arena.new_node(crate::document::TreeNode {
-            title: "API Reference".to_string(),
-            content: "API content".to_string(),
-            summary: "API documentation".to_string(),
-            depth: 1,
-            ..Default::default()
-        });
-
-        root.append(child1, &mut arena);
-        root.append(child2, &mut arena);
-
-        DocumentTree::from_raw(arena, crate::document::NodeId(root))
-    }
-
-    #[test]
-    fn test_token_budget_distribution() {
-        let budget = TokenBudget::new(500);
-        assert_eq!(budget.query, 150); // 30%
-        assert_eq!(budget.path, 100); // 20%
-        assert_eq!(budget.candidates, 200); // 40%
-        assert_eq!(budget.siblings, 50); // 10%
-    }
-
-    #[test]
-    fn test_context_builder_creation() {
-        let builder = ContextBuilder::new(500);
-        assert_eq!(builder.effective_max_candidates(), 10); // Default from Summary mode
-        assert_eq!(builder.effective_max_path_depth(), 5);
-        assert!(builder.effective_include_summaries());
-    }
-
-    #[test]
-    fn test_build_query_section() {
-        let builder = ContextBuilder::new(500);
-        let result = builder.build_query_section("How to configure PostgreSQL?");
-        assert!(result.contains("How to configure PostgreSQL?"));
-        assert!(result.starts_with("User Query:"));
-    }
-
-    #[test]
-    fn test_build_query_section_truncation() {
-        let builder = ContextBuilder::new(20); // Very small budget - 20 * 0.30 = 6
tokens for query = ~24 chars - let long_query = "This is a very long query that should be truncated because it exceeds the token budget"; - let result = builder.build_query_section(long_query); - assert!( - result.contains("..."), - "Expected truncation, got: {}", - result - ); - } - - #[test] - fn test_estimate_tokens_english() { - let builder = ContextBuilder::new(500); - let text = "Hello world"; // 11 chars ≈ 3 tokens - let tokens = builder.estimate_tokens(text); - assert!(tokens >= 2 && tokens <= 4); - } - - #[test] - fn test_estimate_tokens_chinese() { - let builder = ContextBuilder::new(500); - let text = "这是一个测试"; // 6 chars ≈ 4 tokens - let tokens = builder.estimate_tokens(text); - assert!(tokens >= 3 && tokens <= 5); - } - - #[test] - fn test_pilot_context_to_string() { - let ctx = PilotContext { - query_section: "Query".to_string(), - path_section: "Path".to_string(), - candidates_section: "Candidates".to_string(), - toc_section: "TOC".to_string(), - estimated_tokens: 100, - }; - - let result = ctx.to_string(); - assert!(result.contains("Query")); - assert!(result.contains("Path")); - assert!(result.contains("Candidates")); - assert!(result.contains("TOC")); - } - - #[test] - fn test_pilot_context_is_empty() { - let empty = PilotContext::default(); - assert!(empty.is_empty()); - - let non_empty = PilotContext { - query_section: "Query".to_string(), - ..Default::default() - }; - assert!(!non_empty.is_empty()); - } -} diff --git a/rust/src/retrieval/pilot/complexity.rs b/rust/src/retrieval/pilot/complexity.rs deleted file mode 100644 index 8348b4f8..00000000 --- a/rust/src/retrieval/pilot/complexity.rs +++ /dev/null @@ -1,68 +0,0 @@ -// Copyright (c) 2026 vectorless developers -// SPDX-License-Identifier: Apache-2.0 - -//! LLM-based query complexity detection. -//! -//! Uses the Pilot's LLM client to classify query complexity. -//! Falls back to heuristic rules when LLM is unavailable or fails. - -use serde::Deserialize; - -use super::super::complexity::QueryComplexity; -use crate::llm::LlmClient; - -/// LLM response schema for complexity classification. -#[derive(Debug, Deserialize)] -struct ComplexityResponse { - complexity: String, -} - -/// System prompt for complexity classification. -const SYSTEM_PROMPT: &str = include_str!("prompts/system_complexity.txt"); -/// User prompt template. -const USER_PROMPT: &str = include_str!("prompts/user_complexity.txt"); - -/// Detect query complexity using LLM. -/// -/// Returns `None` if the LLM call fails (caller should fall back to heuristic). 
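Per the doc comment above, degradation is the caller's responsibility, not this helper's. A sketch of the intended call-site shape, where `heuristic_detect` stands in for the rule-based detector (the name is an assumption):

```rust
// Hypothetical caller: LLM classification first, heuristic fallback on None.
let complexity = match detect_with_llm(&client, &query).await {
    Some(c) => c,
    None => heuristic_detect(&query), // assumed rule-based fallback
};
```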
-pub async fn detect_with_llm(client: &LlmClient, query: &str) -> Option<QueryComplexity> {
-    let user = USER_PROMPT.replace("{query}", query);
-
-    let resp: ComplexityResponse = client
-        .complete_json_with_max_tokens(SYSTEM_PROMPT, &user, 80)
-        .await
-        .ok()?;
-
-    let complexity = match resp.complexity.to_lowercase().as_str() {
-        "simple" => QueryComplexity::Simple,
-        "complex" => QueryComplexity::Complex,
-        _ => QueryComplexity::Medium,
-    };
-
-    tracing::debug!(
-        "LLM complexity detection: query='{}', result={:?}",
-        query,
-        complexity
-    );
-
-    Some(complexity)
-}
-
-#[cfg(test)]
-mod tests {
-    use super::*;
-
-    #[test]
-    fn test_system_prompt_not_empty() {
-        assert!(!SYSTEM_PROMPT.is_empty());
-        assert!(SYSTEM_PROMPT.contains("simple"));
-        assert!(SYSTEM_PROMPT.contains("complex"));
-    }
-
-    #[test]
-    fn test_user_prompt_template() {
-        assert!(USER_PROMPT.contains("{query}"));
-        let filled = USER_PROMPT.replace("{query}", "test query");
-        assert!(filled.contains("test query"));
-    }
-}
diff --git a/rust/src/retrieval/pilot/config.rs b/rust/src/retrieval/pilot/config.rs
deleted file mode 100644
index 1d2c3a04..00000000
--- a/rust/src/retrieval/pilot/config.rs
+++ /dev/null
@@ -1,460 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Configuration types for Pilot.
-//!
-//! This module defines all configuration structures that control
-//! Pilot's behavior, including budget limits, intervention thresholds,
-//! and operation modes.
-
-use serde::{Deserialize, Serialize};
-
-/// Main Pilot configuration.
-///
-/// Controls all aspects of Pilot behavior including budget,
-/// intervention strategy, and feature flags.
-#[derive(Debug, Clone, Serialize, Deserialize)]
-pub struct PilotConfig {
-    /// Operation mode controlling how aggressively Pilot intervenes.
-    pub mode: PilotMode,
-    /// Token and call budget constraints.
-    pub budget: BudgetConfig,
-    /// Intervention threshold settings.
-    pub intervention: InterventionConfig,
-    /// Whether to provide guidance at search start.
-    pub guide_at_start: bool,
-    /// Whether to provide guidance during backtracking.
-    pub guide_at_backtrack: bool,
-    /// Optional path to custom prompt templates.
-    pub prompt_template_path: Option<String>,
-    /// Pre-filtering configuration for reducing candidates before Pilot.
-    pub prefilter: PrefilterConfig,
-    /// Binary pruning configuration for quick relevance filtering.
-    pub prune: PruneConfig,
-}
-
-impl Default for PilotConfig {
-    fn default() -> Self {
-        Self {
-            mode: PilotMode::Balanced,
-            budget: BudgetConfig::default(),
-            intervention: InterventionConfig::default(),
-            guide_at_start: true,
-            guide_at_backtrack: true,
-            prompt_template_path: None,
-            prefilter: PrefilterConfig::default(),
-            prune: PruneConfig::default(),
-        }
-    }
-}
-
-impl PilotConfig {
-    /// Create a new config with specified mode.
-    pub fn with_mode(mode: PilotMode) -> Self {
-        Self {
-            mode,
-            ..Default::default()
-        }
-    }
-
-    /// Create a high-quality config (more LLM calls, generous pre-filter).
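The three presets defined below (`high_quality`, `low_cost`, `algorithm_only`) trade retrieval quality against LLM cost along a single axis. A sketch of selecting one at startup; the environment flag is purely hypothetical:

```rust
// Hypothetical profile switch; PilotConfig and its presets are from this file.
let pilot = match std::env::var("PILOT_PROFILE").as_deref() {
    Ok("high") => PilotConfig::high_quality(),
    Ok("low") => PilotConfig::low_cost(),
    Ok("off") => PilotConfig::algorithm_only(),
    _ => PilotConfig::default(), // Balanced mode
};
```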
- pub fn high_quality() -> Self { - Self { - mode: PilotMode::Aggressive, - budget: BudgetConfig { - max_tokens_per_query: 5000, - max_tokens_per_call: 1000, - max_calls_per_query: 10, - max_calls_per_level: 3, - hard_limit: false, - }, - intervention: InterventionConfig { - fork_threshold: 2, - score_gap_threshold: 0.2, - low_score_threshold: 0.4, - max_interventions_per_level: 3, - }, - guide_at_start: true, - guide_at_backtrack: true, - prompt_template_path: None, - prefilter: PrefilterConfig { - threshold: 20, - max_to_pilot: 20, - enabled: true, - }, - prune: PruneConfig { - enabled: true, - threshold: 25, - min_keep: 5, - }, - } - } - - /// Create a low-cost config (fewer LLM calls, aggressive pre-filter). - pub fn low_cost() -> Self { - Self { - mode: PilotMode::Conservative, - budget: BudgetConfig { - max_tokens_per_query: 500, - max_tokens_per_call: 200, - max_calls_per_query: 2, - max_calls_per_level: 1, - hard_limit: true, - }, - intervention: InterventionConfig { - fork_threshold: 5, - score_gap_threshold: 0.1, - low_score_threshold: 0.2, - max_interventions_per_level: 1, - }, - guide_at_start: false, - guide_at_backtrack: true, - prompt_template_path: None, - prefilter: PrefilterConfig { - threshold: 8, - max_to_pilot: 8, - enabled: true, - }, - prune: PruneConfig { - enabled: true, - threshold: 12, - min_keep: 2, - }, - } - } - - /// Create a pure algorithm config (no LLM calls). - pub fn algorithm_only() -> Self { - Self { - mode: PilotMode::AlgorithmOnly, - prefilter: PrefilterConfig { - threshold: 15, - max_to_pilot: 15, - enabled: false, - }, - prune: PruneConfig { - enabled: false, - threshold: 20, - min_keep: 3, - }, - ..Default::default() - } - } -} - -/// Pilot operation mode. -/// -/// Controls the trade-off between LLM usage and algorithm-only search. -#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize, Default)] -pub enum PilotMode { - /// Aggressive mode: frequent LLM calls for maximum accuracy. - Aggressive, - /// Balanced mode: LLM calls at key decision points (default). - #[default] - Balanced, - /// Conservative mode: minimal LLM calls, rely more on algorithm. - Conservative, - /// Pure algorithm mode: no LLM calls at all. - AlgorithmOnly, -} - -impl PilotMode { - /// Check if this mode uses LLM at all. - pub fn uses_llm(&self) -> bool { - !matches!(self, PilotMode::AlgorithmOnly) - } - - /// Get the fork threshold multiplier for this mode. - pub fn fork_threshold_multiplier(&self) -> f32 { - match self { - PilotMode::Aggressive => 0.5, // Lower threshold = more interventions - PilotMode::Balanced => 1.0, - PilotMode::Conservative => 2.0, // Higher threshold = fewer interventions - PilotMode::AlgorithmOnly => f32::MAX, - } - } -} - -/// Token and call budget configuration. -/// -/// Controls resource consumption during retrieval. -#[derive(Debug, Clone, Serialize, Deserialize)] -pub struct BudgetConfig { - /// Maximum total tokens per query (input + output). - pub max_tokens_per_query: usize, - /// Maximum tokens per single LLM call. - pub max_tokens_per_call: usize, - /// Maximum number of LLM calls per query. - pub max_calls_per_query: usize, - /// Maximum number of LLM calls per tree level. - pub max_calls_per_level: usize, - /// Whether to enforce hard limits (true) or soft limits with warnings (false). 
- pub hard_limit: bool, -} - -impl Default for BudgetConfig { - fn default() -> Self { - Self { - max_tokens_per_query: 2000, - max_tokens_per_call: 500, - max_calls_per_query: 5, - max_calls_per_level: 2, - hard_limit: true, - } - } -} - -impl BudgetConfig { - /// Check if a given token count is within budget. - pub fn is_within_budget(&self, used: usize) -> bool { - used < self.max_tokens_per_query - } - - /// Get remaining tokens given current usage. - pub fn remaining_tokens(&self, used: usize) -> usize { - self.max_tokens_per_query.saturating_sub(used) - } -} - -/// Intervention threshold configuration. -/// -/// Controls when Pilot decides to intervene in the search process. -#[derive(Debug, Clone, Serialize, Deserialize)] -pub struct InterventionConfig { - /// Minimum number of candidates to trigger fork intervention. - pub fork_threshold: usize, - /// Score gap threshold (intervene when top scores are within this range). - pub score_gap_threshold: f32, - /// Low score threshold (intervene when best score is below this). - pub low_score_threshold: f32, - /// Maximum interventions allowed per tree level. - pub max_interventions_per_level: usize, -} - -impl Default for InterventionConfig { - fn default() -> Self { - Self { - fork_threshold: 3, - score_gap_threshold: 0.15, - low_score_threshold: 0.3, - max_interventions_per_level: 2, - } - } -} - -impl InterventionConfig { - /// Check if the candidate count triggers intervention. - pub fn should_intervene_at_fork(&self, candidate_count: usize) -> bool { - candidate_count > self.fork_threshold - } - - /// Check if scores are too close (algorithm uncertain). - pub fn scores_are_close(&self, scores: &[f32]) -> bool { - if scores.len() < 2 { - return false; - } - let max_score = scores.iter().cloned().fold(0.0, f32::max); - let min_score = scores.iter().cloned().fold(1.0, f32::min); - (max_score - min_score) < self.score_gap_threshold - } - - /// Check if the best score is too low. - pub fn is_low_confidence(&self, best_score: f32) -> bool { - best_score < self.low_score_threshold - } -} - -/// Configuration for NodeScorer-based pre-filtering before Pilot scoring. -/// -/// When a node has many children, sending all to the LLM is wasteful. -/// Pre-filtering uses cheap NodeScorer (keyword/BM25) to narrow the -/// candidate set before expensive Pilot (LLM) scoring. -#[derive(Debug, Clone, Serialize, Deserialize)] -pub struct PrefilterConfig { - /// Minimum number of candidates to trigger pre-filtering. - /// - /// When `candidates.len()` exceeds this threshold, NodeScorer - /// pre-filters before sending to Pilot. - /// Default: 15. - pub threshold: usize, - - /// Maximum number of candidates passed to Pilot after pre-filtering. - /// - /// NodeScorer's top-N are kept; the rest get NodeScorer-only scores. - /// Default: 15. - pub max_to_pilot: usize, - - /// Whether pre-filtering is enabled. - /// Default: true. - pub enabled: bool, -} - -impl Default for PrefilterConfig { - fn default() -> Self { - Self { - threshold: 15, - max_to_pilot: 15, - enabled: true, - } - } -} - -impl PrefilterConfig { - /// Check if pre-filtering should be applied given the candidate count. - pub fn should_prefilter(&self, candidate_count: usize) -> bool { - self.enabled && candidate_count > self.threshold - } -} - -/// Configuration for binary pruning before full Pilot scoring. -/// -/// After P2 pre-filtering, if candidates still exceed this threshold, -/// a lightweight LLM call asks "which are relevant?" before the full -/// scoring call. 
This reduces the number of candidates that receive -/// expensive detailed scoring. -#[derive(Debug, Clone, Serialize, Deserialize)] -pub struct PruneConfig { - /// Whether binary pruning is enabled. - /// Default: true. - pub enabled: bool, - - /// Trigger threshold — binary prune activates when the candidate - /// count (after P2 pre-filtering) exceeds this value. - /// Default: 20. - pub threshold: usize, - - /// Minimum candidates to keep after pruning, even if LLM says - /// fewer are relevant. Prevents over-aggressive pruning. - /// Default: 3. - pub min_keep: usize, -} - -impl Default for PruneConfig { - fn default() -> Self { - Self { - enabled: true, - threshold: 20, - min_keep: 3, - } - } -} - -impl PruneConfig { - /// Check if binary pruning should be applied given the candidate count. - pub fn should_prune(&self, candidate_count: usize) -> bool { - self.enabled && candidate_count > self.threshold - } -} - -#[cfg(test)] -mod tests { - use super::*; - - #[test] - fn test_pilot_mode_uses_llm() { - assert!(PilotMode::Aggressive.uses_llm()); - assert!(PilotMode::Balanced.uses_llm()); - assert!(PilotMode::Conservative.uses_llm()); - assert!(!PilotMode::AlgorithmOnly.uses_llm()); - } - - #[test] - fn test_budget_config() { - let config = BudgetConfig::default(); - assert!(config.is_within_budget(1000)); - assert!(!config.is_within_budget(3000)); - assert_eq!(config.remaining_tokens(1500), 500); - } - - #[test] - fn test_intervention_config() { - let config = InterventionConfig::default(); - - // Fork threshold - assert!(!config.should_intervene_at_fork(2)); - assert!(config.should_intervene_at_fork(4)); - - // Scores close - assert!(config.scores_are_close(&[0.5, 0.55, 0.52])); - assert!(!config.scores_are_close(&[0.3, 0.8])); - - // Low confidence - assert!(config.is_low_confidence(0.2)); - assert!(!config.is_low_confidence(0.5)); - } - - #[test] - fn test_pilot_config_presets() { - let high = PilotConfig::high_quality(); - assert_eq!(high.mode, PilotMode::Aggressive); - assert!(high.prefilter.enabled); - assert_eq!(high.prefilter.threshold, 20); - - let low = PilotConfig::low_cost(); - assert_eq!(low.mode, PilotMode::Conservative); - assert!(low.prefilter.enabled); - assert_eq!(low.prefilter.threshold, 8); - - let algo = PilotConfig::algorithm_only(); - assert_eq!(algo.mode, PilotMode::AlgorithmOnly); - assert!(!algo.prefilter.enabled); - } - - #[test] - fn test_prefilter_config_default() { - let cfg = PrefilterConfig::default(); - assert!(cfg.enabled); - assert_eq!(cfg.threshold, 15); - assert_eq!(cfg.max_to_pilot, 15); - } - - #[test] - fn test_prefilter_should_prefilter() { - let cfg = PrefilterConfig::default(); - assert!(!cfg.should_prefilter(15)); // at threshold - assert!(!cfg.should_prefilter(10)); // below - assert!(cfg.should_prefilter(16)); // above - - let disabled = PrefilterConfig { - enabled: false, - ..Default::default() - }; - assert!(!disabled.should_prefilter(100)); - } - - #[test] - fn test_prune_config_default() { - let cfg = PruneConfig::default(); - assert!(cfg.enabled); - assert_eq!(cfg.threshold, 20); - assert_eq!(cfg.min_keep, 3); - } - - #[test] - fn test_prune_should_prune() { - let cfg = PruneConfig::default(); - assert!(!cfg.should_prune(20)); // at threshold - assert!(!cfg.should_prune(15)); // below - assert!(cfg.should_prune(21)); // above - - let disabled = PruneConfig { - enabled: false, - ..Default::default() - }; - assert!(!disabled.should_prune(100)); - } - - #[test] - fn test_pilot_config_presets_prune() { - let high = 
PilotConfig::high_quality(); - assert!(high.prune.enabled); - assert_eq!(high.prune.threshold, 25); - - let low = PilotConfig::low_cost(); - assert!(low.prune.enabled); - assert_eq!(low.prune.threshold, 12); - - let algo = PilotConfig::algorithm_only(); - assert!(!algo.prune.enabled); - } -} diff --git a/rust/src/retrieval/pilot/decision.rs b/rust/src/retrieval/pilot/decision.rs deleted file mode 100644 index 31ee5677..00000000 --- a/rust/src/retrieval/pilot/decision.rs +++ /dev/null @@ -1,322 +0,0 @@ -// Copyright (c) 2026 vectorless developers -// SPDX-License-Identifier: Apache-2.0 - -//! Decision types for Pilot navigation. -//! -//! This module defines the types that represent Pilot's navigation decisions, -//! including direction recommendations, candidate rankings, and intervention points. - -use serde::{Deserialize, Serialize}; - -use crate::document::NodeId; - -/// Pilot's navigation decision result. -/// -/// Contains all information about where to go next and why. -#[derive(Debug, Clone, Serialize, Deserialize)] -pub struct PilotDecision { - /// Ranked list of candidate nodes (most relevant first). - pub ranked_candidates: Vec, - /// Recommended search direction. - pub direction: SearchDirection, - /// Confidence level of this decision (0.0 - 1.0). - pub confidence: f32, - /// Human-readable explanation of the decision. - pub reasoning: String, - /// The intervention point that triggered this decision. - pub intervention_point: InterventionPoint, -} - -impl Default for PilotDecision { - fn default() -> Self { - Self { - ranked_candidates: Vec::new(), - direction: SearchDirection::GoDeeper { - reason: "Default decision".to_string(), - }, - confidence: 0.0, - reasoning: "No specific guidance available".to_string(), - intervention_point: InterventionPoint::Evaluate, - } - } -} - -impl PilotDecision { - /// Create a new decision with the given candidates and direction. - pub fn new( - ranked_candidates: Vec, - direction: SearchDirection, - confidence: f32, - reasoning: String, - ) -> Self { - Self { - ranked_candidates, - direction, - confidence, - reasoning, - intervention_point: InterventionPoint::Fork, - } - } - - /// Create a decision that preserves original order (no-op). - pub fn preserve_order(candidates: &[NodeId]) -> Self { - Self { - ranked_candidates: candidates - .iter() - .enumerate() - .map(|(i, &id)| RankedCandidate { - node_id: id, - score: 1.0 - (i as f32 * 0.1), - reason: None, - }) - .collect(), - direction: SearchDirection::GoDeeper { - reason: "Preserving original order".to_string(), - }, - confidence: 0.0, - reasoning: "No intervention performed".to_string(), - intervention_point: InterventionPoint::Fork, - } - } - - /// Check if this decision has any ranked candidates. - pub fn has_candidates(&self) -> bool { - !self.ranked_candidates.is_empty() - } - - /// Get the top-ranked candidate. - pub fn top_candidate(&self) -> Option<&RankedCandidate> { - self.ranked_candidates.first() - } - - /// Get node IDs in ranked order. - pub fn ranked_node_ids(&self) -> Vec { - self.ranked_candidates.iter().map(|c| c.node_id).collect() - } - - /// Check if the decision indicates an answer was found. - pub fn found_answer(&self) -> bool { - matches!(self.direction, SearchDirection::FoundAnswer { .. }) - } - - /// Check if the decision indicates backtracking is needed. - pub fn needs_backtrack(&self) -> bool { - matches!(self.direction, SearchDirection::Backtrack { .. }) - } -} - -/// A ranked candidate node with score and optional reason. 
-#[derive(Debug, Clone, Serialize, Deserialize)]
-pub struct RankedCandidate {
-    /// The node ID.
-    pub node_id: NodeId,
-    /// Relevance score (0.0 - 1.0).
-    pub score: f32,
-    /// Optional reason for this ranking.
-    pub reason: Option<String>,
-}
-
-impl RankedCandidate {
-    /// Create a new ranked candidate.
-    pub fn new(node_id: NodeId, score: f32) -> Self {
-        Self {
-            node_id,
-            score,
-            reason: None,
-        }
-    }
-
-    /// Create with a reason.
-    pub fn with_reason(node_id: NodeId, score: f32, reason: impl Into<String>) -> Self {
-        Self {
-            node_id,
-            score,
-            reason: Some(reason.into()),
-        }
-    }
-
-    /// Set the reason for this ranking.
-    pub fn reason(mut self, reason: impl Into<String>) -> Self {
-        self.reason = Some(reason.into());
-        self
-    }
-}
-
-/// Search direction recommendation from Pilot.
-///
-/// Indicates where the search should go next.
-#[derive(Debug, Clone, Serialize, Deserialize)]
-pub enum SearchDirection {
-    /// Continue deeper into the current branch.
-    GoDeeper {
-        /// Reason for going deeper.
-        reason: String,
-    },
-    /// Explore sibling nodes at the same level.
-    ExploreSiblings {
-        /// Recommended siblings to explore.
-        recommended: Vec<NodeId>,
-    },
-    /// Backtrack to parent and try other branches.
-    Backtrack {
-        /// Reason for backtracking.
-        reason: String,
-        /// Alternative branches to try.
-        alternative_branches: Vec<NodeId>,
-    },
-    /// Jump to a non-local node (global navigation).
-    JumpTo {
-        /// Target node to jump to.
-        target: NodeId,
-        /// Reason for the jump.
-        reason: String,
-    },
-    /// Current node contains the answer.
-    FoundAnswer {
-        /// Confidence that this is the answer.
-        confidence: f32,
-    },
-}
-
-impl SearchDirection {
-    /// Create a GoDeeper direction.
-    pub fn go_deeper(reason: impl Into<String>) -> Self {
-        Self::GoDeeper {
-            reason: reason.into(),
-        }
-    }
-
-    /// Create a Backtrack direction.
-    pub fn backtrack(reason: impl Into<String>, alternatives: Vec<NodeId>) -> Self {
-        Self::Backtrack {
-            reason: reason.into(),
-            alternative_branches: alternatives,
-        }
-    }
-
-    /// Create a JumpTo direction.
-    pub fn jump_to(target: NodeId, reason: impl Into<String>) -> Self {
-        Self::JumpTo {
-            target,
-            reason: reason.into(),
-        }
-    }
-
-    /// Create a FoundAnswer direction.
-    pub fn found_answer(confidence: f32) -> Self {
-        Self::FoundAnswer { confidence }
-    }
-}
-
-/// The point in search where Pilot intervenes.
-#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize, Default)]
-pub enum InterventionPoint {
-    /// Before search begins (initial guidance).
-    Start,
-    /// At a fork with multiple candidates.
-    #[default]
-    Fork,
-    /// During backtracking after a dead end.
-    Backtrack,
-    /// Evaluating a specific node for relevance.
-    Evaluate,
-    /// Binary pruning — quick yes/no relevance filter for wide nodes.
-    Prune,
-}
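Taken together, `RankedCandidate`, `SearchDirection`, and `InterventionPoint` describe one navigation step. A minimal sketch of a decision as the search loop would receive it (`child_a` and `child_b` stand in for real `NodeId`s):

```rust
// Sketch: a Pilot decision ranking two children and continuing deeper.
let decision = PilotDecision::new(
    vec![
        RankedCandidate::with_reason(child_a, 0.9, "matches query terms"),
        RankedCandidate::new(child_b, 0.4),
    ],
    SearchDirection::go_deeper("top candidate looks promising"),
    0.8,
    "Header overlap with query".to_string(),
);
assert_eq!(decision.top_candidate().unwrap().score, 0.9);
assert!(!decision.needs_backtrack());
```

-impl InterventionPoint {
-    /// Get a human-readable name for this point.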
- pub fn name(&self) -> &'static str { - match self { - Self::Start => "start", - Self::Fork => "fork", - Self::Backtrack => "backtrack", - Self::Evaluate => "evaluate", - Self::Prune => "prune", - } - } -} - -#[cfg(test)] -mod tests { - use super::*; - use indextree::Arena; - - fn create_test_node_ids(count: usize) -> Vec { - let mut arena = Arena::new(); - let mut ids = Vec::new(); - for i in 0..count { - let node = crate::document::TreeNode { - title: format!("Node {}", i), - structure: String::new(), - content: String::new(), - summary: String::new(), - depth: 0, - start_index: 1, - end_index: 1, - start_page: None, - end_page: None, - node_id: None, - physical_index: None, - token_count: None, - references: Vec::new(), - }; - ids.push(NodeId(arena.new_node(node))); - } - ids - } - - #[test] - fn test_pilot_decision_default() { - let decision = PilotDecision::default(); - assert!(!decision.has_candidates()); - assert!(decision.top_candidate().is_none()); - assert!(!decision.found_answer()); - assert!(!decision.needs_backtrack()); - } - - #[test] - fn test_pilot_decision_preserve_order() { - let node_ids = create_test_node_ids(2); - let decision = PilotDecision::preserve_order(&node_ids); - - assert!(decision.has_candidates()); - assert_eq!(decision.ranked_candidates.len(), 2); - assert_eq!(decision.confidence, 0.0); - } - - #[test] - fn test_ranked_candidate() { - let node_ids = create_test_node_ids(1); - let candidate = RankedCandidate::new(node_ids[0], 0.8); - assert_eq!(candidate.score, 0.8); - assert!(candidate.reason.is_none()); - - let candidate_with_reason = RankedCandidate::with_reason(node_ids[0], 0.9, "test reason"); - assert_eq!(candidate_with_reason.score, 0.9); - assert_eq!( - candidate_with_reason.reason, - Some("test reason".to_string()) - ); - } - - #[test] - fn test_search_direction_constructors() { - let deeper = SearchDirection::go_deeper("test"); - assert!(matches!(deeper, SearchDirection::GoDeeper { .. })); - - let found = SearchDirection::found_answer(0.9); - assert!(matches!( - found, - SearchDirection::FoundAnswer { confidence: 0.9 } - )); - } - - #[test] - fn test_intervention_point() { - assert_eq!(InterventionPoint::Start.name(), "start"); - assert_eq!(InterventionPoint::Fork.name(), "fork"); - assert_eq!(InterventionPoint::Backtrack.name(), "backtrack"); - assert_eq!(InterventionPoint::Evaluate.name(), "evaluate"); - } -} diff --git a/rust/src/retrieval/pilot/decision_scorer.rs b/rust/src/retrieval/pilot/decision_scorer.rs deleted file mode 100644 index 0169169a..00000000 --- a/rust/src/retrieval/pilot/decision_scorer.rs +++ /dev/null @@ -1,338 +0,0 @@ -// Copyright (c) 2026 vectorless developers -// SPDX-License-Identifier: Apache-2.0 - -//! Shared Pilot-as-primary scoring helper. -//! -//! All three search algorithms (PurePilot, Beam, MCTS) use this module -//! to score child candidates. Pilot is the primary scorer; NodeScorer -//! provides a fallback when Pilot is unavailable or budget is exhausted. -//! -//! # Caching -//! -//! Pilot decisions are cached by `(query, parent_node_id)` to avoid -//! redundant LLM calls when the same node is revisited (e.g. MCTS -//! selection phase revisits a node multiple times). - -use std::collections::{HashMap, HashSet}; -use std::sync::Arc; -use tokio::sync::Mutex; - -use super::scorer::{NodeScorer, ScoringContext}; -use crate::document::{DocumentTree, NodeId}; -use crate::retrieval::pilot::{Pilot, PilotDecision, SearchState}; - -/// Cache key: (query_fingerprint, parent_node_id). 
-type CacheKey = (u64, NodeId);
-
-/// Shared Pilot decision cache.
-///
-/// Thread-safe, query-scoped cache that stores Pilot decisions keyed by
-/// (query hash, parent node ID). Prevents redundant LLM calls when the
-/// same (query, node) pair is scored multiple times (common in MCTS).
-#[derive(Debug, Clone, Default)]
-pub struct PilotDecisionCache {
-    inner: Arc<Mutex<HashMap<CacheKey, PilotDecision>>>,
-}
-
-impl PilotDecisionCache {
-    /// Create a new empty cache.
-    pub fn new() -> Self {
-        Self::default()
-    }
-
-    /// Compute cache key from query and parent node.
-    fn cache_key(query: &str, parent: NodeId) -> CacheKey {
-        use std::hash::{Hash, Hasher};
-        let mut hasher = std::collections::hash_map::DefaultHasher::new();
-        query.hash(&mut hasher);
-        (hasher.finish(), parent)
-    }
-
-    /// Try to get a cached decision.
-    pub async fn get(&self, query: &str, parent: NodeId) -> Option<PilotDecision> {
-        let key = Self::cache_key(query, parent);
-        let cache = self.inner.lock().await;
-        cache.get(&key).cloned()
-    }
-
-    /// Store a decision in the cache.
-    pub async fn put(&self, query: &str, parent: NodeId, decision: &PilotDecision) {
-        let key = Self::cache_key(query, parent);
-        let mut cache = self.inner.lock().await;
-        cache.entry(key).or_insert_with(|| decision.clone());
-    }
-
-    /// Clear the cache.
-    pub async fn clear(&self) {
-        self.inner.lock().await.clear();
-    }
-}
-
-/// Score child candidates using Pilot as primary, NodeScorer as fallback.
-///
-/// Pilot decisions are cached by (query, parent_node_id). Subsequent calls
-/// with the same arguments return cached results without LLM calls.
-///
-/// `pilot_weight` controls how much Pilot vs NodeScorer contributes:
-/// - 1.0 = PurePilot (only Pilot scores matter)
-/// - 0.7 = Beam (Pilot dominant, NodeScorer as secondary)
-/// - 0.5 = MCTS prior (balanced)
-pub async fn score_candidates(
-    tree: &DocumentTree,
-    candidates: &[NodeId],
-    query: &str,
-    pilot: Option<&dyn Pilot>,
-    path: &[NodeId],
-    visited: &HashSet<NodeId>,
-    pilot_weight: f32,
-    cache: Option<&PilotDecisionCache>,
-    step_reasons: Option<&[Option<String>]>,
-) -> Vec<(NodeId, f32)> {
-    let scored = score_candidates_detailed(
-        tree,
-        candidates,
-        query,
-        pilot,
-        path,
-        visited,
-        pilot_weight,
-        cache,
-        step_reasons,
-    )
-    .await;
-    scored.into_iter().map(|s| (s.node_id, s.score)).collect()
-}
-
-/// A scored candidate with optional reasoning from the Pilot.
-#[derive(Debug, Clone)]
-pub struct ScoredCandidate {
-    /// The node ID.
-    pub node_id: NodeId,
-    /// Relevance score (0.0 - 1.0).
-    pub score: f32,
-    /// Reason the Pilot chose this node, if available.
-    pub reason: Option<String>,
-}
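Worth noting before the function itself: the final score is a confidence-weighted blend of Pilot and NodeScorer, not a hard override. A standalone sketch of the arithmetic (the `blend` helper is illustrative; the formula matches the merge step inside the function below):

```rust
// Illustrative helper: confidence-weighted blend of Pilot and NodeScorer.
fn blend(pilot_weight: f32, confidence: f32, pilot_score: f32, algo_score: f32) -> f32 {
    let effective_pilot = pilot_weight * confidence; // Pilot's say shrinks with low confidence
    let scorer_weight = 1.0 - pilot_weight;
    (effective_pilot * pilot_score + scorer_weight * algo_score)
        / (effective_pilot + scorer_weight)
}

// Beam weighting (0.7), Pilot confident (0.8): 0.9 vs 0.5 blends to ~0.76.
assert!((blend(0.7, 0.8, 0.9, 0.5) - 0.76).abs() < 0.01);
```

-/// Score child candidates and return detailed results with reasons.
-///
-/// Like [`score_candidates`] but preserves per-candidate reasoning
-/// from the Pilot. Use this when the search algorithm needs to
-/// record why each path step was taken (e.g., for beam search
-/// reasoning history).
-///
-/// # Pre-filtering
-///
-/// When a node has many children (exceeding `prefilter.threshold`),
-/// NodeScorer pre-filters candidates before sending to Pilot. This
-/// reduces LLM token cost and latency. Candidates filtered out still
-/// receive NodeScorer-only scores in the final merge, so no results
-/// are lost.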
-pub async fn score_candidates_detailed( - tree: &DocumentTree, - candidates: &[NodeId], - query: &str, - pilot: Option<&dyn Pilot>, - path: &[NodeId], - visited: &HashSet, - pilot_weight: f32, - cache: Option<&PilotDecisionCache>, - step_reasons: Option<&[Option]>, -) -> Vec { - if candidates.is_empty() { - return Vec::new(); - } - - // If no Pilot, pure NodeScorer (no reasons available) - let Some(p) = pilot else { - return score_with_scorer_detailed(tree, candidates, query); - }; - - if !p.is_active() { - return score_with_scorer_detailed(tree, candidates, query); - } - - // Determine parent node (last in path) for cache key - let parent = path.last().copied().unwrap_or(tree.root()); - - // === PRE-FILTERING === - // When candidates exceed the threshold, use NodeScorer to narrow - // the set before sending to Pilot (LLM). Filtered-out candidates - // still get NodeScorer-only scores in the final merge below. - let prefilter_cfg = &p.config().prefilter; - let pilot_candidates: Vec = if prefilter_cfg.should_prefilter(candidates.len()) { - let scorer = NodeScorer::new(ScoringContext::new(query)); - let mut sorted = scorer.score_and_sort(tree, candidates); - let pilot_max = prefilter_cfg.max_to_pilot.min(candidates.len()); - sorted.truncate(pilot_max); - let ids: Vec = sorted.into_iter().map(|(id, _)| id).collect(); - tracing::debug!( - "Pre-filtered: {} candidates -> {} to Pilot (threshold={})", - candidates.len(), - ids.len(), - prefilter_cfg.threshold, - ); - ids - } else { - candidates.to_vec() - }; - - // === BINARY PRUNING === - // After P2 pre-filtering, if candidates still exceed the prune - // threshold, ask Pilot for a quick yes/no filter before the - // expensive full-scoring call. - let prune_cfg = &p.config().prune; - let pilot_candidates = if prune_cfg.should_prune(pilot_candidates.len()) { - let mut prune_state = SearchState::new(tree, query, path, &pilot_candidates, visited); - prune_state.step_reasons = step_reasons; - - if let Some(relevant_ids) = p.binary_prune(&prune_state).await { - let relevant_set: HashSet = relevant_ids.iter().copied().collect(); - let mut pruned: Vec = pilot_candidates - .iter() - .filter(|id| relevant_set.contains(id)) - .copied() - .collect(); - - // Enforce min_keep to prevent over-aggressive pruning - if pruned.len() < prune_cfg.min_keep { - // Fill from the top of pilot_candidates that weren't pruned - for id in &pilot_candidates { - if pruned.len() >= prune_cfg.min_keep { - break; - } - if !relevant_set.contains(id) { - pruned.push(*id); - } - } - } - - tracing::debug!( - "Binary prune: {} candidates -> {} relevant (min_keep={})", - pilot_candidates.len(), - pruned.len(), - prune_cfg.min_keep, - ); - pruned - } else { - pilot_candidates - } - } else { - pilot_candidates - }; - - // Check cache first - let decision = if let Some(c) = cache { - if let Some(cached) = c.get(query, parent).await { - tracing::trace!("Pilot cache hit for parent={:?}", parent); - cached - } else { - let mut state = SearchState::new(tree, query, path, &pilot_candidates, visited); - state.step_reasons = step_reasons; - let d = p.decide(&state).await; - c.put(query, parent, &d).await; - d - } - } else { - let mut state = SearchState::new(tree, query, path, &pilot_candidates, visited); - state.step_reasons = step_reasons; - p.decide(&state).await - }; - - // Build Pilot score + reason map - let mut pilot_data: HashMap)> = HashMap::new(); - for ranked in &decision.ranked_candidates { - pilot_data.insert(ranked.node_id, (ranked.score, ranked.reason.clone())); - } - - // 
Compute NodeScorer fallback scores for ALL original candidates - let scorer_weight = 1.0 - pilot_weight; - let confidence = decision.confidence; - let effective_pilot = pilot_weight * confidence; - - let scorer = NodeScorer::new(ScoringContext::new(query)); - - let mut scored: Vec = candidates - .iter() - .map(|&node_id| { - let algo_score = scorer.score(tree, node_id); - let (p_score, reason) = pilot_data - .get(&node_id) - .map(|(s, r)| (*s, r.clone())) - .unwrap_or((0.0, None)); - - let final_score = if effective_pilot > 0.0 && pilot_data.contains_key(&node_id) { - (effective_pilot * p_score + scorer_weight * algo_score) - / (effective_pilot + scorer_weight) - } else { - algo_score - }; - - ScoredCandidate { - node_id, - score: final_score, - reason, - } - }) - .collect(); - - scored.sort_by(|a, b| { - b.score - .partial_cmp(&a.score) - .unwrap_or(std::cmp::Ordering::Equal) - }); - scored -} - -/// Pure NodeScorer fallback. -fn score_with_scorer( - tree: &DocumentTree, - candidates: &[NodeId], - query: &str, -) -> Vec<(NodeId, f32)> { - let scorer = NodeScorer::new(ScoringContext::new(query)); - scorer.score_and_sort(tree, candidates) -} - -/// Pure NodeScorer fallback returning detailed results (no reasons). -fn score_with_scorer_detailed( - tree: &DocumentTree, - candidates: &[NodeId], - query: &str, -) -> Vec { - let scorer = NodeScorer::new(ScoringContext::new(query)); - scorer - .score_and_sort(tree, candidates) - .into_iter() - .map(|(node_id, score)| ScoredCandidate { - node_id, - score, - reason: None, - }) - .collect() -} - -#[cfg(test)] -mod tests { - use super::*; - use crate::document::TreeNode; - use indextree::Arena; - - /// Helper to create a NodeId from an Arena for tests. - fn make_node_id(arena: &mut Arena) -> NodeId { - NodeId(arena.new_node(TreeNode::default())) - } - - #[test] - fn test_cache_key_deterministic() { - let mut arena = Arena::new(); - let nid = make_node_id(&mut arena); - - let key1 = PilotDecisionCache::cache_key("hello", nid); - let key2 = PilotDecisionCache::cache_key("hello", nid); - assert_eq!(key1, key2); - - let key3 = PilotDecisionCache::cache_key("world", nid); - assert_ne!(key1, key3); - } -} diff --git a/rust/src/retrieval/pilot/fallback.rs b/rust/src/retrieval/pilot/fallback.rs deleted file mode 100644 index da93354e..00000000 --- a/rust/src/retrieval/pilot/fallback.rs +++ /dev/null @@ -1,451 +0,0 @@ -// Copyright (c) 2026 vectorless developers -// SPDX-License-Identifier: Apache-2.0 - -//! Fallback manager for Pilot LLM calls. -//! -//! Implements layered fallback strategy: -//! 1. Normal LLM call -//! 2. Retry with exponential backoff -//! 3. Simplified context (reduce tokens) -//! 4. Algorithm-only mode (no LLM) - -use std::sync::atomic::{AtomicU8, AtomicUsize, Ordering}; -use std::time::Duration; -use tracing::{debug, warn}; - -/// Fallback level indicating current degradation state. -#[derive(Debug, Clone, Copy, PartialEq, Eq)] -pub enum FallbackLevel { - /// Normal operation - LLM calls working. - Normal = 0, - /// Retrying - transient failures, using backoff. - Retry = 1, - /// Simplified - using reduced context. - Simplified = 2, - /// Algorithm only - LLM unavailable. - AlgorithmOnly = 3, -} - -impl Default for FallbackLevel { - fn default() -> Self { - Self::Normal - } -} - -impl From for FallbackLevel { - fn from(value: u8) -> Self { - match value { - 0 => Self::Normal, - 1 => Self::Retry, - 2 => Self::Simplified, - _ => Self::AlgorithmOnly, - } - } -} - -/// Configuration for fallback behavior. 
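How an error maps onto this ladder is easiest to see as a routing table. An illustrative sketch (the `route` helper is hypothetical; it mirrors, roughly, the policy `record_failure` applies below, minus the escalation bookkeeping):

```rust
// Hypothetical helper: route an error to the layered fallback strategy.
fn route(err: &FallbackError) -> &'static str {
    if err.is_retryable() {
        "retry with backoff"            // Network, RateLimited
    } else if err.needs_simplification() {
        "shrink the context and retry"  // TokenLimitExceeded
    } else if err.needs_algorithm_fallback() {
        "switch to algorithm-only"      // Unavailable, Exhausted
    } else {
        "fall back to a default decision" // ParseError
    }
}

assert_eq!(route(&FallbackError::RateLimited), "retry with backoff");
assert_eq!(route(&FallbackError::TokenLimitExceeded), "shrink the context and retry");
```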
-#[derive(Debug, Clone)] -pub struct FallbackConfig { - /// Maximum retry attempts before escalating. - pub max_retries: usize, - /// Initial delay for exponential backoff (ms). - pub initial_delay_ms: u64, - /// Maximum delay for exponential backoff (ms). - pub max_delay_ms: u64, - /// Multiplier for exponential backoff. - pub backoff_multiplier: f64, - /// Consecutive failures before escalating level. - pub failures_before_escalate: usize, - /// Consecutive successes before de-escalating level. - pub successes_before_deescalate: usize, -} - -impl Default for FallbackConfig { - fn default() -> Self { - Self { - max_retries: 3, - initial_delay_ms: 1000, - max_delay_ms: 10000, - backoff_multiplier: 2.0, - failures_before_escalate: 3, - successes_before_deescalate: 2, - } - } -} - -/// Errors that can trigger fallback. -#[derive(Debug, Clone, thiserror::Error)] -pub enum FallbackError { - /// Network/timeout error (retryable). - #[error("Network error: {0}")] - Network(String), - /// Rate limit error (retryable with backoff). - #[error("Rate limited")] - RateLimited, - /// Token limit exceeded (need simplified context). - #[error("Token limit exceeded")] - TokenLimitExceeded, - /// LLM service unavailable (use algorithm). - #[error("LLM unavailable: {0}")] - Unavailable(String), - /// Parsing error (may use default). - #[error("Response parsing failed: {0}")] - ParseError(String), - /// All fallbacks exhausted. - #[error("All fallback strategies exhausted")] - Exhausted, -} - -impl FallbackError { - /// Check if this error should trigger a retry. - pub fn is_retryable(&self) -> bool { - matches!(self, Self::Network(_) | Self::RateLimited) - } - - /// Check if this error suggests using simplified context. - pub fn needs_simplification(&self) -> bool { - matches!(self, Self::TokenLimitExceeded) - } - - /// Check if this error requires algorithm fallback. - pub fn needs_algorithm_fallback(&self) -> bool { - matches!(self, Self::Unavailable(_) | Self::Exhausted) - } -} - -/// Statistics for fallback operations. -#[derive(Debug, Clone, Default)] -pub struct FallbackStats { - /// Total operations attempted. - pub total_attempts: usize, - /// Successful operations (no fallback needed). - pub successful: usize, - /// Operations that needed retry. - pub retried: usize, - /// Operations that needed simplified context. - pub simplified: usize, - /// Operations that fell back to algorithm. - pub algorithm_fallbacks: usize, - /// Current fallback level. - pub current_level: FallbackLevel, -} - -/// Manager for handling LLM call failures with layered fallback. -/// -/// Implements a 4-level fallback strategy: -/// 1. Normal: Direct LLM calls -/// 2. Retry: Exponential backoff retry -/// 3. Simplified: Reduced context to fit token limits -/// 4. Algorithm: Pure algorithm mode, no LLM -/// -/// # Example -/// -/// ```rust,ignore -/// use vectorless::retrieval::pilot::FallbackManager; -/// -/// let manager = FallbackManager::new(FallbackConfig::default()); -/// -/// // Check current level -/// if manager.current_level() == FallbackLevel::Normal { -/// // Make LLM call -/// } -/// -/// // Record failure -/// manager.record_failure(&error); -/// ``` -pub struct FallbackManager { - config: FallbackConfig, - /// Current fallback level. - current_level: AtomicU8, - /// Consecutive failures at current level. - consecutive_failures: AtomicUsize, - /// Consecutive successes at current level. - consecutive_successes: AtomicUsize, - /// Total retry attempts in current session. 
- retry_attempts: AtomicUsize, -} - -impl std::fmt::Debug for FallbackManager { - fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { - f.debug_struct("FallbackManager") - .field("config", &self.config) - .field("current_level", &self.current_level()) - .field( - "consecutive_failures", - &self.consecutive_failures.load(Ordering::Relaxed), - ) - .finish() - } -} - -impl FallbackManager { - /// Create a new fallback manager with configuration. - pub fn new(config: FallbackConfig) -> Self { - Self { - config, - current_level: AtomicU8::new(0), - consecutive_failures: AtomicUsize::new(0), - consecutive_successes: AtomicUsize::new(0), - retry_attempts: AtomicUsize::new(0), - } - } - - /// Create with default configuration. - pub fn with_defaults() -> Self { - Self::new(FallbackConfig::default()) - } - - /// Get current fallback level. - pub fn current_level(&self) -> FallbackLevel { - self.current_level.load(Ordering::Relaxed).into() - } - - /// Check if we're at algorithm-only level. - pub fn is_algorithm_only(&self) -> bool { - self.current_level() == FallbackLevel::AlgorithmOnly - } - - /// Check if we should use simplified context. - pub fn should_simplify(&self) -> bool { - matches!( - self.current_level(), - FallbackLevel::Simplified | FallbackLevel::AlgorithmOnly - ) - } - - /// Get delay for next retry based on attempt number. - pub fn retry_delay(&self, attempt: usize) -> Duration { - let delay = self.config.initial_delay_ms as f64 - * self.config.backoff_multiplier.powi(attempt as i32); - let delay = delay.min(self.config.max_delay_ms as f64); - Duration::from_millis(delay as u64) - } - - /// Record a successful operation. - /// - /// May de-escalate the fallback level after consecutive successes. - pub fn record_success(&self) { - self.consecutive_failures.store(0, Ordering::Relaxed); - - let successes = self.consecutive_successes.fetch_add(1, Ordering::Relaxed) + 1; - - // De-escalate after enough consecutive successes - if successes >= self.config.successes_before_deescalate { - let current = self.current_level.load(Ordering::Relaxed); - if current > 0 { - self.current_level.fetch_sub(1, Ordering::Relaxed); - debug!("Fallback level de-escalated to {:?}", self.current_level()); - } - self.consecutive_successes.store(0, Ordering::Relaxed); - } - } - - /// Record a failure and potentially escalate level. - /// - /// Returns the recommended action. - pub fn record_failure(&self, error: &FallbackError) -> FallbackAction { - self.consecutive_successes.store(0, Ordering::Relaxed); - - // Check if we should escalate - let failures = self.consecutive_failures.fetch_add(1, Ordering::Relaxed) + 1; - - if failures >= self.config.failures_before_escalate { - self.escalate_level(); - self.consecutive_failures.store(0, Ordering::Relaxed); - } - - // Determine action based on error and current level - match error { - FallbackError::Network(_) | FallbackError::RateLimited => { - if self.retry_attempts.load(Ordering::Relaxed) < self.config.max_retries { - FallbackAction::Retry - } else { - FallbackAction::Escalate - } - } - FallbackError::TokenLimitExceeded => FallbackAction::Simplify, - FallbackError::Unavailable(_) | FallbackError::Exhausted => { - FallbackAction::UseAlgorithm - } - FallbackError::ParseError(_) => { - // Try default decision, don't escalate - FallbackAction::UseDefault - } - } - } - - /// Escalate to next fallback level. 
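With the default config (1000 ms initial delay, 2.0 multiplier, 10000 ms cap) the backoff sequence is easy to verify by hand; a sketch using the methods defined above:

```rust
// Sketch: exponential backoff with the default FallbackConfig.
let mgr = FallbackManager::with_defaults();
assert_eq!(mgr.retry_delay(0).as_millis(), 1000);
assert_eq!(mgr.retry_delay(1).as_millis(), 2000);
assert_eq!(mgr.retry_delay(3).as_millis(), 8000);
assert_eq!(mgr.retry_delay(4).as_millis(), 10000); // 16000 capped at max_delay_ms
```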
- fn escalate_level(&self) { - let current = self.current_level.load(Ordering::Relaxed); - if current < 3 { - self.current_level.fetch_add(1, Ordering::Relaxed); - warn!("Fallback level escalated to {:?}", self.current_level()); - } - } - - /// Start a retry attempt. - pub fn start_retry(&self) { - self.retry_attempts.fetch_add(1, Ordering::Relaxed); - } - - /// Reset retry counter (after successful operation). - pub fn reset_retry_count(&self) { - self.retry_attempts.store(0, Ordering::Relaxed); - } - - /// Reset all state for new query. - pub fn reset(&self) { - self.current_level.store(0, Ordering::Relaxed); - self.consecutive_failures.store(0, Ordering::Relaxed); - self.consecutive_successes.store(0, Ordering::Relaxed); - self.retry_attempts.store(0, Ordering::Relaxed); - } - - /// Get current statistics. - pub fn stats(&self) -> FallbackStats { - FallbackStats { - current_level: self.current_level(), - ..Default::default() - } - } - - /// Get the configuration. - pub fn config(&self) -> &FallbackConfig { - &self.config - } -} - -/// Action to take after a failure. -#[derive(Debug, Clone, Copy, PartialEq, Eq)] -pub enum FallbackAction { - /// Retry the operation (with backoff). - Retry, - /// Simplify context and retry. - Simplify, - /// Escalate to next fallback level. - Escalate, - /// Use algorithm-only mode. - UseAlgorithm, - /// Use a default decision. - UseDefault, -} - -#[cfg(test)] -mod tests { - use super::*; - - #[test] - fn test_fallback_level_conversion() { - assert_eq!(FallbackLevel::from(0), FallbackLevel::Normal); - assert_eq!(FallbackLevel::from(1), FallbackLevel::Retry); - assert_eq!(FallbackLevel::from(2), FallbackLevel::Simplified); - assert_eq!(FallbackLevel::from(3), FallbackLevel::AlgorithmOnly); - assert_eq!(FallbackLevel::from(4), FallbackLevel::AlgorithmOnly); - } - - #[test] - fn test_fallback_manager_creation() { - let manager = FallbackManager::with_defaults(); - assert_eq!(manager.current_level(), FallbackLevel::Normal); - assert!(!manager.is_algorithm_only()); - assert!(!manager.should_simplify()); - } - - #[test] - fn test_retry_delay() { - let manager = FallbackManager::with_defaults(); - - let d0 = manager.retry_delay(0); - let d1 = manager.retry_delay(1); - let d2 = manager.retry_delay(2); - - assert!(d1 > d0); - assert!(d2 > d1); - } - - #[test] - fn test_retry_delay_max() { - let config = FallbackConfig { - max_delay_ms: 5000, - ..Default::default() - }; - let manager = FallbackManager::new(config); - - // High attempt should cap at max - let delay = manager.retry_delay(10); - assert!(delay.as_millis() <= 5000); - } - - #[test] - fn test_record_success() { - let manager = FallbackManager::with_defaults(); - manager.current_level.store(1, Ordering::Relaxed); - - // Need multiple successes to de-escalate - for _ in 0..manager.config.successes_before_deescalate { - manager.record_success(); - } - - assert_eq!(manager.current_level(), FallbackLevel::Normal); - } - - #[test] - fn test_record_failure_escalate() { - let manager = FallbackManager::with_defaults(); - - // Trigger failures to escalate - for _ in 0..manager.config.failures_before_escalate { - let action = manager.record_failure(&FallbackError::Network("test".to_string())); - assert!(matches!( - action, - FallbackAction::Retry | FallbackAction::Escalate - )); - } - - assert_eq!(manager.current_level(), FallbackLevel::Retry); - } - - #[test] - fn test_record_failure_token_limit() { - let manager = FallbackManager::with_defaults(); - - let action = 
manager.record_failure(&FallbackError::TokenLimitExceeded); - assert_eq!(action, FallbackAction::Simplify); - } - - #[test] - fn test_record_failure_unavailable() { - let manager = FallbackManager::with_defaults(); - - let action = manager.record_failure(&FallbackError::Unavailable("test".to_string())); - assert_eq!(action, FallbackAction::UseAlgorithm); - } - - #[test] - fn test_reset() { - let manager = FallbackManager::with_defaults(); - - // Escalate level - manager.current_level.store(3, Ordering::Relaxed); - manager.consecutive_failures.store(5, Ordering::Relaxed); - - manager.reset(); - - assert_eq!(manager.current_level(), FallbackLevel::Normal); - assert_eq!(manager.consecutive_failures.load(Ordering::Relaxed), 0); - } - - #[test] - fn test_error_retryable() { - assert!(FallbackError::Network("test".to_string()).is_retryable()); - assert!(FallbackError::RateLimited.is_retryable()); - assert!(!FallbackError::TokenLimitExceeded.is_retryable()); - assert!(!FallbackError::Unavailable("test".to_string()).is_retryable()); - } - - #[test] - fn test_error_needs_simplification() { - assert!(FallbackError::TokenLimitExceeded.needs_simplification()); - assert!(!FallbackError::Network("test".to_string()).needs_simplification()); - } -} diff --git a/rust/src/retrieval/pilot/feedback.rs b/rust/src/retrieval/pilot/feedback.rs deleted file mode 100644 index 051a3f04..00000000 --- a/rust/src/retrieval/pilot/feedback.rs +++ /dev/null @@ -1,733 +0,0 @@ -// Copyright (c) 2026 vectorless developers -// SPDX-License-Identifier: Apache-2.0 - -//! Pilot feedback learning system. -//! -//! This module provides feedback collection and learning capabilities -//! for the Pilot to improve its decision-making over time. -//! -//! # Architecture -//! -//! ```text -//! ┌─────────────────────────────────────────────────────────────────┐ -//! │ Feedback Learning System │ -//! ├─────────────────────────────────────────────────────────────────┤ -//! │ │ -//! │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ -//! │ │ Feedback │ │ Feedback │ │ Pilot │ │ -//! │ │ Record │──▶│ Store │──▶│ Learner │ │ -//! │ └─────────────┘ └─────────────┘ └─────────────┘ │ -//! │ │ │ │ -//! │ ▼ ▼ │ -//! │ [Persistence] [Decision Adjustment] │ -//! │ │ -//! └─────────────────────────────────────────────────────────────────┘ -//! ``` -//! -//! # Example -//! -//! ```rust,ignore -//! use vectorless::retrieval::pilot::feedback::{FeedbackStore, FeedbackRecord}; -//! -//! let store = FeedbackStore::new("./feedback_store"); -//! -//! // Record feedback -//! let record = FeedbackRecord::new(decision_id, was_correct, confidence); -//! store.record(record).await?; -//! -//! // Learn from feedback -//! let learner = PilotLearner::new(store); -//! let adjustment = learner.get_adjustment(&context); -//! ``` - -use std::collections::HashMap; -use std::path::Path; -use std::sync::Arc; -use std::sync::atomic::{AtomicU64, Ordering}; - -use serde::{Deserialize, Serialize}; -use tracing::{debug, info}; - -use super::decision::InterventionPoint; - -/// Unique identifier for a feedback record. -#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize)] -pub struct FeedbackId(pub u64); - -/// Unique identifier for a decision. -#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize)] -pub struct DecisionId(pub u64); - -/// Feedback record for a Pilot decision. -#[derive(Debug, Clone, Serialize, Deserialize)] -pub struct FeedbackRecord { - /// Unique feedback ID. - pub id: FeedbackId, - /// Associated decision ID. 
- pub decision_id: DecisionId, - /// Whether the decision was correct. - pub was_correct: bool, - /// Pilot's confidence at decision time. - pub pilot_confidence: f64, - /// Intervention point type. - pub intervention_point: InterventionPoint, - /// Query hash for grouping similar queries. - pub query_hash: u64, - /// Node path hash for context. - pub path_hash: u64, - /// Timestamp of feedback. - pub timestamp_ms: u64, - /// Optional user comment. - pub comment: Option, -} - -impl FeedbackRecord { - /// Create a new feedback record. - pub fn new( - decision_id: DecisionId, - was_correct: bool, - pilot_confidence: f64, - intervention_point: InterventionPoint, - query_hash: u64, - path_hash: u64, - ) -> Self { - static COUNTER: AtomicU64 = AtomicU64::new(1); - let id = FeedbackId(COUNTER.fetch_add(1, Ordering::Relaxed)); - let timestamp_ms = std::time::SystemTime::now() - .duration_since(std::time::UNIX_EPOCH) - .map(|d| d.as_millis() as u64) - .unwrap_or(0); - - Self { - id, - decision_id, - was_correct, - pilot_confidence, - intervention_point, - query_hash, - path_hash, - timestamp_ms, - comment: None, - } - } - - /// Add a comment to the feedback. - pub fn with_comment(mut self, comment: impl Into) -> Self { - self.comment = Some(comment.into()); - self - } -} - -/// Statistics for a specific context (query/path combination). -#[derive(Debug, Clone, Default, Serialize, Deserialize)] -pub struct ContextStats { - /// Total decisions in this context. - pub total: u64, - /// Correct decisions in this context. - pub correct: u64, - /// Average confidence when correct. - pub avg_confidence_correct: f64, - /// Average confidence when incorrect. - pub avg_confidence_incorrect: f64, -} - -impl ContextStats { - /// Get accuracy for this context. - pub fn accuracy(&self) -> f64 { - if self.total == 0 { - 0.0 - } else { - self.correct as f64 / self.total as f64 - } - } - - /// Record a new feedback. - fn record(&mut self, was_correct: bool, confidence: f64) { - self.total += 1; - if was_correct { - self.correct += 1; - // Running average - self.avg_confidence_correct = (self.avg_confidence_correct * (self.correct - 1) as f64 - + confidence) - / self.correct as f64; - } else { - let incorrect = self.total - self.correct; - self.avg_confidence_incorrect = - (self.avg_confidence_incorrect * (incorrect - 1) as f64 + confidence) - / incorrect as f64; - } - } -} - -/// Statistics for an intervention point type. -#[derive(Debug, Clone, Default, Serialize, Deserialize)] -pub struct InterventionStats { - /// Start intervention stats. - pub start: ContextStats, - /// Fork intervention stats. - pub fork: ContextStats, - /// Backtrack intervention stats. - pub backtrack: ContextStats, - /// Evaluate intervention stats. - pub evaluate: ContextStats, -} - -impl InterventionStats { - /// Get stats for a specific intervention point. - pub fn get(&self, point: InterventionPoint) -> &ContextStats { - match point { - InterventionPoint::Start => &self.start, - InterventionPoint::Fork => &self.fork, - InterventionPoint::Backtrack => &self.backtrack, - InterventionPoint::Evaluate => &self.evaluate, - InterventionPoint::Prune => &self.fork, // Prune reuses fork stats - } - } - - /// Get mutable stats for a specific intervention point. 
- fn get_mut(&mut self, point: InterventionPoint) -> &mut ContextStats { - match point { - InterventionPoint::Start => &mut self.start, - InterventionPoint::Fork => &mut self.fork, - InterventionPoint::Backtrack => &mut self.backtrack, - InterventionPoint::Evaluate => &mut self.evaluate, - InterventionPoint::Prune => &mut self.fork, // Prune reuses fork stats - } - } -} - -/// In-memory feedback store. -/// -/// Stores feedback records and provides statistics for learning. -/// Thread-safe for concurrent access. -#[derive(Debug)] -pub struct FeedbackStore { - /// All feedback records. - records: std::sync::RwLock>, - /// Statistics by intervention point. - intervention_stats: std::sync::RwLock, - /// Statistics by query hash. - query_stats: std::sync::RwLock>, - /// Statistics by path hash. - path_stats: std::sync::RwLock>, - /// Configuration. - config: FeedbackStoreConfig, -} - -/// Configuration for feedback store. -#[derive(Debug, Clone)] -pub struct FeedbackStoreConfig { - /// Maximum records to keep in memory. - pub max_records: usize, - /// Enable persistence to disk. - pub persist: bool, - /// Path for persistence. - pub storage_path: Option, -} - -impl Default for FeedbackStoreConfig { - fn default() -> Self { - Self { - max_records: 10_000, - persist: false, - storage_path: None, - } - } -} - -impl FeedbackStoreConfig { - /// Create config with persistence enabled. - pub fn with_persistence(path: impl Into) -> Self { - Self { - max_records: 10_000, - persist: true, - storage_path: Some(path.into()), - } - } -} - -impl FeedbackStore { - /// Create a new feedback store. - pub fn new(config: FeedbackStoreConfig) -> Self { - Self { - records: std::sync::RwLock::new(Vec::new()), - intervention_stats: std::sync::RwLock::new(InterventionStats::default()), - query_stats: std::sync::RwLock::new(HashMap::new()), - path_stats: std::sync::RwLock::new(HashMap::new()), - config, - } - } - - /// Create an in-memory store without persistence. - pub fn in_memory() -> Self { - Self::new(FeedbackStoreConfig::default()) - } - - /// Record a feedback. - pub fn record(&self, feedback: FeedbackRecord) { - // Update intervention stats - { - let mut stats = self.intervention_stats.write().unwrap(); - stats - .get_mut(feedback.intervention_point) - .record(feedback.was_correct, feedback.pilot_confidence); - } - - // Update query stats - { - let mut stats = self.query_stats.write().unwrap(); - stats - .entry(feedback.query_hash) - .or_default() - .record(feedback.was_correct, feedback.pilot_confidence); - } - - // Update path stats - { - let mut stats = self.path_stats.write().unwrap(); - stats - .entry(feedback.path_hash) - .or_default() - .record(feedback.was_correct, feedback.pilot_confidence); - } - - // Store record - { - let mut records = self.records.write().unwrap(); - records.push(feedback); - - // Enforce max records limit - if records.len() > self.config.max_records { - let remove_count = records.len() - self.config.max_records; - records.drain(0..remove_count); - } - } - - debug!( - total_records = self.records.read().unwrap().len(), - "Recorded feedback" - ); - } - - /// Get overall intervention statistics. - pub fn intervention_stats(&self) -> InterventionStats { - self.intervention_stats.read().unwrap().clone() - } - - /// Get statistics for a specific query hash. - pub fn query_stats(&self, query_hash: u64) -> Option { - self.query_stats.read().unwrap().get(&query_hash).cloned() - } - - /// Get statistics for a specific path hash. 
- pub fn path_stats(&self, path_hash: u64) -> Option { - self.path_stats.read().unwrap().get(&path_hash).cloned() - } - - /// Get total number of feedback records. - pub fn total_records(&self) -> usize { - self.records.read().unwrap().len() - } - - /// Get overall accuracy across all feedback. - pub fn overall_accuracy(&self) -> f64 { - let stats = self.intervention_stats.read().unwrap(); - let total = - stats.start.total + stats.fork.total + stats.backtrack.total + stats.evaluate.total; - let correct = stats.start.correct - + stats.fork.correct - + stats.backtrack.correct - + stats.evaluate.correct; - - if total == 0 { - 0.0 - } else { - correct as f64 / total as f64 - } - } - - /// Clear all feedback records. - pub fn clear(&self) { - self.records.write().unwrap().clear(); - *self.intervention_stats.write().unwrap() = InterventionStats::default(); - self.query_stats.write().unwrap().clear(); - self.path_stats.write().unwrap().clear(); - } - - /// Persist feedback to disk (if configured). - pub fn persist(&self) -> std::io::Result<()> { - if !self.config.persist { - return Ok(()); - } - - let path = self.config.storage_path.as_ref().ok_or_else(|| { - std::io::Error::new(std::io::ErrorKind::NotFound, "No storage path configured") - })?; - - let records = self.records.read().unwrap(); - let json = serde_json::to_string_pretty(&*records)?; - std::fs::write(path, json)?; - - info!(path = %path, records = records.len(), "Persisted feedback store"); - Ok(()) - } - - /// Load feedback from disk (if configured). - pub fn load(&self) -> std::io::Result<()> { - if !self.config.persist { - return Ok(()); - } - - let path = self.config.storage_path.as_ref().ok_or_else(|| { - std::io::Error::new(std::io::ErrorKind::NotFound, "No storage path configured") - })?; - - if !Path::new(path).exists() { - return Ok(()); - } - - let json = std::fs::read_to_string(path)?; - let records: Vec = serde_json::from_str(&json)?; - - // Rebuild stats from records - for record in &records { - // Update intervention stats - self.intervention_stats - .write() - .unwrap() - .get_mut(record.intervention_point) - .record(record.was_correct, record.pilot_confidence); - - // Update query stats - self.query_stats - .write() - .unwrap() - .entry(record.query_hash) - .or_default() - .record(record.was_correct, record.pilot_confidence); - - // Update path stats - self.path_stats - .write() - .unwrap() - .entry(record.path_hash) - .or_default() - .record(record.was_correct, record.pilot_confidence); - } - - *self.records.write().unwrap() = records; - - info!(path = %path, "Loaded feedback store"); - Ok(()) - } -} - -/// Decision adjustment based on learned feedback. -#[derive(Debug, Clone, Copy)] -pub struct DecisionAdjustment { - /// Confidence adjustment (add to pilot confidence). - pub confidence_delta: f64, - /// Whether to skip intervention (algorithm is confident). - pub skip_intervention: bool, - /// Weight to apply to algorithm score vs LLM score. - pub algorithm_weight: f64, -} - -impl Default for DecisionAdjustment { - fn default() -> Self { - Self { - confidence_delta: 0.0, - skip_intervention: false, - algorithm_weight: 0.5, - } - } -} - -/// Pilot learner that adjusts decisions based on feedback. -/// -/// Uses collected feedback to: -/// 1. Adjust confidence thresholds for different intervention points -/// 2. Decide when to skip intervention (trust algorithm) -/// 3. Adjust the weight between algorithm and LLM scores -#[derive(Debug)] -pub struct PilotLearner { - /// Feedback store reference. 
- store: Arc, - /// Learning configuration. - config: LearnerConfig, -} - -/// Configuration for the pilot learner. -#[derive(Debug, Clone)] -pub struct LearnerConfig { - /// Minimum samples required before adjusting. - pub min_samples: u64, - /// Threshold for high accuracy (trust LLM more). - pub high_accuracy_threshold: f64, - /// Threshold for low accuracy (trust algorithm more). - pub low_accuracy_threshold: f64, - /// Maximum confidence adjustment. - pub max_confidence_delta: f64, -} - -impl Default for LearnerConfig { - fn default() -> Self { - Self { - min_samples: 10, - high_accuracy_threshold: 0.8, - low_accuracy_threshold: 0.5, - max_confidence_delta: 0.2, - } - } -} - -impl PilotLearner { - /// Create a new learner with the given feedback store. - pub fn new(store: Arc) -> Self { - Self { - store, - config: LearnerConfig::default(), - } - } - - /// Create a learner with custom configuration. - pub fn with_config(store: Arc, config: LearnerConfig) -> Self { - Self { store, config } - } - - /// Get decision adjustment for a given context. - pub fn get_adjustment( - &self, - intervention_point: InterventionPoint, - query_hash: u64, - path_hash: u64, - ) -> DecisionAdjustment { - let mut adjustment = DecisionAdjustment::default(); - - // Get intervention-level stats - let intervention_stats = self.store.intervention_stats(); - let point_stats = intervention_stats.get(intervention_point); - - // Not enough samples, use defaults - if point_stats.total < self.config.min_samples { - return adjustment; - } - - let accuracy = point_stats.accuracy(); - - // Adjust based on accuracy - if accuracy >= self.config.high_accuracy_threshold { - // High accuracy: trust LLM more - adjustment.confidence_delta = self.config.max_confidence_delta; - adjustment.algorithm_weight = 0.3; // Favor LLM - } else if accuracy <= self.config.low_accuracy_threshold { - // Low accuracy: trust algorithm more - adjustment.confidence_delta = -self.config.max_confidence_delta; - adjustment.algorithm_weight = 0.7; // Favor algorithm - adjustment.skip_intervention = accuracy < 0.3; // Very low accuracy, skip LLM - } - - // Further refine based on query-specific stats - if let Some(query_stats) = self.store.query_stats(query_hash) { - if query_stats.total >= self.config.min_samples { - let query_accuracy = query_stats.accuracy(); - // Adjust confidence based on query-specific performance - if query_accuracy > accuracy { - adjustment.confidence_delta += 0.05; - } else if query_accuracy < accuracy { - adjustment.confidence_delta -= 0.05; - } - } - } - - // Further refine based on path-specific stats - if let Some(path_stats) = self.store.path_stats(path_hash) { - if path_stats.total >= self.config.min_samples { - let path_accuracy = path_stats.accuracy(); - // If this path has very high accuracy, increase confidence - if path_accuracy > 0.9 { - adjustment.confidence_delta += 0.05; - } - } - } - - // Clamp confidence delta - adjustment.confidence_delta = adjustment.confidence_delta.clamp( - -self.config.max_confidence_delta, - self.config.max_confidence_delta, - ); - - adjustment - } - - /// Get the feedback store. - pub fn store(&self) -> &FeedbackStore { - &self.store - } - - /// Get overall accuracy. - pub fn overall_accuracy(&self) -> f64 { - self.store.overall_accuracy() - } - - /// Check if enough feedback has been collected. 
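End to end, the feedback loop looks like this under the default `LearnerConfig` (10-sample minimum, 0.8/0.5 accuracy thresholds); a sketch assuming the types above:

```rust
// Sketch: feedback accumulates until the learner starts adjusting.
let store = Arc::new(FeedbackStore::in_memory());
let learner = PilotLearner::new(store.clone());

// Below min_samples the learner stays neutral.
assert_eq!(learner.get_adjustment(InterventionPoint::Fork, 0, 0).confidence_delta, 0.0);

// Twelve correct Fork decisions push accuracy past the 0.8 threshold...
for i in 0..12 {
    store.record(FeedbackRecord::new(
        DecisionId(i), true, 0.9, InterventionPoint::Fork, 0, 0,
    ));
}

// ...so the learner now boosts confidence and leans toward the LLM.
let adj = learner.get_adjustment(InterventionPoint::Fork, 0, 0);
assert!(adj.confidence_delta > 0.0);
assert!(adj.algorithm_weight < 0.5);
```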
- pub fn has_sufficient_data(&self) -> bool { - let stats = self.store.intervention_stats(); - let total = - stats.start.total + stats.fork.total + stats.backtrack.total + stats.evaluate.total; - total >= self.config.min_samples - } -} - -#[cfg(test)] -mod tests { - use super::*; - - fn make_hash(s: &str) -> u64 { - use std::collections::hash_map::DefaultHasher; - use std::hash::{Hash, Hasher}; - let mut hasher = DefaultHasher::new(); - s.hash(&mut hasher); - hasher.finish() - } - - #[test] - fn test_feedback_record_creation() { - let record = FeedbackRecord::new( - DecisionId(1), - true, - 0.85, - InterventionPoint::Fork, - make_hash("test query"), - make_hash("/root/child"), - ); - - assert!(record.was_correct); - assert!((record.pilot_confidence - 0.85).abs() < 0.01); - assert!(record.comment.is_none()); - } - - #[test] - fn test_feedback_record_with_comment() { - let record = FeedbackRecord::new( - DecisionId(1), - false, - 0.5, - InterventionPoint::Start, - make_hash("test"), - make_hash("/"), - ) - .with_comment("Wrong direction"); - - assert!(!record.was_correct); - assert_eq!(record.comment, Some("Wrong direction".to_string())); - } - - #[test] - fn test_feedback_store_recording() { - let store = FeedbackStore::in_memory(); - - // Record some feedback - store.record(FeedbackRecord::new( - DecisionId(1), - true, - 0.9, - InterventionPoint::Fork, - make_hash("query1"), - make_hash("/path1"), - )); - - store.record(FeedbackRecord::new( - DecisionId(2), - false, - 0.6, - InterventionPoint::Fork, - make_hash("query1"), - make_hash("/path1"), - )); - - store.record(FeedbackRecord::new( - DecisionId(3), - true, - 0.8, - InterventionPoint::Start, - make_hash("query2"), - make_hash("/"), - )); - - assert_eq!(store.total_records(), 3); - - let stats = store.intervention_stats(); - assert_eq!(stats.fork.total, 2); - assert_eq!(stats.fork.correct, 1); - assert!((stats.fork.accuracy() - 0.5).abs() < 0.01); - - assert_eq!(stats.start.total, 1); - assert_eq!(stats.start.correct, 1); - } - - #[test] - fn test_pilot_learner_adjustment() { - let store = Arc::new(FeedbackStore::in_memory()); - let learner = PilotLearner::new(store.clone()); - - // Not enough data, should return default - let adj = learner.get_adjustment(InterventionPoint::Fork, 0, 0); - assert!((adj.confidence_delta - 0.0).abs() < 0.01); - assert!(!adj.skip_intervention); - - // Add enough feedback with high accuracy - for i in 0..15 { - store.record(FeedbackRecord::new( - DecisionId(i), - true, // All correct - 0.9, - InterventionPoint::Fork, - make_hash("query"), - make_hash("/path"), - )); - } - - // Now should adjust - let adj = learner.get_adjustment(InterventionPoint::Fork, make_hash("query"), 0); - assert!(adj.confidence_delta > 0.0); // Should boost confidence - assert!((adj.algorithm_weight - 0.3).abs() < 0.01); // Should favor LLM - } - - #[test] - fn test_pilot_learner_low_accuracy() { - let store = Arc::new(FeedbackStore::in_memory()); - let learner = PilotLearner::new(store.clone()); - - // Add enough feedback with low accuracy - for i in 0..15 { - store.record(FeedbackRecord::new( - DecisionId(i), - i % 3 == 0, // Only ~33% correct - 0.5, - InterventionPoint::Fork, - 0, - 0, - )); - } - - let adj = learner.get_adjustment(InterventionPoint::Fork, 0, 0); - assert!(adj.confidence_delta < 0.0); // Should reduce confidence - assert!(adj.algorithm_weight > 0.5); // Should favor algorithm - } - - #[test] - fn test_context_stats() { - let mut stats = ContextStats::default(); - - stats.record(true, 0.9); - stats.record(true, 
0.8); - stats.record(false, 0.6); - - assert_eq!(stats.total, 3); - assert_eq!(stats.correct, 2); - assert!((stats.accuracy() - 0.666).abs() < 0.01); - assert!((stats.avg_confidence_correct - 0.85).abs() < 0.01); - assert!((stats.avg_confidence_incorrect - 0.6).abs() < 0.01); - } -} diff --git a/rust/src/retrieval/pilot/llm_pilot.rs b/rust/src/retrieval/pilot/llm_pilot.rs deleted file mode 100644 index 289726c3..00000000 --- a/rust/src/retrieval/pilot/llm_pilot.rs +++ /dev/null @@ -1,847 +0,0 @@ -// Copyright (c) 2026 vectorless developers -// SPDX-License-Identifier: Apache-2.0 - -//! LLM-based Pilot implementation. -//! -//! This module provides the main Pilot implementation that uses LLM -//! for semantic navigation guidance. - -use async_trait::async_trait; -use std::sync::Arc; -use tracing::{debug, info, warn}; - -use crate::document::{DocumentTree, NodeId}; -use crate::llm::memo::{MemoKey, MemoStore, MemoValue}; -use crate::llm::{LlmClient, LlmExecutor}; -use crate::utils::fingerprint::Fingerprint; - -use super::budget::BudgetController; -use super::builder::ContextBuilder; -use super::config::PilotConfig; -use super::decision::{InterventionPoint, PilotDecision}; -use super::feedback::{FeedbackRecord, FeedbackStore, PilotLearner}; -use super::parser::ResponseParser; -use super::prompts::PromptBuilder; -use super::r#trait::{Pilot, SearchState}; - -/// LLM-based Pilot implementation. -/// -/// Uses an LLM client to provide semantic navigation guidance -/// at key decision points during tree search. -/// -/// # Architecture -/// -/// ```text -/// ┌─────────────────────────────────────────────────────────────┐ -/// │ LlmPilot │ -/// ├─────────────────────────────────────────────────────────────┤ -/// │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │ -/// │ │ Context │ │ Prompt │ │ Response │ │ -/// │ │ Builder │─▶│ Builder │─▶│ Parser │ │ -/// │ └─────────────┘ └─────────────┘ └─────────────┘ │ -/// │ │ -/// │ ┌─────────────┐ ┌───────────────────────┐ │ -/// │ │ Budget │ │ LlmExecutor │ │ -/// │ │ Controller │ │ (throttle+retry+fall) │ │ -/// │ └─────────────┘ └───────────────────────┘ │ -/// │ │ -/// │ ┌─────────────┐ ┌───────────────────────┐ │ -/// │ │ Memo │ │ (cache LLM decisions) │ │ -/// │ │ Store │ │ │ │ -/// │ └─────────────┘ └───────────────────────┘ │ -/// └─────────────────────────────────────────────────────────────┘ -/// ``` -/// -/// # Example -/// -/// ```rust,ignore -/// use vectorless::retrieval::pilot::{LlmPilot, PilotConfig}; -/// use vectorless::llm::{LlmClient, LlmExecutor}; -/// -/// let client = LlmClient::for_model("gpt-4o-mini"); -/// let pilot = LlmPilot::new(client, PilotConfig::default()); -/// -/// // Or with executor for unified throttle/retry/fallback -/// let executor = LlmExecutor::for_model("gpt-4o-mini"); -/// let pilot = LlmPilot::with_executor(executor, PilotConfig::default()); -/// -/// // Use in search -/// if pilot.should_intervene(&state) { -/// let decision = pilot.decide(&state).await; -/// } -/// ``` -pub struct LlmPilot { - /// LLM client for making requests (fallback when no executor). - client: LlmClient, - /// LLM executor with unified throttle/retry/fallback (optional). - executor: Option>, - /// Pilot configuration. - config: PilotConfig, - /// Budget controller for per-level call tracking. - budget: BudgetController, - /// Shared pipeline budget — the primary budget source when set. - /// When available, Pilot checks this before making LLM calls and - /// records token consumption here. 
- pipeline_budget: - parking_lot::RwLock>>, - /// Context builder. - context_builder: ContextBuilder, - /// Prompt builder. - prompt_builder: PromptBuilder, - /// Response parser. - response_parser: ResponseParser, - /// Feedback learner for improving decisions (optional). - learner: Option>, - /// Memo store for caching decisions (optional). - memo_store: Option, -} - -impl std::fmt::Debug for LlmPilot { - fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { - f.debug_struct("LlmPilot") - .field("config", &self.config) - .field("budget", &self.budget.usage()) - .finish() - } -} - -impl LlmPilot { - /// Create a new LLM-based Pilot. - pub fn new(client: LlmClient, config: PilotConfig) -> Self { - let budget = BudgetController::new(config.budget.clone()); - let token_budget = config.budget.max_tokens_per_call; - - Self { - client, - executor: None, - config, - budget, - pipeline_budget: parking_lot::RwLock::new(None), - context_builder: ContextBuilder::new(token_budget), - prompt_builder: PromptBuilder::new(), - response_parser: ResponseParser::new(), - learner: None, - memo_store: None, - } - } - - /// Create a Pilot with LlmExecutor for unified throttle/retry/fallback. - pub fn with_executor(executor: LlmExecutor, config: PilotConfig) -> Self { - let budget = BudgetController::new(config.budget.clone()); - let token_budget = config.budget.max_tokens_per_call; - // Create a fallback client for backwards compatibility - let client = LlmClient::for_model(&executor.config().model); - - Self { - client, - executor: Some(Arc::new(executor)), - config, - budget, - pipeline_budget: parking_lot::RwLock::new(None), - context_builder: ContextBuilder::new(token_budget), - prompt_builder: PromptBuilder::new(), - response_parser: ResponseParser::new(), - learner: None, - memo_store: None, - } - } - - /// Create a Pilot with shared executor (for sharing throttle/fallback across pilots). - pub fn with_shared_executor(executor: Arc, config: PilotConfig) -> Self { - let budget = BudgetController::new(config.budget.clone()); - let token_budget = config.budget.max_tokens_per_call; - let client = LlmClient::for_model(&executor.config().model); - - Self { - client, - executor: Some(executor), - config, - budget, - pipeline_budget: parking_lot::RwLock::new(None), - context_builder: ContextBuilder::new(token_budget), - prompt_builder: PromptBuilder::new(), - response_parser: ResponseParser::new(), - learner: None, - memo_store: None, - } - } - - /// Create with custom builders. - pub fn with_builders( - client: LlmClient, - config: PilotConfig, - context_builder: ContextBuilder, - prompt_builder: PromptBuilder, - ) -> Self { - let budget = BudgetController::new(config.budget.clone()); - - Self { - client, - executor: None, - config, - budget, - pipeline_budget: parking_lot::RwLock::new(None), - context_builder, - prompt_builder, - response_parser: ResponseParser::new(), - learner: None, - memo_store: None, - } - } - - /// Add an executor to an existing pilot. - pub fn with_executor_mut(mut self, executor: LlmExecutor) -> Self { - self.executor = Some(Arc::new(executor)); - self - } - - /// Add a feedback learner to the pilot. - pub fn with_learner(mut self, learner: Arc) -> Self { - self.learner = Some(learner); - self - } - - /// Add a feedback learner from a feedback store. - pub fn with_feedback_store(mut self, store: Arc) -> Self { - self.learner = Some(Arc::new(PilotLearner::new(store))); - self - } - - /// Add a memo store for caching decisions. 
- /// - /// When enabled, the pilot will cache LLM decisions based on - /// context fingerprints, avoiding redundant API calls for - /// similar navigation scenarios. - pub fn with_memo_store(mut self, store: MemoStore) -> Self { - self.memo_store = Some(store); - self - } - - /// Set the shared pipeline budget controller. - /// - /// When set, this becomes the primary budget gate for LLM calls. - /// The Pilot's own BudgetController still tracks per-level call counts, - /// but token consumption is recorded against the pipeline budget. - /// Call this at query time (not construction time) since the pipeline - /// budget is created per-query. - pub fn set_pipeline_budget( - &self, - budget: Arc, - ) { - *self.pipeline_budget.write() = Some(budget); - } - - /// Check if using LlmExecutor (unified throttle/retry/fallback). - pub fn has_executor(&self) -> bool { - self.executor.is_some() - } - - /// Check if using feedback learner. - pub fn has_learner(&self) -> bool { - self.learner.is_some() - } - - /// Check if using memo store. - pub fn has_memo_store(&self) -> bool { - self.memo_store.is_some() - } - - /// Get the feedback learner (if any). - pub fn learner(&self) -> Option<&PilotLearner> { - self.learner.as_deref() - } - - /// Get the memo store (if any). - pub fn memo_store(&self) -> Option<&MemoStore> { - self.memo_store.as_ref() - } - - /// Record feedback for a decision. - pub fn record_feedback(&self, record: FeedbackRecord) { - if let Some(ref learner) = self.learner { - let decision_id = record.decision_id; - learner.store().record(record); - debug!("Recorded feedback for decision {:?}", decision_id); - } - } - - /// Compute a cache key for a pilot decision. - fn compute_cache_key( - &self, - context: &super::builder::PilotContext, - _point: InterventionPoint, - ) -> Option { - let _store = self.memo_store.as_ref()?; - - // Build a fingerprint from the context using available methods - let context_str = context.to_string(); - let context_fp = Fingerprint::from_str(&context_str); - let query_fp = Fingerprint::from_str(&context.query_section); - - Some(MemoKey::pilot_decision(&context_fp, &query_fp)) - } - - /// Check if budget allows LLM calls. - /// - /// Checks the shared pipeline budget first (if set), then falls back - /// to the Pilot's own per-call budget. - fn has_budget(&self) -> bool { - // Primary: check pipeline budget - if let Some(ref pb) = *self.pipeline_budget.read() { - if pb.status().should_stop() { - return false; - } - } - // Secondary: check Pilot's own call-level budget - self.budget.can_call() - } - - /// Check if scores are too close (algorithm uncertain). - fn scores_are_close(&self, state: &SearchState<'_>) -> bool { - // Use the config's score_gap_threshold with the state's best_score - // If best_score is low, consider scores as close - state.candidates.len() >= 2 - && state.best_score < self.config.intervention.score_gap_threshold - } - - /// Determine the intervention point type. - fn get_intervention_point(&self, state: &SearchState<'_>) -> InterventionPoint { - if state.is_at_root() || state.iteration == 0 { - InterventionPoint::Start - } else if state.is_backtracking { - InterventionPoint::Backtrack - } else if state.is_fork_point() { - InterventionPoint::Fork - } else { - InterventionPoint::Evaluate - } - } - - /// Make an LLM call and return the decision. 
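-    ///
-    /// Flow, as implemented below: memo-cache lookup → budget check →
-    /// optional learner adjustment → LLM call (via the executor when one
-    /// is configured, otherwise the direct client) → parse → clamp
-    /// confidence → cache the decision. A failed call degrades to
-    /// `default_decision` rather than surfacing an error.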
- async fn call_llm( - &self, - point: InterventionPoint, - context: &super::builder::PilotContext, - candidates: &[super::parser::CandidateInfo], - ) -> PilotDecision { - // Check memo cache first - if let Some(ref store) = self.memo_store { - if let Some(cache_key) = self.compute_cache_key(context, point) { - if let Some(cached) = store.get(&cache_key) { - if let MemoValue::PilotDecision(decision_value) = cached { - debug!("Memo cache hit for pilot decision at {:?}", point); - // Convert cached value back to PilotDecision - let decision = - self.cached_value_to_decision(decision_value, candidates, point); - return decision; - } - } - } - } - - // Build prompt - let prompt = self.prompt_builder.build(point, context); - - // Check if we can afford this call - if !self.budget.can_afford(prompt.estimated_tokens) { - warn!( - "Budget cannot afford LLM call (estimated: {} tokens)", - prompt.estimated_tokens - ); - return self.default_decision(candidates, point); - } - - // Get learner adjustment if available - let adjustment = if let Some(ref learner) = self.learner { - let query_hash = context.query_hash(); - let path_hash = context.path_hash(); - Some(learner.get_adjustment(point, query_hash, path_hash)) - } else { - None - }; - - // Check if learner suggests skipping intervention - if let Some(ref adj) = adjustment { - if adj.skip_intervention { - debug!("Learner suggests skipping intervention (low historical accuracy)"); - return self.default_decision(candidates, point); - } - } - - debug!( - "Calling LLM for {:?} point (estimated: {} tokens)", - point, prompt.estimated_tokens - ); - - // Make LLM call -use executor if available, otherwise use client directly - let result = if let Some(ref executor) = self.executor { - // Use LlmExecutor for unified throttle/retry/fallback - executor.complete(&prompt.system, &prompt.user).await - } else { - // Fallback to direct client call - self.client.complete(&prompt.system, &prompt.user).await - }; - - match result { - Ok(response) => { - // Record usage (estimate output tokens) - let output_tokens = self.estimate_tokens(&response); - let total_tokens = prompt.estimated_tokens + output_tokens; - self.budget - .record_usage(prompt.estimated_tokens, output_tokens, 0); - - // Also record in pipeline budget if shared - if let Some(ref pb) = *self.pipeline_budget.read() { - pb.record_tokens(total_tokens); - } - - // Parse response - let mut decision = self.response_parser.parse(&response, candidates, point); - - // Apply learner adjustment if available - if let Some(ref adj) = adjustment { - decision.confidence = - (decision.confidence + adj.confidence_delta as f32).clamp(0.0, 1.0); - debug!( - "Applied learner adjustment: confidence_delta={:.2}, algorithm_weight={:.2}", - adj.confidence_delta, adj.algorithm_weight - ); - } - - info!( - "LLM decision: direction={:?}, confidence={:.2}, candidates={}", - std::mem::discriminant(&decision.direction), - decision.confidence, - decision.ranked_candidates.len() - ); - - // Cache the decision - if let Some(ref store) = self.memo_store { - if let Some(cache_key) = self.compute_cache_key(context, point) { - let decision_value = self.decision_to_cached_value(&decision); - let tokens_saved = prompt.estimated_tokens as u64 + output_tokens as u64; - store.put_with_tokens( - cache_key, - MemoValue::PilotDecision(decision_value), - tokens_saved, - ); - debug!("Memo cache stored for pilot decision at {:?}", point); - } - } - - decision - } - Err(e) => { - warn!("LLM call failed: {}", e); - self.default_decision(candidates, 
point) - } - } - } - - /// Convert a PilotDecision to a cacheable value. - fn decision_to_cached_value( - &self, - decision: &PilotDecision, - ) -> crate::llm::memo::PilotDecisionValue { - crate::llm::memo::PilotDecisionValue { - selected_idx: decision - .ranked_candidates - .first() - .map(|c| c.node_id.0.into()) - .unwrap_or(0), - confidence: decision.confidence, - reasoning: decision.reasoning.clone(), - } - } - - /// Convert a cached value back to a PilotDecision. - fn cached_value_to_decision( - &self, - value: crate::llm::memo::PilotDecisionValue, - candidates: &[super::parser::CandidateInfo], - point: InterventionPoint, - ) -> PilotDecision { - let ranked = candidates - .iter() - .enumerate() - .map(|(i, c)| super::decision::RankedCandidate { - node_id: c.node_id, - score: if i == value.selected_idx { - 1.0 - } else { - 0.5 / (i + 1) as f32 - }, - reason: None, - }) - .collect(); - - PilotDecision { - ranked_candidates: ranked, - direction: super::decision::SearchDirection::GoDeeper { - reason: "Cached decision".to_string(), - }, - confidence: value.confidence, - reasoning: value.reasoning, - intervention_point: point, - } - } - - /// Create a default decision when LLM fails. - fn default_decision( - &self, - candidates: &[super::parser::CandidateInfo], - point: InterventionPoint, - ) -> PilotDecision { - let ranked = candidates - .iter() - .enumerate() - .map(|(i, c)| super::decision::RankedCandidate { - node_id: c.node_id, - score: 1.0 / (i + 1) as f32, - reason: None, - }) - .collect(); - - PilotDecision { - ranked_candidates: ranked, - direction: super::decision::SearchDirection::GoDeeper { - reason: "Default decision (LLM unavailable)".to_string(), - }, - confidence: 0.0, - reasoning: "LLM call failed or budget exhausted".to_string(), - intervention_point: point, - } - } - - /// Estimate token count for a string. 
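-    ///
-    /// Heuristic, per the body: CJK chars count as ~1/1.5 token each,
-    /// everything else as ~1/4 token. Illustrative arithmetic (not a
-    /// measured tokenizer result; receiver `pilot` is hypothetical):
-    ///
-    /// ```rust,ignore
-    /// // 3 Chinese chars + 8 other chars:
-    /// // ceil(3 / 1.5 + 8 / 4.0) = ceil(2.0 + 2.0) = 4
-    /// assert_eq!(pilot.estimate_tokens("你好吗hi there"), 4);
-    /// ```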
- fn estimate_tokens(&self, text: &str) -> usize { - let char_count = text.chars().count(); - let chinese_count = text - .chars() - .filter(|c| ('\u{4E00}'..='\u{9FFF}').contains(c)) - .count(); - let english_count = char_count - chinese_count; - - (chinese_count as f32 / 1.5 + english_count as f32 / 4.0).ceil() as usize - } -} - -#[async_trait] -impl Pilot for LlmPilot { - fn name(&self) -> &str { - "llm_pilot" - } - - fn should_intervene(&self, state: &SearchState<'_>) -> bool { - // Check mode - if !self.config.mode.uses_llm() { - return false; - } - - // Check budget - if !self.has_budget() { - debug!("Budget exhausted, skipping intervention"); - return false; - } - - let intervention = &self.config.intervention; - - // Condition 1: Fork point with enough candidates - if state.candidates.len() > intervention.fork_threshold { - debug!( - "Intervening: fork point with {} candidates", - state.candidates.len() - ); - return true; - } - - // Condition 2: Scores are too close (algorithm uncertain) - if self.scores_are_close(state) { - debug!("Intervening: scores are close"); - return true; - } - - // Condition 3: Low confidence (best score too low) - if intervention.is_low_confidence(state.best_score) { - debug!( - "Intervening: low confidence (best_score={:.2})", - state.best_score - ); - return true; - } - - // Condition 4: Backtracking and guide_at_backtrack is enabled - if state.is_backtracking && self.config.guide_at_backtrack { - debug!("Intervening: backtracking"); - return true; - } - - false - } - - async fn decide(&self, state: &SearchState<'_>) -> PilotDecision { - let point = self.get_intervention_point(state); - - // Build context - let context = self.context_builder.build(state); - - // Build candidate info with titles - let candidate_info: Vec = state - .candidates - .iter() - .enumerate() - .filter_map(|(i, &node_id)| { - state - .tree - .get(node_id) - .map(|node| super::parser::CandidateInfo { - node_id, - title: node.title.clone(), - index: i, - }) - }) - .collect(); - - // Make LLM call - self.call_llm(point, &context, &candidate_info).await - } - - async fn guide_start( - &self, - tree: &DocumentTree, - query: &str, - start_node: NodeId, - ) -> Option { - // Check if guide_at_start is enabled - if !self.config.guide_at_start { - return None; - } - - // Check budget - if !self.has_budget() { - debug!("Budget exhausted, cannot guide start"); - return None; - } - - // Build start context - let context = self.context_builder.build_start_context(tree, query); - - // Get start_node's children as candidates (NOT root's children) - let node_ids = tree.children(start_node); - if node_ids.is_empty() { - debug!("Start node has no children, no guidance needed"); - return None; - } - - // Build CandidateInfo with titles - let candidates: Vec = node_ids - .iter() - .enumerate() - .filter_map(|(i, &node_id)| { - tree.get(node_id).map(|node| super::parser::CandidateInfo { - node_id, - title: node.title.clone(), - index: i, - }) - }) - .collect(); - - // Make LLM call - let decision = self - .call_llm(InterventionPoint::Start, &context, &candidates) - .await; - - info!( - "Pilot start guidance: confidence={:.2}, candidates={}", - decision.confidence, - decision.ranked_candidates.len() - ); - - Some(decision) - } - - async fn guide_backtrack(&self, state: &SearchState<'_>) -> Option { - // Check if guide_at_backtrack is enabled - if !self.config.guide_at_backtrack { - return None; - } - - // Check budget - if !self.has_budget() { - return None; - } - - // Build backtrack context - let 
context = self - .context_builder - .build_backtrack_context(state, state.path); - - // Build CandidateInfo - let candidates: Vec = state - .candidates - .iter() - .enumerate() - .filter_map(|(i, &node_id)| { - state - .tree - .get(node_id) - .map(|node| super::parser::CandidateInfo { - node_id, - title: node.title.clone(), - index: i, - }) - }) - .collect(); - - // Make LLM call - Some( - self.call_llm(InterventionPoint::Backtrack, &context, &candidates) - .await, - ) - } - - async fn binary_prune(&self, state: &SearchState<'_>) -> Option> { - if !self.has_budget() { - debug!("Budget exhausted, cannot binary prune"); - return None; - } - - let context = self.context_builder.build(state); - - let candidate_info: Vec = state - .candidates - .iter() - .enumerate() - .filter_map(|(i, &node_id)| { - state - .tree - .get(node_id) - .map(|node| super::parser::CandidateInfo { - node_id, - title: node.title.clone(), - index: i, - }) - }) - .collect(); - - let decision = self - .call_llm(InterventionPoint::Prune, &context, &candidate_info) - .await; - - // Extract relevant node IDs from ranked candidates (score > 0.5 means relevant) - let relevant: Vec = decision - .ranked_candidates - .iter() - .filter(|c| c.score > 0.5) - .map(|c| c.node_id) - .collect(); - - if relevant.is_empty() { - debug!("Binary prune: LLM marked no candidates as relevant"); - return None; - } - - debug!( - "Binary prune: {} of {} candidates marked relevant", - relevant.len(), - state.candidates.len() - ); - - Some(relevant) - } - - fn config(&self) -> &PilotConfig { - &self.config - } - - fn is_active(&self) -> bool { - self.config.mode.uses_llm() && self.has_budget() - } - - fn reset(&self) { - self.budget.reset(); - *self.pipeline_budget.write() = None; - debug!("LlmPilot reset for new query"); - } - - fn as_any(&self) -> &dyn std::any::Any { - self - } -} - -#[cfg(test)] -mod tests { - use super::*; - use crate::document::NodeId; - use indextree::Arena; - - fn create_test_node_ids(count: usize) -> Vec { - let mut arena = Arena::new(); - let mut ids = Vec::new(); - for i in 0..count { - let node = crate::document::TreeNode { - title: format!("Node {}", i), - structure: String::new(), - content: String::new(), - summary: String::new(), - depth: 0, - start_index: 1, - end_index: 1, - start_page: None, - end_page: None, - node_id: None, - physical_index: None, - token_count: None, - references: Vec::new(), - }; - ids.push(NodeId(arena.new_node(node))); - } - ids - } - - #[test] - fn test_llm_pilot_creation() { - let client = LlmClient::for_model("gpt-4o-mini"); - let config = PilotConfig::default(); - let pilot = LlmPilot::new(client, config); - - assert_eq!(pilot.name(), "llm_pilot"); - assert!(pilot.is_active()); - } - - #[test] - fn test_llm_pilot_algorithm_only_mode() { - let client = LlmClient::for_model("gpt-4o-mini"); - let config = PilotConfig::algorithm_only(); - let pilot = LlmPilot::new(client, config); - - assert!(!pilot.config().mode.uses_llm()); - } - - #[test] - fn test_llm_pilot_budget_exhausted() { - let client = LlmClient::for_model("gpt-4o-mini"); - let config = PilotConfig::default(); - let pilot = LlmPilot::new(client, config); - - // Exhaust budget - pilot.budget.record_usage(3000, 500, 0); - - assert!(!pilot.has_budget()); - } - - #[test] - fn test_reset() { - let client = LlmClient::for_model("gpt-4o-mini"); - let config = PilotConfig::default(); - let pilot = LlmPilot::new(client, config); - - // Use some budget - pilot.budget.record_usage(100, 50, 0); - assert!(pilot.budget.total_tokens() > 0); - - 
// Reset - pilot.reset(); - assert_eq!(pilot.budget.total_tokens(), 0); - } -} diff --git a/rust/src/retrieval/pilot/metrics.rs b/rust/src/retrieval/pilot/metrics.rs deleted file mode 100644 index cf0f12b3..00000000 --- a/rust/src/retrieval/pilot/metrics.rs +++ /dev/null @@ -1,557 +0,0 @@ -// Copyright (c) 2026 vectorless developers -// SPDX-License-Identifier: Apache-2.0 - -//! Metrics collector for Pilot operations. -//! -//! Collects performance metrics including: -//! - LLM call statistics (count, success/failure) -//! - Token usage (input, output, total) -//! - Latency tracking (average, p50, p99) -//! - Decision quality metrics - -use std::sync::atomic::{AtomicU64, AtomicUsize, Ordering}; -use std::time::Duration; - -use super::decision::InterventionPoint; - -/// Snapshot of Pilot metrics at a point in time. -#[derive(Debug, Clone, Default)] -pub struct PilotMetrics { - // LLM call statistics - /// Total LLM calls attempted. - pub total_calls: usize, - /// Successful LLM calls. - pub successful_calls: usize, - /// Failed LLM calls. - pub failed_calls: usize, - /// Calls that needed fallback. - pub fallback_calls: usize, - - // Token statistics - /// Total input tokens consumed. - pub total_input_tokens: usize, - /// Total output tokens generated. - pub total_output_tokens: usize, - /// Average tokens per call. - pub avg_tokens_per_call: f64, - - // Latency statistics - /// Total time spent in LLM calls (ms). - pub total_latency_ms: u64, - /// Average latency per call (ms). - pub avg_latency_ms: f64, - /// P50 latency (ms). - pub p50_latency_ms: u64, - /// P99 latency (ms). - pub p99_latency_ms: u64, - - // Intervention statistics - /// Calls at START point. - pub start_interventions: usize, - /// Calls at FORK point. - pub fork_interventions: usize, - /// Calls at BACKTRACK point. - pub backtrack_interventions: usize, - /// Calls at EVALUATE point. - pub evaluate_interventions: usize, - - // Quality metrics (require feedback) - /// LLM decision accuracy (0.0-1.0). - pub llm_accuracy: Option, - /// Retrieval precision (0.0-1.0). - pub retrieval_precision: Option, -} - -impl PilotMetrics { - /// Calculate success rate (0.0-1.0). - pub fn success_rate(&self) -> f64 { - if self.total_calls == 0 { - return 0.0; - } - self.successful_calls as f64 / self.total_calls as f64 - } - - /// Calculate token utilization. - pub fn token_utilization(&self, budget: usize) -> f64 { - if budget == 0 { - return 0.0; - } - let total = self.total_input_tokens + self.total_output_tokens; - (total as f64 / budget as f64).min(1.0) - } - - /// Calculate fallback rate (0.0-1.0). - pub fn fallback_rate(&self) -> f64 { - if self.total_calls == 0 { - return 0.0; - } - self.fallback_calls as f64 / self.total_calls as f64 - } -} - -/// Record of a single LLM call. -#[derive(Debug, Clone)] -pub struct CallRecord { - /// Intervention point. - pub point: InterventionPoint, - /// Input tokens used. - pub input_tokens: usize, - /// Output tokens generated. - pub output_tokens: usize, - /// Latency in milliseconds. - pub latency_ms: u64, - /// Whether the call succeeded. - pub success: bool, - /// Whether fallback was used. - pub used_fallback: bool, -} - -/// Latency sample for percentile calculation. -#[derive(Debug, Clone)] -struct LatencySample { - latency_ms: u64, -} - -/// Metrics collector for Pilot operations. -/// -/// Thread-safe collector that tracks all Pilot metrics. -/// Uses atomic operations for concurrent access. 
-/// -/// # Example -/// -/// ```rust,ignore -/// use vectorless::retrieval::pilot::MetricsCollector; -/// -/// let metrics = MetricsCollector::new(); -/// -/// // Record a call -/// let start = std::time::Instant::now(); -/// // ... make LLM call ... -/// metrics.record_call(InterventionPoint::Fork, 100, 50, start.elapsed(), true, false); -/// -/// // Get snapshot -/// let snapshot = metrics.snapshot(); -/// println!("Success rate: {:.2}%", snapshot.success_rate() * 100.0); -/// ``` -pub struct MetricsCollector { - // Call counters - total_calls: AtomicUsize, - successful_calls: AtomicUsize, - failed_calls: AtomicUsize, - fallback_calls: AtomicUsize, - - // Token counters - total_input_tokens: AtomicUsize, - total_output_tokens: AtomicUsize, - - // Latency tracking - total_latency_ms: AtomicU64, - latency_samples: std::sync::RwLock>, - - // Intervention counters - start_interventions: AtomicUsize, - fork_interventions: AtomicUsize, - backtrack_interventions: AtomicUsize, - evaluate_interventions: AtomicUsize, - - // Quality metrics (set externally) - llm_accuracy: std::sync::RwLock>, - retrieval_precision: std::sync::RwLock>, -} - -impl Default for MetricsCollector { - fn default() -> Self { - Self::new() - } -} - -impl MetricsCollector { - /// Create a new metrics collector. - pub fn new() -> Self { - Self { - total_calls: AtomicUsize::new(0), - successful_calls: AtomicUsize::new(0), - failed_calls: AtomicUsize::new(0), - fallback_calls: AtomicUsize::new(0), - total_input_tokens: AtomicUsize::new(0), - total_output_tokens: AtomicUsize::new(0), - total_latency_ms: AtomicU64::new(0), - latency_samples: std::sync::RwLock::new(Vec::with_capacity(100)), - start_interventions: AtomicUsize::new(0), - fork_interventions: AtomicUsize::new(0), - backtrack_interventions: AtomicUsize::new(0), - evaluate_interventions: AtomicUsize::new(0), - llm_accuracy: std::sync::RwLock::new(None), - retrieval_precision: std::sync::RwLock::new(None), - } - } - - /// Record an LLM call. - pub fn record_call( - &self, - point: InterventionPoint, - input_tokens: usize, - output_tokens: usize, - latency: Duration, - success: bool, - used_fallback: bool, - ) { - // Update call counters - self.total_calls.fetch_add(1, Ordering::Relaxed); - if success { - self.successful_calls.fetch_add(1, Ordering::Relaxed); - } else { - self.failed_calls.fetch_add(1, Ordering::Relaxed); - } - if used_fallback { - self.fallback_calls.fetch_add(1, Ordering::Relaxed); - } - - // Update token counters - self.total_input_tokens - .fetch_add(input_tokens, Ordering::Relaxed); - self.total_output_tokens - .fetch_add(output_tokens, Ordering::Relaxed); - - // Update latency - let latency_ms = latency.as_millis() as u64; - self.total_latency_ms - .fetch_add(latency_ms, Ordering::Relaxed); - - // Store latency sample - if let Ok(mut samples) = self.latency_samples.write() { - samples.push(LatencySample { latency_ms }); - // Keep last 1000 samples - if samples.len() > 1000 { - samples.remove(0); - } - } - - // Update intervention counters - match point { - InterventionPoint::Start => { - self.start_interventions.fetch_add(1, Ordering::Relaxed); - } - InterventionPoint::Fork | InterventionPoint::Prune => { - self.fork_interventions.fetch_add(1, Ordering::Relaxed); - } - InterventionPoint::Backtrack => { - self.backtrack_interventions.fetch_add(1, Ordering::Relaxed); - } - InterventionPoint::Evaluate => { - self.evaluate_interventions.fetch_add(1, Ordering::Relaxed); - } - } - } - - /// Record a call using CallRecord. 
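-    ///
-    /// Convenience wrapper over `record_call`; sketch mirroring the unit
-    /// test below:
-    ///
-    /// ```rust,ignore
-    /// metrics.record(CallRecord {
-    ///     point: InterventionPoint::Backtrack,
-    ///     input_tokens: 150,
-    ///     output_tokens: 75,
-    ///     latency_ms: 300,
-    ///     success: true,
-    ///     used_fallback: false,
-    /// });
-    /// ```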
- pub fn record(&self, record: CallRecord) { - let latency = Duration::from_millis(record.latency_ms); - self.record_call( - record.point, - record.input_tokens, - record.output_tokens, - latency, - record.success, - record.used_fallback, - ); - } - - /// Set LLM accuracy (from external feedback). - pub fn set_llm_accuracy(&self, accuracy: f64) { - if let Ok(mut acc) = self.llm_accuracy.write() { - *acc = Some(accuracy.clamp(0.0, 1.0)); - } - } - - /// Set retrieval precision (from external feedback). - pub fn set_retrieval_precision(&self, precision: f64) { - if let Ok(mut prec) = self.retrieval_precision.write() { - *prec = Some(precision.clamp(0.0, 1.0)); - } - } - - /// Get a snapshot of current metrics. - pub fn snapshot(&self) -> PilotMetrics { - let total_calls = self.total_calls.load(Ordering::Relaxed); - let successful_calls = self.successful_calls.load(Ordering::Relaxed); - let failed_calls = self.failed_calls.load(Ordering::Relaxed); - let fallback_calls = self.fallback_calls.load(Ordering::Relaxed); - let total_input_tokens = self.total_input_tokens.load(Ordering::Relaxed); - let total_output_tokens = self.total_output_tokens.load(Ordering::Relaxed); - let total_latency_ms = self.total_latency_ms.load(Ordering::Relaxed); - - let avg_tokens_per_call = if total_calls > 0 { - (total_input_tokens + total_output_tokens) as f64 / total_calls as f64 - } else { - 0.0 - }; - - let avg_latency_ms = if total_calls > 0 { - total_latency_ms as f64 / total_calls as f64 - } else { - 0.0 - }; - - // Calculate percentiles from samples - let (p50_latency_ms, p99_latency_ms) = self.calculate_percentiles(); - - PilotMetrics { - total_calls, - successful_calls, - failed_calls, - fallback_calls, - total_input_tokens, - total_output_tokens, - avg_tokens_per_call, - total_latency_ms, - avg_latency_ms, - p50_latency_ms, - p99_latency_ms, - start_interventions: self.start_interventions.load(Ordering::Relaxed), - fork_interventions: self.fork_interventions.load(Ordering::Relaxed), - backtrack_interventions: self.backtrack_interventions.load(Ordering::Relaxed), - evaluate_interventions: self.evaluate_interventions.load(Ordering::Relaxed), - llm_accuracy: self.llm_accuracy.read().ok().and_then(|v| *v), - retrieval_precision: self.retrieval_precision.read().ok().and_then(|v| *v), - } - } - - /// Calculate p50 and p99 latencies. - fn calculate_percentiles(&self) -> (u64, u64) { - if let Ok(samples) = self.latency_samples.read() { - if samples.is_empty() { - return (0, 0); - } - - let mut latencies: Vec = samples.iter().map(|s| s.latency_ms).collect(); - latencies.sort(); - - let p50_idx = (latencies.len() as f64 * 0.50) as usize; - let p99_idx = (latencies.len() as f64 * 0.99) as usize; - - let p50 = latencies.get(p50_idx).copied().unwrap_or(0); - let p99 = latencies - .get(p99_idx.min(latencies.len() - 1)) - .copied() - .unwrap_or(0); - - (p50, p99) - } else { - (0, 0) - } - } - - /// Reset all metrics for a new query. 
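-    ///
-    /// Note: this clears call, token, latency, and intervention counters
-    /// plus the latency samples; externally set quality metrics
-    /// (`llm_accuracy`, `retrieval_precision`) are left as-is.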
- pub fn reset(&self) { - self.total_calls.store(0, Ordering::Relaxed); - self.successful_calls.store(0, Ordering::Relaxed); - self.failed_calls.store(0, Ordering::Relaxed); - self.fallback_calls.store(0, Ordering::Relaxed); - self.total_input_tokens.store(0, Ordering::Relaxed); - self.total_output_tokens.store(0, Ordering::Relaxed); - self.total_latency_ms.store(0, Ordering::Relaxed); - self.start_interventions.store(0, Ordering::Relaxed); - self.fork_interventions.store(0, Ordering::Relaxed); - self.backtrack_interventions.store(0, Ordering::Relaxed); - self.evaluate_interventions.store(0, Ordering::Relaxed); - - if let Ok(mut samples) = self.latency_samples.write() { - samples.clear(); - } - } - - /// Get total tokens used. - pub fn total_tokens(&self) -> usize { - self.total_input_tokens.load(Ordering::Relaxed) - + self.total_output_tokens.load(Ordering::Relaxed) - } - - /// Get total calls made. - pub fn total_calls(&self) -> usize { - self.total_calls.load(Ordering::Relaxed) - } -} - -#[cfg(test)] -mod tests { - use super::*; - use std::time::Duration; - - #[test] - fn test_metrics_creation() { - let metrics = MetricsCollector::new(); - let snapshot = metrics.snapshot(); - - assert_eq!(snapshot.total_calls, 0); - assert_eq!(snapshot.successful_calls, 0); - assert_eq!(snapshot.failed_calls, 0); - } - - #[test] - fn test_record_call() { - let metrics = MetricsCollector::new(); - - metrics.record_call( - InterventionPoint::Fork, - 100, - 50, - Duration::from_millis(200), - true, - false, - ); - - let snapshot = metrics.snapshot(); - assert_eq!(snapshot.total_calls, 1); - assert_eq!(snapshot.successful_calls, 1); - assert_eq!(snapshot.failed_calls, 0); - assert_eq!(snapshot.total_input_tokens, 100); - assert_eq!(snapshot.total_output_tokens, 50); - assert_eq!(snapshot.fork_interventions, 1); - } - - #[test] - fn test_record_failed_call() { - let metrics = MetricsCollector::new(); - - metrics.record_call( - InterventionPoint::Start, - 100, - 0, - Duration::from_millis(100), - false, - true, - ); - - let snapshot = metrics.snapshot(); - assert_eq!(snapshot.total_calls, 1); - assert_eq!(snapshot.successful_calls, 0); - assert_eq!(snapshot.failed_calls, 1); - assert_eq!(snapshot.fallback_calls, 1); - assert_eq!(snapshot.start_interventions, 1); - } - - #[test] - fn test_success_rate() { - let metrics = MetricsCollector::new(); - - // No calls - assert_eq!(metrics.snapshot().success_rate(), 0.0); - - // 3 successful, 1 failed - metrics.record_call(InterventionPoint::Fork, 0, 0, Duration::ZERO, true, false); - metrics.record_call(InterventionPoint::Fork, 0, 0, Duration::ZERO, true, false); - metrics.record_call(InterventionPoint::Fork, 0, 0, Duration::ZERO, true, false); - metrics.record_call(InterventionPoint::Fork, 0, 0, Duration::ZERO, false, false); - - assert!((metrics.snapshot().success_rate() - 0.75).abs() < 0.01); - } - - #[test] - fn test_token_utilization() { - let metrics = MetricsCollector::new(); - - metrics.record_call( - InterventionPoint::Fork, - 500, - 200, - Duration::ZERO, - true, - false, - ); - - let utilization = metrics.snapshot().token_utilization(1000); - assert!((utilization - 0.7).abs() < 0.01); - } - - #[test] - fn test_latency_percentiles() { - let metrics = MetricsCollector::new(); - - // Add 100 samples with increasing latency - for i in 0..100 { - metrics.record_call( - InterventionPoint::Fork, - 0, - 0, - Duration::from_millis(i as u64 + 1), - true, - false, - ); - } - - let snapshot = metrics.snapshot(); - - // P50 should be around 50 - 
assert!(snapshot.p50_latency_ms >= 40 && snapshot.p50_latency_ms <= 60); - - // P99 should be around 99 - assert!(snapshot.p99_latency_ms >= 90 && snapshot.p99_latency_ms <= 100); - } - - #[test] - fn test_reset() { - let metrics = MetricsCollector::new(); - - metrics.record_call( - InterventionPoint::Fork, - 100, - 50, - Duration::from_millis(200), - true, - false, - ); - assert!(metrics.total_calls() > 0); - - metrics.reset(); - - let snapshot = metrics.snapshot(); - assert_eq!(snapshot.total_calls, 0); - assert_eq!(snapshot.total_input_tokens, 0); - } - - #[test] - fn test_quality_metrics() { - let metrics = MetricsCollector::new(); - - metrics.set_llm_accuracy(0.85); - metrics.set_retrieval_precision(0.92); - - let snapshot = metrics.snapshot(); - assert_eq!(snapshot.llm_accuracy, Some(0.85)); - assert_eq!(snapshot.retrieval_precision, Some(0.92)); - } - - #[test] - fn test_quality_metrics_clamping() { - let metrics = MetricsCollector::new(); - - metrics.set_llm_accuracy(1.5); - metrics.set_retrieval_precision(-0.1); - - let snapshot = metrics.snapshot(); - assert_eq!(snapshot.llm_accuracy, Some(1.0)); - assert_eq!(snapshot.retrieval_precision, Some(0.0)); - } - - #[test] - fn test_call_record() { - let metrics = MetricsCollector::new(); - - let record = CallRecord { - point: InterventionPoint::Backtrack, - input_tokens: 150, - output_tokens: 75, - latency_ms: 300, - success: true, - used_fallback: false, - }; - - metrics.record(record); - - let snapshot = metrics.snapshot(); - assert_eq!(snapshot.total_calls, 1); - assert_eq!(snapshot.backtrack_interventions, 1); - assert_eq!(snapshot.total_input_tokens, 150); - } -} diff --git a/rust/src/retrieval/pilot/mod.rs b/rust/src/retrieval/pilot/mod.rs deleted file mode 100644 index b14aa997..00000000 --- a/rust/src/retrieval/pilot/mod.rs +++ /dev/null @@ -1,55 +0,0 @@ -// Copyright (c) 2026 vectorless developers -// SPDX-License-Identifier: Apache-2.0 - -//! Pilot - The brain of the Retriever Pipeline. -//! -//! Pilot is the core intelligence component responsible for understanding queries, -//! analyzing document structure, and making navigation decisions. Unlike traditional -//! vector-based retrieval, Pilot uses LLM for semantic understanding and navigation -//! while keeping the algorithm efficient for execution. -//! -//! # Design Philosophy -//! -//! 1. Algorithm handles "how to search" - efficient, deterministic, low latency -//! 2. Pilot handles "where to go" - semantic understanding, disambiguation, direction -//! 3. Intervention at key decision points - not every step, only when needed -//! 4. Layered fallback - algorithm takes over when LLM fails, Pilot rescues when algorithm fails -//! -//! -//! # Quick Start -//! -//! ```rust,ignore -//! use vectorless::retrieval::pilot::{LlmPilot, PilotConfig, Pilot}; -//! -//! let pilot = LlmPilot::new(llm_client, PilotConfig::default()); -//! -//! // Check if intervention needed -//! if pilot.should_intervene(&state) { -//! let decision = pilot.decide(&state).await; -//! // Use decision to guide search -//! } -//! 
``` - -mod budget; -mod builder; -mod complexity; -mod config; -mod decision; -mod decision_scorer; -mod fallback; -mod feedback; -mod llm_pilot; -mod metrics; -mod noop; -mod parser; -mod prompts; -mod scorer; -mod r#trait; - -pub use complexity::detect_with_llm; -pub use config::PilotConfig; -pub use decision::{InterventionPoint, PilotDecision}; -pub use decision_scorer::{PilotDecisionCache, score_candidates, score_candidates_detailed}; -pub use llm_pilot::LlmPilot; -pub use scorer::{NodeScorer, ScoringContext}; -pub use r#trait::{Pilot, SearchState}; diff --git a/rust/src/retrieval/pilot/noop.rs b/rust/src/retrieval/pilot/noop.rs deleted file mode 100644 index e5159276..00000000 --- a/rust/src/retrieval/pilot/noop.rs +++ /dev/null @@ -1,160 +0,0 @@ -// Copyright (c) 2026 vectorless developers -// SPDX-License-Identifier: Apache-2.0 - -//! NoopPilot - A no-operation Pilot implementation. -//! -//! This module provides a Pilot implementation that never intervenes, -//! useful for testing, benchmarking, and as a fallback when LLM -//! is unavailable. - -use async_trait::async_trait; - -use crate::document::{DocumentTree, NodeId}; - -use super::{InterventionPoint, Pilot, PilotConfig, PilotDecision, SearchState}; - -/// A Pilot implementation that never intervenes. -/// -/// This is useful for: -/// - Testing the search algorithm without LLM interference -/// - Benchmarking baseline performance -/// - Fallback when LLM is unavailable -/// -/// # Example -/// -/// ```rust,ignore -/// use vectorless::retrieval::pilot::NoopPilot; -/// -/// let pilot = NoopPilot::new(); -/// -/// // This will always return false -/// assert!(!pilot.should_intervene(&state)); -/// ``` -#[derive(Debug, Clone, Default)] -pub struct NoopPilot { - config: PilotConfig, -} - -impl NoopPilot { - /// Create a new NoopPilot. - pub fn new() -> Self { - Self { - config: PilotConfig::algorithm_only(), - } - } - - /// Create with custom config. 
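-    ///
-    /// ```rust,ignore
-    /// // Sketch: a Noop pilot never activates, whatever config it carries.
-    /// let pilot = NoopPilot::with_config(PilotConfig::default());
-    /// assert!(!pilot.is_active());
-    /// ```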
- pub fn with_config(config: PilotConfig) -> Self { - Self { config } - } -} - -#[async_trait] -impl Pilot for NoopPilot { - fn name(&self) -> &str { - "noop" - } - - fn should_intervene(&self, _state: &SearchState<'_>) -> bool { - // Never intervene - false - } - - async fn decide(&self, state: &SearchState<'_>) -> PilotDecision { - // Return a default decision that preserves original order - let decision = PilotDecision::preserve_order(state.candidates); - PilotDecision { - intervention_point: InterventionPoint::Fork, - ..decision - } - } - - async fn guide_start( - &self, - _tree: &DocumentTree, - _query: &str, - _start_node: NodeId, - ) -> Option { - // No guidance at start - None - } - - async fn guide_backtrack(&self, _state: &SearchState<'_>) -> Option { - // No guidance during backtrack - None - } - - async fn binary_prune(&self, _state: &SearchState<'_>) -> Option> { - // NoopPilot does not support binary pruning - None - } - - fn config(&self) -> &PilotConfig { - &self.config - } - - fn is_active(&self) -> bool { - // NoopPilot is never active - false - } - - fn reset(&self) { - // No state to reset - } -} - -#[cfg(test)] -mod tests { - use super::*; - use crate::document::NodeId; - use std::collections::HashSet; - - #[test] - fn test_noop_pilot_never_intervenes() { - let pilot = NoopPilot::new(); - - // Create a minimal state - let tree = DocumentTree::new("test", "test content"); - let query = "test query"; - let path: &[NodeId] = &[]; - let candidates: &[NodeId] = &[]; - let visited = HashSet::new(); - - let state = SearchState::new(&tree, query, path, candidates, &visited); - - // Should never intervene - assert!(!pilot.should_intervene(&state)); - } - - #[tokio::test] - async fn test_noop_pilot_returns_default_decision() { - let pilot = NoopPilot::new(); - - let tree = DocumentTree::new("test", "test content"); - let query = "test query"; - let path: &[NodeId] = &[]; - let candidates: &[NodeId] = &[]; - let visited = HashSet::new(); - - let state = SearchState::new(&tree, query, path, candidates, &visited); - let decision = pilot.decide(&state).await; - - assert_eq!(decision.confidence, 0.0); - assert!(!decision.has_candidates()); - } - - #[tokio::test] - async fn test_noop_pilot_no_start_guidance() { - let pilot = NoopPilot::new(); - let tree = DocumentTree::new("test", "test content"); - - let guidance = pilot.guide_start(&tree, "test", tree.root()).await; - assert!(guidance.is_none()); - } - - #[test] - fn test_noop_pilot_not_active() { - let pilot = NoopPilot::new(); - assert!(!pilot.is_active()); - } -} diff --git a/rust/src/retrieval/pilot/parser.rs b/rust/src/retrieval/pilot/parser.rs deleted file mode 100644 index d79a246f..00000000 --- a/rust/src/retrieval/pilot/parser.rs +++ /dev/null @@ -1,833 +0,0 @@ -// Copyright (c) 2026 vectorless developers -// SPDX-License-Identifier: Apache-2.0 - -//! Response parser for Pilot LLM calls. -//! -//! Parses LLM responses into structured `PilotDecision` objects. -//! Uses multiple parsing strategies with graceful fallbacks: -//! 1. JSON parse (preferred) -//! 2. Regex extraction -//! 3. Default decision (fallback) - -use regex::Regex; -use serde::{Deserialize, Serialize}; -use tracing::warn; - -use super::decision::{InterventionPoint, PilotDecision, RankedCandidate, SearchDirection}; -use crate::document::NodeId; - -/// Parsed response from LLM. -#[derive(Debug, Clone, Serialize, Deserialize)] -pub struct LlmResponse { - /// Ranked candidates with scores (FORK format). 
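-    ///
-    /// Illustrative shape (field names follow `CandidateScore`; values
-    /// are hypothetical):
-    ///
-    /// ```json
-    /// { "ranked_candidates": [ { "index": 0, "score": 0.9, "reason": "matches query" } ] }
-    /// ```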
-    #[serde(default)]
-    pub ranked_candidates: Vec<CandidateScore>,
-    /// Entry points for START intervention (list of node titles).
-    #[serde(default)]
-    pub entry_points: Vec<String>,
-    /// Best entry points (alternative START format from LLM).
-    #[serde(default)]
-    pub best_entry_points: Vec<EntryPoint>,
-    /// Selected nodes (another alternative START format - list of titles).
-    #[serde(default)]
-    pub selected_nodes: Vec<String>,
-    /// Selected node (singular - some LLMs return this format).
-    #[serde(default)]
-    pub selected_node: Option<String>,
-    /// Recommended node (another singular format).
-    #[serde(default)]
-    pub recommended_node: Option<String>,
-    /// Analysis wrapper (some LLMs nest response in "analysis" object).
-    #[serde(default)]
-    pub analysis: Option<AnalysisWrapper>,
-    /// Recommended search direction.
-    #[serde(default)]
-    pub direction: DirectionResponse,
-    /// Confidence level (0.0 - 1.0 or "high"/"medium"/"low").
-    #[serde(
-        default = "default_confidence",
-        deserialize_with = "deserialize_confidence"
-    )]
-    pub confidence: f32,
-    /// Reasoning for the decision.
-    #[serde(default)]
-    pub reasoning: String,
-    /// Relevant candidate indices from PRUNE response (binary yes/no).
-    #[serde(default)]
-    pub relevant_indices: Vec<usize>,
-    /// Alternative field name some LLMs use for relevant indices.
-    #[serde(default)]
-    pub relevant: Vec<usize>,
-}
-
-/// Custom deserializer for confidence that accepts both float and string.
-fn deserialize_confidence<'de, D>(deserializer: D) -> Result<f32, D::Error>
-where
-    D: serde::Deserializer<'de>,
-{
-    let value = serde_json::Value::deserialize(deserializer)?;
-    match value {
-        serde_json::Value::Number(n) => {
-            // Handle numeric value
-            Ok(n.as_f64().unwrap_or(0.5) as f32)
-        }
-        serde_json::Value::String(s) => {
-            // Handle string values like "high", "medium", "low"
-            let lower = s.to_lowercase();
-            let confidence = match lower.as_str() {
-                "high" | "very high" | "strong" => 0.9,
-                "medium" | "moderate" => 0.6,
-                "low" | "weak" => 0.3,
-                _ => 0.5, // default for unknown strings
-            };
-            Ok(confidence)
-        }
-        _ => Ok(0.5), // default for other types
-    }
-}
-
-/// Analysis wrapper for nested LLM responses.
-#[derive(Debug, Clone, Serialize, Deserialize)]
-pub struct AnalysisWrapper {
-    /// Query from analysis.
-    #[serde(default)]
-    pub query: Option<String>,
-    /// Intent detected.
-    #[serde(default)]
-    pub intent: Option<String>,
-    /// Selected node (singular).
-    #[serde(default)]
-    pub selected_node: Option<String>,
-    /// Selected nodes (plural).
-    #[serde(default)]
-    pub selected_nodes: Vec<String>,
-    /// Reasoning from analysis.
-    #[serde(default)]
-    pub reasoning: Option<String>,
-}
-
-/// Candidate score from LLM response.
-#[derive(Debug, Clone, Serialize, Deserialize)]
-pub struct CandidateScore {
-    /// Index of the candidate (0-based).
-    pub index: usize,
-    /// Score for this candidate (0.0 - 1.0).
-    pub score: f32,
-    /// Optional reason for the score.
-    #[serde(default)]
-    pub reason: Option<String>,
-}
-
-/// Candidate info for title matching.
-#[derive(Debug, Clone)]
-pub struct CandidateInfo {
-    /// Node ID.
-    pub node_id: NodeId,
-    /// Title of the node.
-    pub title: String,
-    /// Index in the candidates list.
-    pub index: usize,
-}
-
-/// Entry point from START response.
-#[derive(Debug, Clone, Serialize, Deserialize)]
-pub struct EntryPoint {
-    /// Node ID or index.
-    #[serde(default)]
-    pub node_id: Option<usize>,
-    /// Index of the candidate.
-    #[serde(default)]
-    pub index: Option<usize>,
-    /// Title of the entry point.
-    #[serde(default)]
-    pub title: Option<String>,
-    /// Relevance score (may be 1-5 or 0.0-1.0).
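-    /// Values above the 0.0-1.0 range are treated as the 1-5 scale and
-    /// normalized downstream by dividing by 5.0 (see
-    /// `llm_response_to_decision`).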
-    #[serde(default)]
-    pub relevance_score: Option<f32>,
-    /// Score (alternative field name).
-    #[serde(default)]
-    pub score: Option<f32>,
-}
-
-/// Candidate entry from an LLM top-3 ranking response.
-#[derive(Debug, Clone, Serialize, Deserialize)]
-pub struct Top3Candidate {
-    /// Node ID of the selected candidate.
-    pub node_id: usize,
-    /// Relevance score (0.0-1.0).
-    pub relevance_score: f32,
-    /// Reason for the selection.
-    pub reason: String,
-}
-
-/// Direction response from LLM.
-#[derive(Debug, Clone, Serialize, Deserialize, Default)]
-#[serde(rename_all = "snake_case")]
-pub enum DirectionResponse {
-    #[default]
-    GoDeeper,
-    ExploreSiblings,
-    Backtrack,
-    FoundAnswer,
-}
-
-fn default_confidence() -> f32 {
-    0.5
-}
-
-/// Response parser for LLM outputs.
-///
-/// Implements layered parsing with graceful degradation:
-/// 1. Try JSON parse first
-/// 2. Fall back to regex extraction
-/// 3. Return default decision if all else fails
-///
-/// # Example
-///
-/// ```rust,ignore
-/// use vectorless::retrieval::pilot::ResponseParser;
-///
-/// let parser = ResponseParser::new();
-/// let decision = parser.parse(&llm_response, candidates, InterventionPoint::Fork);
-/// ```
-pub struct ResponseParser {
-    /// Regex for extracting JSON from markdown code blocks.
-    json_block_regex: Regex,
-    /// Regex for extracting confidence.
-    confidence_regex: Regex,
-    /// Regex for extracting direction.
-    direction_regex: Regex,
-}
-
-impl Default for ResponseParser {
-    fn default() -> Self {
-        Self::new()
-    }
-}
-
-impl ResponseParser {
-    /// Create a new response parser.
-    pub fn new() -> Self {
-        Self {
-            // Match JSON in markdown code blocks
-            json_block_regex: Regex::new(r"```(?:json)?\s*([\s\S]*?)```").unwrap(),
-            // Match "confidence: 0.8" (case-insensitive)
-            confidence_regex: Regex::new(r"(?i)confidence[:\s]+([0-9.]+)").unwrap(),
-            // Match direction keywords
-            direction_regex: Regex::new(
-                r"(?i)(go.?deeper|explore.?siblings|backtrack|found.?answer)",
-            )
-            .unwrap(),
-        }
-    }
-
-    /// Parse LLM response into a PilotDecision.
-    ///
-    /// # Arguments
-    ///
-    /// * `response` - Raw LLM response text
-    /// * `candidates` - Candidate info with NodeId, title, and index
-    /// * `point` - The intervention point
-    pub fn parse(
-        &self,
-        response: &str,
-        candidates: &[CandidateInfo],
-        point: InterventionPoint,
-    ) -> PilotDecision {
-        println!(
-            "[DEBUG] ResponseParser::parse() - candidates.len()={}",
-            candidates.len()
-        );
-
-        // Try JSON parse first
-        if let Some(decision) = self.try_json_parse(response, candidates, point) {
-            println!(
-                "[DEBUG] ResponseParser::parse() - JSON parse succeeded, ranked={}",
-                decision.ranked_candidates.len()
-            );
-            return decision;
-        }
-        println!("[DEBUG] ResponseParser::parse() - JSON parse failed, trying regex...");
-
-        // Try regex extraction
-        if let Some(decision) = self.try_regex_parse(response, candidates, point) {
-            println!(
-                "[DEBUG] ResponseParser::parse() - Regex parse succeeded, ranked={}",
-                decision.ranked_candidates.len()
-            );
-            return decision;
-        }
-        println!("[DEBUG] ResponseParser::parse() - Regex parse failed, using default decision");
-
-        // Return default decision
-        self.default_decision(candidates, point)
-    }
-
-    /// Try to parse response as JSON.
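-    ///
-    /// Extraction order, per the body: a fenced ```json code block is
-    /// preferred; otherwise the span from the first `{` to the last `}`
-    /// is treated as the candidate JSON, then deserialized into
-    /// `LlmResponse`.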
-    fn try_json_parse(
-        &self,
-        response: &str,
-        candidates: &[CandidateInfo],
-        point: InterventionPoint,
-    ) -> Option<PilotDecision> {
-        // First, try to extract JSON from code blocks
-        let json_str = if let Some(caps) = self.json_block_regex.captures(response) {
-            let extracted = caps.get(1)?.as_str().trim().to_string();
-            println!("[DEBUG] ResponseParser::try_json_parse() - Found JSON in code block");
-            extracted
-        } else {
-            // Try to find raw JSON object
-            let start = response.find('{')?;
-            let end = response.rfind('}')? + 1;
-            let extracted = response[start..end].to_string();
-            println!("[DEBUG] ResponseParser::try_json_parse() - Found raw JSON (no code block)");
-            extracted
-        };
-
-        println!(
-            "[DEBUG] ResponseParser::try_json_parse() - Extracted JSON:\n{}",
-            json_str
-        );
-
-        // Parse JSON
-        let llm_response: LlmResponse = match serde_json::from_str::<LlmResponse>(&json_str) {
-            Ok(r) => {
-                println!("[DEBUG] ResponseParser::try_json_parse() - JSON parsed successfully");
-                println!(
-                    "[DEBUG] ResponseParser::try_json_parse() - ranked_candidates count: {}",
-                    r.ranked_candidates.len()
-                );
-                r
-            }
-            Err(e) => {
-                println!(
-                    "[DEBUG] ResponseParser::try_json_parse() - JSON parse FAILED: {}",
-                    e
-                );
-                warn!("Failed to parse LLM response as JSON: {}", e);
-                return None;
-            }
-        };
-
-        // Convert to PilotDecision
-        Some(self.llm_response_to_decision(llm_response, candidates, point))
-    }
-
-    /// Try to parse response using regex.
-    fn try_regex_parse(
-        &self,
-        response: &str,
-        candidates: &[CandidateInfo],
-        point: InterventionPoint,
-    ) -> Option<PilotDecision> {
-        // Extract confidence
-        let confidence = self
-            .confidence_regex
-            .captures(response)
-            .and_then(|caps| caps.get(1)?.as_str().parse::<f32>().ok())
-            .unwrap_or(0.5)
-            .clamp(0.0, 1.0);
-
-        // Extract direction
-        let direction = self
-            .direction_regex
-            .captures(response)
-            .map(|caps| {
-                let dir = caps.get(1)?.as_str().to_lowercase();
-                match dir.as_str() {
-                    d if d.contains("deeper") => Some(SearchDirection::GoDeeper {
-                        reason: String::new(),
-                    }),
-                    d if d.contains("sibling") => Some(SearchDirection::ExploreSiblings {
-                        recommended: vec![],
-                    }),
-                    d if d.contains("backtrack") => Some(SearchDirection::Backtrack {
-                        reason: String::new(),
-                        alternative_branches: vec![],
-                    }),
-                    d if d.contains("found") || d.contains("answer") => {
-                        Some(SearchDirection::FoundAnswer { confidence })
-                    }
-                    _ => None,
-                }
-            })
-            .flatten()
-            .unwrap_or_else(|| SearchDirection::GoDeeper {
-                reason: String::new(),
-            });
-
-        // Try to extract candidate rankings from numbered list
-        let ranked = self.extract_ranked_candidates(response, candidates);
-
-        if ranked.is_empty() && candidates.len() > 1 {
-            return None; // Regex parse failed
-        }
-
-        Some(PilotDecision {
-            ranked_candidates: ranked,
-            direction,
-            confidence,
-            reasoning: "Extracted via regex".to_string(),
-            intervention_point: point,
-        })
-    }
-
-    /// Extract ranked candidates from text using patterns.
-    fn extract_ranked_candidates(
-        &self,
-        response: &str,
-        candidates: &[CandidateInfo],
-    ) -> Vec<RankedCandidate> {
-        let mut ranked = Vec::new();
-
-        // Pattern: "1. Candidate Name (score: 0.8)"
-        let ranking_pattern =
-            Regex::new(r"(\d+)[.\)]\s*(?:Candidate\s*)?(\d+)[\s:]+(?:score[:\s]*)?([0-9.]+)?")
-                .unwrap();
-
-        for caps in ranking_pattern.captures_iter(response) {
-            if let Some(index_match) = caps.get(2) {
-                if let Ok(index) = index_match.as_str().parse::<usize>() {
-                    let score: f32 = caps
-                        .get(3)
-                        .and_then(|m| m.as_str().parse().ok())
-                        .unwrap_or(0.5);
-
-                    if index < candidates.len() {
-                        ranked.push(RankedCandidate {
-                            node_id: candidates[index].node_id,
-                            score: score.clamp(0.0, 1.0),
-                            reason: None,
-                        });
-                    }
-                }
-            }
-        }
-
-        // If we got some rankings, return them
-        if !ranked.is_empty() {
-            return ranked;
-        }
-
-        // Fallback: look for numbers that might be candidate indices
-        let number_pattern = Regex::new(r"\b(\d+)\b").unwrap();
-        let mut seen = std::collections::HashSet::new();
-
-        for caps in number_pattern.captures_iter(response) {
-            if let Some(match_1) = caps.get(1) {
-                if let Ok(idx) = match_1.as_str().parse::<usize>() {
-                    if idx < candidates.len() && seen.insert(idx) {
-                        ranked.push(RankedCandidate {
-                            node_id: candidates[idx].node_id,
-                            score: 1.0 - (ranked.len() as f32 * 0.1), // Decreasing scores
-                            reason: None,
-                        });
-                    }
-                }
-            }
-
-            if ranked.len() >= candidates.len() {
-                break;
-            }
-        }
-
-        ranked
-    }
-
-    /// Convert LlmResponse to PilotDecision.
-    fn llm_response_to_decision(
-        &self,
-        mut llm_response: LlmResponse,
-        candidates: &[CandidateInfo],
-        point: InterventionPoint,
-    ) -> PilotDecision {
-        println!(
-            "[DEBUG] ResponseParser::llm_response_to_decision() - point={:?}",
-            point
-        );
-        println!(
-            "[DEBUG] ResponseParser::llm_response_to_decision() - ranked_candidates.len()={}",
-            llm_response.ranked_candidates.len()
-        );
-        println!(
-            "[DEBUG] ResponseParser::llm_response_to_decision() - best_entry_points.len()={}",
-            llm_response.best_entry_points.len()
-        );
-        println!(
-            "[DEBUG] ResponseParser::llm_response_to_decision() - entry_points.len()={}",
-            llm_response.entry_points.len()
-        );
-        println!(
-            "[DEBUG] ResponseParser::llm_response_to_decision() - selected_nodes.len()={}",
-            llm_response.selected_nodes.len()
-        );
-        println!(
-            "[DEBUG] ResponseParser::llm_response_to_decision() - selected_node={:?}",
-            llm_response.selected_node
-        );
-        println!(
-            "[DEBUG] ResponseParser::llm_response_to_decision() - analysis={:?}",
-            llm_response
-                .analysis
-                .as_ref()
-                .map(|a| (&a.selected_node, &a.selected_nodes))
-        );
-
-        // Convert candidate scores to RankedCandidate
-        let mut ranked_candidates: Vec<RankedCandidate> = llm_response
-            .ranked_candidates
-            .iter()
-            .filter_map(|cs| {
-                if cs.index < candidates.len() {
-                    Some(RankedCandidate {
-                        node_id: candidates[cs.index].node_id,
-                        score: cs.score.clamp(0.0, 1.0),
-                        reason: cs.reason.clone(),
-                    })
-                } else {
-                    None
-                }
-            })
-            .collect();
-
-        // Handle START response format: best_entry_points, entry_points, or selected_nodes
-        if ranked_candidates.is_empty() {
-            // Try to convert best_entry_points (format: [{"node_id": 1, "title": "...", "relevance_score": 5}])
-            for entry in &llm_response.best_entry_points {
-                // Get index from either node_id or index field
-                // node_id is 1-indexed from LLM, convert to 0-indexed
-                let idx = if let Some(nid) = entry.node_id {
-                    if nid > 0 { nid - 1 } else { nid }
-                } else if let Some(idx) = entry.index {
-                    idx
-                } else {
-                    continue; // Skip if no valid index
-                };
-
-                if idx < candidates.len() {
-                    let score = entry.relevance_score.or(entry.score).unwrap_or(0.5) / 5.0; // Normalize 1-5 scale to 0.0-1.0
-                    ranked_candidates.push(RankedCandidate {
-                        node_id: candidates[idx].node_id,
-                        score: score.clamp(0.0, 1.0),
-                        reason: entry.title.clone(),
-                    });
-                    println!(
-                        "[DEBUG] ResponseParser - converted best_entry_point[{}] to ranked_candidate (idx={}, score={:.2})",
-                        idx, idx, score
-                    );
-                }
-            }
-
-            // Try to convert selected_nodes (format: ["Project Documentation", "Overview"])
-            // Match by title
-            for selected_title in &llm_response.selected_nodes {
-                for candidate in candidates {
-                    if Self::titles_match(selected_title, &candidate.title) {
-                        ranked_candidates.push(RankedCandidate {
-                            node_id: candidate.node_id,
-                            score: 0.9, // High score for title match
-                            reason: Some(format!("Title match: {}", selected_title)),
-                        });
-                        println!(
-                            "[DEBUG] ResponseParser - matched selected_node '{}' to candidate '{}' (index={})",
-                            selected_title, candidate.title, candidate.index
-                        );
-                        break; // Only match once per selected_node
-                    }
-                }
-            }
-
-            // Try to convert selected_node (singular - format: "Project Documentation")
-            if let Some(ref single_node) = llm_response.selected_node {
-                for candidate in candidates {
-                    if Self::titles_match(single_node, &candidate.title) {
-                        if !ranked_candidates
-                            .iter()
-                            .any(|rc| rc.node_id == candidate.node_id)
-                        {
-                            ranked_candidates.push(RankedCandidate {
-                                node_id: candidate.node_id,
-                                score: 0.9,
-                                reason: Some(format!("Title match (singular): {}", single_node)),
-                            });
-                            println!(
-                                "[DEBUG] ResponseParser - matched selected_node (singular) '{}' to candidate '{}' (index={})",
-                                single_node, candidate.title, candidate.index
-                            );
-                        }
-                        break;
-                    }
-                }
-            }
-
-            // Try to convert recommended_node (another singular format)
-            if let Some(ref recommended) = llm_response.recommended_node {
-                for candidate in candidates {
-                    if Self::titles_match(recommended, &candidate.title) {
-                        if !ranked_candidates
-                            .iter()
-                            .any(|rc| rc.node_id == candidate.node_id)
-                        {
-                            ranked_candidates.push(RankedCandidate {
-                                node_id: candidate.node_id,
-                                score: 0.85,
-                                reason: Some(format!("Recommended node: {}", recommended)),
-                            });
-                            println!(
-                                "[DEBUG] ResponseParser - matched recommended_node '{}' to candidate '{}' (index={})",
-                                recommended, candidate.title, candidate.index
-                            );
-                        }
-                        break;
-                    }
-                }
-            }
-
-            // Try to extract from analysis wrapper if present
-            if let Some(ref analysis) = llm_response.analysis {
-                // Check analysis.selected_nodes (plural array)
-                for selected_title in &analysis.selected_nodes {
-                    for candidate in candidates {
-                        if Self::titles_match(selected_title, &candidate.title) {
-                            if !ranked_candidates
-                                .iter()
-                                .any(|rc| rc.node_id == candidate.node_id)
-                            {
-                                ranked_candidates.push(RankedCandidate {
-                                    node_id: candidate.node_id,
-                                    score: 0.85,
-                                    reason: Some(format!(
-                                        "Analysis selected_nodes: {}",
-                                        selected_title
-                                    )),
-                                });
-                                println!(
-                                    "[DEBUG] ResponseParser - matched analysis.selected_nodes '{}' to candidate '{}' (index={})",
-                                    selected_title, candidate.title, candidate.index
-                                );
-                            }
-                            break;
-                        }
-                    }
-                }
-
-                // Check analysis.selected_node (singular)
-                if let Some(ref single_node) = analysis.selected_node {
-                    for candidate in candidates {
-                        if Self::titles_match(single_node, &candidate.title) {
-                            if !ranked_candidates
-                                .iter()
-                                .any(|rc| rc.node_id == candidate.node_id)
-                            {
-                                ranked_candidates.push(RankedCandidate {
-                                    node_id: candidate.node_id,
-                                    score: 0.85,
-                                    reason: Some(format!(
-                                        "Analysis selected_node: {}",
-                                        single_node
-                                    )),
-                                });
-                                println!(
-                                    "[DEBUG] ResponseParser - matched analysis.selected_node (singular) '{}' to candidate '{}' (index={})",
-                                    single_node, candidate.title, candidate.index
-                                );
-                            }
-                            break;
-                        }
-                    }
-                }
-
-                // Use analysis.reasoning if top-level reasoning is empty
-                if llm_response.reasoning.is_empty() {
-                    if let Some(ref r) = analysis.reasoning {
-                        llm_response.reasoning = r.clone();
-                    }
-                }
-            }
-
-            // Try to convert entry_points (format: ["Node Title 1", "Node Title 2"])
-            for entry_title in &llm_response.entry_points {
-                for candidate in candidates {
-                    if Self::titles_match(entry_title, &candidate.title) {
-                        // Check if already added
-                        if !ranked_candidates
-                            .iter()
-                            .any(|rc| rc.node_id == candidate.node_id)
-                        {
-                            ranked_candidates.push(RankedCandidate {
-                                node_id: candidate.node_id,
-                                score: 0.8, // Slightly lower score for entry_points
-                                reason: Some(format!("Entry point: {}", entry_title)),
-                            });
-                            println!(
-                                "[DEBUG] ResponseParser - matched entry_point '{}' to candidate '{}' (index={})",
-                                entry_title, candidate.title, candidate.index
-                            );
-                        }
-                        break;
-                    }
-                }
-            }
-        }
-
-        // Handle PRUNE response format: relevant_indices
-        if ranked_candidates.is_empty() {
-            let indices: Vec<usize> = if !llm_response.relevant_indices.is_empty() {
-                llm_response.relevant_indices.clone()
-            } else if !llm_response.relevant.is_empty() {
-                llm_response.relevant.clone()
-            } else {
-                Vec::new()
-            };
-
-            for idx in &indices {
-                if *idx < candidates.len() {
-                    ranked_candidates.push(RankedCandidate {
-                        node_id: candidates[*idx].node_id,
-                        score: 1.0, // Relevant = high score
-                        reason: Some(format!("Marked relevant (index {})", idx)),
-                    });
-                }
-            }
-
-            // Non-relevant candidates get low score (for completeness)
-            if !ranked_candidates.is_empty() {
-                let relevant_ids: std::collections::HashSet<NodeId> =
-                    ranked_candidates.iter().map(|rc| rc.node_id).collect();
-                for candidate in candidates {
-                    if !relevant_ids.contains(&candidate.node_id) {
-                        ranked_candidates.push(RankedCandidate {
-                            node_id: candidate.node_id,
-                            score: 0.1, // Not relevant
-                            reason: None,
-                        });
-                    }
-                }
-            }
-        }
-
-        // Convert direction
-        let direction = match llm_response.direction {
-            DirectionResponse::GoDeeper => SearchDirection::GoDeeper {
-                reason: llm_response.reasoning.clone(),
-            },
-            DirectionResponse::ExploreSiblings => SearchDirection::ExploreSiblings {
-                recommended: ranked_candidates
-                    .iter()
-                    .take(3)
-                    .map(|c| c.node_id)
-                    .collect(),
-            },
-            DirectionResponse::Backtrack => SearchDirection::Backtrack {
-                reason: llm_response.reasoning.clone(),
-                alternative_branches: ranked_candidates
-                    .iter()
-                    .take(3)
-                    .map(|c| c.node_id)
-                    .collect(),
-            },
-            DirectionResponse::FoundAnswer => SearchDirection::FoundAnswer {
-                confidence: llm_response.confidence,
-            },
-        };
-
-        println!(
-            "[DEBUG] ResponseParser::llm_response_to_decision() - final ranked_candidates.len()={}",
-            ranked_candidates.len()
-        );
-
-        PilotDecision {
-            ranked_candidates,
-            direction,
-            confidence: llm_response.confidence.clamp(0.0, 1.0),
-            reasoning: llm_response.reasoning,
-            intervention_point: point,
-        }
-    }
-
-    /// Check if two titles match (fuzzy matching).
-    fn titles_match(llm_title: &str, candidate_title: &str) -> bool {
-        let llm_lower = llm_title.to_lowercase().trim().to_string();
-        let candidate_lower = candidate_title.to_lowercase().trim().to_string();
-
-        // Exact match
-        if llm_lower == candidate_lower {
-            return true;
-        }
-
-        // Contains match
-        if llm_lower.contains(&candidate_lower) || candidate_lower.contains(&llm_lower) {
-            return true;
-        }
-
-        // Word overlap match (at least 50% of words match)
-        let llm_words: std::collections::HashSet<&str> = llm_lower.split_whitespace().collect();
-        let candidate_words: std::collections::HashSet<&str> =
-            candidate_lower.split_whitespace().collect();
-        let overlap = llm_words.intersection(&candidate_words).count();
-        let min_words = llm_words.len().min(candidate_words.len());
-        if min_words > 0 && overlap as f32 / min_words as f32 >= 0.5 {
-            return true;
-        }
-
-        false
-    }
-
-    /// Create a default decision when parsing fails.
-    fn default_decision(
-        &self,
-        candidates: &[CandidateInfo],
-        point: InterventionPoint,
-    ) -> PilotDecision {
-        // Score candidates uniformly
-        let ranked: Vec<RankedCandidate> = candidates
-            .iter()
-            .enumerate()
-            .map(|(i, c)| RankedCandidate {
-                node_id: c.node_id,
-                score: 1.0 / (i + 1) as f32, // Decreasing scores
-                reason: None,
-            })
-            .collect();
-
-        PilotDecision {
-            ranked_candidates: ranked,
-            direction: SearchDirection::GoDeeper {
-                reason: String::new(),
-            },
-            confidence: 0.0,
-            reasoning: "Default decision (parsing failed)".to_string(),
-            intervention_point: point,
-        }
-    }
-}
-
-#[cfg(test)]
-mod tests {
-    use super::*;
-    use indextree::Arena;
-
-    fn create_test_node_ids(count: usize) -> Vec<NodeId> {
-        let mut arena = Arena::new();
-        let mut ids = Vec::new();
-        for i in 0..count {
-            let node = crate::document::TreeNode {
-                title: format!("Node {}", i),
-                structure: String::new(),
-                content: String::new(),
-                summary: String::new(),
-                depth: 0,
-                start_index: 1,
-                end_index: 1,
-                start_page: None,
-                end_page: None,
-                node_id: None,
-                physical_index: None,
-                token_count: None,
-                references: Vec::new(),
-            };
-            ids.push(NodeId(arena.new_node(node)));
-        }
-        ids
-    }
-}
diff --git a/rust/src/retrieval/pilot/prompts/builder.rs b/rust/src/retrieval/pilot/prompts/builder.rs
deleted file mode 100644
index c5301ad2..00000000
--- a/rust/src/retrieval/pilot/prompts/builder.rs
+++ /dev/null
@@ -1,299 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Prompt builder for constructing LLM prompts.
-//!
-//! Combines templates with context to produce final prompts.
-
-use super::super::builder::PilotContext;
-use super::super::decision::InterventionPoint;
-use super::templates::{BacktrackPrompt, EvaluatePrompt, ForkPrompt, PromptTemplate, StartPrompt};
-
-/// Built prompt ready for LLM call.
-#[derive(Debug, Clone)]
-pub struct BuiltPrompt {
-    /// System prompt.
-    pub system: String,
-    /// User prompt.
-    pub user: String,
-    /// Total estimated tokens.
-    pub estimated_tokens: usize,
-}
-
-/// Builder for constructing LLM prompts.
-///
-/// Manages prompt templates and constructs final prompts
-/// by combining templates with context.
-///
-/// # Example
-///
-/// ```rust,ignore
-/// use vectorless::retrieval::pilot::prompts::PromptBuilder;
-///
-/// let builder = PromptBuilder::new();
-/// let prompt = builder.build(InterventionPoint::Fork, &context);
-/// println!("System: {}", prompt.system);
-/// println!("User: {}", prompt.user);
-/// ```
-pub struct PromptBuilder {
-    start_template: StartPrompt,
-    fork_template: ForkPrompt,
-    backtrack_template: BacktrackPrompt,
-    evaluate_template: EvaluatePrompt,
-}
-
-impl Default for PromptBuilder {
-    fn default() -> Self {
-        Self::new()
-    }
-}
-
-impl PromptBuilder {
-    /// Create a new prompt builder with default templates.
-    pub fn new() -> Self {
-        Self {
-            start_template: StartPrompt::with_fallback(),
-            fork_template: ForkPrompt::with_fallback(),
-            backtrack_template: BacktrackPrompt::with_fallback(),
-            evaluate_template: EvaluatePrompt::with_fallback(),
-        }
-    }
-
-    /// Create with custom templates.
-    pub fn with_templates(
-        start: StartPrompt,
-        fork: ForkPrompt,
-        backtrack: BacktrackPrompt,
-        evaluate: EvaluatePrompt,
-    ) -> Self {
-        Self {
-            start_template: start,
-            fork_template: fork,
-            backtrack_template: backtrack,
-            evaluate_template: evaluate,
-        }
-    }
-
-    /// Build a prompt for the given intervention point.
-    pub fn build(&self, point: InterventionPoint, context: &PilotContext) -> BuiltPrompt {
-        match point {
-            InterventionPoint::Start => self.build_start(context),
-            InterventionPoint::Fork => self.build_fork(context),
-            InterventionPoint::Backtrack => self.build_backtrack(context),
-            InterventionPoint::Evaluate => self.build_evaluate(context),
-            InterventionPoint::Prune => self.build_fork(context), // Prune reuses fork template
-        }
-    }
-
-    /// Build START prompt.
-    fn build_start(&self, context: &PilotContext) -> BuiltPrompt {
-        let template = &self.start_template;
-        let system = template.system_prompt().to_string();
-        let user = self.fill_template(template.user_prompt_template(), context);
-        let estimated_tokens = self.estimate_tokens(&system) + self.estimate_tokens(&user);
-
-        BuiltPrompt {
-            system,
-            user,
-            estimated_tokens,
-        }
-    }
-
-    /// Build FORK prompt.
-    fn build_fork(&self, context: &PilotContext) -> BuiltPrompt {
-        let template = &self.fork_template;
-        let system = template.system_prompt().to_string();
-        let user = self.fill_template(template.user_prompt_template(), context);
-        let estimated_tokens = self.estimate_tokens(&system) + self.estimate_tokens(&user);
-
-        BuiltPrompt {
-            system,
-            user,
-            estimated_tokens,
-        }
-    }
-
-    /// Build BACKTRACK prompt.
-    fn build_backtrack(&self, context: &PilotContext) -> BuiltPrompt {
-        let template = &self.backtrack_template;
-        let system = template.system_prompt().to_string();
-        let user = self.fill_template(template.user_prompt_template(), context);
-        let estimated_tokens = self.estimate_tokens(&system) + self.estimate_tokens(&user);
-
-        BuiltPrompt {
-            system,
-            user,
-            estimated_tokens,
-        }
-    }
-
-    /// Build EVALUATE prompt.
-    fn build_evaluate(&self, context: &PilotContext) -> BuiltPrompt {
-        let template = &self.evaluate_template;
-        let system = template.system_prompt().to_string();
-        let user = self.fill_template(template.user_prompt_template(), context);
-        let estimated_tokens = self.estimate_tokens(&system) + self.estimate_tokens(&user);
-
-        BuiltPrompt {
-            system,
-            user,
-            estimated_tokens,
-        }
-    }
-
-    /// Fill template with context.
-    fn fill_template(&self, template: &str, context: &PilotContext) -> String {
-        let mut result = template.to_string();
-
-        // Replace context placeholder with full context
-        result = result.replace("{context}", &context.to_string());
-
-        // Replace individual sections
-        result = result.replace("{query}", &context.query_section);
-        result = result.replace("{path}", &context.path_section);
-        result = result.replace("{candidates}", &context.candidates_section);
-        result = result.replace("{toc}", &context.toc_section);
-
-        result
-    }
-
-    /// Estimate token count for a string.
-    fn estimate_tokens(&self, text: &str) -> usize {
-        let char_count = text.chars().count();
-        let chinese_count = text
-            .chars()
-            .filter(|c| ('\u{4E00}'..='\u{9FFF}').contains(c))
-            .count();
-        let english_count = char_count - chinese_count;
-
-        (chinese_count as f32 / 1.5 + english_count as f32 / 4.0).ceil() as usize
-    }
-
-    /// Get the template for an intervention point.
-    pub fn get_template(&self, point: InterventionPoint) -> &dyn PromptTemplate {
-        match point {
-            InterventionPoint::Start => &self.start_template,
-            InterventionPoint::Fork => &self.fork_template,
-            InterventionPoint::Backtrack => &self.backtrack_template,
-            InterventionPoint::Evaluate => &self.evaluate_template,
-            InterventionPoint::Prune => &self.fork_template, // Prune reuses fork template
-        }
-    }
-
-    /// Get output format hint for an intervention point.
-    pub fn output_format(&self, point: InterventionPoint) -> &'static str {
-        match point {
-            InterventionPoint::Start => {
-                r#"{
-  "entry_points": ["list of starting node titles"],
-  "reasoning": "explanation",
-  "confidence": 0.0-1.0
-}"#
-            }
-            InterventionPoint::Fork => {
-                r#"{
-  "ranked_candidates": [
-    {"index": 0, "score": 0.9, "reason": "explanation"}
-  ],
-  "direction": "go_deeper|explore_siblings|backtrack|found_answer",
-  "confidence": 0.0-1.0,
-  "reasoning": "explanation"
-}"#
-            }
-            InterventionPoint::Backtrack => {
-                r#"{
-  "alternative_branches": [
-    {"index": 0, "score": 0.8, "reason": "explanation"}
-  ],
-  "direction": "backtrack",
-  "confidence": 0.0-1.0,
-  "reasoning": "explanation"
-}"#
-            }
-            InterventionPoint::Evaluate => {
-                r#"{
-  "relevance_score": 0.0-1.0,
-  "is_answer": true|false,
-  "direction": "go_deeper|found_answer",
-  "confidence": 0.0-1.0,
-  "reasoning": "explanation"
-}"#
-            }
-            InterventionPoint::Prune => {
-                r#"{
-  "relevant_indices": [0, 2, 5],
-  "confidence": 0.0-1.0,
-  "reasoning": "explanation"
-}"#
-            }
-        }
-    }
-}
-
-#[cfg(test)]
-mod tests {
-    use super::*;
-
-    #[test]
-    fn test_prompt_builder_creation() {
-        let builder = PromptBuilder::new();
-        assert!(!builder.start_template.system_prompt().is_empty());
-        assert!(!builder.fork_template.system_prompt().is_empty());
-    }
-
-    #[test]
-    fn test_build_fork_prompt() {
-        let builder = PromptBuilder::new();
-        let context = PilotContext {
-            query_section: "Query: test query\n".to_string(),
-            path_section: "Path: Root → Test\n".to_string(),
-            candidates_section: "Candidates:\n1. Option A\n".to_string(),
-            toc_section: String::new(),
-            estimated_tokens: 50,
-        };
-
-        let prompt = builder.build(InterventionPoint::Fork, &context);
-
-        assert!(!prompt.system.is_empty());
-        assert!(!prompt.user.is_empty());
-        assert!(prompt.user.contains("test query") || prompt.user.contains("Query"));
-    }
-
-    #[test]
-    fn test_build_start_prompt() {
-        let builder = PromptBuilder::new();
-        let context = PilotContext {
-            query_section: "Query: how to configure\n".to_string(),
-            path_section: String::new(),
-            candidates_section: String::new(),
-            toc_section: "TOC:\n1. Config\n".to_string(),
-            estimated_tokens: 30,
-        };
-
-        let prompt = builder.build(InterventionPoint::Start, &context);
-
-        assert!(!prompt.system.is_empty());
-        assert!(prompt.estimated_tokens > 0);
-    }
-
-    #[test]
-    fn test_output_format() {
-        let builder = PromptBuilder::new();
-
-        let fork_format = builder.output_format(InterventionPoint::Fork);
-        assert!(fork_format.contains("ranked_candidates"));
-
-        let start_format = builder.output_format(InterventionPoint::Start);
-        assert!(start_format.contains("entry_points"));
-    }
-
-    #[test]
-    fn test_template_fallback() {
-        let start = StartPrompt::with_fallback();
-        assert!(!start.system_prompt().is_empty());
-        assert!(!start.user_prompt_template().is_empty());
-
-        let fork = ForkPrompt::with_fallback();
-        assert!(!fork.system_prompt().is_empty());
-    }
-}
diff --git a/rust/src/retrieval/pilot/prompts/mod.rs b/rust/src/retrieval/pilot/prompts/mod.rs
deleted file mode 100644
index aeee13c4..00000000
--- a/rust/src/retrieval/pilot/prompts/mod.rs
+++ /dev/null
@@ -1,15 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Prompt builders for Pilot LLM calls.
-//!
-//! Provides specialized prompts for each intervention point:
-//! - START: Search initialization guidance
-//! - FORK: Branch selection at decision points
-//! - BACKTRACK: Recovery after dead ends
-//! - EVALUATE: Node relevance assessment
-
-mod builder;
-mod templates;
-
-pub use builder::PromptBuilder;
diff --git a/rust/src/retrieval/pilot/prompts/system_backtrack.txt b/rust/src/retrieval/pilot/prompts/system_backtrack.txt
deleted file mode 100644
index fef7e16b..00000000
--- a/rust/src/retrieval/pilot/prompts/system_backtrack.txt
+++ /dev/null
@@ -1,11 +0,0 @@
-You are a document navigation assistant specialized in recovery strategies.
-
-Your task is to analyze why a search path failed to find the answer and suggest alternative branches to explore.
-
-Guidelines:
-- Identify what made the failed path unsuccessful
-- Look for unexplored branches that might contain the answer
-- Consider if the query might be satisfied by combining information from multiple branches
-- Suggest the most promising alternatives first
-
-You must respond in valid JSON format.
diff --git a/rust/src/retrieval/pilot/prompts/system_complexity.txt b/rust/src/retrieval/pilot/prompts/system_complexity.txt
deleted file mode 100644
index e344ae70..00000000
--- a/rust/src/retrieval/pilot/prompts/system_complexity.txt
+++ /dev/null
@@ -1,21 +0,0 @@
-You are a query complexity classifier for a document retrieval system.
-Classify the query into exactly one of: "simple", "medium", "complex".
-
-Definitions:
-- simple: direct lookup, definition, single-fact question (e.g. "what is X", "define Y")
-- medium: requires combining information from 2-3 sections (e.g. "how does X work with Y")
-- complex: requires comparison, analysis, synthesis, multi-step reasoning, or information from many parts (e.g. "compare X and Y", "analyze the impact of Z")
"compare X and Y", "analyze the impact of Z") - -The query may be in English, Chinese, or mixed language. - -CRITICAL: You MUST respond with ONLY valid JSON. No markdown, no explanation, just the JSON object. - -Your response must have this EXACT structure: -{ - "complexity": "simple", - "reasoning": "brief explanation" -} - -Where: -- complexity: MUST be exactly one of: "simple", "medium", "complex" -- reasoning: MUST be a string diff --git a/rust/src/retrieval/pilot/prompts/system_evaluate.txt b/rust/src/retrieval/pilot/prompts/system_evaluate.txt deleted file mode 100644 index f5d66410..00000000 --- a/rust/src/retrieval/pilot/prompts/system_evaluate.txt +++ /dev/null @@ -1,11 +0,0 @@ -You are a document analysis assistant specialized in content evaluation. - -Your task is to determine if the current node contains the answer to the user's query. - -Guidelines: -- Carefully analyze the node's content against the query -- Consider if the content fully or partially answers the query -- If the answer seems to be in child nodes, suggest going deeper -- Only mark as "found_answer" if you're confident the content satisfies the query - -You must respond in valid JSON format. diff --git a/rust/src/retrieval/pilot/prompts/system_fork.txt b/rust/src/retrieval/pilot/prompts/system_fork.txt deleted file mode 100644 index e4a4a5f8..00000000 --- a/rust/src/retrieval/pilot/prompts/system_fork.txt +++ /dev/null @@ -1,19 +0,0 @@ -You are a document navigation assistant specialized in making decisions at branch points. - -Your task is to rank candidate branches by their likelihood of containing the answer to the user's query. - -Guidelines: -- Analyze each candidate's title and summary for relevance -- Consider the current search path and context -- Higher scores should go to more relevant candidates -- If uncertain between candidates, reflect this in closer scores -- If no candidate seems relevant, suggest backtracking - -Scoring guide: -- 0.9-1.0: Highly confident this branch contains the answer -- 0.7-0.9: Likely contains relevant information -- 0.5-0.7: Possibly relevant, worth exploring -- 0.3-0.5: Unlikely but may have related content -- 0.0-0.3: Not relevant - -You must respond in valid JSON format. diff --git a/rust/src/retrieval/pilot/prompts/system_start.txt b/rust/src/retrieval/pilot/prompts/system_start.txt deleted file mode 100644 index 086f9e96..00000000 --- a/rust/src/retrieval/pilot/prompts/system_start.txt +++ /dev/null @@ -1,15 +0,0 @@ -You are a document navigation assistant. Your task is to identify the best entry points for searching a hierarchical document based on a user query. - -CRITICAL: You MUST respond with ONLY valid JSON. No markdown, no explanation, just the JSON object. - -Your response must to have this EXACT structure: -{ - "entry_points": ["list of node titles as strings"], - "reasoning": "explanation string", - "confidence": 0.85 -} - -Where: -- entry_points: MUST be an array of node titles (strings) from the candidate list -- reasoning: MUST be a string -- confidence: MUST be a number (0.0 to 1.0), not a string like "high" diff --git a/rust/src/retrieval/pilot/prompts/templates.rs b/rust/src/retrieval/pilot/prompts/templates.rs deleted file mode 100644 index 6829ade8..00000000 --- a/rust/src/retrieval/pilot/prompts/templates.rs +++ /dev/null @@ -1,522 +0,0 @@ -// Copyright (c) 2026 vectorless developers -// SPDX-License-Identifier: Apache-2.0 - -//! Prompt templates for different intervention points. -//! -//! 
-//! Each template is designed for a specific decision context
-//! and follows a consistent structure:
-//! 1. System context (role definition)
-//! 2. Task description
-//! 3. Input format
-//! 4. Output format (JSON schema)
-
-use super::super::decision::InterventionPoint;
-
-/// Common trait for prompt templates.
-pub trait PromptTemplate: Send + Sync {
-    /// Get the system prompt.
-    fn system_prompt(&self) -> &str;
-
-    /// Get the user prompt template.
-    fn user_prompt_template(&self) -> &str;
-
-    /// Get the intervention point this template is for.
-    fn intervention_point(&self) -> InterventionPoint;
-
-    /// Get the expected output format (JSON schema hint).
-    fn output_format_hint(&self) -> &str;
-}
-
-/// Prompt template for START intervention point.
-///
-/// Used at the beginning of search to:
-/// - Understand query intent
-/// - Identify entry points
-/// - Set search direction
-#[derive(Debug, Clone)]
-pub struct StartPrompt {
-    system: String,
-    template: String,
-}
-
-impl Default for StartPrompt {
-    fn default() -> Self {
-        Self::with_fallback()
-    }
-}
-
-impl StartPrompt {
-    /// Create a new start prompt template.
-    pub fn new() -> Self {
-        Self::default()
-    }
-
-    /// Create with custom templates.
-    pub fn with_templates(system: String, template: String) -> Self {
-        Self { system, template }
-    }
-}
-
-impl PromptTemplate for StartPrompt {
-    fn system_prompt(&self) -> &str {
-        &self.system
-    }
-
-    fn user_prompt_template(&self) -> &str {
-        &self.template
-    }
-
-    fn intervention_point(&self) -> InterventionPoint {
-        InterventionPoint::Start
-    }
-
-    fn output_format_hint(&self) -> &str {
-        r#"{
-  "entry_points": ["list of node titles to start from"],
-  "reasoning": "explanation of why these entry points",
-  "confidence": 0.0-1.0
-}"#
-    }
-}
-
-/// Prompt template for FORK intervention point.
-///
-/// Used when multiple candidate branches are available to:
-/// - Rank candidates by relevance
-/// - Recommend search direction
-/// - Provide reasoning
-#[derive(Debug, Clone)]
-pub struct ForkPrompt {
-    system: String,
-    template: String,
-}
-
-impl Default for ForkPrompt {
-    fn default() -> Self {
-        Self::with_fallback()
-    }
-}
-
-impl ForkPrompt {
-    /// Create a new fork prompt template.
-    pub fn new() -> Self {
-        Self::default()
-    }
-
-    /// Create with custom templates.
-    pub fn with_templates(system: String, template: String) -> Self {
-        Self { system, template }
-    }
-}
-
-impl PromptTemplate for ForkPrompt {
-    fn system_prompt(&self) -> &str {
-        &self.system
-    }
-
-    fn user_prompt_template(&self) -> &str {
-        &self.template
-    }
-
-    fn intervention_point(&self) -> InterventionPoint {
-        InterventionPoint::Fork
-    }
-
-    fn output_format_hint(&self) -> &str {
-        r#"{
-  "ranked_candidates": [
-    {"index": 0, "score": 0.9, "reason": "why this candidate"}
-  ],
-  "direction": "go_deeper|explore_siblings|backtrack|found_answer",
-  "confidence": 0.0-1.0,
-  "reasoning": "overall explanation"
-}"#
-    }
-}
-
-/// Prompt template for BACKTRACK intervention point.
-///
-/// Used when search needs to recover from a dead end to:
-/// - Analyze failure reason
-/// - Suggest alternative branches
-/// - Guide recovery strategy
-#[derive(Debug, Clone)]
-pub struct BacktrackPrompt {
-    system: String,
-    template: String,
-}
-
-impl Default for BacktrackPrompt {
-    fn default() -> Self {
-        Self::with_fallback()
-    }
-}
-
-impl BacktrackPrompt {
-    /// Create a new backtrack prompt template.
-    pub fn new() -> Self {
-        Self::default()
-    }
-
-    /// Create with custom templates.
-    pub fn with_templates(system: String, template: String) -> Self {
-        Self { system, template }
-    }
-}
-
-impl PromptTemplate for BacktrackPrompt {
-    fn system_prompt(&self) -> &str {
-        &self.system
-    }
-
-    fn user_prompt_template(&self) -> &str {
-        &self.template
-    }
-
-    fn intervention_point(&self) -> InterventionPoint {
-        InterventionPoint::Backtrack
-    }
-
-    fn output_format_hint(&self) -> &str {
-        r#"{
-  "alternative_branches": [
-    {"index": 0, "score": 0.8, "reason": "why this alternative"}
-  ],
-  "direction": "backtrack",
-  "confidence": 0.0-1.0,
-  "reasoning": "why the original path failed and alternatives chosen"
-}"#
-    }
-}
-
-/// Prompt template for EVALUATE intervention point.
-///
-/// Used to assess if a node contains the answer to:
-/// - Determine relevance score
-/// - Check if answer is found
-/// - Guide further search
-#[derive(Debug, Clone)]
-pub struct EvaluatePrompt {
-    system: String,
-    template: String,
-}
-
-impl Default for EvaluatePrompt {
-    fn default() -> Self {
-        Self::with_fallback()
-    }
-}
-
-impl EvaluatePrompt {
-    /// Create a new evaluate prompt template.
-    pub fn new() -> Self {
-        Self::default()
-    }
-
-    /// Create with custom templates.
-    pub fn with_templates(system: String, template: String) -> Self {
-        Self { system, template }
-    }
-}
-
-impl PromptTemplate for EvaluatePrompt {
-    fn system_prompt(&self) -> &str {
-        &self.system
-    }
-
-    fn user_prompt_template(&self) -> &str {
-        &self.template
-    }
-
-    fn intervention_point(&self) -> InterventionPoint {
-        InterventionPoint::Evaluate
-    }
-
-    fn output_format_hint(&self) -> &str {
-        r#"{
-  "relevance_score": 0.0-1.0,
-  "is_answer": true|false,
-  "direction": "go_deeper|found_answer",
-  "confidence": 0.0-1.0,
-  "reasoning": "why this node is or isn't the answer"
-}"#
-    }
-}
-
-/// Fallback templates when file loading fails.
-pub mod fallback {
-    pub fn system_start() -> String {
-        r#"You are a document navigation assistant. Help identify the best entry points for searching a hierarchical document.
-
-CRITICAL: You MUST respond with ONLY valid JSON. No markdown code blocks, No explanation. Just the JSON object.
-
-Your response must have this EXACT structure:
-{
-  "entry_points": ["Title 1", "Title 2"],
-  "reasoning": "Brief explanation",
-  "confidence": 0.85
-}
-
-Rules:
-- entry_points: Array of node title strings (from the candidates provided)
-- reasoning: String explaining your choice
-- confidence: Number between 0.0 and 1.0 (use a number, NOT "high"/"medium"/"low")"#.to_string()
-    }
-
-    pub fn user_start() -> String {
-        r#"{context}
-
-Respond with ONLY the JSON object (no markdown, no explanation):
-{
-  "entry_points": ["list of node titles as strings"],
-  "reasoning": "your reasoning here",
-  "confidence": 0.85
-}"#
-        .to_string()
-    }
-
-    pub fn system_fork() -> String {
-        r#"You are a document navigation assistant. At each decision point, rank the candidate branches by their likelihood of containing the answer to the user's query.
-
-CRITICAL: You MUST respond with ONLY valid JSON. No markdown code blocks.
-
-Your response must have this EXACT structure:
-{
-  "ranked_candidates": [
-    {"index": 0, "score": 0.9, "reason": "explanation"}
-  ],
-  "direction": "go_deeper",
-  "confidence": 0.85,
-  "reasoning": "overall explanation"
-}
-
-Rules:
-- ranked_candidates: Array of objects with index (number), score (0.0-1.0), reason (string)
-- direction: One of "go_deeper", "explore_siblings", "backtrack", "found_answer"
-- confidence: Number between 0.0 and 1.0 (NOT a string)"#.to_string()
-    }
-
-    pub fn user_fork() -> String {
-        r#"{context}
-
-Respond with ONLY the JSON object:
-{
-  "ranked_candidates": [
-    {"index": 0, "score": 0.9, "reason": "why this candidate"}
-  ],
-  "direction": "go_deeper",
-  "confidence": 0.85,
-  "reasoning": "overall explanation"
-}"#
-        .to_string()
-    }
-
-    pub fn system_backtrack() -> String {
-        r#"You are a document navigation assistant. When a search path fails to find the answer, analyze why and suggest alternative branches to explore.
-
-CRITICAL: You MUST respond with ONLY valid JSON. No markdown code blocks.
-
-Your response must have this EXACT structure:
-{
-  "alternative_branches": [
-    {"index": 0, "score": 0.8, "reason": "explanation"}
-  ],
-  "direction": "backtrack",
-  "confidence": 0.85,
-  "reasoning": "why the original path failed"
-}"#.to_string()
-    }
-
-    pub fn user_backtrack() -> String {
-        r#"{context}
-
-Respond with ONLY the JSON object:
-{
-  "alternative_branches": [
-    {"index": 0, "score": 0.8, "reason": "why this alternative"}
-  ],
-  "direction": "backtrack",
-  "confidence": 0.85,
-  "reasoning": "why original path failed"
-}"#
-        .to_string()
-    }
-
-    pub fn system_evaluate() -> String {
-        r#"You are a document analysis assistant. Evaluate whether the current node contains the answer to the user's query.
-
-CRITICAL: You MUST respond with ONLY valid JSON. No markdown code blocks.
-
-Your response must have this EXACT structure:
-{
-  "relevance_score": 0.85,
-  "is_answer": false,
-  "direction": "go_deeper",
-  "confidence": 0.85,
-  "reasoning": "explanation"
-}"#.to_string()
-    }
-
-    pub fn user_evaluate() -> String {
-        r#"{context}
-
-Respond with ONLY the JSON object:
-{
-  "relevance_score": 0.85,
-  "is_answer": false,
-  "direction": "go_deeper",
-  "confidence": 0.85,
-  "reasoning": "explanation"
-}"#
-        .to_string()
-    }
-
-    pub fn system_locate_top3() -> String {
-        r#"You are a document navigation assistant. Your task is to locate the most relevant sections in a document hierarchy for a user's query.
-
-CRITICAL INSTRUCTIONS:
-1. Analyze the user query carefully to understand the intent
-2. Examine the provided Table of Contents (TOC) with node IDs
-3. Select the TOP 3 most relevant nodes that would contain the answer
-4. You MUST respond with ONLY valid JSON. No markdown code blocks. No explanations outside JSON.
-
-Your response must have this EXACT structure:
-{
-  "reasoning": "Brief analysis of the query and why you selected these nodes",
-  "candidates": [
-    {"node_id": <node_id>, "relevance_score": 0.95, "reason": "Why this node matches the query"},
-    {"node_id": <node_id>, "relevance_score": 0.80, "reason": "Why this node is also relevant"},
-    {"node_id": <node_id>, "relevance_score": 0.65, "reason": "Why this node might be relevant"}
-  ]
-}
-
-Rules:
-- node_id: MUST be a number from the provided TOC (copy exactly)
-- relevance_score: Number between 0.0 and 1.0 (higher = more relevant)
-- reason: Brief explanation for each selection
-- candidates: Must have exactly 3 items, ordered by relevance (highest first)
-- If fewer than 3 relevant nodes exist, use lower scores for less relevant ones"#.to_string()
-    }
-
-    pub fn user_locate_top3() -> String {
-        r#"{context}
-
-Based on the query and TOC above, select the TOP 3 most relevant nodes.
-
-Respond with ONLY the JSON object:
-{
-  "reasoning": "Your analysis here",
-  "candidates": [
-    {"node_id": 1, "relevance_score": 0.95, "reason": "explanation"},
-    {"node_id": 2, "relevance_score": 0.80, "reason": "explanation"},
-    {"node_id": 3, "relevance_score": 0.65, "reason": "explanation"}
-  ]
-}"#
-        .to_string()
-    }
-}
-
-impl StartPrompt {
-    /// Get template with fallback.
-    pub fn with_fallback() -> Self {
-        Self {
-            system: fallback::system_start(),
-            template: fallback::user_start(),
-        }
-    }
-}
-
-impl ForkPrompt {
-    /// Get template with fallback.
-    pub fn with_fallback() -> Self {
-        Self {
-            system: fallback::system_fork(),
-            template: fallback::user_fork(),
-        }
-    }
-}
-
-impl BacktrackPrompt {
-    /// Get template with fallback.
-    pub fn with_fallback() -> Self {
-        Self {
-            system: fallback::system_backtrack(),
-            template: fallback::user_backtrack(),
-        }
-    }
-}
-
-impl EvaluatePrompt {
-    /// Get template with fallback.
-    pub fn with_fallback() -> Self {
-        Self {
-            system: fallback::system_evaluate(),
-            template: fallback::user_evaluate(),
-        }
-    }
-}
-
-impl LocateTop3Prompt {
-    /// Get template with fallback.
-    pub fn with_fallback() -> Self {
-        Self {
-            system: fallback::system_locate_top3(),
-            template: fallback::user_locate_top3(),
-        }
-    }
-}
-
-/// Prompt template for LOCATE_TOP3 intervention point.
-///
-/// Used at the start to directly locate top-3 relevant nodes from TOC:
-/// - Understand query intent
-/// - Identify top 3 most relevant nodes with confidence scores
-/// - Provide reasoning for each selection
-#[derive(Debug, Clone)]
-pub struct LocateTop3Prompt {
-    system: String,
-    template: String,
-}
-
-impl Default for LocateTop3Prompt {
-    fn default() -> Self {
-        Self::with_fallback()
-    }
-}
-
-impl LocateTop3Prompt {
-    /// Create a new locate top-3 prompt template.
-    pub fn new() -> Self {
-        Self::default()
-    }
-
-    /// Create with custom templates.
-    pub fn with_templates(system: String, template: String) -> Self {
-        Self { system, template }
-    }
-}
-
-impl PromptTemplate for LocateTop3Prompt {
-    fn system_prompt(&self) -> &str {
-        &self.system
-    }
-
-    fn user_prompt_template(&self) -> &str {
-        &self.template
-    }
-
-    fn intervention_point(&self) -> InterventionPoint {
-        InterventionPoint::Start
-    }
-
-    fn output_format_hint(&self) -> &str {
-        r#"{
-  "reasoning": "Overall analysis of the query and document structure",
-  "candidates": [
-    {"node_id": 1, "relevance_score": 0.95, "reason": "Why this node is relevant"},
-    {"node_id": 2, "relevance_score": 0.80, "reason": "Why this node is relevant"},
-    {"node_id": 3, "relevance_score": 0.65, "reason": "Why this node is relevant"}
-  ]
-}"#
-    }
-}
diff --git a/rust/src/retrieval/pilot/prompts/user_backtrack.txt b/rust/src/retrieval/pilot/prompts/user_backtrack.txt
deleted file mode 100644
index b8feab8b..00000000
--- a/rust/src/retrieval/pilot/prompts/user_backtrack.txt
+++ /dev/null
@@ -1,9 +0,0 @@
-The current search path did not find a satisfactory answer. Analyze the situation and suggest alternative branches.
-
-{context}
-
-Provide your response as a JSON object with:
-- alternative_branches: array of suggested branches with index, score, and reason
-- direction: should be "backtrack"
-- confidence: your confidence in these alternatives (0.0-1.0)
-- reasoning: explanation of why the original path failed and why these alternatives are promising
diff --git a/rust/src/retrieval/pilot/prompts/user_complexity.txt b/rust/src/retrieval/pilot/prompts/user_complexity.txt
deleted file mode 100644
index 1abaeaa0..00000000
--- a/rust/src/retrieval/pilot/prompts/user_complexity.txt
+++ /dev/null
@@ -1 +0,0 @@
-Classify this query: {query}
diff --git a/rust/src/retrieval/pilot/prompts/user_evaluate.txt b/rust/src/retrieval/pilot/prompts/user_evaluate.txt
deleted file mode 100644
index ca4bf51c..00000000
--- a/rust/src/retrieval/pilot/prompts/user_evaluate.txt
+++ /dev/null
@@ -1,10 +0,0 @@
-Evaluate whether this node's content answers the user's query.
-
-{context}
-
-Provide your response as a JSON object with:
-- relevance_score: how relevant is this content (0.0-1.0)
-- is_answer: true if this content answers the query, false otherwise
-- direction: "go_deeper" if children might have the answer, or "found_answer"
-- confidence: your confidence in this evaluation (0.0-1.0)
-- reasoning: explanation of your evaluation
diff --git a/rust/src/retrieval/pilot/prompts/user_fork.txt b/rust/src/retrieval/pilot/prompts/user_fork.txt
deleted file mode 100644
index a4d7f37e..00000000
--- a/rust/src/retrieval/pilot/prompts/user_fork.txt
+++ /dev/null
@@ -1,9 +0,0 @@
-Given the current search context and candidate branches, rank them by relevance to the user's query.
-
-{context}
-
-Provide your response as a JSON object with:
-- ranked_candidates: array of objects with index, score (0.0-1.0), and reason
-- direction: one of "go_deeper", "explore_siblings", "backtrack", or "found_answer"
-- confidence: your overall confidence (0.0-1.0)
-- reasoning: brief explanation of your decision
diff --git a/rust/src/retrieval/pilot/prompts/user_start.txt b/rust/src/retrieval/pilot/prompts/user_start.txt
deleted file mode 100644
index df048df8..00000000
--- a/rust/src/retrieval/pilot/prompts/user_start.txt
+++ /dev/null
@@ -1,17 +0,0 @@
-Analyze the following document structure and user query to identify the best entry points for search.
-
-{context}
-
-IMPORTANT: You MUST respond with ONLY a JSON object in this EXACT format:
-{
-  "entry_points": ["Title 1", "Title 2"],
-  "reasoning": "Brief explanation of why these entry points",
-  "confidence": 0.85
-}
-
-Rules:
-- entry_points: Array of strings (node titles from the candidates above)
-- reasoning: String explaining your choice
-- confidence: Number between 0.0 and 1.0 (NOT a string like "high")
-
-Do NOT use any other field names. Use "entry_points" not "selected_node" or "recommended_node".
diff --git a/rust/src/retrieval/pilot/scorer.rs b/rust/src/retrieval/pilot/scorer.rs
deleted file mode 100644
index 6bf8cedb..00000000
--- a/rust/src/retrieval/pilot/scorer.rs
+++ /dev/null
@@ -1,359 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Node scoring utilities with BM25 support.
-//!
-//! Implements the NodeScore formula: `Σ ChunkScore(n) / √(N+1)`
-//! with optional BM25 scoring for better relevance ranking.
-
-use std::collections::HashMap;
-
-use crate::document::{DocumentTree, NodeId};
-
-use crate::retrieval::scoring::bm25::Bm25Params;
-
-// Re-export extract_keywords for other modules to use
-pub use crate::retrieval::scoring::bm25::extract_keywords;
-
-/// Scoring strategy to use.
-#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
-pub enum ScoringStrategy {
-    /// Keyword overlap only (fastest).
-    KeywordOnly,
-    /// BM25 only (better relevance).
-    #[default]
-    BM25,
-    /// Hybrid: weighted combination of keyword + BM25.
-    Hybrid,
-}
-
-/// Context for scoring calculations.
-///
-/// This wraps the BM25 engine and provides additional scoring context.
-#[derive(Debug, Clone)]
-pub struct ScoringContext {
-    /// Query terms for keyword matching.
-    pub query_terms: Vec<String>,
-    /// Weight for title matches.
-    pub title_weight: f32,
-    /// Weight for summary matches.
-    pub summary_weight: f32,
-    /// Weight for content matches.
-    pub content_weight: f32,
-    /// Depth penalty factor.
-    pub depth_penalty: f32,
-    /// Scoring strategy.
-    pub strategy: ScoringStrategy,
-    /// BM25 parameters.
-    pub bm25_params: Bm25Params,
-    /// Average document length for BM25.
-    pub avg_doc_len: f32,
-    /// Document frequency for terms (for IDF).
-    pub doc_freq: HashMap<String, usize>,
-    /// Total document count for IDF.
-    pub doc_count: usize,
-}
-
-impl Default for ScoringContext {
-    fn default() -> Self {
-        Self {
-            query_terms: Vec::new(),
-            title_weight: 2.0,
-            summary_weight: 1.5,
-            content_weight: 1.0,
-            depth_penalty: 0.1,
-            strategy: ScoringStrategy::default(),
-            bm25_params: Bm25Params::default(),
-            avg_doc_len: 100.0,
-            doc_freq: HashMap::new(),
-            doc_count: 1,
-        }
-    }
-}
-
-impl ScoringContext {
-    /// Create a new scoring context with query terms.
-    pub fn new(query: &str) -> Self {
-        Self {
-            query_terms: extract_keywords(query),
-            ..Default::default()
-        }
-    }
-
-    /// Create a context with a specific scoring strategy.
-    pub fn with_strategy(query: &str, strategy: ScoringStrategy) -> Self {
-        Self {
-            query_terms: extract_keywords(query),
-            strategy,
-            ..Default::default()
-        }
-    }
-
-    /// Set BM25 parameters.
-    pub fn with_bm25_params(mut self, params: Bm25Params) -> Self {
-        self.bm25_params = params;
-        self
-    }
-
-    /// Set document statistics for BM25.
-    pub fn with_doc_stats(
-        mut self,
-        doc_count: usize,
-        avg_doc_len: f32,
-        doc_freq: HashMap<String, usize>,
-    ) -> Self {
-        self.doc_count = doc_count.max(1);
-        self.avg_doc_len = avg_doc_len.max(1.0);
-        self.doc_freq = doc_freq;
-        self
-    }
-
-    /// Calculate term frequency in text.
-    fn term_frequency(&self, text: &str, term: &str) -> f32 {
-        text.to_lowercase().matches(term).count() as f32
-    }
-
-    /// Calculate IDF (Inverse Document Frequency) for a term.
-    fn idf(&self, term: &str) -> f32 {
-        let df = self.doc_freq.get(term).copied().unwrap_or(1) as f32;
-        let n = self.doc_count as f32;
-        ((n - df + 0.5) / (df + 0.5) + 1.0).ln()
-    }
-
-    /// Calculate BM25 score for a single field.
-    fn bm25_field_score(&self, text: &str) -> f32 {
-        if self.query_terms.is_empty() {
-            return 0.0;
-        }
-
-        let doc_len = text.split_whitespace().count() as f32;
-        let k1 = self.bm25_params.k1;
-        let b = self.bm25_params.b;
-
-        let mut score = 0.0;
-        for term in &self.query_terms {
-            let tf = self.term_frequency(text, term);
-            if tf == 0.0 {
-                continue;
-            }
-
-            let idf = self.idf(term);
-            let numerator = tf * (k1 + 1.0);
-            let denominator = tf + k1 * (1.0 - b + b * doc_len / self.avg_doc_len);
-
-            score += idf * numerator / denominator;
-        }
-
-        score
-    }
-
-    /// Calculate keyword overlap score for a text.
-    fn keyword_overlap(&self, text: &str) -> f32 {
-        if self.query_terms.is_empty() {
-            return 0.0;
-        }
-
-        let text_lower = text.to_lowercase();
-        let matches = self
-            .query_terms
-            .iter()
-            .filter(|term| text_lower.contains(term.as_str()))
-            .count();
-
-        matches as f32 / self.query_terms.len() as f32
-    }
-
-    /// Calculate a quick keyword-based score for a node.
-    pub fn quick_score(&self, tree: &DocumentTree, node_id: NodeId) -> f32 {
-        if let Some(node) = tree.get(node_id) {
-            let title_score = self.keyword_overlap(&node.title);
-            let summary_score = self.keyword_overlap(&node.summary);
-            let content_score = self.keyword_overlap(&node.content);
-
-            let base_score = (title_score * self.title_weight
-                + summary_score * self.summary_weight
-                + content_score * self.content_weight)
-                / (self.title_weight + self.summary_weight + self.content_weight);
-
-            // Apply depth penalty (prefer shallower nodes)
-            let depth_factor = 1.0 - (node.depth as f32 * self.depth_penalty).min(0.5);
-
-            base_score * depth_factor
-        } else {
-            0.0
-        }
-    }
-
-    /// Calculate BM25 score for a node.
-    pub fn bm25_score(&self, tree: &DocumentTree, node_id: NodeId) -> f32 {
-        if let Some(node) = tree.get(node_id) {
-            let title_score = self.bm25_field_score(&node.title) * self.title_weight;
-            let summary_score = self.bm25_field_score(&node.summary) * self.summary_weight;
-            let content_score = self.bm25_field_score(&node.content) * self.content_weight;
-
-            let total_score = title_score + summary_score + content_score;
-
-            // Normalize to [0, 1] range using sigmoid-like scaling
-            // This prevents over-penalization with few query terms
-            let normalized = (total_score / 3.0).tanh(); // 3.0 is a reasonable midpoint
-
-            // Apply depth penalty
-            let depth_factor = 1.0 - (node.depth as f32 * self.depth_penalty).min(0.5);
-
-            normalized * depth_factor
-        } else {
-            0.0
-        }
-    }
-
-    /// Calculate hybrid score (keyword + BM25).
-    pub fn hybrid_score(&self, tree: &DocumentTree, node_id: NodeId) -> f32 {
-        let keyword = self.quick_score(tree, node_id);
-        let bm25 = self.bm25_score(tree, node_id);
-
-        // Weighted combination: 40% keyword, 60% BM25
-        keyword * 0.4 + bm25 * 0.6
-    }
-
-    /// Calculate score based on configured strategy.
-    pub fn score(&self, tree: &DocumentTree, node_id: NodeId) -> f32 {
-        match self.strategy {
-            ScoringStrategy::KeywordOnly => self.quick_score(tree, node_id),
-            ScoringStrategy::BM25 => self.bm25_score(tree, node_id),
-            ScoringStrategy::Hybrid => self.hybrid_score(tree, node_id),
-        }
-    }
-}
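The `idf` and `bm25_field_score` methods above implement classic BM25 with a `+ 1.0` smoothing term inside the IDF log. A minimal standalone sketch of the per-term score, assuming k1 = 1.2 and b = 0.75 (the actual defaults live in `Bm25Params`, which is outside this diff):

```rust
/// Per-term BM25 score mirroring the removed `idf` + `bm25_field_score`.
/// The k1/b values here are assumptions, not the crate's `Bm25Params` defaults.
fn bm25_term(tf: f32, df: f32, doc_count: f32, doc_len: f32, avg_doc_len: f32) -> f32 {
    let (k1, b) = (1.2, 0.75);
    // Same smoothed IDF as the removed idf(): ln((N - df + 0.5) / (df + 0.5) + 1)
    let idf = ((doc_count - df + 0.5) / (df + 0.5) + 1.0).ln();
    // Term-frequency saturation with document-length normalization.
    idf * (tf * (k1 + 1.0)) / (tf + k1 * (1.0 - b + b * doc_len / avg_doc_len))
}

fn main() {
    // A term seen 3 times in a 50-word field, present in 2 of 10 documents.
    println!("{:.3}", bm25_term(3.0, 2.0, 10.0, 50.0, 100.0));
}
```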
-
-/// Node scorer for calculating relevance scores.
-pub struct NodeScorer {
-    /// Scoring context.
-    context: ScoringContext,
-}
-
-impl NodeScorer {
-    /// Create a new node scorer.
-    pub fn new(context: ScoringContext) -> Self {
-        Self { context }
-    }
-
-    /// Create a scorer with default context for a query.
-    pub fn for_query(query: &str) -> Self {
-        Self::new(ScoringContext::new(query))
-    }
-
-    /// Create a scorer with a specific strategy.
-    pub fn with_strategy(query: &str, strategy: ScoringStrategy) -> Self {
-        Self::new(ScoringContext::with_strategy(query, strategy))
-    }
-
-    /// Get the scoring context.
-    pub fn context(&self) -> &ScoringContext {
-        &self.context
-    }
-
-    /// Get mutable scoring context.
-    pub fn context_mut(&mut self) -> &mut ScoringContext {
-        &mut self.context
-    }
-
-    /// Score a single node.
-    pub fn score(&self, tree: &DocumentTree, node_id: NodeId) -> f32 {
-        self.context.score(tree, node_id)
-    }
-
-    /// Score multiple nodes and return sorted by score (descending).
-    pub fn score_and_sort(&self, tree: &DocumentTree, node_ids: &[NodeId]) -> Vec<(NodeId, f32)> {
-        let mut scored: Vec<_> = node_ids
-            .iter()
-            .map(|&id| (id, self.score(tree, id)))
-            .collect();
-
-        scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
-        scored
-    }
-
-    /// Calculate chunk score for a portion of content.
-    ///
-    /// Used in the NodeScore formula.
-    pub fn chunk_score(&self, chunk: &str) -> f32 {
-        self.context.keyword_overlap(chunk)
-    }
-
-    /// Calculate the full NodeScore using the formula:
-    /// `Σ ChunkScore(n) / √(N+1)`
-    ///
-    /// Where N is the number of chunks and ChunkScore is calculated for each.
-    pub fn node_score(&self, tree: &DocumentTree, node_id: NodeId, chunk_size: usize) -> f32 {
-        if let Some(node) = tree.get(node_id) {
-            let content = format!("{} {} {}", node.title, node.summary, node.content);
-
-            // Split into chunks
-            let chunks: Vec<&str> = content
-                .as_bytes()
-                .chunks(chunk_size)
-                .map(|b| std::str::from_utf8(b).unwrap_or(""))
-                .collect();
-
-            if chunks.is_empty() {
-                return 0.0;
-            }
-
-            // Sum chunk scores
-            let total_score: f32 = chunks.iter().map(|c| self.chunk_score(c)).sum();
-
-            // Apply formula: Σ ChunkScore(n) / √(N+1)
-            let n = chunks.len() as f32;
-            total_score / (n + 1.0).sqrt()
-        } else {
-            0.0
-        }
-    }
-}
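`node_score` above is the module's NodeScore formula, Σ ChunkScore(n) / √(N+1): per-chunk scores are summed, then damped by the square root of the chunk count, so a long node full of weak chunks cannot outrank a short, dense one. A self-contained sketch of just that reduction (hypothetical helper, not the crate API):

```rust
/// NodeScore reduction: sum of per-chunk scores, damped by sqrt(N + 1).
fn node_score(chunk_scores: &[f32]) -> f32 {
    if chunk_scores.is_empty() {
        return 0.0;
    }
    let n = chunk_scores.len() as f32;
    chunk_scores.iter().sum::<f32>() / (n + 1.0).sqrt()
}

fn main() {
    // Three chunks, one strong match: (0.9 + 0.1 + 0.0) / sqrt(4) = 0.5
    assert!((node_score(&[0.9, 0.1, 0.0]) - 0.5).abs() < 1e-6);
    // Ten mediocre chunks score below one good chunk: ~0.60 vs ~0.64
    assert!(node_score(&[0.2; 10]) < node_score(&[0.9]));
}
```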
-
-#[cfg(test)]
-mod tests {
-    use super::*;
-
-    #[test]
-    fn test_extract_keywords() {
-        let keywords = extract_keywords("What is the architecture of vectorless?");
-        assert!(keywords.contains(&"architecture".to_string()));
-        assert!(keywords.contains(&"vectorless".to_string()));
-        assert!(!keywords.contains(&"what".to_string())); // stopword
-        assert!(!keywords.contains(&"the".to_string())); // stopword
-    }
-
-    #[test]
-    fn test_keyword_overlap() {
-        let ctx = ScoringContext::new("vectorless architecture");
-
-        let text = "Vectorless has a unique architecture for document retrieval.";
-        let score = ctx.keyword_overlap(text);
-
-        assert!(score > 0.5); // Should match both keywords
-    }
-
-    #[test]
-    fn test_bm25_scoring() {
-        let ctx = ScoringContext::with_strategy("rust cargo", ScoringStrategy::BM25);
-
-        let text = "Rust is a programming language. Cargo is its package manager. Rust Rust Rust.";
-        let score = ctx.bm25_field_score(text);
-
-        // Should have higher score due to term frequency
-        assert!(score > 0.0);
-    }
-
-    #[test]
-    fn test_scorer_creation() {
-        let scorer = NodeScorer::for_query("test query");
-        assert!(!scorer.context().query_terms.is_empty());
-    }
-
-    #[test]
-    fn test_scorer_with_strategy() {
-        let scorer = NodeScorer::with_strategy("test", ScoringStrategy::BM25);
-        assert_eq!(scorer.context().strategy, ScoringStrategy::BM25);
-    }
-}
diff --git a/rust/src/retrieval/pilot/trait.rs b/rust/src/retrieval/pilot/trait.rs
deleted file mode 100644
index fc99ee53..00000000
--- a/rust/src/retrieval/pilot/trait.rs
+++ /dev/null
@@ -1,254 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Pilot trait definition - the core interface for navigation intelligence.
-//!
-//! This module defines the [`Pilot`] trait which represents the brain of the
-//! retrieval pipeline. Implementations provide navigation guidance at key
-//! decision points during tree search.
-
-use async_trait::async_trait;
-use std::collections::HashSet;
-use std::sync::LazyLock;
-
-use crate::document::{DocumentTree, NodeId};
-
-use super::{InterventionPoint, PilotConfig, PilotDecision};
-
-/// Empty HashSet for use in SearchState::for_start
-static EMPTY_VISITED: LazyLock<HashSet<NodeId>> = LazyLock::new(HashSet::new);
-
-/// Search state passed to Pilot for decision making.
-///
-/// This struct contains all the context Pilot needs to understand
-/// the current search situation and make informed decisions.
-#[derive(Debug, Clone)]
-pub struct SearchState<'a> {
-    /// The document tree being searched.
-    pub tree: &'a DocumentTree,
-    /// The user's query string.
-    pub query: &'a str,
-    /// Current path from root to current node.
-    pub path: &'a [NodeId],
-    /// Candidate child nodes to evaluate.
-    pub candidates: &'a [NodeId],
-    /// Set of already visited nodes (to avoid cycles).
-    pub visited: &'a HashSet<NodeId>,
-    /// Current depth in the tree.
-    pub depth: usize,
-    /// Current search iteration number.
-    pub iteration: usize,
-    /// Best score found so far in this search.
-    pub best_score: f32,
-    /// Whether the search is currently backtracking.
-    pub is_backtracking: bool,
-    /// Per-step reasoning for why each node in `path` was chosen.
-    ///
-    /// Same length as `path` when present. `None` means no reasoning
-    /// history is available (e.g. first iteration, algorithm-only mode).
-    pub step_reasons: Option<&'a [Option<String>]>,
-}
-
-impl<'a> SearchState<'a> {
-    /// Create a new search state.
-    pub fn new(
-        tree: &'a DocumentTree,
-        query: &'a str,
-        path: &'a [NodeId],
-        candidates: &'a [NodeId],
-        visited: &'a HashSet<NodeId>,
-    ) -> Self {
-        Self {
-            tree,
-            query,
-            path,
-            candidates,
-            visited,
-            depth: path.len(),
-            iteration: 0,
-            best_score: 0.0,
-            is_backtracking: false,
-            step_reasons: None,
-        }
-    }
-
-    /// Create a minimal search state for start guidance.
-    pub fn for_start(tree: &'a DocumentTree, query: &'a str) -> Self {
-        Self {
-            tree,
-            query,
-            path: &[],
-            candidates: &[],
-            visited: &EMPTY_VISITED,
-            depth: 0,
-            iteration: 0,
-            best_score: 0.0,
-            is_backtracking: false,
-            step_reasons: None,
-        }
-    }
-
-    /// Check if we're at the root level.
-    pub fn is_at_root(&self) -> bool {
-        self.path.is_empty()
-    }
-
-    /// Check if there are multiple candidates (fork point).
-    pub fn is_fork_point(&self) -> bool {
-        self.candidates.len() > 1
-    }
-
-    /// Get the current node (last in path).
-    pub fn current_node(&self) -> Option<NodeId> {
-        self.path.last().copied()
-    }
-}
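The `is_at_root`/`is_fork_point` helpers above feed the default routing in `PilotExt::intervention_point`, removed further down in this diff. A standalone sketch of that routing, with the borrowed `SearchState` replaced by plain values for illustration:

```rust
#[derive(Debug, PartialEq)]
enum Point {
    Start,
    Backtrack,
    Fork,
    Evaluate,
}

/// Mirrors the decision order of the removed PilotExt::intervention_point.
fn intervention_point(path_len: usize, iteration: usize, candidates: usize, backtracking: bool) -> Point {
    if path_len == 0 || iteration == 0 {
        Point::Start // at the root, or the very first iteration
    } else if backtracking {
        Point::Backtrack
    } else if candidates > 1 {
        Point::Fork // multiple candidate branches: a fork point
    } else {
        Point::Evaluate
    }
}

fn main() {
    assert_eq!(intervention_point(0, 0, 0, false), Point::Start);
    assert_eq!(intervention_point(2, 3, 4, false), Point::Fork);
    assert_eq!(intervention_point(2, 3, 1, true), Point::Backtrack);
    assert_eq!(intervention_point(2, 3, 1, false), Point::Evaluate);
}
```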
-
-/// Pilot trait - the brain of the retrieval pipeline.
-///
-/// Pilot provides navigation guidance at key decision points during
-/// tree search. It uses LLM intelligence for semantic understanding
-/// while allowing the algorithm to handle efficient execution.
-///
-/// # Implementation Notes
-///
-/// Implementations should:
-/// - Be cheap to construct
-/// - Handle LLM failures gracefully
-/// - Respect budget constraints
-/// - Provide explainable decisions
-///
-/// # Example
-///
-/// ```rust,ignore
-/// use vectorless::retrieval::pilot::{Pilot, SearchState, PilotDecision};
-///
-/// struct MyPilot;
-///
-/// #[async_trait]
-/// impl Pilot for MyPilot {
-///     fn name(&self) -> &str { "my_pilot" }
-///
-///     fn should_intervene(&self, state: &SearchState<'_>) -> bool {
-///         state.candidates.len() > 3
-///     }
-///
-///     async fn decide(&self, state: &SearchState<'_>) -> PilotDecision {
-///         // LLM-based decision making
-///         PilotDecision::default()
-///     }
-/// }
-/// ```
-#[async_trait]
-pub trait Pilot: Send + Sync {
-    /// Get the name of this Pilot implementation.
-    fn name(&self) -> &str;
-
-    /// Determine if Pilot should intervene at this point.
-    ///
-    /// This is the key method for controlling when LLM is called.
-    /// Implementations should consider:
-    /// - Candidate count (fork points)
-    /// - Score uncertainty
-    /// - Budget constraints
-    /// - Current depth and iteration
-    ///
-    /// Returns `true` if Pilot should be consulted for a decision.
-    fn should_intervene(&self, state: &SearchState<'_>) -> bool;
-
-    /// Make a navigation decision.
-    ///
-    /// Called when `should_intervene` returns `true`.
-    /// Implementations should:
-    /// - Build appropriate context
-    /// - Call LLM (if applicable)
-    /// - Parse and validate response
-    /// - Return a structured decision
-    ///
-    /// This method should never panic. On errors, return a default
-    /// decision that preserves the original candidate order.
-    async fn decide(&self, state: &SearchState<'_>) -> PilotDecision;
-
-    /// Provide guidance before search starts.
-    ///
-    /// Called once at the beginning of search to help determine
-    /// the starting point and initial direction.
-    ///
-    /// `start_node` is the node from which the search begins. The pilot
-    /// should evaluate that node's children (not root's children) as candidates.
-    ///
-    /// Returns `None` if no guidance is available or needed.
-    async fn guide_start(
-        &self,
-        tree: &DocumentTree,
-        query: &str,
-        start_node: NodeId,
-    ) -> Option<PilotDecision>;
-
-    /// Provide guidance during backtracking.
-    ///
-    /// Called when search needs to backtrack due to insufficient
-    /// results. Pilot can analyze the failure and suggest
-    /// alternative paths.
-    ///
-    /// Returns `None` if no guidance is available.
-    async fn guide_backtrack(&self, state: &SearchState<'_>) -> Option<PilotDecision>;
-
-    /// Binary prune — quick relevance filter for wide nodes.
-    ///
-    /// Called after P2 pre-filtering when candidates still exceed the
-    /// prune threshold. Asks the LLM a simple yes/no question per
-    /// candidate instead of full scoring. Returns the subset of
-    /// candidate node IDs deemed relevant.
-    ///
-    /// Returns `None` if no pruning guidance is available (e.g. budget
-    /// exhausted, not supported).
-    async fn binary_prune(&self, state: &SearchState<'_>) -> Option<Vec<NodeId>>;
-
-    /// Get the current configuration.
-    fn config(&self) -> &PilotConfig;
-
-    /// Check if this Pilot is actually capable of providing guidance.
-    ///
-    /// Returns `false` for NoopPilot or when budget is exhausted.
-    fn is_active(&self) -> bool {
-        true
-    }
-
-    /// Reset internal state for a new query.
-    ///
-    /// Called at the start of each new search to reset
-    /// budget counters, caches, and other per-query state.
-    fn reset(&self);
-
-    /// Downcast support for shared budget injection.
-    ///
-    /// Default implementation returns a dummy Any.
-    fn as_any(&self) -> &dyn std::any::Any {
-        // Default: no downcast support
-        &()
-    }
-}
-
-/// Extension trait for Pilot with utility methods.
-pub trait PilotExt: Pilot {
-    /// Check if Pilot can intervene given current state and budget.
-    fn can_intervene(&self, state: &SearchState<'_>) -> bool {
-        self.is_active() && self.should_intervene(state)
-    }
-
-    /// Get the current intervention point type.
-    fn intervention_point(&self, state: &SearchState<'_>) -> InterventionPoint {
-        if state.is_at_root() || state.iteration == 0 {
-            InterventionPoint::Start
-        } else if state.is_backtracking {
-            InterventionPoint::Backtrack
-        } else if state.is_fork_point() {
-            InterventionPoint::Fork
-        } else {
-            InterventionPoint::Evaluate
-        }
-    }
-}
-
-impl<T: Pilot> PilotExt for T {}
diff --git a/rust/src/retrieval/pipeline/budget.rs b/rust/src/retrieval/pipeline/budget.rs
deleted file mode 100644
index 91a77b2d..00000000
--- a/rust/src/retrieval/pipeline/budget.rs
+++ /dev/null
@@ -1,331 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Adaptive token budget controller for the retrieval pipeline.
-//!
-//! Unlike the Pilot-level [`BudgetController`](crate::retrieval::pilot::BudgetController)
-//! which only tracks Pilot LLM calls, this controller tracks the **entire pipeline's**
-//! token consumption across all stages and provides dynamic budget allocation decisions.
-//!
-//! # Design
-//!
-//! ```text
-//! ┌──────────────────────────────────────────────────┐
-//! │ RetrievalBudgetController                        │
-//! │                                                  │
-//! │ total_budget ────────────────────────┬────────── │
-//! │ consumed (from all stages)           │ remaining │
-//! │                                      │           │
-//! │ Plan stage: initial allocation       │           │
-//! │ Search stage: check before iteration │           │
-//! │ Evaluate stage: report & decide      │           │
-//! │ Graceful degradation when low        │           │
-//! └──────────────────────────────────────────────────┘
-//! ```
-
-use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
-
-/// Status of the budget for stage-level decision making.
-#[derive(Debug, Clone, Copy, PartialEq, Eq)]
-pub enum BudgetStatus {
-    /// Plenty of budget remaining, proceed normally.
-    Healthy,
-    /// Budget is getting low, consider cheaper strategies.
-    Constrained,
-    /// Budget is exhausted, stop LLM calls and return best results.
-    Exhausted,
-}
-
-impl BudgetStatus {
-    /// Whether LLM calls should still be made.
-    pub fn allow_llm(self) -> bool {
-        matches!(self, Self::Healthy | Self::Constrained)
-    }
-
-    /// Whether the pipeline should stop iterating and return current results.
-    pub fn should_stop(self) -> bool {
-        self == Self::Exhausted
-    }
-}
-
-/// Adaptive budget controller for the retrieval pipeline.
-///
-/// Tracks token consumption across all stages (Plan, Search, Evaluate)
-/// and provides budget-aware decisions for dynamic strategy adjustment.
-/// -/// # Example -/// -/// ```rust,ignore -/// let budget = RetrievalBudgetController::new(4000); -/// -/// // In Search stage: check before starting an iteration -/// if budget.status().should_stop() { -/// return StageOutcome::complete(); // graceful degradation -/// } -/// -/// // After LLM call: record consumption -/// budget.record_tokens(350); -/// -/// // In Evaluate: decide based on remaining budget -/// if budget.status() == BudgetStatus::Constrained { -/// // Use cheaper sufficiency check -/// } -/// ``` -pub struct RetrievalBudgetController { - /// Total token budget for this retrieval operation. - total_budget: usize, - /// Tokens consumed so far (atomic for thread safety). - consumed: AtomicUsize, - /// Whether budget exhaustion has been signaled to the pipeline. - exhaustion_signaled: AtomicBool, - /// Threshold ratio for "constrained" status (e.g. 0.7 = warn at 70% used). - constrain_threshold: f32, -} - -// Manual Clone because AtomicUsize/AtomicBool don't impl Clone. -impl Clone for RetrievalBudgetController { - fn clone(&self) -> Self { - Self { - total_budget: self.total_budget, - consumed: AtomicUsize::new(self.consumed.load(Ordering::Relaxed)), - exhaustion_signaled: AtomicBool::new(self.exhaustion_signaled.load(Ordering::Relaxed)), - constrain_threshold: self.constrain_threshold, - } - } -} - -impl RetrievalBudgetController { - /// Create a new budget controller with the given total token budget. - pub fn new(total_budget: usize) -> Self { - Self { - total_budget, - consumed: AtomicUsize::new(0), - exhaustion_signaled: AtomicBool::new(false), - constrain_threshold: 0.7, - } - } - - /// Create with a custom constrain threshold (0.0 - 1.0). - /// - /// When consumption exceeds `total_budget * threshold`, status becomes Constrained. - pub fn with_constrain_threshold(mut self, threshold: f32) -> Self { - self.constrain_threshold = threshold.clamp(0.0, 1.0); - self - } - - /// Get the current budget status. - pub fn status(&self) -> BudgetStatus { - if self.exhaustion_signaled.load(Ordering::Relaxed) { - return BudgetStatus::Exhausted; - } - - let consumed = self.consumed.load(Ordering::Relaxed); - if consumed >= self.total_budget { - self.exhaustion_signaled.store(true, Ordering::Relaxed); - return BudgetStatus::Exhausted; - } - - let utilization = consumed as f32 / self.total_budget as f32; - if utilization >= self.constrain_threshold { - BudgetStatus::Constrained - } else { - BudgetStatus::Healthy - } - } - - /// Record tokens consumed by any stage. - pub fn record_tokens(&self, tokens: usize) { - self.consumed.fetch_add(tokens, Ordering::Relaxed); - } - - /// Get total tokens consumed so far. - pub fn consumed(&self) -> usize { - self.consumed.load(Ordering::Relaxed) - } - - /// Get remaining token budget. - pub fn remaining(&self) -> usize { - self.total_budget - .saturating_sub(self.consumed.load(Ordering::Relaxed)) - } - - /// Get total budget. - pub fn total_budget(&self) -> usize { - self.total_budget - } - - /// Get utilization ratio (0.0 - 1.0). - pub fn utilization(&self) -> f32 { - if self.total_budget == 0 { - 0.0 - } else { - (self.consumed.load(Ordering::Relaxed) as f32 / self.total_budget as f32).min(1.0) - } - } - - /// Signal that budget is exhausted (e.g. external trigger). - pub fn signal_exhausted(&self) { - self.exhaustion_signaled.store(true, Ordering::Relaxed); - } - - /// Whether budget exhaustion has been signaled. 
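-    ///
-    /// A minimal sketch of the two exhaustion paths (illustrative only):
-    ///
-    /// ```rust,ignore
-    /// let budget = RetrievalBudgetController::new(100);
-    /// assert!(!budget.is_exhausted());
-    ///
-    /// // Path 1: the budget is fully consumed.
-    /// budget.record_tokens(100);
-    /// assert!(budget.is_exhausted());
-    ///
-    /// // Path 2: an external trigger signals exhaustion directly.
-    /// let budget = RetrievalBudgetController::new(100);
-    /// budget.signal_exhausted();
-    /// assert!(budget.is_exhausted());
-    /// ```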
-    pub fn is_exhausted(&self) -> bool {
-        self.exhaustion_signaled.load(Ordering::Relaxed)
-            || self.consumed.load(Ordering::Relaxed) >= self.total_budget
-    }
-
-    /// Reset for a new query.
-    pub fn reset(&self) {
-        self.consumed.store(0, Ordering::Relaxed);
-        self.exhaustion_signaled.store(false, Ordering::Relaxed);
-    }
-
-    /// Suggest a beam width for the next search iteration, based on the
-    /// current budget status and the iteration count.
-    pub fn suggested_beam_width(&self, current_beam: usize, iteration: usize) -> usize {
-        match self.status() {
-            BudgetStatus::Healthy => {
-                // Full power, maybe even increase beam for complex queries
-                current_beam
-            }
-            BudgetStatus::Constrained => {
-                // Reduce beam to save tokens (keep full width on early iterations)
-                if iteration <= 1 {
-                    current_beam
-                } else {
-                    (current_beam / 2).max(1)
-                }
-            }
-            BudgetStatus::Exhausted => {
-                // No more search iterations worth doing
-                0
-            }
-        }
-    }
-
-    /// Whether another search iteration is worthwhile given budget and confidence.
-    pub fn should_continue_search(&self, current_confidence: f32, iteration: usize) -> bool {
-        if self.is_exhausted() {
-            return false;
-        }
-        // Don't continue if confidence is already good
-        if current_confidence > 0.8 && iteration >= 1 {
-            return false;
-        }
-        // Don't continue if budget is constrained and we have some results
-        if self.status() == BudgetStatus::Constrained && current_confidence > 0.4 {
-            return false;
-        }
-        true
-    }
-}
-
-#[cfg(test)]
-mod tests {
-    use super::*;
-
-    #[test]
-    fn test_budget_healthy() {
-        let budget = RetrievalBudgetController::new(1000);
-        assert_eq!(budget.status(), BudgetStatus::Healthy);
-        assert!(!budget.is_exhausted());
-        assert_eq!(budget.remaining(), 1000);
-    }
-
-    #[test]
-    fn test_budget_constrained() {
-        let budget = RetrievalBudgetController::new(1000);
-        budget.record_tokens(750); // 75% used, above 70% threshold
-        assert_eq!(budget.status(), BudgetStatus::Constrained);
-        assert!(budget.status().allow_llm());
-    }
-
-    #[test]
-    fn test_budget_exhausted() {
-        let budget = RetrievalBudgetController::new(1000);
-        budget.record_tokens(1000);
-        assert_eq!(budget.status(), BudgetStatus::Exhausted);
-        assert!(budget.status().should_stop());
-        assert!(!budget.status().allow_llm());
-    }
-
-    #[test]
-    fn test_budget_exhausted_over() {
-        let budget = RetrievalBudgetController::new(1000);
-        budget.record_tokens(1500);
-        assert_eq!(budget.status(), BudgetStatus::Exhausted);
-    }
-
-    #[test]
-    fn test_budget_signal_exhausted() {
-        let budget = RetrievalBudgetController::new(1000);
-        budget.signal_exhausted();
-        assert_eq!(budget.status(), BudgetStatus::Exhausted);
-        assert_eq!(budget.consumed(), 0); // No tokens actually consumed
-    }
-
-    #[test]
-    fn test_budget_reset() {
-        let budget = RetrievalBudgetController::new(1000);
-        budget.record_tokens(800);
-        assert_eq!(budget.status(), BudgetStatus::Constrained);
-        budget.reset();
-        assert_eq!(budget.status(), BudgetStatus::Healthy);
-        assert_eq!(budget.consumed(), 0);
-    }
-
-    #[test]
-    fn test_suggested_beam_width() {
-        let budget = RetrievalBudgetController::new(1000);
-        // Healthy: keep current beam
-        assert_eq!(budget.suggested_beam_width(4, 0), 4);
-
-        // Constrained: first iteration keeps beam, later reduces
-        budget.record_tokens(750);
-        assert_eq!(budget.suggested_beam_width(4, 0), 4);
-        assert_eq!(budget.suggested_beam_width(4, 2), 2);
-
-        // Exhausted: zero
-        budget.record_tokens(300);
-        assert_eq!(budget.suggested_beam_width(4, 0), 0);
-    }
-
-    #[test]
-    fn test_should_continue_search() {
-        let budget = RetrievalBudgetController::new(1000);
-
-        // Fresh, low confidence: continue
-        assert!(budget.should_continue_search(0.2, 0));
-
-        // High confidence after 1 iteration: stop
-        assert!(!budget.should_continue_search(0.9, 1));
-
-        // Medium confidence, healthy budget: continue
-        assert!(budget.should_continue_search(0.5, 1));
-
-        // Constrained, decent confidence: stop
-        budget.record_tokens(750);
-        assert!(!budget.should_continue_search(0.5, 2));
-
-        // Constrained, low confidence: continue
-        assert!(budget.should_continue_search(0.2, 2));
-    }
-
-    #[test]
-    fn test_utilization() {
-        let budget = RetrievalBudgetController::new(1000);
-        assert!((budget.utilization() - 0.0).abs() < 0.01);
-
-        budget.record_tokens(500);
-        assert!((budget.utilization() - 0.5).abs() < 0.01);
-    }
-
-    #[test]
-    fn test_custom_constrain_threshold() {
-        let budget = RetrievalBudgetController::new(1000).with_constrain_threshold(0.5);
-        budget.record_tokens(500);
-        assert_eq!(budget.status(), BudgetStatus::Constrained);
-    }
-}
diff --git a/rust/src/retrieval/pipeline/context.rs b/rust/src/retrieval/pipeline/context.rs
deleted file mode 100644
index 047182e7..00000000
--- a/rust/src/retrieval/pipeline/context.rs
+++ /dev/null
@@ -1,509 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Retrieval pipeline context.
-//!
-//! Context passed between retrieval stages, accumulating data throughout
-//! the retrieval process.
-
-use std::collections::HashMap;
-use std::sync::Arc;
-use std::time::Instant;
-
-use crate::document::{DocumentTree, NodeId, ReasoningIndex, RetrievalIndex};
-use crate::graph::DocumentGraph;
-use crate::retrieval::cache::{HotNodeTracker, ReasoningCache};
-use crate::retrieval::pilot::Pilot;
-use crate::retrieval::pipeline::budget::RetrievalBudgetController;
-use crate::retrieval::types::{
-    NavigationDecision, QueryComplexity, ReasoningChain, ReasoningStep, RetrieveOptions,
-    RetrieveResponse, SearchPath, StageName, StrategyPreference, SufficiencyLevel,
-};
-
-/// Search algorithm type.
-#[derive(Debug, Clone, Copy, PartialEq, Eq)]
-pub enum SearchAlgorithm {
-    /// Pure Pilot: beam=1, Pilot picks top-1 child at each layer.
-    PurePilot,
-    /// Beam search with Pilot scoring.
-    Beam,
-    /// MCTS with Pilot priors.
-    Mcts,
-}
-
-impl Default for SearchAlgorithm {
-    fn default() -> Self {
-        Self::Beam
-    }
-}
-
-impl SearchAlgorithm {
-    /// Get algorithm name.
-    pub fn name(&self) -> &'static str {
-        match self {
-            Self::PurePilot => "pure_pilot",
-            Self::Beam => "beam",
-            Self::Mcts => "mcts",
-        }
-    }
-
-    /// Parse algorithm from config string.
-    /// Returns None for unrecognized names.
-    pub fn from_name(name: &str) -> Option<Self> {
-        match name {
-            "pure_pilot" | "greedy" => Some(Self::PurePilot),
-            "beam" => Some(Self::Beam),
-            "mcts" => Some(Self::Mcts),
-            _ => None,
-        }
-    }
-}
-
-/// Search configuration.
-#[derive(Debug, Clone)]
-pub struct SearchConfig {
-    /// Beam width for multi-path search.
-    pub beam_width: usize,
-    /// Maximum depth to search.
-    pub max_depth: usize,
-    /// Minimum relevance score.
-    pub min_score: f32,
-    /// Maximum search iterations.
-    pub max_iterations: usize,
-}
-
-impl Default for SearchConfig {
-    fn default() -> Self {
-        Self {
-            beam_width: 3,
-            max_depth: 10,
-            min_score: 0.1,
-            max_iterations: 5,
-        }
-    }
-}
-
-/// Candidate node from search.
-#[derive(Debug, Clone)]
-pub struct CandidateNode {
-    /// Node ID in the tree.
-    pub node_id: NodeId,
-    /// Relevance score (0.0 - 1.0).
-    pub score: f32,
-    /// Depth in the tree.
-    pub depth: usize,
-    /// Whether this is a leaf node.
-    pub is_leaf: bool,
-}
-
-impl CandidateNode {
-    /// Create a new candidate node.
-    pub fn new(node_id: NodeId, score: f32, depth: usize, is_leaf: bool) -> Self {
-        Self {
-            node_id,
-            score,
-            depth,
-            is_leaf,
-        }
-    }
-}
-
-/// Stage execution result.
-#[derive(Debug, Clone)]
-pub struct StageResult {
-    /// Stage name.
-    pub stage: String,
-    /// Whether successful.
-    pub success: bool,
-    /// Duration in milliseconds.
-    pub duration_ms: u64,
-    /// Optional message.
-    pub message: Option<String>,
-}
-
-impl StageResult {
-    /// Create a successful result.
-    pub fn success(stage: impl Into<String>) -> Self {
-        Self {
-            stage: stage.into(),
-            success: true,
-            duration_ms: 0,
-            message: None,
-        }
-    }
-
-    /// Create a failed result.
-    pub fn failure(stage: impl Into<String>, message: impl Into<String>) -> Self {
-        Self {
-            stage: stage.into(),
-            success: false,
-            duration_ms: 0,
-            message: Some(message.into()),
-        }
-    }
-
-    /// Set duration.
-    pub fn with_duration(mut self, ms: u64) -> Self {
-        self.duration_ms = ms;
-        self
-    }
-}
-
-/// Retrieval performance metrics.
-#[derive(Debug, Clone, Default)]
-pub struct RetrievalMetrics {
-    /// Time spent in analyze stage (ms).
-    pub analyze_time_ms: u64,
-    /// Time spent in plan stage (ms).
-    pub plan_time_ms: u64,
-    /// Time spent in search stage (ms).
-    pub search_time_ms: u64,
-    /// Time spent in evaluate stage (ms).
-    pub evaluate_time_ms: u64,
-    /// Total time (ms).
-    pub total_time_ms: u64,
-    /// Number of nodes visited.
-    pub nodes_visited: usize,
-    /// Number of LLM calls.
-    pub llm_calls: usize,
-    /// Tokens consumed.
-    pub tokens_used: usize,
-    /// Cache hits.
-    pub cache_hits: usize,
-    /// Cache misses.
-    pub cache_misses: usize,
-    /// Search iterations performed.
-    pub search_iterations: usize,
-    /// Backtrack count.
-    pub backtracks: usize,
-}
-
-impl RetrievalMetrics {
-    /// Create new metrics.
-    pub fn new() -> Self {
-        Self::default()
-    }
-
-    /// Merge another metrics into this one.
-    pub fn merge(&mut self, other: &RetrievalMetrics) {
-        self.analyze_time_ms += other.analyze_time_ms;
-        self.plan_time_ms += other.plan_time_ms;
-        self.search_time_ms += other.search_time_ms;
-        self.evaluate_time_ms += other.evaluate_time_ms;
-        self.nodes_visited += other.nodes_visited;
-        self.llm_calls += other.llm_calls;
-        self.tokens_used = other.tokens_used; // Use latest
-        self.cache_hits += other.cache_hits;
-        self.cache_misses += other.cache_misses;
-        self.search_iterations = other.search_iterations; // Use latest
-        self.backtracks += other.backtracks;
-    }
-}
-
-/// Retrieval pipeline context.
-///
-/// Passed between stages and accumulates data throughout the retrieval process.
-pub struct PipelineContext {
-    // ============ Input ============
-    /// Original query string.
-    pub query: String,
-    /// Document tree to search.
-    pub tree: Arc<DocumentTree>,
-    /// Pre-computed retrieval index for efficient operations.
-    pub retrieval_index: Option<RetrievalIndex>,
-    /// Retrieval options.
-    pub options: RetrieveOptions,
-    /// Optional Pilot for navigation guidance.
-    pub pilot: Option<Arc<dyn Pilot>>,
-    /// Adaptive token budget controller for the entire pipeline.
-    /// Shared via Arc so Pilot can read/check the same budget.
-    pub budget_controller: Arc<RetrievalBudgetController>,
-    /// Tiered reasoning cache (L1 exact, L2 path pattern, L3 strategy score).
-    pub reasoning_cache: Arc<ReasoningCache>,
-
-    /// Pre-computed reasoning index for fast path resolution.
-    pub reasoning_index: Option<Arc<ReasoningIndex>>,
-
-    /// Hot node tracker for recording retrieval frequency (session-scoped).
-    pub hot_tracker: Option<Arc<HotNodeTracker>>,
-
-    /// Cross-document relationship graph for graph-aware retrieval.
-    pub document_graph: Option<Arc<DocumentGraph>>,
-
-    // ============ Analyze Stage Output ============
-    /// Detected query complexity.
-    pub complexity: Option<QueryComplexity>,
-    /// Extracted keywords.
-    pub keywords: Vec<String>,
-    /// Target sections from ToC matching.
-    pub target_sections: Vec<String>,
-    /// Resolved structural path hints — node IDs extracted from the query
-    /// (e.g. "Chapter 3" → NodeId of Chapter 3). Search should start from these nodes.
-    pub resolved_path_hints: Vec<(String, NodeId)>,
-    /// Decomposed sub-queries (if query was decomposed).
-    pub decomposition: Option<Vec<String>>,
-
-    // ============ Plan Stage Output ============
-    /// Selected retrieval strategy.
-    pub selected_strategy: Option<StrategyPreference>,
-    /// Selected search algorithm.
-    pub selected_algorithm: Option<SearchAlgorithm>,
-    /// Search configuration.
-    pub search_config: Option<SearchConfig>,
-    /// Ordered fallback chain for search algorithms.
-    /// When the primary algorithm's result is insufficient, try the next.
-    pub search_fallback_chain: Vec<SearchAlgorithm>,
-
-    // ============ Search Stage Output ============
-    /// Candidate nodes from search.
-    pub candidates: Vec<CandidateNode>,
-    /// Search paths explored.
-    pub search_paths: Vec<SearchPath>,
-    /// Reasoning chain — ordered steps explaining every retrieval decision.
-    pub reasoning_chain: ReasoningChain,
-    /// Number of search iterations performed.
-    pub search_iterations: usize,
-
-    // ============ Evaluate Stage Output ============
-    /// Current sufficiency level.
-    pub sufficiency: SufficiencyLevel,
-    /// Accumulated content from candidates.
-    pub accumulated_content: String,
-    /// Estimated token count.
-    pub token_count: usize,
-    /// Fingerprint of candidate node IDs from previous evaluate call.
-    /// Used to detect stagnant loops (same candidates → same evaluation).
-    pub prev_candidate_fingerprint: Option<u64>,
-    /// Per-node content cache to avoid duplicate computation.
-    /// Populated by `aggregate_content()`, read by `build_response()`.
-    pub node_content_cache: HashMap<NodeId, String>,
-
-    // ============ Final Result ============
-    /// Final retrieval response.
-    pub result: Option<RetrieveResponse>,
-
-    // ============ Metadata ============
-    /// Stage execution results.
-    pub stage_results: HashMap<String, StageResult>,
-    /// Performance metrics.
-    pub metrics: RetrievalMetrics,
-    /// Start time of current stage.
-    pub stage_start: Option<Instant>,
-}
-
-impl PipelineContext {
-    /// Create a new retrieval context.
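-    ///
-    /// # Example
-    ///
-    /// ```rust,ignore
-    /// // Minimal sketch, assuming an existing `tree: Arc<DocumentTree>` and
-    /// // that `RetrieveOptions` provides a `Default` impl.
-    /// let ctx = PipelineContext::new(tree, "find the termination clause", RetrieveOptions::default());
-    /// assert_eq!(ctx.search_iterations, 0);
-    /// assert!(ctx.candidates.is_empty());
-    /// ```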
-    pub fn new(
-        tree: Arc<DocumentTree>,
-        query: impl Into<String>,
-        options: RetrieveOptions,
-    ) -> Self {
-        // Build retrieval index for efficient operations
-        let retrieval_index = Some(tree.build_retrieval_index());
-        let budget_controller = Arc::new(RetrievalBudgetController::new(options.max_tokens));
-
-        Self {
-            query: query.into(),
-            tree,
-            retrieval_index,
-            options,
-            pilot: None,
-            budget_controller,
-            reasoning_cache: Arc::new(ReasoningCache::new()),
-            reasoning_index: None,
-            hot_tracker: None,
-            document_graph: None,
-            complexity: None,
-            keywords: Vec::new(),
-            target_sections: Vec::new(),
-            resolved_path_hints: Vec::new(),
-            decomposition: None,
-            selected_strategy: None,
-            selected_algorithm: None,
-            search_config: None,
-            search_fallback_chain: vec![
-                SearchAlgorithm::Beam,
-                SearchAlgorithm::Mcts,
-                SearchAlgorithm::PurePilot,
-            ],
-            candidates: Vec::new(),
-            search_paths: Vec::new(),
-            reasoning_chain: ReasoningChain::new(),
-            search_iterations: 0,
-            sufficiency: SufficiencyLevel::default(),
-            accumulated_content: String::new(),
-            token_count: 0,
-            prev_candidate_fingerprint: None,
-            node_content_cache: HashMap::new(),
-            result: None,
-            stage_results: HashMap::new(),
-            metrics: RetrievalMetrics::default(),
-            stage_start: None,
-        }
-    }
-
-    /// Create a new retrieval context with Pilot.
-    pub fn with_pilot(
-        tree: Arc<DocumentTree>,
-        query: impl Into<String>,
-        options: RetrieveOptions,
-        pilot: Option<Arc<dyn Pilot>>,
-    ) -> Self {
-        let mut ctx = Self::new(tree, query, options);
-        ctx.pilot = pilot;
-        ctx
-    }
-
-    /// Set the Pilot for this context.
-    pub fn set_pilot(&mut self, pilot: Option<Arc<dyn Pilot>>) {
-        self.pilot = pilot;
-    }
-
-    /// Set the reasoning index for this retrieval context.
-    pub fn with_reasoning_index(mut self, index: ReasoningIndex) -> Self {
-        self.reasoning_index = Some(Arc::new(index));
-        self
-    }
-
-    /// Set the hot node tracker for this retrieval context.
-    pub fn with_hot_tracker(mut self, tracker: HotNodeTracker) -> Self {
-        self.hot_tracker = Some(Arc::new(tracker));
-        self
-    }
-
-    /// Set the document graph for graph-aware retrieval.
-    pub fn with_document_graph(mut self, graph: Arc<DocumentGraph>) -> Self {
-        self.document_graph = Some(graph);
-        self
-    }
-
-    /// Get the Pilot reference, if available.
-    pub fn pilot(&self) -> Option<&dyn Pilot> {
-        self.pilot.as_deref()
-    }
-
-    /// Start timing a stage.
-    pub fn start_stage(&mut self) {
-        self.stage_start = Some(Instant::now());
-    }
-
-    /// End timing and record for a stage.
-    pub fn end_stage(&mut self, stage_name: &str, success: bool, message: Option<String>) {
-        let duration_ms = self
-            .stage_start
-            .map(|s| s.elapsed().as_millis() as u64)
-            .unwrap_or(0);
-
-        let result = StageResult {
-            stage: stage_name.to_string(),
-            success,
-            duration_ms,
-            message,
-        };
-
-        // Update metrics based on stage
-        match stage_name {
-            "analyze" => self.metrics.analyze_time_ms += duration_ms,
-            "plan" => self.metrics.plan_time_ms += duration_ms,
-            "search" => self.metrics.search_time_ms += duration_ms,
-            "evaluate" => self.metrics.evaluate_time_ms += duration_ms,
-            _ => {}
-        }
-
-        self.stage_results.insert(stage_name.to_string(), result);
-        self.stage_start = None;
-    }
-
-    /// Check if we can perform more search iterations.
-    pub fn can_search_more(&self) -> bool {
-        self.search_iterations < self.options.max_iterations
-    }
-
-    /// Increment search iteration count.
-    pub fn increment_search_iteration(&mut self) {
-        self.search_iterations += 1;
-        self.metrics.search_iterations = self.search_iterations;
-    }
-
-    /// Increment backtrack count.
-    pub fn increment_backtrack(&mut self) {
-        self.metrics.backtracks += 1;
-    }
-
-    /// Compute a fingerprint of the current candidate node IDs.
-    fn candidate_fingerprint(&self) -> u64 {
-        use std::hash::{Hash, Hasher};
-        let mut hasher = std::collections::hash_map::DefaultHasher::new();
-        for c in &self.candidates {
-            format!("{:?}", c.node_id).hash(&mut hasher);
-        }
-        hasher.finish()
-    }
-
-    /// Check if candidates changed since the last call, and update the stored fingerprint.
-    /// Returns `true` if candidates are the same as before (stagnant loop detected).
-    pub fn check_candidates_stagnant(&mut self) -> bool {
-        let fp = self.candidate_fingerprint();
-        let stagnant = self.prev_candidate_fingerprint == Some(fp);
-        self.prev_candidate_fingerprint = Some(fp);
-        stagnant
-    }
-
-    /// Check if token limit is reached.
-    pub fn is_token_limit_reached(&self) -> bool {
-        self.token_count >= self.options.max_tokens
-    }
-
-    /// Calculate token utilization percentage.
-    pub fn token_utilization(&self) -> f32 {
-        if self.options.max_tokens == 0 {
-            0.0
-        } else {
-            (self.token_count as f32 / self.options.max_tokens as f32).min(1.0)
-        }
-    }
-
-    /// Append a reasoning step to the chain.
-    pub fn push_reasoning_step(&mut self, step: ReasoningStep) {
-        self.reasoning_chain.push(step);
-    }
-
-    /// Convenience: push a simple reasoning step with no node association.
-    pub fn record_reasoning(
-        &mut self,
-        stage: StageName,
-        reasoning: impl Into<String>,
-        decision: NavigationDecision,
-    ) {
-        self.push_reasoning_step(ReasoningStep {
-            stage,
-            node_id: None,
-            title: None,
-            score: 0.0,
-            decision,
-            depth: 0,
-            reasoning: reasoning.into(),
-            candidates: Vec::new(),
-            strategy_used: None,
-            llm_call: None,
-            references_followed: Vec::new(),
-        });
-    }
-
-    /// Finalize the context into a response.
-    pub fn finalize(self) -> RetrieveResponse {
-        self.result.unwrap_or_else(|| RetrieveResponse {
-            results: Vec::new(),
-            content: self.accumulated_content,
-            confidence: 0.0,
-            is_sufficient: self.sufficiency == SufficiencyLevel::Sufficient,
-            strategy_used: self
-                .selected_strategy
-                .map(|s| format!("{:?}", s))
-                .unwrap_or_else(|| "unknown".to_string()),
-            complexity: self.complexity.unwrap_or_default(),
-            reasoning_chain: self.reasoning_chain,
-            tokens_used: self.token_count,
-        })
-    }
-}
diff --git a/rust/src/retrieval/pipeline/mod.rs b/rust/src/retrieval/pipeline/mod.rs
deleted file mode 100644
index 5c84a509..00000000
--- a/rust/src/retrieval/pipeline/mod.rs
+++ /dev/null
@@ -1,50 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Retrieval pipeline infrastructure.
-//!
-//! This module provides the core pipeline infrastructure for retrieval:
-//! - [`RetrievalStage`] - Trait for pipeline stages
-//! - [`PipelineContext`] - Context passed between stages
-//! - [`StageOutcome`] - Controls pipeline flow (continue, backtrack, etc.)
-//! - [`RetrievalOrchestrator`] - Manages stage execution
-//!
-//! # Flow Control
-//!
-//! Unlike the index pipeline, retrieval stages can control flow:
-//!
-//! - `Continue` - Proceed to next stage
-//! - `Complete` - Retrieval is done, return results
-//! - `NeedMoreData` - Backtrack to search for more data
-//! - `Backtrack` - Return to a specific stage
-//! - `Skip` - Skip remaining stages
-//!
-//! # Example
-//!
-//! ```rust,ignore
-//! use vectorless::retrieval::pipeline::{RetrievalOrchestrator, RetrievalStage};
-//!
-//! let orchestrator = RetrievalOrchestrator::new()
-//!     .stage(AnalyzeStage::new())
-//!     .stage(PlanStage::new())
-//!     .stage(SearchStage::new())
-//!     .stage(EvaluateStage::new());
-//!
-//! let response = orchestrator.execute(tree, query, options).await?;
-//! ```
-
-mod budget;
-mod context;
-mod orchestrator;
-mod outcome;
-mod stage;
-
-pub use budget::{BudgetStatus, RetrievalBudgetController};
-pub use context::{CandidateNode, PipelineContext, SearchAlgorithm, SearchConfig};
-pub use orchestrator::RetrievalOrchestrator;
-pub use outcome::StageOutcome;
-pub use stage::RetrievalStage;
-
-// Re-export FailurePolicy from index for convenience
-pub use crate::index::pipeline::FailurePolicy;
diff --git a/rust/src/retrieval/pipeline/orchestrator.rs b/rust/src/retrieval/pipeline/orchestrator.rs
deleted file mode 100644
index 63e18b4e..00000000
--- a/rust/src/retrieval/pipeline/orchestrator.rs
+++ /dev/null
@@ -1,1226 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Retrieval pipeline orchestrator.
-//!
-//! Manages stage execution with support for:
-//! - Dependency-based ordering
-//! - Parallel execution of independent stages
-//! - Backtracking for incremental retrieval
-//! - Failure policies
-//! - Pilot integration for navigation guidance
-
-use std::collections::HashMap;
-use std::sync::Arc;
-use std::time::Instant;
-use tracing::{debug, error, info, warn};
-
-use crate::document::DocumentTree;
-use crate::document::ReasoningIndex;
-use crate::error::Result;
-use crate::retrieval::pilot::{Pilot, SearchState};
-// FailurePolicy is re-exported for stages
-use crate::retrieval::stream::{
-    DEFAULT_STREAM_BOUND, RetrieveEvent, RetrieveEventReceiver, RetrieveEventSender,
-};
-use crate::retrieval::types::{RetrieveOptions, RetrieveResponse};
-
-use super::context::{CandidateNode, PipelineContext};
-use super::outcome::StageOutcome;
-use super::stage::RetrievalStage;
-
-/// Stage entry with metadata.
-struct StageEntry {
-    stage: Box<dyn RetrievalStage>,
-    priority: i32,
-    depends_on: Vec<String>,
-}
-
-impl std::fmt::Debug for StageEntry {
-    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
-        f.debug_struct("StageEntry")
-            .field("name", &self.stage.name())
-            .field("priority", &self.priority)
-            .field("depends_on", &self.depends_on)
-            .finish()
-    }
-}
-
-/// Group of stages at the same dependency level (can run in parallel).
-#[derive(Debug, Clone)]
-pub struct ExecutionGroup {
-    /// Indices of stages in this group.
-    pub stage_indices: Vec<usize>,
-    /// Whether this group has multiple stages (parallelizable).
-    pub parallel: bool,
-}
-
-/// Retrieval pipeline orchestrator.
-///
-/// Coordinates stage execution with:
-/// - Dependency resolution via topological sort
-/// - Parallel execution of independent stages
-/// - Backtracking support for incremental retrieval
-/// - Configurable failure policies
-/// - Pilot integration for intelligent navigation
-///
-/// # Example
-///
-/// ```rust,ignore
-/// let orchestrator = RetrievalOrchestrator::new()
-///     .stage(AnalyzeStage::new())
-///     .stage(PlanStage::new())
-///     .stage(SearchStage::new())
-///     .stage(EvaluateStage::new())
-///     .with_pilot(pilot)
-///     .with_max_backtracks(3);
-///
-/// let response = orchestrator.execute(tree, query, options).await?;
-/// ```
-pub struct RetrievalOrchestrator {
-    stages: Vec<StageEntry>,
-    pilot: Option<Arc<dyn Pilot>>,
-    max_backtracks: usize,
-    max_total_iterations: usize,
-}
-
-impl Default for RetrievalOrchestrator {
-    fn default() -> Self {
-        Self::new()
-    }
-}
-
-impl RetrievalOrchestrator {
-    /// Create a new empty orchestrator.
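-    ///
-    /// # Example
-    ///
-    /// ```rust,ignore
-    /// // Sketch: a fresh orchestrator has no stages until `.stage(...)` is called.
-    /// let orchestrator = RetrievalOrchestrator::new();
-    /// assert!(orchestrator.stage_names()?.is_empty());
-    /// ```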
-    pub fn new() -> Self {
-        Self {
-            stages: Vec::new(),
-            pilot: None,
-            max_backtracks: 5,
-            max_total_iterations: 10,
-        }
-    }
-
-    /// Add a stage to the pipeline.
-    ///
-    /// Dependencies are read from the stage's `depends_on()` method.
-    pub fn stage<S>(mut self, stage: S) -> Self
-    where
-        S: RetrievalStage + 'static,
-    {
-        let deps = stage.depends_on();
-        let priority = stage.priority();
-        self.stages.push(StageEntry {
-            stage: Box::new(stage),
-            priority,
-            depends_on: deps.into_iter().map(|s| s.to_string()).collect(),
-        });
-        self
-    }
-
-    /// Add Pilot for navigation guidance during backtracking.
-    ///
-    /// When set, the Pilot will be consulted during backtracking
-    /// to provide intelligent guidance on alternative search paths.
-    pub fn with_pilot(mut self, pilot: Arc<dyn Pilot>) -> Self {
-        self.pilot = Some(pilot);
-        self
-    }
-
-    /// Set maximum number of backtracks allowed.
-    pub fn with_max_backtracks(mut self, n: usize) -> Self {
-        self.max_backtracks = n;
-        self
-    }
-
-    /// Set maximum total iterations.
-    pub fn with_max_iterations(mut self, n: usize) -> Self {
-        self.max_total_iterations = n;
-        self
-    }
-
-    /// Resolve dependencies and return stage indices in execution order.
-    fn resolve_order(&self) -> Result<Vec<usize>> {
-        // Build name -> index map
-        let name_to_idx: HashMap<&str, usize> = self
-            .stages
-            .iter()
-            .enumerate()
-            .map(|(i, entry)| (entry.stage.name(), i))
-            .collect();
-
-        // Validate dependencies
-        for entry in &self.stages {
-            for dep in &entry.depends_on {
-                if !name_to_idx.contains_key(dep.as_str()) {
-                    return Err(crate::Error::Config(format!(
-                        "Stage '{}' depends on non-existent stage '{}'",
-                        entry.stage.name(),
-                        dep
-                    )));
-                }
-            }
-        }
-
-        // Topological sort (Kahn's algorithm)
-        let n = self.stages.len();
-        let mut in_degree: Vec<usize> = vec![0; n];
-        let mut adjacency: HashMap<usize, Vec<usize>> = HashMap::new();
-
-        for (i, entry) in self.stages.iter().enumerate() {
-            for dep in &entry.depends_on {
-                if let Some(&dep_idx) = name_to_idx.get(dep.as_str()) {
-                    adjacency.entry(dep_idx).or_default().push(i);
-                    in_degree[i] += 1;
-                }
-            }
-        }
-
-        // Collect stages with no dependencies, sorted by priority
-        let mut ready: Vec<usize> = (0..n)
-            .filter(|&i| in_degree[i] == 0) // 0 means no dependencies
-            .collect();
-        ready.sort_by_key(|&i| (self.stages[i].priority, i));
-
-        let mut result: Vec<usize> = Vec::new();
-
-        while let Some(idx) = ready.first().cloned() {
-            ready.remove(0);
-            result.push(idx);
-
-            if let Some(neighbors) = adjacency.get(&idx) {
-                for &neighbor in neighbors {
-                    in_degree[neighbor] -= 1;
-                    if in_degree[neighbor] == 0 {
-                        let entry = &self.stages[neighbor];
-                        let pos = ready
-                            .binary_search_by_key(&(entry.priority, neighbor), |&i| {
-                                (self.stages[i].priority, i)
-                            })
-                            .unwrap_or_else(|e| e);
-                        ready.insert(pos, neighbor);
-                    }
-                }
-            }
-        }
-
-        // Check for cycles
-        if result.len() != n {
-            let remaining: Vec<&str> = (0..n)
-                .filter(|i| !result.contains(i))
-                .map(|i| self.stages[i].stage.name())
-                .collect();
-            return Err(crate::Error::Config(format!(
-                "Circular dependency detected involving stages: {:?}",
-                remaining
-            )));
-        }
-
-        Ok(result)
-    }
-
-    /// Compute execution groups from resolved order.
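-    ///
-    /// Illustrative layout (hypothetical stage names; the actual grouping
-    /// depends on the registered stages and their `depends_on()` declarations):
-    ///
-    /// ```text
-    /// analyze                -> group 0
-    /// plan, keyword_boost    -> group 1 (parallel: both depend only on analyze)
-    /// search                 -> group 2
-    /// evaluate               -> group 3
-    /// ```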
-    fn compute_execution_groups(&self, order: &[usize]) -> Vec<ExecutionGroup> {
-        if order.is_empty() {
-            return Vec::new();
-        }
-
-        // Build name -> index map
-        let name_to_idx: HashMap<&str, usize> = self
-            .stages
-            .iter()
-            .enumerate()
-            .map(|(i, entry)| (entry.stage.name(), i))
-            .collect();
-
-        // Calculate level for each stage based on dependencies
-        let mut levels: HashMap<usize, usize> = HashMap::new();
-
-        for &idx in order {
-            let entry = &self.stages[idx];
-            let level = if entry.depends_on.is_empty() {
-                0
-            } else {
-                entry
-                    .depends_on
-                    .iter()
-                    .filter_map(|dep| {
-                        name_to_idx
-                            .get(dep.as_str())
-                            .and_then(|&dep_idx| levels.get(&dep_idx))
-                    })
-                    .max()
-                    .map(|&l| l + 1)
-                    .unwrap_or(0)
-            };
-            levels.insert(idx, level);
-        }
-
-        // Group stages by level
-        let mut level_groups: HashMap<usize, Vec<usize>> = HashMap::new();
-        for &idx in order {
-            let level = levels[&idx];
-            level_groups.entry(level).or_default().push(idx);
-        }
-
-        // Convert to execution groups
-        let max_level = *levels.values().max().unwrap_or(&0);
-        (0..=max_level)
-            .filter_map(|level| {
-                level_groups.get(&level).map(|indices| ExecutionGroup {
-                    stage_indices: indices.clone(),
-                    parallel: indices.len() > 1,
-                })
-            })
-            .collect()
-    }
-
-    /// Find the index of a stage by name.
-    fn find_stage_index(&self, name: &str) -> usize {
-        self.stages
-            .iter()
-            .enumerate()
-            .find(|(_, entry)| entry.stage.name() == name)
-            .map(|(i, _)| i)
-            .unwrap_or(0)
-    }
-
-    /// Find which group contains a stage index.
-    fn find_group_for_stage(&self, groups: &[ExecutionGroup], stage_idx: usize) -> Option<usize> {
-        groups
-            .iter()
-            .enumerate()
-            .find(|(_, g)| g.stage_indices.contains(&stage_idx))
-            .map(|(i, _)| i)
-    }
-
-    /// Execute the retrieval pipeline.
-    pub async fn execute(
-        &mut self,
-        tree: Arc<DocumentTree>,
-        query: &str,
-        options: RetrieveOptions,
-    ) -> Result<RetrieveResponse> {
-        let total_start = Instant::now();
-        info!(
-            "Starting retrieval pipeline for query: '{}' ({} stages)",
-            query,
-            self.stages.len()
-        );
-
-        // Resolve execution order
-        let order = self.resolve_order()?;
-        let stage_names: Vec<&str> = order.iter().map(|&i| self.stages[i].stage.name()).collect();
-        info!("Execution order: {:?}", stage_names);
-
-        // Compute execution groups
-        let groups = self.compute_execution_groups(&order);
-        info!(
-            "Execution groups: {} ({} parallelizable)",
-            groups.len(),
-            groups.iter().filter(|g| g.parallel).count()
-        );
-
-        // Create context with Pilot
-        let document_graph = options.document_graph.clone();
-        let mut ctx = PipelineContext::with_pilot(tree, query, options, self.pilot.clone());
-        if let Some(graph) = document_graph {
-            ctx = ctx.with_document_graph(graph);
-        }
-
-        // Share the pipeline budget with the Pilot (unified budget)
-        if let Some(ref pilot) = self.pilot {
-            if let Some(llm_pilot) = pilot
-                .as_any()
-                .downcast_ref::<crate::retrieval::pilot::LlmPilot>()
-            {
-                llm_pilot.set_pipeline_budget(ctx.budget_controller.clone());
-            }
-        }
-
-        // Track execution state
-        let mut backtrack_count = 0;
-        let mut total_iterations = 0;
-        let mut group_idx = 0;
-
-        // Execute pipeline with backtracking support
-        while group_idx < groups.len() {
-            if backtrack_count >= self.max_backtracks {
-                warn!("Max backtracks reached, completing with current results");
-                break;
-            }
-
-            if total_iterations >= self.max_total_iterations {
-                warn!("Max total iterations reached, completing");
-                break;
-            }
-
-            let group = &groups[group_idx];
-
-            // Execute stages in this group
-            for &stage_idx in &group.stage_indices {
-                let entry = &self.stages[stage_idx];
-                let stage_name = entry.stage.name();
-                let policy = entry.stage.failure_policy();
-
-                ctx.start_stage();
-                info!("Executing stage: {}", stage_name);
-
-                match entry.stage.execute(&mut ctx).await {
-                    Ok(outcome) => {
-                        ctx.end_stage(stage_name, true, None);
-                        total_iterations += 1;
-
-                        match outcome {
-                            StageOutcome::Continue => {
-                                // Continue to next stage
-                            }
-                            StageOutcome::Complete => {
-                                // Retrieval complete
-                                ctx.metrics.total_time_ms =
-                                    total_start.elapsed().as_millis() as u64;
-                                info!("Retrieval completed by stage: {}", stage_name);
-                                return Ok(ctx.finalize());
-                            }
-                            StageOutcome::NeedMoreData {
-                                additional_beam,
-                                go_deeper,
-                            } => {
-                                // Backtrack to search stage
-                                if let Some(search_idx) =
-                                    self.stages.iter().position(|e| e.stage.name() == "search")
-                                {
-                                    info!(
-                                        "Need more data, backtracking to search (beam +{}, deeper: {})",
-                                        additional_beam, go_deeper
-                                    );
-
-                                    // Consult Pilot for backtrack guidance
-                                    if let Some(ref pilot) = self.pilot {
-                                        if pilot.config().guide_at_backtrack {
-                                            // Build search state for Pilot
-                                            let visited: std::collections::HashSet<_> = ctx
-                                                .search_paths
-                                                .iter()
-                                                .flat_map(|p| p.nodes.iter().copied())
-                                                .collect();
-                                            let candidates: Vec<_> =
-                                                ctx.candidates.iter().map(|c| c.node_id).collect();
-
-                                            let state = SearchState::new(
-                                                &ctx.tree,
-                                                &ctx.query,
-                                                &[],
-                                                &candidates,
-                                                &visited,
-                                            );
-
-                                            match pilot.guide_backtrack(&state).await {
-                                                Some(guidance) => {
-                                                    debug!(
-                                                        "Pilot backtrack guidance: confidence={}, candidates={}",
-                                                        guidance.confidence,
-                                                        guidance.ranked_candidates.len()
-                                                    );
-                                                    // Update candidates with Pilot's suggestions
-                                                    if guidance.has_candidates() {
-                                                        ctx.candidates = guidance
-                                                            .ranked_candidates
-                                                            .iter()
-                                                            .map(|rc| CandidateNode {
-                                                                node_id: rc.node_id,
-                                                                score: rc.score,
-                                                                depth: 0,
-                                                                is_leaf: false,
-                                                            })
-                                                            .collect();
-                                                    }
-                                                }
-                                                None => {
-                                                    debug!("Pilot provided no backtrack guidance");
-                                                }
-                                            }
-                                        }
-                                    }
-
-                                    // Update search config
-                                    if let Some(ref mut config) = ctx.search_config {
-                                        config.beam_width += additional_beam;
-                                        if go_deeper {
-                                            config.max_depth += 1;
-                                        }
-                                    }
-
-                                    ctx.increment_backtrack();
-                                    backtrack_count += 1;
-
-                                    // Find group containing search stage
-                                    if let Some(target_group) =
-                                        self.find_group_for_stage(&groups, search_idx)
-                                    {
-                                        group_idx = target_group;
-                                        continue;
-                                    }
-                                }
-                            }
-                            StageOutcome::Backtrack {
-                                target_stage,
-                                reason,
-                            } => {
-                                info!("Backtracking to {}: {}", target_stage, reason);
-
-                                if let Some(target_idx) = self
-                                    .stages
-                                    .iter()
-                                    .position(|e| e.stage.name() == target_stage)
-                                {
-                                    // Consult Pilot for backtrack guidance if going to search
-                                    if target_stage == "search" {
-                                        if let Some(ref pilot) = self.pilot {
-                                            if pilot.config().guide_at_backtrack {
-                                                let visited: std::collections::HashSet<_> = ctx
-                                                    .search_paths
-                                                    .iter()
-                                                    .flat_map(|p| p.nodes.iter().copied())
-                                                    .collect();
-                                                let candidates: Vec<_> = ctx
-                                                    .candidates
-                                                    .iter()
-                                                    .map(|c| c.node_id)
-                                                    .collect();
-
-                                                let state = SearchState::new(
-                                                    &ctx.tree,
-                                                    &ctx.query,
-                                                    &[],
-                                                    &candidates,
-                                                    &visited,
-                                                );
-
-                                                if let Some(guidance) =
-                                                    pilot.guide_backtrack(&state).await
-                                                {
-                                                    debug!(
-                                                        "Pilot backtrack guidance for explicit backtrack: confidence={}",
-                                                        guidance.confidence
-                                                    );
-                                                    if guidance.has_candidates() {
-                                                        ctx.candidates = guidance
-                                                            .ranked_candidates
-                                                            .iter()
-                                                            .map(|rc| CandidateNode {
-                                                                node_id: rc.node_id,
-                                                                score: rc.score,
-                                                                depth: 0,
-                                                                is_leaf: false,
-                                                            })
-                                                            .collect();
-                                                    }
-                                                }
-                                            }
-                                        }
-                                    }
-
-                                    ctx.increment_backtrack();
-                                    backtrack_count += 1;
-
-                                    if let Some(target_group) =
-                                        self.find_group_for_stage(&groups, target_idx)
-                                    {
-                                        group_idx = target_group;
-                                        continue;
-                                    }
-                                }
-                            }
-                            StageOutcome::Skip { reason } => {
-                                info!("Skipping remaining stages: {}", reason);
-                                ctx.metrics.total_time_ms =
-                                    total_start.elapsed().as_millis() as u64;
-                                return Ok(ctx.finalize());
-                            }
-                        }
-                    }
-                    Err(e) => {
-                        ctx.end_stage(stage_name, false, Some(e.to_string()));
-
-                        if policy.allows_continuation() {
-                            warn!(
-                                "Stage {} failed but policy allows continuation: {}",
-                                stage_name, e
-                            );
-                        } else {
-                            error!("Stage {} failed: {}", stage_name, e);
-                            return Err(e);
-                        }
-                    }
-                }
-            }
-
-            group_idx += 1;
-        }
-
-        ctx.metrics.total_time_ms = total_start.elapsed().as_millis() as u64;
-        info!(
-            "Retrieval completed in {}ms ({} iterations, {} backtracks)",
-            ctx.metrics.total_time_ms, total_iterations, backtrack_count
-        );
-
-        Ok(ctx.finalize())
-    }
-
-    /// Execute the retrieval pipeline with a pre-computed reasoning index.
-    ///
-    /// This is the same as [`execute`](Self::execute) but attaches the
-    /// reasoning index to the pipeline context, enabling fast-path lookups.
-    pub async fn execute_with_reasoning_index(
-        &mut self,
-        tree: Arc<DocumentTree>,
-        query: &str,
-        options: RetrieveOptions,
-        reasoning_index: Option<ReasoningIndex>,
-    ) -> Result<RetrieveResponse> {
-        // Mirrors `execute()`, but attaches the reasoning index when the
-        // pipeline context is created so stages can use fast-path lookups.
-        // The flow is duplicated rather than delegated because `execute()`
-        // constructs its context internally.
-        let total_start = Instant::now();
-        info!(
-            "Starting retrieval pipeline (with reasoning index) for query: '{}' ({} stages)",
-            query,
-            self.stages.len()
-        );
-
-        let order = self.resolve_order()?;
-        let stage_names: Vec<&str> = order.iter().map(|&i| self.stages[i].stage.name()).collect();
-        info!("Execution order: {:?}", stage_names);
-
-        let groups = self.compute_execution_groups(&order);
-
-        // Create context with Pilot and reasoning index
-        let document_graph = options.document_graph.clone();
-        let mut ctx = PipelineContext::with_pilot(tree, query, options, self.pilot.clone());
-        if let Some(ri) = reasoning_index {
-            ctx = ctx.with_reasoning_index(ri);
-        }
-        if let Some(graph) = document_graph {
-            ctx = ctx.with_document_graph(graph);
-        }
-
-        // Share the pipeline budget with the Pilot (unified budget)
-        if let Some(ref pilot) = self.pilot {
-            if let Some(llm_pilot) = pilot
-                .as_any()
-                .downcast_ref::<crate::retrieval::pilot::LlmPilot>()
-            {
-                llm_pilot.set_pipeline_budget(ctx.budget_controller.clone());
-            }
-        }
-
-        let mut backtrack_count = 0;
-        let mut total_iterations = 0;
-        let mut group_idx = 0;
-
-        while group_idx < groups.len() {
-            if backtrack_count >= self.max_backtracks {
-                warn!("Max backtracks reached, completing with current results");
-                break;
-            }
-
-            if total_iterations >= self.max_total_iterations {
-                warn!("Max total iterations reached, completing");
-                break;
-            }
-
-            let group = &groups[group_idx];
-
-            for &stage_idx in &group.stage_indices {
-                let entry = &self.stages[stage_idx];
-                let stage_name = entry.stage.name();
-                let policy = entry.stage.failure_policy();
-
-                ctx.start_stage();
-                info!("Executing stage: {}", stage_name);
-
-                match entry.stage.execute(&mut ctx).await {
-                    Ok(outcome) => {
-                        ctx.end_stage(stage_name, true, None);
-                        total_iterations += 1;
-
-                        match outcome {
-                            StageOutcome::Continue => {}
-                            StageOutcome::Complete => {
-                                ctx.metrics.total_time_ms =
-                                    total_start.elapsed().as_millis() as u64;
-                                info!("Retrieval completed by stage: {}", stage_name);
-                                return Ok(ctx.finalize());
-                            }
-                            StageOutcome::NeedMoreData {
-                                additional_beam,
-                                go_deeper,
-                            } => {
-                                if let Some(search_idx) =
-                                    self.stages.iter().position(|e| e.stage.name() == "search")
-                                {
-                                    info!(
-                                        "Need more data, backtracking to search (beam +{}, deeper: {})",
-                                        additional_beam, go_deeper
-                                    );
-
-                                    if let Some(ref pilot) = self.pilot {
-                                        if pilot.config().guide_at_backtrack {
-                                            let visited: std::collections::HashSet<_> = ctx
-                                                .search_paths
-                                                .iter()
-                                                .flat_map(|p| p.nodes.iter().copied())
-                                                .collect();
-                                            let candidates: Vec<_> =
-                                                ctx.candidates.iter().map(|c| c.node_id).collect();
-
-                                            let state = SearchState::new(
-                                                &ctx.tree,
-                                                &ctx.query,
-                                                &[],
-                                                &candidates,
-                                                &visited,
-                                            );
-
-                                            match pilot.guide_backtrack(&state).await {
-                                                Some(guidance) => {
-                                                    debug!(
-                                                        "Pilot backtrack guidance: confidence={}, candidates={}",
-                                                        guidance.confidence,
-                                                        guidance.ranked_candidates.len()
-                                                    );
-                                                    if guidance.has_candidates() {
-                                                        ctx.candidates = guidance
-                                                            .ranked_candidates
-                                                            .iter()
-                                                            .map(|rc| CandidateNode {
-                                                                node_id: rc.node_id,
-                                                                score: rc.score,
-                                                                depth: 0,
-                                                                is_leaf: false,
-                                                            })
-                                                            .collect();
-                                                    }
-                                                }
-                                                None => {
-                                                    debug!("Pilot provided no backtrack guidance");
-                                                }
-                                            }
-                                        }
-                                    }
-
-                                    if let Some(ref mut config) = ctx.search_config {
-                                        config.beam_width += additional_beam;
-                                        if go_deeper {
-                                            config.max_depth += 1;
-                                        }
-                                    }
-
-                                    ctx.increment_backtrack();
-                                    backtrack_count += 1;
-
-                                    if let Some(target_group) =
-                                        self.find_group_for_stage(&groups, search_idx)
-                                    {
-                                        group_idx = target_group;
-                                        continue;
-                                    }
-                                }
-                            }
-                            StageOutcome::Backtrack {
-                                target_stage,
-                                reason,
-                            } => {
-                                info!("Backtracking to {}: {}", target_stage, reason);
-
-                                if let Some(target_idx) = self
-                                    .stages
-                                    .iter()
-                                    .position(|e| e.stage.name() == target_stage)
-                                {
-                                    if target_stage == "search" {
-                                        if let Some(ref pilot) = self.pilot {
-                                            if pilot.config().guide_at_backtrack {
-                                                let visited: std::collections::HashSet<_> = ctx
-                                                    .search_paths
-                                                    .iter()
-                                                    .flat_map(|p| p.nodes.iter().copied())
-                                                    .collect();
-                                                let candidates: Vec<_> = ctx
-                                                    .candidates
-                                                    .iter()
-                                                    .map(|c| c.node_id)
-                                                    .collect();
-
-                                                let state = SearchState::new(
-                                                    &ctx.tree,
-                                                    &ctx.query,
-                                                    &[],
-                                                    &candidates,
-                                                    &visited,
-                                                );
-
-                                                if let Some(guidance) =
-                                                    pilot.guide_backtrack(&state).await
-                                                {
-                                                    debug!(
-                                                        "Pilot backtrack guidance for explicit backtrack: confidence={}",
-                                                        guidance.confidence
-                                                    );
-                                                    if guidance.has_candidates() {
-                                                        ctx.candidates = guidance
-                                                            .ranked_candidates
-                                                            .iter()
-                                                            .map(|rc| CandidateNode {
-                                                                node_id: rc.node_id,
-                                                                score: rc.score,
-                                                                depth: 0,
-                                                                is_leaf: false,
-                                                            })
-                                                            .collect();
-                                                    }
-                                                }
-                                            }
-                                        }
-                                    }
-
-                                    ctx.increment_backtrack();
-                                    backtrack_count += 1;
-
-                                    if let Some(target_group) =
-                                        self.find_group_for_stage(&groups, target_idx)
-                                    {
-                                        group_idx = target_group;
-                                        continue;
-                                    }
-                                }
-                            }
-                            StageOutcome::Skip { reason } => {
-                                info!("Skipping remaining stages: {}", reason);
-                                ctx.metrics.total_time_ms =
-                                    total_start.elapsed().as_millis() as u64;
-                                return Ok(ctx.finalize());
-                            }
-                        }
-                    }
-                    Err(e) => {
-                        ctx.end_stage(stage_name, false, Some(e.to_string()));
-
-                        if policy.allows_continuation() {
-                            warn!(
-                                "Stage {} failed but policy allows continuation: {}",
-                                stage_name, e
-                            );
-                        } else {
-                            error!("Stage {} failed: {}", stage_name, e);
-                            return Err(e);
-                        }
-                    }
-                }
-            }
-
-            group_idx += 1;
-        }
-
-        ctx.metrics.total_time_ms = total_start.elapsed().as_millis() as u64;
-        info!(
-            "Retrieval completed in {}ms ({} iterations, {} backtracks)",
-            ctx.metrics.total_time_ms, total_iterations, backtrack_count
-        );
-
-        Ok(ctx.finalize())
-    }
-
-    /// Execute the retrieval pipeline with streaming events.
-    ///
-    /// Consumes the orchestrator and spawns a background task that runs the
-    /// pipeline. The caller receives a channel of [`RetrieveEvent`]s that
-    /// fire at each stage boundary. The stream always terminates with either
-    /// [`Completed`](RetrieveEvent::Completed) or
-    /// [`Error`](RetrieveEvent::Error).
-    ///
-    /// The existing [`execute()`](Self::execute) method is **not** affected.
-    ///
-    /// # Example
-    ///
-    /// ```rust,ignore
-    /// let (handle, mut rx) = orchestrator.execute_streaming(tree, query, options);
-    ///
-    /// while let Some(event) = rx.recv().await {
-    ///     match event {
-    ///         RetrieveEvent::StageCompleted { stage, .. } => println!("{stage} done"),
-    ///         RetrieveEvent::Completed { response } => break,
-    ///         RetrieveEvent::Error { message } => { eprintln!("{message}"); break; }
-    ///         _ => {}
-    ///     }
-    /// }
-    /// let _ = handle.await;
-    /// ```
-    pub fn execute_streaming(
-        mut self,
-        tree: Arc<DocumentTree>,
-        query: &str,
-        options: RetrieveOptions,
-    ) -> (tokio::task::JoinHandle<()>, RetrieveEventReceiver) {
-        let (tx, rx) = tokio::sync::mpsc::channel(DEFAULT_STREAM_BOUND);
-        let query_owned = query.to_string();
-
-        let handle = tokio::spawn(async move {
-            if let Err(e) = self.run_streaming(tree, &query_owned, options, &tx).await {
-                let _ = tx
-                    .send(RetrieveEvent::Error {
-                        message: e.to_string(),
-                    })
-                    .await;
-            }
-        });
-
-        (handle, rx)
-    }
-
-    /// Internal streaming pipeline execution.
-    async fn run_streaming(
-        &mut self,
-        tree: Arc<DocumentTree>,
-        query: &str,
-        options: RetrieveOptions,
-        tx: &RetrieveEventSender,
-    ) -> Result<()> {
-        let total_start = Instant::now();
-
-        let _ = tx
-            .send(RetrieveEvent::Started {
-                query: query.to_string(),
-                strategy: format!("{:?}", options.strategy),
-            })
-            .await;
-
-        info!(
-            "Starting streaming retrieval pipeline for query: '{}' ({} stages)",
-            query,
-            self.stages.len()
-        );
-
-        let order = self.resolve_order()?;
-        let groups = self.compute_execution_groups(&order);
-        let document_graph = options.document_graph.clone();
-        let mut ctx = PipelineContext::with_pilot(tree, query, options, self.pilot.clone());
-        if let Some(graph) = document_graph {
-            ctx = ctx.with_document_graph(graph);
-        }
-
-        // Share the pipeline budget with the Pilot (unified budget)
-        if let Some(ref pilot) = self.pilot {
-            if let Some(llm_pilot) = pilot
-                .as_any()
-                .downcast_ref::<crate::retrieval::pilot::LlmPilot>()
-            {
-                llm_pilot.set_pipeline_budget(ctx.budget_controller.clone());
-            }
-        }
-
-        let mut backtrack_count = 0;
-        let mut total_iterations = 0;
-        let mut group_idx = 0;
-
-        while group_idx < groups.len() {
-            if backtrack_count >= self.max_backtracks {
-                warn!("Max backtracks reached, completing with current results");
-                break;
-            }
-            if total_iterations >= self.max_total_iterations {
-                warn!("Max total iterations reached, completing");
-                break;
-            }
-
-            let group = &groups[group_idx];
-
-            for &stage_idx in &group.stage_indices {
-                let entry = &self.stages[stage_idx];
-                let stage_name = entry.stage.name();
-                let policy = entry.stage.failure_policy();
-
-                let stage_start = Instant::now();
-                ctx.start_stage();
-                info!("Executing stage: {}", stage_name);
-
-                match entry.stage.execute(&mut ctx).await {
-                    Ok(outcome) => {
-                        let elapsed = stage_start.elapsed().as_millis() as u64;
-                        ctx.end_stage(stage_name, true, None);
-                        total_iterations += 1;
-
-                        let _ = tx
-                            .send(RetrieveEvent::StageCompleted {
-                                stage: stage_name.to_string(),
-                                elapsed_ms: elapsed,
-                            })
-                            .await;
-
-                        match outcome {
-                            StageOutcome::Continue => {}
-                            StageOutcome::Complete => {
-                                ctx.metrics.total_time_ms =
-                                    total_start.elapsed().as_millis() as u64;
-                                info!("Retrieval completed by stage: {}", stage_name);
-                                let response = ctx.finalize();
-                                let _ = tx.send(RetrieveEvent::Completed { response }).await;
-                                return Ok(());
-                            }
-                            StageOutcome::NeedMoreData {
-                                additional_beam,
-                                go_deeper,
-                            } => {
-                                if let Some(search_idx) =
-                                    self.stages.iter().position(|e| e.stage.name() == "search")
-                                {
-                                    info!(
-                                        "Need more data, backtracking to search (beam +{}, deeper: {})",
-                                        additional_beam, go_deeper
-                                    );
-
-                                    let _ = tx
-                                        .send(RetrieveEvent::Backtracking {
-                                            from: stage_name.to_string(),
-                                            to: "search".to_string(),
-                                            reason: format!(
-                                                "NeedMoreData: beam +{}, deeper: {}",
-                                                additional_beam, go_deeper
-                                            ),
-                                        })
-                                        .await;
-
-                                    // Consult Pilot
-                                    if let Some(ref pilot) = self.pilot {
-                                        if pilot.config().guide_at_backtrack {
-                                            let visited: std::collections::HashSet<_> = ctx
-                                                .search_paths
-                                                .iter()
-                                                .flat_map(|p| p.nodes.iter().copied())
-                                                .collect();
-                                            let candidates: Vec<_> =
-                                                ctx.candidates.iter().map(|c| c.node_id).collect();
-
-                                            let state = SearchState::new(
-                                                &ctx.tree,
-                                                &ctx.query,
-                                                &[],
-                                                &candidates,
-                                                &visited,
-                                            );
-
-                                            match pilot.guide_backtrack(&state).await {
-                                                Some(guidance) => {
-                                                    debug!(
-                                                        "Pilot backtrack guidance: confidence={}, candidates={}",
-                                                        guidance.confidence,
-                                                        guidance.ranked_candidates.len()
-                                                    );
-                                                    if guidance.has_candidates() {
-                                                        ctx.candidates = guidance
-                                                            .ranked_candidates
-                                                            .iter()
-                                                            .map(|rc| CandidateNode {
-                                                                node_id: rc.node_id,
-                                                                score: rc.score,
-                                                                depth: 0,
-                                                                is_leaf: false,
-                                                            })
-                                                            .collect();
-                                                    }
-                                                }
-                                                None => {
-                                                    debug!("Pilot provided no backtrack guidance");
-                                                }
-                                            }
-                                        }
-                                    }
-
-                                    if let Some(ref mut config) = ctx.search_config {
-                                        config.beam_width += additional_beam;
-                                        if go_deeper {
-                                            config.max_depth += 1;
-                                        }
-                                    }
-
-                                    ctx.increment_backtrack();
-                                    backtrack_count += 1;
-
-                                    if let Some(target_group) =
-                                        self.find_group_for_stage(&groups, search_idx)
-                                    {
-                                        group_idx = target_group;
-                                        continue;
-                                    }
-                                }
-                            }
-                            StageOutcome::Backtrack {
-                                target_stage,
-                                reason,
-                            } => {
-                                info!("Backtracking to {}: {}", target_stage, reason);
-
-                                let _ = tx
-                                    .send(RetrieveEvent::Backtracking {
-                                        from: stage_name.to_string(),
-                                        to: target_stage.clone(),
-                                        reason: reason.clone(),
-                                    })
-                                    .await;
-
-                                if let Some(target_idx) = self
-                                    .stages
-                                    .iter()
-                                    .position(|e| e.stage.name() == target_stage)
-                                {
-                                    if target_stage == "search" {
-                                        if let Some(ref pilot) = self.pilot {
-                                            if pilot.config().guide_at_backtrack {
-                                                let visited: std::collections::HashSet<_> = ctx
-                                                    .search_paths
-                                                    .iter()
-                                                    .flat_map(|p| p.nodes.iter().copied())
-                                                    .collect();
-                                                let candidates: Vec<_> = ctx
-                                                    .candidates
-                                                    .iter()
-                                                    .map(|c| c.node_id)
-                                                    .collect();
-
-                                                let state = SearchState::new(
-                                                    &ctx.tree,
-                                                    &ctx.query,
-                                                    &[],
-                                                    &candidates,
-                                                    &visited,
-                                                );
-
-                                                if let Some(guidance) =
-                                                    pilot.guide_backtrack(&state).await
-                                                {
-                                                    debug!(
-                                                        "Pilot backtrack guidance for explicit backtrack: confidence={}",
-                                                        guidance.confidence
-                                                    );
-                                                    if guidance.has_candidates() {
-                                                        ctx.candidates = guidance
-                                                            .ranked_candidates
-                                                            .iter()
-                                                            .map(|rc| CandidateNode {
-                                                                node_id: rc.node_id,
-                                                                score: rc.score,
-                                                                depth: 0,
-                                                                is_leaf: false,
-                                                            })
-                                                            .collect();
-                                                    }
-                                                }
-                                            }
-                                        }
-                                    }
-
-                                    ctx.increment_backtrack();
-                                    backtrack_count += 1;
-
-                                    if let Some(target_group) =
-                                        self.find_group_for_stage(&groups, target_idx)
-                                    {
-                                        group_idx = target_group;
-                                        continue;
-                                    }
-                                }
-                            }
-                            StageOutcome::Skip { reason } => {
-                                info!("Skipping remaining stages: {}", reason);
-                                ctx.metrics.total_time_ms =
-                                    total_start.elapsed().as_millis() as u64;
-                                let response = ctx.finalize();
-                                let _ = tx.send(RetrieveEvent::Completed { response }).await;
-                                return Ok(());
-                            }
-                        }
-                    }
-                    Err(e) => {
-                        ctx.end_stage(stage_name, false, Some(e.to_string()));
-
-                        if policy.allows_continuation() {
-                            warn!(
-                                "Stage {} failed but policy allows continuation: {}",
-                                stage_name, e
-                            );
-                        } else {
-                            error!("Stage {} failed: {}", stage_name, e);
-                            let _ = tx
-                                .send(RetrieveEvent::Error {
-                                    message: e.to_string(),
-                                })
-                                .await;
-                            return Err(e);
-                        }
-                    }
-                }
-            }
-
-            group_idx += 1;
-        }
-
-        ctx.metrics.total_time_ms = total_start.elapsed().as_millis() as u64;
-        info!(
-            "Streaming retrieval completed in {}ms ({} iterations, {} backtracks)",
-            ctx.metrics.total_time_ms, total_iterations, backtrack_count
-        );
-
-        let response = ctx.finalize();
-        let _ = tx.send(RetrieveEvent::Completed { response }).await;
-        Ok(())
-    }
-
-    /// Get list of stage names in execution order.
-    pub fn stage_names(&self) -> Result<Vec<&str>> {
-        let order = self.resolve_order()?;
-        Ok(order.iter().map(|&i| self.stages[i].stage.name()).collect())
-    }
-
-    /// Get execution groups for visualization.
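-    ///
-    /// # Example
-    ///
-    /// ```rust,ignore
-    /// // Sketch: inspect the grouping without executing the pipeline.
-    /// let groups = orchestrator.get_execution_groups()?;
-    /// for (i, group) in groups.iter().enumerate() {
-    ///     println!("group {i}: {} stage(s), parallel = {}", group.stage_indices.len(), group.parallel);
-    /// }
-    /// ```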
-    pub fn get_execution_groups(&self) -> Result<Vec<ExecutionGroup>> {
-        let order = self.resolve_order()?;
-        Ok(self.compute_execution_groups(&order))
-    }
-}
-
-#[cfg(test)]
-mod tests {
-    use super::*;
-
-    #[test]
-    fn test_orchestrator_creation() {
-        let orchestrator = RetrievalOrchestrator::new();
-        assert!(orchestrator.stages.is_empty());
-    }
-
-    #[test]
-    fn test_stage_names_empty() {
-        let orchestrator = RetrievalOrchestrator::new();
-        let names = orchestrator.stage_names().unwrap();
-        assert!(names.is_empty());
-    }
-}
diff --git a/rust/src/retrieval/pipeline/outcome.rs b/rust/src/retrieval/pipeline/outcome.rs
deleted file mode 100644
index d005b61f..00000000
--- a/rust/src/retrieval/pipeline/outcome.rs
+++ /dev/null
@@ -1,140 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Stage execution outcomes for retrieval pipeline.
-//!
-//! The [`StageOutcome`] enum controls the flow of the retrieval pipeline,
-//! allowing stages to continue, complete, request more data, or backtrack.
-
-/// Result of a stage execution, controlling pipeline flow.
-#[derive(Debug, Clone)]
-pub enum StageOutcome {
-    /// Stage completed successfully, continue to next stage.
-    Continue,
-
-    /// Entire retrieval is complete, return results.
-    Complete,
-
-    /// Need more data, go back to Search stage for another iteration.
-    ///
-    /// This enables incremental retrieval where the Evaluate stage can
-    /// request additional search rounds if current results are insufficient.
-    NeedMoreData {
-        /// Additional beam width to add for next search iteration.
-        additional_beam: usize,
-        /// Whether to search deeper in the tree.
-        go_deeper: bool,
-    },
-
-    /// Backtrack to a specific stage for re-planning.
-    ///
-    /// Used when current strategy isn't working and a different approach
-    /// is needed.
-    Backtrack {
-        /// Target stage name to backtrack to.
-        target_stage: String,
-        /// Reason for backtracking.
-        reason: String,
-    },
-
-    /// Skip remaining stages and return current results.
-    ///
-    /// Used when results are "good enough" or when further processing
-    /// wouldn't improve the outcome.
-    Skip {
-        /// Reason for skipping.
-        reason: String,
-    },
-}
-
-impl StageOutcome {
-    /// Create a Continue outcome.
-    pub fn cont() -> Self {
-        Self::Continue
-    }
-
-    /// Create a Complete outcome.
-    pub fn complete() -> Self {
-        Self::Complete
-    }
-
-    /// Create a NeedMoreData outcome.
-    pub fn need_more(additional_beam: usize, go_deeper: bool) -> Self {
-        Self::NeedMoreData {
-            additional_beam,
-            go_deeper,
-        }
-    }
-
-    /// Create a Backtrack outcome.
-    pub fn backtrack(target: impl Into<String>, reason: impl Into<String>) -> Self {
-        Self::Backtrack {
-            target_stage: target.into(),
-            reason: reason.into(),
-        }
-    }
-
-    /// Create a Skip outcome.
-    pub fn skip(reason: impl Into<String>) -> Self {
-        Self::Skip {
-            reason: reason.into(),
-        }
-    }
-
-    /// Check if this outcome indicates pipeline completion.
-    pub fn is_complete(&self) -> bool {
-        matches!(self, Self::Complete | Self::Skip { .. })
-    }
-
-    /// Check if this outcome requires backtracking.
-    pub fn needs_backtrack(&self) -> bool {
-        matches!(self, Self::Backtrack { .. } | Self::NeedMoreData { .. })
-    }
-
-    /// Get the target stage for backtracking, if any.
-    pub fn backtrack_target(&self) -> Option<&str> {
-        match self {
-            Self::Backtrack { target_stage, .. } => Some(target_stage),
-            Self::NeedMoreData { .. } => Some("search"),
-            _ => None,
-        }
-    }
-}
-
-impl Default for StageOutcome {
-    fn default() -> Self {
-        Self::Continue
-    }
-}
-
-#[cfg(test)]
-mod tests {
-    use super::*;
-
-    #[test]
-    fn test_outcome_constructors() {
-        assert!(matches!(StageOutcome::cont(), StageOutcome::Continue));
-        assert!(matches!(StageOutcome::complete(), StageOutcome::Complete));
-    }
-
-    #[test]
-    fn test_need_more() {
-        let outcome = StageOutcome::need_more(2, true);
-        assert!(outcome.needs_backtrack());
-        assert_eq!(outcome.backtrack_target(), Some("search"));
-    }
-
-    #[test]
-    fn test_backtrack() {
-        let outcome = StageOutcome::backtrack("plan", "strategy not working");
-        assert!(outcome.needs_backtrack());
-        assert_eq!(outcome.backtrack_target(), Some("plan"));
-    }
-
-    #[test]
-    fn test_is_complete() {
-        assert!(StageOutcome::complete().is_complete());
-        assert!(StageOutcome::skip("reason").is_complete());
-        assert!(!StageOutcome::cont().is_complete());
-    }
-}
diff --git a/rust/src/retrieval/pipeline/stage.rs b/rust/src/retrieval/pipeline/stage.rs
deleted file mode 100644
index 67736380..00000000
--- a/rust/src/retrieval/pipeline/stage.rs
+++ /dev/null
@@ -1,113 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Retrieval stage trait definition.
-//!
-//! Defines the [`RetrievalStage`] trait for pipeline stages, similar to
-//! [`IndexStage`](crate::index::stages::IndexStage) but with additional
-//! capabilities for backtracking and incremental retrieval.
-
-use async_trait::async_trait;
-
-use crate::error::Result;
-use crate::index::pipeline::FailurePolicy;
-
-use super::context::PipelineContext;
-use super::outcome::StageOutcome;
-
-/// Retrieval pipeline stage.
-///
-/// Each stage represents a discrete step in the retrieval process.
-/// Unlike indexing stages, retrieval stages can trigger backtracking
-/// or request additional data collection.
-///
-/// # Stage Lifecycle
-///
-/// 1. Stage is registered with the orchestrator
-/// 2. Dependencies are resolved and execution order is determined
-/// 3. `execute()` is called with the shared context
-/// 4. Returns `StageOutcome` to control pipeline flow
-///
-/// # Example
-///
-/// ```rust,ignore
-/// struct MyStage;
-///
-/// #[async_trait]
-/// impl RetrievalStage for MyStage {
-///     fn name(&self) -> &str { "my_stage" }
-///
-///     fn depends_on(&self) -> Vec<&'static str> {
-///         vec!["analyze"]
-///     }
-///
-///     async fn execute(&self, ctx: &mut PipelineContext) -> Result<StageOutcome> {
-///         // Process the context...
-///         Ok(StageOutcome::cont())
-///     }
-/// }
-/// ```
-#[async_trait]
-pub trait RetrievalStage: Send + Sync {
-    /// Stage name (must be unique within pipeline).
-    fn name(&self) -> &str;
-
-    /// Execute the stage.
-    ///
-    /// This method receives a mutable reference to the shared context,
-    /// allowing stages to read from and write to it.
-    ///
-    /// Returns a `StageOutcome` to control pipeline flow:
-    /// - `Continue`: Proceed to next stage
-    /// - `Complete`: Retrieval is done, return results
-    /// - `NeedMoreData`: Go back to search for more data
-    /// - `Backtrack`: Return to a specific stage
-    /// - `Skip`: Skip remaining stages
-    async fn execute(&self, ctx: &mut PipelineContext) -> Result<StageOutcome>;
-
-    /// Names of stages this stage depends on.
-    ///
-    /// Dependencies are validated during pipeline construction.
-    /// A stage will only execute after all its dependencies have completed.
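-    ///
-    /// # Example
-    ///
-    /// ```rust,ignore
-    /// // Sketch: a stage that must run after the built-in `analyze` and `plan` stages.
-    /// fn depends_on(&self) -> Vec<&'static str> {
-    ///     vec!["analyze", "plan"]
-    /// }
-    /// ```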
-    fn depends_on(&self) -> Vec<&'static str> {
-        Vec::new()
-    }
-
-    /// Whether this stage is optional (can be skipped on failure).
-    ///
-    /// Optional stages that fail will not stop the pipeline.
-    /// Default: `false`
-    fn is_optional(&self) -> bool {
-        false
-    }
-
-    /// Failure policy for this stage.
-    ///
-    /// Determines how the pipeline handles failures in this stage:
-    /// - `Fail`: Stop the entire pipeline (default for required stages)
-    /// - `Skip`: Skip this stage, continue pipeline
-    /// - `Retry`: Retry with exponential backoff
-    fn failure_policy(&self) -> FailurePolicy {
-        if self.is_optional() {
-            FailurePolicy::skip()
-        } else {
-            FailurePolicy::fail()
-        }
-    }
-
-    /// Whether this stage can trigger backtracking.
-    ///
-    /// Stages like Evaluate that evaluate sufficiency may need to
-    /// trigger additional search iterations.
-    fn can_backtrack(&self) -> bool {
-        false
-    }
-
-    /// Priority for ordering (lower = earlier).
-    ///
-    /// Used when stages have no dependency relationship.
-    /// Default: 100
-    fn priority(&self) -> i32 {
-        100
-    }
-}
diff --git a/rust/src/retrieval/pipeline_retriever.rs b/rust/src/retrieval/pipeline_retriever.rs
deleted file mode 100644
index 9471bc3d..00000000
--- a/rust/src/retrieval/pipeline_retriever.rs
+++ /dev/null
@@ -1,295 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Pipeline-based retriever implementation.
-//!
-//! This module provides a `Retriever` implementation that uses the
-//! new pipeline architecture (RetrievalOrchestrator) internally.
-
-use async_trait::async_trait;
-use std::sync::Arc;
-
-use super::content::ContentAggregatorConfig;
-use super::pipeline::RetrievalOrchestrator;
-use super::retriever::{CostEstimate, Retriever, RetrieverError, RetrieverResult};
-use super::stages::{AnalyzeStage, EvaluateStage, PlanStage, SearchStage};
-use super::strategy::LlmStrategy;
-use super::stream::RetrieveEventReceiver;
-use super::types::{RetrieveOptions, RetrieveResponse};
-use crate::document::{DocumentTree, ReasoningIndex};
-use crate::llm::LlmClient;
-use crate::llm::memo::MemoStore;
-use crate::retrieval::pilot::{LlmPilot, PilotConfig};
-
-/// Pipeline-based retriever using the stage architecture.
-///
-/// This retriever uses the new pipeline architecture with:
-/// - Analyze stage: Query complexity and keyword extraction
-/// - Plan stage: Strategy and algorithm selection
-/// - Search stage: Tree traversal
-/// - Evaluate stage: Sufficiency checking
-///
-/// # Example
-///
-/// ```rust,ignore
-/// let retriever = PipelineRetriever::new()
-///     .with_llm_client(llm_client);
-///
-/// let response = retriever.retrieve(&tree, "query", &options).await?;
-/// ```
-pub struct PipelineRetriever {
-    llm_client: Option<LlmClient>,
-    max_backtracks: usize,
-    max_iterations: usize,
-    /// Content aggregator configuration.
-    content_config: Option<ContentAggregatorConfig>,
-    /// Memo store for caching LLM decisions.
-    memo_store: Option<MemoStore>,
-}
-
-impl Default for PipelineRetriever {
-    fn default() -> Self {
-        Self::new()
-    }
-}
-
-impl PipelineRetriever {
-    /// Create a new pipeline retriever.
-    pub fn new() -> Self {
-        Self {
-            llm_client: None,
-            max_backtracks: 5,
-            max_iterations: 10,
-            content_config: None,
-            memo_store: None,
-        }
-    }
-
-    /// Add LLM client for enhanced retrieval.
-    pub fn with_llm_client(mut self, client: LlmClient) -> Self {
-        self.llm_client = Some(client);
-        self
-    }
-
-    /// Set maximum backtracks for incremental retrieval.
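-    ///
-    /// A minimal sketch of the builder flow (illustrative; assumes an
-    /// `llm_client` in scope):
-    ///
-    /// ```rust,ignore
-    /// let retriever = PipelineRetriever::new()
-    ///     .with_llm_client(llm_client)
-    ///     .with_max_backtracks(3)
-    ///     .with_max_iterations(8);
-    /// ```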
-    pub fn with_max_backtracks(mut self, n: usize) -> Self {
-        self.max_backtracks = n;
-        self
-    }
-
-    /// Set maximum total iterations.
-    pub fn with_max_iterations(mut self, n: usize) -> Self {
-        self.max_iterations = n;
-        self
-    }
-
-    /// Set content aggregator configuration.
-    ///
-    /// When enabled, the Evaluate stage uses precision-focused content
-    /// aggregation with relevance scoring and token budget control.
-    pub fn with_content_config(mut self, config: ContentAggregatorConfig) -> Self {
-        self.content_config = Some(config);
-        self
-    }
-
-    /// Add a memo store for caching LLM decisions.
-    ///
-    /// When enabled, the pilot will cache navigation decisions based on
-    /// context fingerprints, avoiding redundant API calls for similar
-    /// navigation scenarios.
-    pub fn with_memo_store(mut self, store: MemoStore) -> Self {
-        self.memo_store = Some(store);
-        self
-    }
-
-    /// Build the orchestrator with all stages.
-    fn build_orchestrator(&self) -> RetrievalOrchestrator {
-        let mut orchestrator = RetrievalOrchestrator::new()
-            .with_max_backtracks(self.max_backtracks)
-            .with_max_iterations(self.max_iterations);
-
-        // Add analyze stage (with LLM client for complexity detection)
-        let mut analyze_stage = AnalyzeStage::new();
-        if let Some(ref client) = self.llm_client {
-            analyze_stage = analyze_stage.with_llm_client(client.clone());
-        }
-        if let Some(ref store) = self.memo_store {
-            analyze_stage = analyze_stage.with_memo_store(store.clone());
-        }
-        orchestrator = orchestrator.stage(analyze_stage);
-
-        // Add plan stage
-        let mut plan_stage = PlanStage::new();
-        if let Some(ref client) = self.llm_client {
-            plan_stage = plan_stage.with_llm_client(client.clone());
-        }
-        orchestrator = orchestrator.stage(plan_stage);
-
-        // Add search stage with Pilot for semantic navigation
-        let mut search_stage = SearchStage::new().with_llm_client(self.llm_client.clone());
-        if let Some(ref client) = self.llm_client {
-            // Create LLM-based Pilot for semantic navigation guidance
-            let mut pilot = LlmPilot::new(client.clone(), PilotConfig::default());
-
-            // Add memo store if available
-            if let Some(ref store) = self.memo_store {
-                pilot = pilot.with_memo_store(store.clone());
-            }
-
-            search_stage = search_stage.with_pilot(Arc::new(pilot));
-
-            // Create LLM strategy with memo store for node evaluation
-            let mut llm_strategy = LlmStrategy::new(client.clone());
-            if let Some(ref store) = self.memo_store {
-                llm_strategy = llm_strategy.with_memo_store(store.clone());
-            }
-            search_stage = search_stage.with_llm_strategy(llm_strategy);
-        }
-        orchestrator = orchestrator.stage(search_stage);
-
-        // Add evaluate stage with optional content aggregator
-        let mut evaluate_stage = EvaluateStage::new();
-        if let Some(ref store) = self.memo_store {
-            evaluate_stage = evaluate_stage.with_memo_store(store.clone());
-        }
-        if let Some(ref client) = self.llm_client {
-            evaluate_stage = evaluate_stage.with_llm_judge(client.clone());
-        }
-        // Configure content aggregator if provided
-        if let Some(ref config) = self.content_config {
-            evaluate_stage = evaluate_stage.with_content_aggregator(config.clone());
-        }
-        orchestrator = orchestrator.stage(evaluate_stage);
-
-        orchestrator
-    }
-
-    /// Convert pipeline options to retriever options format.
-    fn options_to_retrieve_options(&self, options: &RetrieveOptions) -> RetrieveOptions {
-        options.clone()
-    }
-
-    /// Retrieve with optional reasoning index for fast-path lookup.
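-    ///
-    /// Illustrative call (assumes `tree`, `options`, and a previously
-    /// compiled `reasoning_index` in scope):
-    ///
-    /// ```rust,ignore
-    /// let response = retriever
-    ///     .retrieve_with_reasoning_index(&tree, "termination clauses", &options, Some(reasoning_index))
-    ///     .await?;
-    /// ```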
-    pub async fn retrieve_with_reasoning_index(
-        &self,
-        tree: &DocumentTree,
-        query: &str,
-        options: &RetrieveOptions,
-        reasoning_index: Option<ReasoningIndex>,
-    ) -> RetrieverResult {
-        let mut orchestrator = self.build_orchestrator();
-        let tree_arc = Arc::new(tree.clone());
-
-        let response = orchestrator
-            .execute_with_reasoning_index(
-                tree_arc,
-                query,
-                self.options_to_retrieve_options(options),
-                reasoning_index,
-            )
-            .await
-            .map_err(|e| RetrieverError::Internal(e.to_string()))?;
-
-        Ok(response)
-    }
-
-    /// Execute streaming retrieval.
-    ///
-    /// Returns a channel receiver that yields [`RetrieveEvent`]s as the
-    /// pipeline progresses. The stream always terminates with either
-    /// `Completed` or `Error`.
-    ///
-    /// This is the streaming counterpart of [`retrieve`](Retriever::retrieve).
-    /// The non-streaming path is not affected.
-    pub fn retrieve_streaming(
-        &self,
-        tree: &DocumentTree,
-        query: &str,
-        options: &RetrieveOptions,
-    ) -> (tokio::task::JoinHandle<()>, RetrieveEventReceiver) {
-        let orchestrator = self.build_orchestrator();
-        let tree_arc = Arc::new(tree.clone());
-        let opts = self.options_to_retrieve_options(options);
-
-        orchestrator.execute_streaming(tree_arc, query, opts)
-    }
-}
-
-#[async_trait]
-impl Retriever for PipelineRetriever {
-    async fn retrieve(
-        &self,
-        tree: &DocumentTree,
-        query: &str,
-        options: &RetrieveOptions,
-    ) -> RetrieverResult {
-        let mut orchestrator = self.build_orchestrator();
-        let tree_arc = Arc::new(tree.clone());
-
-        // Execute the pipeline
-        let response = orchestrator
-            .execute(tree_arc, query, self.options_to_retrieve_options(options))
-            .await
-            .map_err(|e| RetrieverError::Internal(e.to_string()))?;
-
-        Ok(response)
-    }
-
-    fn name(&self) -> &'static str {
-        "pipeline"
-    }
-
-    fn supports_options(&self, _options: &RetrieveOptions) -> bool {
-        true // Pipeline retriever supports all options
-    }
-
-    fn estimate_cost(&self, tree: &DocumentTree, options: &RetrieveOptions) -> CostEstimate {
-        // Estimate based on tree size and options
-        let node_count = tree.node_count();
-        let base_llm_calls = if options.sufficiency_check { 2 } else { 1 };
-
-        CostEstimate {
-            llm_calls: base_llm_calls + (node_count / 10), // Rough estimate
-            tokens: node_count * 50, // Conservative estimate
-        }
-    }
-}
-
-impl Clone for PipelineRetriever {
-    fn clone(&self) -> Self {
-        Self {
-            llm_client: self.llm_client.clone(),
-            max_backtracks: self.max_backtracks,
-            max_iterations: self.max_iterations,
-            content_config: self.content_config.clone(),
-            memo_store: self.memo_store.clone(),
-        }
-    }
-}
-
-#[cfg(test)]
-mod tests {
-    use super::*;
-
-    #[test]
-    fn test_pipeline_retriever_creation() {
-        let retriever = PipelineRetriever::new();
-        assert_eq!(retriever.name(), "pipeline");
-        assert!(retriever.llm_client.is_none());
-    }
-
-    #[test]
-    fn test_pipeline_retriever_clone() {
-        let retriever = PipelineRetriever::new().with_max_backtracks(3);
-        let cloned = retriever.clone();
-        assert_eq!(cloned.name(), "pipeline");
-        assert_eq!(cloned.max_backtracks, 3);
-    }
-
-    #[test]
-    fn test_pipeline_retriever_with_content_config() {
-        let config = ContentAggregatorConfig::default();
-        let retriever = PipelineRetriever::new().with_content_config(config);
-        assert!(retriever.content_config.is_some());
-    }
-}
diff --git a/rust/src/retrieval/postprocessor.rs b/rust/src/retrieval/postprocessor.rs
new file mode 100644
index 00000000..fddc8c5e
--- /dev/null
+++ b/rust/src/retrieval/postprocessor.rs
@@ -0,0 +1,130 @@
+// Copyright (c) 2026 vectorless developers
+// SPDX-License-Identifier: Apache-2.0
+
+//! Post-processing of agent output into client-facing results.
+//!
+//! Converts raw agent [`Output`] into one or more [`QueryResultItem`]s.
+//! When evidence comes from multiple documents (distinct `doc_name` values),
+//! results are split by document so the caller can see per-doc attribution.
+
+use std::collections::BTreeMap;
+
+use crate::agent::config::{Evidence, Metrics, Output};
+use crate::client::{Confidence, EvidenceItem, QueryMetrics, QueryResultItem};
+
+/// Convert agent output to query result items, split by document.
+///
+/// Groups evidence by `doc_name` and creates one `QueryResultItem` per document.
+/// For single-document queries (all evidence has the same or no `doc_name`),
+/// returns a single item with the given `doc_id`.
+///
+/// The synthesized answer is shared across all items (it was produced from
+/// cross-document evidence). Each item gets its own subset of evidence.
+pub fn to_results(output: &Output, doc_id: &str) -> Vec<QueryResultItem> {
+    if output.evidence.is_empty() {
+        return vec![empty_item(doc_id, &output.answer, output.confidence)];
+    }
+
+    // Group evidence by doc_name
+    let groups = group_by_doc(&output.evidence);
+
+    if groups.len() <= 1 {
+        // Single doc — return one item
+        return vec![build_item(
+            doc_id,
+            &output.answer,
+            output.confidence,
+            &output.evidence,
+            &output.metrics,
+        )];
+    }
+
+    // Multi-doc — one item per document
+    groups
+        .into_iter()
+        .map(|(name, refs)| {
+            let did = name.as_deref().unwrap_or(doc_id);
+            let evidence: Vec<Evidence> = refs.iter().map(|e| (*e).clone()).collect();
+            build_item(
+                did,
+                &output.answer,
+                output.confidence,
+                &evidence,
+                &output.metrics,
+            )
+        })
+        .collect()
+}
+
+/// Group evidence by `doc_name`, preserving order.
+fn group_by_doc(evidence: &[Evidence]) -> BTreeMap<Option<String>, Vec<&Evidence>> {
+    let mut groups: BTreeMap<Option<String>, Vec<&Evidence>> = BTreeMap::new();
+    for ev in evidence {
+        groups.entry(ev.doc_name.clone()).or_default().push(ev);
+    }
+    groups
+}
+
+/// Build a single enriched result item.
+fn build_item(
+    doc_id: &str,
+    answer: &str,
+    confidence: Confidence,
+    evidence: &[Evidence],
+    metrics: &Metrics,
+) -> QueryResultItem {
+    let node_ids: Vec<String> = evidence.iter().map(|e| e.source_path.clone()).collect();
+    let evidence_items: Vec<EvidenceItem> = evidence
+        .iter()
+        .map(|e| EvidenceItem {
+            title: e.node_title.clone(),
+            path: e.source_path.clone(),
+            content: e.content.clone(),
+            doc_name: e.doc_name.clone(),
+        })
+        .collect();
+
+    let content = if answer.is_empty() {
+        evidence
+            .iter()
+            .map(|e| format!("## {}\n{}", e.node_title, e.content))
+            .collect::<Vec<_>>()
+            .join("\n\n---\n\n")
+    } else {
+        answer.to_string()
+    };
+
+    let evidence_count = evidence.len();
+
+    QueryResultItem {
+        doc_id: doc_id.to_string(),
+        node_ids,
+        content,
+        evidence: evidence_items,
+        metrics: Some(QueryMetrics {
+            llm_calls: metrics.llm_calls,
+            rounds_used: metrics.rounds_used,
+            nodes_visited: metrics.nodes_visited,
+            evidence_count,
+            evidence_chars: metrics.evidence_chars,
+        }),
+        confidence,
+    }
+}
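+
+// Illustrative sketch of the contract (assumes an `Output` whose evidence
+// spans two distinct `doc_name` values; field construction elided):
+//
+//     let items = to_results(&output, "fallback-doc-id");
+//     // Two distinct doc_name values => two result items, one per doc,
+//     // each sharing the synthesized answer but carrying its own evidence.
+//     assert_eq!(items.len(), 2);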
+
+/// Build an empty result item (no evidence).
+fn empty_item(doc_id: &str, answer: &str, confidence: Confidence) -> QueryResultItem {
+    let content = if answer.is_empty() {
+        String::new()
+    } else {
+        answer.to_string()
+    };
+    QueryResultItem {
+        doc_id: doc_id.to_string(),
+        node_ids: Vec::new(),
+        content,
+        evidence: Vec::new(),
+        metrics: None,
+        confidence,
+    }
+}
diff --git a/rust/src/retrieval/reference.rs b/rust/src/retrieval/reference.rs
deleted file mode 100644
index cf681f6b..00000000
--- a/rust/src/retrieval/reference.rs
+++ /dev/null
@@ -1,522 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Reference following for in-document cross-references.
-//!
-//! This module implements the ability to follow references found within
-//! document content, such as "see Appendix G" or "refer to Table 5.3".
-//!
-//! # Architecture
-//!
-//! ```text
-//! ┌─────────────────────────────────────────────────────────────┐
-//! │                    ReferenceFollower                        │
-//! ├─────────────────────────────────────────────────────────────┤
-//! │                                                             │
-//! │  ┌─────────────┐   ┌─────────────┐   ┌─────────────┐        │
-//! │  │   Extract   │─▶│   Resolve   │─▶│   Expand    │         │
-//! │  │ References  │   │ References  │   │  Context    │        │
-//! │  └─────────────┘   └─────────────┘   └─────────────┘        │
-//! │                                                             │
-//! │  Features:                                                  │
-//! │  • Follow "see Section X" references                        │
-//! │  • Follow "see Appendix G" references                       │
-//! │  • Follow "Table/Figure X" references                       │
-//! │  • Depth-limited expansion                                  │
-//! │  • Reference cycle detection                                │
-//! └─────────────────────────────────────────────────────────────┘
-//! ```
-//!
-//! # Integration with Retrieval
-//!
-//! Reference following is triggered when:
-//! 1. Search finds content containing references
-//! 2. Evaluate determines current content is insufficient
-//! 3. Pilot suggests following a specific reference
-//!
-//! # Example
-//!
-//! ```ignore
-//! use vectorless::retrieval::reference::{ReferenceFollower, ReferenceConfig};
-//!
-//! let follower = ReferenceFollower::new(ReferenceConfig {
-//!     max_depth: 3,
-//!     max_references: 10,
-//!     ..Default::default()
-//! });
-//!
-//! // Follow references from a node
-//! let expanded = follower.follow_from_node(&tree, &index, node_id);
-//! for fref in expanded {
-//!     println!("Resolved {:?} at depth {}", fref.target_node, fref.depth);
-//! }
-//! ```
-
-use std::collections::HashSet;
-
-use crate::document::{
-    DocumentTree, NodeId, NodeReference, RefType, ReferenceExtractor, RetrievalIndex,
-};
-
-/// Configuration for reference following.
-#[derive(Debug, Clone)]
-pub struct ReferenceConfig {
-    /// Maximum depth for following chained references.
-    pub max_depth: usize,
-    /// Maximum total references to follow per query.
-    pub max_references: usize,
-    /// Whether to follow page references.
-    pub follow_pages: bool,
-    /// Whether to follow table/figure references.
-    pub follow_tables_figures: bool,
-    /// Minimum confidence threshold for resolution.
-    pub min_confidence: f32,
-    /// Reference types to include.
-    pub include_types: Vec<RefType>,
-}
-
-impl Default for ReferenceConfig {
-    fn default() -> Self {
-        Self {
-            max_depth: 3,
-            max_references: 10,
-            follow_pages: true,
-            follow_tables_figures: true,
-            min_confidence: 0.5,
-            include_types: vec![
-                RefType::Section,
-                RefType::Appendix,
-                RefType::Table,
-                RefType::Figure,
-                RefType::Page,
-            ],
-        }
-    }
-}
-
-impl ReferenceConfig {
-    /// Create a conservative configuration (fewer references).
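-    ///
-    /// For example:
-    ///
-    /// ```rust,ignore
-    /// let cfg = ReferenceConfig::conservative();
-    /// assert_eq!(cfg.max_depth, 2);
-    /// assert_eq!(cfg.max_references, 5);
-    /// ```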
-    pub fn conservative() -> Self {
-        Self {
-            max_depth: 2,
-            max_references: 5,
-            ..Default::default()
-        }
-    }
-
-    /// Create an aggressive configuration (more references).
-    pub fn aggressive() -> Self {
-        Self {
-            max_depth: 5,
-            max_references: 20,
-            ..Default::default()
-        }
-    }
-
-    /// Check if a reference type should be followed.
-    pub fn should_follow(&self, ref_type: RefType) -> bool {
-        if !self.include_types.contains(&ref_type) {
-            return false;
-        }
-        match ref_type {
-            RefType::Page => self.follow_pages,
-            RefType::Table | RefType::Figure => self.follow_tables_figures,
-            _ => true,
-        }
-    }
-}
-
-/// Result of following a reference.
-#[derive(Debug, Clone)]
-pub struct FollowedReference {
-    /// The node that contained the reference.
-    pub source_node: NodeId,
-    /// The reference that was followed.
-    pub reference: NodeReference,
-    /// The resolved target node (if found).
-    pub target_node: Option<NodeId>,
-    /// Depth in the reference chain (0 = direct from content).
-    pub depth: usize,
-}
-
-impl FollowedReference {
-    /// Check if this reference was resolved.
-    pub fn is_resolved(&self) -> bool {
-        self.target_node.is_some()
-    }
-}
-
-/// Reference follower for expanding content via cross-references.
-#[derive(Debug, Clone)]
-pub struct ReferenceFollower {
-    config: ReferenceConfig,
-}
-
-impl Default for ReferenceFollower {
-    fn default() -> Self {
-        Self::new(ReferenceConfig::default())
-    }
-}
-
-impl ReferenceFollower {
-    /// Create a new reference follower with configuration.
-    pub fn new(config: ReferenceConfig) -> Self {
-        Self { config }
-    }
-
-    /// Create with default configuration.
-    pub fn with_defaults() -> Self {
-        Self::default()
-    }
-
-    /// Follow all references from a node's content.
-    ///
-    /// Returns a list of followed references with their resolved targets.
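-    ///
-    /// Illustrative usage (assumes `tree`, `index`, and a `node_id` in scope):
-    ///
-    /// ```rust,ignore
-    /// for fref in follower.follow_from_node(&tree, &index, node_id) {
-    ///     if fref.is_resolved() {
-    ///         println!("{:?} -> {:?} at depth {}", fref.source_node, fref.target_node, fref.depth);
-    ///     }
-    /// }
-    /// ```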
-    pub fn follow_from_node(
-        &self,
-        tree: &DocumentTree,
-        index: &RetrievalIndex,
-        node_id: NodeId,
-    ) -> Vec<FollowedReference> {
-        let mut results = Vec::new();
-        let mut visited = HashSet::new();
-        visited.insert(node_id);
-
-        self.follow_from_node_inner(tree, index, node_id, 0, &mut visited, &mut results);
-
-        // Sort by confidence and limit
-        results.sort_by(|a, b| {
-            b.reference
-                .confidence
-                .partial_cmp(&a.reference.confidence)
-                .unwrap_or(std::cmp::Ordering::Equal)
-        });
-        results.truncate(self.config.max_references);
-
-        results
-    }
-
-    fn follow_from_node_inner(
-        &self,
-        tree: &DocumentTree,
-        index: &RetrievalIndex,
-        node_id: NodeId,
-        depth: usize,
-        visited: &mut HashSet<NodeId>,
-        results: &mut Vec<FollowedReference>,
-    ) {
-        if depth >= self.config.max_depth {
-            return;
-        }
-
-        if results.len() >= self.config.max_references {
-            return;
-        }
-
-        // Get node content
-        let node = match tree.get(node_id) {
-            Some(n) => n,
-            None => return,
-        };
-
-        // Use pre-extracted references if available, otherwise extract
-        let _refs = if !node.references.is_empty() {
-            node.references.clone()
-        } else {
-            ReferenceExtractor::extract(&node.content)
-        };
-
-        // Resolve references
-        let resolved_refs = ReferenceExtractor::extract_and_resolve(&node.content, tree, index);
-
-        for r#ref in resolved_refs {
-            // Check if we should follow this type
-            if !self.config.should_follow(r#ref.ref_type) {
-                continue;
-            }
-
-            // Check confidence
-            if r#ref.confidence < self.config.min_confidence {
-                continue;
-            }
-
-            let followed = FollowedReference {
-                source_node: node_id,
-                reference: r#ref.clone(),
-                target_node: r#ref.target_node,
-                depth,
-            };
-
-            results.push(followed);
-
-            // Recursively follow if resolved and not visited
-            if let Some(target_id) = r#ref.target_node {
-                if !visited.contains(&target_id) {
-                    visited.insert(target_id);
-                    self.follow_from_node_inner(
-                        tree,
-                        index,
-                        target_id,
-                        depth + 1,
-                        visited,
-                        results,
-                    );
-                }
-            }
-        }
-    }
-
-    /// Follow references from multiple nodes.
-    ///
-    /// Useful for expanding content after initial search.
-    pub fn follow_from_nodes(
-        &self,
-        tree: &DocumentTree,
-        index: &RetrievalIndex,
-        node_ids: &[NodeId],
-    ) -> Vec<FollowedReference> {
-        let mut all_results = Vec::new();
-        let mut visited = HashSet::new();
-        visited.extend(node_ids.iter().copied());
-
-        for &node_id in node_ids {
-            self.follow_from_node_inner(tree, index, node_id, 0, &mut visited, &mut all_results);
-        }
-
-        // Deduplicate by target node
-        let mut seen_targets = HashSet::new();
-        all_results.retain(|r| {
-            if let Some(target) = r.target_node {
-                seen_targets.insert(target)
-            } else {
-                true // Keep unresolved references
-            }
-        });
-
-        // Sort and limit
-        all_results.sort_by(|a, b| {
-            b.reference
-                .confidence
-                .partial_cmp(&a.reference.confidence)
-                .unwrap_or(std::cmp::Ordering::Equal)
-        });
-        all_results.truncate(self.config.max_references);
-
-        all_results
-    }
-
-    /// Find all nodes reachable via references from a starting node.
-    ///
-    /// Returns a set of node IDs that can be reached by following references.
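-    ///
-    /// Illustrative usage (assumes `tree` and `index` in scope):
-    ///
-    /// ```rust,ignore
-    /// let reachable = follower.find_reachable_nodes(&tree, &index, start_node);
-    /// println!("{} nodes reachable via references", reachable.len());
-    /// ```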
-    pub fn find_reachable_nodes(
-        &self,
-        tree: &DocumentTree,
-        index: &RetrievalIndex,
-        start_node: NodeId,
-    ) -> HashSet<NodeId> {
-        let mut reachable = HashSet::new();
-        let mut stack = vec![start_node];
-
-        while let Some(node_id) = stack.pop() {
-            if reachable.contains(&node_id) {
-                continue;
-            }
-            reachable.insert(node_id);
-
-            // Get references from this node
-            if let Some(node) = tree.get(node_id) {
-                let _refs = if !node.references.is_empty() {
-                    node.references.clone()
-                } else {
-                    ReferenceExtractor::extract(&node.content)
-                };
-
-                // Resolve and add targets to stack
-                let resolved = ReferenceExtractor::extract_and_resolve(&node.content, tree, index);
-                for r#ref in resolved {
-                    if self.config.should_follow(r#ref.ref_type)
-                        && r#ref.confidence >= self.config.min_confidence
-                    {
-                        if let Some(target_id) = r#ref.target_node {
-                            if !reachable.contains(&target_id) {
-                                stack.push(target_id);
-                            }
-                        }
-                    }
-                }
-            }
-
-            // Limit exploration
-            if reachable.len() >= self.config.max_references * 2 {
-                break;
-            }
-        }
-
-        reachable
-    }
-
-    /// Get the configuration.
-    pub fn config(&self) -> &ReferenceConfig {
-        &self.config
-    }
-}
-
-/// Reference expansion result for content aggregation.
-#[derive(Debug, Clone)]
-pub struct ReferenceExpansion {
-    /// Original node IDs.
-    pub original_nodes: Vec<NodeId>,
-    /// Expanded node IDs (via references).
-    pub expanded_nodes: Vec<NodeId>,
-    /// References that were followed.
-    pub references: Vec<FollowedReference>,
-    /// Total expansion depth.
-    pub depth: usize,
-}
-
-impl ReferenceExpansion {
-    /// Get all nodes (original + expanded).
-    pub fn all_nodes(&self) -> Vec<NodeId> {
-        let mut all = self.original_nodes.clone();
-        all.extend(self.expanded_nodes.iter().copied());
-        all
-    }
-
-    /// Get only the expanded nodes.
-    pub fn new_nodes(&self) -> &[NodeId] {
-        &self.expanded_nodes
-    }
-
-    /// Check if any references were followed.
-    pub fn has_expansion(&self) -> bool {
-        !self.expanded_nodes.is_empty()
-    }
-}
-
-/// Expand search results by following references.
-///
-/// This is a convenience function that combines search results with
-/// reference following.
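-///
-/// Illustrative usage (assumes `tree`, `index`, and search `hits` in scope):
-///
-/// ```rust,ignore
-/// let expansion = expand_with_references(&tree, &index, &hits, None);
-/// if expansion.has_expansion() {
-///     let all_nodes = expansion.all_nodes(); // hits plus referenced nodes
-/// }
-/// ```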
-pub fn expand_with_references(
-    tree: &DocumentTree,
-    index: &RetrievalIndex,
-    initial_nodes: &[NodeId],
-    config: Option<ReferenceConfig>,
-) -> ReferenceExpansion {
-    let config = config.unwrap_or_default();
-    let follower = ReferenceFollower::new(config);
-
-    let references = follower.follow_from_nodes(tree, index, initial_nodes);
-
-    // Collect expanded nodes
-    let mut expanded_nodes = Vec::new();
-    let mut seen = HashSet::new();
-    seen.extend(initial_nodes.iter().copied());
-
-    for r#ref in &references {
-        if let Some(target_id) = r#ref.target_node {
-            if !seen.contains(&target_id) {
-                seen.insert(target_id);
-                expanded_nodes.push(target_id);
-            }
-        }
-    }
-
-    // Calculate max depth
-    let depth = references.iter().map(|r| r.depth).max().unwrap_or(0);
-
-    ReferenceExpansion {
-        original_nodes: initial_nodes.to_vec(),
-        expanded_nodes,
-        references,
-        depth,
-    }
-}
-
-#[cfg(test)]
-mod tests {
-    use super::*;
-
-    #[test]
-    fn test_reference_config_default() {
-        let config = ReferenceConfig::default();
-        assert_eq!(config.max_depth, 3);
-        assert_eq!(config.max_references, 10);
-        assert!(config.follow_pages);
-        assert!(config.follow_tables_figures);
-    }
-
-    #[test]
-    fn test_reference_config_conservative() {
-        let config = ReferenceConfig::conservative();
-        assert_eq!(config.max_depth, 2);
-        assert_eq!(config.max_references, 5);
-    }
-
-    #[test]
-    fn test_reference_config_aggressive() {
-        let config = ReferenceConfig::aggressive();
-        assert_eq!(config.max_depth, 5);
-        assert_eq!(config.max_references, 20);
-    }
-
-    #[test]
-    fn test_reference_config_should_follow() {
-        let config = ReferenceConfig::default();
-
-        assert!(config.should_follow(RefType::Section));
-        assert!(config.should_follow(RefType::Appendix));
-        assert!(config.should_follow(RefType::Table));
-        assert!(config.should_follow(RefType::Page));
-        assert!(!config.should_follow(RefType::Unknown));
-    }
-
-    #[test]
-    fn test_followed_reference_is_resolved() {
-        use indextree::Arena;
-
-        let mut arena = Arena::new();
-        let node = arena.new_node(crate::document::TreeNode::default());
-        let node_id = NodeId(node);
-
-        let resolved = FollowedReference {
-            source_node: node_id,
-            reference: NodeReference::new(
-                "Section 2.1".to_string(),
-                "2.1".to_string(),
-                RefType::Section,
-                0,
-            ),
-            target_node: Some(node_id),
-            depth: 0,
-        };
-
-        let unresolved = FollowedReference {
-            source_node: node_id,
-            reference: NodeReference::new(
-                "Section 99".to_string(),
-                "99".to_string(),
-                RefType::Section,
-                0,
-            ),
-            target_node: None,
-            depth: 0,
-        };
-
-        assert!(resolved.is_resolved());
-        assert!(!unresolved.is_resolved());
-    }
-
-    #[test]
-    fn test_reference_expansion() {
-        let expansion = ReferenceExpansion {
-            original_nodes: vec![],
-            expanded_nodes: vec![],
-            references: vec![],
-            depth: 0,
-        };
-
-        assert!(!expansion.has_expansion());
-        assert_eq!(expansion.all_nodes().len(), 0);
-    }
-}
diff --git a/rust/src/retrieval/retriever.rs b/rust/src/retrieval/retriever.rs
deleted file mode 100644
index 1e37d7a0..00000000
--- a/rust/src/retrieval/retriever.rs
+++ /dev/null
@@ -1,157 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Core Retriever trait and related types.
-
-use async_trait::async_trait;
-
-use super::types::{RetrieveOptions, RetrieveResponse};
-use crate::document::DocumentTree;
-
-/// Result type for retriever operations.
-pub type RetrieverResult = Result<RetrieveResponse, RetrieverError>;
-
-/// Errors that can occur during retrieval.
-#[derive(Debug, thiserror::Error)]
-pub enum RetrieverError {
-    /// The document tree is empty or invalid.
-    #[error("Invalid document tree: {0}")]
-    InvalidTree(String),
-
-    /// No relevant nodes found for the query.
-    #[error("No relevant nodes found for query")]
-    NoResults,
-
-    /// LLM call failed during retrieval.
-    #[error("LLM error: {0}")]
-    LlmError(String),
-
-    /// Embedding generation failed.
-    #[error("Embedding error: {0}")]
-    EmbeddingError(String),
-
-    /// Cache operation failed.
-    #[error("Cache error: {0}")]
-    CacheError(String),
-
-    /// Configuration error.
-    #[error("Configuration error: {0}")]
-    ConfigError(String),
-
-    /// Internal error during retrieval.
-    #[error("Internal error: {0}")]
-    Internal(String),
-}
-
-/// Trait for document retrieval strategies.
-///
-/// Implementations provide different approaches to navigating
-/// the document tree and finding relevant content.
-#[async_trait]
-pub trait Retriever: Send + Sync {
-    /// Retrieve relevant content for the given query.
-    ///
-    /// # Arguments
-    ///
-    /// * `tree` - The document tree to search
-    /// * `query` - The user's query string
-    /// * `options` - Retrieval options controlling behavior
-    ///
-    /// # Returns
-    ///
-    /// A `RetrieveResponse` containing the retrieved content and metadata.
-    async fn retrieve(
-        &self,
-        tree: &DocumentTree,
-        query: &str,
-        options: &RetrieveOptions,
-    ) -> RetrieverResult;
-
-    /// Get the name of this retriever for logging/debugging.
-    fn name(&self) -> &str;
-
-    /// Check if this retriever supports the given options.
-    ///
-    /// Some retrievers may not support all features (e.g., sufficiency checking).
-    fn supports_options(&self, _options: &RetrieveOptions) -> bool {
-        true
-    }
-
-    /// Estimate the cost of a retrieval operation.
-    ///
-    /// Returns an estimated number of LLM calls or tokens that will be used.
-    /// Useful for cost-aware strategy selection.
-    fn estimate_cost(&self, tree: &DocumentTree, _options: &RetrieveOptions) -> CostEstimate {
-        let node_count = tree.node_count();
-        CostEstimate {
-            llm_calls: node_count / 2, // Rough estimate
-            tokens: node_count * 100,
-        }
-    }
-}
-
-/// Cost estimate for a retrieval operation.
-#[derive(Debug, Clone, Copy, Default)]
-pub struct CostEstimate {
-    /// Estimated number of LLM calls.
-    pub llm_calls: usize,
-    /// Estimated number of tokens.
-    pub tokens: usize,
-}
-
-/// Context passed to strategies during retrieval.
-#[derive(Debug, Clone)]
-pub struct RetrievalContext {
-    /// The original query.
-    pub query: String,
-    /// Normalized/lowercase query for matching.
-    pub query_normalized: String,
-    /// Query tokens for keyword matching.
-    pub query_tokens: Vec<String>,
-    /// Current depth in the tree.
-    pub current_depth: usize,
-    /// Number of results collected so far.
-    pub results_count: usize,
-    /// Total tokens collected so far.
-    pub tokens_collected: usize,
-    /// Maximum tokens allowed.
-    pub max_tokens: usize,
-    /// Whether sufficiency check is enabled.
-    pub sufficiency_enabled: bool,
-}
-
-impl RetrievalContext {
-    /// Create a new retrieval context from a query.
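-    ///
-    /// For example (token counting here is whitespace-based):
-    ///
-    /// ```rust,ignore
-    /// let ctx = RetrievalContext::new("refund policy for enterprise plans", 4096, true);
-    /// assert_eq!(ctx.query_tokens.len(), 5);
-    /// assert!(!ctx.is_token_limit_reached());
-    /// ```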
-    pub fn new(query: &str, max_tokens: usize, sufficiency_enabled: bool) -> Self {
-        let query_normalized = query.to_lowercase();
-        let query_tokens: Vec<String> = query_normalized
-            .split_whitespace()
-            .map(|s| s.to_string())
-            .collect();
-
-        Self {
-            query: query.to_string(),
-            query_normalized,
-            query_tokens,
-            current_depth: 0,
-            results_count: 0,
-            tokens_collected: 0,
-            max_tokens,
-            sufficiency_enabled,
-        }
-    }
-
-    /// Check if we've reached the token limit.
-    pub fn is_token_limit_reached(&self) -> bool {
-        self.tokens_collected >= self.max_tokens
-    }
-
-    /// Calculate token utilization percentage.
-    pub fn token_utilization(&self) -> f32 {
-        if self.max_tokens == 0 {
-            0.0
-        } else {
-            (self.tokens_collected as f32 / self.max_tokens as f32).min(1.0)
-        }
-    }
-}
diff --git a/rust/src/retrieval/scoring/mod.rs b/rust/src/retrieval/scoring/mod.rs
deleted file mode 100644
index 0682ed7d..00000000
--- a/rust/src/retrieval/scoring/mod.rs
+++ /dev/null
@@ -1,12 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Scoring utilities for text relevance assessment.
-//!
-//! This module provides text scoring algorithms (BM25, keyword matching)
-//! that are used across the retrieval pipeline. These are general-purpose
-//! tools, not tied to any specific search algorithm.
-
-pub mod bm25;
-
-pub use bm25::{Bm25Engine, Bm25Params, FieldDocument, STOPWORDS, extract_keywords};
diff --git a/rust/src/retrieval/search/beam.rs b/rust/src/retrieval/search/beam.rs
deleted file mode 100644
index 977fba9b..00000000
--- a/rust/src/retrieval/search/beam.rs
+++ /dev/null
@@ -1,711 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Beam search algorithm with Pilot as primary scorer and backtracking support.
-//!
-//! Explores multiple paths in parallel, keeping only the top-k candidates
-//! at each level. Pilot provides semantic guidance; NodeScorer is the
-//! fallback when Pilot is unavailable.
-//!
-//! # Backtracking
-//!
-//! When the main beam exhausts all paths without finding enough results,
-//! the search pops entries from a fallback stack and tries alternative
-//! branches. This prevents the search from getting stuck in dead ends
-//! caused by Pilot misjudgments at early layers.
-
-use async_trait::async_trait;
-use std::collections::HashSet;
-use tracing::debug;
-
-use super::super::RetrievalContext;
-use super::super::types::{NavigationDecision, NavigationStep, SearchPath};
-use super::{SearchConfig, SearchResult, SearchTree};
-use crate::document::{DocumentTree, NodeId};
-use crate::retrieval::pilot::{Pilot, SearchState};
-use crate::retrieval::pilot::{PilotDecisionCache, score_candidates, score_candidates_detailed};
-
-/// Maximum entries in the fallback stack relative to beam width.
-const FALLBACK_STACK_MULTIPLIER: usize = 3;
-
-/// An entry in the fallback stack representing a viable alternative path
-/// that was truncated from the main beam.
-#[derive(Debug, Clone)]
-struct FallbackEntry {
-    /// The alternative search path.
-    path: SearchPath,
-    /// Score when this path was shelved.
-    score: f32,
-}
-
-/// Beam search — explores multiple paths simultaneously with backtracking.
-///
-/// Keeps top `beam_width` candidates at each level, providing
-/// a balance between exploration and computational cost.
-///
-/// # Pilot Integration
-///
-/// Pilot is the primary scorer (weight=0.7). NodeScorer supplements
-/// for candidates Pilot didn't rank. Decisions are cached by
-/// (query, parent_node_id) to avoid redundant LLM calls.
-///
-/// # Backtracking
-///
-/// Paths truncated from the beam that still have reasonable scores
-/// are kept in a fallback stack. When the main beam empties without
-/// finding enough results, the search pops from the fallback stack,
-/// calls `Pilot::guide_backtrack()` for re-guidance, and continues
-/// from the alternative path.
-pub struct BeamSearch {
-    beam_width: usize,
-}
-
-impl BeamSearch {
-    /// Create a new beam search with default beam width.
-    pub fn new() -> Self {
-        Self { beam_width: 3 }
-    }
-
-    /// Create beam search with specified width.
-    pub fn with_width(width: usize) -> Self {
-        Self {
-            beam_width: width.max(1),
-        }
-    }
-
-    /// Push a path into the fallback stack if it meets the score threshold.
-    fn push_fallback(
-        fallback_stack: &mut Vec<FallbackEntry>,
-        entry: FallbackEntry,
-        min_score: f32,
-        fallback_score_ratio: f32,
-        max_size: usize,
-    ) {
-        let threshold = min_score * fallback_score_ratio;
-        if entry.score < threshold {
-            return;
-        }
-
-        // Evict lowest-score entry if at capacity
-        if fallback_stack.len() >= max_size {
-            if let Some(min_idx) = fallback_stack
-                .iter()
-                .enumerate()
-                .min_by(|(_, a), (_, b)| {
-                    a.score
-                        .partial_cmp(&b.score)
-                        .unwrap_or(std::cmp::Ordering::Equal)
-                })
-                .map(|(i, _)| i)
-            {
-                if entry.score > fallback_stack[min_idx].score {
-                    fallback_stack.swap_remove(min_idx);
-                } else {
-                    return; // New entry isn't better than worst in stack
-                }
-            }
-        }
-
-        fallback_stack.push(entry);
-    }
-
-    /// Pop the highest-score entry from the fallback stack.
-    fn pop_fallback(fallback_stack: &mut Vec<FallbackEntry>) -> Option<FallbackEntry> {
-        if fallback_stack.is_empty() {
-            return None;
-        }
-        // Find and remove the highest-score entry
-        let max_idx = fallback_stack
-            .iter()
-            .enumerate()
-            .max_by(|(_, a), (_, b)| {
-                a.score
-                    .partial_cmp(&b.score)
-                    .unwrap_or(std::cmp::Ordering::Equal)
-            })
-            .map(|(i, _)| i)?;
-        Some(fallback_stack.swap_remove(max_idx))
-    }
-
-    /// Attempt backtracking by popping from the fallback stack and
-    /// consulting Pilot for re-guidance.
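-    ///
-    /// Returns `None` when the fallback stack is exhausted; the caller then
-    /// stops backtracking and finalizes whatever paths were collected.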
-    async fn try_backtrack(
-        &self,
-        tree: &DocumentTree,
-        context: &RetrievalContext,
-        pilot: Option<&dyn Pilot>,
-        _cache: &PilotDecisionCache,
-        visited: &HashSet<NodeId>,
-        fallback_stack: &mut Vec<FallbackEntry>,
-        result: &mut SearchResult,
-        pilot_interventions: &mut usize,
-    ) -> Option<SearchPath> {
-        let entry = Self::pop_fallback(fallback_stack)?;
-        let dead_end_title = entry
-            .path
-            .leaf
-            .and_then(|id| tree.get(id))
-            .map(|n| n.title.clone())
-            .unwrap_or_else(|| "unknown".to_string());
-
-        debug!(
-            "Backtracking: trying alternative path (score={:.2}, dead_end='{}')",
-            entry.score, dead_end_title
-        );
-
-        // Record backtrack in trace
-        result.trace.push(NavigationStep {
-            node_id: format!("{:?}", entry.path.leaf),
-            title: dead_end_title.clone(),
-            score: entry.score,
-            decision: NavigationDecision::BacktrackFrom(dead_end_title),
-            depth: entry.path.nodes.len(),
-        });
-
-        // Consult Pilot for re-guidance at the backtracking point
-        if let Some(p) = pilot {
-            // Get siblings of the dead-end node (alternatives at the same level)
-            let parent_node = if entry.path.nodes.len() >= 2 {
-                entry.path.nodes[entry.path.nodes.len() - 2]
-            } else {
-                tree.root()
-            };
-            let siblings = tree.children(parent_node);
-            let unvisited_siblings: Vec<NodeId> = siblings
-                .into_iter()
-                .filter(|id| !visited.contains(id))
-                .collect();
-
-            if !unvisited_siblings.is_empty() {
-                let path_ref = &entry.path.nodes[..];
-                let state = SearchState {
-                    tree,
-                    query: &context.query,
-                    path: path_ref,
-                    candidates: &unvisited_siblings,
-                    visited,
-                    depth: entry.path.nodes.len(),
-                    iteration: result.iterations,
-                    best_score: result.paths.iter().map(|p| p.score).fold(0.0f32, f32::max),
-                    is_backtracking: true,
-                    step_reasons: Some(&entry.path.step_reasons),
-                };
-
-                if let Some(decision) = p.guide_backtrack(&state).await {
-                    *pilot_interventions += 1;
-
-                    // Use Pilot's ranked candidates to pick the best alternative
-                    if let Some(top) = decision.top_candidate() {
-                        let new_path = entry.path.extend(top.node_id, top.score);
-                        let child_node = tree.get(top.node_id);
-                        result.trace.push(NavigationStep {
-                            node_id: format!("{:?}", top.node_id),
-                            title: child_node.map(|n| n.title.clone()).unwrap_or_default(),
-                            score: top.score,
-                            decision: NavigationDecision::GoToChild(
-                                unvisited_siblings
-                                    .iter()
-                                    .position(|&c| c == top.node_id)
-                                    .unwrap_or(0),
-                            ),
-                            depth: child_node.map(|n| n.depth).unwrap_or(0),
-                        });
-                        result.nodes_visited += 1;
-                        debug!(
-                            "Pilot re-guided to '{}' (score={:.2})",
-                            child_node.map(|n| n.title.clone()).unwrap_or_default(),
-                            top.score
-                        );
-                        return Some(new_path);
-                    }
-                }
-            }
-        }
-
-        // No Pilot guidance or Pilot returned None — use the path as-is
-        // (continue expanding from where it was shelved)
-        debug!("No Pilot guidance during backtrack, using shelved path as-is");
-        Some(entry.path)
-    }
-
-    /// Core beam search logic parameterized by start node.
-    async fn search_impl(
-        &self,
-        tree: &DocumentTree,
-        context: &RetrievalContext,
-        config: &SearchConfig,
-        pilot: Option<&dyn Pilot>,
-        start_node: NodeId,
-    ) -> SearchResult {
-        let mut result = SearchResult::default();
-        let beam_width = config.beam_width.min(self.beam_width);
-        let max_fallback_size = beam_width * FALLBACK_STACK_MULTIPLIER;
-        let mut visited: HashSet<NodeId> = HashSet::new();
-        let cache = PilotDecisionCache::new();
-
-        visited.insert(start_node);
-
-        debug!(
-            "BeamSearch: query='{}', start_node={:?}, beam_width={}, min_score={:.2}, max_backtracks={}",
-            context.query, start_node, beam_width, config.min_score, config.max_backtracks
-        );
-
-        let mut pilot_interventions = 0;
-        let mut backtrack_count = 0;
-
-        // Fallback stack holds viable paths truncated from the beam
-        let mut fallback_stack: Vec<FallbackEntry> = Vec::new();
-
-        // Initialize with start_node's children (includes resolved cross-references)
-        let start_children = tree.children_with_refs(start_node);
-        debug!("Start node has {} children", start_children.len());
-
-        let initial_candidates = score_candidates_detailed(
-            tree,
-            &start_children,
-            &context.query,
-            pilot,
-            &[],
-            &visited,
-            0.7, // Beam: Pilot weight = 0.7
-            Some(&cache),
-            None, // No reasoning history at start
-        )
-        .await;
-
-        if pilot.is_some() && !start_children.is_empty() {
-            pilot_interventions += 1;
-        }
-
-        // Split initial candidates into beam and fallback
-        let mut sorted_initial: Vec<_> = initial_candidates
-            .into_iter()
-            .map(|s| {
-                let mut path = SearchPath::from_node(s.node_id, s.score);
-                // Record reason for initial selection
-                if let Some(reason) = s.reason {
-                    path.step_reasons = vec![Some(reason)];
-                }
-                path
-            })
-            .collect();
-        sorted_initial.sort_by(|a, b| {
-            b.score
-                .partial_cmp(&a.score)
-                .unwrap_or(std::cmp::Ordering::Equal)
-        });
-
-        let mut current_beam: Vec<SearchPath> =
-            sorted_initial.iter().take(beam_width).cloned().collect();
-
-        // Remaining candidates go to fallback stack
-        for path in sorted_initial.iter().skip(beam_width) {
-            Self::push_fallback(
-                &mut fallback_stack,
-                FallbackEntry {
-                    path: path.clone(),
-                    score: path.score,
-                },
-                config.min_score,
-                config.fallback_score_ratio,
-                max_fallback_size,
-            );
-        }
-
-        debug!(
-            "Initial beam={}, fallback_stack={}",
-            current_beam.len(),
-            fallback_stack.len()
-        );
-
-        for iteration in 0..config.max_iterations {
-            result.iterations = iteration + 1;
-
-            // === BACKTRACKING CHECK ===
-            // If beam is empty but we have fallback entries and haven't
-            // found enough results, try backtracking.
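-            // Each backtrack consumes one unit of `config.max_backtracks`;
-            // once the budget or the fallback stack is exhausted, the loop
-            // ends and the best paths found so far are returned.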
-            if current_beam.is_empty() && result.paths.len() < config.top_k {
-                if backtrack_count < config.max_backtracks {
-                    if let Some(new_path) = self
-                        .try_backtrack(
-                            tree,
-                            context,
-                            pilot,
-                            &cache,
-                            &visited,
-                            &mut fallback_stack,
-                            &mut result,
-                            &mut pilot_interventions,
-                        )
-                        .await
-                    {
-                        backtrack_count += 1;
-                        current_beam = vec![new_path];
-                        debug!(
-                            "Backtrack #{}: injected path into beam (remaining fallback={})",
-                            backtrack_count,
-                            fallback_stack.len()
-                        );
-                        // Continue the search from this path
-                        continue;
-                    }
-                }
-                // No more fallback options or max backtracks reached
-                break;
-            }
-
-            if current_beam.is_empty() {
-                break;
-            }
-
-            let mut next_beam = Vec::new();
-
-            for path in &current_beam {
-                if let Some(leaf_id) = path.leaf {
-                    visited.insert(leaf_id);
-
-                    // Check if this is a leaf node
-                    if tree.is_leaf(leaf_id) {
-                        if path.score >= config.min_score {
-                            result.paths.push(path.clone());
-                        }
-                        result.nodes_visited += 1;
-                        continue;
-                    }
-
-                    // Expand this path (includes resolved cross-references)
-                    let children = tree.children_with_refs(leaf_id);
-
-                    let scored_children = score_candidates_detailed(
-                        tree,
-                        &children,
-                        &context.query,
-                        pilot,
-                        &path.nodes,
-                        &visited,
-                        0.7, // Beam: Pilot weight = 0.7
-                        Some(&cache),
-                        Some(&path.step_reasons),
-                    )
-                    .await;
-
-                    if pilot.is_some() && !children.is_empty() {
-                        pilot_interventions += 1;
-                    }
-
-                    for sc in scored_children.into_iter().take(beam_width) {
-                        let new_path = if let Some(ref reason) = sc.reason {
-                            path.extend_with_reason(sc.node_id, sc.score, reason)
-                        } else {
-                            path.extend(sc.node_id, sc.score)
-                        };
-
-                        let child_node = tree.get(sc.node_id);
-                        result.trace.push(NavigationStep {
-                            node_id: format!("{:?}", sc.node_id),
-                            title: child_node.map(|n| n.title.clone()).unwrap_or_default(),
-                            score: sc.score,
-                            decision: NavigationDecision::GoToChild(
-                                children.iter().position(|&c| c == sc.node_id).unwrap_or(0),
-                            ),
-                            depth: child_node.map(|n| n.depth).unwrap_or(0),
-                        });
-
-                        next_beam.push(new_path);
-                        result.nodes_visited += 1;
-                    }
-                }
-            }
-
-            // Sort next beam and split into beam + fallback
-            next_beam.sort_by(|a, b| {
-                b.score
-                    .partial_cmp(&a.score)
-                    .unwrap_or(std::cmp::Ordering::Equal)
-            });
-
-            // Keep top beam_width in the beam, shelve the rest
-            let mut beam_candidates = next_beam;
-            let overflow: Vec<SearchPath> =
-                beam_candidates.split_off(beam_width.min(beam_candidates.len()));
-
-            for path in overflow {
-                let score = path.score;
-                Self::push_fallback(
-                    &mut fallback_stack,
-                    FallbackEntry { path, score },
-                    config.min_score,
-                    config.fallback_score_ratio,
-                    max_fallback_size,
-                );
-            }
-
-            current_beam = beam_candidates;
-
-            if result.paths.len() >= config.top_k {
-                break;
-            }
-        }
-
-        // Add any remaining paths in the beam to results
-        for path in current_beam {
-            if path.score >= config.min_score && result.paths.len() < config.top_k {
-                result.paths.push(path);
-            }
-        }
-
-        // Fallback: if no results found, add best candidates regardless of score
-        if result.paths.is_empty() && config.min_score > 0.0 {
-            debug!("No results above min_score, adding best candidates as fallback");
-            let all_children = tree.children_with_refs(start_node);
-            let fallback = score_candidates(
-                tree,
-                &all_children,
-                &context.query,
-                None, // No Pilot for fallback
-                &[],
-                &visited,
-                0.7,
-                None,
-                None, // No reasoning history for fallback
-            )
-            .await;
-            for (node_id, score) in fallback.into_iter().take(config.top_k) {
-                result.paths.push(SearchPath::from_node(node_id, score));
-            }
-        }
-
-        // Sort final results by score
-        result.paths.sort_by(|a, b| {
-            b.score
-                .partial_cmp(&a.score)
-                .unwrap_or(std::cmp::Ordering::Equal)
-        });
-        result.paths.truncate(config.top_k);
-
-        result.pilot_interventions = pilot_interventions;
-
-        debug!(
-            "BeamSearch complete: paths={}, iterations={}, backtracks={}, pilot_interventions={}",
-            result.paths.len(),
-            result.iterations,
-            backtrack_count,
-            pilot_interventions
-        );
-
-        result
-    }
-}
-
-impl Default for BeamSearch {
-    fn default() -> Self {
-        Self::new()
-    }
-}
-
-#[async_trait]
-impl SearchTree for BeamSearch {
-    async fn search(
-        &self,
-        tree: &DocumentTree,
-        context: &RetrievalContext,
-        config: &SearchConfig,
-        pilot: Option<&dyn Pilot>,
-    ) -> SearchResult {
-        self.search_impl(tree, context, config, pilot, tree.root())
-            .await
-    }
-
-    async fn search_from(
-        &self,
-        tree: &DocumentTree,
-        context: &RetrievalContext,
-        config: &SearchConfig,
-        pilot: Option<&dyn Pilot>,
-        start_node: NodeId,
-    ) -> SearchResult {
-        self.search_impl(tree, context, config, pilot, start_node)
-            .await
-    }
-
-    fn name(&self) -> &'static str {
-        "beam"
-    }
-}
-
-#[cfg(test)]
-mod tests {
-    use super::*;
-    use crate::document::TreeNode;
-    use indextree::Arena;
-
-    /// Helper to create a NodeId from an Arena for tests.
-    fn make_node_id(arena: &mut Arena<TreeNode>) -> NodeId {
-        NodeId(arena.new_node(TreeNode::default()))
-    }
-
-    #[test]
-    fn test_beam_search_creation() {
-        let search = BeamSearch::new();
-        assert_eq!(search.beam_width, 3);
-
-        let search_wide = BeamSearch::with_width(5);
-        assert_eq!(search_wide.beam_width, 5);
-    }
-
-    #[test]
-    fn test_beam_search_minimum_width() {
-        let search = BeamSearch::with_width(0);
-        assert_eq!(search.beam_width, 1);
-    }
-
-    #[test]
-    fn test_fallback_push_and_pop() {
-        let mut arena = Arena::new();
-        let id0 = make_node_id(&mut arena);
-        let id1 = make_node_id(&mut arena);
-        let id2 = make_node_id(&mut arena);
-        let mut stack = Vec::new();
-
-        BeamSearch::push_fallback(
-            &mut stack,
-            FallbackEntry {
-                path: SearchPath::from_node(id0, 0.3),
-                score: 0.3,
-            },
-            0.1,
-            0.5,
-            100,
-        );
-        BeamSearch::push_fallback(
-            &mut stack,
-            FallbackEntry {
-                path: SearchPath::from_node(id1, 0.7),
-                score: 0.7,
-            },
-            0.1,
-            0.5,
-            100,
-        );
-        BeamSearch::push_fallback(
-            &mut stack,
-            FallbackEntry {
-                path: SearchPath::from_node(id2, 0.5),
-                score: 0.5,
-            },
-            0.1,
-            0.5,
-            100,
-        );
-
-        assert_eq!(stack.len(), 3);
-
-        // Pop should return highest score (0.7)
-        let popped = BeamSearch::pop_fallback(&mut stack);
-        assert!(popped.is_some());
-        assert!((popped.unwrap().score - 0.7).abs() < 0.001);
-
-        // Next pop should return 0.5
-        let popped = BeamSearch::pop_fallback(&mut stack);
-        assert!(popped.is_some());
-        assert!((popped.unwrap().score - 0.5).abs() < 0.001);
-    }
-
-    #[test]
-    fn test_fallback_score_threshold() {
-        let mut arena = Arena::new();
-        let id0 = make_node_id(&mut arena);
-        let id1 = make_node_id(&mut arena);
-        let mut stack = Vec::new();
-
-        // Score 0.01 with threshold 0.1 * 0.5 = 0.05 → should be rejected
-        BeamSearch::push_fallback(
-            &mut stack,
-            FallbackEntry {
-                path: SearchPath::from_node(id0, 0.01),
-                score: 0.01,
-            },
-            0.1,
-            0.5,
-            100,
-        );
-        assert_eq!(stack.len(), 0, "Score below threshold should be rejected");
-
-        // Score 0.06 with threshold 0.05 → should be accepted
-        BeamSearch::push_fallback(
-            &mut stack,
-            FallbackEntry {
-                path: SearchPath::from_node(id1, 0.06),
-                score: 0.06,
-            },
-            0.1,
-            0.5,
-            100,
-        );
-        assert_eq!(stack.len(), 1, "Score above threshold should be accepted");
-    }
-
-    #[test]
-    fn test_fallback_capacity_eviction() {
-        let mut arena = Arena::new();
-        let id0 = make_node_id(&mut arena);
-        let id1 = make_node_id(&mut arena);
-        let id2 = make_node_id(&mut arena);
-        let mut stack = Vec::new();
-
-        // Fill to capacity (max_size=2)
-        BeamSearch::push_fallback(
-            &mut stack,
-            FallbackEntry {
-                path: SearchPath::from_node(id0, 0.3),
-                score: 0.3,
-            },
-            0.1,
-            0.5,
-            2,
-        );
-        BeamSearch::push_fallback(
-            &mut stack,
-            FallbackEntry {
-                path: SearchPath::from_node(id1, 0.5),
-                score: 0.5,
-            },
-            0.1,
-            0.5,
-            2,
-        );
-        assert_eq!(stack.len(), 2);
-
-        // Push a higher-score entry → should evict the lowest (0.3)
-        BeamSearch::push_fallback(
-            &mut stack,
-            FallbackEntry {
-                path: SearchPath::from_node(id2, 0.8),
-                score: 0.8,
-            },
-            0.1,
-            0.5,
-            2,
-        );
-        assert_eq!(stack.len(), 2);
-
-        // Verify the 0.3 entry was evicted
-        let scores: Vec<f32> = stack.iter().map(|e| e.score).collect();
-        assert!(scores.contains(&0.5));
-        assert!(scores.contains(&0.8));
-        assert!(!scores.contains(&0.3));
-    }
-
-    #[test]
-    fn test_fallback_empty_pop() {
-        let mut stack: Vec<FallbackEntry> = Vec::new();
-        assert!(BeamSearch::pop_fallback(&mut stack).is_none());
-    }
-
-    #[test]
-    fn test_search_config_backtrack_defaults() {
-        let config = SearchConfig::default();
-        assert_eq!(config.max_backtracks, 3);
-        assert!((config.fallback_score_ratio - 0.5).abs() < 0.001);
-    }
-}
diff --git a/rust/src/retrieval/search/greedy.rs b/rust/src/retrieval/search/greedy.rs
deleted file mode 100644
index 506509b6..00000000
--- a/rust/src/retrieval/search/greedy.rs
+++ /dev/null
@@ -1,185 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Pure Pilot search — LLM-guided single-path tree navigation.
-//!
-//! At each layer, the Pilot scores all children and picks the top-1.
-//! This is the most accurate (but slowest) approach: one LLM call per layer.
-//! Falls back to NodeScorer when Pilot is unavailable.
-
-use async_trait::async_trait;
-use std::collections::HashSet;
-use tracing::debug;
-
-use super::super::RetrievalContext;
-use super::super::types::{NavigationDecision, NavigationStep, SearchPath};
-use super::{SearchConfig, SearchResult, SearchTree};
-use crate::document::{DocumentTree, NodeId};
-use crate::retrieval::pilot::Pilot;
-use crate::retrieval::pilot::{PilotDecisionCache, score_candidates};
-
-/// Pure Pilot search — Pilot picks the best child at each layer.
-///
-/// beam=1: at each level, Pilot evaluates all children and the search
-/// follows only the top-ranked one. When Pilot is unavailable,
-/// falls back to NodeScorer (keyword/BM25).
-pub struct PurePilotSearch;
-
-impl PurePilotSearch {
-    /// Create a new Pure Pilot search.
-    pub fn new() -> Self {
-        Self
-    }
-
-    /// Core search logic parameterized by start node.
-    async fn search_impl(
-        &self,
-        tree: &DocumentTree,
-        context: &RetrievalContext,
-        config: &SearchConfig,
-        pilot: Option<&dyn Pilot>,
-        start_node: NodeId,
-    ) -> SearchResult {
-        let mut result = SearchResult::default();
-        let mut current_path = SearchPath::new();
-        let mut current_node = start_node;
-        let mut visited: HashSet<NodeId> = HashSet::new();
-        let cache = PilotDecisionCache::new();
-
-        debug!(
-            "PurePilotSearch: query='{}', start_node={:?}, max_iterations={}, min_score={:.2}",
-            context.query, start_node, config.max_iterations, config.min_score
-        );
-
-        let mut pilot_interventions = 0;
-
-        for iteration in 0..config.max_iterations {
-            result.iterations = iteration + 1;
-
-            let children = tree.children_with_refs(current_node);
-
-            if children.is_empty() {
-                current_path.leaf = Some(current_node);
-                if !config.leaf_only || tree.is_leaf(current_node) {
-                    result.paths.push(current_path.clone());
-                }
-                break;
-            }
-
-            // Pilot as primary scorer (weight=1.0), NodeScorer as fallback.
-            // Always consult Pilot — no should_intervene guard.
-            let scored_children = score_candidates(
-                tree,
-                &children,
-                &context.query,
-                pilot,
-                &current_path.nodes,
-                &visited,
-                1.0, // PurePilot: Pilot weight = 1.0
-                Some(&cache),
-                None, // No reasoning history tracked
-            )
-            .await;
-
-            if pilot.is_some() {
-                pilot_interventions += 1;
-            }
-
-            // Take only top-1
-            let mut best_child = None;
-            let mut best_score = 0.0;
-
-            for (child_id, score) in scored_children {
-                if score >= config.min_score {
-                    best_child = Some(child_id);
-                    best_score = score;
-                    break;
-                }
-            }
-
-            if let Some(child_id) = best_child {
-                visited.insert(child_id);
-
-                let child_node = tree.get(child_id);
-                result.trace.push(NavigationStep {
-                    node_id: format!("{:?}", child_id),
-                    title: child_node.map(|n| n.title.clone()).unwrap_or_default(),
-                    score: best_score,
-                    decision: NavigationDecision::GoToChild(
-                        children.iter().position(|&c| c == child_id).unwrap_or(0),
-                    ),
-                    depth: child_node.map(|n| n.depth).unwrap_or(0),
-                });
-
-                current_path = current_path.extend(child_id, best_score);
-                current_node = child_id;
-                result.nodes_visited += 1;
-
-                if result.paths.len() >= config.top_k {
-                    break;
-                }
-            } else {
-                current_path.leaf = Some(current_node);
-                if current_path.score > 0.0 {
-                    result.paths.push(current_path);
-                }
-                break;
-            }
-        }
-
-        result.pilot_interventions = pilot_interventions;
-        result
-    }
-}
-
-impl Default for PurePilotSearch {
-    fn default() -> Self {
-        Self::new()
-    }
-}
-
-#[async_trait]
-impl SearchTree for PurePilotSearch {
-    async fn search(
-        &self,
-        tree: &DocumentTree,
-        context: &RetrievalContext,
-        config: &SearchConfig,
-        pilot: Option<&dyn Pilot>,
-    ) -> SearchResult {
-        self.search_impl(tree, context, config, pilot, tree.root())
-            .await
-    }
-
-    async fn search_from(
-        &self,
-        tree: &DocumentTree,
-        context: &RetrievalContext,
-        config: &SearchConfig,
-        pilot: Option<&dyn Pilot>,
-        start_node: NodeId,
-    ) -> SearchResult {
-        self.search_impl(tree, context, config, pilot, start_node)
-            .await
-    }
-
-    fn name(&self) -> &'static str {
-        "pure_pilot"
-    }
-}
-
-#[cfg(test)]
-mod tests {
-    use super::*;
-
-    #[test]
-    fn test_pure_pilot_creation() {
-        let _search = PurePilotSearch::new();
-    }
-
-    #[test]
-    fn test_pure_pilot_default() {
-        let search = PurePilotSearch::default();
-        assert_eq!(search.name(), "pure_pilot");
-    }
-}
diff --git a/rust/src/retrieval/search/mcts.rs b/rust/src/retrieval/search/mcts.rs
deleted file mode 100644
index 9f17cccc..00000000
--- a/rust/src/retrieval/search/mcts.rs
+++ /dev/null
@@ -1,413 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Monte Carlo Tree Search (MCTS) with Pilot-provided priors.
-//!
-//! Uses UCT (Upper Confidence Bound for Trees) to balance exploration
-//! and exploitation. Pilot provides prior scores for the UCT formula,
-//! and guides the simulation (rollout) phase. NodeScorer is the fallback
-//! when Pilot is unavailable.
-//!
-//! # Async
-//!
-//! Both selection and simulation phases are async because Pilot.decide()
-//! requires an LLM call. Pilot decisions are cached by (query, parent_node_id)
-//! so repeated visits to the same node don't trigger redundant LLM calls.
-
-use async_trait::async_trait;
-use std::collections::{HashMap, HashSet};
-use tracing::debug;
-
-use super::super::RetrievalContext;
-use super::super::types::{NavigationDecision, NavigationStep, SearchPath};
-use super::{SearchConfig, SearchResult, SearchTree};
-use crate::document::{DocumentTree, NodeId};
-use crate::retrieval::pilot::Pilot;
-use crate::retrieval::pilot::{NodeScorer, PilotDecisionCache, ScoringContext, score_candidates};
-
-/// Statistics for a node in MCTS.
-#[derive(Debug, Clone, Default)]
-struct NodeStats {
-    /// Number of visits.
-    visits: usize,
-    /// Cumulative score.
-    total_score: f32,
-}
-
-/// MCTS search with Pilot integration.
-///
-/// Pilot provides prior scores that seed the UCT formula. This gives
-/// MCTS semantic guidance while preserving the exploration/exploitation
-/// balance. NodeScorer is used as fallback when Pilot is unavailable.
-pub struct MctsSearch {
-    /// Exploration constant for UCT.
-    exploration_weight: f32,
-}
-
-impl MctsSearch {
-    /// Create a new MCTS search.
-    pub fn new() -> Self {
-        Self {
-            exploration_weight: 1.414, // sqrt(2), classic UCT default
-        }
-    }
-
-    /// Set exploration weight.
-    pub fn with_exploration(mut self, weight: f32) -> Self {
-        self.exploration_weight = weight;
-        self
-    }
-
-    /// Calculate UCT score for a child node.
-    ///
-    /// `prior_score` comes from Pilot (or NodeScorer fallback).
-    fn uct_score(&self, child_stats: &NodeStats, parent_visits: usize, prior_score: f32) -> f32 {
-        if child_stats.visits == 0 {
-            // Unvisited nodes get high priority + prior bonus
-            return f32::INFINITY;
-        }
-
-        let exploitation = child_stats.total_score / child_stats.visits as f32;
-        let exploration = self.exploration_weight * (parent_visits as f32).ln().sqrt()
-            / child_stats.visits as f32;
-
-        // Blend exploitation with Pilot prior
-        0.5 * (exploitation + prior_score) + exploration
-    }
-
-    /// Select best child using UCT with Pilot priors.
-    ///
-    /// When Pilot is available, fetches priors via the cache.
-    /// Falls back to NodeScorer when Pilot is unavailable.
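-    ///
-    /// As implemented above, UCT per visited child is
-    /// `0.5 * (mean_score + prior) + c * sqrt(ln(parent_visits)) / visits`,
-    /// with unvisited children scored as infinity so each is tried once.
-    /// A worked example: with c = 1.414, parent_visits = 8, visits = 2,
-    /// mean = 0.6, prior = 0.5, the score is
-    /// 0.5 * 1.1 + 1.414 * sqrt(ln 8) / 2 ≈ 0.55 + 1.02 ≈ 1.57.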
-    /// Select best child using UCT with Pilot priors.
-    ///
-    /// When Pilot is available, fetches priors via the cache.
-    /// Falls back to NodeScorer when Pilot is unavailable.
-    async fn select_child(
-        &self,
-        tree: &DocumentTree,
-        context: &RetrievalContext,
-        node_id: NodeId,
-        stats: &HashMap<NodeId, NodeStats>,
-        pilot: Option<&dyn Pilot>,
-        cache: &PilotDecisionCache,
-        visited: &HashSet<NodeId>,
-    ) -> Option<(NodeId, f32)> {
-        let children = tree.children_with_refs(node_id);
-        if children.is_empty() {
-            return None;
-        }
-
-        let parent_stats = stats.get(&node_id).cloned().unwrap_or_default();
-        let parent_visits = parent_stats.visits.max(1);
-
-        // Get Pilot priors for all children (cached)
-        let priors = score_candidates(
-            tree,
-            &children,
-            &context.query,
-            pilot,
-            &[node_id], // simplified path for UCT context
-            visited,
-            0.5, // MCTS prior: balanced Pilot/Scorer
-            Some(cache),
-            None, // No reasoning history tracked
-        )
-        .await;
-
-        // Build prior map
-        let prior_map: HashMap<NodeId, f32> = priors.into_iter().collect();
-
-        let mut best_child = None;
-        let mut best_score = f32::NEG_INFINITY;
-
-        for &child_id in &children {
-            let prior = prior_map.get(&child_id).copied().unwrap_or_else(|| {
-                let scorer = NodeScorer::new(ScoringContext::new(&context.query));
-                scorer.score(tree, child_id)
-            });
-            let child_stats = stats.get(&child_id).cloned().unwrap_or_default();
-            let uct = self.uct_score(&child_stats, parent_visits, prior);
-
-            if uct > best_score {
-                best_score = uct;
-                best_child = Some((child_id, prior));
-            }
-        }
-
-        best_child
-    }
-
-    /// Simulate a rollout from a node using Pilot-guided greedy descent.
-    ///
-    /// When Pilot is available, each layer picks the top-1 Pilot-scored child.
-    /// Falls back to NodeScorer when Pilot is unavailable.
-    async fn simulate(
-        &self,
-        tree: &DocumentTree,
-        context: &RetrievalContext,
-        node_id: NodeId,
-        max_depth: usize,
-        pilot: Option<&dyn Pilot>,
-        cache: &PilotDecisionCache,
-        visited: &HashSet<NodeId>,
-    ) -> f32 {
-        let mut current = node_id;
-        let mut depth = 0;
-        let mut path = vec![node_id];
-        let mut total_score = 0.0f32;
-        let mut count = 0;
-
-        // Initial score
-        let scorer = NodeScorer::new(ScoringContext::new(&context.query));
-        total_score += scorer.score(tree, current);
-        count += 1;
-
-        while depth < max_depth {
-            let children = tree.children_with_refs(current);
-            if children.is_empty() {
-                break;
-            }
-
-            // Use Pilot for greedy descent (cached)
-            let scored = score_candidates(
-                tree,
-                &children,
-                &context.query,
-                pilot,
-                &path,
-                visited,
-                0.5, // MCTS simulation: balanced
-                Some(cache),
-                None, // No reasoning history tracked
-            )
-            .await;
-
-            if let Some(&(child_id, score)) = scored.first() {
-                total_score += score;
-                path.push(child_id);
-                current = child_id;
-            } else {
-                break;
-            }
-            depth += 1;
-            count += 1;
-        }
-
-        total_score / count.max(1) as f32
-    }
-
-    /// Backpropagate score up the tree.
-    fn backpropagate(&self, stats: &mut HashMap<NodeId, NodeStats>, path: &[NodeId], score: f32) {
-        for &node_id in path {
-            let node_stats = stats.entry(node_id).or_default();
-            node_stats.visits += 1;
-            node_stats.total_score += score;
-        }
-    }
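The backpropagation bookkeeping is simple enough to verify by hand; a runnable miniature follows, with `u32` standing in for `NodeId`.

```rust
use std::collections::HashMap;

#[derive(Default, Clone)]
struct NodeStats {
    visits: usize,
    total_score: f32,
}

// Same accumulation as MctsSearch::backpropagate above.
fn backpropagate(stats: &mut HashMap<u32, NodeStats>, path: &[u32], score: f32) {
    for &node in path {
        let s = stats.entry(node).or_default();
        s.visits += 1;
        s.total_score += score;
    }
}

fn main() {
    let mut stats = HashMap::new();
    // Two rollouts through root(0) → child(3), one through root(0) → child(7):
    backpropagate(&mut stats, &[0, 3], 0.9);
    backpropagate(&mut stats, &[0, 3], 0.5);
    backpropagate(&mut stats, &[0, 7], 0.2);
    // Exploitation average for child 3: 1.4 / 2 = 0.7; for child 7: 0.2 / 1 = 0.2.
    assert_eq!(stats[&3].visits, 2);
    assert!((stats[&3].total_score - 1.4).abs() < 1e-6);
}
```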
-    /// Core MCTS logic parameterized by start node.
-    async fn search_impl(
-        &self,
-        tree: &DocumentTree,
-        context: &RetrievalContext,
-        config: &SearchConfig,
-        pilot: Option<&dyn Pilot>,
-        start_node: NodeId,
-    ) -> SearchResult {
-        let mut result = SearchResult::default();
-        let mut stats: HashMap<NodeId, NodeStats> = HashMap::new();
-        let cache = PilotDecisionCache::new();
-        let visited: HashSet<NodeId> = HashSet::new();
-
-        // Initialize root stats
-        stats.insert(start_node, NodeStats::default());
-
-        debug!(
-            "MctsSearch: query='{}', start_node={:?}, max_iterations={}, exploration={:.2}",
-            context.query, start_node, config.max_iterations, self.exploration_weight
-        );
-
-        let mut pilot_interventions = 0;
-
-        for iteration in 0..config.max_iterations {
-            result.iterations = iteration + 1;
-
-            // === Selection phase: traverse tree using UCT ===
-            let mut path = vec![start_node];
-            let mut current = start_node;
-
-            while !tree.is_leaf(current) {
-                if let Some((child_id, _score)) = self
-                    .select_child(tree, context, current, &stats, pilot, &cache, &visited)
-                    .await
-                {
-                    path.push(child_id);
-                    current = child_id;
-                    if pilot.is_some() {
-                        pilot_interventions += 1;
-                    }
-                } else {
-                    break;
-                }
-            }
-
-            result.nodes_visited += path.len();
-
-            // === Simulation phase: Pilot-guided rollout ===
-            let leaf = *path.last().unwrap_or(&start_node);
-            let sim_score = self
-                .simulate(tree, context, leaf, 5, pilot, &cache, &visited)
-                .await;
-
-            if pilot.is_some() {
-                pilot_interventions += 1;
-            }
-
-            // === Backpropagation phase ===
-            self.backpropagate(&mut stats, &path, sim_score);
-
-            // Record trace for the last node
-            if let Some(&last_id) = path.last() {
-                let node = tree.get(last_id);
-                result.trace.push(NavigationStep {
-                    node_id: format!("{:?}", last_id),
-                    title: node.map(|n| n.title.clone()).unwrap_or_default(),
-                    score: sim_score,
-                    decision: NavigationDecision::ExploreMore,
-                    depth: node.map(|n| n.depth).unwrap_or(0),
-                });
-            }
-
-            // Periodically extract paths (every 10 iterations)
-            if iteration > 0 && iteration % 10 == 0 {
-                self.extract_paths(
-                    tree,
-                    start_node,
-                    &stats,
-                    config.min_score,
-                    config.top_k,
-                    &mut result,
-                );
-            }
-        }
-
-        // Final extraction of best paths
-        self.extract_paths(
-            tree,
-            start_node,
-            &stats,
-            config.min_score,
-            config.top_k,
-            &mut result,
-        );
-
-        result.pilot_interventions = pilot_interventions;
-        result
-    }
-    /// Extract best paths from MCTS statistics.
-    fn extract_paths(
-        &self,
-        tree: &DocumentTree,
-        root: NodeId,
-        stats: &HashMap<NodeId, NodeStats>,
-        min_score: f32,
-        top_k: usize,
-        result: &mut SearchResult,
-    ) {
-        let root_children = tree.children_with_refs(root);
-        let mut scored_children: Vec<_> = root_children
-            .iter()
-            .filter_map(|&child_id| {
-                stats.get(&child_id).map(|s| {
-                    let avg_score = if s.visits > 0 {
-                        s.total_score / s.visits as f32
-                    } else {
-                        0.0
-                    };
-                    (child_id, avg_score)
-                })
-            })
-            .collect();
-
-        scored_children.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
-
-        // Clear existing paths and re-extract
-        result.paths = scored_children
-            .into_iter()
-            .filter(|(_, score)| *score >= min_score)
-            .take(top_k)
-            .map(|(node_id, score)| SearchPath::from_node(node_id, score))
-            .collect();
-    }
-}
-
-impl Default for MctsSearch {
-    fn default() -> Self {
-        Self::new()
-    }
-}
-
-#[async_trait]
-impl SearchTree for MctsSearch {
-    async fn search(
-        &self,
-        tree: &DocumentTree,
-        context: &RetrievalContext,
-        config: &SearchConfig,
-        pilot: Option<&dyn Pilot>,
-    ) -> SearchResult {
-        self.search_impl(tree, context, config, pilot, tree.root())
-            .await
-    }
-
-    async fn search_from(
-        &self,
-        tree: &DocumentTree,
-        context: &RetrievalContext,
-        config: &SearchConfig,
-        pilot: Option<&dyn Pilot>,
-        start_node: NodeId,
-    ) -> SearchResult {
-        self.search_impl(tree, context, config, pilot, start_node)
-            .await
-    }
-
-    fn name(&self) -> &'static str {
-        "mcts"
-    }
-}
-
-#[cfg(test)]
-mod tests {
-    use super::*;
-
-    #[test]
-    fn test_mcts_creation() {
-        let search = MctsSearch::new();
-        assert!((search.exploration_weight - 1.414).abs() < 0.01);
-    }
-
-    #[test]
-    fn test_mcts_custom_exploration() {
-        let search = MctsSearch::new().with_exploration(2.0);
-        assert!((search.exploration_weight - 2.0).abs() < 0.01);
-    }
-
-    #[test]
-    fn test_uct_unvisited() {
-        let search = MctsSearch::new();
-        let stats = NodeStats::default();
-        let score = search.uct_score(&stats, 10, 0.5);
-        assert!(score.is_infinite());
-    }
-
-    #[test]
-    fn test_uct_visited() {
-        let search = MctsSearch::new();
-        let stats = NodeStats {
-            visits: 5,
-            total_score: 3.0,
-        };
-        let score = search.uct_score(&stats, 20, 0.8);
-        assert!(score.is_finite());
-        assert!(score > 0.0);
-    }
-}
diff --git a/rust/src/retrieval/search/mod.rs b/rust/src/retrieval/search/mod.rs
deleted file mode 100644
index f2111625..00000000
--- a/rust/src/retrieval/search/mod.rs
+++ /dev/null
@@ -1,20 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Search algorithms for tree traversal.
-//!
-//! This module contains only tree traversal strategies (Beam, MCTS, Greedy).
-//! All scoring intelligence lives in the `pilot` module.
-//! BM25 and keyword utilities live in the `scoring` module.
-
-mod beam;
-mod greedy;
-mod mcts;
-mod toc_navigator;
-mod r#trait;
-
-pub use beam::BeamSearch;
-pub use greedy::PurePilotSearch;
-pub use mcts::MctsSearch;
-pub use toc_navigator::{SearchCue, ToCNavigator};
-pub use r#trait::{SearchConfig, SearchResult, SearchTree};
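Reviewer aside: a rough sketch of how these re-exports compose from a caller's perspective. Constructing a `DocumentTree` and `RetrievalContext` is out of scope here, and `pick_and_run` is a hypothetical helper, not part of the crate.

```rust
use crate::retrieval::search::{MctsSearch, PurePilotSearch, SearchConfig, SearchTree};

async fn pick_and_run(
    complex: bool,
    tree: &crate::document::DocumentTree,
    ctx: &crate::retrieval::RetrievalContext,
) -> crate::retrieval::search::SearchResult {
    // Every strategy implements SearchTree, so callers can swap them freely.
    let algo: Box<dyn SearchTree> = if complex {
        Box::new(MctsSearch::new().with_exploration(2.0))
    } else {
        Box::new(PurePilotSearch::new())
    };
    let config = SearchConfig {
        top_k: 3,
        ..SearchConfig::default()
    };
    algo.search(tree, ctx, &config, None).await // None: NodeScorer fallback
}
```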
diff --git a/rust/src/retrieval/search/toc_navigator.rs b/rust/src/retrieval/search/toc_navigator.rs
deleted file mode 100644
index 77be5243..00000000
--- a/rust/src/retrieval/search/toc_navigator.rs
+++ /dev/null
@@ -1,470 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Hierarchical ToC-based node locator.
-//!
-//! Replaces the monolithic `build_toc_for_llm` with a two-phase approach:
-//!
-//! - Phase A: BM25 scoring on top-level nodes for fast filtering
-//! - Phase B: Optional LLM refinement when top scores are below a threshold
-
-use std::sync::Arc;
-
-use serde::Deserialize;
-use tracing::{debug, info, warn};
-
-use crate::document::DocumentTree;
-use crate::document::NodeId;
-use crate::llm::LlmClient;
-use crate::llm::memo::MemoStore;
-use crate::retrieval::pilot::NodeScorer;
-
-/// A navigation cue produced by the ToCNavigator.
-#[derive(Debug, Clone)]
-pub struct SearchCue {
-    /// The node to start searching from.
-    pub root: NodeId,
-    /// Confidence score from the locate phase (0.0 - 1.0).
-    pub confidence: f32,
-}
-
-/// Hierarchical ToC navigator — locates relevant subtrees before tree traversal.
-pub struct ToCNavigator {
-    /// Optional LLM client for Phase B refinement.
-    llm_client: Option<LlmClient>,
-    /// Optional memo store for caching locate results.
-    memo_store: Option<Arc<MemoStore>>,
-    /// Maximum number of top branches to return.
-    max_branches: usize,
-    /// Score threshold below which LLM refinement is attempted.
-    llm_threshold: f32,
-}
-
-impl Default for ToCNavigator {
-    fn default() -> Self {
-        Self::new()
-    }
-}
-
-impl ToCNavigator {
-    /// Create a new ToCNavigator with defaults.
-    pub fn new() -> Self {
-        Self {
-            llm_client: None,
-            memo_store: None,
-            max_branches: 3,
-            llm_threshold: 0.6,
-        }
-    }
-
-    /// Set the LLM client for Phase B refinement.
-    pub fn with_llm_client(mut self, client: LlmClient) -> Self {
-        self.llm_client = Some(client);
-        self
-    }
-
-    /// Set the memo store for caching results.
-    pub fn with_memo_store(mut self, store: Arc<MemoStore>) -> Self {
-        self.memo_store = Some(store);
-        self
-    }
-
-    /// Set the maximum number of branches to return.
-    pub fn with_max_branches(mut self, n: usize) -> Self {
-        self.max_branches = n.max(1);
-        self
-    }
-
-    /// Locate relevant subtrees for the given query.
-    ///
-    /// Phase A: Score top-level nodes with BM25 and keep the top-N.
-    /// Phase B: If the best BM25 score is below `llm_threshold` and an LLM
-    /// client is available, ask the LLM to refine the selection.
-    /// Phase C: If BM25 produced no results and LLM is unavailable, fall back
-    /// to keyword-overlap matching against section summaries.
-    pub async fn locate(
-        &self,
-        query: &str,
-        tree: &DocumentTree,
-        top_level_nodes: &[NodeId],
-    ) -> Vec<SearchCue> {
-        if top_level_nodes.is_empty() {
-            return vec![SearchCue {
-                root: tree.root(),
-                confidence: 0.5,
-            }];
-        }
-
-        // Phase A: BM25 scoring
-        let scorer = NodeScorer::for_query(query);
-        let scored: Vec<(NodeId, f32)> = top_level_nodes
-            .iter()
-            .map(|&id| (id, scorer.score(tree, id)))
-            .filter(|(_, s)| *s > 0.05)
-            .collect();
-
-        let top_branches = take_top_n(scored, self.max_branches);
-
-        debug!(
-            "ToCNavigator Phase A: {} top-level nodes scored, {} kept after filter",
-            top_level_nodes.len(),
-            top_branches.len()
-        );
-
-        // Phase B: LLM refinement (only when best score is below threshold)
-        if let Some(ref client) = self.llm_client {
-            let best_score = top_branches.first().map(|(_, s)| *s).unwrap_or(0.0);
-            if best_score < self.llm_threshold {
-                info!(
-                    "Top BM25 score {:.3} below threshold {:.3}, attempting LLM refinement",
-                    best_score, self.llm_threshold
-                );
-                return self.llm_refine(query, tree, top_level_nodes, client).await;
-            }
-        }
-
-        if !top_branches.is_empty() {
-            // Return BM25 results as cues
-            return top_branches
-                .into_iter()
-                .map(|(node_id, score)| SearchCue {
-                    root: node_id,
-                    confidence: score,
-                })
-                .collect();
-        }
-
-        // Phase C: BM25 produced nothing — try keyword overlap on summaries.
-        // This handles abstract queries like "What is this project about?"
-        // where the query keywords don't appear in section titles but the
-        // summaries contain relevant semantic matches.
-        let summary_cues = self.match_by_summary(query, tree, top_level_nodes);
-        if !summary_cues.is_empty() {
-            return summary_cues;
-        }
-
-        // Final fallback: search from root
-        debug!("ToCNavigator: no branches above threshold, falling back to root");
-        vec![SearchCue {
-            root: tree.root(),
-            confidence: 0.5,
-        }]
-    }
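A hypothetical caller of the locate phase might look like the sketch below; the confidence gate is illustrative and not part of the deleted API.

```rust
async fn find_start_points(
    query: &str,
    tree: &crate::document::DocumentTree,
    top_level: &[crate::document::NodeId],
) -> Vec<crate::document::NodeId> {
    // No LLM client or memo store configured: only Phase A and the fallbacks run.
    let navigator = ToCNavigator::new().with_max_branches(2);
    navigator
        .locate(query, tree, top_level)
        .await
        .into_iter()
        .filter(|cue| cue.confidence >= 0.3) // illustrative caller-side gate
        .map(|cue| cue.root)
        .collect()
}
```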
-    /// Match query against section summaries using keyword overlap.
-    ///
-    /// This is a lightweight fallback for abstract queries where BM25
-    /// fails because query terms don't appear verbatim in section titles
-    /// or short content snippets.
-    ///
-    /// For overview-style queries (e.g. "What is this project about?"),
-    /// if no keywords match any section, returns all top-level sections
-    /// with the overview/introduction section boosted.
-    fn match_by_summary(
-        &self,
-        query: &str,
-        tree: &DocumentTree,
-        top_level_nodes: &[NodeId],
-    ) -> Vec<SearchCue> {
-        let query_lower = query.to_lowercase();
-        let query_words: Vec<&str> = query_lower
-            .split_whitespace()
-            .filter(|w| w.len() > 2)
-            .collect();
-
-        let is_overview = Self::is_overview_query(query);
-
-        if query_words.is_empty() && !is_overview {
-            return Vec::new();
-        }
-
-        let mut scored: Vec<(NodeId, f32)> = Vec::new();
-
-        for &node_id in top_level_nodes {
-            if let Some(node) = tree.get(node_id) {
-                let text =
-                    format!("{} {} {}", node.title, node.summary, node.content).to_lowercase();
-
-                let match_count = query_words.iter().filter(|w| text.contains(*w)).count();
-
-                let mut score = if query_words.is_empty() {
-                    0.0
-                } else {
-                    match_count as f32 / query_words.len() as f32
-                };
-
-                // For overview queries, also check if the section title/summary
-                // contains overview-like terms
-                if is_overview {
-                    let title_lower = node.title.to_lowercase();
-                    let summary_lower = node.summary.to_lowercase();
-                    let looks_like_overview = title_lower.contains("overview")
-                        || title_lower.contains("introduction")
-                        || title_lower.contains("summary")
-                        || title_lower.contains("简介")
-                        || title_lower.contains("概述")
-                        || summary_lower.contains("overview")
-                        || summary_lower.contains("introduction");
-
-                    if looks_like_overview {
-                        score = (score + 0.5).min(1.0);
-                    }
-                }
-
-                if score > 0.1 {
-                    scored.push((node_id, score));
-                }
-            }
-        }
-
-        // For overview queries with no matches at all, return the first
-        // section as a reasonable default (it's usually the introduction).
-        if scored.is_empty() && is_overview {
-            if let Some(&first_id) = top_level_nodes.first() {
-                info!(
-                    "ToCNavigator: overview query with no keyword matches, using first section as default"
-                );
-                return vec![SearchCue {
-                    root: first_id,
-                    confidence: 0.6,
-                }];
-            }
-            return Vec::new();
-        }
-
-        scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
-        scored.truncate(self.max_branches);
-
-        if !scored.is_empty() {
-            info!(
-                "ToCNavigator summary match: {} cues from {} nodes",
-                scored.len(),
-                top_level_nodes.len()
-            );
-        }
-
-        scored
-            .into_iter()
-            .map(|(node_id, score)| SearchCue {
-                root: node_id,
-                confidence: score,
-            })
-            .collect()
-    }
-
-    /// Check if a query is asking for a general overview or summary of a document.
-    fn is_overview_query(query: &str) -> bool {
-        let lower = query.to_lowercase();
-
-        let patterns = [
-            "about",
-            "overview",
-            "summary",
-            "introduction",
-            "describe",
-            "what is this",
-            "tell me about",
-            "main idea",
-            "key points",
-            "purpose",
-        ];
-
-        patterns.iter().any(|p| lower.contains(p))
-    }
-
-    /// Phase B: Ask the LLM to pick the most relevant subtrees.
-    ///
-    /// Presents the full top-level TOC to the LLM and lets it select the
-    /// most relevant entries. Uses direct tree traversal so that we can
-    /// correctly map LLM-selected indices back to real NodeIds.
-    async fn llm_refine(
-        &self,
-        query: &str,
-        tree: &DocumentTree,
-        top_level_nodes: &[NodeId],
-        client: &LlmClient,
-    ) -> Vec<SearchCue> {
-        // Collect (title, summary) and the corresponding NodeId directly
-        // from the tree, maintaining index correspondence for LLM response mapping.
-        let mut toc_entries: Vec<(String, Option<String>)> = Vec::new();
-        let mut node_ids: Vec<NodeId> = Vec::new();
-
-        for &node_id in top_level_nodes {
-            collect_tree_entries(tree, node_id, &mut toc_entries, &mut node_ids, 0, 2);
-        }
-
-        if node_ids.is_empty() {
-            warn!("LLM refinement: no nodes collected from top-level branches");
-            return vec![SearchCue {
-                root: tree.root(),
-                confidence: 0.5,
-            }];
-        }
-
-        let toc_str = toc_entries
-            .iter()
-            .enumerate()
-            .map(|(i, (title, summary))| {
-                format!(
-                    "[{}] Title: \"{}\"\n    Summary: \"{}\"",
-                    i + 1,
-                    title,
-                    summary.as_deref().unwrap_or("(no summary)")
-                )
-            })
-            .collect::<Vec<_>>()
-            .join("\n\n");
-
-        let system_prompt = r#"You are a document navigation assistant. Select the most relevant sections for the user's query.
-
-CRITICAL INSTRUCTIONS:
-1. Analyze the user query carefully
-2. Examine the provided Table of Contents entries
-3. Select the TOP 3 most relevant entries
-4. Respond with ONLY valid JSON (no markdown code blocks)
-
-Response format:
-{
-  "reasoning": "Brief analysis",
-  "candidates": [
-    {"node_id": 1, "relevance_score": 0.95, "reason": "Why relevant"},
-    {"node_id": 2, "relevance_score": 0.80, "reason": "Why relevant"},
-    {"node_id": 3, "relevance_score": 0.65, "reason": "Why relevant"}
-  ]
-}
-
-Rules:
-- node_id: MUST be a number from the TOC entries (the number in [N] brackets)
-- relevance_score: 0.0 to 1.0
-- candidates: exactly 3 items, ordered by relevance"#;
-
-        let user_prompt = format!(
-            "USER QUERY: {}\n\nDOCUMENT TOC ({} entries):\n{}\n\nSelect the TOP 3 most relevant entries. Respond with ONLY the JSON object:",
-            query,
-            node_ids.len(),
-            toc_str
-        );
-
-        match client
-            .complete_json::<LocateResponse>(system_prompt, &user_prompt)
-            .await
-        {
-            Ok(llm_response) => {
-                let mut cues = Vec::new();
-                for candidate in &llm_response.candidates {
-                    let idx = candidate.node_id.saturating_sub(1);
-                    if idx < node_ids.len() {
-                        cues.push(SearchCue {
-                            root: node_ids[idx],
-                            confidence: candidate.relevance_score,
-                        });
-                    }
-                }
-
-                if cues.is_empty() {
-                    warn!(
-                        "LLM refinement returned no valid candidates, falling back to summary matching"
-                    );
-                    let summary_cues = self.match_by_summary(query, tree, top_level_nodes);
-                    if summary_cues.is_empty() {
-                        return vec![SearchCue {
-                            root: tree.root(),
-                            confidence: 0.5,
-                        }];
-                    }
-                    return summary_cues;
-                }
-
-                info!(
-                    "LLM refinement selected {} cues (reasoning: {})",
-                    cues.len(),
-                    &llm_response.reasoning[..llm_response.reasoning.len().min(100)]
-                );
-                cues
-            }
-            Err(e) => {
-                warn!(
-                    "LLM refinement failed: {}, falling back to summary matching",
-                    e
-                );
-                // Don't fall directly to root — try summary matching first
-                let summary_cues = self.match_by_summary(query, tree, top_level_nodes);
-                if summary_cues.is_empty() {
-                    vec![SearchCue {
-                        root: tree.root(),
-                        confidence: 0.5,
-                    }]
-                } else {
-                    summary_cues
-                }
-            }
-        }
-    }
-}
-
-/// Take the top-N scored items, sorted descending by score.
-fn take_top_n(scored: Vec<(NodeId, f32)>, n: usize) -> Vec<(NodeId, f32)> {
-    let mut sorted = scored;
-    sorted.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
-    sorted.truncate(n);
-    sorted
-}
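The 1-based `[N]` indices in the prompt map back through `node_ids`; here is a miniature of that mapping, with `u32` standing in for `NodeId`.

```rust
fn map_candidate(llm_node_id: usize, node_ids: &[u32]) -> Option<u32> {
    let idx = llm_node_id.saturating_sub(1); // prompt shows [1]..[N]; the vec is 0-based
    node_ids.get(idx).copied() // out-of-range answers are silently dropped
}

fn main() {
    let node_ids = vec![10, 11, 12];
    assert_eq!(map_candidate(1, &node_ids), Some(10));
    assert_eq!(map_candidate(0, &node_ids), Some(10)); // saturating_sub guards a stray 0
    assert_eq!(map_candidate(9, &node_ids), None);
}
```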
-/// Collect tree entries (title, summary) alongside their NodeIds.
-///
-/// Walks the subtree rooted at `node_id` up to `max_depth` levels deep.
-/// The `toc_entries[i]` ↔ `node_ids[i]` correspondence is guaranteed,
-/// so LLM response indices can be mapped back to real NodeIds.
-fn collect_tree_entries(
-    tree: &DocumentTree,
-    node_id: NodeId,
-    entries: &mut Vec<(String, Option<String>)>,
-    node_ids: &mut Vec<NodeId>,
-    depth: usize,
-    max_depth: usize,
-) {
-    if depth > max_depth {
-        return;
-    }
-    if let Some(node) = tree.get(node_id) {
-        let summary = if node.summary.is_empty() {
-            None
-        } else {
-            Some(node.summary.clone())
-        };
-        entries.push((node.title.clone(), summary));
-        node_ids.push(node_id);
-
-        for child_id in tree.children_with_refs(node_id) {
-            collect_tree_entries(tree, child_id, entries, node_ids, depth + 1, max_depth);
-        }
-    }
-}
-
-/// LLM response for locate query.
-#[derive(Debug, Clone, Deserialize)]
-struct LocateResponse {
-    reasoning: String,
-    candidates: Vec<LocateCandidate>,
-}
-
-/// A candidate from LLM locate response.
-#[derive(Debug, Clone, Deserialize)]
-struct LocateCandidate {
-    node_id: usize,
-    relevance_score: f32,
-    #[allow(dead_code)]
-    reason: String,
-}
-
-#[cfg(test)]
-mod tests {
-    #[test]
-    fn test_take_top_n_logic() {
-        let mut scored: Vec<(u32, f32)> = vec![(0, 0.1), (1, 0.9), (2, 0.5), (3, 0.3)];
-        scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
-        scored.truncate(2);
-        assert_eq!(scored.len(), 2);
-        assert_eq!(scored[0].1, 0.9);
-        assert_eq!(scored[1].1, 0.5);
-    }
-}
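For context, a sketch of how the resulting cues were meant to constrain traversal via `search_from`. The actual dispatch code is not part of this diff; `search_from_cues` is a hypothetical helper.

```rust
use crate::retrieval::search::{SearchConfig, SearchCue, SearchResult, SearchTree};

async fn search_from_cues(
    cues: &[SearchCue],
    tree: &crate::document::DocumentTree,
    ctx: &crate::retrieval::RetrievalContext,
    algo: &dyn SearchTree,
) -> Vec<SearchResult> {
    let config = SearchConfig::default();
    let mut results = Vec::new();
    for cue in cues {
        // Constrain traversal to the subtree the navigator selected.
        results.push(algo.search_from(tree, ctx, &config, None, cue.root).await);
    }
    results
}
```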
diff --git a/rust/src/retrieval/search/trait.rs b/rust/src/retrieval/search/trait.rs
deleted file mode 100644
index b77b645c..00000000
--- a/rust/src/retrieval/search/trait.rs
+++ /dev/null
@@ -1,150 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Search algorithm trait and common types.
-
-use async_trait::async_trait;
-
-use super::super::RetrievalContext;
-use super::super::types::{NavigationStep, SearchPath};
-use crate::document::{DocumentTree, NodeId};
-use crate::retrieval::pilot::Pilot;
-
-/// Result of a search operation.
-#[derive(Debug, Clone)]
-pub struct SearchResult {
-    /// Paths found during search.
-    pub paths: Vec<SearchPath>,
-    /// Navigation trace.
-    pub trace: Vec<NavigationStep>,
-    /// Number of nodes visited.
-    pub nodes_visited: usize,
-    /// Number of iterations performed.
-    pub iterations: usize,
-    /// Number of Pilot interventions.
-    pub pilot_interventions: usize,
-}
-
-impl Default for SearchResult {
-    fn default() -> Self {
-        Self {
-            paths: Vec::new(),
-            trace: Vec::new(),
-            nodes_visited: 0,
-            iterations: 0,
-            pilot_interventions: 0,
-        }
-    }
-}
-
-/// Configuration for search algorithms.
-#[derive(Debug, Clone)]
-pub struct SearchConfig {
-    /// Maximum number of results to return.
-    pub top_k: usize,
-    /// Beam width for multi-path search.
-    pub beam_width: usize,
-    /// Maximum iterations.
-    pub max_iterations: usize,
-    /// Minimum score to include a path.
-    pub min_score: f32,
-    /// Whether to include leaf nodes only.
-    pub leaf_only: bool,
-    /// Maximum number of backtracking attempts per search.
-    ///
-    /// When the main beam exhausts all paths without finding enough
-    /// results, the search can pop entries from the fallback stack
-    /// and try alternative branches. This limits how many times
-    /// that happens. Default: equal to `beam_width`.
-    pub max_backtracks: usize,
-    /// Minimum score ratio for a path to be eligible for the fallback stack.
-    ///
-    /// Expressed as a fraction of `min_score`. Paths truncated from the
-    /// beam with a score above `min_score * fallback_score_ratio` are
-    /// kept in the fallback stack for potential backtracking.
-    /// Default: 0.5.
-    pub fallback_score_ratio: f32,
-}
-
-impl Default for SearchConfig {
-    fn default() -> Self {
-        Self {
-            top_k: 5,
-            beam_width: 3,
-            max_iterations: 10,
-            min_score: 0.1,
-            leaf_only: false,
-            max_backtracks: 3,
-            fallback_score_ratio: 0.5,
-        }
-    }
-}
-
-/// Trait for tree search algorithms.
-///
-/// Implementations provide different strategies for exploring
-/// the document tree to find relevant content.
-///
-/// # Pilot Integration
-///
-/// Search algorithms can optionally accept a [`Pilot`] for intelligent
-/// navigation guidance at key decision points. When a Pilot is provided,
-/// the algorithm consults it at:
-/// - Fork points (multiple candidates)
-/// - Low confidence situations
-/// - Backtracking decisions
-///
-/// When no Pilot is provided (None), the algorithm uses its default
-/// scoring mechanism.
-#[async_trait]
-pub trait SearchTree: Send + Sync {
-    /// Search the tree for relevant nodes.
-    ///
-    /// # Arguments
-    ///
-    /// * `tree` - The document tree to search
-    /// * `context` - Retrieval context with query information
-    /// * `config` - Search configuration
-    /// * `pilot` - Optional Pilot for navigation guidance
-    ///
-    /// # Returns
-    ///
-    /// A `SearchResult` with paths and trace information.
-    async fn search(
-        &self,
-        tree: &DocumentTree,
-        context: &RetrievalContext,
-        config: &SearchConfig,
-        pilot: Option<&dyn Pilot>,
-    ) -> SearchResult;
-
-    /// Search without Pilot (uses default algorithm scoring).
-    async fn search_without_pilot(
-        &self,
-        tree: &DocumentTree,
-        context: &RetrievalContext,
-        config: &SearchConfig,
-    ) -> SearchResult {
-        self.search(tree, context, config, None).await
-    }
-
-    /// Search starting from a specific node instead of the root.
-    ///
-    /// This allows tree traversal to be constrained to a subtree
-    /// identified by the ToCNavigator. The default implementation
-    /// falls back to a full search from root.
-    async fn search_from(
-        &self,
-        tree: &DocumentTree,
-        context: &RetrievalContext,
-        config: &SearchConfig,
-        pilot: Option<&dyn Pilot>,
-        start_node: NodeId,
-    ) -> SearchResult {
-        let _ = start_node; // default: ignore start_node
-        self.search(tree, context, config, pilot).await
-    }
-
-    /// Get the name of this search algorithm.
-    fn name(&self) -> &str;
-}
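To illustrate the contract, a minimal (deliberately degenerate) implementor of the trait; `RootOnlySearch` is hypothetical, and everything not overridden falls back to the default methods above.

```rust
use async_trait::async_trait;

/// A trivial strategy: always return the root as one zero-depth path.
pub struct RootOnlySearch;

#[async_trait]
impl SearchTree for RootOnlySearch {
    async fn search(
        &self,
        tree: &DocumentTree,
        _context: &RetrievalContext,
        _config: &SearchConfig,
        _pilot: Option<&dyn Pilot>,
    ) -> SearchResult {
        let mut result = SearchResult::default();
        result.paths.push(SearchPath::from_node(tree.root(), 1.0));
        result.nodes_visited = 1;
        result
    }

    fn name(&self) -> &str {
        "root_only"
    }
}
```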
diff --git a/rust/src/retrieval/stages/analyze.rs b/rust/src/retrieval/stages/analyze.rs
deleted file mode 100644
index 0c8bcdab..00000000
--- a/rust/src/retrieval/stages/analyze.rs
+++ /dev/null
@@ -1,515 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Analyze Stage - Query analysis and information extraction.
-//!
-//! This stage analyzes the query to determine:
-//! - Query complexity (Simple/Medium/Complex)
-//! - Keywords for matching
-//! - Target sections based on ToC matching
-//! - Query decomposition for complex queries
-
-use async_trait::async_trait;
-use tracing::info;
-
-use crate::document::{DocumentTree, NodeId, TocView};
-use crate::llm::memo::MemoStore;
-use crate::retrieval::complexity::ComplexityDetector;
-use crate::retrieval::decompose::{DecompositionConfig, QueryDecomposer};
-use crate::retrieval::pipeline::{FailurePolicy, PipelineContext, RetrievalStage, StageOutcome};
-use crate::retrieval::types::{NavigationDecision, StageName};
-
-/// Convert Chinese number string to integer (e.g. "三" → 3, "二十一" → 21).
-fn chinese_num_to_int(s: &str) -> Option<usize> {
-    let chars: Vec<char> = s.chars().collect();
-    if chars.is_empty() {
-        return None;
-    }
-    // If purely digits, parse directly
-    if chars.iter().all(|c| c.is_ascii_digit()) {
-        return s.parse().ok();
-    }
-    let map = |c: char| -> usize {
-        match c {
-            '一' => 1,
-            '二' => 2,
-            '三' => 3,
-            '四' => 4,
-            '五' => 5,
-            '六' => 6,
-            '七' => 7,
-            '八' => 8,
-            '九' => 9,
-            '十' => 10,
-            '百' => 100,
-            _ => 0,
-        }
-    };
-    // Simple two-pass: handle 十/百 as positional
-    let mut total: usize = 0;
-    let mut current: usize = 0;
-    for &c in &chars {
-        let v = map(c);
-        if v == 0 {
-            continue;
-        }
-        if v >= 10 {
-            // Positional multiplier
-            let base = if current == 0 { 1 } else { current };
-            total += base * v;
-            current = 0;
-        } else {
-            current = v;
-        }
-    }
-    total += current;
-    if total > 0 { Some(total) } else { None }
-}
-
-/// Analyze Stage - analyzes queries for retrieval planning.
-///
-/// This stage:
-/// 1. Detects query complexity (Simple/Medium/Complex)
-/// 2. Extracts keywords for matching
-/// 3. Matches target sections from ToC
-/// 4. Extracts structural path hints (Section 3.2, 第3章, etc.)
-/// 5. Decomposes complex queries into sub-queries (if enabled)
-///
-/// # Example
-///
-/// ```rust,ignore
-/// let stage = AnalyzeStage::new()
-///     .with_toc_matching(true)
-///     .with_decomposition(true);
-/// ```
-pub struct AnalyzeStage {
-    complexity_detector: ComplexityDetector,
-    toc_view: TocView,
-    enable_toc_matching: bool,
-    /// Query decomposer for complex queries.
-    query_decomposer: Option<QueryDecomposer>,
-    /// Enable query decomposition.
-    enable_decomposition: bool,
-    /// Complexity threshold for triggering decomposition.
-    decomposition_threshold: f32,
-    /// Memo store for caching LLM results.
-    memo_store: Option<MemoStore>,
-}
-
-impl Default for AnalyzeStage {
-    fn default() -> Self {
-        Self::new()
-    }
-}
-
-impl AnalyzeStage {
-    /// Create a new analyze stage.
-    pub fn new() -> Self {
-        Self {
-            complexity_detector: ComplexityDetector::new(),
-            toc_view: TocView::new(),
-            enable_toc_matching: true,
-            query_decomposer: None,
-            enable_decomposition: false,
-            decomposition_threshold: 0.6,
-            memo_store: None,
-        }
-    }
-
-    /// Enable or disable ToC section matching.
-    pub fn with_toc_matching(mut self, enable: bool) -> Self {
-        self.enable_toc_matching = enable;
-        self
-    }
-
-    /// Enable query decomposition with default configuration.
-    pub fn with_decomposition(mut self, enable: bool) -> Self {
-        self.enable_decomposition = enable;
-        if enable && self.query_decomposer.is_none() {
-            self.query_decomposer = Some(QueryDecomposer::new(DecompositionConfig::default()));
-        }
-        self
-    }
-
-    /// Enable query decomposition with custom configuration.
-    pub fn with_decomposition_config(mut self, config: DecompositionConfig) -> Self {
-        self.enable_decomposition = true;
-        self.query_decomposer = Some(QueryDecomposer::new(config));
-        self
-    }
-
-    /// Add memo store for caching complexity detection and decomposition results.
-    pub fn with_memo_store(mut self, store: MemoStore) -> Self {
-        self.memo_store = Some(store);
-        self
-    }
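Quick spot checks of the positional 十/百 handling above, with expected values derived by tracing the code:

```rust
#[test]
fn chinese_num_spot_checks() {
    assert_eq!(chinese_num_to_int("三"), Some(3));
    assert_eq!(chinese_num_to_int("十三"), Some(13)); // bare 十 acts as 1 * 10
    assert_eq!(chinese_num_to_int("二十一"), Some(21)); // 2 * 10 + 1
    assert_eq!(chinese_num_to_int("三百"), Some(300)); // 3 * 100
    assert_eq!(chinese_num_to_int("42"), Some(42)); // ASCII digits parse directly
    assert_eq!(chinese_num_to_int("abc"), None); // nothing mapped → None
}
```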
-    /// Enable query decomposition and LLM-based complexity detection.
-    pub fn with_llm_client(mut self, client: crate::llm::LlmClient) -> Self {
-        // Use LLM client for complexity detection
-        let mut detector = ComplexityDetector::with_llm_client(client.clone());
-        if let Some(ref store) = self.memo_store {
-            detector = detector.with_memo_store(store.clone());
-        }
-        self.complexity_detector = detector;
-
-        // Also enable query decomposition
-        let mut decomposer =
-            QueryDecomposer::new(DecompositionConfig::default()).with_llm_client(client);
-        if let Some(ref store) = self.memo_store {
-            decomposer = decomposer.with_memo_store(store.clone());
-        }
-        if self.query_decomposer.is_none() {
-            self.query_decomposer = Some(decomposer);
-        } else if let Some(ref mut d) = self.query_decomposer {
-            *d = decomposer;
-        }
-        self.enable_decomposition = true;
-        self
-    }
-
-    /// Set complexity threshold for triggering decomposition.
-    pub fn with_decomposition_threshold(mut self, threshold: f32) -> Self {
-        self.decomposition_threshold = threshold.clamp(0.0, 1.0);
-        self
-    }
-
-    /// Extract keywords from a query.
-    fn extract_keywords(&self, query: &str) -> Vec<String> {
-        // Simple keyword extraction:
-        // 1. Lowercase
-        // 2. Split on whitespace
-        // 3. Remove common stop words
-        // 4. Remove short words (< 2 chars)
-        // 5. Remove punctuation
-
-        let stop_words = [
-            "the", "a", "an", "is", "are", "was", "were", "be", "been", "being", "have", "has",
-            "had", "do", "does", "did", "will", "would", "could", "should", "may", "might", "must",
-            "shall", "can", "need", "dare", "ought", "used", "to", "of", "in", "for", "on", "with",
-            "at", "by", "from", "as", "into", "through", "during", "before", "after", "above",
-            "below", "between", "under", "again", "further", "then", "once", "here", "there",
-            "when", "where", "why", "how", "all", "each", "few", "more", "most", "other", "some",
-            "such", "no", "nor", "not", "only", "own", "same", "so", "than", "too", "very", "just",
-            "and", "but", "if", "or", "because", "until", "while", "although", "though", "what",
-            "which", "who", "whom", "this", "that", "these", "those", "am", "it", "its", "itself",
-            "he", "him", "his", "she", "her", "hers", "they", "them", "their", "we", "us", "our",
-            "you", "your", "i", "me", "my",
-        ];
-
-        query
-            .to_lowercase()
-            .split_whitespace()
-            .filter(|word| {
-                let word = word.trim_matches(|c: char| !c.is_alphanumeric());
-                word.len() >= 2 && !stop_words.contains(&word)
-            })
-            .map(|word| {
-                word.trim_matches(|c: char| !c.is_alphanumeric())
-                    .to_string()
-            })
-            .filter(|word| !word.is_empty())
-            .collect()
-    }
-
-    /// Extract structural path hints from the query.
-    ///
-    /// Recognizes patterns like:
-    /// - "第3章", "第2节", "第一章" (Chinese chapter/section)
-    /// - "Section 3.2", "section 4.1.2" (English section numbers)
-    /// - "Chapter 5", "chapter 10" (English chapter)
-    /// - "3.2.1", "2.1" (bare section numbers)
-    /// - "表3", "Table 5", "图2", "Figure 4" (table/figure references)
-    ///
-    /// Maps them to tree NodeIds via `find_by_structure()`.
-    fn extract_structure_hints(&self, query: &str, tree: &DocumentTree) -> Vec<(String, NodeId)> {
-        let mut hints = Vec::new();
-
-        // Chinese patterns: 第X章, 第X节, 第X部分
-        for cap in regex::Regex::new(r"第([一二三四五六七八九十百\d]+)[章节部分]")
-            .unwrap()
-            .captures_iter(query)
-        {
-            let num = chinese_num_to_int(&cap[1]).unwrap_or(0);
-            if num > 0 {
-                if let Some(node_id) = tree.find_by_structure(&num.to_string()) {
-                    hints.push((cap[0].to_string(), node_id));
-                }
-            }
-        }
-
-        // "Section X.Y.Z" or "section X.Y"
-        for cap in regex::Regex::new(r"(?i)section\s+(\d+(?:\.\d+)*)")
-            .unwrap()
-            .captures_iter(query)
-        {
-            if let Some(node_id) = tree.find_by_structure(&cap[1]) {
-                hints.push((cap[0].to_string(), node_id));
-            }
-        }
-
-        // "Chapter X"
-        for cap in regex::Regex::new(r"(?i)chapter\s+(\d+)")
-            .unwrap()
-            .captures_iter(query)
-        {
-            if let Some(node_id) = tree.find_by_structure(&cap[1]) {
-                hints.push((cap[0].to_string(), node_id));
-            }
-        }
-
-        // Bare section numbers: "3.2.1", "2.1"
-        // Use word boundary instead of lookbehind (Rust regex doesn't support lookaround)
-        for cap in regex::Regex::new(r"\b(\d+\.\d+(?:\.\d+)*)")
-            .unwrap()
-            .captures_iter(query)
-        {
-            if let Some(node_id) = tree.find_by_structure(&cap[1]) {
-                hints.push((cap[0].to_string(), node_id));
-            }
-        }
-
-        // Table/Figure references
-        for cap in regex::Regex::new(r"(?:表|(?i)table)\s*(\d+)")
-            .unwrap()
-            .captures_iter(query)
-        {
-            if let Some(node_id) = tree.find_by_structure(&format!("table {}", &cap[1])) {
-                hints.push((cap[0].to_string(), node_id));
-            }
-        }
-        for cap in regex::Regex::new(r"(?:图|(?i)figure)\s*(\d+)")
-            .unwrap()
-            .captures_iter(query)
-        {
-            if let Some(node_id) = tree.find_by_structure(&format!("figure {}", &cap[1])) {
-                hints.push((cap[0].to_string(), node_id));
-            }
-        }
-
-        // Deduplicate by NodeId
-        let mut seen = std::collections::HashSet::new();
-        hints.retain(|(_, nid)| seen.insert(*nid));
-
-        hints
-    }
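The regex side of the hint extraction can be checked standalone (the `find_by_structure` lookups are elided; values were derived by tracing the patterns):

```rust
fn main() {
    let en = regex::Regex::new(r"(?i)section\s+(\d+(?:\.\d+)*)").unwrap();
    let caps = en.captures("Compare Section 3.2 with the appendix").unwrap();
    assert_eq!(&caps[1], "3.2"); // would be passed to tree.find_by_structure("3.2")

    let zh = regex::Regex::new(r"第([一二三四五六七八九十百\d]+)[章节部分]").unwrap();
    let caps = zh.captures("第三章").unwrap(); // "Chapter 3" in Chinese
    assert_eq!(&caps[1], "三"); // chinese_num_to_int("三") == Some(3) → find_by_structure("3")
}
```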
-    /// Match target sections from ToC.
-    fn match_toc_sections(&self, query: &str, tree: &DocumentTree) -> Vec<String> {
-        if !self.enable_toc_matching {
-            return Vec::new();
-        }
-
-        let toc = self.toc_view.generate_from(tree, tree.root());
-        let query_lower = query.to_lowercase();
-
-        // Find sections that match query keywords
-        let mut matches: Vec<(String, f32)> = Vec::new();
-
-        fn collect_sections(
-            nodes: &[crate::document::TocNode],
-            query_lower: &str,
-            matches: &mut Vec<(String, f32)>,
-        ) {
-            for node in nodes {
-                let title_lower = node.title.to_lowercase();
-
-                // Calculate match score
-                let mut score = 0.0f32;
-
-                // Exact title match
-                if title_lower.contains(query_lower) {
-                    score = 1.0;
-                } else {
-                    // Partial word matches
-                    for word in query_lower.split_whitespace() {
-                        if title_lower.contains(word) {
-                            score += 0.3;
-                        }
-                    }
-                }
-
-                if score > 0.0 {
-                    matches.push((node.title.clone(), score));
-                }
-
-                // Recurse into children
-                collect_sections(&node.children, query_lower, matches);
-            }
-        }
-
-        collect_sections(&toc.children, &query_lower, &mut matches);
-
-        // Sort by score and return top sections
-        matches.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
-        matches
-            .into_iter()
-            .take(5)
-            .map(|(title, _)| title)
-            .collect()
-    }
-}
-
-#[async_trait]
-impl RetrievalStage for AnalyzeStage {
-    fn name(&self) -> &'static str {
-        "analyze"
-    }
-
-    fn priority(&self) -> i32 {
-        10 // First stage
-    }
-
-    fn failure_policy(&self) -> FailurePolicy {
-        FailurePolicy::fail() // Must succeed
-    }
-
-    async fn execute(&self, ctx: &mut PipelineContext) -> crate::error::Result<StageOutcome> {
-        info!("Analyzing query: '{}'", ctx.query);
-
-        // 1. Detect complexity (LLM-based when available, heuristic fallback)
-        ctx.complexity = Some(self.complexity_detector.detect(&ctx.query).await);
-        info!("Query complexity: {:?}", ctx.complexity);
-
-        // 2. Extract keywords
-        ctx.keywords = self.extract_keywords(&ctx.query);
-        info!("Extracted keywords: {:?}", ctx.keywords);
-
-        // 3. Match target sections
-        ctx.target_sections = self.match_toc_sections(&ctx.query, &ctx.tree);
-        if !ctx.target_sections.is_empty() {
-            info!("Target sections: {:?}", ctx.target_sections);
-        }
-
-        // 3.5 Extract structural path hints
-        ctx.resolved_path_hints = self.extract_structure_hints(&ctx.query, &ctx.tree);
-        if !ctx.resolved_path_hints.is_empty() {
-            info!(
-                "Resolved {} structure hints: {:?}",
-                ctx.resolved_path_hints.len(),
-                ctx.resolved_path_hints
-                    .iter()
-                    .map(|(s, _)| s)
-                    .collect::<Vec<_>>()
-            );
-        }
-
-        // 4. Decompose query if enabled and complex enough
-        if self.enable_decomposition {
-            if let Some(ref decomposer) = self.query_decomposer {
-                let complexity_score = ctx
-                    .complexity
-                    .as_ref()
-                    .map(|c| match c {
-                        crate::retrieval::types::QueryComplexity::Simple => 0.3,
-                        crate::retrieval::types::QueryComplexity::Medium => 0.6,
-                        crate::retrieval::types::QueryComplexity::Complex => 0.9,
-                    })
-                    .unwrap_or(0.5);
-
-                if complexity_score >= self.decomposition_threshold {
-                    info!("Decomposing query (complexity: {:.2})", complexity_score);
-                    match decomposer.decompose(&ctx.query).await {
-                        Ok(result) => {
-                            if result.was_decomposed {
-                                info!(
-                                    "Query decomposed into {} sub-queries",
-                                    result.sub_queries.len()
-                                );
-                                for (i, sq) in result.sub_queries.iter().enumerate() {
-                                    info!(
-                                        "  Sub-query {}: {} (priority: {})",
-                                        i, sq.text, sq.priority
-                                    );
-                                }
-                            }
-                            ctx.decomposition = Some(result);
-                        }
-                        Err(e) => {
-                            info!(
-                                "Query decomposition failed: {}, continuing with original query",
-                                e
-                            );
-                        }
-                    }
-                }
-            }
-        }
-        // 5. Update metrics
-        ctx.metrics.llm_calls += 0; // No LLM calls in this stage
-
-        // 6. Record reasoning
-        let complexity_str = format!("{:?}", ctx.complexity.unwrap_or_default());
-        let mut reasoning_parts = vec![
-            format!("Query complexity: {}", complexity_str),
-            format!("Keywords: {:?}", ctx.keywords),
-        ];
-        if !ctx.target_sections.is_empty() {
-            reasoning_parts.push(format!("Target sections: {:?}", ctx.target_sections));
-        }
-        if !ctx.resolved_path_hints.is_empty() {
-            reasoning_parts.push(format!(
-                "Structure hints: {:?}",
-                ctx.resolved_path_hints
-                    .iter()
-                    .map(|(s, _)| s)
-                    .collect::<Vec<_>>()
-            ));
-        }
-        if let Some(ref decomp) = ctx.decomposition {
-            if decomp.was_decomposed {
-                reasoning_parts.push(format!(
-                    "Decomposed into {} sub-queries",
-                    decomp.sub_queries.len()
-                ));
-            }
-        }
-        ctx.record_reasoning(
-            StageName::Analyze,
-            reasoning_parts.join("; "),
-            NavigationDecision::ExploreMore,
-        );
-
-        Ok(StageOutcome::cont())
-    }
-}
-
-#[cfg(test)]
-mod tests {
-    use super::*;
-
-    #[test]
-    fn test_extract_keywords() {
-        let stage = AnalyzeStage::new();
-
-        let keywords = stage.extract_keywords("What is the architecture of the system?");
-        assert!(!keywords.contains(&"the".to_string()));
-        assert!(keywords.contains(&"architecture".to_string()));
-        assert!(keywords.contains(&"system".to_string()));
-    }
-
-    #[test]
-    fn test_extract_keywords_empty() {
-        let stage = AnalyzeStage::new();
-        let keywords = stage.extract_keywords("");
-        assert!(keywords.is_empty());
-    }
-
-    #[test]
-    fn test_extract_keywords_stopwords() {
-        let stage = AnalyzeStage::new();
-        let keywords = stage.extract_keywords("the a an is are was were");
-        assert!(keywords.is_empty());
-    }
-}
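How the decomposition gate in `AnalyzeStage::execute` behaves numerically, with string stand-ins for `QueryComplexity`:

```rust
fn complexity_score(c: &str) -> f32 {
    match c {
        "Simple" => 0.3,
        "Medium" => 0.6,
        "Complex" => 0.9,
        _ => 0.5, // the unwrap_or default when complexity is unknown
    }
}

fn main() {
    let threshold = 0.6; // AnalyzeStage's decomposition_threshold default
    assert!(complexity_score("Simple") < threshold); // never decomposed
    assert!(complexity_score("Medium") >= threshold); // decomposed at the default
    assert!(complexity_score("Complex") >= threshold);
}
```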
diff --git a/rust/src/retrieval/stages/evaluate.rs b/rust/src/retrieval/stages/evaluate.rs
deleted file mode 100644
index b008afb1..00000000
--- a/rust/src/retrieval/stages/evaluate.rs
+++ /dev/null
@@ -1,527 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Evaluate Stage - Sufficiency checking.
-//!
-//! This stage evaluates whether the collected content is sufficient
-//! to answer the query, and can trigger additional search iterations.
-
-use async_trait::async_trait;
-use tracing::{info, warn};
-
-use crate::llm::LlmClient;
-use crate::llm::memo::MemoStore;
-use crate::retrieval::content::{ContentAggregator, ContentAggregatorConfig};
-use crate::retrieval::pipeline::{FailurePolicy, PipelineContext, RetrievalStage, StageOutcome};
-use crate::retrieval::sufficiency::{LlmJudge, SufficiencyChecker, ThresholdChecker};
-use crate::retrieval::types::{
-    NavigationDecision, RetrievalResult, RetrieveResponse, StageName, SufficiencyLevel,
-};
-use crate::utils::estimate_tokens;
-
-/// Evaluate Stage - evaluates retrieval sufficiency.
-///
-/// This stage:
-/// 1. Aggregates content from candidates
-/// 2. Checks if content is sufficient to answer the query
-/// 3. Can trigger additional search iterations if needed
-///
-/// # Content Aggregation
-///
-/// By default, uses simple content collection. For precision-focused
-/// aggregation with token budget control, use `with_content_aggregator()`.
-///
-/// # Example
-///
-/// ```rust,ignore
-/// let stage = EvaluateStage::new()
-///     .with_llm_judge(llm_client)
-///     .with_max_iterations(3)
-///     .with_content_aggregator(ContentAggregatorConfig::default());
-/// ```
-pub struct EvaluateStage {
-    threshold_checker: ThresholdChecker,
-    llm_judge: Option<LlmJudge>,
-    max_iterations: usize,
-    use_llm_judge: bool,
-    /// Optional content aggregator for precision-focused aggregation.
-    content_aggregator: Option<ContentAggregator>,
-    /// Memo store for caching LLM judgments.
-    memo_store: Option<MemoStore>,
-}
-
-impl Default for EvaluateStage {
-    fn default() -> Self {
-        Self::new()
-    }
-}
-
-impl EvaluateStage {
-    /// Create a new evaluate stage.
-    pub fn new() -> Self {
-        Self {
-            threshold_checker: ThresholdChecker::new(),
-            llm_judge: None,
-            max_iterations: 3,
-            use_llm_judge: false,
-            content_aggregator: None,
-            memo_store: None,
-        }
-    }
-
-    /// Add LLM judge for more accurate sufficiency checking.
-    pub fn with_llm_judge(mut self, client: LlmClient) -> Self {
-        let mut judge = LlmJudge::new(Box::new(client));
-        if let Some(ref store) = self.memo_store {
-            judge = judge.with_memo_store(store.clone());
-        }
-        self.llm_judge = Some(judge);
-        self.use_llm_judge = true;
-        self
-    }
-
-    /// Add memo store for caching LLM judgments.
-    pub fn with_memo_store(mut self, store: MemoStore) -> Self {
-        self.memo_store = Some(store);
-        self
-    }
-
-    /// Set maximum search iterations.
-    pub fn with_max_iterations(mut self, n: usize) -> Self {
-        self.max_iterations = n;
-        self
-    }
-
-    /// Add content aggregator for precision-focused aggregation.
-    ///
-    /// When enabled, content aggregation uses:
-    /// - Relevance scoring (keyword + BM25)
-    /// - Token budget allocation
-    /// - Hierarchical content selection
-    pub fn with_content_aggregator(mut self, config: ContentAggregatorConfig) -> Self {
-        self.content_aggregator = Some(ContentAggregator::new(config));
-        self
-    }
-
-    /// Enable content aggregator with default configuration.
-    pub fn with_default_content_aggregator(mut self) -> Self {
-        self.content_aggregator = Some(ContentAggregator::with_defaults());
-        self
-    }
-
-    /// Aggregate content from candidates.
-    ///
-    /// Populates `ctx.node_content_cache` with per-node content so that
-    /// `build_response()` can reuse it without recomputing leaf traversal.
-    fn aggregate_content(&self, ctx: &mut PipelineContext) -> (String, usize) {
-        // Use ContentAggregator if configured
-        if let Some(ref aggregator) = self.content_aggregator {
-            use crate::retrieval::content::CandidateNode;
-
-            let candidates: Vec<CandidateNode> = ctx
-                .candidates
-                .iter()
-                .map(|c| CandidateNode::new(c.node_id, c.score, c.depth))
-                .collect();
-
-            let result = aggregator.aggregate(&candidates, &ctx.tree, &ctx.query);
-            info!(
-                "ContentAggregator: {} nodes, {} tokens, avg score {:.2}",
-                result.nodes_included, result.tokens_used, result.avg_score
-            );
-            return (result.content, result.tokens_used);
-        }
-
-        // Simple content collection with per-node caching
-        self.aggregate_content_simple(ctx)
-    }
-
-    /// Simple content aggregation with per-node caching.
-    ///
-    /// Computes each candidate's content once and stores it in
-    /// `ctx.node_content_cache` for reuse by `build_response()`.
-    fn aggregate_content_simple(&self, ctx: &mut PipelineContext) -> (String, usize) {
-        let mut content_parts = Vec::new();
-        let mut total_tokens = 0;
-
-        for candidate in &ctx.candidates {
-            if let Some(node) = ctx.tree.get(candidate.node_id) {
-                // Build per-node content (own + leaf descendants)
-                let node_content = self.build_node_content(&ctx.tree, candidate.node_id);
-
-                // Cache for build_response reuse
-                ctx.node_content_cache
-                    .insert(candidate.node_id, node_content.clone());
-
-                // Add to aggregated content
-                if !node_content.is_empty() {
-                    content_parts.push(format!("## {}\n", node.title));
-                    content_parts.push(format!("{}\n\n", node_content));
-                    total_tokens += estimate_tokens(&node_content);
-                } else if !node.summary.is_empty() {
-                    content_parts.push(format!("## {}\n", node.title));
-                    content_parts.push(format!("{}\n\n", node.summary));
-                    total_tokens += estimate_tokens(&node.summary);
-                }
-            }
-        }
-
-        (content_parts.join(""), total_tokens)
-    }
-
-    /// Build content for a single node (own content + leaf descendants).
-    fn build_node_content(
-        &self,
-        tree: &crate::document::DocumentTree,
-        node_id: crate::document::NodeId,
-    ) -> String {
-        let mut parts = Vec::new();
-
-        if let Some(node) = tree.get(node_id) {
-            if !node.content.is_empty() {
-                parts.push(node.content.clone());
-            }
-        }
-
-        let leaf_content = self.collect_leaf_content(tree, node_id);
-        if !leaf_content.is_empty() {
-            parts.push(leaf_content);
-        }
-
-        parts.join("\n\n")
-    }
-
-    /// Collect content from leaf descendants of a node (excluding the node itself).
-    ///
-    /// Uses BFS (FIFO) traversal to preserve document order — the first
-    /// section in the document appears first in the output.
-    fn collect_leaf_content(
-        &self,
-        tree: &crate::document::DocumentTree,
-        node_id: crate::document::NodeId,
-    ) -> String {
-        use std::collections::VecDeque;
-
-        let mut content_parts = Vec::new();
-
-        // Start with children, not the node itself
-        let children = tree.children(node_id);
-        if children.is_empty() {
-            // Node is already a leaf, no descendants to collect
-            return String::new();
-        }
-
-        let mut queue: VecDeque<crate::document::NodeId> = children.into_iter().collect();
-
-        while let Some(current_id) = queue.pop_front() {
-            let current_children = tree.children(current_id);
-
-            if current_children.is_empty() {
-                // Leaf node - collect its content
-                if let Some(node) = tree.get(current_id) {
-                    if !node.content.is_empty() {
-                        content_parts.push(format!("### {}\n{}", node.title, node.content));
-                    }
-                }
-            } else {
-                // Non-leaf node - add children to queue (FIFO preserves order)
-                queue.extend(current_children);
-            }
-        }
-
-        content_parts.join("\n\n")
-    }
-
-    /// Check sufficiency level.
-    fn check_sufficiency(&self, ctx: &PipelineContext) -> SufficiencyLevel {
-        if !ctx.options.sufficiency_check {
-            return SufficiencyLevel::Sufficient;
-        }
-
-        // Use the LLM judge if available and enabled
-        if self.use_llm_judge {
-            if let Some(ref judge) = self.llm_judge {
-                return judge.check(&ctx.query, &ctx.accumulated_content, ctx.token_count);
-            }
-        }
-
-        // Fall back to threshold checker
-        self.threshold_checker
-            .check(&ctx.query, &ctx.accumulated_content, ctx.token_count)
-    }
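The sufficiency decision ladder in `check_sufficiency`, flattened into a sketch with stand-in types (the real `ThresholdChecker` cutoff differs):

```rust
#[derive(Debug, PartialEq)]
enum Sufficiency {
    Sufficient,
    Partial,
    Insufficient,
}

fn check(enabled: bool, llm_judgment: Option<Sufficiency>, tokens: usize) -> Sufficiency {
    if !enabled {
        return Sufficiency::Sufficient; // checking disabled: always proceed
    }
    if let Some(judgment) = llm_judgment {
        return judgment; // the LLM judge wins when configured
    }
    // Threshold fallback (illustrative cutoff only):
    if tokens >= 200 { Sufficiency::Sufficient } else { Sufficiency::Insufficient }
}

fn main() {
    assert_eq!(check(false, None, 0), Sufficiency::Sufficient);
    assert_eq!(check(true, Some(Sufficiency::Partial), 999), Sufficiency::Partial);
    assert_eq!(check(true, None, 50), Sufficiency::Insufficient);
}
```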
-    /// Build the final response.
-    ///
-    /// Reads per-node content from `ctx.node_content_cache` populated
-    /// during `aggregate_content()` — no duplicate leaf traversal.
-    fn build_response(&self, ctx: &PipelineContext) -> RetrieveResponse {
-        let mut results = Vec::new();
-
-        for candidate in &ctx.candidates {
-            if let Some(node) = ctx.tree.get(candidate.node_id) {
-                let content = if ctx.options.include_content {
-                    // Read from cache — computed once in aggregate_content()
-                    match ctx.node_content_cache.get(&candidate.node_id) {
-                        Some(cached) if !cached.is_empty() => Some(cached.clone()),
-                        _ => {
-                            // Cache miss (edge case): compute inline
-                            let built = self.build_node_content(&ctx.tree, candidate.node_id);
-                            if built.is_empty() { None } else { Some(built) }
-                        }
-                    }
-                } else {
-                    None
-                };
-
-                results.push(RetrievalResult {
-                    node_id: Some(format!("{:?}", candidate.node_id)),
-                    title: node.title.clone(),
-                    content,
-                    summary: if ctx.options.include_summaries {
-                        Some(node.summary.clone())
-                    } else {
-                        None
-                    },
-                    score: candidate.score,
-                    depth: candidate.depth,
-                    page_range: node.start_page.zip(node.end_page),
-                });
-            }
-        }
-
-        RetrieveResponse {
-            results,
-            content: ctx.accumulated_content.clone(),
-            confidence: self.calculate_confidence(ctx),
-            is_sufficient: ctx.sufficiency == SufficiencyLevel::Sufficient,
-            strategy_used: ctx
-                .selected_strategy
-                .map(|s| format!("{:?}", s))
-                .unwrap_or_else(|| "unknown".to_string()),
-            complexity: ctx.complexity.unwrap_or_default(),
-            reasoning_chain: ctx.reasoning_chain.clone(),
-            tokens_used: ctx.token_count,
-        }
-    }
-
-    /// Calculate overall confidence score.
-    fn calculate_confidence(&self, ctx: &PipelineContext) -> f32 {
-        if ctx.candidates.is_empty() {
-            return 0.0;
-        }
-
-        // Weight by score and sufficiency
-        let avg_score: f32 =
-            ctx.candidates.iter().map(|c| c.score).sum::<f32>() / ctx.candidates.len() as f32;
-
-        let sufficiency_factor = match ctx.sufficiency {
-            SufficiencyLevel::Sufficient => 1.0,
-            SufficiencyLevel::PartialSufficient => 0.7,
-            SufficiencyLevel::Insufficient => 0.4,
-        };
-
-        avg_score * sufficiency_factor
-    }
-}
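Worked numbers for `calculate_confidence`:

```rust
// Mean candidate score times a sufficiency factor (1.0 / 0.7 / 0.4), detached
// from the crate for illustration.
fn confidence(scores: &[f32], sufficiency_factor: f32) -> f32 {
    if scores.is_empty() {
        return 0.0;
    }
    let avg = scores.iter().sum::<f32>() / scores.len() as f32;
    avg * sufficiency_factor
}

fn main() {
    // Three candidates averaging 0.8, judged PartialSufficient (factor 0.7):
    let c = confidence(&[0.9, 0.8, 0.7], 0.7);
    assert!((c - 0.56).abs() < 1e-5);
}
```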
-#[async_trait]
-impl RetrievalStage for EvaluateStage {
-    fn name(&self) -> &'static str {
-        "evaluate"
-    }
-
-    fn depends_on(&self) -> Vec<&'static str> {
-        vec!["search"]
-    }
-
-    fn priority(&self) -> i32 {
-        40 // Fourth stage
-    }
-
-    fn failure_policy(&self) -> FailurePolicy {
-        FailurePolicy::skip() // Can skip if evaluate fails
-    }
-
-    fn can_backtrack(&self) -> bool {
-        true // Can trigger backtracking to search
-    }
-
-    async fn execute(&self, ctx: &mut PipelineContext) -> crate::error::Result<StageOutcome> {
-        let start = std::time::Instant::now();
-
-        info!(
-            "Judging sufficiency: {} candidates, iteration {}",
-            ctx.candidates.len(),
-            ctx.search_iterations
-        );
-
-        // 1. Aggregate content from candidates
-        let (content, tokens) = self.aggregate_content(ctx);
-        ctx.accumulated_content = content;
-        ctx.token_count = tokens;
-
-        info!("Aggregated {} tokens", tokens);
-
-        // 2. Report token consumption to budget controller
-        ctx.budget_controller.record_tokens(tokens);
-
-        // 3. Check sufficiency
-        ctx.sufficiency = self.check_sufficiency(ctx);
-        info!("Sufficiency level: {:?}", ctx.sufficiency);
-
-        // 4. Detect stagnant candidates (same results as previous iteration).
-        // If candidates haven't changed, further backtracking won't help.
-        let stagnant = ctx.check_candidates_stagnant();
-        if stagnant {
-            info!(
-                "Candidates unchanged after backtrack, completing with {} candidates",
-                ctx.candidates.len()
-            );
-            ctx.result = Some(self.build_response(ctx));
-            ctx.record_reasoning(
-                StageName::Evaluate,
-                format!(
-                    "Candidates stagnant (unchanged), forced completion with {} candidates",
-                    ctx.candidates.len()
-                ),
-                NavigationDecision::Skip,
-            );
-            return Ok(StageOutcome::complete());
-        }
-
-        // Update metrics
-        ctx.metrics.evaluate_time_ms += start.elapsed().as_millis() as u64;
-        ctx.metrics.tokens_used = tokens;
-
-        // 5. Check budget status for adaptive decision
-        let budget_status = ctx.budget_controller.status();
-        let confidence = self.calculate_confidence(ctx);
-
-        // If budget is exhausted, force completion regardless of sufficiency
-        if budget_status.should_stop() && ctx.search_iterations >= 1 {
-            info!(
-                "Budget exhausted ({}/{}), completing with current results",
-                ctx.budget_controller.consumed(),
-                ctx.budget_controller.total_budget(),
-            );
-            ctx.result = Some(self.build_response(ctx));
-            ctx.record_reasoning(
-                StageName::Evaluate,
-                format!(
-                    "Budget exhausted ({}/{}), forced completion; confidence={:.3}",
-                    ctx.budget_controller.consumed(),
-                    ctx.budget_controller.total_budget(),
-                    confidence,
-                ),
-                NavigationDecision::Skip,
-            );
-            return Ok(StageOutcome::complete());
-        }
-
-        // 6. Record successful navigation paths to L2 cache
-        if confidence > 0.5 {
-            let doc_key = format!("{:?}", ctx.tree.root());
-            for candidate in ctx.candidates.iter().take(3) {
-                if let Some(node) = ctx.tree.get(candidate.node_id) {
-                    let _path = format!("{}", node.depth);
-                    // Use the node title as path identifier for L2
-                    ctx.reasoning_cache
-                        .l2_record(&doc_key, &node.title, candidate.score);
-                }
-            }
-        }
-        // 7. Decide next action based on sufficiency
-        let outcome = match ctx.sufficiency {
-            SufficiencyLevel::Sufficient => {
-                info!("Content is sufficient, completing retrieval");
-                ctx.result = Some(self.build_response(ctx));
-                StageOutcome::complete()
-            }
-            SufficiencyLevel::PartialSufficient => {
-                // Can return current results or continue
-                if ctx.search_iterations >= self.max_iterations {
-                    info!(
-                        "Partial sufficient but max iterations reached, completing with {} candidates",
-                        ctx.candidates.len()
-                    );
-                    ctx.result = Some(self.build_response(ctx));
-                    StageOutcome::complete()
-                } else {
-                    // Continue searching with small beam increase
-                    info!("Partial sufficient, requesting one more search iteration");
-                    StageOutcome::need_more(1, false)
-                }
-            }
-            SufficiencyLevel::Insufficient => {
-                if ctx.search_iterations >= self.max_iterations {
-                    warn!(
-                        "Insufficient but max iterations reached, returning {} candidates",
-                        ctx.candidates.len()
-                    );
-                    ctx.result = Some(self.build_response(ctx));
-                    StageOutcome::complete()
-                } else {
-                    // Need more data - increase beam and go deeper
-                    info!("Insufficient content, requesting more search with larger beam");
-                    StageOutcome::need_more(2, true)
-                }
-            }
-        };
-
-        // Update LLM call count if we used the LLM judge
-        if self.use_llm_judge && self.llm_judge.is_some() {
-            ctx.metrics.llm_calls += 1;
-        }
-
-        // Record evaluation reasoning with budget status
-        let sufficiency_str = format!("{:?}", ctx.sufficiency);
-        let decision = match ctx.sufficiency {
-            SufficiencyLevel::Sufficient => NavigationDecision::ThisIsTheAnswer,
-            SufficiencyLevel::PartialSufficient => NavigationDecision::ExploreMore,
-            SufficiencyLevel::Insufficient => NavigationDecision::ExploreMore,
-        };
-        ctx.record_reasoning(
-            StageName::Evaluate,
-            format!(
-                "Sufficiency={}, confidence={:.3}, tokens={}, candidates={}, iteration={}, budget={:?} ({}/{})",
-                sufficiency_str,
-                self.calculate_confidence(ctx),
-                ctx.token_count,
-                ctx.candidates.len(),
-                ctx.search_iterations,
-                budget_status,
-                ctx.budget_controller.consumed(),
-                ctx.budget_controller.total_budget(),
-            ),
-            decision,
-        );
-
-        Ok(outcome)
-    }
-}
-
-#[cfg(test)]
-mod tests {
-    use super::*;
-
-    #[test]
-    fn test_evaluate_stage_creation() {
-        let stage = EvaluateStage::new();
-        assert!(stage.llm_judge.is_none());
-        assert!(!stage.use_llm_judge);
-    }
-
-    #[test]
-    fn test_evaluate_stage_dependencies() {
-        let stage = EvaluateStage::new();
-        assert_eq!(stage.depends_on(), vec!["search"]);
-    }
-
-    #[test]
-    fn test_evaluate_can_backtrack() {
-        let stage = EvaluateStage::new();
-        assert!(stage.can_backtrack());
-    }
-}
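And the sufficiency-to-outcome policy from `EvaluateStage::execute`, reduced to a sketch (`Outcome` is a stand-in for `StageOutcome`; the field names interpret `need_more`'s two arguments per the comments above):

```rust
#[derive(Debug, PartialEq)]
enum Outcome {
    Complete,
    NeedMore { extra_beam: usize, go_deeper: bool },
}

fn decide(sufficiency: &str, iterations: usize, max_iterations: usize) -> Outcome {
    match sufficiency {
        "Sufficient" => Outcome::Complete,
        _ if iterations >= max_iterations => Outcome::Complete, // give up gracefully
        "PartialSufficient" => Outcome::NeedMore { extra_beam: 1, go_deeper: false },
        _ => Outcome::NeedMore { extra_beam: 2, go_deeper: true }, // insufficient
    }
}

fn main() {
    assert_eq!(decide("Sufficient", 0, 3), Outcome::Complete);
    assert_eq!(decide("Insufficient", 3, 3), Outcome::Complete);
    assert_eq!(
        decide("PartialSufficient", 1, 3),
        Outcome::NeedMore { extra_beam: 1, go_deeper: false }
    );
}
```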
-
-mod analyze;
-mod evaluate;
-mod plan;
-mod search;
-
-pub use analyze::AnalyzeStage;
-pub use evaluate::EvaluateStage;
-pub use plan::PlanStage;
-pub use search::SearchStage;
diff --git a/rust/src/retrieval/stages/plan.rs b/rust/src/retrieval/stages/plan.rs
deleted file mode 100644
index 4442551c..00000000
--- a/rust/src/retrieval/stages/plan.rs
+++ /dev/null
@@ -1,261 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Plan Stage - Strategy and algorithm selection.
-//!
-//! This stage selects:
-//! - Retrieval strategy (Keyword/Semantic/LLM)
-//! - Search algorithm (PurePilot/Beam/MCTS)
-//! - Search configuration
-
-use async_trait::async_trait;
-use std::sync::Arc;
-use tracing::info;
-
-// DocumentTree is accessed via context
-use crate::llm::LlmClient;
-use crate::retrieval::pipeline::{
-    BudgetStatus, FailurePolicy, PipelineContext, RetrievalStage, SearchAlgorithm, SearchConfig,
-    StageOutcome,
-};
-use crate::retrieval::types::{NavigationDecision, QueryComplexity, StageName, StrategyPreference};
-
-/// Plan Stage - plans the retrieval strategy.
-///
-/// This stage:
-/// 1. Selects the appropriate retrieval strategy based on complexity
-/// 2. Chooses the search algorithm
-/// 3. Configures search parameters
-///
-/// # Example
-///
-/// ```rust,ignore
-/// let stage = PlanStage::new()
-///     .with_llm_client(llm_client);
-/// ```
-pub struct PlanStage {
-    llm_client: Option<Arc<LlmClient>>,
-}
-
-impl Default for PlanStage {
-    fn default() -> Self {
-        Self::new()
-    }
-}
-
-impl PlanStage {
-    /// Create a new plan stage.
-    pub fn new() -> Self {
-        Self { llm_client: None }
-    }
-
-    /// Set LLM client for complex planning.
-    pub fn with_llm_client(mut self, client: LlmClient) -> Self {
-        self.llm_client = Some(Arc::new(client));
-        self
-    }
-
-    /// Select retrieval strategy based on complexity, preferences, and budget.
- fn select_strategy(&self, ctx: &PipelineContext) -> StrategyPreference { - // Respect explicit strategy preference - if ctx.options.strategy != StrategyPreference::Auto { - info!("Using explicit strategy: {:?}", ctx.options.strategy); - return ctx.options.strategy; - } - - // Budget-aware strategy selection - let budget_status = ctx.budget_controller.status(); - if budget_status.should_stop() { - info!("Budget exhausted, forcing Keyword strategy"); - return StrategyPreference::ForceKeyword; - } - - // Auto-select based on complexity - let complexity = ctx.complexity.unwrap_or(QueryComplexity::Medium); - - let strategy = match complexity { - QueryComplexity::Simple => { - info!("Complexity is Simple, selecting Keyword strategy"); - StrategyPreference::ForceKeyword - } - QueryComplexity::Medium => { - if budget_status == BudgetStatus::Constrained { - info!( - "Complexity is Medium but budget constrained, selecting Keyword strategy" - ); - StrategyPreference::ForceKeyword - } else if self.llm_client.is_some() { - info!("Complexity is Medium, selecting LLM strategy"); - StrategyPreference::ForceLlm - } else { - info!("Complexity is Medium, no LLM, selecting Keyword strategy"); - StrategyPreference::ForceKeyword - } - } - QueryComplexity::Complex => { - if budget_status == BudgetStatus::Constrained { - info!( - "Complexity is Complex but budget constrained, selecting Hybrid strategy" - ); - if self.llm_client.is_some() { - StrategyPreference::ForceHybrid - } else { - StrategyPreference::ForceKeyword - } - } else if self.llm_client.is_some() { - info!("Complexity is Complex, selecting LLM strategy"); - StrategyPreference::ForceLlm - } else { - info!("Complexity is Complex, no LLM, selecting Keyword strategy"); - StrategyPreference::ForceKeyword - } - } - }; - - strategy - } - - /// Select search algorithm based on complexity and options. - fn select_algorithm(&self, ctx: &PipelineContext) -> SearchAlgorithm { - let complexity = ctx.complexity.unwrap_or(QueryComplexity::Medium); - - let algorithm = match complexity { - QueryComplexity::Simple => { - // Simple queries: PurePilot (beam=1, fast) - SearchAlgorithm::PurePilot - } - QueryComplexity::Medium => { - // Medium queries: Beam search - SearchAlgorithm::Beam - } - QueryComplexity::Complex => { - // Complex queries: MCTS for thorough exploration - SearchAlgorithm::Mcts - } - }; - - info!("Selected search algorithm: {:?}", algorithm); - algorithm - } - - /// Build search configuration from options and complexity. - fn build_search_config(&self, ctx: &PipelineContext) -> SearchConfig { - let complexity = ctx.complexity.unwrap_or(QueryComplexity::Medium); - - let (beam_width, max_depth) = match complexity { - QueryComplexity::Simple => (1, 5), // PurePilot-like - QueryComplexity::Medium => (ctx.options.beam_width, 10), - QueryComplexity::Complex => (ctx.options.beam_width + 2, 15), - }; - - SearchConfig { - beam_width, - max_depth, - min_score: ctx.options.min_score, - max_iterations: ctx.options.max_iterations, - } - } -} - -#[async_trait] -impl RetrievalStage for PlanStage { - fn name(&self) -> &'static str { - "plan" - } - - fn depends_on(&self) -> Vec<&'static str> { - vec!["analyze"] - } - - fn priority(&self) -> i32 { - 20 // Second stage - } - - fn failure_policy(&self) -> FailurePolicy { - FailurePolicy::fail() // Must succeed - } - - async fn execute(&self, ctx: &mut PipelineContext) -> crate::error::Result { - info!("Planning retrieval strategy"); - - // 1. 
Select strategy - ctx.selected_strategy = Some(self.select_strategy(ctx)); - - // 2. Select algorithm - ctx.selected_algorithm = Some(self.select_algorithm(ctx)); - - // 3. Build search config - ctx.search_config = Some(self.build_search_config(ctx)); - - // 4. Build fallback chain: primary algorithm first, then alternatives - // The chain determines which algorithms to try if the primary - // doesn't produce results above min_score. - let primary = ctx.selected_algorithm.unwrap_or(SearchAlgorithm::Beam); - let mut chain = vec![primary]; - for name in &ctx.options.fallback_chain { - if let Some(algo) = SearchAlgorithm::from_name(name) { - if algo != primary { - chain.push(algo); - } - } - } - ctx.search_fallback_chain = chain; - - info!( - "Plan complete: strategy={:?}, algorithm={:?}, beam_width={}", - ctx.selected_strategy, - ctx.selected_algorithm, - ctx.search_config - .as_ref() - .map(|c| c.beam_width) - .unwrap_or(0) - ); - - // Record reasoning - let strategy_str = ctx - .selected_strategy - .map(|s| format!("{:?}", s)) - .unwrap_or_else(|| "auto".to_string()); - let algorithm_str = ctx - .selected_algorithm - .map(|a| a.name().to_string()) - .unwrap_or_else(|| "unknown".to_string()); - let beam_width = ctx - .search_config - .as_ref() - .map(|c| c.beam_width) - .unwrap_or(3); - ctx.record_reasoning( - StageName::Plan, - format!( - "Selected strategy={}, algorithm={}, beam_width={}; budget: {}/{} ({:.0}%)", - strategy_str, - algorithm_str, - beam_width, - ctx.budget_controller.consumed(), - ctx.budget_controller.total_budget(), - ctx.budget_controller.utilization() * 100.0 - ), - NavigationDecision::ExploreMore, - ); - - Ok(StageOutcome::cont()) - } -} - -#[cfg(test)] -mod tests { - use super::*; - - #[test] - fn test_plan_stage_creation() { - let stage = PlanStage::new(); - assert!(stage.llm_client.is_none()); - } - - #[test] - fn test_plan_stage_dependencies() { - let stage = PlanStage::new(); - assert_eq!(stage.depends_on(), vec!["analyze"]); - } -} diff --git a/rust/src/retrieval/stages/search.rs b/rust/src/retrieval/stages/search.rs deleted file mode 100644 index 2e90a57e..00000000 --- a/rust/src/retrieval/stages/search.rs +++ /dev/null @@ -1,961 +0,0 @@ -// Copyright (c) 2026 vectorless developers -// SPDX-License-Identifier: Apache-2.0 - -//! Search Stage - Execute tree search with Pilot integration. -//! -//! This stage executes the selected search algorithm using -//! hierarchical ToC-based location followed by tree traversal. -//! When a Pilot is provided, it can provide semantic guidance -//! at key decision points. 
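As a usage sketch of the builder surface defined below, wiring an LLM client, a Pilot, and strategies into the stage might look like the following. The helper function and its argument names are assumed for illustration; only the `SearchStage` builders themselves appear in this diff.

```rust
// Hypothetical wiring helper, based only on the SearchStage builders below;
// `client`, `strategy`, and `pilot` come from elsewhere in the application.
use std::sync::Arc;

fn build_search_stage(
    client: LlmClient,
    strategy: LlmStrategy,
    pilot: Arc<dyn Pilot>,
) -> SearchStage {
    SearchStage::new()
        .with_llm_client(Some(client)) // enables LLM-guided ToC location
        .with_pilot(pilot) // consulted at key navigation decision points
        .with_llm_strategy(strategy) // semantic scoring for complex queries
        .with_hybrid_config(HybridConfig::low_cost()) // BM25 pre-filter + LLM refinement
}
```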
-
-use async_trait::async_trait;
-use std::sync::Arc;
-use tracing::{debug, info, warn};
-
-use crate::document::DocumentTree;
-use crate::document::ReasoningIndex;
-use crate::llm::LlmClient;
-use crate::retrieval::RetrievalContext;
-use crate::retrieval::cache::CachedCandidate;
-use crate::retrieval::pilot::Pilot;
-use crate::retrieval::pipeline::{
-    CandidateNode, FailurePolicy, PipelineContext, RetrievalStage, SearchAlgorithm, StageOutcome,
-};
-use crate::retrieval::scoring::extract_keywords;
-use crate::retrieval::search::{
-    BeamSearch, MctsSearch, PurePilotSearch, SearchConfig as SearchAlgConfig, SearchCue,
-    SearchTree, ToCNavigator,
-};
-use crate::retrieval::strategy::{
-    CrossDocumentConfig, CrossDocumentStrategy, HybridConfig, HybridStrategy, KeywordStrategy,
-    LlmStrategy, RetrievalStrategy,
-};
-use crate::retrieval::types::{
-    NavigationDecision, ReasoningCandidate, ReasoningStep, StageName, StrategyPreference,
-};
-
-/// Search Stage - executes tree search with optional Pilot guidance.
-///
-/// This stage:
-/// 1. Uses ToCNavigator to locate relevant subtrees (Phase Locate)
-/// 2. Resolves queries (original or decomposed sub-queries)
-/// 3. Runs search algorithms from located subtrees (Phase Traverse)
-/// 4. Collects and deduplicates candidates (Phase Collect)
-///
-/// # Pilot Integration
-///
-/// When a Pilot is provided via [`with_pilot`], the search algorithm
-/// can consult it at key decision points for semantic guidance.
-/// Without a Pilot, the search uses pure algorithm scoring.
-pub struct SearchStage {
-    keyword_strategy: KeywordStrategy,
-    llm_strategy: Option<Arc<LlmStrategy>>,
-    hybrid_strategy: Option<Arc<HybridStrategy>>,
-    /// Pilot for navigation guidance (optional).
-    pilot: Option<Arc<dyn Pilot>>,
-    /// LLM client for ToC-based location (optional).
-    llm_client: Option<LlmClient>,
-    /// ToC navigator for hierarchical subtree location.
-    toc_navigator: ToCNavigator,
-}
-
-impl Default for SearchStage {
-    fn default() -> Self {
-        Self::new()
-    }
-}
-
-impl SearchStage {
-    /// Create a new search stage without Pilot.
-    pub fn new() -> Self {
-        Self {
-            keyword_strategy: KeywordStrategy::new(),
-            llm_strategy: None,
-            hybrid_strategy: None,
-            pilot: None,
-            llm_client: None,
-            toc_navigator: ToCNavigator::new(),
-        }
-    }
-
-    /// Add LLM client for ToC-based search.
-    pub fn with_llm_client(mut self, client: Option<LlmClient>) -> Self {
-        if let Some(ref client) = client {
-            self.toc_navigator = ToCNavigator::new().with_llm_client(client.clone());
-        }
-        self.llm_client = client;
-        self
-    }
-
-    /// Add Pilot for semantic navigation guidance.
-    pub fn with_pilot(mut self, pilot: Arc<dyn Pilot>) -> Self {
-        self.pilot = Some(pilot);
-        self
-    }
-
-    /// Add LLM strategy for complex queries.
-    pub fn with_llm_strategy(mut self, strategy: LlmStrategy) -> Self {
-        self.llm_strategy = Some(Arc::new(strategy));
-        self
-    }
-
-    /// Add hybrid strategy (BM25 + LLM refinement).
-    pub fn with_hybrid_strategy(mut self, strategy: Arc<HybridStrategy>) -> Self {
-        self.hybrid_strategy = Some(strategy);
-        self
-    }
-
-    /// Configure hybrid strategy with custom config using the LLM strategy.
-    pub fn with_hybrid_config(mut self, config: HybridConfig) -> Self {
-        if let Some(ref llm) = self.llm_strategy {
-            let llm_boxed: Box<dyn RetrievalStrategy> = Box::new((**llm).clone());
-            self.hybrid_strategy =
-                Some(Arc::new(HybridStrategy::new(llm_boxed).with_config(config)));
-        }
-        self
-    }
-
-    /// Check if Pilot is available and active.
-    pub fn has_pilot(&self) -> bool {
-        self.pilot.as_ref().map(|p| p.is_active()).unwrap_or(false)
-    }
-
-    /// Get the strategy to use based on context.
-    fn get_strategy(&self, ctx: &PipelineContext) -> Arc<dyn RetrievalStrategy> {
-        let preference = ctx.selected_strategy.unwrap_or(StrategyPreference::Auto);
-
-        match preference {
-            StrategyPreference::ForceKeyword => {
-                info!("Using Keyword strategy");
-                Arc::new(self.keyword_strategy.clone())
-            }
-            StrategyPreference::ForceLlm => {
-                if let Some(ref strategy) = self.llm_strategy {
-                    info!("Using LLM strategy");
-                    strategy.clone()
-                } else {
-                    warn!("LLM strategy requested but not available, falling back to Keyword");
-                    Arc::new(self.keyword_strategy.clone())
-                }
-            }
-            StrategyPreference::ForceHybrid => {
-                if let Some(ref strategy) = self.hybrid_strategy {
-                    info!("Using Hybrid strategy");
-                    strategy.clone()
-                } else if let Some(ref llm) = self.llm_strategy {
-                    info!("Using Hybrid strategy (auto-created from LLM)");
-                    let llm_boxed: Box<dyn RetrievalStrategy> = Box::new((**llm).clone());
-                    Arc::new(HybridStrategy::new(llm_boxed))
-                } else {
-                    warn!(
-                        "Hybrid strategy requested but no LLM available, falling back to Keyword"
-                    );
-                    Arc::new(self.keyword_strategy.clone())
-                }
-            }
-            StrategyPreference::ForceCrossDocument => {
-                // Build a CrossDocumentStrategy with graph-based boosting
-                let inner: Box<dyn RetrievalStrategy> = Box::new(self.keyword_strategy.clone());
-
-                let cross_doc =
-                    CrossDocumentStrategy::new(inner).with_config(CrossDocumentConfig::default());
-
-                // Attach graph for GraphBoosted merge if available.
-                // Multi-document trees are collected at the orchestrator level.
-                let cross_doc = if let Some(ref graph) = ctx.document_graph {
-                    cross_doc.with_graph(graph.clone())
-                } else {
-                    cross_doc
-                };
-
-                info!(
-                    "Using CrossDocument strategy (graph={})",
-                    ctx.document_graph.is_some()
-                );
-                Arc::new(cross_doc)
-            }
-            StrategyPreference::ForcePageRange => {
-                if let Some(ref strategy) = self.hybrid_strategy {
-                    info!("Using Hybrid strategy as fallback for ForcePageRange");
-                    strategy.clone()
-                } else {
-                    warn!("ForcePageRange requires special configuration, falling back to Keyword");
-                    Arc::new(self.keyword_strategy.clone())
-                }
-            }
-            StrategyPreference::Auto => Arc::new(self.keyword_strategy.clone()),
-        }
-    }
-
-    /// Extract candidates from search paths.
-    fn extract_candidates(
-        &self,
-        paths: &[crate::retrieval::types::SearchPath],
-        tree: &DocumentTree,
-    ) -> Vec<CandidateNode> {
-        let mut candidates = Vec::new();
-
-        for path in paths {
-            if let Some(leaf_id) = path.leaf {
-                if let Some(node) = tree.get(leaf_id) {
-                    let depth = node.depth;
-                    let is_leaf = tree.is_leaf(leaf_id);
-                    candidates.push(CandidateNode::new(leaf_id, path.score, depth, is_leaf));
-                }
-            }
-        }
-
-        candidates.sort_by(|a, b| {
-            b.score
-                .partial_cmp(&a.score)
-                .unwrap_or(std::cmp::Ordering::Equal)
-        });
-
-        candidates
-    }
-
-    /// Resolve the list of queries to search for.
-    ///
-    /// If decomposition produced multi-turn sub-queries, returns them in
-    /// execution order. Otherwise returns the original query.
-    fn resolve_queries(ctx: &PipelineContext) -> Vec<String> {
-        if let Some(ref decomp) = ctx.decomposition {
-            if decomp.was_decomposed && decomp.is_multi_turn() {
-                return decomp
-                    .execution_order()
-                    .iter()
-                    .map(|&i| decomp.sub_queries[i].text.clone())
-                    .collect();
-            }
-        }
-        vec![ctx.query.clone()]
-    }
-
-    /// Run search across the fallback chain.
-    ///
-    /// Iterates through algorithms in the fallback chain. After each algorithm,
-    /// checks if the best candidate score meets `min_score`. If sufficient,
-    /// returns early. Otherwise tries the next algorithm in the chain.
-    async fn run_search(
-        &self,
-        ctx: &mut PipelineContext,
-        queries: &[String],
-        cues: &[SearchCue],
-    ) -> (Vec<crate::retrieval::types::SearchPath>, Vec<CandidateNode>) {
-        let config = ctx.search_config.clone().unwrap_or_default();
-        let min_score = config.min_score;
-
-        // Build fallback chain: primary algorithm first, then remaining from chain
-        let primary = ctx.selected_algorithm.unwrap_or(SearchAlgorithm::Beam);
-        let chain = &ctx.search_fallback_chain;
-
-        // Build ordered algorithm list: primary first, then chain (excluding primary)
-        let mut algorithms = vec![primary];
-        for &algo in chain {
-            if algo != primary {
-                algorithms.push(algo);
-            }
-        }
-
-        info!(
-            "Search fallback chain: {:?} (min_score={:.2})",
-            algorithms.iter().map(|a| a.name()).collect::<Vec<_>>(),
-            min_score
-        );
-
-        let mut best_paths = Vec::new();
-        let mut best_candidates = Vec::new();
-        let mut total_pilot_interventions = 0u64;
-
-        for (idx, &algorithm) in algorithms.iter().enumerate() {
-            let (paths, candidates) = self
-                .run_single_algorithm(ctx, queries, cues, algorithm)
-                .await;
-
-            // Accumulate pilot interventions (approximated by path count)
-            total_pilot_interventions += paths.len() as u64;
-
-            // Merge results: collect all paths and candidates across fallback rounds
-            best_paths.extend(paths);
-            best_candidates.extend(candidates);
-
-            // Check if best candidate meets the threshold
-            let best_score = best_candidates
-                .iter()
-                .map(|c| c.score)
-                .fold(0.0f32, f32::max);
-
-            if best_score >= min_score {
-                info!(
-                    "Algorithm {} (#{}) sufficient: best_score={:.3} >= min_score={:.3}",
-                    algorithm.name(),
-                    idx + 1,
-                    best_score,
-                    min_score
-                );
-                break;
-            }
-
-            info!(
-                "Algorithm {} (#{}) insufficient: best_score={:.3} < min_score={:.3}, trying next",
-                algorithm.name(),
-                idx + 1,
-                best_score,
-                min_score
-            );
-        }
-
-        // Deduplicate candidates by node_id, keeping highest score
-        best_candidates.sort_by(|a, b| {
-            b.score
                .partial_cmp(&a.score)
-                .unwrap_or(std::cmp::Ordering::Equal)
-        });
-        best_candidates.dedup_by(|a, b| a.node_id == b.node_id);
-
-        info!(
-            "Search complete: {} paths, {} candidates (pilot interventions: {})",
-            best_paths.len(),
-            best_candidates.len(),
-            total_pilot_interventions
-        );
-
-        (best_paths, best_candidates)
-    }
-
-    /// Run a single search algorithm across all queries and cues.
-    async fn run_single_algorithm(
-        &self,
-        ctx: &mut PipelineContext,
-        queries: &[String],
-        cues: &[SearchCue],
-        algorithm: SearchAlgorithm,
-    ) -> (Vec<crate::retrieval::types::SearchPath>, Vec<CandidateNode>) {
-        let config = ctx.search_config.clone().unwrap_or_default();
-
-        let search_config = SearchAlgConfig {
-            top_k: config.beam_width * 2,
-            beam_width: config.beam_width,
-            max_iterations: config.max_iterations,
-            min_score: config.min_score,
-            leaf_only: false,
-            max_backtracks: config.beam_width,
-            fallback_score_ratio: 0.5,
-        };
-
-        let pilot_ref: Option<&dyn Pilot> = self.pilot.as_deref();
-
-        let mut all_paths = Vec::new();
-
-        for query in queries {
-            let legacy_ctx =
-                RetrievalContext::new(query, ctx.options.max_tokens, ctx.options.sufficiency_check);
-
-            for cue in cues {
-                debug!(
-                    "Searching: algorithm={}, query='{}', cue.root={:?}, cue.confidence={:.3}",
-                    algorithm.name(),
-                    query,
-                    cue.root,
-                    cue.confidence
-                );
-
-                let result = match algorithm {
-                    SearchAlgorithm::PurePilot => {
-                        PurePilotSearch::new()
-                            .search_from(
-                                &ctx.tree,
-                                &legacy_ctx,
-                                &search_config,
-                                pilot_ref,
-                                cue.root,
-                            )
-                            .await
-                    }
-                    SearchAlgorithm::Beam => {
-                        BeamSearch::new()
-                            .search_from(
-                                &ctx.tree,
-                                &legacy_ctx,
-                                &search_config,
-                                pilot_ref,
-                                cue.root,
-                            )
-                            .await
-                    }
-                    SearchAlgorithm::Mcts => {
-                        MctsSearch::new()
-                            .search_from(
-                                &ctx.tree,
-                                &legacy_ctx,
-                                &search_config,
-                                pilot_ref,
-                                cue.root,
-                            )
-                            .await
-                    }
-                };
-
-                all_paths.extend(result.paths);
-            }
-        }
-
-        let candidates = self.extract_candidates(&all_paths, &ctx.tree);
-        (all_paths, candidates)
-    }
-
-    /// Check if a query is asking for a document summary/overview.
-    fn is_summary_query(query: &str) -> bool {
-        let lower = query.to_lowercase();
-
-        // Direct keyword matches
-        let patterns = [
-            "summarize",
-            "summary",
-            "overview",
-            "give me an overview",
-            "describe this document",
-            "main topics",
-            "table of contents",
-            "这篇文档讲了什么",
-            "总结",
-            "概述",
-            "概要",
-            "主要内容",
-            "文档简介",
-            "介绍一下",
-        ];
-        if patterns.iter().any(|p| lower.contains(p)) {
-            return true;
-        }
-
-        // Phrase patterns — match with intervening words removed.
-        // "what is this project about" → remove common filler words, check for "what is this about"
-        let filler_words = [
-            "project", "document", "file", "paper", "article", "text", "book", "the", "a", "an",
-        ];
-        let cleaned: String = lower
-            .split_whitespace()
-            .filter(|w| !filler_words.contains(w))
-            .collect::<Vec<_>>()
-            .join(" ");
-
-        let phrase_patterns = [
-            "what is this about",
-            "what is this document",
-            "what does this mean",
-            "tell me about this",
-            "what is the main idea",
-            "what are the key points",
-            "what is the purpose",
-        ];
-        phrase_patterns.iter().any(|p| cleaned.contains(p))
-    }
-
-    /// Try to match the query against pre-computed reasoning index entries.
-    ///
-    /// Returns candidates if a high-confidence match is found, None otherwise.
-    fn try_reasoning_shortcut(
-        ridx: &ReasoningIndex,
-        ctx: &PipelineContext,
-    ) -> Option<Vec<CandidateNode>> {
-        // Check 1: Summary shortcut — handle "overview" style queries
-        if let Some(ref shortcut) = ridx.summary_shortcut() {
-            if Self::is_summary_query(&ctx.query) {
-                // For summary queries, return all top-level sections as candidates.
-                // Don't include the root node itself — it has no direct content,
-                // only descendant leaf content which is already covered by sections.
-                let candidates: Vec<CandidateNode> = shortcut
-                    .section_summaries
-                    .iter()
-                    .map(|section| {
-                        CandidateNode::new(
-                            section.node_id,
-                            1.0,
-                            section.depth,
-                            ctx.tree.is_leaf(section.node_id),
-                        )
-                    })
-                    .collect();
-
-                if !candidates.is_empty() {
-                    return Some(candidates);
-                }
-
-                // Fallback: if no sections, use root node
-                return Some(vec![CandidateNode::new(
-                    shortcut.root_node,
-                    1.0,
-                    0,
-                    ctx.tree.is_leaf(shortcut.root_node),
-                )]);
-            }
-        }
-
-        // Check 2: Keyword → Topic path matching
-        let keywords = extract_keywords(&ctx.query);
-        if keywords.is_empty() {
-            return None;
-        }
-
-        let mut scored_nodes: std::collections::HashMap<crate::document::NodeId, f32> =
-            std::collections::HashMap::new();
-        for keyword in &keywords {
-            if let Some(entries) = ridx.topic_entries(keyword) {
-                for entry in entries {
-                    let score = scored_nodes.entry(entry.node_id).or_insert(0.0);
-                    *score += entry.weight;
-                }
-            }
-        }
-
-        if scored_nodes.is_empty() {
-            return None;
-        }
-
-        // Boost hot nodes by 20%
-        for (node_id, score) in scored_nodes.iter_mut() {
-            if ridx.is_hot(*node_id) {
-                *score *= 1.2;
-            }
-        }
-
-        // Convert to candidates, only return if best match is high-confidence
-        let mut candidates: Vec<CandidateNode> = scored_nodes
-            .into_iter()
-            .filter_map(|(node_id, score)| {
-                let depth = ctx.tree.get(node_id).map(|n| n.depth)?;
-                Some(CandidateNode::new(
-                    node_id,
-                    score,
-                    depth,
-                    ctx.tree.is_leaf(node_id),
-                ))
-            })
-            .collect();
-
-        candidates.sort_by(|a, b| {
-            b.score
-                .partial_cmp(&a.score)
-                .unwrap_or(std::cmp::Ordering::Equal)
-        });
-
-        // Only return shortcut results if we have a high-confidence match
-        let best_score = candidates.first().map(|c| c.score).unwrap_or(0.0);
-        if best_score > 0.5 {
-            Some(candidates)
-        } else {
-            None
-        }
-    }
-}
-
-#[async_trait]
-impl RetrievalStage for SearchStage {
-    fn name(&self) -> &'static str {
-        "search"
-    }
-
-    fn depends_on(&self) -> Vec<&'static str> {
-        vec!["plan"]
-    }
-
-    fn priority(&self) -> i32 {
-        30
-    }
-
-    fn failure_policy(&self) -> FailurePolicy {
-        FailurePolicy::retry()
-    }
-
-    fn can_backtrack(&self) -> bool {
-        true
-    }
-
-    async fn execute(&self, ctx: &mut PipelineContext) -> crate::error::Result<StageOutcome> {
-        let start = std::time::Instant::now();
-
-        let algorithm = ctx.selected_algorithm.unwrap_or(SearchAlgorithm::Beam);
-        let config = ctx.search_config.clone().unwrap_or_default();
-
-        // Budget check: skip search iteration if exhausted
-        let budget_status = ctx.budget_controller.status();
-        if budget_status.should_stop() && ctx.search_iterations > 0 {
-            info!(
-                "Budget exhausted ({}/{}), skipping search iteration",
-                ctx.budget_controller.consumed(),
-                ctx.budget_controller.total_budget(),
-            );
-            ctx.record_reasoning(
-                StageName::Search,
-                format!(
-                    "Budget exhausted ({}/{}), returning current candidates",
-                    ctx.budget_controller.consumed(),
-                    ctx.budget_controller.total_budget(),
-                ),
-                NavigationDecision::Skip,
-            );
-            return Ok(StageOutcome::complete());
-        }
-
-        // Reset Pilot state for new query
-        if let Some(ref pilot) = self.pilot {
-            pilot.reset();
-            debug!(
-                "SearchStage: Pilot is available, is_active={}",
-                pilot.is_active()
-            );
-        }
-
-        // Apply budget-aware beam width adjustment
-        let effective_beam = ctx
-            .budget_controller
-            .suggested_beam_width(config.beam_width, ctx.search_iterations);
-
-        info!(
-            "Executing search: algorithm={:?}, beam_width={} (budget: {:?}), pilot={}",
-            algorithm,
-            effective_beam,
-            budget_status,
-            if self.has_pilot() {
-                "enabled"
-            } else {
-                "disabled"
-            }
-        );
-
-        ctx.increment_search_iteration();
-
-        // === L1 Cache check: return cached
results if available === - if ctx.options.enable_cache && ctx.search_iterations <= 1 { - let scope_fp = - crate::utils::fingerprint::Fingerprint::from_str(&format!("{:?}", ctx.tree.root())); - if let Some(cached) = ctx.reasoning_cache.l1_get(&ctx.query, &scope_fp) { - info!( - "L1 cache hit for query, returning {} cached candidates", - cached.len() - ); - ctx.candidates = cached - .into_iter() - .map(|c| { - CandidateNode::new(c.node_id, c.score, c.depth, ctx.tree.is_leaf(c.node_id)) - }) - .collect(); - ctx.metrics.cache_hits += 1; - ctx.record_reasoning( - StageName::Search, - format!( - "L1 cache hit: {} candidates returned from cache", - ctx.candidates.len() - ), - NavigationDecision::ThisIsTheAnswer, - ); - return Ok(StageOutcome::cont()); - } - ctx.metrics.cache_misses += 1; - } - - // === Reasoning Index Quick Match === - // Check pre-computed index before running expensive ToC navigation. - if let Some(ref ridx) = ctx.reasoning_index { - if let Some(shortcut_candidates) = Self::try_reasoning_shortcut(ridx, ctx) { - info!( - "Reasoning index shortcut match, returning {} candidates", - shortcut_candidates.len() - ); - ctx.candidates = shortcut_candidates; - ctx.metrics.cache_hits += 1; - ctx.record_reasoning( - StageName::Search, - "Reasoning index shortcut: direct path match".to_string(), - NavigationDecision::ThisIsTheAnswer, - ); - return Ok(StageOutcome::cont()); - } - } - - // === Phase Locate: find relevant subtrees via ToC === - // Use depth-1 nodes (root's direct children = top-level sections). - // level(0) is only the root itself, which is not useful for locating. - let top_level_nodes: Vec<_> = ctx - .retrieval_index - .as_ref() - .and_then(|idx| idx.level(1)) - .map(|nodes| nodes.to_vec()) - .unwrap_or_else(|| ctx.tree.children(ctx.tree.root())); - - let mut cues = self - .toc_navigator - .locate(&ctx.query, &ctx.tree, &top_level_nodes) - .await; - - // === L2 Cache boost: boost cues whose paths have historical success === - let doc_key = format!("{:?}", ctx.tree.root()); - let l2_paths = ctx.reasoning_cache.l2_top_paths(&doc_key, 5); - if !l2_paths.is_empty() { - for cue in &mut cues { - if let Some(node) = ctx.tree.get(cue.root) { - let node_path = node.title.as_str(); - if let Some((_, cached_conf)) = l2_paths.iter().find(|(path, _)| { - node_path.contains(path.as_str()) || path.contains(node_path) - }) { - // Blend current confidence with historical: 60% current + 40% cached - cue.confidence = cue.confidence * 0.6 + cached_conf * 0.4; - debug!( - "L2 cache boost for '{}': {:.3} → {:.3}", - node_path, cue.confidence, cue.confidence - ); - } - } - } - } - - debug!("ToCNavigator returned {} cues", cues.len()); - - // Inject structure hints from Analyze stage as high-priority cues - if !ctx.resolved_path_hints.is_empty() { - for (hint_text, node_id) in &ctx.resolved_path_hints { - if ctx.tree.get(*node_id).is_some() { - info!("Injecting structure hint '{}' as search cue", hint_text); - cues.push(SearchCue { - root: *node_id, - confidence: 1.0, // Direct match from query structure - }); - } - } - } - - // === Resolve queries (decomposed or original) === - let queries = Self::resolve_queries(ctx); - - // === Phase Traverse + Collect === - let (paths, mut candidates) = self.run_search(ctx, &queries, &cues).await; - - // Add cue root nodes as direct candidates. - // The ToCNavigator already identified these as relevant; they may not - // be leaf nodes so tree traversal would skip them. 
This restores the - // old locate_via_llm behavior where LLM-selected nodes became - // candidates directly. - for cue in &cues { - if let Some(node) = ctx.tree.get(cue.root) { - candidates.push(CandidateNode::new( - cue.root, - cue.confidence, - node.depth, - ctx.tree.is_leaf(cue.root), - )); - } - } - - // Sort by score and deduplicate - candidates.sort_by(|a, b| { - b.score - .partial_cmp(&a.score) - .unwrap_or(std::cmp::Ordering::Equal) - }); - candidates.dedup_by(|a, b| a.node_id == b.node_id); - - ctx.search_paths = paths; - ctx.candidates = candidates; - - debug!( - "Search found {} total paths, {} candidates", - ctx.search_paths.len(), - ctx.candidates.len() - ); - for (i, c) in ctx.candidates.iter().enumerate().take(5) { - if let Some(node) = ctx.tree.get(c.node_id) { - debug!( - "Candidate {}: score={:.3}, title='{}'", - i, c.score, node.title - ); - } - } - - // Update metrics and budget - ctx.metrics.search_time_ms += start.elapsed().as_millis() as u64; - ctx.metrics.nodes_visited += ctx.candidates.len(); - - // Update hot node tracker with retrieval results - if let Some(ref tracker) = ctx.hot_tracker { - let hits: Vec<(crate::document::NodeId, f32)> = ctx - .candidates - .iter() - .map(|c| (c.node_id, c.score)) - .collect(); - tracker.record_hits(&hits); - } - - // === L3 Cache boost: use cached strategy scores to refine candidates === - for candidate in &mut ctx.candidates { - if let Some(node) = ctx.tree.get(candidate.node_id) { - let content_fp = crate::utils::fingerprint::Fingerprint::from_str(&node.content); - if let Some((cached_score, _strategy)) = ctx.reasoning_cache.l3_get(&content_fp) { - // Blend: if L3 has a higher score for this node, boost it - if cached_score > candidate.score { - candidate.score = (candidate.score + cached_score) / 2.0; - } - } - } - } - // Re-sort after L3 boost - ctx.candidates.sort_by(|a, b| { - b.score - .partial_cmp(&a.score) - .unwrap_or(std::cmp::Ordering::Equal) - }); - - // Store L3 scores for future queries - for candidate in &ctx.candidates { - if let Some(node) = ctx.tree.get(candidate.node_id) { - if !node.content.is_empty() { - let content_fp = - crate::utils::fingerprint::Fingerprint::from_str(&node.content); - ctx.reasoning_cache.l3_store( - content_fp, - candidate.score, - ctx.selected_strategy - .map(|s| format!("{:?}", s)) - .unwrap_or_else(|| "auto".to_string()), - ); - } - } - } - // Estimate tokens consumed by this search iteration (content-based heuristic) - let search_tokens: usize = ctx - .candidates - .iter() - .filter_map(|c| ctx.tree.get(c.node_id).map(|n| n.content.len())) - .sum::() - / 4; // rough: 4 chars ≈ 1 token - ctx.budget_controller.record_tokens(search_tokens); - - // Store results in L1 cache - if ctx.options.enable_cache && ctx.search_iterations <= 1 && !ctx.candidates.is_empty() { - let scope_fp = - crate::utils::fingerprint::Fingerprint::from_str(&format!("{:?}", ctx.tree.root())); - let cached: Vec = ctx - .candidates - .iter() - .map(|c| CachedCandidate { - node_id: c.node_id, - score: c.score, - depth: c.depth, - }) - .collect(); - ctx.reasoning_cache.l1_store( - &ctx.query, - scope_fp, - cached, - ctx.selected_strategy - .map(|s| format!("{:?}", s)) - .unwrap_or_else(|| "auto".to_string()), - ); - } - - info!( - "Search complete: {} candidates (iteration {})", - ctx.candidates.len(), - ctx.search_iterations - ); - - // Record reasoning — collect data first to avoid borrow conflicts - let strategy_str = ctx - .selected_strategy - .map(|s| format!("{:?}", s)) - .unwrap_or_else(|| 
"auto".to_string()); - let search_iterations = ctx.search_iterations; - - let reasoning_data: Vec<( - String, - Option, - f32, - usize, - String, - Vec, - )> = ctx - .candidates - .iter() - .take(5) - .map(|candidate| { - let (title, depth) = ctx - .tree - .get(candidate.node_id) - .map(|n| (n.title.clone(), n.depth)) - .unwrap_or_else(|| ("(unknown)".to_string(), 0)); - - let considered: Vec = ctx - .candidates - .iter() - .filter(|c| c.node_id != candidate.node_id) - .take(5) - .filter_map(|c| { - ctx.tree.get(c.node_id).map(|n| ReasoningCandidate { - node_id: format!("{:?}", c.node_id), - title: n.title.clone(), - score: c.score, - }) - }) - .collect(); - - let reasoning = format!( - "Candidate '{}' (score={:.3}) found via {} search, iteration {}", - title, - candidate.score, - algorithm.name(), - search_iterations - ); - - ( - format!("{:?}", candidate.node_id), - Some(title), - candidate.score, - depth, - reasoning, - considered, - ) - }) - .collect(); - - for (node_id, title, score, depth, reasoning, considered) in reasoning_data { - ctx.push_reasoning_step(ReasoningStep { - stage: StageName::Search, - node_id: Some(node_id), - title, - score, - decision: if score > 0.7 { - NavigationDecision::ThisIsTheAnswer - } else { - NavigationDecision::ExploreMore - }, - depth, - reasoning, - candidates: considered, - strategy_used: Some(strategy_str.clone()), - llm_call: None, - references_followed: Vec::new(), - }); - } - - Ok(StageOutcome::cont()) - } -} - -#[cfg(test)] -mod tests { - use super::*; - - #[test] - fn test_search_stage_creation() { - let stage = SearchStage::new(); - assert!(stage.llm_strategy.is_none()); - assert!(!stage.has_pilot()); - } - - #[test] - fn test_search_stage_dependencies() { - let stage = SearchStage::new(); - assert_eq!(stage.depends_on(), vec!["plan"]); - } -} diff --git a/rust/src/retrieval/strategy/cross_document.rs b/rust/src/retrieval/strategy/cross_document.rs deleted file mode 100644 index c296ec24..00000000 --- a/rust/src/retrieval/strategy/cross_document.rs +++ /dev/null @@ -1,499 +0,0 @@ -// Copyright (c) 2026 vectorless developers -// SPDX-License-Identifier: Apache-2.0 - -//! Cross-document retrieval strategy. -//! -//! Retrieves relevant content from multiple documents, aggregating -//! results into a unified response. - -use async_trait::async_trait; -use std::sync::Arc; - -use super::r#trait::{NodeEvaluation, RetrievalStrategy, StrategyCapabilities}; -use crate::document::{DocumentTree, NodeId}; -use crate::graph::DocumentGraph; -use crate::retrieval::RetrievalContext; -use crate::retrieval::types::QueryComplexity; - -/// Document identifier for cross-document retrieval. -pub type DocumentId = String; - -/// A document with its tree structure for cross-document retrieval. -pub struct DocumentEntry { - /// Unique document identifier. - pub id: DocumentId, - /// Document title or name. - pub title: String, - /// The document tree. - pub tree: DocumentTree, -} - -impl DocumentEntry { - /// Create a new document entry. - pub fn new(id: impl Into, title: impl Into, tree: DocumentTree) -> Self { - Self { - id: id.into(), - title: title.into(), - tree, - } - } -} - -/// Result from a single document in cross-document retrieval. -#[derive(Debug, Clone)] -pub struct DocumentResult { - /// Document ID. - pub doc_id: DocumentId, - /// Document title. - pub doc_title: String, - /// Node evaluation results from this document. - pub evaluations: Vec<(NodeId, NodeEvaluation)>, - /// Best score from this document. 
-    pub best_score: f32,
-}
-
-/// Strategy for merging results from multiple documents.
-#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
-pub enum MergeStrategy {
-    /// Take top-k results across all documents (default).
-    #[default]
-    TopK,
-    /// Take best result from each document.
-    BestPerDocument,
-    /// Weight results by document relevance score.
-    WeightedByRelevance,
-    /// Use graph connectivity to boost connected documents.
-    GraphBoosted,
-}
-
-/// Configuration for cross-document retrieval.
-#[derive(Debug, Clone)]
-pub struct CrossDocumentConfig {
-    /// Maximum number of documents to search.
-    pub max_documents: usize,
-    /// Maximum results per document.
-    pub max_results_per_doc: usize,
-    /// Maximum total results.
-    pub max_total_results: usize,
-    /// Minimum score threshold for including results.
-    pub min_score: f32,
-    /// How to merge results from multiple documents.
-    pub merge_strategy: MergeStrategy,
-    /// Whether to search documents in parallel.
-    pub parallel_search: bool,
-}
-
-impl Default for CrossDocumentConfig {
-    fn default() -> Self {
-        Self {
-            max_documents: 10,
-            max_results_per_doc: 3,
-            max_total_results: 10,
-            min_score: 0.3,
-            merge_strategy: MergeStrategy::TopK,
-            parallel_search: true,
-        }
-    }
-}
-
-/// Cross-document retrieval strategy.
-///
-/// Searches multiple documents and aggregates results based on
-/// the configured merge strategy.
-///
-/// # Example
-///
-/// ```rust,ignore
-/// use vectorless::retrieval::strategy::{CrossDocumentStrategy, DocumentEntry};
-///
-/// let docs = vec![
-///     DocumentEntry::new("doc1", "Manual A", tree1),
-///     DocumentEntry::new("doc2", "Manual B", tree2),
-/// ];
-///
-/// let strategy = CrossDocumentStrategy::new(inner_strategy)
-///     .with_config(CrossDocumentConfig {
-///         max_documents: 5,
-///         max_results_per_doc: 2,
-///         ..Default::default()
-///     });
-/// ```
-pub struct CrossDocumentStrategy {
-    /// Inner strategy for searching individual documents.
-    inner: Box<dyn RetrievalStrategy>,
-    /// Configuration.
-    config: CrossDocumentConfig,
-    /// Documents to search.
-    documents: Vec<DocumentEntry>,
-    /// Optional document graph for graph-aware ranking.
-    graph: Option<Arc<DocumentGraph>>,
-}
-
-impl CrossDocumentStrategy {
-    /// Create a new cross-document strategy.
-    pub fn new(inner: Box<dyn RetrievalStrategy>) -> Self {
-        Self {
-            inner,
-            config: CrossDocumentConfig::default(),
-            documents: Vec::new(),
-            graph: None,
-        }
-    }
-
-    /// Create with configuration.
-    pub fn with_config(mut self, config: CrossDocumentConfig) -> Self {
-        self.config = config;
-        self
-    }
-
-    /// Add a document to search.
-    pub fn add_document(&mut self, doc: DocumentEntry) {
-        if self.documents.len() < self.config.max_documents {
-            self.documents.push(doc);
-        }
-    }
-
-    /// Set documents to search.
-    pub fn with_documents(mut self, documents: Vec<DocumentEntry>) -> Self {
-        self.documents = documents
-            .into_iter()
-            .take(self.config.max_documents)
-            .collect();
-        self
-    }
-
-    /// Get the number of documents.
-    pub fn document_count(&self) -> usize {
-        self.documents.len()
-    }
-
-    /// Set the document graph for graph-aware ranking.
-    pub fn with_graph(mut self, graph: Arc<DocumentGraph>) -> Self {
-        self.graph = Some(graph);
-        self
-    }
-
-    /// Apply graph-based score boosting to merged results.
-    ///
-    /// For each high-confidence result (score > 0.5), find its graph neighbors
-    /// and boost their scores by `boost_factor * edge_weight`.
- fn apply_graph_boost( - &self, - results: &mut Vec<(DocumentId, NodeId, NodeEvaluation)>, - boost_factor: f32, - ) { - let graph = match self.graph { - Some(ref g) => g, - None => return, - }; - - // Collect doc_ids with high scores - let high_score_docs: Vec<(String, f32)> = results - .iter() - .filter(|(_, _, eval)| eval.score > 0.5) - .map(|(doc_id, _, eval)| (doc_id.clone(), eval.score)) - .collect(); - - if high_score_docs.is_empty() { - return; - } - - // For each high-score doc, boost its graph neighbors - for (doc_id, base_score) in &high_score_docs { - let neighbors = graph.get_neighbors(doc_id); - for edge in neighbors { - // Find results from the neighbor doc and boost them - for result in results.iter_mut() { - if result.0 == edge.target_doc_id { - let boost = boost_factor * edge.weight * base_score; - result.2.score += boost; - } - } - } - } - - // Re-sort by score after boosting - results.sort_by(|a, b| { - b.2.score - .partial_cmp(&a.2.score) - .unwrap_or(std::cmp::Ordering::Equal) - }); - } - - /// Search a single document and return results. - /// - /// Performs depth-first traversal: evaluates top-level nodes first, - /// then recursively explores children of high-scoring nodes. - async fn search_document( - &self, - doc: &DocumentEntry, - context: &RetrievalContext, - ) -> DocumentResult { - let root_id = doc.tree.root(); - let children = doc.tree.children(root_id); - - // Phase 1: Evaluate top-level nodes - let top_evaluations = self - .inner - .evaluate_nodes(&doc.tree, &children, context) - .await; - - let mut scored_nodes: Vec<(NodeId, NodeEvaluation)> = children - .into_iter() - .zip(top_evaluations.into_iter()) - .filter(|(_, eval)| eval.score >= self.config.min_score) - .collect(); - - // Phase 2: Depth traversal — explore children of high-scoring nodes - let high_score_nodes: Vec = scored_nodes - .iter() - .filter(|(_, eval)| eval.score >= self.config.min_score * 1.5) - .map(|(id, _)| *id) - .collect(); - - for node_id in high_score_nodes { - let depth_results = self.search_subtree(&doc.tree, node_id, context, 0, 2).await; - scored_nodes.extend(depth_results); - } - - // Sort by score descending - scored_nodes.sort_by(|a, b| { - b.1.score - .partial_cmp(&a.1.score) - .unwrap_or(std::cmp::Ordering::Equal) - }); - - // Deduplicate by node_id - scored_nodes.dedup_by(|a, b| a.0 == b.0); - - // Limit results per document - scored_nodes.truncate(self.config.max_results_per_doc); - - let best_score = scored_nodes.first().map(|(_, e)| e.score).unwrap_or(0.0); - - DocumentResult { - doc_id: doc.id.clone(), - doc_title: doc.title.clone(), - evaluations: scored_nodes, - best_score, - } - } - - /// Recursively search a subtree, evaluating children of high-scoring nodes. 
-    fn search_subtree<'a>(
-        &'a self,
-        tree: &'a DocumentTree,
-        parent_id: NodeId,
-        context: &'a RetrievalContext,
-        current_depth: usize,
-        max_depth: usize,
-    ) -> std::pin::Pin<
-        Box<dyn std::future::Future<Output = Vec<(NodeId, NodeEvaluation)>> + Send + 'a>,
-    > {
-        Box::pin(async move {
-            if current_depth >= max_depth {
-                return Vec::new();
-            }
-
-            let children = tree.children(parent_id);
-            if children.is_empty() {
-                return Vec::new();
-            }
-
-            let evaluations = self.inner.evaluate_nodes(tree, &children, context).await;
-
-            let mut results = Vec::new();
-            let mut explore_further = Vec::new();
-
-            for (node_id, eval) in children.into_iter().zip(evaluations.into_iter()) {
-                if eval.score >= self.config.min_score {
-                    results.push((node_id, eval.clone()));
-                }
-                // Only explore deeper if score is promising
-                if eval.score >= self.config.min_score * 1.5 {
-                    explore_further.push(node_id);
-                }
-            }
-
-            // Recurse into promising children
-            for child_id in explore_further {
-                let deeper = self
-                    .search_subtree(tree, child_id, context, current_depth + 1, max_depth)
-                    .await;
-                results.extend(deeper);
-            }
-
-            results
-        })
-    }
-
-    /// Merge results from all documents.
-    fn merge_results(
-        &self,
-        doc_results: Vec<DocumentResult>,
-    ) -> Vec<(DocumentId, NodeId, NodeEvaluation)> {
-        match self.config.merge_strategy {
-            MergeStrategy::TopK => {
-                // Collect all results and sort by score
-                let mut all_results: Vec<_> = doc_results
-                    .into_iter()
-                    .flat_map(|doc| {
-                        doc.evaluations
-                            .into_iter()
-                            .map(move |(node_id, eval)| (doc.doc_id.clone(), node_id, eval))
-                    })
-                    .collect();
-
-                all_results.sort_by(|a, b| {
-                    b.2.score
-                        .partial_cmp(&a.2.score)
-                        .unwrap_or(std::cmp::Ordering::Equal)
-                });
-                all_results.truncate(self.config.max_total_results);
-                all_results
-            }
-
-            MergeStrategy::BestPerDocument => {
-                // Take the best result from each document
-                doc_results
-                    .into_iter()
-                    .filter_map(|doc| {
-                        doc.evaluations
-                            .into_iter()
-                            .next()
-                            .map(|(node_id, eval)| (doc.doc_id, node_id, eval))
-                    })
-                    .take(self.config.max_total_results)
-                    .collect()
-            }
-
-            MergeStrategy::WeightedByRelevance => {
-                // Weight by document's best score
-                let max_doc_score = doc_results
-                    .iter()
-                    .map(|d| d.best_score)
-                    .fold(0.0_f32, f32::max);
-
-                let mut all_results: Vec<_> = doc_results
-                    .into_iter()
-                    .flat_map(|doc| {
-                        let weight = if max_doc_score > 0.0 {
-                            doc.best_score / max_doc_score
-                        } else {
-                            1.0
-                        };
-                        doc.evaluations.into_iter().map(move |(node_id, mut eval)| {
-                            eval.score *= weight;
-                            (doc.doc_id.clone(), node_id, eval)
-                        })
-                    })
-                    .collect();
-
-                all_results.sort_by(|a, b| {
-                    b.2.score
-                        .partial_cmp(&a.2.score)
-                        .unwrap_or(std::cmp::Ordering::Equal)
-                });
-                all_results.truncate(self.config.max_total_results);
-                all_results
-            }
-
-            MergeStrategy::GraphBoosted => {
-                // First do TopK merge
-                let mut all_results: Vec<_> = doc_results
-                    .into_iter()
-                    .flat_map(|doc| {
-                        doc.evaluations
-                            .into_iter()
-                            .map(move |(node_id, eval)| (doc.doc_id.clone(), node_id, eval))
-                    })
-                    .collect();
-
-                all_results.sort_by(|a, b| {
-                    b.2.score
-                        .partial_cmp(&a.2.score)
-                        .unwrap_or(std::cmp::Ordering::Equal)
-                });
-
-                // Apply graph-based boosting
-                self.apply_graph_boost(&mut all_results, 0.15);
-
-                all_results.truncate(self.config.max_total_results);
-                all_results
-            }
-        }
-    }
-}
-
-#[async_trait]
-impl RetrievalStrategy for CrossDocumentStrategy {
-    async fn evaluate_node(
-        &self,
-        tree: &DocumentTree,
-        node_id: NodeId,
-        context: &RetrievalContext,
-    ) -> NodeEvaluation {
-        // Delegate to inner strategy
-        self.inner.evaluate_node(tree, node_id, context).await
-    }
-
-    
async fn evaluate_nodes( - &self, - tree: &DocumentTree, - node_ids: &[NodeId], - context: &RetrievalContext, - ) -> Vec { - // Delegate to inner strategy - self.inner.evaluate_nodes(tree, node_ids, context).await - } - - fn name(&self) -> &'static str { - "cross_document" - } - - fn capabilities(&self) -> StrategyCapabilities { - let inner_caps = self.inner.capabilities(); - StrategyCapabilities { - uses_llm: inner_caps.uses_llm, - uses_embeddings: inner_caps.uses_embeddings, - supports_sufficiency: true, - typical_latency_ms: inner_caps.typical_latency_ms * self.documents.len().min(5) as u64, - } - } - - fn suitable_for_complexity(&self, complexity: QueryComplexity) -> bool { - // Cross-document is suitable for all complexity levels - matches!( - complexity, - QueryComplexity::Simple | QueryComplexity::Medium | QueryComplexity::Complex - ) - } - - fn estimate_cost(&self, node_count: usize) -> super::r#trait::StrategyCost { - let inner_cost = self.inner.estimate_cost(node_count); - super::r#trait::StrategyCost { - llm_calls: inner_cost.llm_calls * self.documents.len().min(self.config.max_documents), - tokens: inner_cost.tokens * self.documents.len().min(self.config.max_documents), - } - } -} - -#[cfg(test)] -mod tests { - use super::*; - - #[test] - fn test_config_default() { - let config = CrossDocumentConfig::default(); - assert_eq!(config.max_documents, 10); - assert_eq!(config.max_results_per_doc, 3); - assert_eq!(config.max_total_results, 10); - assert_eq!(config.merge_strategy, MergeStrategy::TopK); - } - - #[test] - fn test_merge_strategy_default() { - let strategy = MergeStrategy::default(); - assert!(matches!(strategy, MergeStrategy::TopK)); - } -} diff --git a/rust/src/retrieval/strategy/hybrid.rs b/rust/src/retrieval/strategy/hybrid.rs deleted file mode 100644 index 37c7f5fc..00000000 --- a/rust/src/retrieval/strategy/hybrid.rs +++ /dev/null @@ -1,471 +0,0 @@ -// Copyright (c) 2026 vectorless developers -// SPDX-License-Identifier: Apache-2.0 - -//! Hybrid retrieval strategy combining BM25 pre-filtering with LLM refinement. -//! -//! This two-stage approach minimizes LLM calls while maintaining high accuracy: -//! 1. **BM25 Filter**: Fast keyword scoring to identify candidate nodes -//! 2. **LLM Refinement**: Semantic understanding of top candidates only - -use async_trait::async_trait; - -use super::r#trait::{NodeEvaluation, RetrievalStrategy, StrategyCapabilities}; -use crate::document::{DocumentTree, NodeId}; -use crate::retrieval::RetrievalContext; -use crate::retrieval::scoring::{Bm25Engine, FieldDocument}; -use crate::retrieval::types::{NavigationDecision, QueryComplexity}; - -/// Configuration for hybrid retrieval. -#[derive(Debug, Clone)] -pub struct HybridConfig { - /// BM25 pre-filter: keep top N% of candidates. - pub pre_filter_ratio: f32, - /// BM25 pre-filter: minimum candidates to keep. - pub min_candidates: usize, - /// BM25 pre-filter: maximum candidates to pass to LLM. - pub max_candidates: usize, - /// Score threshold for automatic acceptance (skip LLM). - pub auto_accept_threshold: f32, - /// Score threshold for automatic rejection (skip LLM). - pub auto_reject_threshold: f32, - /// Weight for BM25 score in final scoring. - pub bm25_weight: f32, - /// Weight for LLM score in final scoring. - pub llm_weight: f32, - /// Whether to use BM25 for initial filtering. 
-    pub use_pre_filter: bool,
-}
-
-impl Default for HybridConfig {
-    fn default() -> Self {
-        Self {
-            pre_filter_ratio: 0.3, // Keep top 30%
-            min_candidates: 2,
-            max_candidates: 5,
-            auto_accept_threshold: 0.85,
-            auto_reject_threshold: 0.15,
-            bm25_weight: 0.4,
-            llm_weight: 0.6,
-            use_pre_filter: true,
-        }
-    }
-}
-
-impl HybridConfig {
-    /// Create a new configuration.
-    pub fn new() -> Self {
-        Self::default()
-    }
-
-    /// Set pre-filter ratio.
-    #[must_use]
-    pub fn with_pre_filter_ratio(mut self, ratio: f32) -> Self {
-        self.pre_filter_ratio = ratio.clamp(0.1, 1.0);
-        self
-    }
-
-    /// Set candidate limits.
-    #[must_use]
-    pub fn with_candidate_limits(mut self, min: usize, max: usize) -> Self {
-        self.min_candidates = min;
-        self.max_candidates = max;
-        self
-    }
-
-    /// Set score thresholds.
-    #[must_use]
-    pub fn with_thresholds(mut self, auto_accept: f32, auto_reject: f32) -> Self {
-        self.auto_accept_threshold = auto_accept;
-        self.auto_reject_threshold = auto_reject;
-        self
-    }
-
-    /// Set scoring weights.
-    #[must_use]
-    pub fn with_weights(mut self, bm25: f32, llm: f32) -> Self {
-        self.bm25_weight = bm25;
-        self.llm_weight = llm;
-        self
-    }
-
-    /// Disable pre-filtering (pass all to LLM).
-    #[must_use]
-    pub fn without_pre_filter(mut self) -> Self {
-        self.use_pre_filter = false;
-        self
-    }
-
-    /// High-quality mode (more LLM calls).
-    #[must_use]
-    pub fn high_quality() -> Self {
-        Self {
-            pre_filter_ratio: 0.5,
-            min_candidates: 3,
-            max_candidates: 8,
-            auto_accept_threshold: 0.95,
-            auto_reject_threshold: 0.1,
-            bm25_weight: 0.3,
-            llm_weight: 0.7,
-            use_pre_filter: true,
-        }
-    }
-
-    /// Low-cost mode (fewer LLM calls).
-    #[must_use]
-    pub fn low_cost() -> Self {
-        Self {
-            pre_filter_ratio: 0.2,
-            min_candidates: 1,
-            max_candidates: 3,
-            auto_accept_threshold: 0.75,
-            auto_reject_threshold: 0.25,
-            bm25_weight: 0.5,
-            llm_weight: 0.5,
-            use_pre_filter: true,
-        }
-    }
-}
-
-/// Hybrid retrieval strategy combining BM25 and LLM.
-///
-/// This strategy uses a two-stage approach:
-/// 1. **BM25 Filter**: Quickly score all nodes using keyword matching
-/// 2. **LLM Refinement**: Apply semantic understanding to top candidates
-///
-/// This dramatically reduces LLM calls while maintaining accuracy.
-///
-/// # Example
-///
-/// ```rust,ignore
-/// use vectorless::retrieval::strategy::{HybridStrategy, LlmStrategy};
-///
-/// let hybrid = HybridStrategy::new(
-///     llm_strategy,
-/// ).with_config(HybridConfig::high_quality());
-/// ```
-pub struct HybridStrategy {
-    /// LLM strategy for refinement.
-    llm_strategy: Box<dyn RetrievalStrategy>,
-    /// Configuration.
-    config: HybridConfig,
-    /// BM25 engine for pre-filtering.
-    bm25_engine: Option<Bm25Engine>,
-}
-
-impl HybridStrategy {
-    /// Create a new hybrid strategy.
-    pub fn new(llm_strategy: Box<dyn RetrievalStrategy>) -> Self {
-        Self {
-            llm_strategy,
-            config: HybridConfig::default(),
-            bm25_engine: None,
-        }
-    }
-
-    /// Create with configuration.
-    pub fn with_config(mut self, config: HybridConfig) -> Self {
-        self.config = config;
-        self
-    }
-
-    /// Set configuration for high-quality mode.
-    pub fn with_high_quality(mut self) -> Self {
-        self.config = HybridConfig::high_quality();
-        self
-    }
-
-    /// Set configuration for low-cost mode.
-    pub fn with_low_cost(mut self) -> Self {
-        self.config = HybridConfig::low_cost();
-        self
-    }
-
-    /// Build BM25 index from tree nodes.
-    fn build_bm25_index(&mut self, tree: &DocumentTree, node_ids: &[NodeId]) {
-        let documents: Vec<FieldDocument> = node_ids
-            .iter()
-            .enumerate()
-            .map(|(idx, &node_id)| {
-                if let Some(node) = tree.get(node_id) {
-                    FieldDocument::new(
-                        idx,
-                        node.title.clone(),
-                        node.summary.clone(),
-                        node.content.clone(),
-                    )
-                } else {
-                    FieldDocument::new(idx, String::new(), String::new(), String::new())
-                }
-            })
-            .collect();
-
-        if !documents.is_empty() {
-            self.bm25_engine = Some(Bm25Engine::fit_to_corpus(&documents));
-        }
-    }
-
-    /// Get BM25 scores for a query.
-    fn bm25_scores(&self, query: &str, node_count: usize) -> Vec<(usize, f32)> {
-        let engine = match &self.bm25_engine {
-            Some(e) => e,
-            None => return Vec::new(),
-        };
-
-        let results = engine.search_weighted(query, node_count);
-        results
-            .into_iter()
-            .map(|(idx, score)| (idx, score))
-            .collect()
-    }
-
-    /// Filter candidates using BM25 scores.
-    fn filter_candidates(&self, bm25_scores: &[(usize, f32)], total_count: usize) -> Vec<usize> {
-        if !self.config.use_pre_filter || total_count <= self.config.min_candidates {
-            return (0..total_count).collect();
-        }
-
-        // Calculate how many candidates to keep
-        let keep_count = ((total_count as f32 * self.config.pre_filter_ratio) as usize)
-            .max(self.config.min_candidates)
-            .min(self.config.max_candidates)
-            .min(total_count);
-
-        // Sort by score and take top candidates
-        let mut sorted: Vec<_> = bm25_scores.to_vec();
-        sorted.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap_or(std::cmp::Ordering::Equal));
-
-        sorted
-            .into_iter()
-            .take(keep_count)
-            .map(|(idx, _)| idx)
-            .collect()
-    }
-
-    /// Combine BM25 and LLM scores.
-    fn combine_scores(&self, bm25_score: f32, llm_score: f32) -> f32 {
-        (bm25_score * self.config.bm25_weight + llm_score * self.config.llm_weight)
-            / (self.config.bm25_weight + self.config.llm_weight)
-    }
-}
-
-#[async_trait]
-impl RetrievalStrategy for HybridStrategy {
-    async fn evaluate_node(
-        &self,
-        tree: &DocumentTree,
-        node_id: NodeId,
-        context: &RetrievalContext,
-    ) -> NodeEvaluation {
-        // Delegate to LLM strategy for single node
-        self.llm_strategy
-            .evaluate_node(tree, node_id, context)
-            .await
-    }
-
-    async fn evaluate_nodes(
-        &self,
-        tree: &DocumentTree,
-        node_ids: &[NodeId],
-        context: &RetrievalContext,
-    ) -> Vec<NodeEvaluation> {
-        if node_ids.is_empty() {
-            return Vec::new();
-        }
-
-        // Look up BM25 scores (the engine must have been built beforehand
-        // via `build_bm25_index`; otherwise no scores are returned)
-        let bm25_scores = self.bm25_scores(&context.query, node_ids.len());
-
-        // If no BM25 scores available, fall back to LLM only
-        if bm25_scores.is_empty() {
-            return self
-                .llm_strategy
-                .evaluate_nodes(tree, node_ids, context)
-                .await;
-        }
-
-        // Create a score map for quick lookup
-        let score_map: std::collections::HashMap<usize, f32> = bm25_scores
-            .iter()
-            .map(|(idx, score)| (*idx, *score))
-            .collect();
-
-        // Normalize BM25 scores
-        let max_bm25 = score_map.values().cloned().fold(0.0_f32, f32::max);
-        let normalized_scores: std::collections::HashMap<usize, f32> = if max_bm25 > 0.0 {
-            score_map
-                .iter()
-                .map(|(idx, score)| (*idx, *score / max_bm25))
-                .collect()
-        } else {
-            score_map
-        };
-
-        // Check for auto-accept/reject candidates
-        let mut results = vec![NodeEvaluation::default(); node_ids.len()];
-        let mut needs_llm = Vec::new();
-
-        for (idx, &node_id) in node_ids.iter().enumerate() {
-            let bm25_score = normalized_scores.get(&idx).copied().unwrap_or(0.0);
-
-            if bm25_score >= self.config.auto_accept_threshold {
-                // High BM25 score: auto-accept without LLM
-                results[idx] = NodeEvaluation {
-                    score: bm25_score,
-                    decision: if tree.is_leaf(node_id) {
NavigationDecision::ThisIsTheAnswer - } else { - NavigationDecision::ExploreMore - }, - reasoning: Some(format!("Auto-accepted by BM25: {:.3}", bm25_score)), - }; - } else if bm25_score <= self.config.auto_reject_threshold { - // Low BM25 score: auto-reject without LLM - results[idx] = NodeEvaluation { - score: bm25_score, - decision: NavigationDecision::Skip, - reasoning: Some(format!("Auto-rejected by BM25: {:.3}", bm25_score)), - }; - } else { - // Need LLM refinement - needs_llm.push((idx, node_id, bm25_score)); - } - } - - // Filter candidates for LLM - let candidate_indices: std::collections::HashSet = self - .filter_candidates(&bm25_scores, node_ids.len()) - .into_iter() - .collect(); - - // Only send to LLM if in candidates and not already processed - let llm_nodes: Vec = needs_llm - .iter() - .filter(|(idx, _, _)| candidate_indices.contains(idx)) - .map(|(_, node_id, _)| *node_id) - .collect(); - - // Call LLM for filtered candidates - if !llm_nodes.is_empty() { - let llm_results = self - .llm_strategy - .evaluate_nodes(tree, &llm_nodes, context) - .await; - - // Map LLM results back with combined scores - let mut llm_iter = llm_results.into_iter(); - for (idx, _node_id, bm25_score) in &needs_llm { - if candidate_indices.contains(idx) { - if let Some(llm_eval) = llm_iter.next() { - let combined_score = self.combine_scores(*bm25_score, llm_eval.score); - results[*idx] = NodeEvaluation { - score: combined_score, - decision: llm_eval.decision, - reasoning: Some(format!( - "Hybrid: BM25={:.2}, LLM={:.2}, Combined={:.2}", - bm25_score, llm_eval.score, combined_score - )), - }; - } - } else { - // Not in LLM candidates, use BM25 only - results[*idx] = NodeEvaluation { - score: *bm25_score, - decision: if *bm25_score > 0.5 { - NavigationDecision::ExploreMore - } else { - NavigationDecision::Skip - }, - reasoning: Some(format!("BM25 only (filtered): {:.3}", bm25_score)), - }; - } - } - } else { - // No LLM calls needed, use BM25 for all remaining - for (idx, _, bm25_score) in &needs_llm { - results[*idx] = NodeEvaluation { - score: *bm25_score, - decision: if *bm25_score > 0.5 { - NavigationDecision::ExploreMore - } else { - NavigationDecision::Skip - }, - reasoning: Some(format!("BM25 only: {:.3}", bm25_score)), - }; - } - } - - results - } - - fn name(&self) -> &'static str { - "hybrid" - } - - fn capabilities(&self) -> StrategyCapabilities { - let llm_caps = self.llm_strategy.capabilities(); - StrategyCapabilities { - uses_llm: llm_caps.uses_llm, - uses_embeddings: false, // BM25 doesn't use embeddings - supports_sufficiency: llm_caps.supports_sufficiency, - typical_latency_ms: llm_caps.typical_latency_ms / 2, // Faster due to pre-filtering - } - } - - fn suitable_for_complexity(&self, complexity: QueryComplexity) -> bool { - matches!( - complexity, - QueryComplexity::Simple | QueryComplexity::Medium | QueryComplexity::Complex - ) - } - - fn estimate_cost(&self, node_count: usize) -> super::r#trait::StrategyCost { - let llm_cost = self.llm_strategy.estimate_cost(node_count); - - // Estimate reduced LLM calls due to pre-filtering - let filtered_count = ((node_count as f32 * self.config.pre_filter_ratio) as usize) - .max(self.config.min_candidates) - .min(self.config.max_candidates); - - // Account for auto-accept/reject - let estimated_llm_calls = (filtered_count as f32 * 0.5) as usize; - - super::r#trait::StrategyCost { - llm_calls: estimated_llm_calls.min(llm_cost.llm_calls), - tokens: estimated_llm_calls * 200, // Approximate tokens per call - } - } -} - -#[cfg(test)] -mod tests { - 
-    use super::*;
-
-    #[test]
-    fn test_config_default() {
-        let config = HybridConfig::default();
-        assert!((config.pre_filter_ratio - 0.3).abs() < f32::EPSILON);
-        assert_eq!(config.min_candidates, 2);
-        assert_eq!(config.max_candidates, 5);
-        assert!((config.bm25_weight - 0.4).abs() < f32::EPSILON);
-        assert!((config.llm_weight - 0.6).abs() < f32::EPSILON);
-    }
-
-    #[test]
-    fn test_config_presets() {
-        let high = HybridConfig::high_quality();
-        assert!(high.max_candidates > HybridConfig::default().max_candidates);
-
-        let low = HybridConfig::low_cost();
-        assert!(low.max_candidates < HybridConfig::default().max_candidates);
-    }
-
-    #[test]
-    fn test_combine_scores() {
-        let strategy =
-            HybridStrategy::new(Box::new(crate::retrieval::strategy::KeywordStrategy::new()));
-        let combined = strategy.combine_scores(0.8, 0.6);
-
-        // 0.8 * 0.4 + 0.6 * 0.6 = 0.32 + 0.36 = 0.68
-        assert!((combined - 0.68).abs() < 0.01);
-    }
-}
diff --git a/rust/src/retrieval/strategy/keyword.rs b/rust/src/retrieval/strategy/keyword.rs
deleted file mode 100644
index 7e505f0e..00000000
--- a/rust/src/retrieval/strategy/keyword.rs
+++ /dev/null
@@ -1,176 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Keyword-based retrieval strategy.
-//!
-//! Fast, no-LLM strategy using term-overlap (Jaccard) matching.
-
-use async_trait::async_trait;
-use std::collections::{HashMap, HashSet};
-
-use super::super::RetrievalContext;
-use super::super::types::{NavigationDecision, QueryComplexity};
-use super::r#trait::{NodeEvaluation, RetrievalStrategy, StrategyCapabilities};
-use crate::document::{DocumentTree, NodeId};
-
-/// Keyword-based retrieval strategy.
-///
-/// Uses simple term-overlap matching for fast, lightweight retrieval.
-/// Best for simple queries where exact term matches are sufficient.
-#[derive(Clone)]
-pub struct KeywordStrategy {
-    /// Whether to use bigram matching.
-    use_bigrams: bool,
-    /// Whether to match in summaries.
-    match_summaries: bool,
-}
-
-impl Default for KeywordStrategy {
-    fn default() -> Self {
-        Self {
-            use_bigrams: true,
-            match_summaries: true,
-        }
-    }
-}
-
-impl KeywordStrategy {
-    /// Create a new keyword strategy.
-    pub fn new() -> Self {
-        Self::default()
-    }
-
-    /// Tokenize text into words.
-    fn tokenize(text: &str) -> Vec<String> {
-        text.to_lowercase()
-            .split(|c: char| !c.is_alphanumeric())
-            .filter(|s| !s.is_empty() && s.len() > 1)
-            .map(|s| s.to_string())
-            .collect()
-    }
-
-    /// Generate bigrams from tokens.
-    fn bigrams(tokens: &[String]) -> Vec<String> {
-        tokens
-            .windows(2)
-            .map(|w| format!("{} {}", w[0], w[1]))
-            .collect()
-    }
-
-    /// Calculate term frequency for a document.
-    fn term_frequency(tokens: &[String]) -> HashMap<String, f32> {
-        let mut tf = HashMap::new();
-        let len = tokens.len().max(1);
-        for token in tokens {
-            *tf.entry(token.clone()).or_insert(0.0) += 1.0;
-        }
-        // Normalize by document length
-        for count in tf.values_mut() {
-            *count /= len as f32;
-        }
-        tf
-    }
-
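-    // NOTE (editor, illustrative): the score below is Jaccard similarity over
-    // token sets, e.g. query {a, b, c} vs node {b, c, d} gives
-    // |{b, c}| / |{a, b, c, d}| = 2 / 4 = 0.5.
-
-    /// Calculate relevance score using term overlap.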
- fn calculate_score(&self, node_tokens: &[String], query_tokens: &[String]) -> f32 { - if query_tokens.is_empty() || node_tokens.is_empty() { - return 0.0; - } - - let query_set: HashSet<&String> = query_tokens.iter().collect(); - let node_set: HashSet<&String> = node_tokens.iter().collect(); - - // Jaccard similarity - let intersection = query_set.intersection(&node_set).count(); - let union = query_set.union(&node_set).count(); - - if union == 0 { - 0.0 - } else { - intersection as f32 / union as f32 - } - } - - /// Get all text from a node for matching. - fn get_node_text(tree: &DocumentTree, node_id: NodeId) -> String { - if let Some(node) = tree.get(node_id) { - let mut text = format!("{} {}", node.title, node.content); - if !node.summary.is_empty() { - text.push_str(&format!(" {}", node.summary)); - } - text - } else { - String::new() - } - } -} - -#[async_trait] -impl RetrievalStrategy for KeywordStrategy { - async fn evaluate_node( - &self, - tree: &DocumentTree, - node_id: NodeId, - context: &RetrievalContext, - ) -> NodeEvaluation { - let node_text = Self::get_node_text(tree, node_id); - let node_tokens = Self::tokenize(&node_text); - - // Calculate base score from unigram matching - let unigram_score = self.calculate_score(&node_tokens, &context.query_tokens); - - // Optionally add bigram matching - let bigram_score = if self.use_bigrams { - let node_bigrams = Self::bigrams(&node_tokens); - let query_bigrams = Self::bigrams(&context.query_tokens); - self.calculate_score(&node_bigrams, &query_bigrams) - } else { - 0.0 - }; - - // Combine scores (weighted average) - let final_score = if self.use_bigrams { - 0.6 * unigram_score + 0.4 * bigram_score - } else { - unigram_score - }; - - // Determine decision based on score and whether node has children - let decision = if final_score > 0.7 { - NavigationDecision::ThisIsTheAnswer - } else if final_score > 0.3 { - if tree.is_leaf(node_id) { - NavigationDecision::ThisIsTheAnswer - } else { - NavigationDecision::ExploreMore - } - } else if final_score > 0.1 { - NavigationDecision::ExploreMore - } else { - NavigationDecision::Skip - }; - - NodeEvaluation { - score: final_score, - decision, - reasoning: Some(format!("Keyword match score: {:.3}", final_score)), - } - } - - fn name(&self) -> &'static str { - "keyword" - } - - fn capabilities(&self) -> StrategyCapabilities { - StrategyCapabilities { - uses_llm: false, - uses_embeddings: false, - supports_sufficiency: false, - typical_latency_ms: 1, - } - } - - fn suitable_for_complexity(&self, complexity: QueryComplexity) -> bool { - matches!(complexity, QueryComplexity::Simple) - } -} diff --git a/rust/src/retrieval/strategy/llm.rs b/rust/src/retrieval/strategy/llm.rs deleted file mode 100644 index 10241ed8..00000000 --- a/rust/src/retrieval/strategy/llm.rs +++ /dev/null @@ -1,581 +0,0 @@ -// Copyright (c) 2026 vectorless developers -// SPDX-License-Identifier: Apache-2.0 - -//! LLM-based retrieval strategy. -//! -//! Uses an LLM for deep reasoning about node relevance with ToC context. -//! Supports batch evaluation — all sibling nodes are scored in a single -//! LLM call instead of one call per node. 
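-//!
-//! A minimal sketch of the batch path (editor's illustration; assumes only the
-//! types defined in this file):
-//!
-//! ```rust,ignore
-//! // One LLM call scores all sibling nodes at once:
-//! let evals = strategy.evaluate_nodes(&tree, &sibling_ids, &context).await;
-//! assert_eq!(evals.len(), sibling_ids.len());
-//! ```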
-
-use async_trait::async_trait;
-use serde::Deserialize;
-
-use super::super::RetrievalContext;
-use super::super::types::{NavigationDecision, QueryComplexity};
-use super::r#trait::{NodeEvaluation, RetrievalStrategy, StrategyCapabilities};
-use crate::document::{DocumentTree, NodeId, TocView};
-use crate::llm::LlmClient;
-use crate::llm::memo::{MemoKey, MemoOpType, MemoStore, MemoValue};
-use crate::utils::fingerprint::Fingerprint;
-
-/// LLM response for a single node in batch evaluation.
-#[derive(Debug, Clone, Deserialize)]
-struct NodeScore {
-    /// 1-based index matching the order in the prompt.
-    index: usize,
-    /// Relevance score (0-100, will be normalized to 0-1).
-    relevance: u8,
-    /// Decision: "answer", "explore", or "skip".
-    action: String,
-    /// Optional reasoning.
-    #[serde(default)]
-    reasoning: Option<String>,
-}
-
-/// LLM response for batch node evaluation.
-#[derive(Debug, Clone, Deserialize)]
-struct BatchResponse {
-    /// Analysis reasoning.
-    #[serde(default)]
-    reasoning: String,
-    /// Scored nodes.
-    nodes: Vec<NodeScore>,
-}
-
-/// LLM response for single-node evaluation (fallback).
-#[derive(Debug, Clone, Deserialize)]
-struct NavigationResponse {
-    /// Relevance score (0-100, will be normalized to 0-1).
-    relevance: u8,
-    /// Decision: "answer", "explore", or "skip".
-    action: String,
-    /// Optional reasoning.
-    #[serde(default)]
-    reasoning: Option<String>,
-}
-
-/// LLM-based retrieval strategy.
-///
-/// Uses an LLM to reason about which nodes are most relevant
-/// to the query. Includes ToC context for better navigation decisions.
-///
-/// # Batch Evaluation
-///
-/// When multiple nodes need scoring, they are sent in a single LLM call
-/// instead of one call per node. This reduces latency from O(N) LLM calls
-/// to O(1).
-///
-/// # Example
-///
-/// ```rust,no_run
-/// use vectorless::retrieval::strategy::LlmStrategy;
-/// use vectorless::llm::LlmClient;
-///
-/// let client = LlmClient::with_defaults();
-/// let strategy = LlmStrategy::new(client)
-///     .with_toc_context(true);
-/// ```
-#[derive(Clone)]
-pub struct LlmStrategy {
-    /// The LLM client.
-    client: LlmClient,
-    /// System prompt for single-node navigation.
-    system_prompt: String,
-    /// System prompt for batch evaluation.
-    batch_system_prompt: String,
-    /// ToC view generator.
-    toc_view: TocView,
-    /// Whether to include ToC context in prompts.
-    include_toc: bool,
-    /// Memo store for caching LLM evaluations.
-    memo_store: Option<MemoStore>,
-}
-
-impl LlmStrategy {
-    /// Create a new LLM strategy.
-    pub fn new(client: LlmClient) -> Self {
-        Self {
-            client,
-            system_prompt: Self::default_system_prompt(),
-            batch_system_prompt: Self::default_batch_system_prompt(),
-            toc_view: TocView::new(),
-            include_toc: true,
-            memo_store: None,
-        }
-    }
-
-    /// Create with default LLM client.
-    pub fn with_defaults() -> Self {
-        Self::new(LlmClient::with_defaults())
-    }
-
-    /// Set custom system prompt.
-    pub fn with_system_prompt(mut self, prompt: String) -> Self {
-        self.system_prompt = prompt;
-        self
-    }
-
-    /// Enable or disable ToC context in prompts.
-    pub fn with_toc_context(mut self, include: bool) -> Self {
-        self.include_toc = include;
-        self
-    }
-
-    /// Add memo store for caching LLM evaluations.
-    ///
-    /// When enabled, node evaluations are cached based on prompt fingerprints,
-    /// avoiding redundant LLM calls for the same node+query combinations.
-    pub fn with_memo_store(mut self, store: MemoStore) -> Self {
-        self.memo_store = Some(store);
-        self
-    }
-
-    /// Default system prompt for single-node navigation.
-    fn default_system_prompt() -> String {
-        r#"You are a document navigation assistant. Your task is to help find the most relevant sections in a document tree.
-
-Given a query and document context (Table of Contents + current node), determine:
-1. The relevance of this node (0-100)
-2. The best action: "answer" (this node contains the answer), "explore" (check children), or "skip" (not relevant)
-
-Respond in JSON format:
-{"relevance": <0-100>, "action": "<answer|explore|skip>", "reasoning": "<brief explanation>"}
-
-Be concise and focused on finding the most relevant information."#.to_string()
-    }
-
-    /// Default system prompt for batch node evaluation.
-    fn default_batch_system_prompt() -> String {
-        r#"You are a document navigation assistant. Score the relevance of multiple document sections against a user query.
-
-CRITICAL: Respond with ONLY valid JSON (no markdown code blocks).
-
-Response format:
-{
-  "reasoning": "Brief analysis of the query",
-  "nodes": [
-    {"index": 1, "relevance": 85, "action": "answer", "reasoning": "Why relevant"},
-    {"index": 2, "relevance": 30, "action": "skip", "reasoning": "Why not relevant"}
-  ]
-}
-
-Rules:
-- index: MUST be the number from [N] brackets in the input
-- relevance: 0-100 (how relevant this section is to the query)
-- action: one of "answer", "explore", "skip"
-- Score ALL provided nodes, not just the top ones
-- Be concise in reasons"#.to_string()
-    }
-
-    /// Build the navigation prompt for a single node.
-    fn build_prompt(
-        &self,
-        tree: &DocumentTree,
-        node_id: NodeId,
-        context: &RetrievalContext,
-    ) -> String {
-        let node = tree.get(node_id);
-        let children = tree.children(node_id);
-
-        // Build current node info
-        let node_info = if let Some(n) = node {
-            let summary = if n.summary.is_empty() {
-                // Use first 200 chars of content if no summary
-                &n.content[..200.min(n.content.len())]
-            } else {
-                &n.summary
-            };
-            format!(
-                "Title: {}\nSummary: {}\nDepth: {}\nChildren: {}",
-                n.title,
-                summary,
-                n.depth,
-                children.len()
-            )
-        } else {
-            "Node not found".to_string()
-        };
-
-        // Build ToC context if enabled
-        let toc_context = if self.include_toc {
-            let toc = self.toc_view.generate_from(tree, node_id);
-            let toc_markdown = self.toc_view.format_markdown(&toc);
-            // Limit ToC size for token efficiency
-            let toc_preview: String = toc_markdown.chars().take(1000).collect();
-            format!(
-                "\n\nDocument ToC (from this node):\n```\n{}\n```\n",
-                toc_preview
-            )
-        } else {
-            String::new()
-        };
-
-        format!(
-            "Query: {}\n{}Current Node:\n{}\n\nWhat is the relevance and action?",
-            context.query, toc_context, node_info
-        )
-    }
-
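-    // NOTE (editor, illustrative): the prompt built below lists each section as
-    //   [N] Title: "..." / Summary: "..." / Depth: d, Children: c
-    // and the model must echo those [N] indices back in its JSON, which is how
-    // parse_batch_response maps scores onto node_ids.
-
-    /// Build a batch prompt that presents all nodes at once.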
-    fn build_batch_prompt(
-        &self,
-        tree: &DocumentTree,
-        node_ids: &[NodeId],
-        context: &RetrievalContext,
-    ) -> String {
-        // Collect node descriptions
-        let node_descriptions: Vec<String> = node_ids
-            .iter()
-            .enumerate()
-            .filter_map(|(i, &node_id)| {
-                let node = tree.get(node_id)?;
-                let children = tree.children(node_id);
-                let summary = if node.summary.is_empty() {
-                    let end = 200.min(node.content.len());
-                    &node.content[..end]
-                } else {
-                    &node.summary
-                };
-                Some(format!(
-                    "[{}] Title: \"{}\"\n Summary: \"{}\"\n Depth: {}, Children: {}",
-                    i + 1,
-                    node.title,
-                    summary,
-                    node.depth,
-                    children.len()
-                ))
-            })
-            .collect();
-
-        let nodes_str = node_descriptions.join("\n\n");
-
-        // Optional ToC context from the first node's parent scope
-        let toc_context = if self.include_toc && !node_ids.is_empty() {
-            let toc = self.toc_view.generate_from(tree, node_ids[0]);
-            let toc_markdown = self.toc_view.format_markdown(&toc);
-            let toc_preview: String = toc_markdown.chars().take(800).collect();
-            format!("\n\nDocument ToC:\n{}\n", toc_preview)
-        } else {
-            String::new()
-        };
-
-        format!(
-            "USER QUERY: {}\n{}SECTIONS TO SCORE ({} entries):\n{}\n\nScore ALL sections. Respond with ONLY the JSON object:",
-            context.query,
-            toc_context,
-            node_ids.len(),
-            nodes_str
-        )
-    }
-
-    /// Build a memo cache key for a single node evaluation.
-    fn node_eval_cache_key(&self, node_id: NodeId, context: &RetrievalContext) -> MemoKey {
-        let mut parts = String::new();
-        parts.push_str(&context.query);
-        parts.push_str(":node:");
-        // Use the NodeId debug representation as part of the fingerprint
-        parts.push_str(&format!("{:?}", node_id));
-        let fp = Fingerprint::from_str(&parts);
-        MemoKey {
-            op_type: MemoOpType::NodeEvaluation,
-            input_fp: fp,
-            model_id: None,
-            version: 1,
-            context_fp: Fingerprint::zero(),
-        }
-    }
-
-    /// Build a memo cache key for a batch evaluation.
-    fn batch_eval_cache_key(&self, node_ids: &[NodeId], context: &RetrievalContext) -> MemoKey {
-        let mut parts = String::new();
-        parts.push_str(&context.query);
-        parts.push_str(":batch:");
-        for id in node_ids {
-            parts.push_str(&format!("{:?}", id));
-            parts.push(',');
-        }
-        let fp = Fingerprint::from_str(&parts);
-        MemoKey {
-            op_type: MemoOpType::NodeEvaluation,
-            input_fp: fp,
-            model_id: None,
-            version: 1,
-            context_fp: Fingerprint::zero(),
-        }
-    }
-
-    /// Try to deserialize a cached NodeEvaluation from MemoValue.
-    fn deserialize_cached_eval(&self, value: &MemoValue) -> Option<NodeEvaluation> {
-        match value {
-            MemoValue::Json(json) => serde_json::from_value(json.clone()).ok(),
-            _ => None,
-        }
-    }
-
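-    // NOTE (editor, illustrative): a well-formed single-node response looks like
-    //   {"relevance": 85, "action": "answer", "reasoning": "Defines the term"}
-    // while unparseable output falls back to scraping a number from the text.
-
-    /// Parse LLM response to evaluation for a single node.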
-    fn parse_response(
-        &self,
-        response: &str,
-        tree: &DocumentTree,
-        node_id: NodeId,
-    ) -> NodeEvaluation {
-        // Try to parse as JSON
-        if let Ok(parsed) = serde_json::from_str::<NavigationResponse>(response) {
-            let score = (parsed.relevance as f32 / 100.0).clamp(0.0, 1.0);
-            let decision = match parsed.action.to_lowercase().as_str() {
-                "answer" => NavigationDecision::ThisIsTheAnswer,
-                "explore" => {
-                    if tree.is_leaf(node_id) {
-                        NavigationDecision::ThisIsTheAnswer
-                    } else {
-                        NavigationDecision::ExploreMore
-                    }
-                }
-                _ => NavigationDecision::Skip,
-            };
-
-            return NodeEvaluation {
-                score,
-                decision,
-                reasoning: parsed.reasoning,
-            };
-        }
-
-        // Fallback: try to extract relevance from text
-        let score = response
-            .lines()
-            .find_map(|line| {
-                let lower = line.to_lowercase();
-                if lower.contains("relevance") || lower.contains("score") {
-                    lower
-                        .split(|c: char| !c.is_numeric() && c != '.')
-                        .filter_map(|s| s.parse::<f32>().ok())
-                        .filter(|&s| (0.0..=100.0).contains(&s))
-                        .map(|v| v / 100.0)
-                        .next()
-                } else {
-                    None
-                }
-            })
-            .unwrap_or(0.5);
-
-        NodeEvaluation {
-            score,
-            decision: if tree.is_leaf(node_id) {
-                NavigationDecision::ThisIsTheAnswer
-            } else {
-                NavigationDecision::ExploreMore
-            },
-            reasoning: Some(format!(
-                "Parsed from response: {}...",
-                &response[..100.min(response.len())]
-            )),
-        }
-    }
-
-    /// Parse a batch LLM response into per-node evaluations.
-    ///
-    /// Returns evaluations in the same order as the input `node_ids`.
-    /// Nodes that the LLM didn't score get a default evaluation.
-    fn parse_batch_response(
-        &self,
-        response: &str,
-        tree: &DocumentTree,
-        node_ids: &[NodeId],
-    ) -> Vec<NodeEvaluation> {
-        // Try JSON parse
-        if let Ok(batch) = serde_json::from_str::<BatchResponse>(response) {
-            let mut evaluations = vec![
-                NodeEvaluation {
-                    score: 0.3,
-                    decision: NavigationDecision::ExploreMore,
-                    reasoning: Some("Not scored by LLM (batch fallback)".to_string()),
-                };
-                node_ids.len()
-            ];
-
-            for node_score in batch.nodes {
-                let idx = node_score.index.saturating_sub(1);
-                if idx < node_ids.len() {
-                    let node_id = node_ids[idx];
-                    let score = (node_score.relevance as f32 / 100.0).clamp(0.0, 1.0);
-                    let decision = match node_score.action.to_lowercase().as_str() {
-                        "answer" => NavigationDecision::ThisIsTheAnswer,
-                        "explore" => {
-                            if tree.is_leaf(node_id) {
-                                NavigationDecision::ThisIsTheAnswer
-                            } else {
-                                NavigationDecision::ExploreMore
-                            }
-                        }
-                        _ => NavigationDecision::Skip,
-                    };
-                    evaluations[idx] = NodeEvaluation {
-                        score,
-                        decision,
-                        reasoning: node_score.reasoning,
-                    };
-                }
-            }
-
-            return evaluations;
-        }
-
-        // Fallback: could not parse batch, return defaults
-        tracing::warn!(
-            "Failed to parse batch LLM response, using defaults for {} nodes",
-            node_ids.len()
-        );
-        node_ids
-            .iter()
-            .map(|&node_id| NodeEvaluation {
-                score: 0.5,
-                decision: if tree.is_leaf(node_id) {
-                    NavigationDecision::ThisIsTheAnswer
-                } else {
-                    NavigationDecision::ExploreMore
-                },
-                reasoning: Some("Batch parse fallback".to_string()),
-            })
-            .collect()
-    }
-}
-
-#[async_trait]
-impl RetrievalStrategy for LlmStrategy {
-    async fn evaluate_node(
-        &self,
-        tree: &DocumentTree,
-        node_id: NodeId,
-        context: &RetrievalContext,
-    ) -> NodeEvaluation {
-        // Check memo cache
-        if let Some(ref store) = self.memo_store {
-            let cache_key = self.node_eval_cache_key(node_id, context);
-            if let Some(cached) = store.get(&cache_key) {
-                if let Some(eval) = self.deserialize_cached_eval(&cached) {
-                    tracing::debug!("Memo cache hit for node evaluation (node={:?})", node_id);
-                    return eval;
-                }
-            }
-        }
-
-        let prompt = self.build_prompt(tree, node_id, context);
-
-        let result = match self.client.complete(&self.system_prompt, &prompt).await {
-            Ok(response) => self.parse_response(&response, tree, node_id),
-            Err(e) => {
-                tracing::warn!("LLM evaluation failed: {}", e);
-                NodeEvaluation {
-                    score: 0.5,
-                    decision: if tree.is_leaf(node_id) {
-                        NavigationDecision::ThisIsTheAnswer
-                    } else {
-                        NavigationDecision::ExploreMore
-                    },
-                    reasoning: Some(format!("LLM error: {}", e)),
-                }
-            }
-        };
-
-        // Cache the result
-        if let Some(ref store) = self.memo_store {
-            let cache_key = self.node_eval_cache_key(node_id, context);
-            if let Ok(json) = serde_json::to_value(&result) {
-                let tokens = (prompt.len() / 4) as u64;
-                store.put_with_tokens(cache_key, MemoValue::Json(json), tokens);
-            }
-        }
-
-        result
-    }
-
-    async fn evaluate_nodes(
-        &self,
-        tree: &DocumentTree,
-        node_ids: &[NodeId],
-        context: &RetrievalContext,
-    ) -> Vec<NodeEvaluation> {
-        if node_ids.is_empty() {
-            return Vec::new();
-        }
-
-        // Single node: use the simpler single-node prompt
-        if node_ids.len() == 1 {
-            return vec![self.evaluate_node(tree, node_ids[0], context).await];
-        }
-
-        // Check memo cache for the entire batch
-        if let Some(ref store) = self.memo_store {
-            let cache_key = self.batch_eval_cache_key(node_ids, context);
-            if let Some(cached) = store.get(&cache_key) {
-                if let MemoValue::Json(json) = &cached {
-                    if let Ok(evals) = serde_json::from_value::<Vec<NodeEvaluation>>(json.clone()) {
-                        if evals.len() == node_ids.len() {
-                            tracing::debug!(
-                                "Memo cache hit for batch evaluation ({} nodes)",
-                                node_ids.len()
-                            );
-                            return evals;
-                        }
-                    }
-                }
-            }
-        }
-
-        // Batch: send all nodes in one LLM call
-        let prompt = self.build_batch_prompt(tree, node_ids, context);
-
-        let result = match self
-            .client
-            .complete(&self.batch_system_prompt, &prompt)
-            .await
-        {
-            Ok(response) => self.parse_batch_response(&response, tree, node_ids),
-            Err(e) => {
-                tracing::warn!(
-                    "Batch LLM evaluation failed ({}), falling back to single evaluation: {}",
-                    node_ids.len(),
-                    e
-                );
-                // Fallback: evaluate individually (still works, just slower)
-                let mut results = Vec::with_capacity(node_ids.len());
-                for &node_id in node_ids {
-                    results.push(self.evaluate_node(tree, node_id, context).await);
-                }
-                results
-            }
-        };
-
-        // Cache the batch result
-        if let Some(ref store) = self.memo_store {
-            let cache_key = self.batch_eval_cache_key(node_ids, context);
-            if let Ok(json) = serde_json::to_value(&result) {
-                let tokens = (prompt.len() / 4) as u64;
-                store.put_with_tokens(cache_key, MemoValue::Json(json), tokens);
-            }
-        }
-
-        result
-    }
-
-    fn name(&self) -> &'static str {
-        "llm"
-    }
-
-    fn capabilities(&self) -> StrategyCapabilities {
-        StrategyCapabilities {
-            uses_llm: true,
-            uses_embeddings: false,
-            supports_sufficiency: true,
-            typical_latency_ms: 500,
-        }
-    }
-
-    fn suitable_for_complexity(&self, complexity: QueryComplexity) -> bool {
-        matches!(
-            complexity,
-            QueryComplexity::Medium | QueryComplexity::Complex
-        )
-    }
-}
diff --git a/rust/src/retrieval/strategy/mod.rs b/rust/src/retrieval/strategy/mod.rs
deleted file mode 100644
index ebf042dc..00000000
--- a/rust/src/retrieval/strategy/mod.rs
+++ /dev/null
@@ -1,25 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Retrieval strategies for different query types.
-//!
-//! This module provides several retrieval strategies:
-//!
-//! - **KeywordStrategy**: Fast keyword matching using term overlap
-//! - **LlmStrategy**: LLM-powered reasoning with ToC context
-//! - **HybridStrategy**: BM25 pre-filter + LLM refinement (recommended)
-//! - **CrossDocumentStrategy**: Multi-document retrieval with result aggregation
-//! - **PageRangeStrategy**: Filter by page range before retrieval
-
-mod cross_document;
-mod hybrid;
-mod keyword;
-mod llm;
-mod page_range;
-mod r#trait;
-
-pub use cross_document::{CrossDocumentConfig, CrossDocumentStrategy};
-pub use hybrid::{HybridConfig, HybridStrategy};
-pub use keyword::KeywordStrategy;
-pub use llm::LlmStrategy;
-pub use page_range::{PageRange, PageRangeConfig, PageRangeStrategy};
-pub use r#trait::RetrievalStrategy;
diff --git a/rust/src/retrieval/strategy/page_range.rs b/rust/src/retrieval/strategy/page_range.rs
deleted file mode 100644
index 1dc26900..00000000
--- a/rust/src/retrieval/strategy/page_range.rs
+++ /dev/null
@@ -1,434 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Page-range retrieval strategy.
-//!
-//! Filters document nodes by page range before applying an inner strategy.
-//! Useful when the user knows approximately where the information is located.
-
-use async_trait::async_trait;
-
-use super::r#trait::{NodeEvaluation, RetrievalStrategy, StrategyCapabilities};
-use crate::document::{DocumentTree, NodeId};
-use crate::retrieval::RetrievalContext;
-use crate::retrieval::types::{NavigationDecision, QueryComplexity};
-
-/// A page range for filtering.
-#[derive(Debug, Clone, Copy, PartialEq, Eq)]
-pub struct PageRange {
-    /// Start page (inclusive, 1-indexed).
-    pub start: usize,
-    /// End page (inclusive).
-    pub end: usize,
-}
-
-impl PageRange {
-    /// Create a new page range.
-    pub fn new(start: usize, end: usize) -> Self {
-        Self { start, end }
-    }
-
-    /// Create a range from a single page.
-    pub fn single(page: usize) -> Self {
-        Self {
-            start: page,
-            end: page,
-        }
-    }
-
-    /// Create a range starting from a page to the end.
-    pub fn from(start: usize) -> Self {
-        Self {
-            start,
-            end: usize::MAX,
-        }
-    }
-
-    /// Create a range from the beginning to a page.
-    pub fn until(end: usize) -> Self {
-        Self { start: 1, end }
-    }
-
-    /// Check if a page is within this range.
-    pub fn contains(&self, page: usize) -> bool {
-        page >= self.start && page <= self.end
-    }
-
-    /// Check if this range overlaps with another.
-    pub fn overlaps(&self, other: &PageRange) -> bool {
-        self.start <= other.end && other.start <= self.end
-    }
-
-    /// Get the number of pages in this range.
-    pub fn len(&self) -> usize {
-        if self.end == usize::MAX {
-            usize::MAX
-        } else {
-            self.end.saturating_sub(self.start) + 1
-        }
-    }
-
-    /// Check if this is an empty range.
-    pub fn is_empty(&self) -> bool {
-        self.start > self.end
-    }
-}
-
-impl Default for PageRange {
-    fn default() -> Self {
-        Self {
-            start: 1,
-            end: usize::MAX,
-        }
-    }
-}
-
-/// Configuration for page-range retrieval.
-#[derive(Debug, Clone)]
-pub struct PageRangeConfig {
-    /// The page range to search within.
-    pub range: PageRange,
-    /// Whether to include nodes that span across the boundary.
-    pub include_boundary_nodes: bool,
-    /// Whether to expand the range slightly for context.
-    pub expand_context_pages: usize,
-    /// Minimum overlap ratio for a node to be included.
-    pub min_overlap_ratio: f32,
-}
-
-impl Default for PageRangeConfig {
-    fn default() -> Self {
-        Self {
-            range: PageRange::default(),
-            include_boundary_nodes: true,
-            expand_context_pages: 0,
-            min_overlap_ratio: 0.1,
-        }
-    }
-}
-
-impl PageRangeConfig {
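-    // NOTE (editor, illustrative): PageRange::from(50) spans 50..=usize::MAX,
-    // PageRange::until(20) spans 1..=20, and len() reports usize::MAX for
-    // open-ended ranges.
-
-    /// Create a new configuration with a page range.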
-    pub fn new(range: PageRange) -> Self {
-        Self {
-            range,
-            ..Default::default()
-        }
-    }
-
-    /// Set the page range.
-    #[must_use]
-    pub fn with_range(mut self, start: usize, end: usize) -> Self {
-        self.range = PageRange::new(start, end);
-        self
-    }
-
-    /// Include nodes that span the boundary.
-    #[must_use]
-    pub fn with_boundary_nodes(mut self, include: bool) -> Self {
-        self.include_boundary_nodes = include;
-        self
-    }
-
-    /// Expand the range by N pages for context.
-    #[must_use]
-    pub fn with_context_expansion(mut self, pages: usize) -> Self {
-        self.expand_context_pages = pages;
-        self
-    }
-}
-
-/// Page-range retrieval strategy.
-///
-/// Filters nodes by their page location before delegating to an inner strategy.
-/// This is useful when:
-/// - The user knows approximately where information is located
-/// - Searching large PDFs where certain sections are known
-/// - Implementing "search within pages X-Y" functionality
-///
-/// # Example
-///
-/// ```rust,ignore
-/// use vectorless::retrieval::strategy::{PageRangeStrategy, KeywordStrategy, PageRange};
-///
-/// // Search only pages 10-20
-/// let strategy = PageRangeStrategy::new(
-///     Box::new(KeywordStrategy::new()),
-///     PageRange::new(10, 20),
-/// );
-///
-/// // Search from page 50 onwards
-/// let strategy = PageRangeStrategy::new(
-///     Box::new(LlmStrategy::new(client)),
-///     PageRange::from(50),
-/// );
-/// ```
-pub struct PageRangeStrategy {
-    /// Inner strategy for filtered nodes.
-    inner: Box<dyn RetrievalStrategy>,
-    /// Configuration.
-    config: PageRangeConfig,
-}
-
-impl PageRangeStrategy {
-    /// Create a new page-range strategy.
-    pub fn new(inner: Box<dyn RetrievalStrategy>, range: PageRange) -> Self {
-        Self {
-            inner,
-            config: PageRangeConfig::new(range),
-        }
-    }
-
-    /// Create with configuration.
-    pub fn with_config(inner: Box<dyn RetrievalStrategy>, config: PageRangeConfig) -> Self {
-        Self { inner, config }
-    }
-
-    /// Set whether to include boundary nodes.
-    #[must_use]
-    pub fn with_boundary_nodes(mut self, include: bool) -> Self {
-        self.config.include_boundary_nodes = include;
-        self
-    }
-
-    /// Set context expansion pages.
-    #[must_use]
-    pub fn with_context_expansion(mut self, pages: usize) -> Self {
-        self.config.expand_context_pages = pages;
-        self
-    }
-
-    /// Get the effective range after context expansion.
-    fn effective_range(&self) -> PageRange {
-        if self.config.expand_context_pages == 0 {
-            return self.config.range;
-        }
-
-        PageRange {
-            start: self
-                .config
-                .range
-                .start
-                .saturating_sub(self.config.expand_context_pages),
-            end: self
-                .config
-                .range
-                .end
-                .saturating_add(self.config.expand_context_pages),
-        }
-    }
-
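-    // NOTE (editor, worked example): with min_overlap_ratio = 0.1, a node
-    // spanning pages 18-27 overlaps the range 10-20 on 3 of its 10 pages
-    // (ratio 0.3 >= 0.1), so it is kept.
-
-    /// Check if a node is within the page range.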
-    fn is_node_in_range(&self, tree: &DocumentTree, node_id: NodeId) -> bool {
-        let effective_range = self.effective_range();
-
-        if let Some(node) = tree.get(node_id) {
-            // If the node has no page info, include it (conservative approach)
-            let Some((start_page, end_page)) = node.start_page.zip(node.end_page) else {
-                return true;
-            };
-
-            let node_range = PageRange::new(start_page, end_page);
-
-            // Check for overlap
-            if effective_range.overlaps(&node_range) {
-                // Calculate overlap ratio
-                let overlap_start = effective_range.start.max(node_range.start);
-                let overlap_end = effective_range.end.min(node_range.end);
-
-                if overlap_start <= overlap_end {
-                    let overlap_pages = overlap_end - overlap_start + 1;
-                    let node_pages = node_range.len();
-
-                    let ratio = overlap_pages as f32 / node_pages as f32;
-                    return ratio >= self.config.min_overlap_ratio;
-                }
-            }
-
-            // Paged node with no overlap: exclude it
-            return false;
-        }
-
-        // Unknown node: include it (conservative approach)
-        true
-    }
-
-    /// Filter nodes by page range.
-    fn filter_by_range(
-        &self,
-        tree: &DocumentTree,
-        node_ids: &[NodeId],
-    ) -> (Vec<(usize, NodeId)>, Vec<usize>) {
-        let mut included = Vec::new();
-        let mut excluded = Vec::new();
-
-        for (idx, &node_id) in node_ids.iter().enumerate() {
-            if self.is_node_in_range(tree, node_id) {
-                included.push((idx, node_id));
-            } else {
-                excluded.push(idx);
-            }
-        }
-
-        (included, excluded)
-    }
-}
-
-#[async_trait]
-impl RetrievalStrategy for PageRangeStrategy {
-    async fn evaluate_node(
-        &self,
-        tree: &DocumentTree,
-        node_id: NodeId,
-        context: &RetrievalContext,
-    ) -> NodeEvaluation {
-        // Check if node is in range
-        if !self.is_node_in_range(tree, node_id) {
-            return NodeEvaluation {
-                score: 0.0,
-                decision: NavigationDecision::Skip,
-                reasoning: Some("Node outside page range".to_string()),
-            };
-        }
-
-        // Delegate to inner strategy
-        self.inner.evaluate_node(tree, node_id, context).await
-    }
-
-    async fn evaluate_nodes(
-        &self,
-        tree: &DocumentTree,
-        node_ids: &[NodeId],
-        context: &RetrievalContext,
-    ) -> Vec<NodeEvaluation> {
-        if node_ids.is_empty() {
-            return Vec::new();
-        }
-
-        // Filter nodes by page range
-        let (included, excluded) = self.filter_by_range(tree, node_ids);
-
-        // Create result vector with default values
-        let mut results = vec![NodeEvaluation::default(); node_ids.len()];
-
-        // Mark excluded nodes as skipped
-        for idx in &excluded {
-            results[*idx] = NodeEvaluation {
-                score: 0.0,
-                decision: NavigationDecision::Skip,
-                reasoning: Some(format!(
-                    "Outside page range {}-{}",
-                    self.config.range.start, self.config.range.end
-                )),
-            };
-        }
-
-        // Evaluate included nodes with inner strategy
-        if !included.is_empty() {
-            let included_ids: Vec<NodeId> = included.iter().map(|(_, id)| *id).collect();
-            let inner_results = self
-                .inner
-                .evaluate_nodes(tree, &included_ids, context)
-                .await;
-
-            // Map results back to original positions
-            for ((orig_idx, _), eval) in included.into_iter().zip(inner_results.into_iter()) {
-                results[orig_idx] = eval;
-            }
-        }
-
-        results
-    }
-
-    fn name(&self) -> &'static str {
-        "page_range"
-    }
-
-    fn capabilities(&self) -> StrategyCapabilities {
-        let inner_caps = self.inner.capabilities();
-        StrategyCapabilities {
-            uses_llm: inner_caps.uses_llm,
-            uses_embeddings: inner_caps.uses_embeddings,
-            supports_sufficiency: inner_caps.supports_sufficiency,
-            typical_latency_ms: inner_caps.typical_latency_ms, // Same as inner
-        }
-    }
-
-    fn suitable_for_complexity(&self, complexity: QueryComplexity) -> bool {
-        self.inner.suitable_for_complexity(complexity)
-    }
-
-    fn estimate_cost(&self, node_count: usize) -> super::r#trait::StrategyCost {
-        // Estimate that only a fraction of nodes are in range
-        let estimated_in_range = (node_count as f32 * 0.3) as usize;
-        self.inner.estimate_cost(estimated_in_range.max(1))
-    }
-}
-
-#[cfg(test)]
-mod tests {
-    use super::*;
-
-    #[test]
-    fn test_page_range_creation() {
-        let range = PageRange::new(10, 20);
-        assert_eq!(range.start, 10);
-        assert_eq!(range.end, 20);
-    }
-
-    #[test]
-    fn test_page_range_contains() {
-        let range = PageRange::new(10, 20);
-        assert!(range.contains(10));
-        assert!(range.contains(15));
-        assert!(range.contains(20));
-        assert!(!range.contains(9));
-        assert!(!range.contains(21));
-    }
-
-    #[test]
-    fn test_page_range_single() {
-        let range = PageRange::single(5);
-        assert!(range.contains(5));
-        assert!(!range.contains(4));
-        assert!(!range.contains(6));
-    }
-
-    #[test]
-    fn test_page_range_from() {
-        let range = PageRange::from(10);
-        assert!(range.contains(10));
-        assert!(range.contains(100));
-        assert!(range.contains(usize::MAX));
-        assert!(!range.contains(9));
-    }
-
-    #[test]
-    fn test_page_range_until() {
-        let range = PageRange::until(20);
-        assert!(range.contains(1));
-        assert!(range.contains(20));
-        assert!(!range.contains(21));
-    }
-
-    #[test]
-    fn test_page_range_overlaps() {
-        let r1 = PageRange::new(10, 20);
-        let r2 = PageRange::new(15, 25);
-        let r3 = PageRange::new(21, 30);
-
-        assert!(r1.overlaps(&r2));
-        assert!(!r1.overlaps(&r3));
-    }
-
-    #[test]
-    fn test_page_range_len() {
-        let range = PageRange::new(10, 20);
-        assert_eq!(range.len(), 11);
-    }
-
-    #[test]
-    fn test_config_default() {
-        let config = PageRangeConfig::default();
-        assert_eq!(config.range.start, 1);
-        assert!(config.include_boundary_nodes);
-    }
-}
diff --git a/rust/src/retrieval/strategy/trait.rs b/rust/src/retrieval/strategy/trait.rs
deleted file mode 100644
index a9b5958a..00000000
--- a/rust/src/retrieval/strategy/trait.rs
+++ /dev/null
@@ -1,113 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Strategy trait definition.
-
-use async_trait::async_trait;
-
-use super::super::RetrievalContext;
-use super::super::types::{NavigationDecision, QueryComplexity};
-use crate::document::{DocumentTree, NodeId};
-
-/// Result of evaluating a single node.
-#[derive(Debug, Clone, serde::Serialize, serde::Deserialize)]
-pub struct NodeEvaluation {
-    /// Relevance score (0.0 - 1.0).
-    pub score: f32,
-    /// Recommended action for this node.
-    pub decision: NavigationDecision,
-    /// Optional reasoning for the decision.
-    pub reasoning: Option<String>,
-}
-
-impl Default for NodeEvaluation {
-    fn default() -> Self {
-        Self {
-            score: 0.5,
-            decision: NavigationDecision::ExploreMore,
-            reasoning: None,
-        }
-    }
-}
-
-/// Capabilities of a retrieval strategy.
-#[derive(Debug, Clone, Copy, Default)]
-pub struct StrategyCapabilities {
-    /// Whether this strategy uses LLM calls.
-    pub uses_llm: bool,
-    /// Whether this strategy uses embeddings.
-    pub uses_embeddings: bool,
-    /// Whether this strategy supports sufficiency checking.
-    pub supports_sufficiency: bool,
-    /// Typical latency in milliseconds.
-    pub typical_latency_ms: u64,
-}
-
-/// Trait for retrieval strategies.
-///
-/// A strategy determines how to navigate the document tree
-/// and score nodes for relevance to a query.
-#[async_trait]
-pub trait RetrievalStrategy: Send + Sync {
-    /// Evaluate a single node's relevance to the query.
-    ///
-    /// This is the core method that determines how relevant
-    /// a node is to the current query.
-    async fn evaluate_node(
-        &self,
-        tree: &DocumentTree,
-        node_id: NodeId,
-        context: &RetrievalContext,
-    ) -> NodeEvaluation;
-
-    /// Evaluate multiple nodes in batch.
-    ///
-    /// Default implementation calls evaluate_node for each node,
-    /// but strategies can override for efficiency (e.g., batch LLM calls).
-    async fn evaluate_nodes(
-        &self,
-        tree: &DocumentTree,
-        node_ids: &[NodeId],
-        context: &RetrievalContext,
-    ) -> Vec<NodeEvaluation> {
-        let mut results = Vec::with_capacity(node_ids.len());
-        for node_id in node_ids {
-            results.push(self.evaluate_node(tree, *node_id, context).await);
-        }
-        results
-    }
-
-    /// Get the name of this strategy.
-    fn name(&self) -> &str;
-
-    /// Get the capabilities of this strategy.
-    fn capabilities(&self) -> StrategyCapabilities;
-
-    /// Check if this strategy is suitable for the given query complexity.
-    fn suitable_for_complexity(&self, complexity: QueryComplexity) -> bool;
-
-    /// Estimate the cost of evaluating a set of nodes.
-    fn estimate_cost(&self, node_count: usize) -> StrategyCost {
-        StrategyCost {
-            llm_calls: if self.capabilities().uses_llm {
-                node_count
-            } else {
-                0
-            },
-            tokens: if self.capabilities().uses_llm {
-                node_count * 200
-            } else {
-                0
-            },
-        }
-    }
-}
-
-/// Cost estimate for a strategy operation.
-#[derive(Debug, Clone, Copy, Default)]
-pub struct StrategyCost {
-    /// Number of LLM calls.
-    pub llm_calls: usize,
-    /// Number of tokens.
-    pub tokens: usize,
-}
diff --git a/rust/src/retrieval/sufficiency/llm_judge.rs b/rust/src/retrieval/sufficiency/llm_judge.rs
deleted file mode 100644
index 0bb676f3..00000000
--- a/rust/src/retrieval/sufficiency/llm_judge.rs
+++ /dev/null
@@ -1,258 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! LLM-based sufficiency checker.
-//!
-//! Uses an LLM to judge whether collected content is sufficient.
-
-use async_trait::async_trait;
-use serde::{Deserialize, Serialize};
-
-use super::{SufficiencyChecker, SufficiencyLevel};
-use crate::config::SufficiencyConfig;
-use crate::llm::memo::{MemoKey, MemoOpType, MemoStore, MemoValue};
-use crate::utils::fingerprint::Fingerprint;
-
-/// LLM client trait for the judge.
-#[async_trait]
-pub trait LlmJudgeClient: Send + Sync {
-    /// Generate a completion.
-    async fn complete(&self, prompt: &str) -> Result<String, JudgeError>;
-}
-
-/// Error type for LLM judge.
-#[derive(Debug, thiserror::Error)]
-pub enum JudgeError {
-    #[error("LLM request failed: {0}")]
-    RequestFailed(String),
-    #[error("Failed to parse response: {0}")]
-    ParseError(String),
-}
-
-/// Response from LLM judge.
-#[derive(Debug, Clone, Serialize, Deserialize)]
-struct JudgeResponse {
-    /// Whether content is sufficient.
-    sufficient: bool,
-    /// Confidence level (0-1).
-    confidence: f32,
-    /// Optional reasoning.
-    #[serde(default)]
-    reasoning: Option<String>,
-}
-
-/// LLM-based sufficiency judge.
-///
-/// Uses an LLM to determine if the collected content
-/// is sufficient to answer the query.
-pub struct LlmJudge {
-    client: Box<dyn LlmJudgeClient>,
-    /// System prompt for the judge.
-    system_prompt: String,
-    /// Minimum confidence to consider sufficient.
-    confidence_threshold: f32,
-    /// Memo store for caching sufficiency judgments.
-    memo_store: Option<MemoStore>,
-}
-
-impl LlmJudge {
-    /// Create a new LLM judge.
-    pub fn new(client: Box<dyn LlmJudgeClient>) -> Self {
-        Self::with_config(client, &SufficiencyConfig::default())
-    }
-
-    /// Create a new LLM judge with configuration.
-    pub fn with_config(client: Box<dyn LlmJudgeClient>, config: &SufficiencyConfig) -> Self {
-        Self {
-            client,
-            system_prompt: Self::default_system_prompt(),
-            confidence_threshold: config.confidence_threshold,
-            memo_store: None,
-        }
-    }
-
-    /// Add memo store for caching sufficiency judgments.
-    ///
-    /// When enabled, sufficiency check results are cached based on
-    /// query+content fingerprints, avoiding redundant LLM calls.
-    pub fn with_memo_store(mut self, store: MemoStore) -> Self {
-        self.memo_store = Some(store);
-        self
-    }
-
-    /// Set confidence threshold.
-    pub fn with_confidence_threshold(mut self, threshold: f32) -> Self {
-        self.confidence_threshold = threshold;
-        self
-    }
-
-    fn default_system_prompt() -> String {
-        r#"You are a content sufficiency judge. Your task is to determine if the provided content is sufficient to answer the given query.
-
-Respond in JSON format:
-{"sufficient": <true|false>, "confidence": <0.0-1.0>, "reasoning": "<brief explanation>"}
-
-Guidelines:
-- "sufficient" should be true only if the content directly addresses the query
-- "confidence" should reflect how certain you are in your judgment
-- Consider: completeness, relevance, and accuracy of the information
-
-Be conservative - only mark as sufficient if you're confident the content answers the query."#
-            .to_string()
-    }
-
-    fn build_prompt(&self, query: &str, content: &str) -> String {
-        format!(
-            "{}\n\nQuery: {}\n\nContent:\n{}\n\nIs this content sufficient to answer the query?",
-            self.system_prompt, query, content
-        )
-    }
-
-    fn parse_response(&self, response: &str) -> (SufficiencyLevel, f32) {
-        // Try JSON parsing
-        if let Ok(parsed) = serde_json::from_str::<JudgeResponse>(response) {
-            let level = if parsed.sufficient && parsed.confidence >= self.confidence_threshold {
-                SufficiencyLevel::Sufficient
-            } else if parsed.confidence >= 0.5 {
-                SufficiencyLevel::PartialSufficient
-            } else {
-                SufficiencyLevel::Insufficient
-            };
-            return (level, parsed.confidence);
-        }
-
-        // Fallback: keyword analysis
-        let lower = response.to_lowercase();
-        let sufficient_keywords = ["sufficient", "yes", "complete", "enough"];
-        let insufficient_keywords = ["insufficient", "no", "incomplete", "not enough"];
-
-        let sufficient_count = sufficient_keywords
-            .iter()
-            .filter(|k| lower.contains(*k))
-            .count();
-        let insufficient_count = insufficient_keywords
-            .iter()
-            .filter(|k| lower.contains(*k))
-            .count();
-
-        if sufficient_count > insufficient_count {
-            (SufficiencyLevel::PartialSufficient, 0.6)
-        } else {
-            (SufficiencyLevel::Insufficient, 0.4)
-        }
-    }
-
-    /// Check sufficiency asynchronously.
-    pub async fn check_async(
-        &self,
-        query: &str,
-        content: &str,
-        _token_count: usize,
-    ) -> SufficiencyLevel {
-        // Check memo cache
-        if let Some(ref store) = self.memo_store {
-            let cache_key = self.build_cache_key(query, content);
-            if let Some(cached) = store.get(&cache_key) {
-                if let Some(level) = Self::deserialize_sufficiency(&cached) {
-                    tracing::debug!("Memo cache hit for sufficiency check");
-                    return level;
-                }
-            }
-        }
-
-        let prompt = self.build_prompt(query, content);
-
-        let result = match self.client.complete(&prompt).await {
-            Ok(response) => self.parse_response(&response).0,
-            Err(_) => SufficiencyLevel::Insufficient,
-        };
-
-        // Cache the result
-        if let Some(ref store) = self.memo_store {
-            let cache_key = self.build_cache_key(query, content);
-            let tokens = (prompt.len() / 4) as u64;
-            store.put_with_tokens(cache_key, MemoValue::Text(format!("{:?}", result)), tokens);
-        }
-
-        result
-    }
-
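-    // NOTE (editor, illustrative): a typical judge response looks like
-    //   {"sufficient": true, "confidence": 0.9, "reasoning": "Covers the query"}
-    // and is mapped to a SufficiencyLevel via confidence_threshold above.
-
-    /// Build a cache key for sufficiency check.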
-    fn build_cache_key(&self, query: &str, content: &str) -> MemoKey {
-        let mut input = String::with_capacity(query.len() + content.len() / 4);
-        input.push_str(query);
-        // Use only first 2000 chars of content for fingerprint to avoid
-        // giant cache keys — content prefix captures topic identity.
-        input.push_str(&content[..2000.min(content.len())]);
-        let fp = Fingerprint::from_str(&input);
-        MemoKey {
-            op_type: MemoOpType::SufficiencyCheck,
-            input_fp: fp,
-            model_id: None,
-            version: 1,
-            context_fp: Fingerprint::zero(),
-        }
-    }
-
-    /// Deserialize a SufficiencyLevel from a MemoValue.
-    fn deserialize_sufficiency(value: &MemoValue) -> Option<SufficiencyLevel> {
-        match value {
-            MemoValue::Text(s) => match s.as_str() {
-                "Sufficient" => Some(SufficiencyLevel::Sufficient),
-                "PartialSufficient" => Some(SufficiencyLevel::PartialSufficient),
-                "Insufficient" => Some(SufficiencyLevel::Insufficient),
-                _ => None,
-            },
-            _ => None,
-        }
-    }
-}
-
-impl SufficiencyChecker for LlmJudge {
-    fn check(&self, query: &str, content: &str, token_count: usize) -> SufficiencyLevel {
-        // For synchronous usage, we use a simple heuristic
-        // The async version should be preferred when possible
-
-        // Quick content analysis
-        if content.is_empty() {
-            return SufficiencyLevel::Insufficient;
-        }
-
-        // Check for query terms in content
-        let query_terms: Vec<&str> = query.split_whitespace().collect();
-        let content_lower = content.to_lowercase();
-
-        let matches: usize = query_terms
-            .iter()
-            .filter(|term| content_lower.contains(&term.to_lowercase()))
-            .count();
-
-        let coverage = if query_terms.is_empty() {
-            0.0
-        } else {
-            matches as f32 / query_terms.len() as f32
-        };
-
-        if coverage > 0.8 && token_count > 500 {
-            SufficiencyLevel::Sufficient
-        } else if coverage > 0.5 {
-            SufficiencyLevel::PartialSufficient
-        } else {
-            SufficiencyLevel::Insufficient
-        }
-    }
-
-    fn name(&self) -> &'static str {
-        "llm_judge"
-    }
-}
-
-/// Adapter to use LlmClient as LlmJudgeClient.
-#[async_trait]
-impl LlmJudgeClient for crate::llm::LlmClient {
-    async fn complete(&self, prompt: &str) -> Result<String, JudgeError> {
-        self.complete("You are a content sufficiency judge.", prompt)
-            .await
-            .map_err(|e| JudgeError::RequestFailed(e.to_string()))
-    }
-}
diff --git a/rust/src/retrieval/sufficiency/mod.rs b/rust/src/retrieval/sufficiency/mod.rs
deleted file mode 100644
index ab3501bf..00000000
--- a/rust/src/retrieval/sufficiency/mod.rs
+++ /dev/null
@@ -1,32 +0,0 @@
-// Copyright (c) 2026 vectorless developers
-// SPDX-License-Identifier: Apache-2.0
-
-//! Sufficiency checking for incremental retrieval.
-//!
-//! Determines when enough information has been collected to answer the query.
-
-mod llm_judge;
-mod threshold;
-
-pub use super::types::SufficiencyLevel;
-pub use llm_judge::LlmJudge;
-pub use threshold::ThresholdChecker;
-
-/// Trait for sufficiency checking strategies.
-pub trait SufficiencyChecker: Send + Sync {
-    /// Check if the collected content is sufficient to answer the query.
-    ///
-    /// # Arguments
-    ///
-    /// * `query` - The original query
-    /// * `content` - The collected content so far
-    /// * `token_count` - Approximate token count of content
-    ///
-    /// # Returns
-    ///
-    /// A `SufficiencyLevel` indicating whether to continue retrieving.
-    fn check(&self, query: &str, content: &str, token_count: usize) -> SufficiencyLevel;
-
-    /// Get the name of this checker.
- fn name(&self) -> &str; -} diff --git a/rust/src/retrieval/sufficiency/threshold.rs b/rust/src/retrieval/sufficiency/threshold.rs deleted file mode 100644 index 9fb48f8e..00000000 --- a/rust/src/retrieval/sufficiency/threshold.rs +++ /dev/null @@ -1,148 +0,0 @@ -// Copyright (c) 2026 vectorless developers -// SPDX-License-Identifier: Apache-2.0 - -//! Threshold-based sufficiency checker. -//! -//! Uses simple heuristics like token count and content length. - -use super::{SufficiencyChecker, SufficiencyLevel}; -use crate::config::SufficiencyConfig; - -/// Configuration for threshold-based checking. -#[derive(Debug, Clone)] -pub struct ThresholdConfig { - /// Minimum tokens for sufficiency. - pub min_tokens: usize, - /// Target tokens for full sufficiency. - pub target_tokens: usize, - /// Maximum tokens before stopping. - pub max_tokens: usize, - /// Minimum content length (characters). - pub min_content_length: usize, -} - -impl Default for ThresholdConfig { - fn default() -> Self { - Self::from_config(&SufficiencyConfig::default()) - } -} - -impl ThresholdConfig { - /// Create from application config. - pub fn from_config(config: &SufficiencyConfig) -> Self { - Self { - min_tokens: config.min_tokens, - target_tokens: config.target_tokens, - max_tokens: config.max_tokens, - min_content_length: config.min_content_length, - } - } -} - -/// Threshold-based sufficiency checker. -/// -/// Uses simple token and length thresholds to determine -/// when enough content has been collected. -pub struct ThresholdChecker { - config: ThresholdConfig, -} - -impl ThresholdChecker { - /// Create a new threshold checker with default config. - pub fn new() -> Self { - Self { - config: ThresholdConfig::default(), - } - } - - /// Create a threshold checker with custom config. - pub fn with_config(config: ThresholdConfig) -> Self { - Self { config } - } - - /// Estimate token count from content. - fn estimate_tokens(&self, content: &str) -> usize { - // Rough estimate: ~4 characters per token on average - content.len() / 4 - } - - /// Check content quality indicators. - fn check_quality(&self, content: &str) -> f32 { - let mut score = 0.0; - - // Check for sentence endings (periods, question marks, etc.) 
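-        // NOTE (editor, illustrative): each signal below adds a capped amount
-        // (up to 0.3 + 0.3 + 0.1 + 0.3), and the final min(1.0) keeps the
-        // quality score within [0, 1].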
-        let sentence_endings = content.matches('.').count()
-            + content.matches('?').count()
-            + content.matches('!').count();
-        score += (sentence_endings as f32 * 0.05).min(0.3);
-
-        // Check for paragraph breaks
-        let paragraphs = content.matches("\n\n").count();
-        score += (paragraphs as f32 * 0.1).min(0.3);
-
-        // Check for structure markers
-        if content.contains(':') || content.contains('-') {
-            score += 0.1;
-        }
-
-        // Penalize very repetitive content
-        let words: Vec<&str> = content.split_whitespace().collect();
-        if words.len() > 10 {
-            let unique_ratio = words.iter().collect::<std::collections::HashSet<_>>().len() as f32
-                / words.len() as f32;
-            score += unique_ratio * 0.3;
-        }
-
-        score.min(1.0)
-    }
-}
-
-impl Default for ThresholdChecker {
-    fn default() -> Self {
-        Self::new()
-    }
-}
-
-impl SufficiencyChecker for ThresholdChecker {
-    fn check(&self, _query: &str, content: &str, token_count: usize) -> SufficiencyLevel {
-        let estimated_tokens = if token_count == 0 {
-            self.estimate_tokens(content)
-        } else {
-            token_count
-        };
-
-        // Check minimum content length
-        if content.len() < self.config.min_content_length {
-            return SufficiencyLevel::Insufficient;
-        }
-
-        // Check maximum tokens - always sufficient if we hit the limit
-        if estimated_tokens >= self.config.max_tokens {
-            return SufficiencyLevel::Sufficient;
-        }
-
-        // Check target tokens
-        if estimated_tokens >= self.config.target_tokens {
-            let quality = self.check_quality(content);
-            if quality > 0.5 {
-                return SufficiencyLevel::Sufficient;
-            } else {
-                return SufficiencyLevel::PartialSufficient;
-            }
-        }
-
-        // Check minimum tokens
-        if estimated_tokens >= self.config.min_tokens {
-            let quality = self.check_quality(content);
-            if quality > 0.7 {
-                return SufficiencyLevel::PartialSufficient;
-            }
-        }
-
-        SufficiencyLevel::Insufficient
-    }
-
-    fn name(&self) -> &'static str {
-        "threshold"
-    }
-}
diff --git a/rust/src/retrieval/types.rs b/rust/src/retrieval/types.rs
index 7a7baa1e..3d1e41e5 100644
--- a/rust/src/retrieval/types.rs
+++ b/rust/src/retrieval/types.rs
@@ -5,56 +5,6 @@
 use serde::{Deserialize, Serialize};
 
-use super::context::{PruningStrategy, TokenEstimation};
-use crate::document::NodeId;
-
-/// Query complexity level for adaptive strategy selection.
-#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
-pub enum QueryComplexity {
-    /// Simple queries that can be solved with keyword matching.
-    Simple,
-
-    /// Medium complexity queries requiring semantic understanding.
-    Medium,
-
-    /// Complex queries requiring deep LLM reasoning.
-    Complex,
-}
-
-impl Default for QueryComplexity {
-    fn default() -> Self {
-        Self::Medium
-    }
-}
-
-/// Strategy preference for retrieval.
-#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
-pub enum StrategyPreference {
-    /// Automatically select strategy based on query complexity.
-    Auto,
-
-    /// Force keyword-based strategy (fast, no LLM).
-    ForceKeyword,
-
-    /// Force LLM strategy (deep reasoning).
-    ForceLlm,
-
-    /// Force hybrid strategy (BM25 + LLM refinement).
-    ForceHybrid,
-
-    /// Force cross-document strategy (multi-document retrieval).
-    ForceCrossDocument,
-
-    /// Force page-range strategy (filter by page range).
-    ForcePageRange,
-}
-
-impl Default for StrategyPreference {
-    fn default() -> Self {
-        Self::Auto
-    }
-}
-
 /// Sufficiency level for incremental retrieval.
 #[derive(Debug, Clone, Copy, PartialEq, Eq)]
 pub enum SufficiencyLevel {
@@ -74,210 +24,62 @@ impl Default for SufficiencyLevel {
 }
 
-/// Options for retrieval operations.
+/// Complete retrieval response.
 #[derive(Debug, Clone)]
-pub struct RetrieveOptions {
-    /// Maximum number of results to return.
-    pub top_k: usize,
-
-    /// Beam width for multi-path search.
-    pub beam_width: usize,
-
-    /// Maximum search iterations.
-    pub max_iterations: usize,
-
-    /// Whether to include node content in results.
-    pub include_content: bool,
-
-    /// Whether to include node summaries in results.
-    pub include_summaries: bool,
-
-    /// Minimum relevance score (0.0 - 1.0).
-    pub min_score: f32,
-
-    /// Strategy preference.
-    pub strategy: StrategyPreference,
-
-    /// Enable sufficiency checking for incremental retrieval.
-    pub sufficiency_check: bool,
-
-    /// Maximum tokens for sufficiency threshold.
-    pub max_tokens: usize,
-
-    /// Enable result caching.
-    pub enable_cache: bool,
+pub struct RetrieveResponse {
+    /// Retrieved results.
+    pub results: Vec<RetrievalResult>,
 
-    /// Pruning strategy for context building.
-    pub pruning_strategy: super::PruningStrategy,
+    /// Aggregated content.
+    pub content: String,
 
-    /// Token estimation mode.
-    pub token_estimation: super::TokenEstimation,
+    /// Overall confidence score.
+    pub confidence: f32,
 
-    /// Whether to use async context building for large documents.
-    pub use_async_context: bool,
+    /// Whether information is sufficient.
+    pub is_sufficient: bool,
 
-    /// Enable streaming retrieval results.
-    ///
-    /// When enabled, use `query_stream()` to receive incremental
-    /// `RetrieveEvent`s as each pipeline stage completes. When disabled
-    /// (default), the standard `query()` returns a single final result.
-    pub streaming: bool,
+    /// Strategy that was used.
+    pub strategy_used: String,
 
-    /// Cross-document graph for graph-aware retrieval boosting.
-    pub document_graph: Option<std::sync::Arc<crate::graph::DocumentGraph>>,
+    /// Reasoning chain explaining how results were found.
+    pub reasoning_chain: ReasoningChain,
 
-    /// Search fallback chain: algorithm names tried in order until min_score is met.
-    /// Options: "beam", "mcts", "pure_pilot".
-    /// Default: ["beam", "mcts", "pure_pilot"]
-    pub fallback_chain: Vec<String>,
+    /// Total tokens used.
+    pub tokens_used: usize,
 }
 
-impl Default for RetrieveOptions {
+impl Default for RetrieveResponse {
     fn default() -> Self {
         Self {
-            top_k: 5,
-            beam_width: 3,
-            max_iterations: 10,
-            include_content: true,
-            include_summaries: true,
-            min_score: 0.1,
-            strategy: StrategyPreference::Auto,
-            sufficiency_check: true,
-            max_tokens: 4000,
-            enable_cache: true,
-            pruning_strategy: super::PruningStrategy::default(),
-            token_estimation: super::TokenEstimation::default(),
-            use_async_context: false,
-            streaming: false,
-            document_graph: None,
-            fallback_chain: vec!["beam".into(), "mcts".into(), "pure_pilot".into()],
+            results: Vec::new(),
+            content: String::new(),
+            confidence: 0.0,
+            is_sufficient: false,
+            strategy_used: String::new(),
+            reasoning_chain: ReasoningChain::default(),
+            tokens_used: 0,
         }
     }
 }
 
-impl RetrieveOptions {
-    /// Create new retrieve options with defaults.
+impl RetrieveResponse {
+    /// Create a new empty response.
     #[must_use]
     pub fn new() -> Self {
         Self::default()
     }
 
-    /// Set the maximum number of results to return.
-    #[must_use]
-    pub fn with_top_k(mut self, top_k: usize) -> Self {
-        self.top_k = top_k;
-        self
-    }
-
-    /// Set the beam width for multi-path search.
-    #[must_use]
-    pub fn with_beam_width(mut self, beam_width: usize) -> Self {
-        self.beam_width = beam_width;
-        self
-    }
-
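-    // NOTE (editor's sketch, illustrative): these deleted options composed
-    // fluently, e.g. RetrieveOptions::new().with_top_k(10).with_min_score(0.3)
-
-    /// Set the maximum search iterations.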
-    #[must_use]
-    pub fn with_max_iterations(mut self, max_iterations: usize) -> Self {
-        self.max_iterations = max_iterations;
-        self
-    }
-
-    /// Set whether to include node content in results.
-    #[must_use]
-    pub fn with_include_content(mut self, include: bool) -> Self {
-        self.include_content = include;
-        self
-    }
-
-    /// Set whether to include node summaries in results.
-    #[must_use]
-    pub fn with_include_summaries(mut self, include: bool) -> Self {
-        self.include_summaries = include;
-        self
-    }
-
-    /// Set the minimum relevance score.
-    #[must_use]
-    pub fn with_min_score(mut self, min_score: f32) -> Self {
-        self.min_score = min_score;
-        self
-    }
-
-    /// Set the strategy preference.
-    #[must_use]
-    pub fn with_strategy(mut self, strategy: StrategyPreference) -> Self {
-        self.strategy = strategy;
-        self
-    }
-
-    /// Set whether to enable sufficiency checking.
-    #[must_use]
-    pub fn with_sufficiency_check(mut self, enable: bool) -> Self {
-        self.sufficiency_check = enable;
-        self
-    }
-
-    /// Set the maximum tokens for sufficiency threshold.
-    #[must_use]
-    pub fn with_max_tokens(mut self, max_tokens: usize) -> Self {
-        self.max_tokens = max_tokens;
-        self
-    }
-
-    /// Set whether to enable result caching.
-    #[must_use]
-    pub fn with_enable_cache(mut self, enable: bool) -> Self {
-        self.enable_cache = enable;
-        self
-    }
-
-    /// Set pruning strategy for context building.
-    #[must_use]
-    pub fn with_pruning_strategy(mut self, strategy: PruningStrategy) -> Self {
-        self.pruning_strategy = strategy;
-        self
-    }
-
-    /// Set token estimation mode.
-    #[must_use]
-    pub fn with_token_estimation(mut self, mode: TokenEstimation) -> Self {
-        self.token_estimation = mode;
-        self
-    }
-
-    /// Enable async context building for large documents.
-    #[must_use]
-    pub fn with_async_context(mut self, enable: bool) -> Self {
-        self.use_async_context = enable;
-        self
-    }
-
-    /// Enable streaming retrieval results.
-    #[must_use]
-    pub fn with_streaming(mut self, enable: bool) -> Self {
-        self.streaming = enable;
-        self
-    }
-
-    /// Set the cross-document graph for graph-aware retrieval boosting.
+    /// Check if there are any results.
     #[must_use]
-    pub fn with_document_graph(
-        mut self,
-        graph: std::sync::Arc<crate::graph::DocumentGraph>,
-    ) -> Self {
-        self.document_graph = Some(graph);
-        self
+    pub fn is_empty(&self) -> bool {
+        self.results.is_empty()
     }
 
-    /// Set the search fallback chain.
-    ///
-    /// Algorithm names: "beam", "mcts", "pure_pilot".
-    /// Primary algorithm is prepended automatically by the Plan stage.
+    /// Get the number of results.
     #[must_use]
-    pub fn with_fallback_chain(mut self, chain: Vec<String>) -> Self {
-        self.fallback_chain = chain;
-        self
+    pub fn len(&self) -> usize {
+        self.results.len()
     }
 }
 
@@ -364,190 +166,10 @@ impl RetrievalResult {
     }
 }
 
-/// Complete retrieval response.
-#[derive(Debug, Clone)]
-pub struct RetrieveResponse {
-    /// Retrieved results.
-    pub results: Vec<RetrievalResult>,
-
-    /// Aggregated content.
-    pub content: String,
-
-    /// Overall confidence score.
-    pub confidence: f32,
-
-    /// Whether information is sufficient.
-    pub is_sufficient: bool,
-
-    /// Strategy that was used.
-    pub strategy_used: String,
-
-    /// Detected query complexity.
-    pub complexity: QueryComplexity,
-
-    /// Reasoning chain explaining how results were found.
-    pub reasoning_chain: ReasoningChain,
-
-    /// Total tokens used.
@@ -364,190 +166,10 @@ impl RetrievalResult {
     }
 }
 
-/// Complete retrieval response.
-#[derive(Debug, Clone)]
-pub struct RetrieveResponse {
-    /// Retrieved results.
-    pub results: Vec<RetrievalResult>,
-
-    /// Aggregated content.
-    pub content: String,
-
-    /// Overall confidence score.
-    pub confidence: f32,
-
-    /// Whether information is sufficient.
-    pub is_sufficient: bool,
-
-    /// Strategy that was used.
-    pub strategy_used: String,
-
-    /// Detected query complexity.
-    pub complexity: QueryComplexity,
-
-    /// Reasoning chain explaining how results were found.
-    pub reasoning_chain: ReasoningChain,
-
-    /// Total tokens used.
-    pub tokens_used: usize,
-}
-
-impl Default for RetrieveResponse {
-    fn default() -> Self {
-        Self {
-            results: Vec::new(),
-            content: String::new(),
-            confidence: 0.0,
-            is_sufficient: false,
-            strategy_used: String::new(),
-            complexity: QueryComplexity::Medium,
-            reasoning_chain: ReasoningChain::default(),
-            tokens_used: 0,
-        }
-    }
-}
-
-impl RetrieveResponse {
-    /// Create a new empty response.
-    #[must_use]
-    pub fn new() -> Self {
-        Self::default()
-    }
-
-    /// Check if there are any results.
-    #[must_use]
-    pub fn is_empty(&self) -> bool {
-        self.results.is_empty()
-    }
-
-    /// Get the number of results.
-    #[must_use]
-    pub fn len(&self) -> usize {
-        self.results.len()
-    }
-}
-
-/// A single navigation step in the search trace.
-#[derive(Debug, Clone, Serialize, Deserialize)]
-pub struct NavigationStep {
-    /// Node ID visited.
-    pub node_id: String,
-
-    /// Node title.
-    pub title: String,
-
-    /// Relevance score at this step.
-    pub score: f32,
-
-    /// Decision made at this step.
-    pub decision: NavigationDecision,
-
-    /// Depth in tree.
-    pub depth: usize,
-}
-
-/// Navigation decision at each step.
-#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
-pub enum NavigationDecision {
-    /// Go to the specified child.
-    GoToChild(usize),
-
-    /// This node contains the answer.
-    ThisIsTheAnswer,
-
-    /// Explore multiple children.
-    ExploreMore,
-
-    /// Skip this branch.
-    Skip,
-
-    /// Backtrack from a dead-end node to a previously shelved alternative.
-    /// Contains the title of the dead-end node being abandoned.
-    BacktrackFrom(String),
-}
-
-/// Pipeline stage name for reasoning chain provenance.
-#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash, Serialize, Deserialize)]
-pub enum StageName {
-    /// Query analysis stage.
-    Analyze,
-    /// Strategy planning stage.
-    Plan,
-    /// Tree search stage.
-    Search,
-    /// Sufficiency evaluation stage.
-    Evaluate,
-}
-
-impl std::fmt::Display for StageName {
-    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
-        match self {
-            Self::Analyze => write!(f, "analyze"),
-            Self::Plan => write!(f, "plan"),
-            Self::Search => write!(f, "search"),
-            Self::Evaluate => write!(f, "evaluate"),
-        }
-    }
-}
-
-/// Summary of an LLM call made during a reasoning step.
-#[derive(Debug, Clone, Serialize, Deserialize)]
-pub struct LlmCallSummary {
-    /// Truncated prompt summary for display.
-    pub prompt_summary: String,
-    /// Tokens consumed by this call.
-    pub tokens_used: usize,
-    /// Model identifier.
-    pub model: String,
-}
-
-/// A candidate node considered but not selected during reasoning.
-#[derive(Debug, Clone, Serialize, Deserialize)]
-pub struct ReasoningCandidate {
-    /// Node ID.
-    pub node_id: String,
-    /// Node title.
-    pub title: String,
-    /// Relevance score of this candidate.
-    pub score: f32,
-}
-
-/// A single step in the reasoning chain.
-///
-/// Unlike `NavigationStep` which only records "where" the search went,
-/// `ReasoningStep` also records "why" — the decision rationale,
-/// candidates considered, strategy used, and any LLM calls made.
-#[derive(Debug, Clone, Serialize, Deserialize)]
-pub struct ReasoningStep {
-    /// Which pipeline stage produced this step.
-    pub stage: StageName,
-    /// Node ID visited (if applicable).
-    pub node_id: Option<String>,
-    /// Node title (if applicable).
-    pub title: Option<String>,
-    /// Relevance score at this step.
-    pub score: f32,
-    /// Decision made at this step.
-    pub decision: NavigationDecision,
-    /// Depth in tree.
-    pub depth: usize,
-    /// Human-readable explanation of why this decision was made.
-    pub reasoning: String,
-    /// Candidates considered but not selected at this step.
-    pub candidates: Vec<ReasoningCandidate>,
-    /// Strategy used at this step (e.g. "keyword", "hybrid").
-    pub strategy_used: Option<String>,
-    /// LLM call summary, if an LLM was consulted.
-    pub llm_call: Option<LlmCallSummary>,
-    /// Reference identifiers followed from this step (cross-reference tracking).
-    pub references_followed: Vec<String>,
-}
-
 /// Complete reasoning chain for a retrieval operation.
 ///
 /// Provides an ordered, auditable trace of every decision the engine made
-/// from query analysis through final evaluation. This is the core
-/// differentiator — not just results, but *why* these results.
+/// from query analysis through final evaluation.
 #[derive(Debug, Clone, Default, Serialize, Deserialize)]
 pub struct ReasoningChain {
     /// Ordered reasoning steps.
@@ -577,132 +199,11 @@ impl ReasoningChain {
     pub fn is_empty(&self) -> bool {
         self.steps.is_empty()
     }
-
-    /// Build a human-readable summary of the full chain.
-    #[must_use]
-    pub fn summary(&self) -> String {
-        self.steps
-            .iter()
-            .map(|s| {
-                let node_info = s.title.as_deref().unwrap_or("(no node)");
-                format!(
-                    "[{}] {} (score={:.2}): {}",
-                    s.stage, node_info, s.score, s.reasoning
-                )
-            })
-            .collect::<Vec<_>>()
-            .join("\n")
-    }
 }
 
-/// Search path for multi-path algorithms.
-///
-/// Tracks the sequence of nodes visited, along with the reasoning
-/// for each navigation step. This reasoning is fed back into the
-/// LLM context so the Pilot can understand how it arrived at the
-/// current position and avoid repeating mistakes.
-#[derive(Debug, Clone)]
-pub struct SearchPath {
-    /// Nodes in the path.
-    pub nodes: Vec<NodeId>,
-
-    /// Cumulative score.
-    pub score: f32,
-
-    /// Leaf node (if path ends at leaf).
-    pub leaf: Option<NodeId>,
-
-    /// Per-step reasoning for why each node was chosen.
-    ///
-    /// Same length as `nodes`. Each entry is the reason the
-    /// corresponding node was selected. `None` means no reason
-    /// was captured (e.g., algorithm-only fallback).
-    pub step_reasons: Vec<Option<String>>,
-}
-
-impl SearchPath {
-    /// Create a new empty path.
-    #[must_use]
-    pub fn new() -> Self {
-        Self {
-            nodes: Vec::new(),
-            score: 0.0,
-            leaf: None,
-            step_reasons: Vec::new(),
-        }
-    }
-
-    /// Create a path from a single node.
-    #[must_use]
-    pub fn from_node(node_id: NodeId, score: f32) -> Self {
-        Self {
-            nodes: vec![node_id],
-            score,
-            leaf: Some(node_id),
-            step_reasons: vec![None],
-        }
-    }
-
-    /// Extend the path with a new node and optional reason.
-    #[must_use]
-    pub fn extend(&self, node_id: NodeId, score: f32) -> Self {
-        let mut nodes = self.nodes.clone();
-        let mut step_reasons = self.step_reasons.clone();
-        nodes.push(node_id);
-        step_reasons.push(None);
-        Self {
-            nodes,
-            score: self.score + score,
-            leaf: Some(node_id),
-            step_reasons,
-        }
-    }
-
-    /// Extend the path with a new node and a reason for choosing it.
-    #[must_use]
-    pub fn extend_with_reason(
-        &self,
-        node_id: NodeId,
-        score: f32,
-        reason: impl Into<String>,
-    ) -> Self {
-        let mut nodes = self.nodes.clone();
-        let mut step_reasons = self.step_reasons.clone();
-        nodes.push(node_id);
-        step_reasons.push(Some(reason.into()));
-        Self {
-            nodes,
-            score: self.score + score,
-            leaf: Some(node_id),
-            step_reasons,
-        }
-    }
-}
-
-impl Default for SearchPath {
-    fn default() -> Self {
-        Self::new()
-    }
-}
-
-/// Statistics for a retrieval operation.
-#[derive(Debug, Clone, Default)]
-pub struct RetrievalStats {
-    /// Number of nodes visited.
-    pub nodes_visited: usize,
-
-    /// Number of LLM calls made.
-    pub llm_calls: usize,
-
-    /// Time spent in milliseconds.
-    pub time_ms: u64,
-
-    /// Tokens consumed.
-    pub tokens_used: usize,
-
-    /// Cache hits.
-    pub cache_hits: usize,
-
-    /// Cache misses.
-    pub cache_misses: usize,
+/// A single step in the reasoning chain.
+#[derive(Debug, Clone, Serialize, Deserialize)]
+pub struct ReasoningStep {
+    /// Human-readable explanation of the decision.
+    pub reasoning: String,
 }
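Net effect on observability: a `ReasoningStep` now carries only its explanation string, and the chain's built-in `summary()` is gone along with stage, score, candidate, and LLM-call provenance. A sketch of what callers can still reconstruct from the simplified shape; the types are restated locally for illustration, and the real `ReasoningChain` may carry fields beyond `steps`:

```rust
use serde::{Deserialize, Serialize};

// Local restatement of the simplified types, for illustration only.
#[derive(Debug, Clone, Serialize, Deserialize)]
struct ReasoningStep {
    reasoning: String,
}

#[derive(Debug, Clone, Default, Serialize, Deserialize)]
struct ReasoningChain {
    steps: Vec<ReasoningStep>,
}

/// Roughly what the removed `summary()` can still produce without
/// per-step stage and score data: one explanation per line, in order.
fn summarize(chain: &ReasoningChain) -> String {
    chain
        .steps
        .iter()
        .map(|s| s.reasoning.as_str())
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    let chain = ReasoningChain {
        steps: vec![
            ReasoningStep { reasoning: "Query targets volume commands".into() },
            ReasoningStep { reasoning: "Descended into 'Volumes & Ports'".into() },
        ],
    };
    println!("{}", summarize(&chain));
    // The persisted form is now just an ordered list of explanations.
    println!("{}", serde_json::to_string_pretty(&chain).unwrap());
}
```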
diff --git a/rust/src/retrieval/scoring/bm25.rs b/rust/src/scoring/bm25.rs
similarity index 100%
rename from rust/src/retrieval/scoring/bm25.rs
rename to rust/src/scoring/bm25.rs
diff --git a/rust/src/scoring/mod.rs b/rust/src/scoring/mod.rs
new file mode 100644
index 00000000..eac4e435
--- /dev/null
+++ b/rust/src/scoring/mod.rs
@@ -0,0 +1,8 @@
+// Copyright (c) 2026 vectorless developers
+// SPDX-License-Identifier: Apache-2.0
+
+//! Scoring utilities — keyword extraction via BM25.
+
+pub mod bm25;
+
+pub use bm25::extract_keywords;
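`extract_keywords` is now re-exported from a top-level `scoring` module, matching the `src/scoring/` layout described in CLAUDE.md; presumably the point of lifting it out of `retrieval/` is that the rerank stage and the agents can share one scorer. The diff does not show its signature, so the following is only a toy sketch of BM25-style term scoring to illustrate what such a helper computes; `bm25_keywords`, its signature, and the sample corpus are made up here, and `k1`/`b` are the conventional BM25 constants:

```rust
use std::collections::HashMap;

/// Toy BM25-style keyword scoring: rank one document's terms by
/// term-frequency saturation times inverse document frequency.
/// Illustrative only; not the crate's actual `extract_keywords`.
fn bm25_keywords(doc_idx: usize, corpus: &[&str], top_k: usize) -> Vec<(String, f64)> {
    let (k1, b) = (1.2_f64, 0.75_f64); // conventional BM25 constants

    let tokenize = |s: &str| -> Vec<String> {
        s.split(|c: char| !c.is_alphanumeric())
            .filter(|t| !t.is_empty())
            .map(str::to_lowercase)
            .collect()
    };

    let docs: Vec<Vec<String>> = corpus.iter().map(|d| tokenize(d)).collect();
    let n = docs.len() as f64;
    let avg_len = docs.iter().map(Vec::len).sum::<usize>() as f64 / n;

    // Document frequency: in how many documents does each term occur?
    let mut df: HashMap<String, usize> = HashMap::new();
    for d in &docs {
        let mut seen: Vec<&String> = d.iter().collect();
        seen.sort();
        seen.dedup();
        for t in seen {
            *df.entry(t.clone()).or_insert(0) += 1;
        }
    }

    // Term frequency within the target document.
    let target = &docs[doc_idx];
    let mut tf: HashMap<String, usize> = HashMap::new();
    for t in target {
        *tf.entry(t.clone()).or_insert(0) += 1;
    }

    // Score = idf * tf-saturation with length normalization.
    let len_norm = k1 * (1.0 - b + b * target.len() as f64 / avg_len);
    let mut scored: Vec<(String, f64)> = tf
        .into_iter()
        .map(|(term, freq)| {
            let idf = ((n - df[&term] as f64 + 0.5) / (df[&term] as f64 + 0.5) + 1.0).ln();
            let sat = (freq as f64 * (k1 + 1.0)) / (freq as f64 + len_norm);
            (term, idf * sat)
        })
        .collect();
    scored.sort_by(|x, y| y.1.partial_cmp(&x.1).unwrap());
    scored.truncate(top_k);
    scored
}

fn main() {
    let corpus = [
        "docker volume ls lists volumes",
        "docker ps lists running containers",
        "docker compose starts services",
    ];
    for (term, score) in bm25_keywords(0, &corpus, 3) {
        println!("{term}: {score:.3}");
    }
}
```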
diff --git a/rust/src/storage/persistence.rs b/rust/src/storage/persistence.rs
index b9d28317..b2dac4d4 100644
--- a/rust/src/storage/persistence.rs
+++ b/rust/src/storage/persistence.rs
@@ -16,7 +16,7 @@ use std::io::{BufReader, BufWriter, Write};
 use std::path::{Path, PathBuf};
 
 use crate::Error;
-use crate::document::{DocumentTree, ReasoningIndex};
+use crate::document::{DocumentTree, NavigationIndex, ReasoningIndex};
 use crate::error::Result;
 
 /// Current format version for persisted documents.
@@ -228,6 +228,10 @@ pub struct PersistedDocument {
     /// Pre-computed reasoning index for retrieval acceleration.
     #[serde(default, skip_serializing_if = "Option::is_none")]
     pub reasoning_index: Option<ReasoningIndex>,
+
+    /// Navigation index for Agent-based retrieval.
+    #[serde(default, skip_serializing_if = "Option::is_none")]
+    pub navigation_index: Option<NavigationIndex>,
 }
 
 impl PersistedDocument {
@@ -239,6 +243,7 @@ impl PersistedDocument {
             tree,
             pages: Vec::new(),
             reasoning_index: None,
+            navigation_index: None,
         }
     }
 
diff --git a/rust/src/storage/workspace.rs b/rust/src/storage/workspace.rs
index 045cf1e6..936fc815 100644
--- a/rust/src/storage/workspace.rs
+++ b/rust/src/storage/workspace.rs
@@ -47,6 +47,7 @@ use crate::Error;
 use crate::error::Result;
 
 const META_KEY: &str = "meta";
+const CATALOG_KEY: &str = "catalog";
 const DEFAULT_CACHE_SIZE: usize = 100;
 
 /// Lightweight metadata entry for the async workspace index.
@@ -108,6 +109,8 @@ struct WorkspaceInner {
     root: Option<PathBuf>,
     /// Document metadata index.
     meta_index: HashMap,
+    /// DocCard catalog — lightweight document summaries for Orchestrator analysis.
+    catalog: HashMap<String, crate::document::DocCard>,
     /// LRU cache for loaded documents.
     cache: DocumentCache,
     /// Cross-document relationship graph (cached).
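The new `navigation_index` field on `PersistedDocument` relies on a standard serde idiom: `#[serde(default, skip_serializing_if = "Option::is_none")]` lets payloads written before the field existed keep deserializing, and keeps the key out of newly written payloads when the value is `None`. That is presumably what keeps previously written `.bin` payloads (like the samples deleted below) loadable. A self-contained sketch of the round-trip behavior; the `Persisted` struct and its field type are hypothetical stand-ins:

```rust
use serde::{Deserialize, Serialize};

// Hypothetical stand-in for PersistedDocument, reduced to the
// compatibility-relevant parts.
#[derive(Debug, Serialize, Deserialize)]
struct Persisted {
    version: u32,
    // Absent in old payloads -> defaults to None when loading;
    // None when saving -> the key is omitted entirely.
    #[serde(default, skip_serializing_if = "Option::is_none")]
    navigation_index: Option<Vec<String>>,
}

fn main() -> Result<(), serde_json::Error> {
    // A payload written before the field existed still parses.
    let old: Persisted = serde_json::from_str(r#"{"version":1}"#)?;
    assert!(old.navigation_index.is_none());

    // Saving without an index does not emit a "navigation_index" key.
    let json = serde_json::to_string(&old)?;
    assert_eq!(json, r#"{"version":1}"#);
    Ok(())
}
```

The DocCard catalog introduced just above is handled differently: rather than riding along inside each document payload, it is persisted as a whole under its own `catalog` backend key, as the following hunks show.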
@@ -149,11 +152,13 @@ impl Workspace {
             backend,
             root: None,
             meta_index: HashMap::new(),
+            catalog: HashMap::new(),
             cache: DocumentCache::with_capacity(options.cache_size),
             document_graph: None,
         };
 
         Self::load_meta_index(&mut inner)?;
+        Self::load_catalog_index(&mut inner)?;
 
         Ok(Self {
             inner: Arc::new(RwLock::new(inner)),
@@ -186,11 +191,13 @@ impl Workspace {
             backend,
             root: Some(root),
             meta_index: HashMap::new(),
+            catalog: HashMap::new(),
             cache: DocumentCache::with_capacity(options.cache_size),
             document_graph: None,
         };
 
         Self::load_meta_index(&mut inner)?;
+        Self::load_catalog_index(&mut inner)?;
 
         Ok(Self {
             inner: Arc::new(RwLock::new(inner)),
@@ -254,6 +261,16 @@ impl Workspace {
         inner.meta_index.insert(doc_id.clone(), meta_entry);
         Self::save_meta_index(&inner)?;
 
+        // Update catalog with DocCard
+        if let Some(card) = doc
+            .navigation_index
+            .as_ref()
+            .and_then(|nav| nav.doc_card().cloned())
+        {
+            inner.catalog.insert(doc_id.clone(), card);
+            Self::save_catalog_index(&inner)?;
+        }
+
         // Remove from cache if present
         let _ = inner.cache.remove(&doc_id);
 
@@ -356,10 +373,12 @@ impl Workspace {
         inner.meta_index.remove(id);
 
-        // Remove from cache
+        // Remove from cache and catalog
         let _ = inner.cache.remove(id);
+        inner.catalog.remove(id);
 
         Self::save_meta_index(&inner)?;
+        Self::save_catalog_index(&inner)?;
 
         info!("Removed document {} from async workspace", id);
 
@@ -513,10 +532,84 @@ impl Workspace {
         Ok(())
     }
 
+    /// Load the DocCard catalog from backend.
+    fn load_catalog_index(inner: &mut WorkspaceInner) -> Result<()> {
+        match inner.backend.get(CATALOG_KEY)? {
+            Some(bytes) => {
+                let catalog: HashMap<String, crate::document::DocCard> =
+                    serde_json::from_slice(&bytes).map_err(|e| {
+                        Error::Parse(format!("Failed to parse catalog index: {}", e))
+                    })?;
+                inner.catalog = catalog;
+                info!("Loaded DocCard catalog: {} entries", inner.catalog.len());
+            }
+            None => {
+                // Rebuild from existing documents
+                Self::rebuild_catalog(inner)?;
+            }
+        }
+        Ok(())
+    }
+
+    /// Save the DocCard catalog to backend.
+    fn save_catalog_index(inner: &WorkspaceInner) -> Result<()> {
+        let bytes = serde_json::to_vec_pretty(&inner.catalog)
+            .map_err(|e| Error::Parse(format!("Failed to serialize catalog: {}", e)))?;
+        inner.backend.put(CATALOG_KEY, &bytes)?;
+        Ok(())
+    }
+
+    /// Rebuild the DocCard catalog from existing documents.
+    fn rebuild_catalog(inner: &mut WorkspaceInner) -> Result<()> {
+        let keys = inner.backend.keys()?;
+        let reserved = ["meta", "_graph", "catalog"];
+        let doc_keys: Vec<_> = keys
+            .iter()
+            .filter(|k| !reserved.contains(&k.as_str()))
+            .collect();
+
+        for key in doc_keys {
+            if let Some(bytes) = inner.backend.get(key)? {
+                if let Ok(doc) = load_document_from_bytes(&bytes) {
+                    if let Some(card) = doc
+                        .navigation_index
+                        .as_ref()
+                        .and_then(|nav| nav.doc_card().cloned())
+                    {
+                        inner.catalog.insert(doc.meta.id.clone(), card);
+                    }
+                }
+            }
+        }
+
+        if !inner.catalog.is_empty() {
+            Self::save_catalog_index(inner)?;
+            info!("Rebuilt DocCard catalog: {} entries", inner.catalog.len());
+        }
+
+        Ok(())
+    }
+
+    /// Get all DocCards from the catalog.
+    pub async fn list_catalog(&self) -> Vec<(String, crate::document::DocCard)> {
+        let inner = self.inner.read().await;
+        inner
+            .catalog
+            .iter()
+            .map(|(id, card)| (id.clone(), card.clone()))
+            .collect()
+    }
+
+    /// Get a single DocCard by document ID.
+    pub async fn get_doc_card(&self, id: &str) -> Option<crate::document::DocCard> {
+        let inner = self.inner.read().await;
+        inner.catalog.get(id).cloned()
+    }
+
     /// Rebuild the meta index from existing documents.
fn rebuild_meta_index(inner: &mut WorkspaceInner) -> Result<()> { let keys = inner.backend.keys()?; - let reserved = ["meta", "_graph"]; + let reserved = ["meta", "_graph", "catalog"]; let doc_keys: Vec<_> = keys .iter() .filter(|k| !reserved.contains(&k.as_str())) diff --git a/samples/083a0e39-5c92-404b-9fb7-8458152dd65f.bin b/samples/083a0e39-5c92-404b-9fb7-8458152dd65f.bin deleted file mode 100644 index d8b3841e..00000000 --- a/samples/083a0e39-5c92-404b-9fb7-8458152dd65f.bin +++ /dev/null @@ -1 +0,0 @@ -{"version":1,"checksum":"df9576fe0cb0f42948a619b0352f6fbd3c647258317ff2ffe5b170baec6eb302","payload":{"meta":{"content_fingerprint":"zsbpiOWjNlqXOyJuG/CAgQ==","created_at":"2026-04-13T15:19:05.175062664Z","description":"","format":"pdf","id":"083a0e39-5c92-404b-9fb7-8458152dd65f","line_count":null,"logic_fingerprint":"4p/tkAx4Dcrk805539ue0Q==","modified_at":"2026-04-13T15:19:05.178281613Z","name":"Docker_Cheat_Sheet","node_count":7,"page_count":null,"processing_duration_ms":99141,"processing_version":0,"source_path":"/home/ztgx/Desktop/vectorless/samples/Docker_Cheat_Sheet.pdf","total_summary_tokens":378},"pages":[],"reasoning_index":{"config_hash":0,"hot_nodes":{},"section_map":{"1":{"index1":2,"stamp":0},"docker cheat sheet":{"index1":2,"stamp":0}},"summary_shortcut":{"document_summary":"Docker Cheat Sheet: This cheat sheet provides a quick reference for essential Docker and Docker Compose command-line operations. It covers core topics including container process management, image and repository handling, volume and port mapping, and system troubleshooting. Use this guide to quickly locate commands for building, running, and managing Dockerized environments.","root_node":{"index1":1,"stamp":0},"section_summaries":[{"depth":1,"node_id":{"index1":2,"stamp":0},"summary":"This cheat sheet provides a quick reference for essential Docker and Docker Compose command-line operations. It covers core topics including container process management, image and repository handling, volume and port mapping, and system troubleshooting. 
Use this guide to quickly locate commands for building, running, and managing Dockerized environments.","title":"Docker Cheat Sheet"}]},"topic_paths":{"active":[{"depth":2,"node_id":{"index1":7,"stamp":0},"weight":1.0}],"additionally":[{"depth":2,"node_id":{"index1":6,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":1.0}],"advanced":[{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":1.0}],"alongside":[{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":1.0}],"analysis":[{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":1.0}],"applications":[{"depth":2,"node_id":{"index1":4,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":6,"stamp":0},"weight":1.0}],"basic":[{"depth":2,"node_id":{"index1":6,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":1.0}],"brief":[{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":1.0}],"building":[{"depth":1,"node_id":{"index1":2,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":4,"stamp":0},"weight":1.0}],"categories":[{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":4,"stamp":0},"weight":1.0}],"cheat":[{"depth":1,"node_id":{"index1":2,"stamp":0},"weight":1.0},{"depth":0,"node_id":{"index1":1,"stamp":0},"weight":0.5714285969734192},{"depth":2,"node_id":{"index1":4,"stamp":0},"weight":0.4285714626312256},{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":0.4285714626312256}],"cli":[{"depth":2,"node_id":{"index1":6,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":7,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":4,"stamp":0},"weight":1.0}],"command":[{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":1.0},{"depth":1,"node_id":{"index1":2,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":4,"stamp":0},"weight":1.0}],"commands":[{"depth":2,"node_id":{"index1":6,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":0.5},{"depth":2,"node_id":{"index1":7,"stamp":0},"weight":0.5},{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":0.25},{"depth":1,"node_id":{"index1":2,"stamp":0},"weight":0.25},{"depth":2,"node_id":{"index1":4,"stamp":0},"weight":0.25}],"complete":[{"depth":2,"node_id":{"index1":4,"stamp":0},"weight":1.0}],"compose":[{"depth":2,"node_id":{"index1":7,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":6,"stamp":0},"weight":0.8571429252624512},{"depth":2,"node_id":{"index1":4,"stamp":0},"weight":0.4285714626312256},{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":0.4285714626312256},{"depth":1,"node_id":{"index1":2,"stamp":0},"weight":0.4285714626312256},{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":0.4285714626312256}],"comprehensive":[{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":1.0}],"consumption":[{"depth":2,"node_id":{"index1":7,"stamp":0},"weight":1.0}],"container":[{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":0.6666666865348816},{"depth":2,"node_id":{"index1":6,"stamp":0},"weight":0.6666666865348816},{"depth":2,"node_id":{"index1":4,"stamp":0},"weight":0.6666666865348816},{"depth":1,"node_id":{"index1":2,"stamp":0},"weight":0.3333333432674408}],"containerized":[{"depth":2,"node_id":{"index1":7,"stamp":0},"weight":1.0}],"containers":[{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":6,"stamp":0},"weight":1.0}],"copying":[{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":1.0}],"core":[{"depth":1,"node_id
":{"index1":2,"stamp":0},"weight":1.0}],"covering":[{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":1.0}],"covers":[{"depth":1,"node_id":{"index1":2,"stamp":0},"weight":1.0}],"data":[{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":1.0}],"description":[{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":1.0}],"destroy":[{"depth":2,"node_id":{"index1":7,"stamp":0},"weight":1.0}],"details":[{"depth":2,"node_id":{"index1":6,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":1.0}],"docker":[{"depth":1,"node_id":{"index1":2,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":6,"stamp":0},"weight":0.9000000357627869},{"depth":2,"node_id":{"index1":7,"stamp":0},"weight":0.699999988079071},{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":0.6000000238418579},{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":0.6000000238418579},{"depth":2,"node_id":{"index1":4,"stamp":0},"weight":0.6000000238418579},{"depth":0,"node_id":{"index1":1,"stamp":0},"weight":0.4000000059604645}],"dockerized":[{"depth":1,"node_id":{"index1":2,"stamp":0},"weight":1.0}],"document":[{"depth":2,"node_id":{"index1":4,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":1.0}],"environments":[{"depth":2,"node_id":{"index1":7,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":1.0},{"depth":1,"node_id":{"index1":2,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":6,"stamp":0},"weight":1.0}],"essential":[{"depth":2,"node_id":{"index1":4,"stamp":0},"weight":1.0},{"depth":1,"node_id":{"index1":2,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":6,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":7,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":1.0}],"everything":[{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":1.0}],"executing":[{"depth":2,"node_id":{"index1":6,"stamp":0},"weight":1.0}],"features":[{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":1.0}],"files":[{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":1.0}],"five":[{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":1.0}],"four":[{"depth":2,"node_id":{"index1":4,"stamp":0},"weight":1.0}],"fundamental":[{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":1.0}],"guide":[{"depth":1,"node_id":{"index1":2,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":7,"stamp":0},"weight":1.0}],"handling":[{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":1.0},{"depth":1,"node_id":{"index1":2,"stamp":0},"weight":1.0}],"health":[{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":1.0}],"host":[{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":1.0}],"image":[{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":4,"stamp":0},"weight":1.0},{"depth":1,"node_id":{"index1":2,"stamp":0},"weight":1.0}],"images":[{"depth":2,"node_id":{"index1":4,"stamp":0},"weight":1.0}],"including":[{"depth":1,"node_id":{"index1":2,"stamp":0},"weight":1.0}],"inspecting":[{"depth":2,"node_id":{"index1":6,"stamp":0},"weight":1.0}],"lifecycle":[{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":4,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":6,"stamp":0},"weight":1.0}],"like":[{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":1.0}],"line":[{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":1.0},{"depth":1,"node_id":{"index1":2,"stamp":0},"weight":1.0}
],"list":[{"depth":2,"node_id":{"index1":6,"stamp":0},"weight":1.0}],"locate":[{"depth":1,"node_id":{"index1":2,"stamp":0},"weight":1.0}],"log":[{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":1.0}],"logs":[{"depth":2,"node_id":{"index1":7,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":6,"stamp":0},"weight":1.0}],"machines":[{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":1.0}],"main":[{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":1.0}],"manage":[{"depth":2,"node_id":{"index1":6,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":7,"stamp":0},"weight":1.0}],"management":[{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":4,"stamp":0},"weight":0.4285714626312256},{"depth":1,"node_id":{"index1":2,"stamp":0},"weight":0.4285714626312256}],"managing":[{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":4,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":6,"stamp":0},"weight":1.0},{"depth":1,"node_id":{"index1":2,"stamp":0},"weight":1.0}],"mapping":[{"depth":1,"node_id":{"index1":2,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":1.0}],"metadata":[{"depth":2,"node_id":{"index1":6,"stamp":0},"weight":1.0}],"monitor":[{"depth":2,"node_id":{"index1":7,"stamp":0},"weight":1.0}],"monitoring":[{"depth":2,"node_id":{"index1":6,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":1.0}],"mounting":[{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":1.0}],"multi":[{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":4,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":1.0}],"necessary":[{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":7,"stamp":0},"weight":1.0}],"networking":[{"depth":2,"node_id":{"index1":4,"stamp":0},"weight":1.0}],"operations":[{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":1.0},{"depth":1,"node_id":{"index1":2,"stamp":0},"weight":0.5},{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":0.5}],"orchestrating":[{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":4,"stamp":0},"weight":1.0}],"orchestration":[{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":1.0}],"organized":[{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":4,"stamp":0},"weight":1.0}],"outlines":[{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":6,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":7,"stamp":0},"weight":1.0}],"overall":[{"depth":2,"node_id":{"index1":7,"stamp":0},"weight":1.0}],"persistence":[{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":1.0}],"port":[{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":1.0},{"depth":1,"node_id":{"index1":2,"stamp":0},"weight":1.0}],"ports":[{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":0.4285714626312256},{"depth":2,"node_id":{"index1":4,"stamp":0},"weight":0.4285714626312256}],"presented":[{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":1.0}],"process":[{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":1.0},{"depth":1,"node_id":{"index1":2,"stamp":0},"weight":0.4285714626312256}],"processes":[{"depth":2,"node_id":{"index1":7,"stam
p":0},"weight":1.0}],"provides":[{"depth":2,"node_id":{"index1":7,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":4,"stamp":0},"weight":1.0},{"depth":1,"node_id":{"index1":2,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":6,"stamp":0},"weight":1.0}],"pushing":[{"depth":2,"node_id":{"index1":4,"stamp":0},"weight":1.0}],"quick":[{"depth":1,"node_id":{"index1":2,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":7,"stamp":0},"weight":1.0}],"quickly":[{"depth":1,"node_id":{"index1":2,"stamp":0},"weight":1.0}],"reference":[{"depth":1,"node_id":{"index1":2,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":7,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":6,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":1.0}],"repository":[{"depth":2,"node_id":{"index1":4,"stamp":0},"weight":1.0},{"depth":1,"node_id":{"index1":2,"stamp":0},"weight":0.4285714626312256},{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":0.4285714626312256}],"required":[{"depth":2,"node_id":{"index1":6,"stamp":0},"weight":1.0}],"resource":[{"depth":2,"node_id":{"index1":6,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":7,"stamp":0},"weight":1.0}],"resources":[{"depth":2,"node_id":{"index1":7,"stamp":0},"weight":1.0}],"running":[{"depth":1,"node_id":{"index1":2,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":6,"stamp":0},"weight":1.0}],"section":[{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":7,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":6,"stamp":0},"weight":1.0}],"serves":[{"depth":2,"node_id":{"index1":4,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":1.0}],"sheet":[{"depth":1,"node_id":{"index1":2,"stamp":0},"weight":1.0},{"depth":0,"node_id":{"index1":1,"stamp":0},"weight":0.5714285969734192},{"depth":2,"node_id":{"index1":4,"stamp":0},"weight":0.4285714626312256},{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":0.4285714626312256}],"start":[{"depth":2,"node_id":{"index1":7,"stamp":0},"weight":1.0}],"stop":[{"depth":2,"node_id":{"index1":7,"stamp":0},"weight":1.0}],"storage":[{"depth":2,"node_id":{"index1":4,"stamp":0},"weight":1.0}],"syntax":[{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":1.0}],"system":[{"depth":1,"node_id":{"index1":2,"stamp":0},"weight":1.0}],"topics":[{"depth":1,"node_id":{"index1":2,"stamp":0},"weight":1.0}],"tracking":[{"depth":2,"node_id":{"index1":6,"stamp":0},"weight":1.0}],"troubleshooting":[{"depth":2,"node_id":{"index1":6,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":0.4285714626312256},{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":0.4285714626312256},{"depth":1,"node_id":{"index1":2,"stamp":0},"weight":0.4285714626312256},{"depth":2,"node_id":{"index1":4,"stamp":0},"weight":0.4285714626312256}],"usage":[{"depth":2,"node_id":{"index1":6,"stamp":0},"weight":1.0}],"use":[{"depth":1,"node_id":{"index1":2,"stamp":0},"weight":1.0}],"utilities":[{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":1.0}],"viewing":[{"depth":2,"node_id":{"index1":6,"stamp":0},"weight":1.0}],"volume":[{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":1.0},{"depth":1,"node_id":{"index1":2,"stamp":0},"weight":1.0}],"volumes":[{"depth":2,"node_id":{"index1":5,"stamp":0},"weight":1.0},{"depth":2,"node_id":{"index1":3,"stamp":0},"weight":0.4285714626312256},{"depth":2,"node_id":{"index1":4,"stamp":0},
"weight":0.4285714626312256}],"well":[{"depth":2,"node_id":{"index1":7,"stamp":0},"weight":1.0}],"within":[{"depth":2,"node_id":{"index1":6,"stamp":0},"weight":1.0}]}},"tree":{"arena":{"first_free_slot":null,"last_free_slot":null,"nodes":[{"data":{"Data":{"content":"","depth":0,"end_index":1,"end_page":1,"node_id":"0001","physical_index":null,"references":[],"start_index":1,"start_page":1,"structure":"","summary":"","title":"Docker_Cheat_Sheet","token_count":null}},"first_child":{"index1":2,"stamp":0},"last_child":{"index1":2,"stamp":0},"next_sibling":null,"parent":null,"previous_sibling":null,"stamp":0},{"data":{"Data":{"content":"Process Management\n\n# Show all running docker containers\ndocker ps\n\n# Show all docker containers\ndocker ps -a\n\n# Run a container\ndocker run :\n\n# Run a container and connect to it\ndocker run -it :\n\n# Run a container in the background\ndocker run -d :\n\n# Stop a container\ndocker stop \n\n# Kill a container\ndocker kill \n Images/Repository\n\n# List available local images\ndocker images\n\n# Search for docker images\ndocker search \n\n# Pull a docker image\ndocker pull \n\n# Build an image with a dockerfile\ndocker build -t : -f \n\n# Login to a remote repository\ndocker login \n\n# Push an image to your remotee repository\ndocker push :\n\n# Remove a local docker image\ndocker rmi :\n\n# Show metadata for an image\ndocker inspect \n\n# Remove all unused docker images\ndocker image prune\n\nVolumes & Ports\n\n# List volumes\ndocker volume ls\n\n# Create a volume\ndocker volume create \n\n# Delete a volume\ndocker volume rm \n\n# Show volume metadata\ndocker volume inspect \n\n# Delete all volumes not attached to a container\ndocker volume prune\n\n# Mount a local directory to your container\ndocker run -v : \n\n# Copy file or folder from a docker container to host machine\ndocker cp : \n\n# Copy file or folder from local machine onto a container\ndocker cp :\n\n# Map a local port to a docker instance\ndocker run -d -p 127.0.0.1:: \n\n# List the ports a docker container is running on\ndocker port \n Troubleshooting\n\n# Show the logs of a container\ndocker logs \n\n# Follow/tail the logs of a container\ndocker logs -f \n\n# Show timestamps on docker logs\ndocker logs -t \n\n# Show details/metadata of a container\ndocker inspect \n\n# Show a 'top' view of processes running on a container\ndocker top \n\n# Show a 'top' view of all docker containers\ndocker stats\n\n# Show any files that have changed since startup\ndocker diff \n\n# Connect to an already running container\ndocker attach \n\n# Execute a command on a container\ndocker exec -it /bin/bash\n\n# Show docker system wide information\ndocker system info\n\n# Show docker disk space used\ndocker system df\n\n \n\nDocker Compose\n\n# Start your docker-compose defined resources in detached mode\ndocker-compose up -d -f \n\n# Stop all docker-compose resources\ndocker-compose stop\n\n# Destroy all docker-compose resources\ndocker-compose down\n\n# Show docker-compose processes\ndocker-compose ps\n\n# Show docker-compose logs\ndocker-compose logs\n\n# Show docker-compose resource consumption\ndocker-compose top","depth":1,"end_index":1,"end_page":1,"node_id":"0002","physical_index":null,"references":[],"start_index":1,"start_page":1,"structure":"1","summary":"This cheat sheet provides a quick reference for essential Docker and Docker Compose command-line operations. 
It covers core topics including container process management, image and repository handling, volume and port mapping, and system troubleshooting. Use this guide to quickly locate commands for building, running, and managing Dockerized environments.","title":"Docker Cheat Sheet","token_count":676}},"first_child":{"index1":3,"stamp":0},"last_child":{"index1":7,"stamp":0},"next_sibling":null,"parent":{"index1":1,"stamp":0},"previous_sibling":null,"stamp":0},{"data":{"Data":{"content":"# Show all running docker containers\ndocker ps\n\n# Show all docker containers\ndocker ps -a\n\n# Run a container\ndocker run :\n\n# Run a container and connect to it\ndocker run -it :\n\n# Run a container in the background\ndocker run -d :\n\n# Stop a container\ndocker stop \n\n# Kill a container\ndocker kill \n Images/Repository\n\n# List available local images\ndocker images\n\n# Search for docker images\ndocker search \n\n# Pull a docker image\ndocker pull \n\n# Build an image with a dockerfile\ndocker build -t : -f \n\n# Login to a remote repository\ndocker login \n\n# Push an image to your remotee repository\ndocker push :\n\n# Remove a local docker image\ndocker rmi :\n\n# Show metadata for an image\ndocker inspect \n\n# Remove all unused docker images\ndocker image prune\n\nVolumes & Ports\n\n# List volumes\ndocker volume ls\n\n# Create a volume\ndocker volume create \n\n# Delete a volume\ndocker volume rm \n\n# Show volume metadata\ndocker volume inspect \n\n# Delete all volumes not attached to a container\ndocker volume prune\n\n# Mount a local directory to your container\ndocker run -v : \n\n# Copy file or folder from a docker container to host machine\ndocker cp : \n\n# Copy file or folder from local machine onto a container\ndocker cp :\n\n# Map a local port to a docker instance\ndocker run -d -p 127.0.0.1:: \n\n# List the ports a docker container is running on\ndocker port \n Troubleshooting\n\n# Show the logs of a container\ndocker logs \n\n# Follow/tail the logs of a container\ndocker logs -f \n\n# Show timestamps on docker logs\ndocker logs -t \n\n# Show details/metadata of a container\ndocker inspect \n\n# Show a 'top' view of processes running on a container\ndocker top \n\n# Show a 'top' view of all docker containers\ndocker stats\n\n# Show any files that have changed since startup\ndocker diff \n\n# Connect to an already running container\ndocker attach \n\n# Execute a command on a container\ndocker exec -it /bin/bash\n\n# Show docker system wide information\ndocker system info\n\n# Show docker disk space used\ndocker system df\n\n \n\nDocker Compose\n\n# Start your docker-compose defined resources in detached mode\ndocker-compose up -d -f \n\n# Stop all docker-compose resources\ndocker-compose stop\n\n# Destroy all docker-compose resources\ndocker-compose down\n\n# Show docker-compose processes\ndocker-compose ps\n\n# Show docker-compose logs\ndocker-compose logs\n\n# Show docker-compose resource consumption\ndocker-compose top","depth":2,"end_index":1,"end_page":1,"node_id":"0003","physical_index":null,"references":[],"start_index":1,"start_page":1,"structure":"1.1","summary":"This document provides a comprehensive cheat sheet of essential Docker commands organized into five main categories: container process management, image/repository operations, volumes and ports handling, troubleshooting, and Docker Compose. 
Each command is presented with a brief description and the necessary syntax, covering everything from basic container lifecycle operations to advanced features like volume mounting, port mapping, log analysis, and multi-container orchestration.","title":"Process Management","token_count":673}},"first_child":null,"last_child":null,"next_sibling":{"index1":4,"stamp":0},"parent":{"index1":2,"stamp":0},"previous_sibling":null,"stamp":0},{"data":{"Data":{"content":"# List available local images\ndocker images\n\n# Search for docker images\ndocker search \n\n# Pull a docker image\ndocker pull \n\n# Build an image with a dockerfile\ndocker build -t : -f \n\n# Login to a remote repository\ndocker login \n\n# Push an image to your remotee repository\ndocker push :\n\n# Remove a local docker image\ndocker rmi :\n\n# Show metadata for an image\ndocker inspect \n\n# Remove all unused docker images\ndocker image prune\n\nVolumes & Ports\n\n# List volumes\ndocker volume ls\n\n# Create a volume\ndocker volume create \n\n# Delete a volume\ndocker volume rm \n\n# Show volume metadata\ndocker volume inspect \n\n# Delete all volumes not attached to a container\ndocker volume prune\n\n# Mount a local directory to your container\ndocker run -v : \n\n# Copy file or folder from a docker container to host machine\ndocker cp : \n\n# Copy file or folder from local machine onto a container\ndocker cp :\n\n# Map a local port to a docker instance\ndocker run -d -p 127.0.0.1:: \n\n# List the ports a docker container is running on\ndocker port \n Troubleshooting\n\n# Show the logs of a container\ndocker logs \n\n# Follow/tail the logs of a container\ndocker logs -f \n\n# Show timestamps on docker logs\ndocker logs -t \n\n# Show details/metadata of a container\ndocker inspect \n\n# Show a 'top' view of processes running on a container\ndocker top \n\n# Show a 'top' view of all docker containers\ndocker stats\n\n# Show any files that have changed since startup\ndocker diff \n\n# Connect to an already running container\ndocker attach \n\n# Execute a command on a container\ndocker exec -it /bin/bash\n\n# Show docker system wide information\ndocker system info\n\n# Show docker disk space used\ndocker system df\n\n \n\nDocker Compose\n\n# Start your docker-compose defined resources in detached mode\ndocker-compose up -d -f \n\n# Stop all docker-compose resources\ndocker-compose stop\n\n# Destroy all docker-compose resources\ndocker-compose down\n\n# Show docker-compose processes\ndocker-compose ps\n\n# Show docker-compose logs\ndocker-compose logs\n\n# Show docker-compose resource consumption\ndocker-compose top","depth":2,"end_index":1,"end_page":1,"node_id":"0004","physical_index":null,"references":[],"start_index":1,"start_page":1,"structure":"1.2","summary":"This document serves as a Docker command cheat sheet organized into four categories: Image/Repository management, Volumes & Ports, Troubleshooting, and Docker Compose. 
It provides essential CLI commands for the complete container lifecycle, from building and pushing images to managing storage, networking, and orchestrating multi-container applications.","title":"Images/Repository","token_count":578}},"first_child":null,"last_child":null,"next_sibling":{"index1":5,"stamp":0},"parent":{"index1":2,"stamp":0},"previous_sibling":{"index1":3,"stamp":0},"stamp":0},{"data":{"Data":{"content":"# List volumes\ndocker volume ls\n\n# Create a volume\ndocker volume create \n\n# Delete a volume\ndocker volume rm \n\n# Show volume metadata\ndocker volume inspect \n\n# Delete all volumes not attached to a container\ndocker volume prune\n\n# Mount a local directory to your container\ndocker run -v : \n\n# Copy file or folder from a docker container to host machine\ndocker cp : \n\n# Copy file or folder from local machine onto a container\ndocker cp :\n\n# Map a local port to a docker instance\ndocker run -d -p 127.0.0.1:: \n\n# List the ports a docker container is running on\ndocker port \n Troubleshooting\n\n# Show the logs of a container\ndocker logs \n\n# Follow/tail the logs of a container\ndocker logs -f \n\n# Show timestamps on docker logs\ndocker logs -t \n\n# Show details/metadata of a container\ndocker inspect \n\n# Show a 'top' view of processes running on a container\ndocker top \n\n# Show a 'top' view of all docker containers\ndocker stats\n\n# Show any files that have changed since startup\ndocker diff \n\n# Connect to an already running container\ndocker attach \n\n# Execute a command on a container\ndocker exec -it /bin/bash\n\n# Show docker system wide information\ndocker system info\n\n# Show docker disk space used\ndocker system df\n\n \n\nDocker Compose\n\n# Start your docker-compose defined resources in detached mode\ndocker-compose up -d -f \n\n# Stop all docker-compose resources\ndocker-compose stop\n\n# Destroy all docker-compose resources\ndocker-compose down\n\n# Show docker-compose processes\ndocker-compose ps\n\n# Show docker-compose logs\ndocker-compose logs\n\n# Show docker-compose resource consumption\ndocker-compose top","depth":2,"end_index":1,"end_page":1,"node_id":"0005","physical_index":null,"references":[],"start_index":1,"start_page":1,"structure":"1.3","summary":"This section serves as a quick reference guide for essential Docker command-line operations. It details the necessary commands for managing data persistence through volumes, mapping ports, and copying files between host machines and containers. 
Additionally, it outlines troubleshooting utilities for monitoring container health and logs, alongside fundamental Docker Compose commands for orchestrating multi-container environments.","title":"Volumes & Ports","token_count":441}},"first_child":null,"last_child":null,"next_sibling":{"index1":6,"stamp":0},"parent":{"index1":2,"stamp":0},"previous_sibling":{"index1":4,"stamp":0},"stamp":0},{"data":{"Data":{"content":"# Show the logs of a container\ndocker logs \n\n# Follow/tail the logs of a container\ndocker logs -f \n\n# Show timestamps on docker logs\ndocker logs -t \n\n# Show details/metadata of a container\ndocker inspect \n\n# Show a 'top' view of processes running on a container\ndocker top \n\n# Show a 'top' view of all docker containers\ndocker stats\n\n# Show any files that have changed since startup\ndocker diff \n\n# Connect to an already running container\ndocker attach \n\n# Execute a command on a container\ndocker exec -it /bin/bash\n\n# Show docker system wide information\ndocker system info\n\n# Show docker disk space used\ndocker system df\n\n \n\nDocker Compose\n\n# Start your docker-compose defined resources in detached mode\ndocker-compose up -d -f \n\n# Stop all docker-compose resources\ndocker-compose stop\n\n# Destroy all docker-compose resources\ndocker-compose down\n\n# Show docker-compose processes\ndocker-compose ps\n\n# Show docker-compose logs\ndocker-compose logs\n\n# Show docker-compose resource consumption\ndocker-compose top","depth":2,"end_index":1,"end_page":1,"node_id":"0006","physical_index":null,"references":[],"start_index":1,"start_page":1,"structure":"1.4","summary":"This section provides a reference list of essential Docker and Docker Compose CLI commands used for troubleshooting and managing container environments. It details commands for inspecting container metadata, viewing logs, tracking resource usage, and executing commands within running containers. Additionally, it outlines the basic lifecycle and monitoring commands required to manage Docker Compose applications.","title":"Troubleshooting","token_count":252}},"first_child":null,"last_child":null,"next_sibling":{"index1":7,"stamp":0},"parent":{"index1":2,"stamp":0},"previous_sibling":{"index1":5,"stamp":0},"stamp":0},{"data":{"Data":{"content":"# Start your docker-compose defined resources in detached mode\ndocker-compose up -d -f \n\n# Stop all docker-compose resources\ndocker-compose stop\n\n# Destroy all docker-compose resources\ndocker-compose down\n\n# Show docker-compose processes\ndocker-compose ps\n\n# Show docker-compose logs\ndocker-compose logs\n\n# Show docker-compose resource consumption\ndocker-compose top","depth":2,"end_index":1,"end_page":1,"node_id":"0007","physical_index":null,"references":[],"start_index":1,"start_page":1,"structure":"1.5","summary":"This section provides a quick reference guide for essential Docker Compose commands used to manage containerized environments. 
It outlines the CLI commands necessary to start, stop, and destroy resources, as well as how to monitor their active processes, logs, and overall resource consumption.","title":"Docker Compose","token_count":79}},"first_child":null,"last_child":null,"next_sibling":null,"parent":{"index1":2,"stamp":0},"previous_sibling":{"index1":6,"stamp":0},"stamp":0}]},"root_id":{"index1":1,"stamp":0}}}} \ No newline at end of file diff --git a/samples/1938cb46-4085-4a70-b9e6-70b97d3c8ba9.bin b/samples/1938cb46-4085-4a70-b9e6-70b97d3c8ba9.bin deleted file mode 100644 index 31bd59ff..00000000 --- a/samples/1938cb46-4085-4a70-b9e6-70b97d3c8ba9.bin +++ /dev/null @@ -1 +0,0 @@ -{"version":1,"checksum":"b27a76f02225295a9dc19b2de9458d1200070919b226a005cd0b59e2c592584c","payload":{"meta":{"id":"1938cb46-4085-4a70-b9e6-70b97d3c8ba9","name":"","format":"md","source_path":"","description":"","page_count":null,"line_count":null,"created_at":"2026-04-12T04:46:05.866270414Z","modified_at":"2026-04-12T04:46:05.866414689Z","logic_fingerprint":"b25J69t2pTTx/z0WFznfGw==","processing_version":0,"node_count":9,"total_summary_tokens":444,"processing_duration_ms":69435},"tree":{"arena":{"nodes":[{"parent":null,"previous_sibling":null,"next_sibling":null,"first_child":{"index1":2,"stamp":0},"last_child":{"index1":2,"stamp":0},"stamp":0,"data":{"Data":{"title":"","structure":"","content":"","summary":"","depth":0,"start_index":1,"end_index":1,"start_page":null,"end_page":null,"node_id":"0001","physical_index":null,"token_count":null,"references":[]}}},{"parent":{"index1":1,"stamp":0},"previous_sibling":null,"next_sibling":null,"first_child":{"index1":3,"stamp":0},"last_child":{"index1":9,"stamp":0},"stamp":0,"data":{"Data":{"title":"Distributed Data Processing Platform","structure":"1","content":"","summary":"","depth":1,"start_index":1,"end_index":1,"start_page":null,"end_page":null,"node_id":"0002","physical_index":null,"token_count":null,"references":[]}}},{"parent":{"index1":2,"stamp":0},"previous_sibling":null,"next_sibling":{"index1":4,"stamp":0},"first_child":null,"last_child":null,"stamp":0,"data":{"Data":{"title":"Introduction","structure":"1.1","content":"This document provides a comprehensive overview of the distributed data processing platform architecture. The system is designed to handle petabyte-scale data workloads with sub-second query latency, supporting both real-time streaming and batch processing paradigms. The architecture follows a microservices-based approach with independent scaling capabilities for each component, enabling cost-effective resource utilization across varying workload patterns.","summary":"This document outlines a distributed data processing platform built to manage petabyte-scale workloads with sub-second latency. By utilizing a microservices-based architecture, the system supports both batch and real-time streaming while allowing independent component scaling for cost-effective resource utilization.","depth":2,"start_index":1,"end_index":1,"start_page":null,"end_page":null,"node_id":"0003","physical_index":null,"token_count":70,"references":[]}}},{"parent":{"index1":2,"stamp":0},"previous_sibling":{"index1":3,"stamp":0},"next_sibling":{"index1":9,"stamp":0},"first_child":{"index1":5,"stamp":0},"last_child":{"index1":8,"stamp":0},"stamp":0,"data":{"Data":{"title":"System Architecture","structure":"1.2","content":"The platform follows a layered architecture pattern with clear separation of concerns between ingestion, processing, storage, and serving layers. 
Each layer can be independently deployed, scaled, and upgraded without affecting other layers, following the principle of bounded contexts from domain-driven design. Inter-layer communication uses a combination of asynchronous message passing for data flow and synchronous gRPC calls for control plane operations.","summary":"This section details the platform's layered system architecture, encompassing the ingestion, processing, storage, and serving layers. It explains the separation of concerns and independent scalability based on domain-driven design principles. Furthermore, it covers the inter-layer communication mechanisms, including asynchronous message passing and synchronous gRPC calls.","depth":2,"start_index":1,"end_index":1,"start_page":null,"end_page":null,"node_id":"0004","physical_index":null,"token_count":73,"references":[]}}},{"parent":{"index1":4,"stamp":0},"previous_sibling":null,"next_sibling":{"index1":6,"stamp":0},"first_child":null,"last_child":null,"stamp":0,"data":{"Data":{"title":"Ingestion Layer","structure":"1.2.1","content":"The ingestion layer serves as the entry point for all data entering the platform. It supports multiple protocols including HTTP REST, gRPC, Apache Kafka, and AWS Kinesis. The layer is responsible for data validation, schema enforcement, initial transformation, and routing to downstream processing pipelines. Built on a reactive architecture using backpressure-aware operators, the ingestion layer gracefully handles burst traffic patterns without overwhelming downstream services.","summary":"The ingestion layer acts as the primary entry point for data entering the platform, supporting diverse protocols such as HTTP REST, gRPC, Kafka, and Kinesis. It handles critical preliminary tasks including data validation, schema enforcement, transformation, and routing. Additionally, its reactive, backpressure-aware architecture ensures that burst traffic is managed smoothly without overwhelming downstream services.","depth":3,"start_index":1,"end_index":1,"start_page":null,"end_page":null,"node_id":"0005","physical_index":null,"token_count":79,"references":[]}}},{"parent":{"index1":4,"stamp":0},"previous_sibling":{"index1":5,"stamp":0},"next_sibling":{"index1":7,"stamp":0},"first_child":null,"last_child":null,"stamp":0,"data":{"Data":{"title":"Processing Engine","structure":"1.2.2","content":"The processing engine is the core computational component of the platform, responsible for transforming, enriching, aggregating, and analyzing ingested data. It supports both stream processing for real-time analytics and batch processing for historical analysis. The engine is built on a custom execution framework that optimizes query plans based on data statistics and available compute resources.","summary":"The processing engine serves as the platform's core computational component, responsible for transforming and analyzing ingested data through both real-time stream and historical batch processing. 
It operates on a custom execution framework that dynamically optimizes query plans based on data statistics and available compute resources.","depth":3,"start_index":1,"end_index":1,"start_page":null,"end_page":null,"node_id":"0006","physical_index":null,"token_count":67,"references":[]}}},{"parent":{"index1":4,"stamp":0},"previous_sibling":{"index1":6,"stamp":0},"next_sibling":{"index1":8,"stamp":0},"first_child":null,"last_child":null,"stamp":0,"data":{"Data":{"title":"Storage Layer","structure":"1.2.3","content":"The storage layer provides a unified abstraction over multiple storage backends, each optimized for different access patterns. The hot tier uses an in-memory columnar cache for frequently accessed dimensions and recent fact data, providing microsecond-level access latency. The warm tier uses a distributed key-value store backed by NVMe SSDs for data accessed within the past 30 days. The cold tier uses object storage with Parquet file format for historical data, achieving cost efficiency at the expense of higher access latency.Data is automatically tiered based on configurable policies that consider access frequency, data age, and query patterns. The tiering engine runs as a background service that continuously monitors access patterns and migrates data between tiers. Metadata about data placement is maintained in a distributed metadata service built on etcd, which provides consistent reads and writes with linearizable semantics.","summary":"The storage layer uses a multi-tiered architecture—spanning an in-memory cache, NVMe SSDs, and cost-efficient object storage—to optimize performance for different data access patterns. A background tiering engine automatically migrates data between these hot, warm, and cold tiers based on access frequency, data age, and query patterns. Data placement metadata is consistently tracked and managed across these backends using a distributed etcd-based service.","depth":3,"start_index":1,"end_index":1,"start_page":null,"end_page":null,"node_id":"0007","physical_index":null,"token_count":165,"references":[]}}},{"parent":{"index1":4,"stamp":0},"previous_sibling":{"index1":7,"stamp":0},"next_sibling":null,"first_child":null,"last_child":null,"stamp":0,"data":{"Data":{"title":"Query Serving Layer","structure":"1.2.4","content":"The query serving layer provides the external-facing API for executing analytical queries against the processed data. It supports SQL queries via a PostgreSQL-compatible wire protocol, making it accessible to a wide range of BI tools and existing applications without requiring driver changes. The query router analyzes incoming queries and determines the optimal execution strategy, considering which storage tiers contain the relevant data and whether partial results can be served from cached aggregations.Query results are optionally materialized in a result cache that uses a time-to-live (TTL) policy combined with lazy invalidation based on upstream data freshness markers. The cache achieves a hit rate of approximately 85% for dashboard workloads, significantly reducing the computational load on the processing engine for repetitive query patterns.","summary":"The query serving layer acts as an external API for executing analytical SQL queries through a PostgreSQL-compatible protocol. It utilizes a query router to optimize execution strategies by analyzing storage tiers and leveraging cached aggregations. 
Additionally, a dedicated result cache achieves an 85% hit rate for dashboard workloads, significantly reducing the computational load on the processing engine.","depth":3,"start_index":1,"end_index":1,"start_page":null,"end_page":null,"node_id":"0008","physical_index":null,"token_count":142,"references":[]}}},{"parent":{"index1":2,"stamp":0},"previous_sibling":{"index1":4,"stamp":0},"next_sibling":null,"first_child":null,"last_child":null,"stamp":0,"data":{"Data":{"title":"Deployment and Operations","structure":"1.3","content":"The platform is deployed on Kubernetes with Helm charts that encapsulate all deployment configurations, resource limits, and scaling policies. Each microservice is packaged as a container image with multi-stage builds that minimize image size and attack surface. The CI/CD pipeline uses a GitOps workflow with ArgoCD, ensuring that all changes to production are auditable, reproducible, and reversible.Monitoring is implemented using a Prometheus and Grafana stack, with custom metrics exported by each service using a shared instrumentation library. Key performance indicators including query latency percentiles, ingestion throughput, processing lag, and error rates are tracked on operational dashboards with automated alerting through PagerDuty integration. Distributed tracing using OpenTelemetry provides end-to-end visibility into request flows across microservices, enabling rapid diagnosis of performance anomalies and error root causes.","summary":"The platform is deployed on Kubernetes using containerized microservices and managed through an automated GitOps CI/CD pipeline with ArgoCD. Comprehensive system observability is maintained using a Prometheus and Grafana stack for metrics, alongside OpenTelemetry for distributed tracing, which collectively enable automated alerting and rapid 
troubleshooting.","depth":2,"start_index":1,"end_index":1,"start_page":null,"end_page":null,"node_id":"0009","physical_index":null,"token_count":163,"references":[]}}}],"first_free_slot":null,"last_free_slot":null},"root_id":{"index1":1,"stamp":0}},"pages":[],"reasoning_index":{"topic_paths":{"migrates":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3}],"layered":[{"node_id":{"index1":4,"stamp":0},"weight":1.0,"depth":2}],"encompassing":[{"node_id":{"index1":4,"stamp":0},"weight":1.0,"depth":2}],"ssds":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3}],"ci":[{"node_id":{"index1":9,"stamp":0},"weight":1.0,"depth":2}],"architecture":[{"node_id":{"index1":4,"stamp":0},"weight":1.0,"depth":2},{"node_id":{"index1":3,"stamp":0},"weight":0.42857146,"depth":2},{"node_id":{"index1":5,"stamp":0},"weight":0.42857146,"depth":3},{"node_id":{"index1":7,"stamp":0},"weight":0.42857146,"depth":3}],"burst":[{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3}],"available":[{"node_id":{"index1":6,"stamp":0},"weight":1.0,"depth":3}],"significantly":[{"node_id":{"index1":8,"stamp":0},"weight":1.0,"depth":3}],"layers":[{"node_id":{"index1":4,"stamp":0},"weight":1.0,"depth":2}],"transforming":[{"node_id":{"index1":6,"stamp":0},"weight":1.0,"depth":3}],"scalability":[{"node_id":{"index1":4,"stamp":0},"weight":1.0,"depth":2}],"prometheus":[{"node_id":{"index1":9,"stamp":0},"weight":1.0,"depth":2}],"critical":[{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3}],"http":[{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3}],"section":[{"node_id":{"index1":4,"stamp":0},"weight":1.0,"depth":2}],"consistently":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3}],"point":[{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3}],"observability":[{"node_id":{"index1":9,"stamp":0},"weight":1.0,"depth":2}],"age":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3}],"compatible":[{"node_id":{"index1":8,"stamp":0},"weight":1.0,"depth":3}],"ingestion":[{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3},{"node_id":{"index1":4,"stamp":0},"weight":0.42857146,"depth":2}],"kubernetes":[{"node_id":{"index1":9,"stamp":0},"weight":1.0,"depth":2}],"reactive":[{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3}],"document":[{"node_id":{"index1":3,"stamp":0},"weight":1.0,"depth":2}],"principles":[{"node_id":{"index1":4,"stamp":0},"weight":1.0,"depth":2}],"kafka":[{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3}],"tiered":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3}],"alerting":[{"node_id":{"index1":9,"stamp":0},"weight":1.0,"depth":2}],"layer":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3},{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3},{"node_id":{"index1":8,"stamp":0},"weight":1.0,"depth":3},{"node_id":{"index1":4,"stamp":0},"weight":0.42857146,"depth":2}],"efficient":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3}],"passing":[{"node_id":{"index1":4,"stamp":0},"weight":1.0,"depth":2}],"built":[{"node_id":{"index1":3,"stamp":0},"weight":1.0,"depth":2}],"workloads":[{"node_id":{"index1":8,"stamp":0},"weight":1.0,"depth":3},{"node_id":{"index1":3,"stamp":0},"weight":1.0,"depth":2}],"details":[{"node_id":{"index1":4,"stamp":0},"weight":1.0,"depth":2}],"result":[{"node_id":{"index1":8,"stamp":0},"weight":1.0,"depth":3}],"aggregations":[{"node_id":{"index1":8,"stamp":0},"weight":1.0,"depth":3}],"routing":[{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3}],"diverse":[{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth
":3}],"ingested":[{"node_id":{"index1":6,"stamp":0},"weight":1.0,"depth":3}],"external":[{"node_id":{"index1":8,"stamp":0},"weight":1.0,"depth":3}],"manage":[{"node_id":{"index1":3,"stamp":0},"weight":1.0,"depth":2}],"resources":[{"node_id":{"index1":6,"stamp":0},"weight":1.0,"depth":3}],"domain":[{"node_id":{"index1":4,"stamp":0},"weight":1.0,"depth":2}],"outlines":[{"node_id":{"index1":3,"stamp":0},"weight":1.0,"depth":2}],"cd":[{"node_id":{"index1":9,"stamp":0},"weight":1.0,"depth":2}],"responsible":[{"node_id":{"index1":6,"stamp":0},"weight":1.0,"depth":3}],"automated":[{"node_id":{"index1":9,"stamp":0},"weight":1.0,"depth":2}],"hit":[{"node_id":{"index1":8,"stamp":0},"weight":1.0,"depth":3}],"hot":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3}],"without":[{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3}],"ensures":[{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3}],"preliminary":[{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3}],"metrics":[{"node_id":{"index1":9,"stamp":0},"weight":1.0,"depth":2}],"both":[{"node_id":{"index1":6,"stamp":0},"weight":1.0,"depth":3},{"node_id":{"index1":3,"stamp":0},"weight":1.0,"depth":2}],"queries":[{"node_id":{"index1":8,"stamp":0},"weight":1.0,"depth":3}],"object":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3}],"validation":[{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3}],"real":[{"node_id":{"index1":6,"stamp":0},"weight":1.0,"depth":3},{"node_id":{"index1":3,"stamp":0},"weight":1.0,"depth":2}],"introduction":[{"node_id":{"index1":3,"stamp":0},"weight":1.0,"depth":2}],"execution":[{"node_id":{"index1":6,"stamp":0},"weight":1.0,"depth":3},{"node_id":{"index1":8,"stamp":0},"weight":1.0,"depth":3}],"deployment":[{"node_id":{"index1":9,"stamp":0},"weight":1.0,"depth":2}],"platform":[{"node_id":{"index1":2,"stamp":0},"weight":1.0,"depth":1},{"node_id":{"index1":3,"stamp":0},"weight":0.75,"depth":2},{"node_id":{"index1":5,"stamp":0},"weight":0.75,"depth":3},{"node_id":{"index1":4,"stamp":0},"weight":0.75,"depth":2},{"node_id":{"index1":6,"stamp":0},"weight":0.75,"depth":3},{"node_id":{"index1":9,"stamp":0},"weight":0.75,"depth":2}],"grpc":[{"node_id":{"index1":4,"stamp":0},"weight":1.0,"depth":2},{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3}],"85":[{"node_id":{"index1":8,"stamp":0},"weight":1.0,"depth":3}],"supporting":[{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3}],"sql":[{"node_id":{"index1":8,"stamp":0},"weight":1.0,"depth":3}],"entering":[{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3}],"optimizes":[{"node_id":{"index1":6,"stamp":0},"weight":1.0,"depth":3}],"analytical":[{"node_id":{"index1":8,"stamp":0},"weight":1.0,"depth":3}],"microservices":[{"node_id":{"index1":9,"stamp":0},"weight":1.0,"depth":2},{"node_id":{"index1":3,"stamp":0},"weight":1.0,"depth":2}],"across":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3}],"system":[{"node_id":{"index1":4,"stamp":0},"weight":1.0,"depth":2},{"node_id":{"index1":9,"stamp":0},"weight":0.42857146,"depth":2},{"node_id":{"index1":3,"stamp":0},"weight":0.42857146,"depth":2}],"grafana":[{"node_id":{"index1":9,"stamp":0},"weight":1.0,"depth":2}],"plans":[{"node_id":{"index1":6,"stamp":0},"weight":1.0,"depth":3}],"rest":[{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3}],"utilization":[{"node_id":{"index1":3,"stamp":0},"weight":1.0,"depth":2}],"multi":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3}],"postgresql":[{"node_id":{"index1":8,"stamp":0},"weight":1.0,"depth":3}],"rate":[{"node_id":{"index
1":8,"stamp":0},"weight":1.0,"depth":3}],"transformation":[{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3}],"strategies":[{"node_id":{"index1":8,"stamp":0},"weight":1.0,"depth":3}],"background":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3}],"message":[{"node_id":{"index1":4,"stamp":0},"weight":1.0,"depth":2}],"protocol":[{"node_id":{"index1":8,"stamp":0},"weight":1.0,"depth":3}],"enable":[{"node_id":{"index1":9,"stamp":0},"weight":1.0,"depth":2}],"component":[{"node_id":{"index1":6,"stamp":0},"weight":1.0,"depth":3},{"node_id":{"index1":3,"stamp":0},"weight":1.0,"depth":2}],"tiering":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3}],"second":[{"node_id":{"index1":3,"stamp":0},"weight":1.0,"depth":2}],"dashboard":[{"node_id":{"index1":8,"stamp":0},"weight":1.0,"depth":3}],"independent":[{"node_id":{"index1":3,"stamp":0},"weight":1.0,"depth":2},{"node_id":{"index1":4,"stamp":0},"weight":1.0,"depth":2}],"downstream":[{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3}],"primary":[{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3}],"using":[{"node_id":{"index1":9,"stamp":0},"weight":1.0,"depth":2},{"node_id":{"index1":7,"stamp":0},"weight":0.5,"depth":3}],"cost":[{"node_id":{"index1":3,"stamp":0},"weight":1.0,"depth":2},{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3}],"based":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3},{"node_id":{"index1":3,"stamp":0},"weight":0.5,"depth":2},{"node_id":{"index1":6,"stamp":0},"weight":0.5,"depth":3},{"node_id":{"index1":4,"stamp":0},"weight":0.5,"depth":2}],"scale":[{"node_id":{"index1":3,"stamp":0},"weight":1.0,"depth":2}],"uses":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3}],"etcd":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3}],"driven":[{"node_id":{"index1":4,"stamp":0},"weight":1.0,"depth":2}],"backpressure":[{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3}],"explains":[{"node_id":{"index1":4,"stamp":0},"weight":1.0,"depth":2}],"streaming":[{"node_id":{"index1":3,"stamp":0},"weight":1.0,"depth":2}],"schema":[{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3}],"executing":[{"node_id":{"index1":8,"stamp":0},"weight":1.0,"depth":3}],"operations":[{"node_id":{"index1":9,"stamp":0},"weight":1.0,"depth":2}],"automatically":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3}],"cached":[{"node_id":{"index1":8,"stamp":0},"weight":1.0,"depth":3}],"collectively":[{"node_id":{"index1":9,"stamp":0},"weight":1.0,"depth":2}],"aware":[{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3}],"design":[{"node_id":{"index1":4,"stamp":0},"weight":1.0,"depth":2}],"overwhelming":[{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3}],"opentelemetry":[{"node_id":{"index1":9,"stamp":0},"weight":1.0,"depth":2}],"comprehensive":[{"node_id":{"index1":9,"stamp":0},"weight":1.0,"depth":2}],"inter":[{"node_id":{"index1":4,"stamp":0},"weight":1.0,"depth":2}],"protocols":[{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3}],"achieves":[{"node_id":{"index1":8,"stamp":0},"weight":1.0,"depth":3}],"cache":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3},{"node_id":{"index1":8,"stamp":0},"weight":1.0,"depth":3}],"alongside":[{"node_id":{"index1":9,"stamp":0},"weight":1.0,"depth":2}],"data":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3},{"node_id":{"index1":5,"stamp":0},"weight":0.5,"depth":3},{"node_id":{"index1":6,"stamp":0},"weight":0.5,"depth":3},{"node_id":{"index1":2,"stamp":0},"weight":0.33333334,"depth":1},{"node_id":{"index1":3,"stamp
":0},"weight":0.25,"depth":2}],"metadata":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3}],"nvme":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3}],"memory":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3}],"troubleshooting":[{"node_id":{"index1":9,"stamp":0},"weight":1.0,"depth":2}],"query":[{"node_id":{"index1":8,"stamp":0},"weight":1.0,"depth":3},{"node_id":{"index1":7,"stamp":0},"weight":0.3,"depth":3},{"node_id":{"index1":6,"stamp":0},"weight":0.3,"depth":3}],"tiers":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3},{"node_id":{"index1":8,"stamp":0},"weight":1.0,"depth":3}],"frequency":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3}],"computational":[{"node_id":{"index1":6,"stamp":0},"weight":1.0,"depth":3},{"node_id":{"index1":8,"stamp":0},"weight":1.0,"depth":3}],"compute":[{"node_id":{"index1":6,"stamp":0},"weight":1.0,"depth":3}],"dedicated":[{"node_id":{"index1":8,"stamp":0},"weight":1.0,"depth":3}],"furthermore":[{"node_id":{"index1":4,"stamp":0},"weight":1.0,"depth":2}],"statistics":[{"node_id":{"index1":6,"stamp":0},"weight":1.0,"depth":3}],"operates":[{"node_id":{"index1":6,"stamp":0},"weight":1.0,"depth":3}],"managed":[{"node_id":{"index1":9,"stamp":0},"weight":1.0,"depth":2},{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3},{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3}],"leveraging":[{"node_id":{"index1":8,"stamp":0},"weight":1.0,"depth":3}],"synchronous":[{"node_id":{"index1":4,"stamp":0},"weight":1.0,"depth":2}],"tracked":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3}],"containerized":[{"node_id":{"index1":9,"stamp":0},"weight":1.0,"depth":2}],"mechanisms":[{"node_id":{"index1":4,"stamp":0},"weight":1.0,"depth":2}],"traffic":[{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3}],"dynamically":[{"node_id":{"index1":6,"stamp":0},"weight":1.0,"depth":3}],"latency":[{"node_id":{"index1":3,"stamp":0},"weight":1.0,"depth":2}],"tasks":[{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3}],"router":[{"node_id":{"index1":8,"stamp":0},"weight":1.0,"depth":3}],"covers":[{"node_id":{"index1":4,"stamp":0},"weight":1.0,"depth":2}],"custom":[{"node_id":{"index1":6,"stamp":0},"weight":1.0,"depth":3}],"additionally":[{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3},{"node_id":{"index1":8,"stamp":0},"weight":1.0,"depth":3}],"service":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3}],"services":[{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3}],"serves":[{"node_id":{"index1":6,"stamp":0},"weight":1.0,"depth":3}],"tracing":[{"node_id":{"index1":9,"stamp":0},"weight":1.0,"depth":2}],"warm":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3}],"different":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3}],"utilizing":[{"node_id":{"index1":3,"stamp":0},"weight":1.0,"depth":2}],"cold":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3}],"supports":[{"node_id":{"index1":3,"stamp":0},"weight":1.0,"depth":2}],"stack":[{"node_id":{"index1":9,"stamp":0},"weight":1.0,"depth":2}],"rapid":[{"node_id":{"index1":9,"stamp":0},"weight":1.0,"depth":2}],"deployed":[{"node_id":{"index1":9,"stamp":0},"weight":1.0,"depth":2}],"core":[{"node_id":{"index1":6,"stamp":0},"weight":1.0,"depth":3}],"allowing":[{"node_id":{"index1":3,"stamp":0},"weight":1.0,"depth":2}],"resource":[{"node_id":{"index1":3,"stamp":0},"weight":1.0,"depth":2}],"reducing":[{"node_id":{"index1":8,"stamp":0},"weight":1.0,"depth":3}],"backends":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3}],"g
itops":[{"node_id":{"index1":9,"stamp":0},"weight":1.0,"depth":2}],"patterns":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3}],"communication":[{"node_id":{"index1":4,"stamp":0},"weight":1.0,"depth":2}],"maintained":[{"node_id":{"index1":9,"stamp":0},"weight":1.0,"depth":2}],"effective":[{"node_id":{"index1":3,"stamp":0},"weight":1.0,"depth":2}],"acts":[{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3},{"node_id":{"index1":8,"stamp":0},"weight":1.0,"depth":3}],"petabyte":[{"node_id":{"index1":3,"stamp":0},"weight":1.0,"depth":2}],"load":[{"node_id":{"index1":8,"stamp":0},"weight":1.0,"depth":3}],"handles":[{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3}],"historical":[{"node_id":{"index1":6,"stamp":0},"weight":1.0,"depth":3}],"scaling":[{"node_id":{"index1":3,"stamp":0},"weight":1.0,"depth":2}],"separation":[{"node_id":{"index1":4,"stamp":0},"weight":1.0,"depth":2}],"stream":[{"node_id":{"index1":6,"stamp":0},"weight":1.0,"depth":3}],"asynchronous":[{"node_id":{"index1":4,"stamp":0},"weight":1.0,"depth":2}],"access":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3}],"framework":[{"node_id":{"index1":6,"stamp":0},"weight":1.0,"depth":3}],"processing":[{"node_id":{"index1":6,"stamp":0},"weight":1.0,"depth":3},{"node_id":{"index1":2,"stamp":0},"weight":0.4,"depth":1},{"node_id":{"index1":8,"stamp":0},"weight":0.3,"depth":3},{"node_id":{"index1":3,"stamp":0},"weight":0.3,"depth":2},{"node_id":{"index1":4,"stamp":0},"weight":0.3,"depth":2}],"pipeline":[{"node_id":{"index1":9,"stamp":0},"weight":1.0,"depth":2}],"batch":[{"node_id":{"index1":6,"stamp":0},"weight":1.0,"depth":3},{"node_id":{"index1":3,"stamp":0},"weight":1.0,"depth":2}],"including":[{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3},{"node_id":{"index1":4,"stamp":0},"weight":1.0,"depth":2}],"serving":[{"node_id":{"index1":8,"stamp":0},"weight":1.0,"depth":3},{"node_id":{"index1":4,"stamp":0},"weight":0.42857146,"depth":2}],"spanning":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3}],"performance":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3}],"storage":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3},{"node_id":{"index1":8,"stamp":0},"weight":0.3,"depth":3},{"node_id":{"index1":4,"stamp":0},"weight":0.3,"depth":2}],"time":[{"node_id":{"index1":3,"stamp":0},"weight":1.0,"depth":2},{"node_id":{"index1":6,"stamp":0},"weight":1.0,"depth":3}],"analyzing":[{"node_id":{"index1":8,"stamp":0},"weight":1.0,"depth":3},{"node_id":{"index1":6,"stamp":0},"weight":1.0,"depth":3}],"argocd":[{"node_id":{"index1":9,"stamp":0},"weight":1.0,"depth":2}],"enforcement":[{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3}],"placement":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3}],"utilizes":[{"node_id":{"index1":8,"stamp":0},"weight":1.0,"depth":3}],"entry":[{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3}],"concerns":[{"node_id":{"index1":4,"stamp":0},"weight":1.0,"depth":2}],"api":[{"node_id":{"index1":8,"stamp":0},"weight":1.0,"depth":3}],"optimize":[{"node_id":{"index1":7,"stamp":0},"weight":1.0,"depth":3},{"node_id":{"index1":8,"stamp":0},"weight":1.0,"depth":3}],"calls":[{"node_id":{"index1":4,"stamp":0},"weight":1.0,"depth":2}],"smoothly":[{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3}],"sub":[{"node_id":{"index1":3,"stamp":0},"weight":1.0,"depth":2}],"engine":[{"node_id":{"index1":6,"stamp":0},"weight":1.0,"depth":3},{"node_id":{"index1":7,"stamp":0},"weight":0.42857146,"depth":3},{"node_id":{"index1":8,"stamp":0},"weight":0.428571
46,"depth":3}],"distributed":[{"node_id":{"index1":2,"stamp":0},"weight":1.0,"depth":1},{"node_id":{"index1":3,"stamp":0},"weight":0.75,"depth":2},{"node_id":{"index1":7,"stamp":0},"weight":0.75,"depth":3},{"node_id":{"index1":9,"stamp":0},"weight":0.75,"depth":2}],"kinesis":[{"node_id":{"index1":5,"stamp":0},"weight":1.0,"depth":3}]},"summary_shortcut":{"root_node":{"index1":1,"stamp":0},"section_summaries":[{"node_id":{"index1":2,"stamp":0},"title":"Distributed Data Processing Platform","summary":"","depth":1}],"document_summary":""},"hot_nodes":{},"section_map":{"1":{"index1":2,"stamp":0},"distributed data processing platform":{"index1":2,"stamp":0}},"config_hash":0}}} \ No newline at end of file diff --git a/samples/5dfb586a-9e7a-4087-ad8f-24eb09281269.bin b/samples/5dfb586a-9e7a-4087-ad8f-24eb09281269.bin deleted file mode 100644 index 79f6334c..00000000 --- a/samples/5dfb586a-9e7a-4087-ad8f-24eb09281269.bin +++ /dev/null @@ -1 +0,0 @@ -{"version":1,"checksum":"6206fa81bcf6e44bec5cfb6d37648200284f7ad8e0bb1583e4d771a31fde9164","payload":{"meta":{"id":"5dfb586a-9e7a-4087-ad8f-24eb09281269","name":"","format":"md","source_path":"","description":"","page_count":null,"line_count":null,"created_at":"2026-04-12T04:20:11.338077430Z","modified_at":"2026-04-12T04:20:11.338222591Z","logic_fingerprint":"b25J69t2pTTx/z0WFznfGw==","processing_version":0,"node_count":9,"total_summary_tokens":0,"processing_duration_ms":31021},"tree":{"arena":{"nodes":[{"parent":null,"previous_sibling":null,"next_sibling":null,"first_child":{"index1":2,"stamp":0},"last_child":{"index1":2,"stamp":0},"stamp":0,"data":{"Data":{"title":"","structure":"","content":"","summary":"","depth":0,"start_index":1,"end_index":1,"start_page":null,"end_page":null,"node_id":"0001","physical_index":null,"token_count":null,"references":[]}}},{"parent":{"index1":1,"stamp":0},"previous_sibling":null,"next_sibling":null,"first_child":{"index1":3,"stamp":0},"last_child":{"index1":9,"stamp":0},"stamp":0,"data":{"Data":{"title":"Distributed Data Processing Platform","structure":"1","content":"","summary":"","depth":1,"start_index":1,"end_index":1,"start_page":null,"end_page":null,"node_id":"0002","physical_index":null,"token_count":null,"references":[]}}},{"parent":{"index1":2,"stamp":0},"previous_sibling":null,"next_sibling":{"index1":4,"stamp":0},"first_child":null,"last_child":null,"stamp":0,"data":{"Data":{"title":"Introduction","structure":"1.1","content":"This document provides a comprehensive overview of the distributed data processing platform architecture. The system is designed to handle petabyte-scale data workloads with sub-second query latency, supporting both real-time streaming and batch processing paradigms. The architecture follows a microservices-based approach with independent scaling capabilities for each component, enabling cost-effective resource utilization across varying workload patterns.","summary":"This document outlines a distributed data processing platform designed to handle petabyte-scale workloads with sub-second latency for both streaming and batch processing. 
It utilizes a microservices-based architecture that enables independent component scaling to ensure cost-effective resource utilization.","depth":2,"start_index":1,"end_index":1,"start_page":null,"end_page":null,"node_id":"0003","physical_index":null,"token_count":70,"references":[]}}},{"parent":{"index1":2,"stamp":0},"previous_sibling":{"index1":3,"stamp":0},"next_sibling":{"index1":9,"stamp":0},"first_child":{"index1":5,"stamp":0},"last_child":{"index1":8,"stamp":0},"stamp":0,"data":{"Data":{"title":"System Architecture","structure":"1.2","content":"The platform follows a layered architecture pattern with clear separation of concerns between ingestion, processing, storage, and serving layers. Each layer can be independently deployed, scaled, and upgraded without affecting other layers, following the principle of bounded contexts from domain-driven design. Inter-layer communication uses a combination of asynchronous message passing for data flow and synchronous gRPC calls for control plane operations.","summary":"This section provides an overview of the platform's layered system architecture, specifically detailing the ingestion, processing, storage, and serving components. It also covers deployment and scaling strategies based on domain-driven design, as well as the inter-layer communication protocols utilizing asynchronous messaging and synchronous gRPC.","depth":2,"start_index":1,"end_index":1,"start_page":null,"end_page":null,"node_id":"0004","physical_index":null,"token_count":73,"references":[]}}},{"parent":{"index1":4,"stamp":0},"previous_sibling":null,"next_sibling":{"index1":6,"stamp":0},"first_child":null,"last_child":null,"stamp":0,"data":{"Data":{"title":"Ingestion Layer","structure":"1.2.1","content":"The ingestion layer serves as the entry point for all data entering the platform. It supports multiple protocols including HTTP REST, gRPC, Apache Kafka, and AWS Kinesis. The layer is responsible for data validation, schema enforcement, initial transformation, and routing to downstream processing pipelines. Built on a reactive architecture using backpressure-aware operators, the ingestion layer gracefully handles burst traffic patterns without overwhelming downstream services.","summary":"The ingestion layer serves as the primary entry point for platform data, supporting multiple protocols such as HTTP REST, gRPC, Apache Kafka, and AWS Kinesis. It is responsible for essential early-stage tasks including data validation, schema enforcement, transformation, and routing to downstream pipelines. Additionally, its reactive, backpressure-aware architecture enables it to gracefully handle sudden traffic bursts without overwhelming downstream services.","depth":3,"start_index":1,"end_index":1,"start_page":null,"end_page":null,"node_id":"0005","physical_index":null,"token_count":79,"references":[]}}},{"parent":{"index1":4,"stamp":0},"previous_sibling":{"index1":5,"stamp":0},"next_sibling":{"index1":7,"stamp":0},"first_child":null,"last_child":null,"stamp":0,"data":{"Data":{"title":"Processing Engine","structure":"1.2.2","content":"The processing engine is the core computational component of the platform, responsible for transforming, enriching, aggregating, and analyzing ingested data. It supports both stream processing for real-time analytics and batch processing for historical analysis. 
The engine is built on a custom execution framework that optimizes query plans based on data statistics and available compute resources.","summary":"The processing engine serves as the platform's core computational component, responsible for transforming, enriching, and analyzing ingested data. It supports both real-time stream processing and historical batch processing, utilizing a custom execution framework to dynamically optimize query plans based on data statistics and available compute resources.","depth":3,"start_index":1,"end_index":1,"start_page":null,"end_page":null,"node_id":"0006","physical_index":null,"token_count":67,"references":[]}}},{"parent":{"index1":4,"stamp":0},"previous_sibling":{"index1":6,"stamp":0},"next_sibling":{"index1":8,"stamp":0},"first_child":null,"last_child":null,"stamp":0,"data":{"Data":{"title":"Storage Layer","structure":"1.2.3","content":"The storage layer provides a unified abstraction over multiple storage backends, each optimized for different access patterns. The hot tier uses an in-memory columnar cache for frequently accessed dimensions and recent fact data, providing microsecond-level access latency. The warm tier uses a distributed key-value store backed by NVMe SSDs for data accessed within the past 30 days. The cold tier uses object storage with Parquet file format for historical data, achieving cost efficiency at the expense of higher access latency.Data is automatically tiered based on configurable policies that consider access frequency, data age, and query patterns. The tiering engine runs as a background service that continuously monitors access patterns and migrates data between tiers. Metadata about data placement is maintained in a distributed metadata service built on etcd, which provides consistent reads and writes with linearizable semantics.","summary":"The storage layer is a multi-tiered system using in-memory caching (hot), NVMe SSDs (warm), and object storage (cold) to optimize for both performance and cost. A background tiering engine automatically migrates data between these tiers based on configurable policies, data age, and access patterns. The system relies on an etcd-backed distributed metadata service to consistently track data placement across all tiers.","depth":3,"start_index":1,"end_index":1,"start_page":null,"end_page":null,"node_id":"0007","physical_index":null,"token_count":165,"references":[]}}},{"parent":{"index1":4,"stamp":0},"previous_sibling":{"index1":7,"stamp":0},"next_sibling":null,"first_child":null,"last_child":null,"stamp":0,"data":{"Data":{"title":"Query Serving Layer","structure":"1.2.4","content":"The query serving layer provides the external-facing API for executing analytical queries against the processed data. It supports SQL queries via a PostgreSQL-compatible wire protocol, making it accessible to a wide range of BI tools and existing applications without requiring driver changes. The query router analyzes incoming queries and determines the optimal execution strategy, considering which storage tiers contain the relevant data and whether partial results can be served from cached aggregations.Query results are optionally materialized in a result cache that uses a time-to-live (TTL) policy combined with lazy invalidation based on upstream data freshness markers. 
The cache achieves a hit rate of approximately 85% for dashboard workloads, significantly reducing the computational load on the processing engine for repetitive query patterns.","summary":"The query serving layer provides a PostgreSQL-compatible API that enables BI tools to seamlessly execute analytical queries. A query router optimizes these executions by evaluating storage tiers and available cached aggregations. To significantly reduce the computational load on the processing engine, a result cache utilizing time-to-live and lazy invalidation achieves an 85% hit rate for dashboard workloads.","depth":3,"start_index":1,"end_index":1,"start_page":null,"end_page":null,"node_id":"0008","physical_index":null,"token_count":142,"references":[]}}},{"parent":{"index1":2,"stamp":0},"previous_sibling":{"index1":4,"stamp":0},"next_sibling":null,"first_child":null,"last_child":null,"stamp":0,"data":{"Data":{"title":"Deployment and Operations","structure":"1.3","content":"The platform is deployed on Kubernetes with Helm charts that encapsulate all deployment configurations, resource limits, and scaling policies. Each microservice is packaged as a container image with multi-stage builds that minimize image size and attack surface. The CI/CD pipeline uses a GitOps workflow with ArgoCD, ensuring that all changes to production are auditable, reproducible, and reversible.Monitoring is implemented using a Prometheus and Grafana stack, with custom metrics exported by each service using a shared instrumentation library. Key performance indicators including query latency percentiles, ingestion throughput, processing lag, and error rates are tracked on operational dashboards with automated alerting through PagerDuty integration. Distributed tracing using OpenTelemetry provides end-to-end visibility into request flows across microservices, enabling rapid diagnosis of performance anomalies and error root causes.","summary":"The platform is deployed on Kubernetes using Helm charts and containerized microservices, with deployments managed by a GitOps CI/CD pipeline via ArgoCD to ensure auditable and reversible updates. 
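These two fixtures document the on-disk format used by `src/storage/`: each sample `.bin` is plain JSON wrapped in a `{version, checksum, payload}` envelope, and the `DocumentTree` inside is stored as a flat generational arena (1-based `index1` plus a reuse `stamp`, with `first_free_slot`/`last_free_slot` tracking holes) rather than as nested objects. Below is a minimal Rust sketch of that shape — the struct names, the helper functions, and the guess that the 64-hex-character checksum is SHA-256 over the payload are mine, not the project's actual definitions:

```rust
// Sketch of the serialized shape seen in samples/*.bin (assumes serde,
// serde_json, and sha2; all names here are illustrative, not Vectorless's).
use serde::{Deserialize, Serialize};
use sha2::{Digest, Sha256};

/// Generational arena handle: `index1` is 1-based; `stamp` detects
/// stale handles after a freed slot is reused.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
pub struct NodeId {
    pub index1: usize,
    pub stamp: u32,
}

/// One arena slot; the tree shape lives in the sibling/child links.
#[derive(Debug, Serialize, Deserialize)]
pub struct ArenaNode<T> {
    pub parent: Option<NodeId>,
    pub previous_sibling: Option<NodeId>,
    pub next_sibling: Option<NodeId>,
    pub first_child: Option<NodeId>,
    pub last_child: Option<NodeId>,
    pub stamp: u32,
    pub data: T,
}

/// Checksummed envelope wrapping every persisted payload.
#[derive(Debug, Serialize, Deserialize)]
pub struct Envelope<P> {
    pub version: u32,
    pub checksum: String, // 64 hex chars in the samples; assumed SHA-256
    pub payload: P,
}

/// Assumption: checksum = SHA-256 over the payload's serialized bytes.
pub fn checksum_matches(payload_json: &str, expected_hex: &str) -> bool {
    let digest = Sha256::digest(payload_json.as_bytes());
    let hex: String = digest.iter().map(|b| format!("{b:02x}")).collect();
    hex == expected_hex
}

/// Collect direct children by walking `first_child` → `next_sibling`,
/// rejecting any handle whose stamp no longer matches its slot.
pub fn children<T>(nodes: &[ArenaNode<T>], parent: NodeId) -> Vec<NodeId> {
    let mut out = Vec::new();
    let mut cursor = nodes[parent.index1 - 1].first_child;
    while let Some(id) = cursor {
        let node = &nodes[id.index1 - 1];
        if node.stamp != id.stamp {
            break; // stale handle: the slot was freed and reused
        }
        out.push(id);
        cursor = node.next_sibling;
    }
    out
}
```

The `stamp` is what makes slot reuse safe: `first_free_slot`/`last_free_slot` let the arena recycle holes without old `NodeId`s silently resolving to new nodes.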
diff --git a/samples/Docker_Cheat_Sheet.pdf b/samples/Docker_Cheat_Sheet.pdf
deleted file mode 100755
index 0768f1c3..00000000
Binary files a/samples/Docker_Cheat_Sheet.pdf and /dev/null differ
diff --git a/samples/_graph.bin b/samples/_graph.bin
deleted file mode 100644
index e1d51275..00000000
--- a/samples/_graph.bin
+++ /dev/null
@@ -1 +0,0 @@
-{"nodes":{"5dfb586a-9e7a-4087-ad8f-24eb09281269":{"doc_id":"5dfb586a-9e7a-4087-ad8f-24eb09281269","title":"","format":"md","top_keywords":[],"node_count":9}},"edges":{},"keyword_index":{},"metadata":{"document_count":1,"edge_count":0}}
\ No newline at end of file
diff --git a/samples/meta.bin b/samples/meta.bin
deleted file mode 100644
index a3518791..00000000
--- a/samples/meta.bin
+++ /dev/null
@@ -1,23 +0,0 @@
-{
-  "5dfb586a-9e7a-4087-ad8f-24eb09281269": {
-    "id": "5dfb586a-9e7a-4087-ad8f-24eb09281269",
-    "doc_name": "",
-    "doc_description": "",
-    "doc_type": "md",
-    "path": ""
-  },
-  "6998c9ff-ba74-4762-9f3e-c838be2f425b": {
-    "id": "6998c9ff-ba74-4762-9f3e-c838be2f425b",
-    "doc_name": "",
-    "doc_description": "",
-    "doc_type": "md",
-    "path": ""
-  },
-  "1938cb46-4085-4a70-b9e6-70b97d3c8ba9": {
-    "id": "1938cb46-4085-4a70-b9e6-70b97d3c8ba9",
-    "doc_name": "",
-    "doc_description": "",
-    "doc_type": "md",
-    "path": ""
-  }
-}
\ No newline at end of file
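The `reasoning_index` in the first fixture is the structure worth pausing on: `topic_paths` is an inverted map from stemmed keywords to weighted node references, `section_map` resolves section numbers and lowercased titles to nodes, and `_graph.bin`/`meta.bin` above hold the cross-document graph and the workspace's document registry in the same plain-JSON style. Below is a sketch of how a lookup over `topic_paths` could seed candidate sections — the field names mirror the serialization, but the function and its sum-of-weights scoring are hypothetical, not the engine's actual dispatch logic:

```rust
use std::collections::HashMap;

/// Arena handle as serialized in the samples (see the sketch above).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct NodeId {
    pub index1: usize,
    pub stamp: u32,
}

/// Mirrors one {"node_id":…,"weight":…,"depth":…} entry in topic_paths.
#[derive(Debug, Clone, Copy)]
pub struct TopicPath {
    pub node_id: NodeId,
    pub weight: f32,
    pub depth: u32,
}

/// keyword -> weighted node references, as in reasoning_index.topic_paths.
pub type TopicPaths = HashMap<String, Vec<TopicPath>>;

/// Hypothetical pre-filter: accumulate each node's weight across all query
/// keywords and rank by total. This only illustrates the serialized shape;
/// it is not the engine's retrieval logic.
pub fn candidate_nodes(index: &TopicPaths, keywords: &[&str]) -> Vec<(NodeId, f32)> {
    let mut scores: HashMap<NodeId, f32> = HashMap::new();
    for kw in keywords {
        if let Some(paths) = index.get(*kw) {
            for p in paths {
                *scores.entry(p.node_id).or_insert(0.0) += p.weight;
            }
        }
    }
    let mut ranked: Vec<(NodeId, f32)> = scores.into_iter().collect();
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1));
    ranked
}
```

On the data above, the stemmed keywords `["storage", "tiers"]` would rank node 7 (Storage Layer, weight 1.0 from both terms) ahead of node 8 (Query Serving Layer, 1.0 + 0.3), matching the intuition that the index encodes where each topic lives in the tree.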