vectorlessflow · zTgx · Apr 20, 2026 · Apr 18, 2026 · Apr 18, 2026 · Apr 18, 2026
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -1,28 +1,51 @@
 # CLAUDE.md
 
-A hierarchical, reasoning-native document intelligence engine written in Rust.
+Vectorless is a reasoning-native document intelligence engine written in Rust.
+
+## Principles
+
+- **Reason, don't vector.** — Every retrieval decision is an LLM decision.
+- **Model fails, we fail.** — No silent degradation. No heuristic fallbacks.
+- **No thought, no answer.** — Only LLM-reasoned output counts as an answer.
 
 ## Project Structure
 
 - `rust/` - Rust core engine
-  - `src/client/` - Client API (EngineBuilder, Engine)
-  - `src/config/` - Configuration types
-  - `src/document/` - Document parsers (Markdown, PDF)
-  - `src/index/` - Index building and pipeline
-  - `src/retrieval/` - Retrieval engine (beam search, MCTS, greedy, hybrid strategies)
-  - `src/storage/` - Storage layer
-  - `src/llm/` - LLM client abstraction
+  - `src/client/` - Client API (EngineBuilder, Engine) - facade layer, no business logic
+  - `src/document/` - Document data structures (DocumentTree, NavigationIndex, ReasoningIndex)
+  - `src/index/` - Compile pipeline (8-stage, checkpointing, incremental update)
+  - `src/retrieval/` - Retrieval dispatch layer (preprocessing, dispatch, postprocessing, cache, streaming)
+  - `src/query/` - Query understanding and planning (intent classification, rewrite, decomposition)
+  - `src/agent/` - Retrieval execution (Worker: doc navigation, Orchestrator: supervisor loop + multi-doc fusion)
+  - `src/rerank/` - Result reranking and answer synthesis (dedup, scoring, fusion, synthesis)
+  - `src/scoring/` - Scoring and ranking strategies (BM25, relevance scoring, score combination)
+  - `src/llm/` - LLM client (connection pool, memo/caching, throttle/rate-limiting, fallback)
+  - `src/storage/` - Persistence (Workspace, LRU cache, backend abstraction over file and memory)
   - `src/graph/` - Cross-document relationship graph
-  - `src/memo/` - Caching and reasoning memo
-  - `src/metrics/` - Metrics and usage tracking
+  - `src/metrics/` - Metrics collection and reporting
   - `src/events/` - Event system for progress monitoring
-  - `src/throttle/` - Rate limiting
-  - `src/utils/` - Utility functions
+  - `src/config/` - Configuration types and validation
+  - `src/error.rs` - Unified error types
+  - `src/utils/` - Utility functions (token counting, fingerprinting, validation)
   - `examples/` - Rust examples (flow, indexing, pdf, batch, etc.)
-- `python/` - Python SDK (PyO3 bindings)
+- `python/` - Python SDK (PyO3 bindings) + CLI
 - `docs/` - Docusaurus documentation site
 - `samples/` - Sample files
 
+### Retrieval Call Flow
+
+```
+Engine.query()
+  → retrieval/dispatcher
+    → query/understand() → QueryPlan (LLM intent + concepts + strategy)
+    → Orchestrator (always, single or multi-doc)
+      → analyze(QueryPlan) → dispatch plan
+      → supervisor loop:
+          dispatch Workers → evaluate() →
+          if insufficient → replan() → loop
+      → rerank/ (dedup → BM25 score → synthesis/fusion)
+```
+
 ## Build Commands
 
 ```bash

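Review note: the supervisor loop in the Retrieval Call Flow above can be sketched as follows. This is a minimal sketch under stated assumptions, not the crate's code. `Evidence`, `Verdict`, `Supervisor`, `run_supervisor`, and `max_rounds` are hypothetical names, and the LLM-backed steps sit behind a trait so that only the control flow is shown. The round budget is an assumption, since the diff does not say how the loop ends when evidence stays insufficient. Workers are dispatched sequentially here for brevity. In keeping with "Model fails, we fail", every error propagates rather than triggering a fallback.

```rust
// Hypothetical types: none of these names are taken from the vectorless crate.
#[derive(Debug, Clone, PartialEq)]
pub struct Evidence {
    pub doc_id: String,
    pub text: String,
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    Sufficient,
    Insufficient,
}

/// The LLM-backed steps of the loop, abstracted so the control flow is testable.
pub trait Supervisor {
    /// analyze(QueryPlan) -> dispatch plan (here simply a list of document ids).
    fn analyze(&mut self, query: &str) -> Result<Vec<String>, String>;
    /// One Worker run over one document; returns evidence only, no answer.
    fn dispatch(&mut self, doc_id: &str, query: &str) -> Result<Vec<Evidence>, String>;
    /// Sufficiency check over everything collected so far.
    fn evaluate(&mut self, query: &str, evidence: &[Evidence]) -> Result<Verdict, String>;
    /// Produces a new dispatch plan that targets the missing information.
    fn replan(&mut self, query: &str, evidence: &[Evidence]) -> Result<Vec<String>, String>;
}

/// dispatch Workers -> evaluate() -> if insufficient -> replan() -> loop.
pub fn run_supervisor<S: Supervisor>(
    s: &mut S,
    query: &str,
    max_rounds: usize, // assumed budget; not specified in the diff
) -> Result<Vec<Evidence>, String> {
    let mut plan = s.analyze(query)?;
    let mut collected = Vec::new();
    for _ in 0..max_rounds {
        for doc_id in &plan {
            // Any Worker failure aborts the query: no silent degradation.
            collected.extend(s.dispatch(doc_id, query)?);
        }
        match s.evaluate(query, &collected)? {
            Verdict::Sufficient => return Ok(collected),
            Verdict::Insufficient => plan = s.replan(query, &collected)?,
        }
    }
    Err(format!("evidence still insufficient after {max_rounds} rounds"))
}
```

Returning an error when the budget runs out, instead of answering from partial evidence, is one reading of "No thought, no answer"; the actual crate may handle exhaustion differently.
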
diff --git a/Cargo.toml b/Cargo.toml
@@ -3,7 +3,7 @@ members = ["rust", "python"]
 resolver = "2"
 
 [workspace.package]
-version = "0.1.29"
+version = "0.1.30"
 edition = "2024"
 authors = ["zTgx <beautifularea@gmail.com>"]
 license = "Apache-2.0"

diff --git a/README.md b/README.md
@@ -13,58 +13,15 @@
 
 </div>
 
-**Vectorless** is a reasoning-native document engine designed to be the foundational layer for AI applications that need structured access to documents, with the core written in Rust. It does not use vector databases, embeddings, or similarity search. Instead, it will reason through any of your structured documents — **PDFs, Markdown, reports, contracts** — and retrieve only what's relevant. Nothing more, nothing less.
+**Vectorless** is a reasoning-native document engine with its core written in Rust. It reasons through your structured documents (**PDFs, Markdown, reports, contracts**) and retrieves only what's relevant. Nothing more, nothing less.
 
+- **Reason, don't vector.** — Retrieval is guided by reasoning over document structure.
+- **Model fails, we fail.** — No silent degradation. No heuristic fallbacks.
+- **No thought, no answer.** — Only LLM-reasoned output counts as an answer.
 
 
-## How It Works
-
-<div align="center">
-  <img src="https://vectorless.dev/img/workflow.svg" alt="Vectorless Workflow" width="900">
-</div>
-
-<div align="center">
-  <img src="https://vectorless.dev/img/demo.gif" alt="Vectorless Demo" width="900">
-</div>
-
 ## Quick Start
 
-### Rust
-
-```toml
-[dependencies]
-vectorless = "0.1"
-```
-
-```rust
-use vectorless::{EngineBuilder, IndexContext, QueryContext};
-
-#[tokio::main]
-async fn main() -> vectorless::Result<()> {
-    let engine = EngineBuilder::new()
-        .with_key("sk-...")
-        .with_model("gpt-4o")
-        .with_endpoint("https://api.openai.com/v1")
-        .build()
-        .await?;
-
-    // Index a document
-    let result = engine.index(IndexContext::from_path("./report.pdf")).await?;
-    let doc_id = result.doc_id().unwrap();
-
-    // Query
-    let result = engine.query(
-        QueryContext::new("What is the total revenue?")
-            .with_doc_ids(vec![doc_id.to_string()])
-    ).await?;
-    println!("{}", result.content);
-
-    Ok(())
-}
-```
-
-### Python
-
 ```bash
 pip install vectorless
 ```
@@ -89,52 +46,6 @@ async def main():
 asyncio.run(main())
 ```
 
-## Core Concepts
-
-### Semantic Tree Index
-
-When you index a document, Vectorless builds a tree structure that mirrors the document's hierarchy:
-
-```
-Annual Report 2024
-├── Executive Summary
-│   ├── Financial Highlights
-│   └── Strategic Outlook
-├── Financial Statements
-│   ├── Revenue Analysis        ← "What is the total revenue?" lands here
-│   ├── Operating Expenses
-│   └── Net Income
-└── Risk Factors
-    ├── Market Risks
-    └── Regulatory Risks
-```
-
-Each node contains a summary generated by the LLM. During retrieval, the engine uses these summaries to reason about which path to follow — just like a human would scan a table of contents.
-
-### Cross-Document Graph
-
-When multiple documents are indexed, Vectorless builds a relationship graph connecting them through shared keywords and concepts. This enables queries across your entire document collection.
-
-```python
-# Query across all indexed documents
-result = await engine.query(
-    QueryContext("Compare revenue trends across all reports")
-)
-```
-
-### Workspace Persistence
-
-Indexed documents are stored in a workspace — there's no need to reprocess files between sessions:
-
-```python
-engine = Engine(api_key="sk-...", model="gpt-4o", endpoint="https://api.openai.com/v1")
-
-# List all indexed documents
-docs = await engine.list()
-for doc in docs:
-    print(f"{doc.name} ({doc.format}) — {doc.page_count} pages")
-```
-
 ## What It's For
 
 Vectorless is designed for applications that need **precise** document retrieval:

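Review note: this change set names the rerank stage only as `dedup → BM25 score → synthesis/fusion` (CLAUDE.md call flow and the architecture.mdx Rerank Pipeline). The sketch below illustrates the first two steps under stated assumptions: exact-text dedup after trimming and lowercasing, alphanumeric tokenization, and Okapi BM25 with the common defaults k1 = 1.2 and b = 0.75 and the always-positive IDF variant ln((N - n + 0.5) / (n + 0.5) + 1). The names `dedup`, `bm25_scores`, and `rerank` are hypothetical, and the code in `src/rerank/` and `src/scoring/` may differ in every detail.

```rust
use std::collections::{HashMap, HashSet};

/// Drop exact duplicates (compared after trim + lowercase), keeping first occurrences.
pub fn dedup(evidence: Vec<String>) -> Vec<String> {
    let mut seen = HashSet::new();
    evidence
        .into_iter()
        .filter(|e| seen.insert(e.trim().to_lowercase()))
        .collect()
}

/// Split on non-alphanumeric characters and lowercase each token.
fn tokenize(text: &str) -> Vec<String> {
    text.split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(|t| t.to_lowercase())
        .collect()
}

/// Okapi BM25 score of each evidence item against the query.
pub fn bm25_scores(query: &str, docs: &[String]) -> Vec<f64> {
    const K1: f64 = 1.2; // assumed default
    const B: f64 = 0.75; // assumed default
    let tokenized: Vec<Vec<String>> = docs.iter().map(|d| tokenize(d)).collect();
    if tokenized.is_empty() {
        return Vec::new();
    }
    let n = tokenized.len() as f64;
    let avg_len = tokenized.iter().map(|d| d.len()).sum::<usize>() as f64 / n;

    // Document frequency: in how many items does each term occur?
    let mut df: HashMap<&str, usize> = HashMap::new();
    for doc in &tokenized {
        let unique: HashSet<&str> = doc.iter().map(|t| t.as_str()).collect();
        for t in unique {
            *df.entry(t).or_insert(0) += 1;
        }
    }

    let query_terms: HashSet<String> = tokenize(query).into_iter().collect();
    tokenized
        .iter()
        .map(|doc| {
            let len = doc.len() as f64;
            query_terms
                .iter()
                .map(|q| {
                    let tf = doc.iter().filter(|t| *t == q).count() as f64;
                    if tf == 0.0 {
                        return 0.0; // term absent: contributes nothing
                    }
                    let n_q = df.get(q.as_str()).copied().unwrap_or(0) as f64;
                    let idf = ((n - n_q + 0.5) / (n_q + 0.5) + 1.0).ln();
                    idf * tf * (K1 + 1.0) / (tf + K1 * (1.0 - B + B * len / avg_len))
                })
                .sum::<f64>()
        })
        .collect()
}

/// Dedup, score, and sort descending by score (stable, so ties keep input order).
pub fn rerank(query: &str, evidence: Vec<String>) -> Vec<(String, f64)> {
    let unique = dedup(evidence);
    let scores = bm25_scores(query, &unique);
    let mut ranked: Vec<(String, f64)> = unique.into_iter().zip(scores).collect();
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1));
    ranked
}
```

The LLM synthesis step is left out on purpose: under the stated principles it is the only part that produces an answer, and BM25 here only orders evidence that Workers have already collected.
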
diff --git a/docs/docs/architecture.mdx b/docs/docs/architecture.mdx
@@ -55,29 +55,66 @@ TreeNode
 
 ## Retrieval Pipeline
 
-The retrieval pipeline consists of four phases:
+The retrieval pipeline is a supervisor loop driven entirely by LLM reasoning. Every decision — which documents to query, how to navigate, whether evidence is sufficient — is made by the model, not by heuristics.
 
-1. **Analyze** — Detect query complexity, extract keywords, decompose complex queries
-2. **Plan** — Select retrieval strategy and search algorithm
-3. **Search** — Execute tree traversal with Pilot guidance
-4. **Evaluate** — Score, deduplicate, and aggregate results
+### Principles
 
-### Pilot
+- **Reason, don't vector.** — Every retrieval decision is an LLM decision.
+- **Model fails, we fail.** — No silent degradation. No heuristic fallbacks.
+- **No thought, no answer.** — Only LLM-reasoned output counts as an answer.
 
-The Pilot is the core intelligence component. It provides LLM-guided navigation at key decision points:
+### Flow
 
-- **Fork points** — When multiple children exist, Pilot evaluates which path to follow
-- **Backtracking** — When a path yields insufficient results, Pilot suggests alternatives
-- **Binary pruning** — Quick relevance filter for nodes with many children
+```text
+Engine.query()
+  → Dispatcher
+    → Query Understanding (LLM) → QueryPlan (intent, concepts, strategy)
+    → Orchestrator (always — single or multi-doc)
+      → Analyze (LLM selects documents + tasks)
+      → Supervisor Loop:
+          Dispatch Workers → Evaluate (LLM sufficiency check)
+          → if insufficient → Replan (LLM) → loop
+      → Rerank (dedup → BM25 score → synthesis/fusion)
+```
+
+### Query Understanding
+
+Every query first passes through LLM-based understanding:
+
+| Field | Description |
+|-------|-------------|
+| **Intent** | Factual, Analytical, Navigational, or Summary |
+| **Complexity** | Simple, Moderate, or Complex |
+| **Key Concepts** | LLM-extracted concepts (distinct from keywords) |
+| **Strategy Hint** | focused, exploratory, comparative, or summary |
+
+### Orchestrator (Supervisor)
+
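+The Orchestrator is the central coordinator. It always runs, even for single-document queries. It analyzes the query once, then runs the supervisor loop (dispatch, evaluate, replan) until the LLM judges the evidence sufficient: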
+
+1. **Analyze** — LLM reviews DocCards and selects relevant documents with specific tasks
+2. **Dispatch** — Fan-out Workers in parallel (one per document)
+3. **Evaluate** — LLM checks if collected evidence is sufficient to answer the query
+4. **Replan** (if insufficient) — LLM identifies missing information and dispatches additional Workers
+
+### Worker (Evidence Collector)
+
+Each Worker navigates a single document's tree to collect evidence:
+
+1. **Bird's-eye** — `ls` the root for an overview
+2. **Plan** — LLM generates a navigation plan
+3. **Navigate** — Loop: LLM → command → execute → repeat (with budget)
+4. **Return** — Collected evidence only — no answer synthesis
+
+Workers use tree commands (`ls`, `cd`, `cat`, `grep`, `find`, `findtree`) and a `check` command for self-evaluation.
+
+### Rerank Pipeline
 
-### Search Algorithms
+After all Workers complete, the Orchestrator runs the final pipeline:
 
-| Algorithm | Description | Use Case |
-|-----------|-------------|----------|
-| **Beam Search** | Explores multiple paths with backtracking | General purpose (recommended) |
-| **MCTS** | Monte Carlo Tree Search with UCT selection | Complex multi-hop queries |
-| **Pure Pilot** | Greedy single-path, Pilot at every level | High-accuracy, higher token cost |
-| **ToC Navigator** | Table-of-contents based location | Broad queries ("what is this about?") |
+1. **Dedup** — Remove duplicate and low-quality evidence
+2. **BM25 Scoring** — Rank evidence by keyword relevance
+3. **Answer Generation** — LLM synthesizes or fuses evidence into a final answer
 
 ## Cross-Document Graph