Redleaf is a private, local-first knowledge engine. It transforms a directory of documents (PDFs, text, emails, HTML, and transcripts) into a searchable, interconnected knowledge graph, complete with an integrated AI assistant—all running entirely on your local machine.
Built for researchers, archivists, and knowledge workers, Redleaf makes it easy to find meaningful connections, synthesize findings, and chat with your documents while protecting your privacy.
Redleaf Engine 2.0 Blog Post: https://nathanielwestveer.com/posts/redleaf-engine-update/
Redleaf Engine 2.5 Blog Post: https://nathanielwestveer.com/posts/redleaf-engine-2-5/
- Why Redleaf?
- Key Features
- Workflows: Building Your Knowledge Base
- Distributing Knowledge: The Precomputed Model
- Technology Stack
- Getting Started
- Advanced Features
- Management Scripts
- License
- About the Developer
Modern researchers often face hundreds of scattered PDFs, transcripts, and notes. Recalling where a piece of information came from is difficult, and time is wasted re-reading documents instead of making new connections.
Redleaf solves this by creating a searchable, structured knowledge graph on your own computer. It’s local-first, privacy-respecting, and designed to let you focus on analysis rather than file management.
- 🧠 AI Semantic Assistant – Chat with your documents using local LLMs via Ollama.
- ⚙️ Dual Processing Pipelines – Simple dashboard workflow or high-throughput DuckDB batch pipeline.
- 📦 Precomputed & Distributable – Package and share the entire finished knowledge base.
- 📄 Multi-Format Indexing –
.pdf,.html,.txt,.srt,.eml. - ✍️ Synthesis Environment – Dual-pane writing studio with automatic citations.
- 🎙️ Transcript & Media Sync – Auto-scrolls
.srttranscripts while playing media. - ☁️ Cloud Media Linking – Seamlessly connect transcripts to media on Archive.org.
- 🔍 Hybrid Search – FTS5 + semantic search + embedding vectors.
- 🔎 Advanced Search & Entity Intersection – Combine full-text keywords, collections, document types, custom tags, and required NLP entities.
- 🕸️ Knowledge Graph – Automatic entity + relationship extraction using spaCy.
- 📈 Entity Profiling & Boosting – View full entity profiles and manually boost relationships to curate the graph.
- ⚡ GPU Acceleration Optional – CUDA for embeddings & NLP.
- 👥 Multi-User Support – Admin/User roles + invitation flow.
Redleaf offers two workflows:
- Add Files – Put documents into
documents/ - Discover – Click "1. Discover Docs"
- Process – Click "2. Process All 'New'"
A multi-process job queue keeps the UI responsive.
For very large datasets or full rebuilds:
python curator_reset.py
python curator_cli.py process-docs run-all
python curator_cli.py bake-sqliteSee: DUCKDB_PIPELINE_GUIDE.md
A Redleaf Precomputed Knowledge Base is a ready-to-explore, fully processed dataset.
Curator:
-
Processes documents
-
Runs:
python bulk_manage.py export-precomputed-state
-
Publishes the package
Explorer:
-
Clones the repository
-
Runs:
python run.py
-
Redleaf auto-builds their local copy
| Layer | Technology |
|---|---|
| Backend | Python (Flask) |
| Web DB | SQLite + FTS5 |
| Pipeline DB | DuckDB |
| AI / NLP | Ollama, spaCy (en_core_web_lg) |
| Parsing | PyMuPDF, BeautifulSoup4 |
| Frontend | HTML5, CSS, JS, Tiptap |
| Async | ProcessPoolExecutor |
- Python 3.9+
- Conda recommended
- Ollama installed & running
ollama pull gemma3:12b
ollama pull embeddinggemma:latestgit clone https://github.com/nathanfx330/redleaf.git
cd redleaf
conda env create -f environment.yml
conda activate redleaf-env
python -m spacy download en_core_web_lgClick to expand
python3 -m venv venv
source venv/bin/activate # Linux/macOS
.\venv\Scripts\activate # Windows
pip install -r requirements.txt
python -m spacy download en_core_web_lgpython run.pyThen go to:
-
Semantic Assistant:
python semantic_assistant.py
-
Collaborative Annotations: Users export contributions to the curator.
-
Synthesis Environment: Dual-pane writing + automatic bibliography.
-
GPU Acceleration:
pip install cupy-cuda12x
Then enable via Settings → System & Processing
manage.py– User adminbulk_manage.py– System-wide toolscurator_cli.py– DuckDB pipeline entrypoint
MIT License
Copyright (c) 2026 Nathaniel Westveer
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Created by Nathaniel Westveer. Free to use, distribute, and modify.

