100% local and private document search with AI.
PrivateSearch lets you search and chat with your personal documents using AI, without sending your data to any cloud service. By default, everything is processed locally on your machine.
Optionally, you can use a remote NVIDIA DGX server as your LLM/vision provider through Ollama:
- Offload heavy multimodal (Vision+Text) LLM tasks to a powerful remote DGX, while keeping your search and UI local.
- Just set the Ollama endpoint to your DGX's Ollama server (e.g. `http://your-dgx-ip:11434`) in the Config tab.
- Works for both chat and OCR (vision) tasks, including Qwen3.5 multimodal models.
- All document indexing and search logic remains local and private.
Example:
- Run Ollama with Qwen3.5 on your DGX (with vision support)
- Set the Ollama URL in PrivateSearch to your DGX's address
- Enjoy local search with remote LLM Vision power!
This enables high-performance document understanding and OCR even on lightweight local machines, leveraging enterprise-grade hardware only for LLM inference.
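As a sketch of what "remote LLM Vision power" means in practice: the same Ollama `/api/chat` request works whether the endpoint is local or a remote DGX. The helper names and the OCR prompt below are illustrative, not the app's actual code:

```python
import base64
import json
from urllib import request

# Illustrative endpoints; substitute your own DGX address from the Config tab.
LOCAL_OLLAMA = "http://localhost:11434"
REMOTE_DGX = "http://your-dgx-ip:11434"

def vision_chat_payload(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build an Ollama /api/chat request that sends an image to a multimodal model."""
    return {
        "model": model,
        "stream": False,
        "messages": [{
            "role": "user",
            "content": prompt,
            # Ollama expects images as base64-encoded strings
            "images": [base64.b64encode(image_bytes).decode("ascii")],
        }],
    }

def ocr_page(endpoint: str, image_bytes: bytes, model: str = "qwen3.5:27b") -> str:
    """POST the request to whichever Ollama server the Config tab points at."""
    payload = vision_chat_payload(model, "Transcribe all text in this image.", image_bytes)
    req = request.Request(
        f"{endpoint}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["message"]["content"]
```

Switching between local and remote inference is then just `ocr_page(LOCAL_OLLAMA, data)` vs. `ocr_page(REMOTE_DGX, data)`; indexing and search never leave the machine.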
- 🔍 Hybrid search – semantic search (ChromaDB + bge-m3) combined with keyword search (BM25), fused via Reciprocal Rank Fusion (RRF)
- 🤖 Chat with your documents – ask questions in natural language, get cited answers
- 📄 Multi-format support – PDF, DOCX, TXT, CSV, images (JPG, PNG, TIFF…)
- ✍️ Intelligent OCR – extracts text from images and scanned PDFs via a multimodal LLM (handwriting-aware)
- 🧠 Unified model family – Qwen3.5 for everything: chat, OCR, and reasoning, with 3 size profiles
- 🔄 Incremental updates – automatically detects new, modified, or deleted files
- 🖥️ Cross-platform – works on Linux and Windows (Docker + Ollama)
- 🛡️ 100% offline – no internet connection required after initial setup
| Component | Minimum | Recommended |
|---|---|---|
| OS | Linux or Windows 10/11 | Linux (best GPU performance) |
| Docker | Docker Desktop installed and running | |
| Ollama | Installed and running (ollama.com) | |
| NVIDIA GPU | 1× GPU with 6 GB VRAM (Fast profile) | 2× GPUs with 12 GB VRAM each |
| RAM | 16 GB | 32 GB |
| Disk space | ~25 GB for AI models | |
PrivateSearch uses the Qwen3.5 family as a unified model for all operations:
- Chat & RAG – answers questions based on indexed documents
- Multimodal OCR – reads images and scanned PDFs, including handwritten text
- Reasoning – understands complex queries and aggregates information from multiple sources
This "one model for everything" approach brings real advantages:
| Advantage | Detail |
|---|---|
| Less GPU swapping | Ollama loads only 2 models (bge-m3 + qwen3.5) instead of 3 separate ones |
| Consistent OCR + Chat | The same model that reads the document answers your questions |
| Scalable profiles | Same model in 3 sizes: 4B, 9B, 27B – choose based on your GPU |
| Multilingual | Qwen3.5 excels in English, Italian, and many other languages |
| GPU Configuration | Recommended profile |
|---|---|
| 1× GPU 6–8 GB (e.g. RTX 3060 8 GB) | ⚡ Fast (qwen3.5:4b) – quick answers, good quality |
| 1× GPU 12 GB (e.g. RTX 3060 12 GB) | 🎯 Precise (qwen3.5:9b) – accurate and detailed answers |
| 2× GPU 12 GB (e.g. 2× RTX 3060) | 🚀 Maximum (qwen3.5:27b) – highest quality, model is distributed across both GPUs |
| 1× GPU 24 GB (e.g. RTX 3090/4090) | 🚀 Maximum (qwen3.5:27b) – top performance with a single GPU |
```bash
# Docker Desktop
# Download from: https://www.docker.com/products/docker-desktop/

# Ollama
curl -fsSL https://ollama.com/install.sh | sh
```

Linux:

```bash
./start.sh
```

Windows: double-click `start.bat`
The launcher script will:
- Check that Docker and Ollama are running
- Configure Docker so the container can reach Ollama on the host
- Build and start the container
- Open the browser at http://localhost:7860
If the browser does not open automatically, visit:
http://localhost:7860
- ⚙️ Setup – click "Check system", then "Download models" (~25 GB, first time only)
- 📁 Documents – enter the path to your documents folder (Linux or Windows format), click "Check", then "Index documents"
- 💬 Chat – start asking questions!
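As an illustration of what the "Check system" step verifies, the installed-model list can be read from Ollama's `/api/tags` endpoint; the helper names here are ours, not the app's:

```python
import json
from urllib import request

# Model families the Setup tab downloads (exact tags depend on the chosen profile)
REQUIRED = ("bge-m3", "qwen3.5")

def missing_models(tags_json: dict, required=REQUIRED) -> list:
    """Return the required model families absent from an Ollama /api/tags response."""
    installed = [m["name"] for m in tags_json.get("models", [])]
    return [fam for fam in required
            if not any(name.startswith(fam) for name in installed)]

def check_ollama(endpoint: str = "http://localhost:11434") -> list:
    """Fetch the installed-model list from a running Ollama server."""
    with request.urlopen(f"{endpoint}/api/tags") as resp:
        return missing_models(json.load(resp))
```

An empty return value from `check_ollama()` means both the embedding model and the chat/OCR model are already pulled.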
All profiles use Qwen3.5 – same model, different sizes:

| Profile | Required GPU | Model (Chat + OCR) | Size | Use case |
|---|---|---|---|---|
| ⚡ Fast | 4–6 GB VRAM | `qwen3.5:4b` | ~3.4 GB | Quick answers, good quality |
| 🎯 Precise | 8–10 GB VRAM | `qwen3.5:9b` | ~6.6 GB | Accurate and detailed answers |
| 🚀 Maximum | 20+ GB VRAM | `qwen3.5:27b` | ~17 GB | One model for everything – highest quality |
With the Maximum profile, the same `qwen3.5:27b` handles both OCR and chat. Ollama keeps only two models in VRAM (bge-m3 + qwen3.5:27b), eliminating swapping and greatly improving speed.
| Model | Role | Size | Notes |
|---|---|---|---|
| `bge-m3` | Embeddings (semantic search) | ~2 GB | Always loaded, 1024 dimensions |
| `qwen3.5:4b` | Chat + OCR – Fast profile | ~3.4 GB | Multimodal (text + images) |
| `qwen3.5:9b` | Chat + OCR – Precise profile | ~6.6 GB | Multimodal (text + images) |
| `qwen3.5:27b` | Chat + OCR – Maximum profile | ~17 GB | Multimodal, can be distributed across multiple GPUs |
| Category | Extensions |
|---|---|
| Text | .txt, .md, .csv, .json, .xml, .html, .log |
| Documents | .pdf, .docx, .doc, .odt, .rtf |
| Images (OCR) | .jpg, .jpeg, .png, .tiff, .tif, .bmp, .webp |
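A minimal sketch of how files could be routed by extension into the right extraction pipeline, based on the table above (the `route` helper and pipeline names are illustrative, not the app's actual code):

```python
from pathlib import Path

# Extension groups from the supported-formats table; images go through OCR.
TEXT_EXTS = {".txt", ".md", ".csv", ".json", ".xml", ".html", ".log"}
DOC_EXTS = {".pdf", ".docx", ".doc", ".odt", ".rtf"}
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".tiff", ".tif", ".bmp", ".webp"}

def route(path: str) -> str:
    """Classify a file into the pipeline that should extract its text."""
    ext = Path(path).suffix.lower()
    if ext in TEXT_EXTS:
        return "text"
    if ext in DOC_EXTS:
        return "document"  # scanned PDFs may still fall back to OCR
    if ext in IMAGE_EXTS:
        return "ocr"
    return "unsupported"
```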
```
┌───────────────────────────────────────────┐
│               Your machine                │
│                                           │
│  ┌─────────────┐     ┌─────────────────┐  │
│  │   Ollama    │     │     Docker      │  │
│  │  (on host)  │     │    Container    │  │
│  │             │     │                 │  │
│  │ • qwen3.5   │◄────┤ • Gradio UI     │  │
│  │   (Chat+OCR)│     │ • ChromaDB      │  │
│  │ • bge-m3    │     │ • BM25 Index    │  │
│  │ (embeddings)│     │ • RAG Engine    │  │
│  │             │     │ • RRF Fusion    │  │
│  │     GPU     │     │ (no GPU needed) │  │
│  └─────────────┘     └─────────────────┘  │
│                                           │
│  📁 Your documents (read-only access)     │
└───────────────────────────────────────────┘
```
```
User query
    │
    ├─► Semantic Search (ChromaDB + bge-m3) ──┐
    │                                         ├─► RRF Fusion ─► Top-K chunks ─► LLM (qwen3.5)
    └─► Keyword Search (BM25) ────────────────┘
```
- Ollama runs on the host and manages GPU access for OCR, embeddings, and the chat LLM
- The Docker container is lightweight (~500 MB), requires no GPU, and is cross-platform (Linux/Windows)
- Your documents are mounted read-only – the app never modifies your files
- Hybrid search: semantic (ChromaDB + bge-m3) + keyword (BM25) with RRF fusion (k=60)
- Map-Reduce: for aggregation queries ("list all…"), iterates over all files in batches of 5
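The hybrid retrieval described above can be sketched in a few lines of Reciprocal Rank Fusion; the document ids and ranked lists here are illustrative, not the app's actual data structures:

```python
def rrf_fuse(rankings, k: int = 60):
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank(d)).

    `rankings` is a list of ranked document-id lists (best first), e.g. one
    from semantic search and one from BM25. k=60 matches the value above.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; these become the Top-K chunks sent to the LLM
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked well by both retrievers (like "b" in `rrf_fuse([["a", "b"], ["b", "a"]])`) beats one that only one retriever liked, which is exactly why RRF is a robust way to merge incomparable score scales.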
OCR uses the same Qwen3.5 model as the active profile, with:
- Image pre-processing: grayscale conversion → contrast boost (1.6×) → sharpening
- PDF at 300 DPI: high-res rendering to capture handwritten text
- Italian OCR prompt: optimized for government forms, tax codes, dates, IBANs
- Persistent cache: each page is processed only once and saved to disk
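As a sketch, the pre-processing chain above maps directly onto Pillow calls (this assumes Pillow is available; the function name is ours, not necessarily the app's):

```python
from PIL import Image, ImageEnhance, ImageFilter

def preprocess_for_ocr(img: Image.Image) -> Image.Image:
    """Grayscale -> 1.6x contrast boost -> sharpen, as described above."""
    gray = img.convert("L")                          # grayscale conversion
    boosted = ImageEnhance.Contrast(gray).enhance(1.6)  # contrast boost (1.6x)
    return boosted.filter(ImageFilter.SHARPEN)       # sharpening
```

The cleaned page image is then sent to the multimodal model; caching the result per page means this chain runs at most once per document page.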
PrivateSearch works on both operating systems and supports remote LLM providers:
- Document paths: enter either `/home/mario/Documents` (Linux) or `C:\Users\mario\Documents` (Windows) – normalization is automatic
- Docker networking: uses a bridge network plus `host.docker.internal` to reach Ollama on the host, compatible with Docker Desktop (Windows) and Docker Engine (Linux)
- Remote Ollama / DGX: you can configure a remote Ollama server (e.g. `http://192.168.1.100:11434` or your DGX address) from the Config tab, which lets you use a remote NVIDIA DGX as the LLM/vision provider for chat and OCR

Windows note: Ollama must listen on `0.0.0.0` (not just `127.0.0.1`) to be reachable from the Docker container.
```bash
# Linux
docker compose -f docker/docker-compose.yml down

# Windows
docker compose -f docker\docker-compose.yml down
```

```bash
# View app logs
docker logs privatesearch -f

# Rebuild after code changes
docker compose -f docker/docker-compose.yml up -d --build

# Remove containers and volumes (full reset)
docker compose -f docker/docker-compose.yml down -v
```
## 🔒 Privacy
- **Zero telemetry** – no data collected
- **Zero outbound connections** – everything is processed locally (unless you point Ollama at a remote server)
- **Zero cloud** – no external services
- Your documents are mounted read-only inside the container
- Source code is fully auditable and transparent
---
*PrivateSearch v1.0 – Your data, your device, your privacy.*