
🔒 PrivateSearch

100% local and private document search with AI.

PrivateSearch lets you search and chat with your personal documents using artificial intelligence, without ever sending data over the network. Everything is processed locally on your machine.


🖥️💡 Use a Remote DGX as an LLM Vision Provider (via Ollama)

You can use a remote NVIDIA DGX server as your LLM Vision provider through Ollama!

  • Offload heavy multimodal (Vision+Text) LLM tasks to a powerful remote DGX, while keeping your search and UI local.
  • Just set the Ollama endpoint to your DGX's Ollama server (e.g. http://your-dgx-ip:11434) in the Config tab.
  • Works for both chat and OCR (Vision) tasks, including Qwen3.5 multimodal models.
  • All document indexing and search logic remains local and private.

Example:

  • Run Ollama with Qwen3.5 on your DGX (with vision support)
  • Set the Ollama URL in PrivateSearch to your DGX's address
  • Enjoy local search with remote LLM Vision power!

This enables high-performance document understanding and OCR even on lightweight local machines, leveraging enterprise-grade hardware only for LLM inference.
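As a quick connectivity check (a sketch, not part of the app; the helper names are invented and the URL is a placeholder), you can list the models a remote Ollama server exposes via its /api/tags endpoint:

```python
import json
import urllib.request

def parse_tags(payload: dict) -> list:
    # The /api/tags response has the shape {"models": [{"name": ...}, ...]}
    return [m["name"] for m in payload.get("models", [])]

def list_remote_models(base_url: str, timeout: float = 5.0) -> list:
    # base_url is your DGX's Ollama endpoint, e.g. "http://your-dgx-ip:11434"
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return parse_tags(json.load(resp))
```

If the call fails, verify that the DGX is reachable from your machine and that its Ollama instance is listening on an external interface, not only on localhost.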


✨ Features

  • 🔍 Hybrid search — semantic search (ChromaDB + bge-m3) combined with keyword search (BM25), fused via Reciprocal Rank Fusion (RRF)
  • 🤖 Chat with your documents — ask questions in natural language, get cited answers
  • 📄 Multi-format support — PDF, DOCX, TXT, CSV, images (JPG, PNG, TIFF…)
  • 👁️ Intelligent OCR — extracts text from images and scanned PDFs via multimodal LLM (handwriting-aware)
  • 🧠 Unified model family — Qwen3.5 for everything: chat, OCR and reasoning, with 3 size profiles
  • 🔄 Incremental updates — automatically detects new, modified, or deleted files
  • 🖥️ Cross-platform — works on Linux and Windows (Docker + Ollama)
  • 🛡️ 100% offline — no internet connection required after initial setup

📋 Requirements

| Component | Minimum | Recommended |
|---|---|---|
| OS | Linux or Windows 10/11 | Linux (best GPU performance) |
| Docker | Docker Desktop installed and running | |
| Ollama | Installed and running (ollama.com) | |
| NVIDIA GPU | 1× GPU with 6 GB VRAM (Fast profile) | 2× GPUs with 12 GB VRAM each |
| RAM | 16 GB | 32 GB |
| Disk space | ~25 GB for AI models | |

🧠 Why Qwen3.5 (Unified Model)

PrivateSearch uses the Qwen3.5 family as a unified model for all operations:

  • Chat & RAG — answers questions based on indexed documents
  • Multimodal OCR — reads images and scanned PDFs, including handwritten text
  • Reasoning — understands complex queries and aggregates information from multiple sources

This "one model for everything" approach brings real advantages:

| Advantage | Detail |
|---|---|
| Less GPU swapping | Ollama loads only 2 models (bge-m3 + qwen3.5) instead of 3 separate ones |
| Consistent OCR + Chat | The same model that reads the document answers your questions |
| Scalable profiles | Same model in 3 sizes: 4B, 9B, 27B — choose based on your GPU |
| Multilingual | Qwen3.5 excels in English, Italian, and many other languages |

⚠️ GPU Notes

| GPU configuration | Recommended profile |
|---|---|
| 1× GPU, 6–8 GB (e.g. RTX 3060 8 GB) | ⚡ Fast (qwen3.5:4b) — quick answers, good quality |
| 1× GPU, 12 GB (e.g. RTX 3060 12 GB) | 🎯 Precise (qwen3.5:9b) — accurate and detailed answers |
| 2× GPUs, 12 GB (e.g. 2× RTX 3060) | 🚀 Maximum (qwen3.5:27b) — highest quality, model is distributed across both GPUs |
| 1× GPU, 24 GB (e.g. RTX 3090/4090) | 🚀 Maximum (qwen3.5:27b) — top performance with a single GPU |

🚀 Quick Start

1. Install prerequisites

# Docker Desktop
# Download from: https://www.docker.com/products/docker-desktop/

# Ollama
curl -fsSL https://ollama.com/install.sh | sh

2. Launch PrivateSearch

Linux:

./start.sh

Windows:

Double-click start.bat

The launcher script will:

  1. Check that Docker and Ollama are running
  2. Configure Docker so the container can reach Ollama on the host
  3. Build and start the container
  4. Open the browser at http://localhost:7860

If the browser does not open automatically, visit:

http://localhost:7860
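The pre-flight checks the launcher performs can be sketched roughly like this (an illustration only, not the actual start.sh/start.bat logic; the function names are invented):

```python
import shutil
import subprocess
import urllib.request

def docker_running() -> bool:
    # "docker info" exits non-zero when the daemon is not reachable
    if shutil.which("docker") is None:
        return False
    return subprocess.run(["docker", "info"], capture_output=True).returncode == 0

def ollama_running(url: str = "http://localhost:11434") -> bool:
    # Ollama answers plain HTTP on its root endpoint when it is up
    try:
        with urllib.request.urlopen(url, timeout=2):
            return True
    except OSError:
        return False
```

If either check fails, the launcher stops with an error instead of starting a container that cannot reach its models.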

3. In-app setup

  1. ⚙️ Setup — Click "Check system", then "Download models" (~25 GB, first time only)
  2. 📂 Documents — Enter the path to your documents folder (Linux or Windows format) and click "Check", then "Index documents"
  3. 💬 Chat — Start asking questions!

🎯 Profiles

All profiles use Qwen3.5 — same model, different sizes:

| Profile | Required GPU | Model (Chat + OCR) | Size | Use case |
|---|---|---|---|---|
| ⚡ Fast | 4–6 GB VRAM | qwen3.5:4b | ~3.4 GB | Quick answers, good quality |
| 🎯 Precise | 8–10 GB VRAM | qwen3.5:9b | ~6.6 GB | Accurate and detailed answers |
| 🚀 Maximum | 20+ GB VRAM | qwen3.5:27b | ~17 GB | One model for everything — highest quality |

With the Maximum profile, the same qwen3.5:27b handles both OCR and Chat. Ollama loads only 2 models in VRAM (bge-m3 + qwen3.5:27b), eliminating swapping and greatly improving speed.

Models used

| Model | Role | Size | Notes |
|---|---|---|---|
| bge-m3 | Embeddings (semantic search) | ~2 GB | Always loaded, 1024 dimensions |
| qwen3.5:4b | Chat + OCR — Fast profile | ~3.4 GB | Multimodal (text + images) |
| qwen3.5:9b | Chat + OCR — Precise profile | ~6.6 GB | Multimodal (text + images) |
| qwen3.5:27b | Chat + OCR — Maximum profile | ~17 GB | Multimodal, can be distributed across multiple GPUs |

πŸ“ Supported Formats

| Category | Extensions |
|---|---|
| Text | .txt, .md, .csv, .json, .xml, .html, .log |
| Documents | .pdf, .docx, .doc, .odt, .rtf |
| Images (OCR) | .jpg, .jpeg, .png, .tiff, .tif, .bmp, .webp |

πŸ—οΈ Architecture

┌─────────────────────────────────────────┐
│           Your machine                  │
│                                         │
│  ┌─────────────┐   ┌─────────────────┐  │
│  │   Ollama    │   │   Docker        │  │
│  │  (on host)  │   │  Container      │  │
│  │             │   │                 │  │
│  │ • qwen3.5   │◄──│ • Gradio UI     │  │
│  │  (Chat+OCR) │   │ • ChromaDB      │  │
│  │ • bge-m3    │   │ • BM25 Index    │  │
│  │ (embeddings)│   │ • RAG Engine    │  │
│  │             │   │ • RRF Fusion    │  │
│  │  GPU ←──────│   │ (no GPU needed) │  │
│  └─────────────┘   └─────────────────┘  │
│                                         │
│  📂 Your documents (read-only access)   │
└─────────────────────────────────────────┘

Search pipeline

User query
    │
    ├─► Semantic Search (ChromaDB + bge-m3) ─┐
    │                                        ├─► RRF Fusion ─► Top-K chunks ─► LLM (qwen3.5)
    └─► Keyword Search (BM25) ───────────────┘

  • Ollama runs on the host and manages GPU access for OCR, embeddings, and the chat LLM
  • The Docker container is lightweight (~500 MB), requires no GPU, and is cross-platform (Linux/Windows)
  • Your documents are mounted read-only — the app never modifies your files
  • Hybrid search: semantic (ChromaDB + bge-m3) + keyword (BM25) with RRF fusion (k=60)
  • Map-Reduce: for aggregation queries ("list all…"), iterates over all files in batches of 5
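The RRF fusion step can be illustrated with a minimal sketch (k=60 as above; this is an illustration of the technique, not the project's actual code, and the chunk names are made up):

```python
def rrf_fuse(rankings, k: int = 60):
    """Reciprocal Rank Fusion: each document's score is the sum of
    1 / (k + rank) over every ranked list it appears in (ranks start at 1)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["chunk_a", "chunk_b", "chunk_c"]  # e.g. from ChromaDB + bge-m3
keyword = ["chunk_b", "chunk_c", "chunk_d"]   # e.g. from BM25
fused = rrf_fuse([semantic, keyword])
# fused == ["chunk_b", "chunk_c", "chunk_a", "chunk_d"]
```

Chunks that rank well in both lists (like chunk_b) float to the top, which is why RRF is a robust way to merge rankings whose raw scores are not comparable.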

Smart OCR

OCR uses the same Qwen3.5 model as the active profile, with:

  • Image pre-processing: grayscale conversion → contrast (1.6×) → sharpening
  • PDF at 300 DPI: high-res rendering to capture handwritten text
  • Italian OCR prompt: optimized for government forms, tax codes, dates, IBANs
  • Persistent cache: each page is processed only once and saved to disk
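The pre-processing chain above can be sketched with Pillow (a sketch of the described steps under the stated 1.6× factor, not the project's actual implementation; the function name is invented):

```python
from PIL import Image, ImageEnhance, ImageFilter, ImageOps

def preprocess_for_ocr(img: Image.Image) -> Image.Image:
    gray = ImageOps.grayscale(img)                      # color -> single channel
    boosted = ImageEnhance.Contrast(gray).enhance(1.6)  # 1.6x contrast boost
    return boosted.filter(ImageFilter.SHARPEN)          # edge sharpening
```

The sharpened grayscale page is then handed to the multimodal model, which tends to read low-contrast scans and handwriting better after this kind of cleanup.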

🌐 Cross-platform: Linux + Windows + Remote DGX

PrivateSearch works on both operating systems and supports remote LLM providers:

  • Document paths: you can enter either /home/mario/Documents (Linux) or C:\Users\mario\Documents (Windows) — normalization is automatic
  • Docker networking: uses bridge network + host.docker.internal to reach Ollama on the host, compatible with Docker Desktop (Windows) and Docker Engine (Linux)
  • Remote Ollama / DGX: you can configure a remote Ollama server (e.g. http://192.168.1.100:11434 or your DGX address) from the Config tab. This allows you to use a remote NVIDIA DGX as your LLM Vision provider for chat and OCR.
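Path normalization of this kind can be sketched as follows (a hypothetical helper, not the project's actual implementation):

```python
from pathlib import PureWindowsPath

def normalize_doc_path(raw: str) -> str:
    # Accept both /home/mario/Documents and C:\Users\mario\Documents
    cleaned = raw.strip().strip('"')
    if len(cleaned) >= 2 and cleaned[1] == ":":  # Windows drive letter, e.g. C:\...
        return PureWindowsPath(cleaned).as_posix()
    return cleaned
```

PureWindowsPath parses backslash paths on any OS, so the same code works whether the app runs on Linux or Windows.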

Windows note: Ollama must listen on 0.0.0.0 (not just 127.0.0.1) to be reachable from the Docker container.

🛑 Stop

# Linux
docker compose -f docker/docker-compose.yml down

# Windows
docker compose -f docker\docker-compose.yml down

🔧 Useful Commands

# View app logs
docker logs privatesearch -f

# Rebuild after code changes
docker compose -f docker/docker-compose.yml up -d --build

Delete all data (index, cache, config)

docker compose -f docker/docker-compose.yml down -v


🔒 Privacy

  • Zero telemetry — no data collected
  • Zero outbound connections — everything is processed locally
  • Zero cloud — no external services
  • Your documents are mounted read-only inside the container
  • Source code is fully auditable and transparent

---

PrivateSearch v1.0 — Your data, your device, your privacy.
