Find the exact page that answers your question.
A lightweight desktop app for students and researchers to search PDF folders using natural language.
- Hybrid search: BM25 keyword retrieval + semantic reranking (FastEmbed)
- Two index modes: Fast (quick startup) and Deep (precomputed embeddings)
- Chunked indexing: better precision while keeping page numbers
- Optional OCR: for scanned PDFs or image-only pages
- Multilingual search: cross-lingual matching with the multilingual model
- Open PDF at page: jump directly to the relevant page
- Model manager: download/delete models and choose fusion method
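The keyword half of the hybrid search, together with page-preserving chunking, can be sketched in plain Python. This is a minimal illustration, not Locus's actual code. It assumes page texts have already been extracted (for example with PyMuPDF). The chunk size, the overlap, and the names `chunk_pages`, `tokenize` and `bm25_search` are all hypothetical. Scoring is Okapi BM25 with the common defaults k1 = 1.5 and b = 0.75, using the non-negative (Lucene-style) IDF for simplicity. rank-bm25's `BM25Okapi`, listed in the dependencies, is a ready-made alternative with a slightly different IDF.

```python
import math
import re
from collections import Counter


def chunk_pages(pages, size=40, overlap=10):
    """Split each page into overlapping word windows (requires size > overlap).

    Returns (page_number, chunk_text) pairs with 1-based page numbers, so every
    hit can be traced back to the page it came from.
    """
    step = size - overlap
    chunks = []
    for page_no, text in enumerate(pages, start=1):
        words = text.split()
        for start in range(0, len(words), step):
            chunks.append((page_no, " ".join(words[start:start + size])))
            if start + size >= len(words):
                break  # this window already reaches the end of the page
    return chunks


def tokenize(text):
    # Lowercased word tokens; a real tokenizer might add stemming or stopwords.
    return re.findall(r"\w+", text.lower())


def bm25_search(chunks, query, top_k=5, k1=1.5, b=0.75):
    """Rank chunks against the query with Okapi BM25; returns (score, page, text)."""
    if not chunks:
        return []
    docs = [tokenize(text) for _, text in chunks]
    avgdl = sum(len(d) for d in docs) / len(docs)
    # Document frequency: how many chunks contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    query_terms = tokenize(query)
    results = []
    for (page_no, text), doc in zip(chunks, docs):
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Non-negative IDF: log(1 + (N - df + 0.5) / (df + 0.5)).
            idf = math.log(1 + (len(docs) - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            norm = k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * freq * (k1 + 1) / (freq + norm)
        if score > 0:
            results.append((score, page_no, text))
    results.sort(key=lambda r: r[0], reverse=True)
    return results[:top_k]
```

For example, suppose `pages = ["Photosynthesis converts light energy into chemical energy.", "The Krebs cycle takes place in the mitochondria."]`. Then `bm25_search(chunk_pages(pages), "mitochondria cycle")` returns a single hit on page 2. The semantic reranking stage is omitted here because it needs a downloaded FastEmbed model.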
Download the latest release from Releases and run Locus.exe.
To run from source instead:

```bash
# Install dependencies
pip install -r requirements.txt

# Run
python gui.py
```

- Click Browse and select a folder containing PDFs.
- Click Load Index (or Rebuild Index when files/models change).
- Choose index mode:
  - Fast Index: faster startup, good for small collections
  - Deep Index: slower startup, best recall
- Type a query and press Search.
- Double-click a result to open the PDF at the correct page.
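Jumping to a page depends on the viewer. The sketch below is hedged and is not Locus's own launcher. SumatraPDF's command line accepts `-page N` before the file path, and page numbers are 1-based. The helper names and the macOS/Linux/Windows fallbacks are illustrative assumptions. The fallbacks open the file in the default viewer but do not jump to a page.

```python
import os
import subprocess
import sys


def build_open_command(pdf_path, page, sumatra_path=None, platform=None):
    """Return an argv list for opening pdf_path, or None to fall back to os.startfile.

    Only the SumatraPDF branch jumps to `page` (1-based) via its -page option;
    the other branches just open the file in the system's default viewer.
    """
    platform = platform or sys.platform
    if sumatra_path:
        return [sumatra_path, "-page", str(page), pdf_path]
    if platform.startswith("win"):
        return None  # no page-aware viewer configured on Windows
    if platform == "darwin":
        return ["open", pdf_path]
    return ["xdg-open", pdf_path]  # Linux and other freedesktop systems


def open_pdf_at_page(pdf_path, page, sumatra_path=None):
    cmd = build_open_command(pdf_path, page, sumatra_path)
    if cmd is None:
        os.startfile(pdf_path)  # Windows-only: default app, no page jump
    else:
        subprocess.Popen(cmd)  # launch the viewer without blocking the GUI
```

`subprocess.Popen` returns immediately, so a GUI stays responsive while the viewer starts.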
- Balanced / High / Best control embedding model size and accuracy.
- Multilingual enables cross-lingual search.
Tip: Use the Manage Models window to download/delete models.
- OCR is off by default and can be enabled in the OCR selector.
- Fast mode: OCR only for image-heavy pages with little text.
- Deep mode: OCR for all pages that contain images.
OCR results are cached to speed up later runs.
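The two OCR policies and the cache can be sketched as a page filter plus a content-addressed lookup. The threshold `min_chars`, the "has images" test, and all function names are hypothetical, because Locus's exact heuristics are not documented here. Per-page text length and image count are values PyMuPDF can provide through `page.get_text()` and `page.get_images()`. In this sketch, the cache key hashes the PDF's bytes together with the page number, so a cached result is only reused while the file is unchanged.

```python
import hashlib


def pages_to_ocr(page_stats, mode, min_chars=50):
    """Choose which pages to OCR.

    page_stats: (page_number, extracted_text_length, image_count) per page.
    "fast": pages with images but fewer than min_chars of extractable text.
    "deep": every page that contains at least one image.
    """
    if mode == "deep":
        return [page for page, _, images in page_stats if images > 0]
    if mode == "fast":
        return [page for page, chars, images in page_stats if images > 0 and chars < min_chars]
    raise ValueError(f"unknown OCR mode: {mode!r}")


def ocr_cache_key(pdf_bytes, page_number):
    # Hash the file content so an edited PDF never reuses stale OCR text.
    return f"{hashlib.sha256(pdf_bytes).hexdigest()}:{page_number}"


def cached_ocr(cache, pdf_bytes, page_number, run_ocr):
    """Return OCR text for a page, calling run_ocr(page_number) only on a cache miss."""
    key = ocr_cache_key(pdf_bytes, page_number)
    if key not in cache:
        cache[key] = run_ocr(page_number)
    return cache[key]
```

Here `cache` is an in-memory dict. A real implementation would persist it to the OCR cache folder so later runs can skip OCR entirely.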
Choose the fusion method in Manage Models:
- RRF (Reciprocal Rank Fusion, default)
- Percentile Blend
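Both fusion methods merge the BM25 ranking with the semantic ranking. RRF is a standard technique (Cormack, Clarke and Büttcher, 2009). `k = 60` is the constant from that paper, and whether Locus uses the same value is an assumption. The README does not define Percentile Blend, so `percentile_blend` below is one plausible, hypothetical interpretation rather than the app's method.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion over ranked lists of ids (best first).

    Each id gets sum(1 / (k + rank)) across the lists it appears in (rank is 1-based);
    ids missing from a list simply receive nothing from it.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


def percentile_blend(score_maps, weights=None):
    """Assumed reading of "Percentile Blend": turn each retriever's raw scores into
    rank percentiles in [0, 1] (so BM25 and cosine scales become comparable),
    then take a weighted average. Ids absent from a retriever count as 0 there.
    Returns (id, blended_score) pairs, best first.
    """
    weights = weights or [1.0] * len(score_maps)
    blended = {doc_id: 0.0 for scores in score_maps for doc_id in scores}
    for scores, weight in zip(score_maps, weights):
        ordered = sorted(scores, key=scores.get)  # lowest raw score first
        n = len(ordered)
        for i, doc_id in enumerate(ordered):
            percentile = i / (n - 1) if n > 1 else 1.0
            blended[doc_id] += weight * percentile
    total = sum(weights)
    return sorted(((d, s / total) for d, s in blended.items()),
                  key=lambda pair: pair[1], reverse=True)
```

For instance, fusing `["a", "b", "c"]` with `["b", "c", "a"]` gives the order `b, a, c`. The fused values themselves all sit near 0.032: they only order results and do not read as relevance scores.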
Caches are stored outside your PDF folder:
- Index cache:
  - Windows: `%LOCALAPPDATA%\Locus\index_cache`
  - macOS: `~/Library/Caches/Locus/index_cache`
  - Linux: `~/.cache/Locus/index_cache`
- OCR cache:
  - Windows: `%LOCALAPPDATA%\Locus\ocr_cache`
  - macOS: `~/Library/Caches/Locus/ocr_cache`
  - Linux: `~/.cache/Locus/ocr_cache`
- Model cache:
  - Windows: `%LOCALAPPDATA%\Locus\fastembed_cache`
  - macOS: `~/Library/Caches/Locus/fastembed_cache`
  - Linux: `~/.cache/Locus/fastembed_cache`
Use Manage Models to clear the index or OCR cache.
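For scripting, such as checking how much disk space the caches use, the locations above can be resolved in Python. This is a hedged sketch. `locus_cache_dir` and `cache_size_bytes` are hypothetical helpers that mirror the listed paths and are not functions shipped with Locus.

```python
import os
import sys
from pathlib import Path


def locus_cache_dir(kind, platform=None, env=None):
    """Return the folder for kind: "index_cache", "ocr_cache" or "fastembed_cache".

    Mirrors the per-OS locations listed above; platform and env are parameters
    only so the function can be exercised on any machine.
    """
    platform = platform or sys.platform
    env = os.environ if env is None else env
    if platform.startswith("win"):
        base = Path(env["LOCALAPPDATA"]) / "Locus"
    elif platform == "darwin":
        base = Path.home() / "Library" / "Caches" / "Locus"
    else:
        base = Path.home() / ".cache" / "Locus"
    return base / kind


def cache_size_bytes(path):
    """Total size of the files under path; 0 if the folder does not exist yet."""
    if not path.exists():
        return 0
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())
```

Looping over the three `kind` values and calling `cache_size_bytes(locus_cache_dir(kind))` gives a quick per-cache size report. Clearing is still best done through Manage Models.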
- Python 3.8+
- PDF viewer with page navigation support (SumatraPDF recommended on Windows)
Dependencies:
- PyMuPDF
- rank-bm25
- fastembed
- numpy
- customtkinter
- rapidocr-onnxruntime
Why is indexing slow?
Deep mode precomputes embeddings, and OCR can be expensive. For faster indexing, use Fast mode or a lower OCR quality setting.
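One way to see why Deep mode indexes slowly and Fast mode starts quickly is to look at when the embedding work happens. This sketch is an assumption about the two modes, not Locus's implementation. `ChunkEmbeddings` is hypothetical, and `embed` stands in for a FastEmbed model, such as a `TextEmbedding` instance's `embed` method, which yields one vector per input text.

```python
class ChunkEmbeddings:
    """Hypothetical contrast between the two index modes.

    deep=True:  embed every chunk while indexing (slow build, nothing left to do at query time).
    deep=False: embed a chunk only the first time a query needs it, e.g. when it
                appears among the BM25 candidates, then keep the vector (fast startup).
    embed: any callable mapping a list of strings to one vector per string.
    """

    def __init__(self, chunks, embed, deep):
        self.chunks = chunks
        self.embed = embed
        self.vectors = {}
        if deep:
            self._ensure(range(len(chunks)))

    def _ensure(self, ids):
        # Embed, in one batch, the chunks that have no cached vector yet.
        missing = [i for i in dict.fromkeys(ids) if i not in self.vectors]
        if missing:
            vectors = self.embed([self.chunks[i] for i in missing])
            for i, vector in zip(missing, vectors):
                self.vectors[i] = vector

    def get(self, ids):
        ids = list(ids)
        self._ensure(ids)
        return [self.vectors[i] for i in ids]
```

In this design, the total embedding cost is the same or lower in Fast mode. It is spread across queries instead of paid up front, which trades slower first searches for a quicker startup.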
Why don't I see a score in RRF mode?
RRF combines result ranks rather than raw scores, so its fused values are not meaningful relevance scores. They are hidden by design.
MIT - free for personal and educational use.
