Repository: github.com/LuisMRaimundo/Text-seeker
Multi-format boolean full-text search for local documents (PDF, DOCX, HTML, TXT, Markdown, Excel, CSV, images via OCR). Runs offline on your machine; indexes and caches live under your home directory.
Supported formats: TXT, PDF, DOCX, HTML, Markdown, Excel (.xlsx/.xls), CSV, common image formats (OCR).
See installers/README.md:
| Platform | Launcher |
|---|---|
| Windows 10/11 | Double-click installers\windows\Install and Run.bat |
| macOS | Double-click installers/macos/Install and Run.command (after chmod +x) |
| Linux | ./installers/linux/install-and-run.sh |
First run downloads a private Python and libraries (~200–400 MB). No system Python required.
pip install -r requirements.txt
python app.py --gui
Or: start_gui.bat (Windows, if Python is on PATH).
run_tests.bat
Or: python -m unittest discover -s tests -v
Continuous integration runs the same test suite on push (see .github/workflows/test.yml).
| Path | Role |
|---|---|
| app.py, main.py | CLI orchestrator and Tkinter GUI |
| boolean_parser.py, nlp_utils.py | Query parsing, stemming, tokenization |
| indexing.py, text_extract.py | Inverted index and full-document extraction |
| search_*.py, html_search.py, text_search.py | Per-format search |
| installers/ | One-click setup (private Python on first run) |
| tests/ | Unit and integration tests |
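For readers unfamiliar with the idea behind an inverted index, the following is a minimal, self-contained sketch of how boolean AND / OR / NOT lookups can be answered from one. It is an illustration only and is not taken from indexing.py or boolean_parser.py; the function names (`tokenize`, `build_index`, `search_and`, `search_or`, `search_not`) and the sample documents are hypothetical, and stemming is deliberately left out.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for token in tokenize(text):
            index[token].add(name)
    return index


def search_and(index, terms):
    """Documents that contain every term."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()


def search_or(index, terms):
    """Documents that contain at least one term."""
    return set().union(*(index.get(t.lower(), set()) for t in terms))


def search_not(index, all_docs, term):
    """Documents that do not contain the term."""
    return set(all_docs) - index.get(term.lower(), set())


# Hypothetical sample corpus, used only for illustration.
docs = {
    "a.txt": "Boolean search over local documents",
    "b.txt": "OCR extracts text from images",
    "c.txt": "Search images and documents offline",
}
index = build_index(docs)

print(sorted(search_and(index, ["search", "documents"])))
print(sorted(search_or(index, ["ocr", "offline"])))
print(sorted(search_not(index, docs, "images")))
```

Because each term maps directly to a set of documents, a boolean query reduces to set intersection, union, and difference, which is why an index of this kind can answer queries without rescanning every file.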
| File | Contents |
|---|---|
| README_STARTING.md | Launch, optional Tesseract & Poppler |
| QUICK_GUIDE.md | Boolean query syntax |
| TECHNICAL_MANUAL.md | Architecture |
| Purpose | Path |
|---|---|
| Search index | ~/.text-seeker_index/ |
| PDF/OCR cache | ~/.text-seeker_cache/ |
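To see how much disk space these two directories use, a small script along the following lines may help. It is a sketch that assumes only the paths listed in the table above; the internal layout of the directories is not documented here, so the script just sums file sizes. The helper name `dir_size_bytes` is hypothetical.

```python
from pathlib import Path

# Locations taken from the table above, resolved against the current user's home.
INDEX_DIR = Path.home() / ".text-seeker_index"
CACHE_DIR = Path.home() / ".text-seeker_cache"


def dir_size_bytes(path):
    """Return the total size of regular files under path, or 0 if it is missing."""
    if not path.is_dir():
        return 0
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


for label, directory in (("Search index", INDEX_DIR), ("PDF/OCR cache", CACHE_DIR)):
    size_mb = dir_size_bytes(directory) / (1024 * 1024)
    print(f"{label}: {directory} ({size_mb:.1f} MB)")
```

A directory that does not exist yet (for example, before the first search) is reported as 0.0 MB rather than raising an error.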
Copyright © 2026 Luís Raimundo. All rights reserved.
This repository and its contents are proprietary research material. No open-source licence is granted. No permission to copy, redistribute, modify, publish, or derive works without prior written permission from the copyright holder.
Contact: lmr.2020@outlook.pt
This project was developed by Luís Raimundo with the support and funding of the Fundação para a Ciência e a Tecnologia (FCT) and Universidade NOVA de Lisboa.
Funding DOI: https://doi.org/10.54499/2020.08817.BD
The author also gratefully acknowledges Isabel Pires for her support throughout the development of this work.