CLARITY Convertor is a PDF-to-structured-data tool with a React frontend and a FastAPI backend. Users can upload PDFs, convert them into page-by-page structured output, review past conversions, and manage access via role-based authentication.
- Authenticated dashboard with persistent sessions and dark/light theme toggle
- Secure PDF upload with streaming writes and per-page LaTeX conversion via Ollama
- Conversion history with per-page structured output and deletion controls
- Admin management for activating users and adjusting roles
- Ollama model discovery integration for configurable conversion models
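As an illustration of the model discovery feature, the sketch below queries Ollama's `/api/tags` endpoint, which lists the models installed on an Ollama instance. It is a minimal example and not the project's actual code: the helper names `extract_model_names` and `list_ollama_models` are hypothetical, and the default base URL is an assumption.

```python
import json
import urllib.request


def extract_model_names(payload: dict) -> list[str]:
    """Return the sorted model names found in an Ollama /api/tags response body."""
    return sorted(m["name"] for m in payload.get("models", []) if "name" in m)


def list_ollama_models(base_url: str = "http://localhost:11434",
                       timeout: float = 10.0) -> list[str]:
    """Ask a running Ollama instance which models it has (hypothetical helper)."""
    url = base_url.rstrip("/") + "/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return extract_model_names(json.load(resp))
```

Keeping the response parsing in its own function means it can be checked without a running Ollama service.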
- Install Docker and Docker Compose on your machine.
- From the project root, build and start all services:
docker-compose up -d --build
- Open the web UI at http://localhost:3000 and the interactive API docs at http://localhost:8000/docs.
- Stop the stack when finished:
docker-compose down
Uploads persist in the backend/uploads/ folder and database data persists in the named Docker volume postgres_data. To remove them completely, delete both. The conversion stage renders each PDF page to PNG using Poppler (via pdf2image) before sending it to Ollama, so make sure Poppler is installed when running outside Docker.
- Backend
cd backend
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
sudo apt install poppler-utils  # Poppler dependency for pdf2image (Debian/Ubuntu)
uvicorn app.main:app --reload
- Frontend
cd frontend
npm install
npm start
Update the API base URLs in the frontend as needed when running on non-default hosts.
- OLLAMA_API_URL: base URL for the Ollama service (http://host.docker.internal:11434 inside Docker).
- OLLAMA_TIMEOUT_SECONDS: per-request timeout in seconds. Set to 0 or a negative number to disable the limit for long-running conversions.
- REACT_APP_API_BASE_URL: frontend build-time override (defaults to /api so nginx proxies to the backend).
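The "zero or negative disables the limit" rule for OLLAMA_TIMEOUT_SECONDS could be interpreted as in the sketch below. The helper name `read_ollama_timeout` and the 120-second fallback used when the variable is unset are assumptions made for illustration. The backend's actual default is not stated here.

```python
import os
from typing import Mapping, Optional


def read_ollama_timeout(env: Mapping[str, str] = os.environ,
                        default: float = 120.0) -> Optional[float]:
    """Return the per-request timeout in seconds, or None for no limit.

    `default` is a hypothetical fallback for when the variable is unset.
    """
    raw = env.get("OLLAMA_TIMEOUT_SECONDS")
    if raw is None or raw.strip() == "":
        return default
    seconds = float(raw)
    # Zero or a negative value disables the limit, which HTTP clients
    # typically express as timeout=None.
    return seconds if seconds > 0 else None
```

Returning `None` for "no limit" matches how Python HTTP clients such as `urllib.request.urlopen` treat a missing timeout, so the result can be passed straight through.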
See future_improvements.md for a roadmap toward asynchronous conversions, richer progress feedback, and operational tooling.
- documentation.md: in-depth architecture and API details
- server.md: deployment runbook for remote servers
For contributor guidelines and troubleshooting tips, see documentation.md.