IITJ-CLARITY-Lab/CLARITY-Convertor

CLARITY Convertor

CLARITY Convertor is a PDF-to-structured-data tool with a React frontend and a FastAPI backend. Users can upload PDFs, convert them into page-by-page structured output, review past conversions, and manage access via role-based authentication.

Features

  • Authenticated dashboard with persistent sessions and dark/light theme toggle
  • Secure PDF upload with streaming writes and per-page LaTeX conversion via Ollama
  • Conversion history with per-page structured output and deletion controls
  • Admin management for activating users and adjusting roles
  • Ollama model discovery integration for configurable conversion models
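
As an illustration of the model discovery feature, the sketch below lists the models an Ollama server has available through its `GET /api/tags` endpoint. This README does not say how the backend performs discovery, so the function names and the default base URL here are assumptions for illustration, not the project's actual code.

```python
import json
import urllib.request


def parse_model_names(tags_payload: dict) -> list[str]:
    """Extract sorted model names from the JSON body of Ollama's GET /api/tags."""
    return sorted(m["name"] for m in tags_payload.get("models", []) if "name" in m)


def list_ollama_models(base_url: str = "http://localhost:11434",
                       timeout: float = 10.0) -> list[str]:
    """Query a running Ollama server for its locally available models.

    base_url is an assumed default; inside Docker the README suggests
    http://host.docker.internal:11434 instead.
    """
    url = f"{base_url.rstrip('/')}/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_model_names(json.load(resp))
```

Keeping the parsing step separate from the network call makes it possible to check the response handling without a running Ollama instance.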

Quick Start with Docker

  1. Install Docker and Docker Compose on your machine.
  2. From the project root, build and start all services:
    docker-compose up -d --build
  3. Access the web UI at http://localhost:3000 and the API at http://localhost:8000/docs.
  4. Stop the stack when finished:
    docker-compose down

Uploads and database data persist across restarts. To remove them completely, delete the backend/uploads/ folder and the named Docker volume postgres_data.

The conversion stage renders each PDF page to PNG using Poppler (via pdf2image) before sending it to Ollama, so make sure Poppler is installed when running outside Docker.
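
The render-then-send step could look roughly like the sketch below: pdf2image turns each page into a PNG, and each PNG is base64-encoded into a request body for Ollama's `POST /api/generate` endpoint. The function names, the DPI value, and the idea of one request per page are assumptions; the backend's actual prompt, model, and request shape are not described in this README.

```python
import base64
import io


def render_pages_to_png(pdf_path: str, dpi: int = 200) -> list[bytes]:
    """Render every page of a PDF to PNG bytes (requires pdf2image and Poppler)."""
    # Imported lazily so the payload helper below works without Poppler installed.
    from pdf2image import convert_from_path

    png_pages = []
    for page in convert_from_path(pdf_path, dpi=dpi):  # one PIL image per page
        buf = io.BytesIO()
        page.save(buf, format="PNG")
        png_pages.append(buf.getvalue())
    return png_pages


def build_generate_payload(png_bytes: bytes, model: str, prompt: str) -> dict:
    """Build a JSON body for Ollama's /api/generate with one page image attached."""
    return {
        "model": model,    # hypothetical: any vision-capable model pulled into Ollama
        "prompt": prompt,  # hypothetical: the real conversion prompt is not shown here
        "images": [base64.b64encode(png_bytes).decode("ascii")],
        "stream": False,   # ask for a single JSON response instead of a stream
    }
```

A caller would loop over `render_pages_to_png("input.pdf")`, build one payload per page, and post each to `OLLAMA_API_URL` + `/api/generate`.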

Local Development (without Docker)

  • Backend
    cd backend
    python -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt
    # Poppler dependency for pdf2image (Debian/Ubuntu)
    sudo apt install poppler-utils
    uvicorn app.main:app --reload
  • Frontend
    cd frontend
    npm install
    npm start

When the backend runs on a non-default host or port, update the frontend's API base URL to match (see REACT_APP_API_BASE_URL under Environment Highlights).

Environment Highlights

  • OLLAMA_API_URL — base URL for the Ollama service (http://host.docker.internal:11434 inside Docker).
  • OLLAMA_TIMEOUT_SECONDS — per-request timeout (seconds). Set to 0 or a negative number to disable the limit for long-running conversions.
  • REACT_APP_API_BASE_URL — frontend build-time override (defaults to /api so nginx proxies to the backend).
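
The "0 or negative disables the limit" rule for OLLAMA_TIMEOUT_SECONDS can be sketched as a small helper. The function name and the 120-second fallback are assumptions made for this example; the backend's real default is not stated here.

```python
import os
from typing import Mapping, Optional


def resolve_ollama_timeout(env: Mapping[str, str] = os.environ,
                           default: float = 120.0) -> Optional[float]:
    """Translate OLLAMA_TIMEOUT_SECONDS into a timeout value for an HTTP client.

    Returns None (no limit) when the variable is 0 or negative, the parsed
    number of seconds when it is positive, and `default` (an assumed value)
    when the variable is unset or empty.
    """
    raw = env.get("OLLAMA_TIMEOUT_SECONDS")
    if raw is None or raw.strip() == "":
        return default
    seconds = float(raw)
    return seconds if seconds > 0 else None
```

HTTP clients such as httpx and requests treat a timeout of `None` as "wait indefinitely", so the return value can be passed straight through to the request call.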

Future Work

See future_improvements.md for a roadmap toward asynchronous conversions, richer progress feedback, and operational tooling.

Documentation

  • documentation.md — in-depth architecture and API details
  • server.md — deployment runbook for remote servers

For contributor guidelines and troubleshooting tips, see documentation.md.
