
andrewccadman/rag-python-react-3

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 

Repository files navigation

Python + React PDF RAG framework

This project is a minimal full-stack RAG app with recursive PDF ingestion, a React chatbot UI, OpenAI embeddings using text-embedding-ada-002, GPT-4o answers, clickable source citations, and an in-memory vector store.

Features

  1. Reads PDFs recursively from backend/data/pdfs and all subfolders.
  2. Provides a React chatbot built with Vite.
  3. Exposes FastAPI endpoints that call OpenAI embeddings and chat models by API key.
  4. Stores vector embeddings in memory using a small NumPy cosine-similarity vector database.
  5. Turns model citations like [1] into clickable links to the cited source PDF and page.
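The in-memory vector database from feature 4 can be sketched roughly like this. The class name InMemoryVectorStore matches the note at the end of this README, but the method names and internals below are assumptions for illustration, not the repository's actual code:

```python
import numpy as np

class InMemoryVectorStore:
    """Minimal cosine-similarity store; vectors live only for the process lifetime."""

    def __init__(self):
        self.vectors = []   # list of np.ndarray embeddings
        self.metadata = []  # parallel list of chunk metadata dicts

    def add(self, vector, meta):
        self.vectors.append(np.asarray(vector, dtype=np.float32))
        self.metadata.append(meta)

    def search(self, query, top_k=5):
        if not self.vectors:
            return []
        matrix = np.stack(self.vectors)
        q = np.asarray(query, dtype=np.float32)
        # cosine similarity = dot product divided by the product of L2 norms
        sims = (matrix @ q) / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(q) + 1e-10)
        best = np.argsort(sims)[::-1][:top_k]
        return [(self.metadata[i], float(sims[i])) for i in best]
```

Because everything lives in Python lists and NumPy arrays, a restart wipes the index, which is exactly the trade-off described in the note at the end of this README.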

Run backend

```shell
cd backend
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
export OPENAI_API_KEY="sk-your-key-here"
python run.py
```

The API runs at http://localhost:8000.

Run frontend

```shell
cd frontend
npm install
npm run dev
```

The UI runs at http://localhost:5173.

Add PDFs

Drop PDFs into backend/data/pdfs, including nested folders like backend/data/pdfs/policies or backend/data/pdfs/contracts/client-a. Then click Re-index PDFs in the UI or call:

```shell
curl -X POST "http://localhost:8000/ingest?reset=true"
```
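The recursive walk over nested folders can be done with pathlib's rglob; the data directory comes from this README, while find_pdfs is an illustrative helper name, not necessarily the function the backend uses:

```python
from pathlib import Path

def find_pdfs(root="backend/data/pdfs"):
    """Recursively collect PDFs from the root folder and all subfolders."""
    return sorted(p for p in Path(root).rglob("*.pdf"))
```

Sorting gives a stable ingestion order, which keeps chunk IDs reproducible across re-index runs.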

Main API endpoints

  • GET /health reports API status and the current in-memory vector count.
  • GET /pdfs lists PDFs found recursively.
  • GET /source?path=folder/file.pdf serves a source PDF. Page anchors such as #page=3 open the browser's PDF viewer at the cited page.
  • POST /upload uploads a PDF into data/pdfs/uploads.
  • POST /ingest?reset=true extracts, chunks, embeds, and stores vectors in memory.
  • POST /chat retrieves relevant chunks and answers with GPT-4o.
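The chunking step of /ingest can be illustrated with a simple overlapping character splitter; the size and overlap values here are assumptions and may differ from the backend's actual settings:

```python
def chunk_text(text, size=800, overlap=100):
    """Split extracted PDF text into overlapping chunks before embedding.
    Overlap preserves context that would otherwise be cut at chunk boundaries."""
    chunks = []
    step = size - overlap
    for start in range(0, max(len(text), 1), step):
        chunk = text[start:start + size]
        if chunk.strip():
            chunks.append(chunk)
    return chunks
```

Each chunk would then be embedded with text-embedding-ada-002 and stored alongside its source path and page number.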

How citation links work

The backend stores metadata.source_url for every chunk, for example /source?path=policies/handbook.pdf#page=7. The prompt asks GPT-4o to cite using bracket numbers like [1]. The React app detects [1], [2], etc. in assistant messages and converts them into links using the matching item from the returned sources array.
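The bracket-to-link conversion happens in the React app, but its logic can be sketched in Python; link_citations is an illustrative name, and it emits markdown-style links rather than the JSX the app actually renders:

```python
import re

def link_citations(text, sources):
    """Replace [n] markers in an assistant message with links to the
    matching entry of the returned sources array (1-indexed)."""
    def repl(m):
        n = int(m.group(1))
        if 1 <= n <= len(sources):
            return f"[[{n}]]({sources[n - 1]['source_url']})"
        return m.group(0)  # leave citation numbers with no matching source as-is
    return re.sub(r"\[(\d+)\]", repl, text)
```

Because source_url already carries the #page anchor, the rendered link opens the PDF at the cited page with no extra work in the UI.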

Important note

Because the vector store is in-memory, all embeddings are lost whenever the backend restarts, so PDFs must be re-ingested afterwards. This is intentional for this minimal setup. For production, swap InMemoryVectorStore for Qdrant, Postgres pgvector, Pinecone, Weaviate, Chroma, or Redis.
