A full-stack application that lets users upload PDF documents and ask natural-language questions about their content. It uses a hybrid Retrieval-Augmented Generation (RAG) approach that combines dense semantic search with sparse lexical (keyword) search, so that answers are grounded in the uploaded document rather than in the model's general knowledge.
- PDF Upload & Processing: Ingests, chunks, and embeds PDF content locally.
- Hybrid RAG Engine: Uses Hugging Face embeddings (dense) + BM25 (sparse) to fetch the most relevant context.
- Local Vector Storage: Uses ChromaDB to persist data locally without relying on paid cloud databases.
- OpenAI Integration: Synthesizes final answers securely using gpt-4o-mini (or your chosen model).
- Modern Frontend: Built with React for an intuitive chat interface.
- Backend: Python, Flask, LangChain, ChromaDB, Hugging Face Transformers
- Frontend: React.js, TailwindCSS (or Vanilla CSS)
- LLM Provider: OpenAI
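The Hybrid RAG Engine listed above has to merge two ranked result lists, one from the dense embedding search and one from BM25. This README does not say how the project merges them, so the sketch below uses reciprocal rank fusion (RRF), one common option, purely as an illustration. The function name, the chunk IDs, and k=60 are assumptions, not values taken from this codebase.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one ranking.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a conventional default, not a value
    taken from this project.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical chunk IDs as returned by each retriever (best match first).
dense_hits = ["c3", "c1", "c7"]   # e.g. from the embedding search
sparse_hits = ["c1", "c9", "c3"]  # e.g. from BM25

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)  # ['c1', 'c3', 'c9', 'c7']
```

Chunks that both retrievers return (c1 and c3 here) rise to the top, which is the main reason to combine semantic and keyword search.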
Navigate to the root directory and create a .env file with your API key:
OPENAI_API_KEY=your_openai_api_key_here
Install dependencies:
python -m venv .venv
source .venv/bin/activate # (On Windows use .venv\Scripts\Activate.ps1)
pip install flask flask-cors python-dotenv langchain langchain-openai langchain-community langchain-text-splitters chromadb sentence-transformers pypdf langchain-huggingface
Run the API Server:
python app.py
(The server will start on http://127.0.0.1:5000)
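If requests to OpenAI fail with authentication errors, it can help to confirm that the key from the .env file is actually visible. The app itself presumably loads it with python-dotenv (it is in the install list above). The sketch below is a separate, standard-library-only check, and the helper names read_env_file and has_openai_key are hypothetical.

```python
import os
from pathlib import Path


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_openai_key(path=".env"):
    """True if OPENAI_API_KEY is set in the environment or in the .env file."""
    key = os.environ.get("OPENAI_API_KEY") or read_env_file(path).get("OPENAI_API_KEY")
    # Treat the placeholder from the setup instructions as "not set".
    return bool(key) and key != "your_openai_api_key_here"


if __name__ == "__main__":
    print("OPENAI_API_KEY found" if has_openai_key() else "OPENAI_API_KEY missing")
```

Run it from the root directory, the same place where the .env file was created.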
Open a new terminal tab and navigate into the nested frontend folder:
cd frontend/frontend
Install Node dependencies and start the React server:
npm install
npm start
(The web application will open on http://localhost:3000)
- Open the React web interface in your browser.
- Select a PDF file from your computer using the Upload button.
- Wait for the success message, which confirms the document has been embedded and saved locally.
- Start asking questions directly related to the document's content!
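During the upload step above, the document is split into chunks before it is embedded. The project installs langchain-text-splitters, so it most likely uses a splitter from that package. The sketch below is only a plain-Python illustration of fixed-size chunking with overlap, and the function name and the default sizes are assumptions.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into windows of chunk_size characters.

    Consecutive chunks share `overlap` characters, so a sentence cut at a
    boundary still appears whole in at least one chunk when it is short.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))  # ['abcd', 'defg', 'ghij']
```

Smaller chunks tend to make keyword matches more precise, while larger chunks give the model more surrounding context per retrieved passage.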