WebDoc AI is a full-stack, enterprise-grade Retrieval-Augmented Generation (RAG) application. It allows users to upload various documents (PDFs, Word documents, text files) or provide web URLs, and instantly chat with an intelligent assistant to extract insights, summarize information, and answer complex questions based strictly on the provided context.
The application is built using a modern three-tier architecture to ensure security, modularity, and high performance.
Frontend (React)
- Tech: React 18, Vite, CSS (Glassmorphism UI), Lucide React, React Markdown.
- Role: The user-facing interface. It manages chat sessions, handles file selection, and renders AI responses as Markdown.
Proxy (Node.js)
- Tech: Express.js, Multer, Axios.
- Role: Acts as a secure middleware layer. It receives file uploads from the frontend, stores them temporarily with Multer, and formats the payloads before proxying them to the core Python engine.
Engine (FastAPI)
- Tech: FastAPI, LangChain, Pinecone (Vector DB), Google Gemini API, HuggingFace, FlashRank.
- Role: The "brain" of the application. It handles text extraction, chunking, vector embedding, semantic retrieval, and LLM generation.
Here is exactly what happens under the hood when you use WebDoc AI:
sequenceDiagram
actor User
participant Frontend as Frontend (React)
participant Proxy as Proxy (Node.js)
participant Engine as Engine (FastAPI)
participant Pinecone as Pinecone (Vector DB)
participant LLM as LLM (Gemini)
%% Document Upload Phase
Note over User, LLM: Phase 1: Document/URL Ingestion
User->>Frontend: Uploads Document / Enters URL
Frontend->>Proxy: POST /api/upload
Proxy->>Engine: Forwards Document
Engine->>Engine: Extract Text & Split into Chunks
Engine->>Engine: Generate Vector Embeddings
Engine->>Pinecone: Store Vectors in isolated Session Namespace
Pinecone-->>Engine: Acknowledge
Engine-->>Frontend: Return unique Session ID
%% Chat Phase
Note over User, LLM: Phase 2: Q&A Retrieval
User->>Frontend: Asks a Question
Frontend->>Proxy: POST /api/ask (Query + History + Session ID)
Proxy->>Engine: Forward Query
Engine->>LLM: Rewrite query based on Chat History
Engine->>Pinecone: Semantic Search for relevant Chunks
Pinecone-->>Engine: Return top matching context
Engine->>Engine: Re-rank results for maximum accuracy
Engine->>LLM: Generate answer using retrieved Context
LLM-->>Engine: Streaming / Final Answer
Engine-->>Frontend: Display Markdown Answer & Sources
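The Phase 2 request to /api/ask carries three things: the query, the chat history, and the session ID. The sketch below models that payload and validates it before it would be forwarded. The JSON field names (query, history, session_id), the role/content shape of a history turn, and the helper names are assumptions made for illustration; the application's actual schema may differ.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ChatTurn:
    role: str      # assumed values: "user" or "assistant"
    content: str


@dataclass
class AskRequest:
    # Field names are hypothetical; the diagram only says
    # "Query + History + Session ID".
    query: str
    session_id: str
    history: List[ChatTurn] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() also converts the nested ChatTurn objects to dicts.
        return json.dumps(asdict(self))


def parse_ask_request(raw: str) -> AskRequest:
    """Parse and validate a JSON body shaped like AskRequest."""
    data = json.loads(raw)
    query = data.get("query", "")
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query must be a non-empty string")
    if not data.get("session_id"):
        raise ValueError("session_id is required")
    history = [ChatTurn(**turn) for turn in data.get("history", [])]
    return AskRequest(query=query, session_id=data["session_id"], history=history)
```

Rejecting a request without a session ID early keeps a query from ever being run against the wrong (or no) namespace.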
- Ingestion: When a file or URL is uploaded, specialized Python loaders extract the raw text. The text is split into overlapping chunks to preserve context.
- Embedding & Storage: These chunks are converted into numerical representations (embeddings) and stored in a Pinecone Serverless database. Each upload is assigned a UUID that serves as its session ID and Pinecone namespace, which isolates data across chat sessions.
- Contextual Retrieval: When a user asks a question, the engine first uses the previous chat history to rewrite it into a standalone query, so that follow-up questions resolve correctly. It then runs a semantic search over that session's vectors in Pinecone and re-ranks the matches to find the chunks that best answer the question.
- Generation: An LLM (Large Language Model) reads the retrieved context and formulates a human-readable response. The answer is meant to be grounded only in the uploaded document or URL, and it is returned together with its sources.
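The ingestion, storage, and retrieval steps above can be sketched in plain Python. This is a toy stand-in, not the application's code: word counts replace the real embedding model, an in-memory dict replaces Pinecone, and the names split_into_chunks, embed, and InMemoryStore are hypothetical. It only illustrates overlapping chunks, one namespace per upload, and similarity search within a session.

```python
import math
import re
import uuid
from collections import Counter, defaultdict


def split_into_chunks(text, chunk_size=200, overlap=50):
    """Split text into character windows that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text):
    """Toy embedding: word counts (a real system calls an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class InMemoryStore:
    """Stand-in for a vector DB with one namespace per upload."""

    def __init__(self):
        self._namespaces = defaultdict(list)

    def ingest(self, text, chunk_size=200, overlap=50):
        session_id = str(uuid.uuid4())  # isolates this upload's chunks
        for chunk in split_into_chunks(text, chunk_size, overlap):
            self._namespaces[session_id].append((chunk, embed(chunk)))
        return session_id

    def search(self, session_id, query, top_k=3):
        q = embed(query)
        scored = [
            (cosine(q, vec), chunk)
            for chunk, vec in self._namespaces.get(session_id, [])
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [chunk for score, chunk in scored[:top_k] if score > 0]
```

A query issued with one session ID never sees chunks from another upload, which is the same isolation the per-session namespace provides. A re-ranking step would re-score the chunks returned by search before they are handed to the LLM.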
Since this is a three-tier app, you will need three separate terminal windows to run it.
- Rename .env.example to .env (or create a .env) in rag-engine-python and add your API keys:
PINECONE_API_KEY=your_key
GEMINI_API_KEY=your_key
# Add any other required keys (e.g. HuggingFace)
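Before starting the engine, it can help to confirm that the .env file actually contains real keys. The following is a minimal sketch, not part of the project: parse_env and missing_keys are hypothetical helpers, the parser handles only simple KEY=VALUE lines (no inline comments or multi-line values), and it treats the your_key placeholder as unset.

```python
REQUIRED_KEYS = ("PINECONE_API_KEY", "GEMINI_API_KEY")


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values, required=REQUIRED_KEYS, placeholder="your_key"):
    """Return required keys that are absent, empty, or still the placeholder."""
    return [
        key
        for key in required
        if not values.get(key) or values[key] == placeholder
    ]
```

To use it, read rag-engine-python/.env into a string, pass it through parse_env, and print the result of missing_keys; an empty list means both keys are filled in.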
cd rag-engine-python
python -m venv venv
# Windows: .\venv\Scripts\Activate.ps1
# Mac/Linux: source venv/bin/activate
pip install -r requirements.txt
python -m uvicorn main:app --reload  # Runs on port 8000
cd backend-node
npm install
node server.js  # Runs on port 5000
cd frontend-react
npm install
npm run dev  # Runs on port 5173
Navigate to http://localhost:5173 in your browser to start chatting with your documents!
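If the page does not load or uploads fail, a quick check of the three ports shows which tier is not running. The script below is a small troubleshooting sketch using only the standard library; the ports come from the steps above, while the function names and service labels are made up for illustration.

```python
import socket

# Ports as listed in the setup steps above.
SERVICES = {
    "engine (FastAPI)": 8000,
    "proxy (Node.js)": 5000,
    "frontend (Vite)": 5173,
}


def port_is_open(port, host="localhost", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection tries every address the host resolves to,
        # so this works whether the server bound to IPv4 or IPv6.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(services=SERVICES, host="localhost"):
    """Map each service name to whether its port is reachable."""
    return {name: port_is_open(port, host) for name, port in services.items()}
```

Calling check_services() returns a dict such as one entry per tier with True or False, so a False next to the proxy, for example, points at the second terminal window.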