PDF Summarizer AI is a full-stack AI-powered application that allows users to upload PDF documents and interact with them through intelligent summarization and question answering. It uses vector embeddings, a semantic search engine, and modern AI tooling to deliver accurate, context-aware responses based on document content.
- Upload any PDF and process its content
- Chunk text intelligently using Langchain
- Convert chunks to vector embeddings using OpenAI
- Store and query semantic vectors with Qdrant
- Ask questions and receive answers grounded in document context
- Seamless API and frontend integration
- Containerized deployment via Docker
- Frontend: React + TypeScript + Tailwind css + clerk authentication services
- Backend: Node.js with Express
- AI & Embedding: OpenAI (
text-embedding-3-small) , (gpt-3.5-turbo) - Chunking: Langchain
- Vector DB: Qdrant
- Containerization: Docker & Docker Compose
PDF content is often long and unstructured. Langchain helps split the text into semantically meaningful chunks while maintaining coherence and overlap for context retention. This ensures high-quality embedding and accurate downstream search.
Each text chunk is converted to a dense vector using OpenAI's text-embedding-3-small model. These vectors capture semantic meaning, enabling similarity search when a user asks a question about the document.
Qdrant stores the vector embeddings along with metadata (e.g., chunk ID, page number). During a query, the system retrieves the most relevant vectors by comparing the query embedding to those stored in the collection.
The backend is built with Express and handles:
- PDF upload and parsing (
pdf-parse) - Text chunking with Langchain
- Embedding and storage in Qdrant
- Query vector generation and retrieval
- Sending context to OpenAI’s GPT for answer generation
The app is containerized with Docker. Docker Compose runs:
- Qdrant (as a service)
- The Node.js backend
Frontend can be run separately or added to the Docker stack.
git clone https://github.com/your-username/pdf-summarizer-ai.git
cd PDFsummarizer- Configure Environment Variables Create a .env file inside the server folder:
OPENAI_API_KEY=your_openai_api_key
QDRANT_URL=http://qdrant:6333
- Run the Application Start backend and Qdrant with Docker:
docker-compose up --build
Start frontend manually:
cd client
npm install
npm run dev
.
├── client/ # React frontend
├── server/ # Express backend with Langchain, OpenAI, Qdrant
│ ├── index.js
│ ├── worker.js
│ ├── routes/
│ ├── utils/
│ └── .env
├── docker-compose.yml
└── README.md
Demo video : https://youtu.be/NhSFq8AqdV8
The video includes:
PDF upload
Chunking and embedding preview
Real-time Q&A on uploaded content
Backend logs showing vector interactions
Students & Researchers: Summarize academic papers and books
Legal Professionals: Analyze contracts and case files
Knowledge Workers: Review whitepapers, reports, and manuals efficiently