GitHub - simranduggal75/Document-QA-Assistant: An end-to-end NLP-based web application that allows users to upload documents (PDF/TXT) and ask questions based on their content through an interactive UI.

📄 Document Q&A Assistant

An end-to-end NLP-based web application that allows users to upload documents (PDF/TXT) and ask questions based on their content through an interactive UI.

Live Demo

🔗 https://document-qa-assistant.onrender.com

Live Demo Note

The application is deployed on Render's free tier.
If inactive, the server may take 30–60 seconds to wake up on the first request.

🚀 Features

📤 Upload multiple documents (PDF, TXT)
📄 Automatic text extraction
✂️ Text chunking for efficient processing
🔍 Keyword-based search (information retrieval)
🤖 Answer extraction from relevant content
🌐 Interactive web interface (no Postman required)
⚡ AJAX-based upload and query (no page reload)

🧠 How It Works

Upload Documents: Files are uploaded and stored locally.
Text Extraction: Extracts content from PDF files using PyPDF2; TXT files are read as plain text.
Chunking: Splits large text into smaller chunks for better retrieval.
Search (Retrieval): Matches the user query against chunks using keyword overlap.
Answer Extraction: Selects the most relevant sentence from the best-matching chunk as the final answer.
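
The chunking, retrieval, and answer-extraction steps above can be sketched in a few lines of standard-library Python. This is a minimal illustration and not the code in app.py: the function names, the 50-word chunk size, and the regex-based tokenizer and sentence splitter are all assumptions made for the example.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def chunk_text(text, chunk_size=50):
    """Split text into chunks of at most chunk_size words (assumed size, no overlap)."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

def best_chunk(query, chunks):
    """Return the chunk sharing the most tokens with the query, or None if nothing matches."""
    q = tokenize(query)
    scored = [(len(q & tokenize(c)), c) for c in chunks]
    score, chunk = max(scored, key=lambda pair: pair[0], default=(0, None))
    return chunk if score > 0 else None

def best_sentence(query, chunk):
    """Return the sentence in the chunk with the highest token overlap with the query."""
    q = tokenize(query)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", chunk) if s.strip()]
    return max(sentences, key=lambda s: len(q & tokenize(s)))

def answer(query, text, chunk_size=50):
    """Run chunking, retrieval, and sentence selection; None means no keyword matched."""
    chunk = best_chunk(query, chunk_text(text, chunk_size))
    return best_sentence(query, chunk) if chunk else None

if __name__ == "__main__":
    doc = (
        "Java is an object-oriented language. "
        "Access modifiers are keywords that set the visibility of classes and members. "
        "Python uses indentation for blocks."
    )
    print(answer("What are access modifiers?", doc))
```

Note that this sketch counts every shared word, including common ones such as "what" and "are"; filtering stop words would be a natural refinement.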

🛠️ Tech Stack

Python
Flask
HTML / CSS
JavaScript (Fetch API)
PyPDF2
Basic NLP (tokenization, keyword matching)

🔌 API Endpoints

1. Upload Documents

POST "/upload"

Upload one or more PDF/TXT files
Files are processed and stored locally

Request Type: "multipart/form-data"

Form Data:

"files": one or multiple files

Response:

Uploaded: file1.pdf, file2.txt
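
For calling this endpoint from a script rather than the web UI, the following is a rough standard-library sketch of a client. The helper names `encode_multipart` and `upload` are hypothetical, and the 90-second timeout is an assumption chosen to allow for the free-tier wake-up delay mentioned earlier. Each file is sent under the same "files" field, as described above.

```python
import mimetypes
import urllib.request
import uuid

def encode_multipart(files, field_name="files"):
    """Encode (filename, bytes) pairs as a multipart/form-data body.

    Returns (body, content_type). Every part reuses the same field name,
    which is how several files are sent under one form field.
    """
    boundary = uuid.uuid4().hex
    parts = []
    for filename, content in files:
        mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
            f"Content-Type: {mime}\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"

def upload(base_url, files, timeout=90):
    """POST the files to <base_url>/upload and return the response text."""
    body, content_type = encode_multipart(files)
    request = urllib.request.Request(
        base_url.rstrip("/") + "/upload",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

A call might look like `upload("https://document-qa-assistant.onrender.com", [("notes.txt", b"some text")])`.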

2. Ask Question

POST "/ask"

Takes a user query and returns an answer based on uploaded documents

Request Body (JSON):

{ "query": "What are access modifiers?" }

Response (JSON):

{ "answer": "Access modifiers are keywords...", "source": "java1st.pdf" }
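
A small client for this endpoint could look like the sketch below, using only the standard library. The function names are hypothetical, and the 90-second default timeout is an assumption to cover a cold start on the free tier; the request and response fields follow the JSON shapes shown above.

```python
import json
import urllib.request

def build_ask_request(base_url, query):
    """Build a POST request for <base_url>/ask with a JSON body {"query": ...}."""
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/ask",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def parse_answer(raw):
    """Return (answer, source) from the JSON response text; missing keys become None."""
    data = json.loads(raw)
    return data.get("answer"), data.get("source")

def ask(base_url, query, timeout=90):
    """Send the query and return the parsed (answer, source) pair."""
    request = build_ask_request(base_url, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_answer(response.read().decode("utf-8"))
```

Documents need to be uploaded first; otherwise there is nothing for the server to search.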

⚠️ Limitations

Uses keyword-based search, so it matches words rather than meaning (no semantic understanding)
May return approximate results for vague queries
Data is stored in memory and resets when the server restarts
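
To make the first limitation concrete, here is a tiny illustration of how a keyword-overlap score misses a paraphrased query. The `overlap` function and the sample passage are assumptions for the example, not the project's scoring code.

```python
import re

def overlap(query, passage):
    """Count distinct lowercase word tokens shared by the query and the passage."""
    def tokens(text):
        return set(re.findall(r"[a-z0-9]+", text.lower()))
    return len(tokens(query) & tokens(passage))

passage = "The automobile needs regular maintenance."
print(overlap("automobile maintenance", passage))  # both words appear in the passage
print(overlap("car servicing", passage))           # same meaning, but no shared words
```

The second query scores zero even though it asks about the same thing, which is the gap that embedding-based semantic search is meant to close.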

🔮 Future Improvements

Semantic search using embeddings
Integration with LLMs (GPT / Gemini)
Persistent storage (database / vector DB)
Improved UI/UX

💡 Key Learning

Built a complete pipeline from document ingestion → text processing → retrieval → answer generation, similar to real-world RAG (Retrieval-Augmented Generation) systems.

👩‍💻 Author

Simran Duggal

AI/ML Engineer

⭐ If you found this useful

Give this repo a ⭐ on GitHub!

📸 Output Screenshot

Please refer to the Screenshots folder for output screenshots.
