AI-powered mortgage document extraction and intelligent policy Q&A system with RAG (Retrieval-Augmented Generation).
- **Smart Document Upload** - Multi-file drag & drop with automatic processing
- **Intelligent Text Extraction** - OCR for images, PDF parsing, key-value pair extraction
- **Unified AI Assistant** - Query uploaded documents & official mortgage policies
- **Policy Knowledge Base** - Pre-indexed with Fannie Mae, FHA, USDA, and Freddie Mac guidelines (14,383+ chunks)
- **Text-to-Speech** - Optional voice responses with ElevenLabs AI
- **Web Search Fallback** - Live mortgage data when documents don't contain the answer
- **Smart Document Routing** - Single-page: key-value extraction | Multi-page: RAG indexing
| Technology | Purpose |
|---|---|
| FastAPI | High-performance Python web framework |
| Uvicorn | ASGI server |
| Python 3.9+ | Runtime |
| Technology | Purpose |
|---|---|
| Google Gemini 2.0 Flash | Large language model for intelligent responses |
| Sentence Transformers | Text embeddings (all-MiniLM-L6-v2) |
| ChromaDB | Vector database for document similarity search |
| LangChain | Text splitting and chunking utilities |
| ElevenLabs | Text-to-speech AI (multilingual) |
| SerpAPI | Real-time web search integration |
| Technology | Purpose |
|---|---|
| Google Document AI | Advanced OCR & entity extraction (optional) |
| PyPDF2 / pdfplumber | PDF text extraction |
| Pytesseract | OCR for images |
| Pillow | Image processing |
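A minimal sketch of how this extraction stack is typically combined — digital PDFs through pdfplumber, images through Tesseract via Pillow. This is illustrative only; the project's actual logic lives in `document_ai_service.py` and may differ:

```python
# Illustrative only: "PDF text first, OCR otherwise" extraction.
import pdfplumber
import pytesseract
from PIL import Image

def extract_text(path: str) -> str:
    """Return raw text from a PDF (pdfplumber) or an image (Tesseract OCR)."""
    if path.lower().endswith(".pdf"):
        with pdfplumber.open(path) as pdf:
            return "\n".join(page.extract_text() or "" for page in pdf.pages)
    return pytesseract.image_to_string(Image.open(path))
```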
| Technology | Purpose |
|---|---|
| React 18 | UI framework |
| TypeScript | Type-safe JavaScript |
| Tailwind CSS | Utility-first styling |
| React Router | Client-side routing |
# Python 3.9+
python3 --version
# Node.js 16+
node --version
# Tesseract OCR (for image text extraction)
brew install tesseract # macOS
# or: sudo apt-get install tesseract-ocr  # Linux

cd backend
# Install Python dependencies
pip install -r requirements.txt
# Configure environment variables
cp .env.example .env
# Edit .env with your API keys:
# - GEMINI_API_KEY
# - SERPAPI_API_KEY
# - ELEVENLABS_API_KEY (optional, for TTS)
# Start backend server
python3 -m uvicorn main:app --reload --host 0.0.0.0 --port 8000

cd frontend
# Install dependencies
npm install
# Build production version
npm run build
# Serve static build
npx serve -s build -l 3000
# Or run development server
npm start

- GEMINI_API_KEY - Get from Google AI Studio
- SERPAPI_API_KEY - Get from SerpAPI
- ELEVENLABS_API_KEY - Get from ElevenLabs (for TTS)
- DOCAI_PROJECT_ID - Google Cloud Document AI (optional; if unset, the system falls back to free local OCR)
Add these to backend/.env:
GEMINI_API_KEY="your_key_here"
SERPAPI_API_KEY="your_key_here"
ELEVENLABS_API_KEY="your_key_here"  # Optional

┌──────────────────────────────────────────────────────┐
│      Frontend (React + TypeScript + Tailwind)        │
│                http://localhost:3000                 │
└───────────────────────────┬──────────────────────────┘
                            │
                            │ HTTP/REST API
┌───────────────────────────┴──────────────────────────┐
│              Backend (FastAPI + Python)               │
│                http://localhost:8000                  │
├───────────────────────────────────────────────────────┤
│   ┌─────────────────┐    ┌─────────────────┐          │
│   │   Document AI   │    │   Mortgage KB   │          │
│   │ OCR + Extraction│    │   RAG System    │          │
│   └─────────────────┘    └─────────────────┘          │
│                                                        │
│   ┌──────────────────────────────────────┐            │
│   │       ChromaDB Vector Database       │            │
│   │  • Policy Collection: 14,383 chunks  │            │
│   │  • User Collection: Dynamic          │            │
│   └──────────────────────────────────────┘            │
│                                                        │
│   ┌──────────┐   ┌──────────┐   ┌──────────┐          │
│   │  Gemini  │   │ SerpAPI  │   │ElevenLabs│          │
│   │    AI    │   │  Search  │   │   TTS    │          │
│   └──────────┘   └──────────┘   └──────────┘          │
└───────────────────────────────────────────────────────┘
         Upload File
              │
              ▼
     Page Count Detection
              │
     ┌────────┴─────────┐
     │                  │
  1 Page            2+ Pages
     │                  │
 Key-Value          Key-Value +
 Extraction         RAG Indexing
     │                  │
  Display        Queryable in Chat
POST /api/documents/upload Upload files
POST /api/documents/{id}/process Process & extract
GET /api/documents List all documents
GET /api/documents/{id} Get document details
DELETE /api/documents/{id} Delete document
POST /api/mortgage-kb/query Query unified KB (ChromaDB → Web)
GET /api/mortgage-kb/stats Get KB statistics
POST /api/mortgage-kb/tts Text-to-speech conversion
POST /api/rag/query RAG query with web search
GET /api/rag/stats RAG statistics
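For example, the unified KB endpoint can be exercised directly from Python once the backend is running. The payload field name below is illustrative — the actual request schema served by `routers.py` is documented at http://localhost:8000/docs:

```python
import requests

# Hypothetical payload shape; check /docs for the real schema.
resp = requests.post(
    "http://localhost:8000/api/mortgage-kb/query",
    json={"question": "What are Fannie Mae debt-to-income requirements?"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())  # answer text plus source citations
```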
HackUTA/
├── backend/
│   ├── main.py                    # FastAPI application
│   ├── routers.py                 # API endpoints
│   ├── mortgage_kb_service.py     # Unified RAG + TTS
│   ├── document_ai_service.py     # OCR & extraction
│   ├── rag_service.py             # RAG with web search
│   ├── storage_service.py         # Google Cloud Storage
│   ├── audio_cache/               # TTS audio files
│   ├── uploads/                   # User uploaded files
│   ├── chroma_storage/            # Vector DB
│   └── RAG/
│       └── documents/             # Policy PDFs (14K+ chunks)
│
├── frontend/
│   ├── src/
│   │   ├── components/
│   │   │   ├── DocumentUpload.tsx
│   │   │   ├── MortgageKnowledgeBase.tsx   # Unified chat
│   │   │   ├── ExtractedDataView.tsx
│   │   │   └── Header.tsx
│   │   ├── pages/
│   │   │   └── Dashboard.tsx
│   │   └── config/
│   │       └── constants.ts
│   └── build/                     # Production build
│
├── requirements.txt               # Python dependencies
└── README.md                      # This file
- User uploads PDFs, images, or text files
- System detects page count automatically
- Single-page: Extract key-values (invoices, IDs)
- Multi-page: Extract + add to RAG for Q&A
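A rough sketch of that routing decision, assuming page counts come from pdfplumber; the real pipeline in `document_ai_service.py` / `mortgage_kb_service.py` may differ:

```python
import pdfplumber

def route_document(path: str) -> str:
    """Pick a processing path based on page count (illustrative only)."""
    pages = 1
    if path.lower().endswith(".pdf"):
        with pdfplumber.open(path) as pdf:
            pages = len(pdf.pages)
    if pages <= 1:
        return "key_value_extraction"             # invoices, IDs, single forms
    return "key_value_extraction + rag_indexing"  # long agreements become queryable
```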
Priority Order (see the sketch below):
- 🥇 ChromaDB - Search user documents + policy documents first
- 🥈 Web Search - Only if no relevant docs are found in ChromaDB
- 🥉 Gemini AI - Generate intelligent answers from the retrieved context
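A sketch of that fallback chain under stated assumptions: `collection` is a ChromaDB collection, `model` a Gemini `GenerativeModel`, `web_search` a SerpAPI wrapper, and the relevance threshold is a guess — the real logic lives in `mortgage_kb_service.py` and `rag_service.py`:

```python
def answer(question: str, collection, model, web_search) -> str:
    """ChromaDB first, web search only on a miss, Gemini composes the answer."""
    hits = collection.query(query_texts=[question], n_results=5)
    docs, dists = hits["documents"][0], hits["distances"][0]
    relevant = [d for d, dist in zip(docs, dists) if dist < 0.8]  # threshold is a guess
    context = "\n\n".join(relevant) if relevant else web_search(question)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return model.generate_content(prompt).text
```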
- Toggle TTS on/off in UI
- Responses read aloud with professional voice
- Cached for instant replay (saves API calls)
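The caching idea is simple: key each audio file by a hash of the response text so repeated answers never hit the ElevenLabs API twice. A sketch, where the file layout and the `synthesize` helper are assumptions (see `mortgage_kb_service.py` for the real code):

```python
import hashlib
from pathlib import Path

CACHE_DIR = Path("backend/audio_cache")

def tts_cached(text: str, synthesize) -> Path:
    """Return an MP3 for `text`, calling ElevenLabs (via `synthesize`) only on a miss."""
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    path = CACHE_DIR / f"{hashlib.sha256(text.encode()).hexdigest()}.mp3"
    if not path.exists():                   # cache miss: one paid API call
        path.write_bytes(synthesize(text))  # `synthesize` wraps the ElevenLabs SDK
    return path                             # cache hit: served in < 50 ms
```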
- ✅ Upload client loan applications
- ✅ Extract key information automatically
- ✅ Query multi-page loan agreements
- ✅ Compare against policy guidelines
- ✅ Understand document requirements
- ✅ Get mortgage process guidance
- ✅ Ask policy-related questions
- ✅ Organize loan documents
# Visit http://localhost:3000
# Upload a PDF or image
# Watch automatic processing
# View extracted key-value pairs

# In the Document & Policy Assistant:
# Ask: "What are Fannie Mae debt-to-income requirements?"
# Get answer with policy document citations

# Enable TTS toggle (purple checkbox)
# Ask any question
# Hear response read aloud

- Single-page extraction: < 2 seconds
- Multi-page RAG indexing: ~1 second per page
- OCR fallback: 2-5 seconds (images)
- ChromaDB search: < 500ms
- Gemini generation: 1-3 seconds
- Cached TTS: < 50ms (instant!)
- New TTS: 2-4 seconds
- 14,383+ policy document chunks indexed
- Semantic search across 500+ pages
- Sub-second retrieval
- ✅ CORS configured for local development
- ✅ API keys stored in .env (never committed)
- ✅ File uploads validated and sanitized
- ✅ UUID-based file naming prevents collisions (see the sketch below)

⚠️ For production: Add authentication, rate limiting, and HTTPS
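A sketch of the collision-proof naming idea; the allowed extensions and path are assumptions, and `storage_service.py` holds the project's actual handling:

```python
import uuid
from pathlib import Path

ALLOWED = {".pdf", ".png", ".jpg", ".jpeg", ".txt"}

def safe_upload_path(original_name: str, upload_dir: str = "backend/uploads") -> Path:
    """Validate the extension and discard the user-supplied name entirely."""
    ext = Path(original_name).suffix.lower()
    if ext not in ALLOWED:
        raise ValueError(f"Unsupported file type: {ext}")
    # A UUID name means duplicate uploads can't collide and crafted
    # filenames can't traverse outside the upload directory.
    return Path(upload_dir) / f"{uuid.uuid4()}{ext}"
```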
# Check Python version
python3 --version # Must be 3.9+
# Install dependencies
pip install -r requirements.txt
# Check API keys
grep -E "GEMINI|SERP" backend/.env

# Verify ElevenLabs key is set
grep ELEVENLABS backend/.env
# Check backend logs
tail -20 backend/backend.log | grep -i elevenlabs

- Single-page docs are NOT queryable (by design)
- Only multi-page PDFs are added to RAG
- Check processing logs for "added to RAG" message
google-generativeai # Gemini AI
chromadb # Vector database
sentence-transformers # Embeddings
elevenlabs # Text-to-speech
google-search-results # SerpAPI
fastapi # Web framework
pdfplumber # PDF processing
pytesseract # OCR
react # UI framework
typescript # Type safety
tailwindcss # Styling
react-router-dom # Routing
- Document upload with drag & drop
- Unified AI assistant interface
- TTS toggle for voice responses
- Document list with status tracking
- Color-coded sources (blue = user docs, green = policies)
- Real-time query with source citations
- Suggested questions
- Auto-play audio responses
This project was built for HackUTA. Key features:
- Clean, modular architecture
- Comprehensive error handling
- Persistent vector storage
- Intelligent fallback systems
- Production-ready codebase
MIT License - Built for HackUTA 2025
Technologies Used:
- Google Gemini AI
- ElevenLabs TTS
- ChromaDB Vector Database
- Sentence Transformers
- SerpAPI
- FastAPI
- React + TypeScript
Built with ❤️ at HackUTA
# 1. Clone and install
cd backend && pip install -r requirements.txt
cd frontend && npm install
# 2. Configure API keys in backend/.env
GEMINI_API_KEY="your_key"
SERPAPI_API_KEY="your_key"
ELEVENLABS_API_KEY="your_key" # Optional
# 3. Start services
cd backend && python3 -m uvicorn main:app --reload --host 0.0.0.0 --port 8000 &
cd frontend && npx serve -s build -l 3000 &
# 4. Open browser
open http://localhost:3000

You're ready to go!
For detailed documentation, see:
- System architecture & API docs: http://localhost:8000/docs
- Frontend: http://localhost:3000