Hybrid RAG Pipeline, Semantic Search, and Multi-Document Intelligence.
As document collections grow, extracting precise answers from large PDF archives becomes critical. InvenioAI is a high-performance Document Q&A system that implements a state-of-the-art Hybrid RAG pipeline.
It transforms static PDF documents into a searchable, intelligent knowledge base, allowing users to ask complex questions and receive answers grounded in retrieved context with verifiable source citations.
- Hugging Face Space: https://felixhrdyn-invenioai.hf.space
- Hybrid RAG Pipeline: Combines dense semantic retrieval (MultiQuery + MMR) with lexical BM25 search, fused via weighted Reciprocal Rank Fusion (RRF).
- Advanced Reranking: Utilizes Cross-Encoder models to re-evaluate top candidates, ensuring the most relevant context is provided to the LLM.
- Async Job Orchestration: Background indexing and query execution with real-time status polling for a smooth user experience.
- Deep Analytics Dashboard: Built-in metrics tracking for retrieval accuracy (nDCG, HitRate), latency, and API usage.
- Cloud-Ready Architecture: Ships with an all-in-one Docker configuration optimized for Hugging Face Spaces and Azure Container Apps.
- Flexible UI: Premium Streamlit interface featuring a custom design system, glassmorphism aesthetics, and interactive chat history.
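The weighted Reciprocal Rank Fusion step mentioned above can be sketched in a few lines; the function name, weights, and `k` constant here are illustrative assumptions, not the project's actual code.

```python
def weighted_rrf(dense_ranked, lexical_ranked, w_dense=0.6, w_lexical=0.4, k=60):
    """Fuse two ranked lists of document IDs via weighted Reciprocal Rank Fusion.

    Each document scores weight / (k + rank) per list it appears in; the fused
    ranking sorts by the summed score, so documents ranked well by both
    retrievers rise to the top.
    """
    scores = {}
    for weight, ranking in ((w_dense, dense_ranked), (w_lexical, lexical_ranked)):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

fused = weighted_rrf(["a", "b", "c"], ["b", "d", "a"])
```

Note that "b" outranks "a" after fusion even though "a" wins the dense list alone: appearing near the top of both lists beats a single first place.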
**Backend**
- Framework: FastAPI
- RAG Engine: LangChain
- Models: Google Gemini 3.1 Flash Lite Preview, all-MiniLM-L6-v2 (Local Embedding)
- Reranker: Cross-Encoder (MS-MARCO MiniLM)
- Search: BM25 (Lexical) + Qdrant (Dense)
**Frontend**
- Framework: Streamlit
- Visualization: Plotly, Pandas
- Styling: Vanilla CSS (Custom Design System)
- Icons: Lucide (SVG)
**Infrastructure**
- Vector Database: Qdrant (Local / Server / Cloud)
- Deployment: Docker, GitHub Actions (CI/CD)
- Environment: Python 3.10+
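The dense retriever's MMR (Maximal Marginal Relevance) selection trades query relevance against redundancy among already-selected chunks. A minimal sketch with made-up vectors and a hypothetical `lam` trade-off parameter, not the project's implementation:

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mmr(query_vec, doc_vecs, k=2, lam=0.5):
    """Greedily pick k docs maximizing lam*relevance - (1-lam)*redundancy."""
    selected, candidates = [], list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max((cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                             default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With a low `lam`, the second pick favors a dissimilar document over a slightly more relevant near-duplicate; with `lam=1.0` the selection degenerates to pure top-k relevance.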
```mermaid
graph TD
    subgraph Data_Layer [Ingestion Layer]
        PDF[PDF Documents] -->|Upload| API[FastAPI Backend]
        API -->|Chunking| Split[Text Splitter]
    end

    subgraph Intelligence_Layer [Processing & RAG]
        Split -->|Dense| QDR[Qdrant Vector DB]
        Split -->|Lexical| BM25[BM25 Index]
        API -->|Query| RAG[Hybrid Retriever]
        RAG -->|RRF Fusion| Fuse[Candidate Fusion]
        Fuse -->|Reranking| Rerank[Cross-Encoder]
        Rerank -->|Context| LLM[Gemini LLM]
    end

    subgraph Presentation_Layer [UI & Analytics]
        UI[Streamlit Dashboard] -->|REST API| API
        LLM -->|Answer| UI
        API -->|Log| Metrics[Local Metrics Store]
        Metrics -->|Visualize| Dashboard[Analytics Page]
    end
```
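The ingestion path above feeds a text splitter whose overlapping chunks populate both the dense and lexical indexes. A minimal character-based sketch; the chunk size and overlap values are illustrative, not the project's settings:

```python
def split_text(text, chunk_size=500, overlap=50):
    """Split text into fixed-size character chunks that overlap by `overlap`.

    The overlap keeps sentences that straddle a boundary retrievable from
    at least one chunk.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

In practice a recursive splitter that prefers paragraph and sentence boundaries gives cleaner chunks, but the overlap mechanics are the same.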
InvenioAI is optimized for speed and retrieval precision while maintaining low operational costs.
| Parameter | Value | Description |
|---|---|---|
| Retrieval Mode | Hybrid | Dense (MMR) + Lexical (BM25) |
| Fusion Limit | Top 20 | Candidates kept after RRF fusion |
| QA Latency | ~10-15s | Average end-to-end response time |
| Indexing Speed | ~32 chunks/batch | Optimized for memory-constrained runtimes |
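The dashboard's retrieval-accuracy metrics (nDCG, HitRate) can be computed per query from the retrieved ranking and a relevance set. A minimal sketch assuming binary relevance; these helpers are illustrative, not the project's metrics code:

```python
import math

def hit_rate(retrieved, relevant, k=10):
    """1.0 if any relevant doc appears in the top-k results, else 0.0."""
    return 1.0 if any(doc in relevant for doc in retrieved[:k]) else 0.0

def ndcg(retrieved, relevant, k=10):
    """Binary-relevance nDCG@k: log-discounted gain over the ideal ordering."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, doc in enumerate(retrieved[:k]) if doc in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0
```

Averaging these over logged queries yields the dashboard-level numbers.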
- Python 3.10+
- Google Gemini API Key
- Qdrant Instance (Optional, defaults to local storage)
Step 1: Environment Setup

```bash
python -m venv venv
source venv/bin/activate  # venv\Scripts\activate on Windows
pip install -r requirements.txt
cp .env.example .env
```

Step 2: Run Application

```bash
# Terminal 1: Backend API
uvicorn app.main:app --reload

# Terminal 2: Streamlit UI
streamlit run frontend/streamlit_app.py
```

Step 3: Docker (Production)

```bash
docker build -t invenioai .
docker run -p 7860:7860 invenioai
```

The application is configured via `.env`. Key variables include:
- `GEMINI_API_KEY`: Required for the LLM and query rewriting.
- `QDRANT_URL`: Optional server URL (defaults to local `./qdrant_storage`).
- `INVENIOAI_ENABLE_HYBRID_SEARCH`: Toggle dense+lexical mode (default: `1`).
- `INVENIOAI_DELETE_UPLOADED_PDFS`: Clean up storage after indexing (default: `0`).
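Clients consume the async job orchestration described in the features list by polling a status endpoint until the job reaches a terminal state. A generic polling loop; the response shape and any endpoint path it would wrap are assumptions for illustration:

```python
import time

def poll_job(fetch_status, interval=1.0, timeout=60.0):
    """Call fetch_status() until it reports a terminal state or timeout elapses.

    fetch_status is any callable returning a dict like {"status": ..., ...};
    in practice it would wrap an HTTP GET against a job-status endpoint
    (hypothetical path, e.g. /jobs/{job_id}).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = fetch_status()
        if job["status"] in ("completed", "failed"):
            return job
        time.sleep(interval)
    raise TimeoutError("job did not finish in time")
```

Injecting `fetch_status` as a callable keeps the loop testable without a running backend.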
Felix Hardyan