---
title: FinRAG
emoji: 📊
colorFrom: blue
colorTo: gray
sdk: docker
app_port: 7860
pinned: false
---
Production-style retrieval-augmented generation system for SEC 10-K analysis.
FinRAG ingests annual filings, parses them into structured markdown, builds a hybrid retrieval index and serves grounded answers through a FastAPI backend and React frontend. The project is designed to answer filing-specific questions with traceable source passages rather than free-form financial commentary.
- Ingests 10-K PDF filings with LlamaCloud parsing.
- Normalizes parsed markdown into section-aware chunks with metadata.
- Stores embeddings in a persistent Chroma collection.
- Combines dense retrieval, BM25 sparse retrieval, RRF fusion, and cross-encoder reranking.
- Serves answers with retrieved source passages through an API and web UI.
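Each chunk carries section-aware metadata so answers can be traced back to the filing. A minimal sketch of what one chunk record might look like (the field names are illustrative, not the exact schema in `rag/ingestion.py`):

```python
# Illustrative chunk record; the real schema lives in rag/ingestion.py and may differ.
chunk = {
    "doc_id": "tesla-2025-10k",            # assumed document identifier
    "source": "Tesla 2025 10-K",
    "section": "Item 1A. Risk Factors",    # filing header detected during normalization
    "page": 14,                            # page from the LlamaCloud parse
    "text": "We face significant competition in ...",
}
```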
```
PDF upload / local filing
   |
   v
LlamaCloud parse -> page markdown -> normalized sections -> rechunked chunks
   |
   v
Embeddings (BAAI/bge-large-en-v1.5)
   |
   v
Chroma vector store + BM25 index
   |
   v
RRF fusion -> cross-encoder reranker
   |
   v
LLM answer generation with source passages
   |
   v
FastAPI API + React frontend
```
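Chunks are embedded with `BAAI/bge-large-en-v1.5`. A minimal sketch using `sentence-transformers` (the project wraps the model in `rag/embeddings.py`, possibly through a different interface):

```python
from sentence_transformers import SentenceTransformer

# Assumes the sentence-transformers package; rag/embeddings.py may load the model differently.
model = SentenceTransformer("BAAI/bge-large-en-v1.5")

chunks = [
    "Item 1A. Risk Factors: We face significant competition ...",
    "Item 7. Management's Discussion and Analysis ...",
]
vectors = model.encode(chunks, normalize_embeddings=True)  # bge models are typically used with normalized vectors
print(vectors.shape)  # (2, 1024): bge-large-en-v1.5 produces 1024-dimensional embeddings
```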
FinRAG uses a multi-stage retrieval stack rather than a single vector search.
- Parse the 10-K into markdown with page-level structure.
- Split by filing headers, then rechunk long sections while preserving tables.
- Embed chunks with `BAAI/bge-large-en-v1.5`.
- Retrieve candidates from:
  - Chroma dense similarity search
  - BM25 sparse keyword search
- Merge candidate lists with Reciprocal Rank Fusion (sketched below).
- Rerank the merged shortlist with `BAAI/bge-reranker-base`.
- Generate the final answer from retrieved evidence only.
This design improves recall on both semantic and keyword-heavy questions, which matters for filings containing exact legal phrasing, section titles, and numeric disclosures.
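A minimal sketch of the Reciprocal Rank Fusion step over the dense and sparse candidate lists (k=60 is the common default; `rag/retriever.py` may weight or filter candidates differently):

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge ranked lists of chunk IDs (best first) into one fused ranking."""
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] += 1.0 / (k + rank)  # lower-ranked hits contribute less
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse Chroma (dense) and BM25 (sparse) candidates.
dense_hits = ["c12", "c07", "c31", "c02"]
sparse_hits = ["c07", "c44", "c12"]
print(reciprocal_rank_fusion([dense_hits, sparse_hits]))
# Chunks found by both retrievers (c07, c12) rise to the top of the fused list.
```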
- Backend: FastAPI, Python
- Retrieval: ChromaDB, BM25, RRF fusion, cross-encoder reranking
- Embeddings: `BAAI/bge-large-en-v1.5`
- Parsing: LlamaCloud
- LLM serving: Groq via `langchain_groq`
- Frontend: React + Vite
- Packaging: Docker
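A minimal sketch of answer generation with Groq via `langchain_groq` (the model name and prompt are assumptions for illustration; the real prompting and structured output live in `rag/pipeline.py`):

```python
from langchain_groq import ChatGroq  # reads GROQ_API_KEY from the environment

# Model name is an assumption; the configured model lives in config/settings.py.
llm = ChatGroq(model="llama-3.1-8b-instant", temperature=0)

context = "Item 1A. Risk Factors: We face significant competition ..."
question = "What are the main risk factors disclosed in this filing?"
prompt = (
    "Answer the question using only the excerpts below.\n\n"
    f"Excerpts:\n{context}\n\nQuestion: {question}"
)
print(llm.invoke(prompt).content)
```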
```
FinRAG/
├─ app/
│  ├─ app.py           # FastAPI application
│  └─ main.py          # CLI entrypoint for ingestion and query
├─ config/
│  └─ settings.py      # Central configuration
├─ rag/
│  ├─ ingestion.py     # Parse, normalize, rechunk, persist artifacts
│  ├─ embeddings.py    # Embedding model wrapper
│  ├─ retriever.py     # Dense + BM25 + RRF + reranker
│  ├─ pipeline.py      # Prompting and structured query output
│  └─ vectorstore.py   # Chroma collection management
├─ frontend/
│  └─ src/             # React client
├─ data/
│  ├─ raw/             # Uploaded / local 10-K PDFs
│  ├─ processed/       # Saved parse artifacts and chunk manifests
│  └─ vectorstore/     # Persistent Chroma data
├─ Dockerfile
└─ requirements.txt
```
- Python 3.10+
- Node.js 18+
- Groq API key
- LlamaCloud API key
Create a `.env` file:

```
GROQ_API_KEY=...
LLAMA_CLOUD_API_KEY=...
```

Set up and install the backend:

```bash
cd FinRAG
python -m venv .venv
.venv\Scripts\activate
python -m pip install --upgrade pip
pip install -r requirements.txt
```

Run ingestion:

```bash
python -m app.main --ingest
```

This will:
- parse the most recent PDF in `data/raw/` if no explicit path is wired in,
- create processed parse artifacts under `data/processed/<doc_id>/`,
- build or extend the Chroma collection.
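A minimal sketch of how the persistent Chroma collection can be built or extended (the collection name and ID scheme are assumptions; the real logic lives in `rag/vectorstore.py`):

```python
import chromadb

# Collection name and ID format are illustrative; see rag/vectorstore.py for the real setup.
client = chromadb.PersistentClient(path="data/vectorstore")
collection = client.get_or_create_collection("finrag_chunks")

collection.upsert(                      # upsert lets repeated ingestion extend, not duplicate
    ids=["tesla-2025-10k:0001"],
    embeddings=[[0.01] * 1024],         # 1024-dim vectors from BAAI/bge-large-en-v1.5
    documents=["Item 1A. Risk Factors: We face significant competition ..."],
    metadatas=[{"source": "Tesla 2025 10-K", "section": "Item 1A. Risk Factors"}],
)
print(collection.count())
```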
Run the API:

```bash
uvicorn app.app:app --reload --port 8000
```

Run the frontend:

```bash
cd frontend
npm install
npm run dev
```

The frontend runs on http://localhost:5173 and calls the backend on http://localhost:8000.
The backend exposes these endpoints:

- `GET /health` - liveness check plus current vector-store document count
- `POST /api/upload` - upload a PDF and trigger ingestion
- `POST /api/query` - submit a natural-language question against the indexed filings
- `POST /api/reset-index` - clear the vector index for a fresh dev session
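A minimal sketch of querying the API from Python (the `question` field name is an assumption; check the request model in `app/app.py`):

```python
import requests

# The "question" key is assumed for illustration; the actual schema is defined in app/app.py.
resp = requests.post(
    "http://localhost:8000/api/query",
    json={"question": "What are the main risk factors disclosed in this filing?"},
    timeout=60,
)
resp.raise_for_status()
payload = resp.json()
print(payload["answer"])
for src in payload.get("sources", []):
    print(src["section"], src["score"])
```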
You can also query from the CLI:

```bash
python -m app.main --query "What are the main risk factors disclosed in this filing?"
```

Expected response shape:
```json
{
  "answer": "...",
  "top_retrieval_score": 0.61,
  "sources": [
    {
      "source": "Tesla 2025 10-K",
      "section": "Item 1A. Risk Factors",
      "score": 0.61,
      "preview": "..."
    }
  ]
}
```