This repository implements a Medallion Architecture (Bronze–Silver–Gold) using FinDrum, enabling two advanced applications:
- A financial Q&A chatbot powered by local LLMs.
- A declarative comparative analysis API built on a Subject Store.
| Layer | Responsibility | Storage / Destination |
|---|---|---|
| Bronze | Raw ingestion from SEC API, with metadata | JSON files in MinIO bronze/ bucket |
| Silver | Cleaning, normalization, and semantic enrichment | Parquet files in MinIO silver/ bucket |
| Gold | Structured data for Q&A and API consumption | PostgreSQL (financial_data table) and Subject Store |
Data flows through Docker containers, orchestrated by FinDrum and triggered by MinIO events.
A daily pipeline (scheduled at 10:20) ingests SEC companyfacts for selected companies (e.g., Apple, Tesla):
- SecFactSource fetches raw financial facts via API.
- MinioWriter stores raw JSON files in the `bronze/` bucket.
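The ingestion step above can be sketched without FinDrum as follows. This is a minimal illustration, not the SecFactSource or MinioWriter implementation: the object key layout, the metadata field names, and the helper functions are all hypothetical. The URL pattern is the public SEC companyfacts endpoint, which expects a 10-digit, zero-padded CIK.

```python
from datetime import datetime, timezone

# Public SEC endpoint; the CIK must be zero-padded to 10 digits.
SEC_COMPANYFACTS_URL = "https://data.sec.gov/api/xbrl/companyfacts/CIK{cik:010d}.json"


def companyfacts_url(cik: int) -> str:
    """Build the companyfacts URL for one company."""
    return SEC_COMPANYFACTS_URL.format(cik=cik)


def bronze_object_key(cik: int, fetched_at: datetime) -> str:
    """Hypothetical layout: one JSON object per company per ingestion day."""
    return f"companyfacts/CIK{cik:010d}/{fetched_at:%Y-%m-%d}.json"


def wrap_with_metadata(payload: dict, cik: int, fetched_at: datetime) -> dict:
    """Leave the raw payload untouched and attach ingestion metadata beside it."""
    return {
        "metadata": {
            "source": companyfacts_url(cik),
            "cik": cik,
            "fetched_at": fetched_at.isoformat(),
        },
        "data": payload,
    }


# 320193 is Apple's CIK; the payload here is a stub, not a real API response.
fetched_at = datetime(2024, 1, 15, 10, 20, tzinfo=timezone.utc)
record = wrap_with_metadata({"entityName": "Apple Inc."}, cik=320193, fetched_at=fetched_at)
print(record["metadata"]["source"])
print(bronze_object_key(320193, fetched_at))
```

Keeping the raw payload under its own key means the Bronze object can always be replayed through later layers without refetching from the API.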
Triggered by new files in the bronze/ bucket:
- DictFlattener converts nested JSON to flat structure.
- ValueFilter / ColumnFilter remove irrelevant or malformed records.
- ValueMapper harmonizes financial terms (e.g., `NetIncome`, `Assets`).
- The result is written as Parquet files into MinIO’s `silver/` bucket.
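The flatten, filter, and map steps can be illustrated in plain Python. This is a sketch of the idea only: the record shape, the dotted key names, and the `TERM_MAP` contents are assumptions made for the example, and the real DictFlattener, ValueFilter, and ValueMapper operators may behave differently.

```python
def flatten(nested: dict, parent: str = "", sep: str = ".") -> dict:
    """Turn nested dicts into a single-level dict with dotted keys."""
    flat = {}
    for key, value in nested.items():
        path = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, path, sep))
        else:
            flat[path] = value
    return flat


# Hypothetical mapping from source concept names to harmonized terms.
TERM_MAP = {"NetIncomeLoss": "NetIncome", "Assets": "Assets"}


def to_silver(records: list[dict]) -> list[dict]:
    """Flatten each record, drop malformed or irrelevant ones, rename concepts."""
    silver = []
    for record in records:
        row = flatten(record)
        if row.get("fact.val") is None:  # malformed: no value
            continue
        concept = TERM_MAP.get(row.get("fact.concept"))
        if concept is None:  # irrelevant: concept not tracked
            continue
        silver.append(
            {
                "company": row["company"],
                "concept": concept,
                "value": row["fact.val"],
                "frame": row.get("fact.frame"),
            }
        )
    return silver


# Illustrative values only, not real filings.
raw = [
    {"company": "ExampleCo", "fact": {"concept": "NetIncomeLoss", "val": 100, "frame": "CY2021"}},
    {"company": "ExampleCo", "fact": {"concept": "NetIncomeLoss", "val": None, "frame": "CY2020"}},
    {"company": "ExampleCo", "fact": {"concept": "Untracked", "val": 1, "frame": "CY2021"}},
]
silver = to_silver(raw)
print(silver)
```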
Triggered by new Silver Parquet files:
- MinioReader reads the Parquet data.
- ValueMapper extracts fiscal year info (e.g., `CY2021`) using regex.
- ColumnFilter removes unnecessary columns.
- PostgresInsertOperator inserts cleaned data into the `financial_data` table.
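The fiscal-year extraction can be sketched with a small regular expression. The pattern below is an assumption about the frame format: it accepts annual labels such as `CY2021` and also quarterly variants such as `CY2021Q3` or `CY2021Q4I`, which the pipeline's actual ValueMapper configuration may or may not handle. `parse_frame` is a hypothetical helper name.

```python
import re

# "CY" + four-digit year, optionally "Q" + quarter number and a trailing "I".
FRAME_PATTERN = re.compile(r"^CY(\d{4})(?:Q([1-4])I?)?$")


def parse_frame(frame):
    """Return (year, quarter) for a frame label; quarter is None for annual frames."""
    match = FRAME_PATTERN.match(frame or "")
    if match is None:
        return None
    year = int(match.group(1))
    quarter = int(match.group(2)) if match.group(2) else None
    return year, quarter


for label in ("CY2021", "CY2021Q3", "not-a-frame"):
    print(label, parse_frame(label))
```

Returning `None` for unrecognized labels lets a downstream filter drop those rows instead of inserting a wrong year into `financial_data`.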
Local LLMs (OpenHermes‑2.5‑Mistral‑7B and Meta‑LLaMA‑3‑8B‑Instruct, run via llama.cpp) answer queries such as “What was Apple's net income in 2021?” through a Gradio interface.
A second Silver-to-Gold pipeline feeds the Subject Store:
- MinioReader loads Silver data.
- InferMissingQuarters fills missing quarterly data based on annual totals.
- ValueFilter keeps only quarterly records.
- SendToAPI sends normalized data to Subject Store via HTTP.
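One plausible reading of the quarter-inference step is that, when exactly one quarter is missing, it equals the annual total minus the reported quarters. The sketch below implements that rule under this assumption; it is not the InferMissingQuarters operator itself, and the function name and input shape are hypothetical. The subtraction only makes sense for flow metrics such as revenue or net income, not for point-in-time balances such as total assets.

```python
QUARTERS = ("Q1", "Q2", "Q3", "Q4")


def infer_missing_quarter(annual: float, quarters: dict[str, float]) -> dict[str, float]:
    """Fill a single missing quarter as annual total minus the reported quarters.

    If zero or more than one quarter is missing, the input is returned unchanged,
    because the annual total alone cannot split a remainder across several quarters.
    """
    missing = [q for q in QUARTERS if q not in quarters]
    filled = dict(quarters)
    if len(missing) == 1:
        filled[missing[0]] = annual - sum(quarters.values())
    return filled


# Illustrative numbers only.
print(infer_missing_quarter(100, {"Q1": 20, "Q2": 25, "Q3": 30}))
print(infer_missing_quarter(100, {"Q1": 20, "Q2": 25}))
```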
A REST endpoint /view accepts:
- Company ID
- Data topic (e.g., "financial", "stock-market")
- YAML-defined view configuration (rows, columns, metrics)
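A request to `/view` might be assembled as shown below. The wire format is not documented here, so everything about it is an assumption: the query parameter names `company` and `topic`, the use of POST with a YAML body, the `application/x-yaml` content type, the company ID value, and the keys inside the view definition are all hypothetical. The sketch only builds the request object and does not send it.

```python
import urllib.parse
import urllib.request

# Hypothetical view definition; the real schema may use different keys.
VIEW_YAML = """\
rows: [concept]
columns: [fiscal_year]
metrics: [value]
"""


def build_view_request(base_url: str, company_id: str, topic: str, view_yaml: str):
    """Build (but do not send) a POST request carrying the YAML view as its body."""
    query = urllib.parse.urlencode({"company": company_id, "topic": topic})
    return urllib.request.Request(
        f"{base_url}/view?{query}",
        data=view_yaml.encode("utf-8"),
        headers={"Content-Type": "application/x-yaml"},
        method="POST",
    )


request = build_view_request("http://localhost:8080", "AAPL", "financial", VIEW_YAML)
print(request.get_method(), request.full_url)
```

Passing the result to `urllib.request.urlopen(request)` would send it, provided the server actually accepts this shape.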
This service was used on a company benchmarking website.
| Component | Purpose | Container / Technology |
|---|---|---|
| MinIO | Data Lake for Bronze and Silver | minio:9000/9001 |
| PostgreSQL | Gold storage for structured Q&A data | postgres:5432 |
| Subject Store | Gold storage for declarative API views | HTTP / Internal API |
| FinDrum | Pipeline orchestration | Python |
| LLMs + Gradio | Local QA chatbot interface | Python + llama.cpp |
| API Server | Exposes /view endpoint for analysis | Python (e.g., FastAPI) |
| Docker Compose | Environment orchestration | Docker |

    git clone https://github.com/FinDrum/example-medallion.git
    cd example-medallion/infrastructure
    docker-compose up --build
    cd ..
    python -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt
    python main.py

- MinIO UI: http://localhost:9001 (minioadmin:minioadmin)
- PostgreSQL: localhost:5432, table financial_data
- Gradio QA interface: http://localhost:7860
- Subject Store API: http://localhost:8080/view, accepts YAML payloads

Example use cases:
- Natural language financial queries, e.g., “What was Tesla’s revenue in 2022?”
- Custom time-series analysis via YAML-defined views for benchmarking companies.

Requirements:
- Docker & Docker Compose
- Python 3.12+
- Java 21
- Local LLMs in GGUF format

Planned improvements:
- Add Delta Lake support for version control in Silver/Gold layers.


