Multimodal Financial Intelligence Platform (MFIS)

MFIS is a production-grade financial intelligence and analytics platform. It pairs a Bloomberg-style terminal dashboard with a machine learning pipeline built on FinBERT, a PyTorch LSTM, XGBoost, and Qwen3 RAG.

The system combines real-time price updates, an SEC filing scraper, RSS news ingestion, a feature store, multi-agent orchestration, and backtesting simulations.


System Architecture & Data Flow

graph TD
    Client[Web Browser Frontend] -->|WebSockets| WS[WebSocket Live Server]
    Client -->|HTTP API| API[FastAPI Endpoints]
    
    subgraph FastAPI Backend App
        API -->|Copilot Router| Copilot[Qwen3 Copilot Engine]
        API -->|RAG Router| RAG[Hybrid Search & QA Engine]
        API -->|Agents Router| Agents[Multi-Agent Coordination]
        API -->|Backtest Router| Backtest[Backtesting Simulator]
        API -->|Stocks Router| Stocks[Database / Feature Store]
        API -->|Monitoring Router| Monitoring[Metrics & Health]
        API -->|Explainability Router| SHAP[XGBoost & SHAP Explainer]
        API -->|Knowledge Graph Router| KG[Knowledge Graph Builder]
    end

    subgraph Data Stores
        Stocks -->|SQLAlchemy| DB[(PostgreSQL Database)]
        RAG -->|Dense Search| FAISS[FAISS Vector Store]
        RAG -->|Sparse Search| BM25[BM25 Index]
        API -->|Caching Layer| Cache[Msgpack In-Memory Cache]
    end

    WS -->|Live Updates| Stream[Background Streaming Task]

Component Breakdown & Data Lifecycles

  1. Frontend: A custom, responsive single-page layout styled with vanilla CSS tokens (HSL colors). Chart.js handles data visualization, and Canvas-drawn network graphs show corporate relationships.
  2. Backend Services: Built on FastAPI with ASGI concurrency. A background WebSocket broadcaster pushes live ticks, database connections are pooled through SQLAlchemy and asyncpg, and a local in-memory cache stores Msgpack-serialized responses.
  3. Ingestion Pipeline (ETL):
    • Yahoo Finance API: Imports historical price bars and volume data.
    • SEC Filing Scraper: Automated parsing of quarterly/annual filings (Form 10-Q/10-K).
    • RSS News Wire: Monitors RSS news feeds in real time as input for sentiment scoring.
  4. Analytical Database (PostgreSQL): Stores company profiles, stock listings, historical price points, sentiment ratings, calculated feature records, machine learning outputs, and system logs.

Technical Details: Machine Learning & Core AI Engine

1. Financial Sentiment Engine (FinBERT)

The platform processes market news wires and parsed SEC disclosures using FinBERT (ProsusAI/finbert).

  • Model Type: Pre-trained financial BERT (AutoModelForSequenceClassification).
  • Classification Output: Emits probability vectors for positive, negative, and neutral sentiments.
  • Cache TTL: Outputs are cached for 30 minutes to avoid repeated inference.
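
To make the "probability vector" concrete, here is a minimal sketch of the post-processing step that turns three raw classifier logits into labeled probabilities. It does not load the model. The label order in `LABELS` is an assumption and should be checked against `model.config.id2label` on the loaded checkpoint; `to_sentiment` is a hypothetical helper name, not a function from this repository.

```python
import math

# Assumed label order; verify against model.config.id2label before relying on it.
LABELS = ("positive", "negative", "neutral")


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_sentiment(logits, labels=LABELS):
    """Pair each label with its probability and report the most likely label."""
    scores = dict(zip(labels, softmax(logits)))
    return {"label": max(scores, key=scores.get), "scores": scores}


result = to_sentiment([2.0, -1.0, 0.5])
print(result["label"], result["scores"])
```

Downstream consumers can then use either the winning label or the full score dictionary, for example as extra input features for a forecasting model.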

2. Deep Learning Forecast Engine (PyTorch LSTM)

Produces multi-horizon forecasts of target prices 1, 7, and 30 days ahead.

  • Model Architecture: Multi-layer Long Short-Term Memory (LSTM) network built in PyTorch.
  • Inputs: 60-day historical sequence of normalized open, high, low, close, and volume features combined with FinBERT sentiment parameters.
  • Training Strategy: Retrained on startup when price data exists; the model is never trained on synthetic data.
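
The 60-day input sequence and the 1, 7, and 30-day targets imply a sliding-window dataset. The sketch below shows one way such windows could be cut from a daily series using only the standard library. `make_windows` and `min_max_scale` are hypothetical helpers, and the real pipeline's scaling and alignment choices are not documented here, so treat this as an illustration under those assumptions.

```python
def min_max_scale(values):
    """Scale a numeric series into [0, 1]; a constant series maps to 0.0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def make_windows(features, closes, window=60, horizons=(1, 7, 30)):
    """Pair each `window`-day feature block with the closes `h` days later.

    features: one feature row per trading day, oldest first.
    closes:   closing price per day, aligned with `features`.
    """
    samples = []
    # Latest start index that still leaves room for the longest horizon.
    last_start = len(features) - window - max(horizons)
    for start in range(last_start + 1):
        last_day = start + window - 1  # final day inside the window
        inputs = features[start:start + window]
        targets = {h: closes[last_day + h] for h in horizons}
        samples.append((inputs, targets))
    return samples
```

With 100 days of history, a 60-day window, and a 30-day maximum horizon, this yields 11 training samples. Scaling parameters should be fitted on training data only, so that future prices do not leak into the inputs.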

3. Risk Explainability Dashboard (XGBoost + SHAP)

Scores volatility, financial, and market risk with gradient-boosted trees and explains each score with tree explainers.

  • Core Classifier: XGBoost models trained on Technical, Volatility, Sentiment, and Fundamental feature groups.
  • SHAP (SHapley Additive exPlanations): Tree SHAP explainers extract exact mathematical feature contributions for each classification probability.
  • Guarantees: Fails explicitly with a 503 Service Unavailable error if model files are missing or uninitialized instead of falling back to simulated scores.
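
The "exact feature contributions" above refers to the additivity property of Shapley values: the contributions sum to the difference between the model output and a baseline output. The sketch below demonstrates that property by brute-force enumeration on a toy function. It is not Tree SHAP (which reaches the same values efficiently for tree ensembles), and replacing absent features with a fixed baseline is a simplifying assumption; `toy_risk_score` is a made-up model.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values by enumerating every coalition (tiny inputs only)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their real value; the rest stay at baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                members = set(coalition)
                phi += weight * (value(members | {i}) - value(members))
        phis.append(phi)
    return phis


def toy_risk_score(features):
    """Made-up model with an interaction term between its two inputs."""
    volatility, sentiment = features
    return 2.0 * volatility + volatility * sentiment


x, baseline = [1.0, 2.0], [0.0, 0.0]
phis = shapley_values(toy_risk_score, x, baseline)
print(phis, sum(phis), toy_risk_score(x) - toy_risk_score(baseline))
```

The enumeration cost grows exponentially with the number of features, which is why tree-specific explainers are used in practice.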

4. Hybrid RAG (FAISS Dense + BM25 Sparse Search)

Implements dual-retrieval QA over corporate disclosures:

  • Dense Vector Retrieval: Uses SentenceTransformer (all-MiniLM-L6-v2) to encode documents and index them in a FAISS index.
  • Sparse Term Retrieval: Tokenizes the document corpus and scores keyword matches with a BM25 index (rank-bm25).
  • Reranker: Uses a CrossEncoder (ms-marco-MiniLM-L-6-v2) to score query-document pairs and keeps the top k candidates.
  • Generator: Intersects candidate fragments and passes them to the Qwen pipeline (model ID Qwen/Qwen2.5-1.5B-Instruct, called Qwen3 elsewhere in this README) to build readable reports.
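
Dense and sparse retrievers return separately ranked candidate lists that must be combined before reranking. The sketch below uses reciprocal rank fusion (RRF), a common fusion technique that is swapped in here purely for illustration; it is not confirmed as the project's method, which this README describes as intersecting the candidates. The document IDs and the constant `k=60` are assumptions.

```python
def reciprocal_rank_fusion(rankings, k=60, top_n=None):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked highly by both retrievers rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by ID for a stable result.
    fused = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return fused if top_n is None else fused[:top_n]


dense_hits = ["d1", "d2", "d3"]   # e.g. nearest neighbours from a vector index
sparse_hits = ["d3", "d1", "d4"]  # e.g. top keyword matches from BM25
print(reciprocal_rank_fusion([dense_hits, sparse_hits]))
```

The fused list would then go to the reranker, which scores each query-document pair and keeps the top k.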

5. Multi-Agent Arena Orchestrator

Coordinates independent specialist agents to draft a comprehensive investment thesis:

  • Sentiment Agent: Scans public RSS wire sentiment scores.
  • Forecast Agent: Queries the LSTM forecast sequence.
  • Risk Agent: Scans XGBoost SHAP volatility weights.
  • Fundamental Agent: Evaluates debt-to-equity and price-to-earnings ratios.
  • Coordinator: Aggregates all agent reports, runs a chain-of-thought reasoning step over them, and saves the final thesis to the database.
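
The coordinator pattern above can be sketched with `asyncio`. Everything here is hypothetical: the agent functions are stubs that return placeholder text, running them concurrently is an assumption, and the reasoning and database steps are left out. The point is only to show how independent agents can be fanned out and their reports collected, with a failing agent recorded instead of aborting the whole analysis.

```python
import asyncio


async def sentiment_agent(ticker):
    return {"agent": "sentiment", "summary": f"placeholder sentiment view for {ticker}"}


async def forecast_agent(ticker):
    return {"agent": "forecast", "summary": f"placeholder forecast view for {ticker}"}


async def risk_agent(ticker):
    return {"agent": "risk", "summary": f"placeholder risk view for {ticker}"}


async def fundamental_agent(ticker):
    return {"agent": "fundamental", "summary": f"placeholder fundamentals view for {ticker}"}


async def coordinate(ticker, agents):
    """Run every agent concurrently and collect reports plus any failures."""
    results = await asyncio.gather(
        *(agent(ticker) for agent in agents), return_exceptions=True
    )
    reports, failed = [], []
    for agent, result in zip(agents, results):
        if isinstance(result, Exception):
            failed.append(agent.__name__)  # record the failure, keep going
        else:
            reports.append(result)
    return {"ticker": ticker, "reports": reports, "failed_agents": failed}


AGENTS = [sentiment_agent, forecast_agent, risk_agent, fundamental_agent]
thesis = asyncio.run(coordinate("AAPL", AGENTS))
print([report["agent"] for report in thesis["reports"]])
```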

Detailed API reference

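REST endpoints are prefixed with /api unless listed otherwise below (/monitoring/dashboard, /metrics, and the /ws/live WebSocket are not). Authenticated routes require a valid JWT in the bearer header: Authorization: Bearer <token>.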

Authentication

  • POST /api/auth/token: Generates a JWT access token.
    • Request Body (JSON): {"api_key": "string"} (optional)
    • Response (JSON): {"access_token": "token_str", "token_type": "bearer", "expires_in_hours": 24}

Stock Data

  • GET /api/stocks: Paginated coverage list. Query params: sector, active_only, page, page_size. (No auth required)
  • GET /api/stocks/{ticker}: Company profile, status, and latest close price. Requires JWT Header. (Cached for 5 minutes)
  • GET /api/stocks/{ticker}/prices: Historical closing bars. Query params: start_date, end_date, limit (max 2520). (No auth required, cached for 15 minutes)
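
Several responses carry a cache lifetime (5 and 15 minutes here, 30 minutes for FinBERT outputs). A time-to-live cache of that kind can be sketched as follows. `TTLCache` is a hypothetical class, not the project's caching layer, and it skips the Msgpack serialization step to stay within the standard library.

```python
import time


class TTLCache:
    """Minimal in-memory cache whose entries expire after a per-key TTL."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so tests can control time
        self._entries = {}    # key -> (expires_at, value)

    def set(self, key, value, ttl_seconds):
        self._entries[key] = (self._clock() + ttl_seconds, value)

    def get(self, key, default=None):
        entry = self._entries.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # drop the stale entry on access
            return default
        return value


cache = TTLCache()
cache.set("stocks:AAPL", {"ticker": "AAPL"}, ttl_seconds=5 * 60)
print(cache.get("stocks:AAPL"))
```

Using a monotonic clock avoids surprises when the system wall clock is adjusted while the server is running.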

Advanced Analytics

  • GET /api/features/{ticker}: Computed technical/volatility/fundamental feature map. Query params: groups, recompute. (Cached for 15 minutes)
  • GET /api/explainability/{ticker}: XGBoost risk levels and SHAP values for volatility, financial, and market risks.
  • GET /api/agents/analyze/{ticker}: Orchestrates Multi-Agent analysis. Requires JWT Header.
  • GET /api/knowledge-graph: Emits relationships map of companies, events, executives, products, and competitors. Query params: ticker (optional).
  • GET /api/backtesting/run: Backtests trading strategies on historical data. Query params: ticker, strategy (sentiment, momentum, hybrid), capital.
  • POST /api/portfolio/analyze: Computes returns, volatility, Sharpe ratio, and diversification metrics for custom portfolios.
    • Body: {"portfolio": [{"ticker": "AAPL", "weight": 40.0}, ...]}
  • POST /api/copilot/query: Chatbot answering finance queries. Body: {"query": "string"}
  • POST /api/rag/query: QA over indexed documents. Body: {"query": "string", "top_k": 3}

System Health & Monitoring

  • GET /api/monitoring/health: Basic liveness check.
  • GET /api/monitoring/detailed-health: Liveness details for Postgres database and Redis cache backend.
  • GET /monitoring/dashboard: ETL schedule details, health summaries, and cache stats.
  • GET /metrics: Prometheus metric exposition text.

WebSocket

  • WS /ws/live: Live streaming socket. Emits price updates (price_update ticks) and RSS news alerts (news_alert blocks).
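
A client of the live socket has to tell the two message kinds apart. The sketch below routes decoded messages to handlers by kind. The JSON shape is an assumption: the `type`, `ticker`, `price`, and `headline` field names are hypothetical, since this README names only the `price_update` and `news_alert` kinds.

```python
import json


def route_message(raw, handlers):
    """Decode one socket message and pass it to the handler for its kind.

    Returns the handler's result, or None for unknown or malformed messages.
    """
    try:
        message = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(message, dict):
        return None
    handler = handlers.get(message.get("type"))  # "type" key is assumed
    return handler(message) if handler else None


HANDLERS = {
    "price_update": lambda m: f"{m['ticker']} -> {m['price']}",
    "news_alert": lambda m: f"NEWS: {m['headline']}",
}

print(route_message('{"type": "price_update", "ticker": "AAPL", "price": 123.45}', HANDLERS))
```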

Security Model: JWT Implementation

A JWT authentication layer protects sensitive routes:

  1. Token Generation: Access tokens expire after 24 hours and are signed using standard HMAC SHA-256 (HS256) with a cryptographically secure SECRET_KEY.
  2. Dependency Injection: Protected routes declare FastAPI's Depends(verify_token). If the Authorization header is missing or malformed, or if the token has expired, the system returns a 401 Unauthorized or 403 Forbidden response.
  3. Frontend Integration: On load, static/js/app.js issues a request to /api/auth/token to acquire a bearer token, storing it in-memory and appending it to all protected fetches.
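
To show what HS256 signing and the 24-hour expiry amount to, here is a standard-library sketch of issuing and checking a token. `sign_hs256` and `verify_hs256` are hypothetical names and this is not the project's `verify_token` dependency; production code would normally rely on a maintained JWT library rather than hand-rolled signing.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data):
    """Base64url-encode bytes without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text):
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _signature(signing_input, secret):
    return hmac.new(secret.encode(), signing_input.encode("ascii"), hashlib.sha256).digest()


def sign_hs256(claims, secret, now=None, ttl_hours=24):
    """Return a compact JWT carrying `claims` plus iat and exp timestamps."""
    now = int(time.time()) if now is None else now
    payload = dict(claims, iat=now, exp=now + ttl_hours * 3600)
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode()) for part in (header, payload)
    )
    return signing_input + "." + _b64url(_signature(signing_input, secret))


def verify_hs256(token, secret, now=None):
    """Return the payload, or raise ValueError if the token is invalid."""
    now = int(time.time()) if now is None else now
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    expected = _signature(f"{header_b64}.{payload_b64}", secret)
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    payload = json.loads(_b64url_decode(payload_b64))
    if now >= payload["exp"]:
        raise ValueError("token expired")
    return payload
```

A route guard would map any `ValueError` raised here to a 401 response.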

Local Setup & Deployment

Prerequisites

  • Python 3.11+
  • PostgreSQL database engine
  • (Optional) Redis cache server

Step 1: Set Environment Variables

Copy .env.example to .env and configure your credentials:

POSTGRES_USER=postgres
POSTGRES_PASSWORD=your_secure_password
POSTGRES_DB=mfis
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
DATABASE_URL=postgresql+asyncpg://postgres:your_secure_password@localhost:5432/mfis

# Authentication Secret
SECRET_KEY=generate_a_secure_token_key_here
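
One way to produce the values above is sketched below: a random SECRET_KEY from Python's `secrets` module, and a DATABASE_URL assembled from the POSTGRES_* parts so the two stay in sync. `generate_secret_key` and `build_database_url` are hypothetical helpers, not scripts shipped with the repository.

```python
import secrets
from urllib.parse import quote


def generate_secret_key(num_bytes=32):
    """Return a URL-safe random string suitable for signing tokens."""
    return secrets.token_urlsafe(num_bytes)


def build_database_url(env):
    """Assemble the asyncpg connection URL from the POSTGRES_* settings."""
    # Percent-encode credentials so characters like '@' or ':' stay valid.
    user = quote(env["POSTGRES_USER"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")
    host, port, name = env["POSTGRES_HOST"], env["POSTGRES_PORT"], env["POSTGRES_DB"]
    return f"postgresql+asyncpg://{user}:{password}@{host}:{port}/{name}"


settings = {
    "POSTGRES_USER": "postgres",
    "POSTGRES_PASSWORD": "your_secure_password",
    "POSTGRES_DB": "mfis",
    "POSTGRES_HOST": "localhost",
    "POSTGRES_PORT": "5432",
}
print(build_database_url(settings))
print("SECRET_KEY=" + generate_secret_key())
```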

Step 2: Install Dependencies & Run Migrations

Initialize your virtual environment, install the requirements, and upgrade database tables using Alembic:

# Set up virtual environment
python -m venv venv
# Activate virtual environment
# On Windows:
.\venv\Scripts\activate
# On Linux/macOS:
source venv/bin/activate

# Install requirements
pip install -r requirements.txt

# Run migrations
alembic upgrade head

Step 3: Run the Application

Start the uvicorn development server:

uvicorn app.main:app --host 127.0.0.1 --port 8000 --reload

Open http://localhost:8000/ in your web browser. The platform will automatically seed reference assets and train ML models on startup.

Run Verification Tests

To run all tests (including ETL, caching, and API endpoint suites):

pytest

Production Deployment & Docker

Running under Docker Compose

The system includes configuration files to orchestrate the backend app, PostgreSQL, and Redis in individual containers:

# Build and launch containers
docker-compose up --build -d

# Verify logs and statuses
docker-compose ps
docker-compose logs -f

Production ASGI Servers

For production, avoid running uvicorn directly. Use Gunicorn with Uvicorn workers to enable process management and multiple workers:

gunicorn -w 4 -k uvicorn.workers.UvicornWorker app.main:app --bind 0.0.0.0:8000

Troubleshooting & Maintenance

1. Endpoint Returns "403 Forbidden" or "401 Unauthorized"

  • Cause: The route requires a JWT bearer header.
  • Solution: Have the client call POST /api/auth/token and send the returned token in the request header: Authorization: Bearer <token_string>.

2. Ingested data is stale or missing

  • Cause: The ETL scheduler runs daily or hourly in the background. If you just initialized the database, historical prices may not have downloaded yet.
  • Solution: Let the backend run for a few minutes or trigger data ingestion manually for a ticker via the API. Under standard development settings, querying /api/stocks/{ticker} automatically triggers ingestion via ensure_ticker_data.

3. Model Training Warnings / Issues

  • Cause: PyTorch, transformers, or XGBoost cannot load pre-trained models because the host has no connection to the Hugging Face Hub.
  • Solution: Ensure your host machine has internet access on first startup so the models can be downloaded. The downloaded weights are then cached locally in the standard cache directories.
