Earnings Thesis Evaluator

A containerized research system that ingests earnings-call transcripts and SEC filings, runs a multi-stage LangGraph pipeline powered by DigitalOcean Serverless Inference, generates structured research theses with falsifiable predictions, and backtests those predictions against subsequent price action.

Disclaimer: Research only. Not financial advice. Not a recommendation to buy or sell any security.

Architecture

graph TD
    UI[React Dashboard] -->|REST| API[FastAPI :8000]
    API -->|enqueue| Worker[Celery Worker]
    Worker -->|run| Pipeline[LangGraph Pipeline]
    Pipeline -->|LLM calls| DO[DigitalOcean Serverless Inference]
    Pipeline -->|embeddings| DO
    Pipeline -->|read/write| DB[(Postgres + pgvector)]
    Pipeline -->|SEC filings| EDGAR[SEC EDGAR API]
    Pipeline -->|price data| Market[yfinance / sample fallback]
    Pipeline -->|transcripts| Transcripts[sample_data/transcripts/]
    API -->|read| DB
    UI -->|view| DB

    subgraph LangGraph Stages
        S1[fetch_sources] --> S2[segment_transcript]
        S2 --> S3[extract_guidance]
        S3 --> S4[extract_analyst_pushback]
        S4 --> S5[retrieve_prior_context RAG]
        S5 --> S6[detect_sentiment_shift]
        S6 -->|lowered/vague guidance| DD1[deep_dive_guidance_risk]
        S6 -->|pushback >= 3| DD2[deep_dive_pushback]
        S6 -->|shift <= -2| DD3[deep_dive_negative_shift]
        S6 -->|skip| S7[generate_thesis]
        DD1 --> DD2 --> DD3 --> S7
        S7 --> S8[score_thesis]
        S8 --> S9[backtest_outcome]
    end
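
The conditional edges out of detect_sentiment_shift can be read as a routing function. The following is a minimal sketch, assuming hypothetical state keys (guidance_direction, pushback_count, sentiment_shift) and a first-match priority order that the diagram does not specify. The actual logic lives in app/pipeline/graph.py and may differ.

```python
from typing import TypedDict


class RoutingState(TypedDict, total=False):
    # Hypothetical field names; the real PipelineState is defined in app/pipeline/state.py.
    guidance_direction: str   # e.g. "raised", "maintained", "lowered", "vague"
    pushback_count: int       # number of analyst pushback items extracted
    sentiment_shift: int      # signed shift score versus prior quarters


def route_after_sentiment(state: RoutingState) -> str:
    """Return the name of the node to run after detect_sentiment_shift."""
    if state.get("guidance_direction") in {"lowered", "vague"}:
        return "deep_dive_guidance_risk"
    if state.get("pushback_count", 0) >= 3:
        return "deep_dive_pushback"
    if state.get("sentiment_shift", 0) <= -2:
        return "deep_dive_negative_shift"
    return "generate_thesis"  # the "skip" edge in the diagram
```

In LangGraph, a function of this shape is typically registered with StateGraph.add_conditional_edges, using the returned string as the name of the next node.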

Stack

Layer	Technology
LLM & Embeddings	DigitalOcean Serverless Inference (OpenAI-compatible)
Orchestration	LangGraph 0.2
Backend	FastAPI + Uvicorn
Background Jobs	Celery + Redis
Database	Postgres 16 + pgvector
Vector Search	NumPy cosine similarity (pgvector-ready)
Market Data	yfinance (deterministic fallback if unavailable)
SEC Filings	EDGAR REST API
Frontend	React 18 + Vite
Container	Docker + Docker Compose

DigitalOcean Serverless Inference Setup

1. Create a model access key

Log in to DigitalOcean Cloud
Navigate to AI & ML → Serverless Inference
Click Create Access Key
Copy the key — it is shown only once

2. Pick a model

Run the check script (after setting your key) to see available models:

python scripts/check_do_inference.py

Common models available on DigitalOcean Serverless Inference:

meta-llama-3-70b-instruct — good balance of speed and quality
meta-llama-3.1-405b-instruct — highest quality
mistral-7b-instruct — fastest
Check /v1/models for the current list
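
Because the endpoint is OpenAI-compatible, the model list can also be fetched directly. The following is a minimal standard-library sketch, assuming the default base URL listed under Environment Variables, Bearer-token authentication, and the OpenAI-style response shape. The helper names are illustrative and not part of this repository.

```python
import json
import urllib.request

DEFAULT_BASE_URL = "https://inference.do-ai.run/v1"


def build_models_request(api_key: str, base_url: str = DEFAULT_BASE_URL) -> urllib.request.Request:
    """Build a GET request for the OpenAI-style /models endpoint."""
    url = base_url.rstrip("/") + "/models"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


def list_model_ids(api_key: str, base_url: str = DEFAULT_BASE_URL) -> list:
    """Fetch model IDs; assumes a {"data": [{"id": ...}, ...]} response body."""
    with urllib.request.urlopen(build_models_request(api_key, base_url), timeout=30) as resp:
        payload = json.load(resp)
    return [model["id"] for model in payload.get("data", [])]
```

Calling list_model_ids with the value of DIGITALOCEAN_INFERENCE_API_KEY should return the IDs that can be used for DIGITALOCEAN_LLM_MODEL.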

3. Configure environment

Copy .env.example to .env and fill in:

cp .env.example .env

DIGITALOCEAN_INFERENCE_API_KEY=dop_v1_xxxx...
DIGITALOCEAN_LLM_MODEL=meta-llama-3-70b-instruct
DIGITALOCEAN_EMBEDDING_MODEL=text-embedding-ada-002   # if available, else leave blank

If DIGITALOCEAN_EMBEDDING_MODEL is blank, the pipeline falls back to zero-vectors, so RAG retrieval cannot rank chunks meaningfully (the pipeline still runs).
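
To see why blank embeddings disable ranking, consider cosine similarity on zero vectors. The following is a minimal pure-Python sketch. The guard that returns 0.0 for a zero-norm vector is an assumption about how a store would typically avoid division by zero, not a description of app/rag/store.py.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine similarity with a guard for zero-norm vectors (assumed behavior)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # undefined mathematically; treated as "no similarity" here
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


query = [0.0, 0.0, 0.0]
chunks = {"chunk_a": [0.0, 0.0, 0.0], "chunk_b": [0.0, 0.0, 0.0]}
scores = {name: cosine_similarity(query, vec) for name, vec in chunks.items()}
# Every chunk receives the same score, so the top-k order carries no information.
```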

Quick Start

Option A — Docker Compose (recommended)

# 1. Configure (run from the cloned repository root)
cp .env.example .env
# Edit .env — set DIGITALOCEAN_INFERENCE_API_KEY and DIGITALOCEAN_LLM_MODEL

# 2. Build and start
docker compose up --build

# 3. Frontend:  http://localhost:3000
# 4. API docs:  http://localhost:8000/docs

Option B — Local development

# Backend
python -m venv .venv
source .venv/bin/activate          # Windows: .venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env               # fill in your keys

# Start Postgres locally (or set DATABASE_URL to SQLite for dev)
# DATABASE_URL=sqlite:///./dev.db

uvicorn app.main:app --reload

# Frontend (separate terminal)
cd frontend
npm install
npm run dev                        # http://localhost:5173

Demo Walkthrough

1. Verify DigitalOcean connectivity

python scripts/check_do_inference.py

Expected output:

✅  All checks passed. DigitalOcean Inference is ready.

2. Seed sample data

python scripts/seed_demo.py

Ingests 3 bundled transcripts: AAPL 2024Q4, NVDA 2024Q3, MSFT 2024Q2.

3. Run a full pipeline analysis

python scripts/run_demo_analysis.py --ticker AAPL --period 2024Q4

Output includes: thesis, bull/bear case, falsifiable predictions, backtest results.

4. Use the dashboard

Open http://localhost:3000 (Docker Compose) or http://localhost:5173 (Vite dev server from Option B)
Select ticker + period, click Start Analysis
Watch the stage progress bar fill as LangGraph executes
Click View Full Thesis for evidence, predictions, and backtest table

API Reference

Method	Endpoint	Description
GET	`/api/v1/health`	App health
GET	`/api/v1/health/llm`	DigitalOcean Inference connectivity + model check
GET	`/api/v1/models`	List available DO inference models
POST	`/api/v1/ingest/transcript`	Upload transcript text for RAG
POST	`/api/v1/runs`	Start analysis pipeline
GET	`/api/v1/runs/{run_id}`	Poll run status + stage outputs
GET	`/api/v1/theses`	List generated theses
GET	`/api/v1/theses/{id}`	Full thesis detail
POST	`/api/v1/backtest/{run_id}`	(Re)run backtest for a run
GET	`/api/v1/backtests`	List all backtest results

Interactive docs: http://localhost:8000/docs
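
A typical client starts a run and then polls it. The sketch below assumes a JSON request body with ticker and period fields, a run_id field in the POST response, and a status field whose terminal values are "completed" and "failed". None of these names are confirmed by this README, so check the interactive docs for the actual schema. The HTTP call is injected so the polling logic can be exercised without a running server.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

API_BASE = "http://localhost:8000/api/v1"


def http_json(method: str, url: str, body: Optional[dict] = None) -> dict:
    """Send a JSON request and decode the JSON response."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        url, data=data, method=method, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def run_and_wait(
    ticker: str,
    period: str,
    call: Callable[..., dict] = http_json,
    interval_s: float = 2.0,
    max_polls: int = 150,
) -> dict:
    """Start a run, then poll until it reaches an assumed terminal status."""
    run = call("POST", f"{API_BASE}/runs", {"ticker": ticker, "period": period})
    run_id = run["run_id"]  # hypothetical response field
    for _ in range(max_polls):
        status = call("GET", f"{API_BASE}/runs/{run_id}")
        if status.get("status") in {"completed", "failed"}:  # hypothetical values
            return status
        time.sleep(interval_s)
    raise TimeoutError(f"run {run_id} did not finish after {max_polls} polls")
```

Bounding the loop with max_polls keeps a stalled worker from hanging the client indefinitely.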

Environment Variables

Variable	Required	Description
`DIGITALOCEAN_INFERENCE_API_KEY`	Yes	DO model access key
`DIGITALOCEAN_LLM_MODEL`	Yes	Chat model ID
`DIGITALOCEAN_EMBEDDING_MODEL`	No	Embedding model ID (blank = zero vectors)
`DIGITALOCEAN_INFERENCE_BASE_URL`	No	Defaults to `https://inference.do-ai.run/v1`
`DATABASE_URL`	Yes	Postgres or SQLite URL
`REDIS_URL`	No	Celery broker (defaults to `redis://localhost:6379/0`)
`SEC_USER_AGENT`	Yes	`Name email@example.com` for EDGAR headers
`MARKET_DATA_PROVIDER`	No	`yfinance` (default); falls back to sample data
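
The Required column can be turned into a startup check. The following is a minimal sketch, assuming the settings are read from a plain mapping such as os.environ. The project itself loads settings through pydantic-settings in app/config.py, so these helpers are illustrative only.

```python
import os
from typing import Mapping

# Variables marked "Yes" in the table above.
REQUIRED_VARS = (
    "DIGITALOCEAN_INFERENCE_API_KEY",
    "DIGITALOCEAN_LLM_MODEL",
    "DATABASE_URL",
    "SEC_USER_AGENT",
)

# Defaults documented in the table above.
DEFAULTS = {
    "DIGITALOCEAN_INFERENCE_BASE_URL": "https://inference.do-ai.run/v1",
    "REDIS_URL": "redis://localhost:6379/0",
    "MARKET_DATA_PROVIDER": "yfinance",
}


def missing_required(env: Mapping[str, str] = os.environ) -> list:
    """Return required variable names that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


def resolve(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Look up a variable, falling back to its documented default if it has one."""
    return env.get(name, "").strip() or DEFAULTS.get(name, "")
```

Running missing_required() before starting the API or worker surfaces a blank key early instead of at the first LLM call.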

Supported Tickers (MVP)

AAPL, MSFT, NVDA, AMZN, META

Sample transcripts bundled:

sample_data/transcripts/AAPL_2024Q4.txt — Apple Q4 2024 (Oct 31, 2024)
sample_data/transcripts/NVDA_2024Q3.txt — NVIDIA Q3 FY2025 (Nov 20, 2024)
sample_data/transcripts/MSFT_2024Q2.txt — Microsoft Q2 FY2025 (Jan 29, 2025)

Running Tests

pip install -r requirements.txt
pytest -v

Test coverage:

test_do_provider.py — DigitalOcean provider (mocked HTTP)
test_chunker.py — RAG text chunker
test_backtest.py — Return calculations + deterministic sample data
test_schema_validation.py — Pydantic schemas
test_transcript_segmentation.py — Transcript parser utilities
test_rag_retrieval.py — Vector store ingest + cosine retrieval (SQLite)
test_pipeline_smoke.py — Full LangGraph graph with mocked LLM

Limitations

RAG is cold on first run. Prior context retrieval only works after multiple transcripts have been ingested. Run scripts/seed_demo.py first.
Embeddings require a model. If DIGITALOCEAN_EMBEDDING_MODEL is blank, all embeddings are zero-vectors and RAG ranking is non-functional (pipeline still runs).
Backtest uses close-to-close returns. Intraday dynamics, bid-ask spread, and transaction costs are not modeled.
Market data falls back to deterministic synthetic data if yfinance is unavailable or the date range has no data. Synthetic data is labeled with is_sample_data=true.
SEC filing text fetch is best-effort. EDGAR HTML parsing may fail for older filings; the pipeline gracefully continues without filing text.
Concurrency is limited. The FastAPI background task runner and Celery worker are single-threaded by default. For production, scale Celery workers.
No authentication. This is an MVP. Add OAuth2 / API key auth before exposing publicly.
China/macro context and true multi-quarter trend analysis require more historical data than the MVP seeds.
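
For reference, the close-to-close return named in the limitations above is the simple return between two closing prices. The following is a minimal sketch with made-up prices. The function names and the up/down direction check are illustrative, not the project's backtest code.

```python
def close_to_close_return(entry_close: float, exit_close: float) -> float:
    """Simple return between two closes: (exit / entry) - 1."""
    if entry_close <= 0:
        raise ValueError("entry_close must be positive")
    return exit_close / entry_close - 1.0


def direction_correct(predicted: str, realized_return: float) -> bool:
    """Check a hypothetical 'up'/'down' prediction against the realized return."""
    if predicted not in {"up", "down"}:
        raise ValueError("predicted must be 'up' or 'down'")
    return realized_return > 0 if predicted == "up" else realized_return < 0


# Made-up example: entry close 200.00, exit close 190.00, i.e. a 5% decline.
r = close_to_close_return(200.0, 190.0)
```

Nothing in this calculation accounts for spreads, transaction costs, or intraday moves, which is why those effects are listed as unmodeled.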

Future Work

Project Structure

earnings-thesis-eval/
├── app/
│   ├── config.py               # Settings (pydantic-settings)
│   ├── database.py             # SQLAlchemy engine + session
│   ├── main.py                 # FastAPI app entry point
│   ├── models/
│   │   └── core.py             # DB models: Company, Thesis, Backtest, etc.
│   ├── llm/
│   │   ├── digitalocean.py     # DO Inference provider (chat, embed, models)
│   │   └── prompts.py          # All LLM prompt templates
│   ├── tools/
│   │   ├── sec_edgar.py        # SEC EDGAR MCP-style tool
│   │   ├── market_data.py      # Price data + deterministic fallback
│   │   ├── transcripts.py      # Transcript loader
│   │   └── news.py             # News stub
│   ├── rag/
│   │   ├── chunker.py          # Text chunking
│   │   └── store.py            # Vector store (cosine sim / pgvector-ready)
│   ├── pipeline/
│   │   ├── state.py            # LangGraph PipelineState TypedDict
│   │   ├── nodes.py            # All pipeline node implementations
│   │   ├── graph.py            # LangGraph StateGraph construction
│   │   └── runner.py           # High-level run_pipeline() entry point
│   ├── api/
│   │   ├── schemas.py          # Pydantic request/response schemas
│   │   └── routes.py           # FastAPI route handlers
│   └── workers/
│       ├── celery_app.py       # Celery configuration
│       └── tasks.py            # Celery tasks
├── frontend/
│   ├── src/
│   │   ├── App.jsx             # Router + nav
│   │   ├── api.js              # Axios API client
│   │   ├── components/         # Card, Badge
│   │   └── pages/              # Dashboard, RunPage, ThesisPage, BacktestsPage, HealthPage
│   ├── index.html
│   ├── package.json
│   └── vite.config.js
├── sample_data/transcripts/    # Bundled earnings call transcripts
├── scripts/
│   ├── check_do_inference.py   # DO connectivity smoke test
│   ├── seed_demo.py            # Database seed
│   └── run_demo_analysis.py    # CLI pipeline runner
├── tests/                      # pytest test suite
├── Dockerfile.backend
├── Dockerfile.worker
├── Dockerfile.frontend
├── docker-compose.yml
├── nginx.conf
├── requirements.txt
├── pytest.ini
└── .env.example
