Personal Learning Project: This is an educational project created in my free time to learn about LLM orchestration, agentic pipelines, and financial data analysis. All data used is publicly available.
This project is a hands-on learning exercise to understand:
- LangGraph for building multi-step LLM agentic workflows
- LLM orchestration — chaining extraction, enrichment, and reasoning steps
- Prompt engineering for structured data extraction and signal generation
- Data pipeline design with SQLite caching and JSON output
- Financial data analysis using public market data
Disclaimer: This is purely for educational purposes. Not financial advice. All data is from public sources.
- LLM Concepts & Frameworks
  - LangGraph StateGraph for defining multi-node agentic pipelines
  - State management across graph nodes using TypedDict
  - Conditional edges and early exits in directed graphs
  - Subgraph composition (main pipeline + per-article strategy subgraph)
  - LLM prompt design for structured JSON extraction
  - Multi-step reasoning: extract → enrich → verdict
- Python Programming
  - Abstract base classes and strategy pattern
  - TypedDict for structured state definitions
  - SQLite caching to avoid redundant LLM calls and HTTP fetches
  - Environment variable management with python-dotenv
  - Modular project layout for maintainability
- Financial Data Analysis
  - Understanding press release events (equity offerings, partnerships, etc.)
  - Fetching public market data with yfinance
  - Computing financial ratios (dilution %, discount to close, deal % of market cap)
  - Anchoring market data to a pre-event timestamp (T-1) to avoid look-ahead bias
- Data Engineering
  - ETL pipeline design (scrape → parse → cache → enrich → classify → output)
  - SQLite caching strategies (article bodies + signal records)
  - JSON output design with nested structured fields
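The three ratios named in the list above can be sketched in a few lines. This is an illustration rather than the project's enricher: the function name, the field names, and the exact definitions (dilution measured against shares outstanding before the offering, discount measured against the T-1 close) are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OfferingRatios:
    dilution_pct: float            # new shares as % of pre-offering shares
    discount_to_close_pct: float   # how far the offer price sits below T-1 close
    deal_pct_of_market_cap: float  # gross deal size as % of T-1 market cap


def offering_ratios(new_shares: float, shares_outstanding: float,
                    offer_price: float, prev_close: float,
                    market_cap: float) -> OfferingRatios:
    """Compute the three ratios from pre-event (T-1) inputs (assumed definitions)."""
    if min(shares_outstanding, prev_close, market_cap) <= 0:
        raise ValueError("denominators must be positive")
    deal_size = new_shares * offer_price
    return OfferingRatios(
        dilution_pct=100.0 * new_shares / shares_outstanding,
        discount_to_close_pct=100.0 * (prev_close - offer_price) / prev_close,
        deal_pct_of_market_cap=100.0 * deal_size / market_cap,
    )


# Hypothetical offering: 2M new shares at $4.00 against a $5.00 previous close.
ratios = offering_ratios(2_000_000, 10_000_000, 4.0, 5.0, 50_000_000)
```

Other definitions of dilution exist (for example, new shares over post-offering shares), so the formula should be stated wherever the ratio is reported.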
- Scrapes public press releases from GlobeNewswire
- Fetches public market data from Yahoo Finance via yfinance
- Build URL — construct a GlobeNewswire search URL from configured filters
- Scrape — paginate through search results, collect article stubs
- Fetch — retrieve full article bodies (SQLite-cached to avoid re-fetching)
- Extract — LLM reads each press release and outputs structured facts (ticker, event type, deal size, etc.)
- Enrich — yfinance fetches pre-event market data anchored to T-1 (prev close, 52-week range, avg volume, market cap)
- Verdict — LLM combines extracted facts and market context to produce a 7-point signal
- Output — single annotated JSON file with all articles and their signal records
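The T-1 anchoring in the Enrich step comes down to one rule: only use closes dated strictly before the event day. A minimal sketch of that rule follows, assuming daily closes are already available as a date-to-price mapping; the function name and input shape are hypothetical, and the real enricher obtains its data through yfinance.

```python
from datetime import date
from typing import Optional


def close_before(closes: dict[date, float],
                 event_day: date) -> Optional[tuple[date, float]]:
    """Return the latest (day, close) strictly before event_day, or None."""
    prior = [day for day in closes if day < event_day]
    if not prior:
        return None  # no pre-event data, so nothing safe to anchor to
    anchor = max(prior)
    return anchor, closes[anchor]


# Illustrative prices only. The event lands on Monday 2026-04-13, so the
# anchor is the preceding Friday, never the event day's own close.
closes = {
    date(2026, 4, 9): 5.10,
    date(2026, 4, 10): 5.00,
    date(2026, 4, 13): 4.20,
}
anchor = close_before(closes, date(2026, 4, 13))
```

Using a strict comparison is what keeps the event day's own close, which already reflects the news, out of the features.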
speculative_bearish → bearish → mildly_bearish → neutral → mildly_bullish → bullish → speculative_bullish
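One way to make the seven labels comparable in code is an ordered enum. The sketch below keeps the order exactly as listed above; the integer values and the `parse_signal` helper are assumptions for illustration, not taken from the project.

```python
from enum import IntEnum


class Signal(IntEnum):
    # Members follow the order in which the labels are listed;
    # the numeric values themselves are an arbitrary choice.
    SPECULATIVE_BEARISH = -3
    BEARISH = -2
    MILDLY_BEARISH = -1
    NEUTRAL = 0
    MILDLY_BULLISH = 1
    BULLISH = 2
    SPECULATIVE_BULLISH = 3


def parse_signal(label: str) -> Signal:
    """Map a label such as 'mildly_bearish' to its Signal member.

    Raises KeyError for labels outside the seven-point scale, which helps
    catch malformed LLM output early.
    """
    return Signal[label.strip().upper()]


labels_in_order = [member.name.lower() for member in Signal]
```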
- Python 3.11 — core language
- LangGraph — agentic pipeline orchestration
- LangChain Core — LLM abstractions
- GlobeNewswire — public press releases
- Yahoo Finance (yfinance) — public market data
- QGenie / OpenRouter — LLM providers (API key required, configured in .env)
- SQLite — local caching for article bodies and signal records
- BeautifulSoup4 — HTML parsing
- python-dotenv — environment variable management
release_evaluation_langgraph/
│
├── CONFIG.py                  ← all user settings (edit this to change anything)
├── run.py                     ← entry point
├── graph.py                   ← main LangGraph StateGraph definition
├── state.py                   ← PipelineState and ArticleSignalState TypedDicts
│
├── nodes/                     ← one file per graph node
│   ├── build_url.py
│   ├── scrape_pages.py
│   ├── fetch_bodies.py
│   ├── save_articles.py
│   ├── run_signals.py
│   └── merge_output.py
│
├── signal_strategies/
│   ├── base.py                ← BaseSignalStrategy abstract class
│   ├── __init__.py            ← strategy registry
│   └── default/
│       ├── strategy.py        ← DefaultSignalStrategy (LangGraph subgraph)
│       ├── STRATEGY.md        ← taxonomy, schema, and signal logic docs
│       ├── extractor.py       ← Step 1: LLM extraction
│       ├── enricher.py        ← Step 2: yfinance market data
│       └── verdict.py         ← Step 3: LLM verdict
│
├── utils/                     ← infrastructure (not strategy-specific)
│   ├── url_builder.py
│   ├── article_scraper.py
│   ├── article_fetcher.py
│   ├── article_cache.py       ← SQLite cache for article bodies
│   ├── signal_cache.py        ← SQLite cache for signal records
│   ├── llm_client.py          ← unified LLM abstraction (QGenie / OpenRouter)
│   └── utils.py
│
├── filter_mapping.json        ← maps human labels to GlobeNewswire URL codes
└── export/                    ← output JSON files (created at runtime, gitignored)
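The article-body cache listed in the tree (utils/article_cache.py) follows a common get-or-fetch pattern. The sketch below shows that pattern with the standard-library sqlite3 module; the class name, table schema, and method name are assumptions and may differ from the actual file.

```python
import sqlite3
from typing import Callable


class ArticleCache:
    """Tiny URL -> body cache backed by SQLite (hypothetical schema)."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS articles ("
            "url TEXT PRIMARY KEY, body TEXT NOT NULL)"
        )

    def get_or_fetch(self, url: str, fetch: Callable[[str], str]) -> str:
        """Return the cached body, calling fetch(url) only on a cache miss."""
        row = self.conn.execute(
            "SELECT body FROM articles WHERE url = ?", (url,)
        ).fetchone()
        if row is not None:
            return row[0]
        body = fetch(url)
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO articles (url, body) VALUES (?, ?)",
                (url, body),
            )
        return body
```

Passing the fetcher in as a callable keeps the cache independent of the HTTP client, and the same shape would work for caching LLM outputs keyed by article and prompt.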
START
  │
  ▼
build_url_node       Resolves date range, builds GlobeNewswire search URL
  │
  ▼
scrape_pages_node    Paginates through results → article stubs
  │
  ├─[no stubs]──► END
  │
  ▼
fetch_bodies_node    Fetches full article bodies (SQLite-cached)
  │
  ▼
save_articles_node   Saves raw articles to export/
  │
  ▼
run_signals_node     Loops over articles → calls signal strategy per article
  │
  ▼
merge_output_node    Merges signal records into JSON, overwrites file
  │
  ▼
END
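The early exit after scrape_pages_node is the kind of decision a routing function makes. The sketch below is plain Python and does not import LangGraph; the state fields are hypothetical, since the real PipelineState lives in state.py. In a LangGraph graph, a function like this would typically be passed to `add_conditional_edges`, with a path map translating the returned keys into the next node or the END constant.

```python
from typing import TypedDict


class PipelineState(TypedDict, total=False):
    # Hypothetical fields; the actual PipelineState may differ.
    search_url: str
    stubs: list[dict]      # article stubs collected by the scraper
    articles: list[dict]   # stubs plus fetched bodies


def route_after_scrape(state: PipelineState) -> str:
    """Choose the next step: continue only if the scrape found stubs."""
    return "fetch_bodies" if state.get("stubs") else "end"


# Keys returned by the router, mapped to illustrative destination names.
PATH_MAP = {"fetch_bodies": "fetch_bodies_node", "end": "END"}

next_when_empty = PATH_MAP[route_after_scrape({"stubs": []})]
next_when_found = PATH_MAP[route_after_scrape({"stubs": [{"title": "Example"}]})]
```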
Each article runs through a signal strategy subgraph:
extract_node → [no ticker? → skip] → enrich_node → verdict_node → save_signal_node
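The per-article flow above can be read as a chain of steps with one early exit. A minimal sketch, assuming each step is a function from state to state; the field names and the `run_article` helper are illustrative, and the project itself wires these steps as a LangGraph subgraph rather than a plain loop.

```python
from typing import Callable, TypedDict


class ArticleSignalState(TypedDict, total=False):
    body: str      # raw press release text
    facts: dict    # extracted fields such as ticker and event type
    market: dict   # T-1 market context
    signal: str    # one of the seven signal labels
    skipped: bool  # True when no ticker was extracted


Step = Callable[[ArticleSignalState], ArticleSignalState]


def run_article(state: ArticleSignalState, extract: Step, enrich: Step,
                verdict: Step, save: Step) -> ArticleSignalState:
    """Run extract, then skip the remaining steps unless a ticker was found."""
    state = extract(state)
    if not state.get("facts", {}).get("ticker"):
        return {**state, "skipped": True}
    for step in (enrich, verdict, save):
        state = step(state)
    return state
```

Skipping before enrichment avoids a market-data lookup and a second LLM call for articles that cannot be tied to a ticker.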
- Python 3.11+
- Git
- Virtual environment (recommended)
- API key for QGenie or OpenRouter
- Clone the repository
    git clone <repo-url>
    cd release_evaluation_langgraph
- Create virtual environment
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install dependencies
    pip install langgraph langchain-core requests beautifulsoup4 yfinance python-dotenv openai
    pip install qgenie  # if using QGenie
- Configure environment
    cp .env.example .env
    # Edit .env and fill in your API keys and model names
- Edit CONFIG.py
    RUN_MODE = "backtest"  # or "live"
    DATE_RANGE = ("2026-04-01", "2026-04-20")  # for backtest
    LLM_PROVIDER = "qgenie"  # or "openrouter"
- Run the pipeline
    python run.py
- JSON file: export/articles_YYYY-MM-DD_HHMMSS_<mode>.json
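A file name in that pattern can be built with `datetime.strftime`. The helper below is a sketch; its name and parameters are assumptions, and only the name pattern comes from the line above.

```python
from datetime import datetime
from pathlib import Path


def export_path(mode: str, now: datetime, export_dir: str = "export") -> Path:
    """Build export/articles_YYYY-MM-DD_HHMMSS_<mode>.json for a given run time."""
    stamp = now.strftime("%Y-%m-%d_%H%M%S")
    return Path(export_dir) / f"articles_{stamp}_{mode}.json"


# Fixed timestamp so the result is reproducible.
example = export_path("backtest", datetime(2026, 4, 20, 9, 5, 7))
```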
The main graph is strategy-agnostic. To add your own:
- Create signal_strategies/my_strategy/strategy.py implementing BaseSignalStrategy
- Create signal_strategies/my_strategy/STRATEGY.md documenting your logic
- Register it in signal_strategies/__init__.py
- Set SIGNAL_STRATEGY = "my_strategy" in CONFIG.py
No changes to graph.py, nodes/, or state.py are needed.
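The plug-in steps above rely on an abstract base class plus a name-to-class registry. A minimal sketch of that pattern follows. The `run` method, the `STRATEGIES` dict, and the `register` decorator are hypothetical, because the real interface of BaseSignalStrategy and the registry in signal_strategies/__init__.py are not shown here.

```python
from abc import ABC, abstractmethod


class BaseSignalStrategy(ABC):
    """Assumed interface: one article in, one signal record out."""

    @abstractmethod
    def run(self, article: dict) -> dict:
        ...


# Registry mapping the SIGNAL_STRATEGY name to a strategy class.
STRATEGIES: dict[str, type[BaseSignalStrategy]] = {}


def register(name: str):
    """Class decorator that adds a strategy to the registry under name."""
    def decorator(cls: type[BaseSignalStrategy]) -> type[BaseSignalStrategy]:
        STRATEGIES[name] = cls
        return cls
    return decorator


@register("my_strategy")
class MyStrategy(BaseSignalStrategy):
    def run(self, article: dict) -> dict:
        # Placeholder logic; a real strategy would call the LLM here.
        return {"title": article.get("title"), "signal": "neutral"}


def get_strategy(name: str) -> BaseSignalStrategy:
    """Instantiate the strategy selected in configuration."""
    try:
        return STRATEGIES[name]()
    except KeyError:
        raise ValueError(f"unknown signal strategy: {name!r}") from None
```

Because the main graph only ever calls the selected strategy through the base-class interface, a new strategy can be added without touching the graph code.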
- All data is publicly available
- No proprietary or confidential information
- Respects rate limits and public API terms
- Educational purposes only
- Not financial advice
- No commercial use
- Personal learning project
- No CCI (Confidential Company Information)
- No PII (Personally Identifiable Information)
- No API keys or credentials in code (all in .env, gitignored)
- No internal company systems, endpoints, or infrastructure references
- Building multi-step LLM agentic pipelines with LangGraph
- Structuring a strategy pattern for pluggable LLM logic
- Implementing SQLite caching for both HTTP responses and LLM outputs
- Avoiding look-ahead bias when enriching with market data
- Designing JSON output schemas for downstream evaluation
- Understanding press release event categories (dilutive equity, partnerships, earnings, etc.)
- How offering discounts and dilution percentages relate to short-term price movement
- Pre-event vs. post-event data boundaries in signal generation
- Separating infrastructure (utils/) from strategy-specific logic (signal_strategies/)
- Keeping all configuration in one place (CONFIG.py)
- Using environment variables for credentials, never in code
- Documenting signal taxonomy and decision rules in STRATEGY.md
- Step 4: Evaluator — fetch T+1/T+2/T+5 closes, score signal accuracy
- Parallel article processing for faster runs
- Additional signal strategies (earnings, M&A, partnerships)
- Unit tests with pytest
- REST API wrapper (FastAPI)
- Visualization of signal distribution and accuracy
- Not Financial Advice — this project is for learning LLM concepts and data pipeline design
- Public Data Only — all data sources are publicly available
- Educational Purpose — built in free time for skill development
- No Guarantees — signals are experimental outputs of an LLM prompt, not validated predictions
- Personal Project — not affiliated with any organization
Status: Active Learning Project
Purpose: Educational — LLM orchestration, LangGraph, agentic pipelines, financial data analysis