GlobeNewswire LLM Signal Pipeline — Learning Project

Personal Learning Project: This is an educational project created in my free time to learn about LLM orchestration, agentic pipelines, and financial data analysis. All data used is publicly available.

Project Overview

This project is a hands-on learning exercise to understand:

LangGraph for building multi-step LLM agentic workflows
LLM orchestration — chaining extraction, enrichment, and reasoning steps
Prompt engineering for structured data extraction and signal generation
Data pipeline design with SQLite caching and JSON output
Financial data analysis using public market data

Disclaimer: This is purely for educational purposes. Not financial advice. All data is from public sources.

Learning Objectives

What I'm Learning

LLM Concepts & Frameworks
- LangGraph StateGraph for defining multi-node agentic pipelines
- State management across graph nodes using TypedDict
- Conditional edges and early exits in directed graphs
- Subgraph composition (main pipeline + per-article strategy subgraph)
- LLM prompt design for structured JSON extraction
- Multi-step reasoning: extract → enrich → verdict

Python Programming
- Abstract base classes and strategy pattern
- TypedDict for structured state definitions
- SQLite caching to avoid redundant LLM calls and HTTP fetches
- Environment variable management with python-dotenv
- Modular project layout for maintainability

Financial Data Analysis
- Understanding press release events (equity offerings, partnerships, etc.)
- Fetching public market data with yfinance
- Computing financial ratios (dilution %, discount to close, deal % of market cap)
- Anchoring market data to a pre-event timestamp (T-1) to avoid look-ahead bias

Data Engineering
- ETL pipeline design (scrape → parse → cache → enrich → classify → output)
- SQLite caching strategies (article bodies + signal records)
- JSON output design with nested structured fields
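
The three ratios named under Financial Data Analysis can be illustrated with a small worked example. This is a minimal sketch using one common definition of each ratio; the `OfferingFacts` container, its field names, and the sample numbers are assumptions made for illustration, and the project's own definitions may differ.

```python
from dataclasses import dataclass


@dataclass
class OfferingFacts:
    """Hypothetical container for facts an extraction step might return."""
    new_shares: float          # shares issued in the offering
    offer_price: float         # price per share in the offering
    shares_outstanding: float  # shares outstanding before the offering
    prev_close: float          # T-1 closing price
    market_cap: float          # T-1 market capitalisation


def offering_ratios(f: OfferingFacts) -> dict:
    """Return the three ratios as percentages."""
    deal_size = f.new_shares * f.offer_price
    return {
        # New shares relative to the pre-offering share count.
        "dilution_pct": 100.0 * f.new_shares / f.shares_outstanding,
        # Positive when the offer is priced below the prior close.
        "discount_to_close_pct": 100.0 * (f.prev_close - f.offer_price) / f.prev_close,
        # Gross proceeds relative to pre-event market cap.
        "deal_pct_of_market_cap": 100.0 * deal_size / f.market_cap,
    }


# Made-up numbers, chosen only to keep the arithmetic easy to follow.
facts = OfferingFacts(
    new_shares=2_000_000,
    offer_price=4.0,
    shares_outstanding=10_000_000,
    prev_close=5.0,
    market_cap=50_000_000,
)
ratios = offering_ratios(facts)
```

With these inputs the offering adds 20% to the share count, is priced 20% below the prior close, and raises 16% of the pre-event market cap.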

What This Project Does

Data Collection (Public Sources Only)

Scrapes public press releases from GlobeNewswire
Fetches public market data from Yahoo Finance via yfinance

Processing Pipeline

Build URL — construct a GlobeNewswire search URL from configured filters
Scrape — paginate through search results, collect article stubs
Fetch — retrieve full article bodies (SQLite-cached to avoid re-fetching)
Extract — LLM reads each press release and outputs structured facts (ticker, event type, deal size, etc.)
Enrich — yfinance fetches pre-event market data anchored to T-1 (prev close, 52-week range, avg volume, market cap)
Verdict — LLM combines extracted facts and market context to produce a 7-point signal
Output — single annotated JSON file with all articles and their signal records
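
The T-1 anchoring in the Enrich step can be sketched without any market data library. The example below assumes daily closes are already available as a date-to-price mapping; the function name `t_minus_1_close` and the prices are hypothetical, and the project's enricher may select the anchor differently.

```python
from datetime import date
from typing import Optional, Tuple


def t_minus_1_close(closes: dict, event_day: date) -> Optional[Tuple[date, float]]:
    """Return the most recent (day, close) strictly before event_day."""
    prior_days = [d for d in closes if d < event_day]
    if not prior_days:
        return None  # no pre-event data available
    anchor_day = max(prior_days)
    return anchor_day, closes[anchor_day]


# Made-up prices. The weekend gap shows why "previous calendar day" is not enough.
closes = {
    date(2026, 4, 9): 10.0,   # Thursday
    date(2026, 4, 10): 10.5,  # Friday
    date(2026, 4, 13): 8.0,   # Monday, the event day itself
}
anchor = t_minus_1_close(closes, date(2026, 4, 13))
```

The strict `<` comparison is the point of the sketch: the event-day close already reflects the market's reaction to the release, so using it as context for the verdict would leak post-event information.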

Signal Scale

speculative_bearish → bearish → mildly_bearish → neutral → mildly_bullish → bullish → speculative_bullish
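
One way to work with this scale in code is to keep the labels in the written order and validate LLM output against them. The sketch below is an assumption for illustration: the numeric score relative to neutral is not something the project defines, and `SIGNAL_SCALE` and `signal_score` are hypothetical names.

```python
# Labels in the same left-to-right order as the scale above.
SIGNAL_SCALE = (
    "speculative_bearish",
    "bearish",
    "mildly_bearish",
    "neutral",
    "mildly_bullish",
    "bullish",
    "speculative_bullish",
)


def signal_score(label: str) -> int:
    """Map a label to its position relative to neutral (-3 to +3)."""
    if label not in SIGNAL_SCALE:
        # Reject anything the LLM returns that is not on the scale.
        raise ValueError(f"unknown signal label: {label!r}")
    return SIGNAL_SCALE.index(label) - SIGNAL_SCALE.index("neutral")
```

Rejecting unknown labels early keeps malformed LLM output from reaching the JSON export.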

Technical Stack

Languages & Frameworks

Python 3.11 — core language
LangGraph — agentic pipeline orchestration
LangChain Core — LLM abstractions

APIs & Data Sources

GlobeNewswire — public press releases
Yahoo Finance (yfinance) — public market data
QGenie / OpenRouter — LLM providers (API key required, configured in .env)

Infrastructure

SQLite — local caching for article bodies and signal records
BeautifulSoup4 — HTML parsing
python-dotenv — environment variable management
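
The SQLite caching idea can be sketched with the standard library alone. This is a minimal illustration, not the contents of utils/article_cache.py: the `BodyCache` class, the `bodies` table, and the `fake_fetch` stand-in are all hypothetical.

```python
import sqlite3
from typing import Callable


class BodyCache:
    """Hypothetical URL -> article body cache backed by SQLite."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS bodies (url TEXT PRIMARY KEY, body TEXT NOT NULL)"
        )

    def get_or_fetch(self, url: str, fetch: Callable[[str], str]) -> str:
        row = self.conn.execute(
            "SELECT body FROM bodies WHERE url = ?", (url,)
        ).fetchone()
        if row is not None:
            return row[0]  # cache hit: the fetcher is not called
        body = fetch(url)  # cache miss: call the expensive fetcher once
        self.conn.execute(
            "INSERT OR REPLACE INTO bodies (url, body) VALUES (?, ?)", (url, body)
        )
        self.conn.commit()
        return body


calls = []


def fake_fetch(url: str) -> str:
    """Stand-in for an HTTP request; records how often it is called."""
    calls.append(url)
    return f"<html>body of {url}</html>"


cache = BodyCache()
first = cache.get_or_fetch("https://example.com/a", fake_fetch)
second = cache.get_or_fetch("https://example.com/a", fake_fetch)
```

The second lookup is served from SQLite, so `fake_fetch` runs only once. The same pattern would apply to signal records keyed by article, which is how repeated LLM calls can be avoided.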

Project Architecture

release_evaluation_langgraph/
│
├── CONFIG.py                           ← all user settings (edit this to change anything)
├── run.py                              ← entry point
├── graph.py                            ← main LangGraph StateGraph definition
├── state.py                            ← PipelineState and ArticleSignalState TypedDicts
│
├── nodes/                              ← one file per graph node
│   ├── build_url.py
│   ├── scrape_pages.py
│   ├── fetch_bodies.py
│   ├── save_articles.py
│   ├── run_signals.py
│   └── merge_output.py
│
├── signal_strategies/
│   ├── base.py                         ← BaseSignalStrategy abstract class
│   ├── __init__.py                     ← strategy registry
│   └── default/
│       ├── strategy.py                 ← DefaultSignalStrategy (LangGraph subgraph)
│       ├── STRATEGY.md                 ← taxonomy, schema, and signal logic docs
│       ├── extractor.py                ← Step 1: LLM extraction
│       ├── enricher.py                 ← Step 2: yfinance market data
│       └── verdict.py                  ← Step 3: LLM verdict
│
├── utils/                              ← infrastructure (not strategy-specific)
│   ├── url_builder.py
│   ├── article_scraper.py
│   ├── article_fetcher.py
│   ├── article_cache.py                ← SQLite cache for article bodies
│   ├── signal_cache.py                 ← SQLite cache for signal records
│   ├── llm_client.py                   ← unified LLM abstraction (QGenie / OpenRouter)
│   └── utils.py
│
├── filter_mapping.json                 ← maps human labels to GlobeNewswire URL codes
└── export/                             ← output JSON files (created at runtime, gitignored)

LangGraph Pipeline Flow

START
  │
  ▼
build_url_node         Resolves date range, builds GlobeNewswire search URL
  │
  ▼
scrape_pages_node      Paginates through results → article stubs
  │
  ├─[no stubs]──► END
  │
  ▼
fetch_bodies_node      Fetches full article bodies (SQLite-cached)
  │
  ▼
save_articles_node     Saves raw articles to export/
  │
  ▼
run_signals_node       Loops over articles → calls signal strategy per article
  │
  ▼
merge_output_node      Merges signal records into JSON, overwrites file
  │
  ▼
END

Each article runs through a signal strategy subgraph:

extract_node → [no ticker? → skip] → enrich_node → verdict_node → save_signal_node
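
The early exit in this subgraph can be mimicked in plain Python to show the control flow. The sketch below deliberately does not use the LangGraph API; the node bodies are placeholders, the regex is a stand-in for the LLM extraction, and `save_signal_node` is omitted. Each node returns a partial state update, and a routing function decides whether the article continues.

```python
import re
from typing import Optional, TypedDict


class ArticleState(TypedDict, total=False):
    body: str
    ticker: Optional[str]
    prev_close: float
    signal: str


def extract_node(state: ArticleState) -> ArticleState:
    # Stand-in for LLM extraction: look for a "(NASDAQ: XYZ)" style tag.
    match = re.search(r"\(NASDAQ:\s*([A-Z]+)\)", state["body"])
    return {"ticker": match.group(1) if match else None}


def route_after_extract(state: ArticleState) -> str:
    # Mirrors the "[no ticker? -> skip]" branch in the flow above.
    return "enrich" if state.get("ticker") else "skip"


def enrich_node(state: ArticleState) -> ArticleState:
    return {"prev_close": 5.0}  # placeholder for a market data lookup


def verdict_node(state: ArticleState) -> ArticleState:
    return {"signal": "neutral"}  # placeholder for the LLM verdict


def run_article(body: str) -> ArticleState:
    state: ArticleState = {"body": body}
    state.update(extract_node(state))
    if route_after_extract(state) == "skip":
        return state  # early exit: no enrichment or verdict
    for node in (enrich_node, verdict_node):
        state.update(node(state))
    return state


with_ticker = run_article("Acme Corp (NASDAQ: ACME) announces a public offering.")
without_ticker = run_article("A private company announces a new office.")
```

Skipping articles without a ticker avoids spending a market data request and a second LLM call on releases that cannot produce a tradable signal.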

Getting Started

Prerequisites

Python 3.11+
Git
Virtual environment (recommended)
API key for QGenie or OpenRouter

Installation

Clone the repository

```
git clone <repo-url>
cd release_evaluation_langgraph
```

Create virtual environment

```
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

Install dependencies

```
pip install langgraph langchain-core requests beautifulsoup4 yfinance python-dotenv openai
pip install qgenie          # if using QGenie
```

Configure environment

```
cp .env.example .env
# Edit .env and fill in your API keys and model names
```

Edit CONFIG.py

```
RUN_MODE        = "backtest"                       # or "live"
DATE_RANGE      = ("2026-04-01", "2026-04-20")    # for backtest
LLM_PROVIDER    = "qgenie"                         # or "openrouter"
```

Run the pipeline

```
python run.py
```

Output

JSON file: export/articles_YYYY-MM-DD_HHMMSS_<mode>.json
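
A file name following this pattern could be built as below. This is a sketch under assumptions: `export_path` is a hypothetical helper, and whether the project uses local time or UTC for the timestamp is not stated, so the time is passed in explicitly.

```python
from datetime import datetime


def export_path(mode: str, now: datetime) -> str:
    """Build export/articles_YYYY-MM-DD_HHMMSS_<mode>.json for a given time."""
    stamp = now.strftime("%Y-%m-%d_%H%M%S")
    return f"export/articles_{stamp}_{mode}.json"


path = export_path("backtest", datetime(2026, 4, 20, 9, 5, 7))
# path == "export/articles_2026-04-20_090507_backtest.json"
```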

Adding a New Signal Strategy

The main graph is strategy-agnostic. To add your own:

Create signal_strategies/my_strategy/strategy.py implementing BaseSignalStrategy
Create signal_strategies/my_strategy/STRATEGY.md documenting your logic
Register it in signal_strategies/__init__.py
Set SIGNAL_STRATEGY = "my_strategy" in CONFIG.py

No changes to graph.py, nodes/, or state.py are required.
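
The strategy pattern behind these steps can be sketched as an abstract base class plus a name-to-class registry. Only the `BaseSignalStrategy` name comes from this README; the `run` method, its signature, the `STRATEGIES` dict, and `get_strategy` are assumptions for illustration, and the real base class and registry may look different.

```python
from abc import ABC, abstractmethod


class BaseSignalStrategy(ABC):
    """Assumed shape of the abstract base; the real class may differ."""

    @abstractmethod
    def run(self, article: dict) -> dict:
        """Return a signal record for one article."""


class MyStrategy(BaseSignalStrategy):
    def run(self, article: dict) -> dict:
        # Placeholder logic: a real strategy would extract, enrich, and decide.
        return {"url": article.get("url"), "signal": "neutral"}


# Hypothetical registry mapping the configured name to a strategy class.
STRATEGIES = {"my_strategy": MyStrategy}


def get_strategy(name: str) -> BaseSignalStrategy:
    """Instantiate the strategy registered under the given name."""
    try:
        return STRATEGIES[name]()
    except KeyError:
        raise ValueError(
            f"unknown strategy {name!r}; known: {sorted(STRATEGIES)}"
        ) from None


record = get_strategy("my_strategy").run({"url": "https://example.com/a"})
```

Because the caller only depends on the base class and a name lookup, a new strategy can be added by registering one more entry, which is what keeps the main graph untouched.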

Legal & Ethical Considerations

Data Sources

All data is publicly available
No proprietary or confidential information
Respects rate limits and public API terms

Usage

Educational purposes only
Not financial advice
No commercial use
Personal learning project

Compliance

No CCI (Confidential Company Information)
No PII (Personally Identifiable Information)
No API keys or credentials in code (all in .env, gitignored)
No internal company systems, endpoints, or infrastructure references

What I Learned

Technical Skills

Building multi-step LLM agentic pipelines with LangGraph
Structuring a strategy pattern for pluggable LLM logic
Implementing SQLite caching for both HTTP responses and LLM outputs
Avoiding look-ahead bias when enriching with market data
Designing JSON output schemas for downstream evaluation

Domain Knowledge

Understanding press release event categories (dilutive equity, partnerships, earnings, etc.)
How offering discounts and dilution percentages relate to short-term price movement
Pre-event vs. post-event data boundaries in signal generation

Best Practices

Separating infrastructure (utils/) from strategy-specific logic (signal_strategies/)
Keeping all configuration in one place (CONFIG.py)
Using environment variables for credentials — never in code
Documenting signal taxonomy and decision rules in STRATEGY.md

Future Enhancements (Learning Goals)

Step 4: Evaluator — fetch T+1/T+2/T+5 closes, score signal accuracy
Parallel article processing for faster runs
Additional signal strategies (earnings, M&A, partnerships)
Unit tests with pytest
REST API wrapper (FastAPI)
Visualization of signal distribution and accuracy

Important Notes

Not Financial Advice — this project is for learning LLM concepts and data pipeline design
Public Data Only — all data sources are publicly available
Educational Purpose — built in free time for skill development
No Guarantees — signals are experimental outputs of an LLM prompt, not validated predictions
Personal Project — not affiliated with any organization

Status: Active Learning Project
Purpose: Educational — LLM orchestration, LangGraph, agentic pipelines, financial data analysis
