
Source Data Deep Dive

Syed Ibrahim Omer edited this page Apr 13, 2026 · 1 revision

Source Data (Deep Dive)

This page documents how source_data() in src/indicators.py turns Yahoo Finance history into per-ticker Polars lazy frames.

Entry point

source_data(tickers, period, timeframe)
  • tickers: list of ticker symbols, already parsed by run_main.
  • period: a single period string for this call (e.g. 1y, 5y).
  • timeframe: either a Yahoo interval string (1d, 1wk, …) or a dict mapping period → interval when using a timeframe JSON file.

Yahoo Finance request

  • Uses yf.Tickers(tickers).history(period=period, interval=...).
  • If timeframe is a dict, the interval used is timeframe[period] (same period as the outer loop in run_main).
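The interval lookup described above can be sketched as follows. This is an illustration only: resolve_interval is a hypothetical helper name that does not appear in src/indicators.py, and the real code may handle a missing period differently (this sketch simply lets the KeyError propagate).

```python
def resolve_interval(period, timeframe):
    """Return the Yahoo interval string to use for one period.

    Hypothetical helper, not taken from src/indicators.py.
    """
    if isinstance(timeframe, dict):
        # Dict form: look up the interval configured for this period.
        # A period missing from the dict raises KeyError in this sketch.
        return timeframe[period]
    # String form: the same interval applies to every period.
    return timeframe


print(resolve_interval("1y", "1d"))
print(resolve_interval("5y", {"1y": "1d", "5y": "1wk"}))
```

With a dict, each period in the outer loop can use its own interval; with a plain string, every period shares one.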

Pandas → Polars

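  • reset_index() is called on the pandas result, which moves the index into a regular column; the frame is then converted with pl.from_pandas(..., schema_overrides=schema).lazy().
  • If the conversion fails, the function prints a message and returns an empty list ([]), not an empty lazy frame, so callers should check for an empty list.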

Multi-ticker column layout

yfinance returns a MultiIndex column layout for multiple tickers. The code:

  1. Collects schema names from the lazy frame.
  2. For each requested ticker, checks for the Close column of that ticker. In the Polars schema the MultiIndex tuple appears as a single string, so for AAPL the column name is the literal string "('Close', 'AAPL')".
  3. If that column is missing, the ticker is skipped with a log line.
  4. For valid tickers, it selects only the columns that exist and renames them to simple names: Close, High, Low, Open, Volume, Dividends, Stock Splits, plus Date.
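Steps 2 to 4 can be sketched in plain Python over a list of schema names. This is a minimal sketch, not the code in src/indicators.py: FIELDS, flat_name and rename_map are hypothetical names, and the Date column is left out because this page does not say how it is named after conversion.

```python
# Simple column names listed on this page (Date is handled separately).
FIELDS = ["Close", "High", "Low", "Open", "Volume", "Dividends", "Stock Splits"]


def flat_name(field, ticker):
    # str() of a tuple gives the flattened form, e.g. "('Close', 'AAPL')".
    return str((field, ticker))


def rename_map(schema_names, ticker):
    """Map one ticker's flattened column names to simple names.

    Returns None when the ticker has no Close column, which is the
    condition under which the ticker is skipped.
    """
    names = set(schema_names)
    if flat_name("Close", ticker) not in names:
        return None
    # Keep only the fields that actually exist for this ticker.
    return {
        flat_name(field, ticker): field
        for field in FIELDS
        if flat_name(field, ticker) in names
    }


schema = ["('Close', 'AAPL')", "('Volume', 'AAPL')", "('Close', 'MSFT')"]
print(rename_map(schema, "AAPL"))
print(rename_map(schema, "TSLA"))
```

A mapping of this shape could then be passed to a select-and-rename step on the lazy frame; a None result corresponds to the skipped-ticker case in step 3.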

Return shape

Returns a list of dict “packages”, one per valid ticker:

| Field | Meaning |
| --- | --- |
| data | LazyFrame with columns Date, Close, … (before lowercasing in calculate_indicators) |
| ticker | The ticker symbol |
| period | The period string used for this fetch |

Performance notes

  • collect_schema() may be called multiple times per ticker during validation and selection (cost scales with schema complexity).
  • One combined history() call fetches all tickers for that (period, interval) pair.

Failure modes

| Symptom | Likely cause |
| --- | --- |
| Empty return list | Conversion exception, or no tickers had Close data |
| Some tickers missing | Delisted, bad symbol, or no data for that period/interval |
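Both symptoms can be detected by comparing the requested tickers with the returned packages. The sketch below assumes only the package shape documented on this page (dicts with data, ticker and period keys); missing_tickers is a hypothetical helper, and the data values are placeholders standing in for LazyFrames.

```python
def missing_tickers(requested, packages):
    """Return requested tickers that have no package, in request order.

    Hypothetical helper; `packages` is the list returned by source_data().
    """
    returned = {package["ticker"] for package in packages}
    return [ticker for ticker in requested if ticker not in returned]


# Placeholder packages: `data` would be a LazyFrame in the real return value.
packages = [{"data": None, "ticker": "AAPL", "period": "1y"}]

print(missing_tickers(["AAPL", "BAD"], packages))  # some tickers missing
print(missing_tickers(["AAPL", "BAD"], []))        # empty return list
```

An empty list means every requested ticker is reported as missing, so the two rows of the table can be handled by the same check.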

