Problem
Iterating yfd.get(sym, 'price') across all ~7,700 symbols takes 5+ minutes even when fully cached. Each call reads a separate parquet file from disk, deserializes it, and returns a DataFrame. For screener/scanner workflows that need OHLCV for the entire universe, this is the bottleneck.
Current workflow
```python
for sym in yfd.symbols():       # 7,679 iterations
    df = yfd.get(sym, 'price')  # one parquet read per symbol
    # compute indicators...
```

~5 minutes just for the load step. The actual computation takes seconds.
Suggestion
Provide a bulk access path for price data, similar to how screener and info are stored as single parquet-bulk files.
Options (any would help):
- `yfd.get_all('price')` → single DataFrame with a `symbol` column, stored as one partitioned parquet file. One read, one deserialize.
- `yfd.get_all('price')` → `dict[str, DataFrame]` loaded from a single concatenated parquet, split in memory.
- Pre-built universe parquet at `~/.cache/yfd/price_all.parquet` generated by `yfd.sync()`, containing all symbols stacked. Refreshed on sync.
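For illustration, the second option could look roughly like this. This is a minimal sketch of the split-in-memory step, not yfd's actual API; `split_bulk` and the toy stacked frame are hypothetical, standing in for the result of a single bulk parquet read:

```python
import pandas as pd

def split_bulk(combined: pd.DataFrame) -> dict[str, pd.DataFrame]:
    # Split one stacked frame (with a 'symbol' column) back into
    # per-symbol frames, entirely in memory after a single read.
    return {sym: g.drop(columns='symbol').reset_index(drop=True)
            for sym, g in combined.groupby('symbol')}

# Toy stacked table standing in for price_all.parquet after one read.
combined = pd.DataFrame({
    'symbol': ['AAPL', 'AAPL', 'MSFT'],
    'Close':  [190.0, 191.5, 420.0],
    'Volume': [50_000_000, 48_000_000, 20_000_000],
})
frames = split_bulk(combined)
print(sorted(frames))       # ['AAPL', 'MSFT']
print(len(frames['AAPL']))  # 2
```

The point is that deserialization cost is paid once for the whole universe; the groupby split is cheap compared to ~7,700 separate file reads.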
The screener bulk format already proves this pattern works. Price data is the most common access pattern for quantitative workflows and would benefit the most from bulk loading.
Context
Building weekly relative strength scanners that rank all US stocks. The scanner needs Close + Volume for the full universe on every run. Current per-symbol access makes iteration slow despite data being local.
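To make the access pattern concrete, here is a minimal sketch of the kind of ranking the scanner runs once the full universe is in one stacked frame. The scoring rule (trailing return over the window) and the toy data are illustrative assumptions, not the scanner's actual logic:

```python
import pandas as pd

# Toy stacked universe frame; a real run would come from one bulk read
# of Close + Volume for all symbols.
prices = pd.DataFrame({
    'symbol': ['AAPL'] * 3 + ['MSFT'] * 3,
    'Close':  [100.0, 105.0, 110.0, 200.0, 198.0, 204.0],
})

# Trailing return over the window as a crude relative-strength score.
score = prices.groupby('symbol')['Close'].agg(lambda s: s.iloc[-1] / s.iloc[0] - 1)
ranks = score.rank(ascending=False)  # rank 1.0 = strongest
print(ranks['AAPL'])  # 1.0
```

With bulk loading, this whole pass is a single parquet read plus vectorized groupby work, so the runtime is dominated by the computation rather than per-symbol file I/O.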