For backtesters: what's missing from the historical Polymarket data that you'd actually use? #4

manja316 · 2026-05-25T06:51:20Z

manja316
May 25, 2026
Maintainer

The screener is the surface, but the thing behind it is a snapshot pipeline — 10.8M+ rows across 13,963 markets, refreshed daily. I want to ask people who actually backtest prediction-market strategies what fields they keep wishing were there.

What the snapshots currently capture (per market per refresh):

price (yes-side mid)
volume (cumulative) + volume_24h
liquidity (Gamma's number, not orderbook-derived)
one_day_change, one_hour_change, one_week_change
closed, archived, active flags
end_date, category, tags
outcome_prices (full multi-outcome where applicable)
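
For anyone who wants to reason about the row shape in code, here is a minimal sketch of one snapshot row as a Python dataclass. The field names follow the list above; the types, the optional markers, and the `market_id` and `snapshot_ts` key columns are assumptions for illustration, not the actual schema of the published SQLite/CSV files.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class SnapshotRow:
    # Hypothetical key columns: not named in the post, assumed here.
    market_id: str
    snapshot_ts: datetime
    # Fields listed in the post; all types are assumptions.
    price: float                      # yes-side mid
    volume: float                     # cumulative
    volume_24h: float
    liquidity: float                  # Gamma's number, not orderbook-derived
    one_hour_change: Optional[float]
    one_day_change: Optional[float]
    one_week_change: Optional[float]
    closed: bool
    archived: bool
    active: bool
    end_date: Optional[datetime]
    category: Optional[str] = None
    tags: List[str] = field(default_factory=list)
    outcome_prices: List[float] = field(default_factory=list)


example_row = SnapshotRow(
    market_id="demo-market",          # made-up identifier
    snapshot_ts=datetime(2026, 5, 25, tzinfo=timezone.utc),
    price=0.62, volume=125000.0, volume_24h=4300.0, liquidity=18000.0,
    one_hour_change=0.01, one_day_change=-0.03, one_week_change=0.08,
    closed=False, archived=False, active=True,
    end_date=datetime(2026, 11, 3, tzinfo=timezone.utc),
    category="politics", tags=["election"], outcome_prices=[0.62, 0.38],
)
```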

What I know is missing and have not (yet) added:

1. Order-book depth at multiple levels. Currently zero. Would need to hit CLOB per market, expensive at 13k markets but maybe doable for a curated top-N by liquidity.
2. Trade-by-trade tape. Not snapshotted, only aggregated volumes. Without the tape you can't reconstruct VWAP or detect single-fill spikes.
3. Resolution outcome + timestamp for closed markets. We have closed=true but not always the resolved outcome cleanly joined back to the historical snapshots, so survivorship-bias-aware backtests are awkward.
4. News/event tagging. Markets that moved 20% in an hour: was there a tweet, a court ruling, an earnings print? Currently zero linkage.
5. Funding-rate / borrow analogues. Polymarket doesn't have these, but the cost-of-carry equivalent (capital lockup until resolution) is computable from end_date + price and we don't expose it as a field.
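
To make the trade-tape and cost-of-carry items concrete, here is a rough sketch of the arithmetic involved. Both functions are illustrative assumptions, not fields or code from the pipeline: `vwap` assumes a tape of hypothetical `(price, size)` pairs, and `annualized_yield_to_resolution` is only one possible carry definition (simple, non-compounded return on a YES share that resolves YES at `end_date`, ignoring fees and early exit).

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, Optional, Tuple


def vwap(trades: Iterable[Tuple[float, float]]) -> Optional[float]:
    """Volume-weighted average price over (price, size) fills.

    Returns None if there is no traded size. This needs per-trade data,
    which is why aggregated volume alone cannot reproduce it.
    """
    notional = 0.0
    total_size = 0.0
    for price, size in trades:
        notional += price * size
        total_size += size
    return notional / total_size if total_size > 0 else None


def annualized_yield_to_resolution(
    price: float, snapshot_time: datetime, end_date: datetime
) -> Optional[float]:
    """Simple annualized return on capital locked in a YES share until end_date.

    Assumes the share pays 1.0 at end_date. Returns None when the price is
    outside (0, 1) or the market is at or past its end date.
    """
    days = (end_date - snapshot_time).total_seconds() / 86400.0
    if not (0.0 < price < 1.0) or days <= 0:
        return None
    gross_return = (1.0 - price) / price   # payoff of 1.0 on `price` of capital
    return gross_return * 365.0 / days


if __name__ == "__main__":
    now = datetime(2026, 5, 25, tzinfo=timezone.utc)
    print(vwap([(0.50, 100.0), (0.60, 300.0)]))
    print(annualized_yield_to_resolution(0.80, now, now + timedelta(days=73)))
```

The same price looks very different depending on time to resolution, which is the argument for exposing a carry-style field per row rather than leaving it to each backtest.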

Question for anyone running models on prediction-market data:

Which of (1)–(5) would change what you can backtest vs just being nice-to-have?
Is there a 6th thing I'm not listing that you've had to scrape yourself?
If you could only add one field per snapshot row, what would it be?

The full historical pull (SQLite + CSV) is on Gumroad — $9, freely redistributable for research. The screener stays free. Answers here genuinely shape what the next refresh adds, so be specific.
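
Since the pull ships as SQLite, the resolution join mentioned earlier could look roughly like the sketch below. The table names (`snapshots`, `resolutions`) and every column name are hypothetical, and the demo runs against an in-memory database with made-up rows rather than the real file. The LEFT JOIN is the point: unresolved markets stay in the result with NULL outcomes instead of silently dropping out, which is what a survivorship-aware backtest needs.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory DB with a hypothetical schema and made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE snapshots (
            market_id TEXT, snapshot_ts TEXT, price REAL, closed INTEGER
        );
        CREATE TABLE resolutions (
            market_id TEXT PRIMARY KEY, resolved_outcome TEXT, resolved_ts TEXT
        );
        INSERT INTO snapshots VALUES
            ('m1', '2026-01-01', 0.40, 0),
            ('m1', '2026-01-02', 0.55, 0),
            ('m2', '2026-01-01', 0.90, 0);
        INSERT INTO resolutions VALUES ('m1', 'YES', '2026-01-10');
        """
    )
    return conn


# Attach the eventual outcome to every historical snapshot of a market.
JOIN_SQL = """
SELECT s.market_id, s.snapshot_ts, s.price, r.resolved_outcome, r.resolved_ts
FROM snapshots AS s
LEFT JOIN resolutions AS r ON r.market_id = s.market_id
ORDER BY s.market_id, s.snapshot_ts
"""


def snapshots_with_outcome(conn: sqlite3.Connection) -> list:
    """Return snapshot rows with resolved outcome columns (None if unresolved)."""
    return conn.execute(JOIN_SQL).fetchall()


if __name__ == "__main__":
    for row in snapshots_with_outcome(build_demo_db()):
        print(row)
```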

Methodology background on the existing crash-signal column is in Discussion #2, if that context is useful.

Replies: 0 comments
