Data trustworthiness: staleness watchdog, honest coverage, self-verification + health#1
Conversation
The two primary Hyperliquid sockets (orderbook l2Book, orderflow trades) connected without a heartbeat, so a half-open TCP connection threw no error, never reconnected, and kept serving a frozen orderbook / frozen CVD as if it were live. There was no 'no data in N seconds = stale' guard anywhere. - heartbeat=20 on both bare HL ws_connect calls (matches every other socket) - per-feed last_message_at + data_age()/is_stale() on OrderBookEngine and OrderFlowEngine (HL + Binance tracked separately) - OrderBookSnapshot gains a stale flag, recomputed at read time - orderbook self-heals: its snapshot loop force-closes an alive-but-silent socket past 2x the threshold to trigger the backoff reconnect - hub._update_feed_staleness() (run each 1s status tick, live mode only) flips orderflow_engine/orderbook_feed status to 'stale' and force-reconnects the HL trade socket when both venues go silent - liquidations intentionally not aged out (sporadic by nature)
…tic) The README claimed 'every liquidation event as it happens,' but the feed is not a complete census: Binance's !forceOrder stream is throttled to ~1 liq/symbol/sec at the source, Bybit only subscribes to the top-15 symbols, and Hyperliquid has no liquidation feed at all (events are inferred from large trades, carrying confirmed=False). - exchange_coverage() documents each venue's collection method + caveat - get_stats() now returns coverage, confirmed_count, heuristic_count, and a per-exchange 'method' tag - BYBIT_SYMBOL_LIMIT / HL_LIQUIDATION_MIN_USD constants replace magic numbers; misleading '# Confirmed' source comments corrected - liquidation stream marks Hyperliquid rows as estimated (~ / ?) with a caption, and tags sampled/heuristic venues in the per-exchange totals - README rewritten to state real coverage and link DATA_INTEGRITY.md
Writes were committed only every 50 events with no atexit/signal flush, so a SIGKILL/OOM lost everything since the last commit — and because _event_count is a single shared counter across all tables, a low-frequency table (funding, LSR, options) could sit uncommitted for many minutes behind high-frequency trades. - _maybe_commit() commits at 50 events OR every COMMIT_INTERVAL_SECONDS (5s), bounding worst-case loss on an uncatchable crash to ~5s and flushing slow tables promptly; replaces all seven fixed every-50 commit sites - atexit handler flushes on graceful exit / unhandled exception / Ctrl-C - corruption-recovery path no longer assumes self._conn was assigned (guarded via a local handle) — fixes a latent AttributeError if the first connect fails - kept startup PRAGMA integrity_check (Terminal won't hit the multi-GB scale that made it a problem in the internal repo)
The terminal displayed data but never proved it. Added a DataHealthMonitor that
cross-references live hub data against public exchange APIs (Binance spot price
<0.5%, long/short ratio, Deribit DVOL) and checks per-feed staleness, freshness,
completeness, and consistency.
- src/data_layer/health_monitor.py: reusable monitor producing a structured
{overall, counts, checks} result; thresholds tuned to what the Terminal
actually collects (verified live: 230 assets priced)
- hub runs it every 30s (live mode only) and caches the result on self.health
- /v1/health now returns data_health + per-feed status; top-level status flips
to 'degraded' when stale/drifting (server still up)
- combined dashboard's hardcoded green 'LIVE' badge replaced with a dynamic
✓ LIVE / ⚠ STALE / ⚠ DRIFT / ⚠ DEGRADED indicator driven by the monitor
- src/verify_data.py: one-shot CLI reusing the same monitor (non-zero exit on
any failure) — verified end-to-end live: 10/10 checks pass
The order-flow engine fed Hyperliquid and Binance trades into the same buckets and the same cumulative_cvd, so 'BTC CVD' was a silent HL+Binance sum. With no trade dedup (unlike the liquidation feed), a reconnect/resubscribe replay double-counted into the cumulative figure, which never resets — drift over a long session. - per-venue bounded seen-ID sets (HL tid, Binance aggTrade id) skip replays - cumulative CVD tracked per venue (cumulative_cvd_hl/_binance) plus the combined series; get_cumulative_cvd(symbol) exposes all three - documented the exchange-event-time vs wall-clock skew in _expire_old
Funding snapshots stamped local time.time() instead of the exchange's event time, injecting local-clock + fetch latency into cross-source alignment. - Binance premiumIndex: use the per-item 'time' field (ms), fall back to local - Bybit v5 tickers: use the response envelope 'time' (per-ticker rows have none) - spot_prices: Binance /ticker/price returns no timestamp, so local fetch time is documented as intentional rather than silently misleading
Bybit only subscribed to DEFAULT_SYMBOLS[:15] because Bybit v5 caps args per subscribe request — so liquidations on the other ~35 tracked symbols never arrived. Batch the subscription in chunks of 10 instead of truncating; symbols without a Bybit linear perp just get a harmless per-topic error reply. Coverage note updated to match.
- stop() re-awaited self._task inside the task loop and set it to None on the first iteration, so the second iteration awaited None (TypeError: None is not awaitable) and never cleaned up the Binance task. Cancel each task once, then null both handles; init _binance_task in __init__. - removed unused 'field' import - CLAUDE.md claimed 'no synthetic/demo mode' while hub.py still carries a full demo path; documented demo as a dev/preview-only path (product runs live, both entry points pass demo=False, health monitor disabled under demo)
- tests/test_data_integrity.py: staleness watchdog (incl. combined-age uses freshest venue), orderbook stale flag, CVD dedup + per-venue split, and health monitor verdict classification (stale outranks drift) - fixed the two pre-existing funding-rate tests: they expected the raw 8h rate as 'hourly', but the collector correctly normalizes to hourly (rate/8) and annualizes that — updated expectations to match the correct math - docs/DATA_INTEGRITY.md documents coverage caveats, the staleness watchdog, the verification checks + tolerances, durability, and timestamps - README gains a Self-Verifying Data feature row; CLAUDE.md documents verify_data
Adversarial review (.roast/REPORT-latest.md) findings: - H1: API defaulted to host 0.0.0.0 with no auth + CORS '*' while docs claimed localhost-only. Default to 127.0.0.1; opt into LAN via HYPERDATA_API_HOST. - H2: CI ran ruff + ast.parse + import smoke-tests but never pytest — the whole suite was invisible. Added a 'Run tests' step. - M2: LLM agent read liq_stats['count']/['volume_usd'] (nonexistent keys), so the model always saw '0 liquidations'. Use total_count/total_volume_usd. - L4: /v1/public/metrics uptime was always 0 (self._start_time never set) — use the hub's tracked uptime_seconds.
…tals, LLM crash More adversarial-review findings (.roast/REPORT-latest.md): - M1/M6: trade sampling keyed off the shared _event_count (bumped by 6 event types), so the '1-in-2' sample was biased and get_trade_summary's x2 rescale wrong. Added a dedicated, lock-protected _trade_count; all counter mutation now inside the lock. - M3: no retention/pruning anywhere — DB + WAL grew forever and the periodic COUNT(*) under the lock degraded the hot path. Added DataStore.prune() (delete > RETENTION_DAYS old + wal_checkpoint TRUNCATE), run hourly from the hub. - M7: liquidation totals + the cascade alert blended confirmed and HL-heuristic events. Added confirmed_volume_usd/heuristic_volume_usd; cascade alert now uses confirmed volume only. - F821 (real latent crash): llm_agent used asyncio.TimeoutError but only imported asyncio inside other functions — any LLM timeout raised NameError. Hoisted the import to module scope. - CI: lint step made non-blocking (pre-existing repo-wide ruff debt) so the new pytest gate actually runs; flagged for a focused lint-cleanup follow-up. - tests: unbiased-sampling, prune, and confirmed-vs-heuristic-volume regression tests added.
- SIGTERM handling: run_api.py only caught KeyboardInterrupt, so 'kill <pid>' (what process managers send) skipped hub.stop() and could drop unflushed writes + leave sockets open. Added SIGINT/SIGTERM handlers that trigger a clean shutdown. - M4: numeric query params (limit/minutes/min_size/threshold) were int()/float()'d raw, so '?limit=abc' raised an uncaught 500 without CORS headers. Added _int_param/_float_param helpers that return 400 on bad input and clamp to sane bounds; applied to all 9 sites. - docs: documented the loopback-default API bind, retention/pruning, and SIGTERM in DATA_INTEGRITY.md.
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
|
There was a problem hiding this comment.
What this PR does: Makes the terminal's data trustworthy by adding a staleness watchdog (heartbeat + per-feed is_stale), honest liquidation coverage tags, crash‑durable persistence, a live data‑health monitor (BTC price, L/S, DVOL vs Binance/Deribit) exposed at /v1/health, per-venue CVD with dedup, and loopback‑only API binding. Also fixes biased trade sampling, unbounded DB growth, cascade alerts using confirmed volume, and other adversarial‑review issues.
Risk areas:
- Health monitor BTC price cross‑reference uses a tight 0.5% tolerance comparing hub perp price to Binance spot; can falsely trigger DRIFT alarms in volatile markets.
- Health monitor’s funding‑vs‑LSR consistency check may compare funding from one venue against Binance LSR, producing misleading warnings.
- Staleness watchdog force‑reconnects the orderflow HL socket when data_age() is infinite (e.g., persistent outage), which could cause log/noise storms.
Verdict:⚠️ Minor concerns
|
|
||
| # External reference endpoints (public, no key). | ||
| BINANCE_SPOT_PRICE = "https://api.binance.com/api/v3/ticker/price" | ||
| BINANCE_PREMIUM_INDEX = "https://fapi.binance.com/fapi/v1/premiumIndex" |
There was a problem hiding this comment.
🟠 P1 (High): The 0.5% tolerance (PRICE_TOLERANCE_PCT) comparing the hub’s likely perp‑market BTC price against Binance spot is extremely tight. During high volatility the basis can easily exceed 0.5%, causing the health check to fail and the dashboard badge to show ⚠ DRIFT even when no actual data fault exists. This undermines the trustworthiness promise. Consider widening the tolerance or computing it dynamically (e.g., based on typical basis spreads).
| for c in checks: | ||
| counts[c.status] = counts.get(c.status, 0) + 1 | ||
|
|
||
| # Badge semantics: a stale feed is the most urgent signal (frozen data), |
There was a problem hiding this comment.
🟡 P2 (Medium): The consistency check funding_vs_lsr compares hub.market.assets["BTC"].funding_rate (source unknown, could be a different exchange) against Binance’s globalLongShortAccountRatio. These may reflect different markets and can naturally diverge, leading to noisy WARN results that confuse users about data integrity. Restrict the check to funding rates from the same venue as the LSR, or remove it.
|
Codex Review: Didn't find any major issues. Already looking forward to the next diff. ℹ️ About Codex in GitHubYour team has set up Codex to review pull requests in this repo. Reviews are triggered when you
If Codex has suggestions, it will comment; otherwise it will react with 👍. Codex can also answer questions or update the PR. Try commenting "@codex address that feedback". |
…y, reconnect noise Resolves the reviewer's three concerns on the data-health monitor: - P1 (false DRIFT): the BTC price cross-reference compared the hub's PERP price to Binance SPOT at a tight 0.5% tolerance, so normal perp/spot basis tripped DRIFT in volatile markets. Now compares perp-to-perp (Binance premiumIndex markPrice) with a tiered tolerance: <0.5% pass, <2% warn, >=2% fail — only a real feed break flips the badge. - P2 (cross-venue noise): funding_vs_lsr compared HL funding against Binance LSR. Now uses Binance funding vs Binance LSR (same venue, confirmed Binance source) and skips near-zero funding that carries no directional signal. - Reconnect noise storm: the HL order-flow force-reconnect could fire every status tick during a persistent outage (and during initial connect, when data_age is infinite). Guarded on 'had data then stopped' + debounced to once per cooldown. Verified live: 10 checks run, price diff 0.012% perp-to-perp, no false DRIFT.
What & why
Makes the terminal's data trustworthy, not just live. The core problem this addresses: the Terminal could silently show frozen data as live, overstated liquidation completeness it didn't have, and had no way to prove its numbers against external sources. (The internal sibling repo's "verified against external sources" reputation rested entirely on a one-shot script the Terminal didn't even ship.)
Changes (R1–R4)
Trust integrity
heartbeat=20; the two bare Hyperliquid sockets (orderbook, orderflow) no longer silently freeze. Per-feedlast_message_at/is_stale(); orderbook force-reconnects an alive-but-silent socket; the hub flips feed status tostaleand kicks the HL trade socket when both venues go quiet.exchange_coverage()tags each venueconfirmed/sampled/heuristic;get_stats()returns coverage + confirmed/heuristic counts & volume. README no longer claims "every liquidation" (Binance is throttled at the source, HL is a large-trade heuristic). Bybit widened 15→50 symbols (batched, not truncated).COMMIT_INTERVAL_SECONDS=5) +atexitflush bound worst-case loss onkill -9; corruption-recovery path no longer assumes a connection.Verification + visibility
DataHealthMonitorcross-references hub data vs Binance/Deribit (BTC price <0.5%, L/S, DVOL) + freshness/completeness/consistency; runs every 30s. Surfaced at/v1/healthand as a dashboard badge (✓ LIVE / ⚠ STALE / ⚠ DRIFT).src/verify_data.pyis a CLI wrapper (verified live: 10/10 pass).Correctness depth
tid, Binance aggId) so reconnect replays can't double-count; per-venue cumulative CVD so "BTC CVD" isn't a silent HL+Binance sum.OrderFlowEngine.stop()double-await fixed.Docs/tests —
docs/DATA_INTEGRITY.md; newtests/test_data_integrity.py; fixed 2 pre-existing funding tests (they expected an un-normalized rate).Adversarial review + fixes
Ran an adversarial review (saved under
.roast/) and fixed its findings:0.0.0.0with no auth → default to127.0.0.1, opt into LAN viaHYPERDATA_API_HOST=0.0.0.0._trade_count.prune()+ WAL checkpoint.llm_agentusedasynciowithout importing it at module scope (any LLM timeout →NameError).run_api.py;/v1/public/metricsuptime fixed.Judgment calls (please veto if you disagree)
Lintset tocontinue-on-error: true— the repo has ~97 pre-existingrufferrors (mostly E501/I001/F401) unrelated to this work, so the Lint step was already failing. Tests now gate the build; recommend a separate focused lint-cleanup PR, then dropcontinue-on-error.Verification
pytest: 71 passed, 2 skipped.verify_data.py: 10/10 live./v1/health+ SIGTERM tested live.Not in this PR
assets/worth considering (recommendation relayed separately) — intentionally left out per your "don't replace now."