d3mocide · Jun 13, 2026
diff --git a/agent_docs/tasks/2026-06-13-fresh-client-load-followups.md b/agent_docs/tasks/2026-06-13-fresh-client-load-followups.md
@@ -0,0 +1,97 @@
+# Fresh-Client Load Follow-ups: Map Asset Preload + News Feed SWR
+
+Follow-up to the WebSocket snapshot work (`2026-06-13-fresh-client-snapshot-replay.md`).
+Two further fresh-client slowness sources: the map JS bundle waterfall and the
+dashboard's slow text feeds.
+
+## Issue
+
+1. **Map asset waterfall.** The default view is `TACTICAL` → `TacticalMap`,
+   which depends on the `deck-gl` (~1 MB) and map-engine (~1 MB MapLibre /
+   ~1.7 MB Mapbox) vendor chunks. Since the views are lazy-loaded and the entry
+   was deliberately stopped from preloading vendors (`f32c30d`), a cold-cache
+   client only discovers these chunks **after** the entry + App chunks download,
+   parse, and the dynamic import fires — a multi-hop request waterfall before the
+   default view can paint.
+2. **Dashboard text feeds slow to load.** `GET /api/news/feed` (NewsWidget)
+   fetched the 5 configured RSS feeds **sequentially**, each with a 10 s timeout
+   (up to ~50 s worst case), and the Redis cache was populated **lazily** by the
+   requesting client. So every 15-minute cache expiry made the next caller block
+   on the full upstream fetch. There was no news poller pre-warming the cache.
+
+## Solution
+
+1. **Hoist `modulepreload` hints for the critical map chunks.** A small build-only
+   Vite plugin (`mapCriticalPreloadPlugin`) injects
+   `<link rel="modulepreload">` into `index.html` for `deck-gl`, the active GL
+   engine, and the `TacticalMap` view chunk, so the browser fetches them in
+   parallel with the entry instead of serially after it. The engine is chosen at
+   build time to mirror `mapStyles.ts`: Mapbox when a valid `VITE_MAPBOX_TOKEN`
+   is set (+ `VITE_ENABLE_MAPBOX !== "false"`), MapLibre otherwise. Only the
+   default view's engine is preloaded — the globe-only MapLibre in a Mapbox build
+   still loads on demand. The cacheable-vendor split is otherwise unchanged.
+2. **Concurrent fetch + stale-while-revalidate for the news feed.**
+   - `_fetch_feeds` now fetches all sources with `asyncio.gather` (latency bounded
+     by the slowest single feed, not the sum) via a non-raising `_fetch_one`.
+   - The endpoint serves the cached payload immediately and, once it ages past the
+     15-minute freshness window, kicks off a **background** refresh
+     (`_trigger_refresh`) so callers never block on the upstream fetch. The data
+     is kept for `CACHE_HARD_TTL` (6 h) for stale serving; a `CACHE_FRESH_KEY`
+     marks freshness. Background refreshes are deduped within a worker (a held
+     task ref) and across workers (a Redis `SET NX` lock). Only a truly cold
+     cache (no data at all, or Redis down) fetches synchronously — and that fetch
+     is now concurrent.
+
+## Changes
+
+- **`frontend/vite.config.ts`**
+  - Switched to the function form of `defineConfig` to read build env via
+    `loadEnv` and pick the engine chunk.
+  - Added `mapCriticalPreloadPlugin(engineChunk)` (uses `transformIndexHtml` with
+    `ctx.bundle` to resolve hashed chunk filenames and inject preload links).
+- **`backend/api/routers/news.py`**
+  - Added `CACHE_FRESH_KEY`, `CACHE_REFRESH_LOCK`, `CACHE_HARD_TTL`,
+    `CACHE_REFRESH_LOCK_TTL`.
+  - Split `_fetch_feeds` into `_fetch_one` (per-feed, never raises) +
+    concurrent `gather`.
+  - Added `_store_feed`, `_refresh_and_release`, `_trigger_refresh`, and a module
+    `_refresh_task` ref.
+  - Rewrote `get_news_feed` for stale-while-revalidate.
+  - Added `warm_cache()` (non-blocking refresh delegating to the deduped
+    background refresh) and `prewarm_loop()` — a continuous pre-warmer that
+    refreshes on startup and then every `NEWS_PREWARM_INTERVAL`
+    (`NEWS_PREWARM_INTERVAL_SECONDS`, default 600 s — comfortably inside the
+    900 s freshness window so the cache is always warm even with no traffic).
+- **`backend/api/main.py`**
+  - Lifespan launches `news.prewarm_loop()` as a supervised background task
+    after `broadcast_service.start()` and cancels it on shutdown alongside the
+    historian / RF-cleanup tasks. The feed cache is therefore kept warm
+    independent of client traffic, so a fresh dashboard never blocks on the
+    upstream RSS fetch.
+- **`backend/api/tests/test_news_router.py`**
+  - Added tests: fresh cache served without refresh; stale cache served +
+    triggers refresh; cold cache fetches synchronously; `_fetch_feeds` merges +
+    sorts newest-first and strips `_ts`; `_trigger_refresh` NX-lock dedupe.
+
+## Verification
+
+- **Frontend** (`frontend`): `pnpm run typecheck` (covers `vite.config.ts`),
+  `pnpm run lint`, `pnpm run test` → 278 passed. `pnpm run build` succeeded;
+  `dist/index.html` now contains `modulepreload` links for `deck-gl`,
+  `maplibre` (no Mapbox token in this build → engine = MapLibre, and `mapbox`
+  is correctly *not* preloaded), and `TacticalMap`.
+- **Backend API** (`backend/api`): `ruff check` on changed files passed;
+  `pytest` full suite → 172 passed (was 167; +5 news tests).
+
+## Benefits
+
+- **Map paints sooner on a cold client**: the two multi-MB critical chunks and
+  the default view chunk download in parallel with the entry, collapsing the
+  discover-then-fetch waterfall — without reverting the cacheable vendor split
+  or preloading the unused engine.
+- **Dashboard text feeds load fast and stay fast**: concurrent fetching cuts
+  cold-cache latency from the sum of feed latencies to the slowest single feed,
+  and stale-while-revalidate means the periodic 15-minute cache expiry no longer
+  blocks a user — they get instant (slightly stale) data while a background
+  refresh runs. Background refreshes are deduped so a burst of clients triggers
+  at most one upstream fetch.
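The concurrent fetch and stale-while-revalidate flow described in the notes above can be sketched roughly as follows. This is a minimal illustration, not the code in `news.py`: `SwrFeedCache`, `fetch_one_safely`, and the injected `fetch` callable are hypothetical names, the cache lives in process memory rather than Redis, and the cross-worker `SET NX` lock is left out. Only the 15-minute freshness window and the 6 h hard TTL are taken from the notes.

```python
import asyncio
from typing import Awaitable, Callable, Optional

FRESH_SECONDS = 15 * 60          # freshness window from the task notes
HARD_TTL_SECONDS = 6 * 60 * 60   # stale-serving window from the task notes

Fetcher = Callable[[str], Awaitable[list]]


async def fetch_one_safely(fetch: Fetcher, source: str) -> list:
    """Fetch one feed; swallow errors so one bad feed cannot fail the batch."""
    try:
        return await fetch(source)
    except Exception:
        return []


class SwrFeedCache:
    """Hypothetical in-memory stand-in for the Redis-backed feed cache."""

    def __init__(self, sources: list, fetch: Fetcher) -> None:
        self.sources = sources
        self.fetch = fetch
        self.items: Optional[list] = None
        self.stored_at = 0.0
        self.upstream_fetches = 0
        self.refresh_task: Optional[asyncio.Task] = None

    async def refresh(self, now: float) -> None:
        self.upstream_fetches += 1
        # All feeds run concurrently, so latency tracks the slowest feed
        # rather than the sum of all feeds.
        batches = await asyncio.gather(
            *(fetch_one_safely(self.fetch, s) for s in self.sources)
        )
        self.items = [item for batch in batches for item in batch]
        self.stored_at = now

    def trigger_refresh(self, now: float) -> None:
        # In-process dedupe: a held task reference blocks a second refresh
        # while one is still running.
        if self.refresh_task is not None and not self.refresh_task.done():
            return
        self.refresh_task = asyncio.create_task(self.refresh(now))

    async def get(self, now: float) -> list:
        age = now - self.stored_at
        if self.items is None or age > HARD_TTL_SECONDS:
            await self.refresh(now)      # cold cache: the only blocking path
        elif age > FRESH_SECONDS:
            self.trigger_refresh(now)    # stale: serve old data, refresh behind
        return self.items
```

Passing `now` explicitly keeps the sketch deterministic to exercise; a real endpoint would read a clock or, as in the notes, rely on a separate freshness key expiring in Redis.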
diff --git a/agent_docs/tasks/2026-06-13-fresh-client-snapshot-replay.md b/agent_docs/tasks/2026-06-13-fresh-client-snapshot-replay.md
@@ -0,0 +1,89 @@
+# Fresh-Client Snapshot Replay (Last-Value Cache)
+
+## Issue
+
+After the recent frontend rendering optimizations (cached static layers, lazy
+map loading), the map paints almost instantly — but on a **fresh client** it
+stays empty for a long time while entities trickle in. Aircraft, ships, and
+satellites only arrive over the live WebSocket (`/api/tracks/live`), and the
+broadcast consumer reads Kafka with `auto_offset_reset="latest"`. A late joiner
+therefore receives **no backlog** — it must wait for each poller to re-emit its
+next full sweep before the world populates. The orbital sweep alone is a
+~15–37 s cycle (≈11k satellites), so a fresh client can sit on a near-empty map
+for tens of seconds. The faster (now near-instant) render made this pre-existing
+gap glaringly obvious.
+
+## Solution
+
+Add a **last-value cache (LVC)** to `BroadcastManager` and replay it to every
+newly-connected client before live streaming begins.
+
+- As the Kafka consume loop transforms each message to its TAK frame, it also
+  stores the latest frame per `uid` in an in-memory cache, keyed by entity id
+  and stamped with a monotonic receive time. The cache is kept warm even when
+  no clients are connected, so it is ready the instant someone joins.
+- On WebSocket connect, the per-client worker first replays the current cache
+  (the "snapshot") directly to that client, then enters the normal live-stream
+  drain loop. Frames are sent in the existing one-frame-per-entity wire format,
+  so **the frontend needs no changes** — a snapshot frame is indistinguishable
+  from a live update, and the client's existing `lastSourceTime` de-dup guard
+  harmlessly ignores any overlap between snapshot and live deltas.
+- Stale entries (not re-emitted within `LIVE_SNAPSHOT_TTL_SECONDS`, default
+  300 s) are excluded from snapshots and periodically pruned; a hard cap
+  (`LIVE_SNAPSHOT_MAX_ENTITIES`, default 20 000) bounds memory.
+
+### Why direct send, not the live queue
+
+The per-client live queue is bounded at 256 messages (it intentionally drops
+oldest under back-pressure). A multi-thousand-entity snapshot pushed through it
+would be almost entirely dropped, so the snapshot is sent directly via
+`send_bytes` with the same 3 s per-frame timeout, yielding to the event loop
+every 256 frames so a large replay never starves the consume loop or other
+clients.
+
+## Changes
+
+- **`backend/api/core/config.py`**
+  - Added `LIVE_SNAPSHOT_TTL_SECONDS` (default 300) and
+    `LIVE_SNAPSHOT_MAX_ENTITIES` (default 20 000).
+- **`backend/api/services/broadcast.py`**
+  - Added `import time` and the `_LVC_PRUNE_INTERVAL_S` constant.
+  - `BroadcastManager.__init__`: added the `_lvc` cache and `_last_prune`.
+  - `_consume`: records every transformed frame into the LVC
+    (`_record_live`) before the early-out on zero clients.
+  - New helpers: `_record_live`, `_maybe_prune` (TTL sweep + hard cap),
+    `_snapshot_frames` (fresh frames, copied for safe concurrent iteration),
+    and `_send_snapshot` (direct, yielding, disconnect-aware replay).
+  - `_client_worker`: replays the snapshot before the live drain loop; bails
+    out cleanly if the client disconnects mid-snapshot.
+  - `stop()`: clears the cache.
+- **`backend/api/tests/test_broadcast_snapshot.py`** (new)
+  - Covers LVC population/overwrite, blank-uid rejection, TTL exclusion, prune
+    (stale drop + hard cap), and snapshot send (all frames, empty no-op,
+    mid-stream disconnect, stale exclusion).
+
+## Verification
+
+Run on host (`backend/api`):
+
+- `uv tool run ruff check services/broadcast.py core/config.py tests/test_broadcast_snapshot.py` → All checks passed.
+- `uv run python -m pytest tests/test_broadcast_snapshot.py tests/test_tracks_validation.py -q` → 16 passed.
+- `uv run python -m pytest -q` (full API suite) → 167 passed.
+
+No frontend changes were required, so frontend suites were not run (per the
+Targeted Verification rule).
+
+## Benefits
+
+- **Fresh clients paint the full picture immediately** instead of waiting up to
+  a full poller sweep (tens of seconds for satellites). The data-load latency a
+  late joiner perceives drops from "next sweep" to "one connect round-trip."
+- **Backend-only, wire-compatible**: no frontend changes, no proto/worker
+  changes, no new service or DB query on connect — the snapshot is served from
+  memory.
+- **Bounded and self-healing**: TTL + hard cap bound memory; stale entities
+  (landed aircraft, departed vessels) age out automatically and never appear in
+  a snapshot.
+- **Back-pressure safe**: the snapshot bypasses the bounded live queue and
+  yields regularly, so a large replay cannot starve the consume loop or slow
+  other connected clients.
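The last-value cache and direct snapshot replay described in the notes above might look roughly like the sketch below. It is illustrative only and is not the `BroadcastManager` code: `LastValueCache`, `send_snapshot`, and the `send_bytes` callable are hypothetical names, and evicting the oldest entries first when the hard cap is exceeded is an assumption, since the notes do not say which entries the cap drops. The 300 s TTL, 20 000 entity cap, 3 s per-frame timeout, and 256-frame yield interval come from the notes.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, List, Optional, Tuple


class LastValueCache:
    """Keeps the latest frame per uid, stamped with a monotonic receive time."""

    def __init__(self, ttl_seconds: float = 300.0, max_entities: int = 20_000) -> None:
        self.ttl_seconds = ttl_seconds
        self.max_entities = max_entities
        self._frames: Dict[str, Tuple[float, bytes]] = {}

    def record(self, uid: str, frame: bytes, now: Optional[float] = None) -> None:
        if not uid:
            return  # blank uids are rejected
        stamp = time.monotonic() if now is None else now
        self._frames[uid] = (stamp, frame)

    def prune(self, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        cutoff = now - self.ttl_seconds
        # TTL sweep: drop entities that have not been re-emitted recently.
        self._frames = {u: e for u, e in self._frames.items() if e[0] >= cutoff}
        # Hard cap (assumption: the oldest entries are evicted first).
        overflow = len(self._frames) - self.max_entities
        if overflow > 0:
            oldest = sorted(self._frames, key=lambda u: self._frames[u][0])
            for uid in oldest[:overflow]:
                del self._frames[uid]

    def snapshot(self, now: Optional[float] = None) -> List[bytes]:
        now = time.monotonic() if now is None else now
        cutoff = now - self.ttl_seconds
        # Copy the values first so iteration is safe if the cache changes.
        return [f for stamp, f in list(self._frames.values()) if stamp >= cutoff]


async def send_snapshot(
    send_bytes: Callable[[bytes], Awaitable[None]],
    frames: List[bytes],
    yield_every: int = 256,
    timeout: float = 3.0,
) -> int:
    """Send frames directly, yielding to the event loop at regular intervals."""
    sent = 0
    for index, frame in enumerate(frames, start=1):
        await asyncio.wait_for(send_bytes(frame), timeout)
        sent += 1
        if index % yield_every == 0:
            await asyncio.sleep(0)  # let other tasks run during a large replay
    return sent
```

Sending directly instead of enqueueing matters because a bounded queue that drops its oldest entries would discard most of a multi-thousand-frame snapshot before the client ever read it.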
diff --git a/backend/api/core/config.py b/backend/api/core/config.py
@@ -43,6 +43,15 @@ def DB_DSN(self) -> str:
     # Kafka
     KAFKA_BROKERS = os.getenv("KAFKA_BROKERS", "sovereign-redpanda:9092")
 
+    # Live-stream snapshot (last-value cache).
+    # A freshly-connected WebSocket client is replayed the current world state
+    # so the map paints immediately instead of waiting for each poller's next
+    # full sweep (the orbital sweep alone is a ~15-37 s cycle). Entities not
+    # re-emitted within the TTL are dropped from the snapshot; the hard cap
+    # bounds memory if the uid space ever runs away.
+    LIVE_SNAPSHOT_TTL_SECONDS = int(os.getenv("LIVE_SNAPSHOT_TTL_SECONDS", "300"))
+    LIVE_SNAPSHOT_MAX_ENTITIES = int(os.getenv("LIVE_SNAPSHOT_MAX_ENTITIES", "20000"))
+
     # Authentication
     # When AUTH_ENABLED=false all authentication checks are skipped (local dev only — NEVER in production).
     AUTH_ENABLED: bool = os.getenv("AUTH_ENABLED", "true").lower() not in (

diff --git a/backend/api/main.py b/backend/api/main.py
@@ -78,6 +78,7 @@ async def _historian_supervisor():
 # Global task handles
 historian_task_handle: asyncio.Task | None = None
 rf_cleanup_task_handle: asyncio.Task | None = None
+news_prewarm_task_handle: asyncio.Task | None = None
 
 
 @asynccontextmanager
@@ -86,7 +87,7 @@ async def lifespan(app: FastAPI):
     BUG-017: Replaced deprecated @app.on_event("startup") / @app.on_event("shutdown")
     decorators with the modern lifespan context manager pattern (FastAPI >= 0.93).
     """
-    global historian_task_handle, rf_cleanup_task_handle
+    global historian_task_handle, rf_cleanup_task_handle, news_prewarm_task_handle
     # --- Startup ---
     settings.validate()
     await db.connect()
@@ -121,12 +122,23 @@ async def lifespan(app: FastAPI):
     historian_task_handle = asyncio.create_task(_historian_supervisor())
     rf_cleanup_task_handle = asyncio.create_task(rf_sites_cleanup_task())
     await broadcast_service.start()
-    logger.info("Database, Redis, Historian, RF Cleanup, and Broadcast Service started")
+    # Continuously pre-warm the news feed cache in the background so a fresh
+    # dashboard always hits a warm cache instead of blocking on the upstream
+    # RSS fetch (refreshes on startup, then on an interval).
+    news_prewarm_task_handle = asyncio.create_task(news.prewarm_loop())
+    logger.info(
+        "Database, Redis, Historian, RF Cleanup, Broadcast Service, "
+        "and News Pre-warm started"
+    )
 
     yield
 
     # --- Shutdown ---
-    for handle in (historian_task_handle, rf_cleanup_task_handle):
+    for handle in (
+        historian_task_handle,
+        rf_cleanup_task_handle,
+        news_prewarm_task_handle,
+    ):
         if handle:
             handle.cancel()
             try: