
[Start] SSR memory leak under sustained load — two retention paths (queryClient gcTime + native stream buffers) #7402

@cantutar

Description


Which project does this relate to?

Start

Describe the bug

Under sustained SSR load on a route that uses Promise.all([...ensureQueryData]) + a fire-and-forget prefetchInfiniteQuery in the loader, memory grows monotonically and the process eventually OOMs at the V8 heap limit. Through bisection I've isolated this to two separate retention paths, one in the V8 heap and one in native memory:

1. V8 heap leak — fixed by gcTime: 0

With default queryClient config (gcTime: 300_000), each per-SSR-request queryClient is retained for ~5min after the request completes (presumably by its own unobserved-query gcTime timers), accumulating dehydrated state. Under crawler-style load (~30-60 req/min in our prod, 50 concurrent in local repro) heap grows from baseline 33 MB to 2.4 GB in ~5 minutes and OOMs.

Setting gcTime: 0 on the queryClient flattens this completely — heap stays at ~375-570 MB across a 5-minute 50-concurrent autocannon run instead of climbing 33 → 2424 MB.
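For reference, the workaround as applied, a config sketch assuming the per-request QueryClient created in getRouter() below. With gcTime: 0, TanStack Query garbage-collects queries as soon as they have no observers instead of holding them for the default 5-minute window:

```typescript
import { QueryClient } from "@tanstack/react-query";

// Workaround sketch: gcTime: 0 drops unobserved queries immediately, so a
// completed SSR request's QueryClient stops pinning dehydrated state for the
// default 5-minute gcTime window.
const queryClient = new QueryClient({
  defaultOptions: {
    queries: {
      staleTime: 60_000,
      gcTime: 0, // default is 300_000 (5 min)
    },
  },
});
```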

2. Native buffer leak — still present

After applying the gcTime fix, V8 heap is flat, but external / arrayBuffers (from process.memoryUsage(), values below in MB) grow without bound:

mid-test  rss=2255  heapUsed=375  heapTotal=542  external=1507  arrayBuffers=1473
later     rss=2837  heapUsed=479  heapTotal=624  external=1923  arrayBuffers=1887
end       rss=3327  heapUsed=569  heapTotal=698  external=2379  arrayBuffers=2329

Concurrent with this, the SSR stream lifetime watchdog in @tanstack/router-core/src/ssr/transformStreamWithRouter.ts (lines 162 and 351) repeatedly fires:

SSR stream transform exceeded maximum lifetime (60000ms), forcing cleanup

That cleanup path clears the local string buffers (pendingRouterHtml, leftover, pendingClosingTags) but already-enqueued Uint8Array chunks survive — they've been pushed into the consumer pipeline, and if the HTTP response is slow to drain (common under sustained concurrent load), those buffers pile up in external memory.
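A minimal sketch of that retention mechanism (a simplified stand-in for transformStreamWithRouter.ts with hypothetical names, not the actual router-core code): the watchdog can clear the transformer's local string buffers, but a chunk already passed to controller.enqueue() lives in the readable side's internal queue, and only the consumer (or cancelling the readable) can release it.

```typescript
// Sketch: clearing local buffers does not release already-enqueued chunks.
async function orphanedChunkDemo(): Promise<number> {
  const encoder = new TextEncoder();
  let pendingRouterHtml = "<script>injected</script>"; // local string buffer

  const stream = new TransformStream<string, Uint8Array>({
    transform(chunk, controller) {
      // Push bytes into the consumer pipeline, then clear local state.
      controller.enqueue(encoder.encode(pendingRouterHtml + chunk));
      pendingRouterHtml = "";
    },
  });

  const writer = stream.writable.getWriter();
  await writer.write("<div>page</div>"); // chunk now sits in the readable queue

  // "Watchdog" cleanup: clearing the local buffer touches none of the queue.
  pendingRouterHtml = "";

  // The enqueued Uint8Array survives; a stalled consumer keeps it alive.
  const { value } = await stream.readable.getReader().read();
  return value?.byteLength ?? 0;
}
```

Reading the chunk back after the "cleanup" shows the bytes were never released; in the SSR case the reader is the HTTP response, and when it stalls those Uint8Arrays accumulate as external / arrayBuffers memory.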

Key evidence that nails the orphan: when I stopped the load test and the server went fully idle, [mem] readings froze at the exact same values across 10+ consecutive 60-second samples (rss=3419 heapUsed=345 external=2088 arrayBuffers=2061). The buffers aren't slow-draining, they're orphaned.

Complete minimal reproducer

https://playerse.net/ (production deployment; no public minimal repro is available, see below)

Steps to Reproduce the Bug

Production reproducer (Playerse, https://playerse.net — game stats tracker, op.gg / blitz.gg style for TFT and Valorant). Source is a closed commercial project so I can't ship a public minimal repro, but the loader shape is the documented streaming-SSR pattern from the TanStack Start docs.


In a TanStack Start app with a route shaped like:

export const Route = createFileRoute("/profile/$id")({
  loader: async ({ context, params }) => {
    const [profile] = await Promise.all([
      context.queryClient.ensureQueryData(profileOptions(params.id)),
      context.queryClient.ensureQueryData(rankTiersOptions()),
      context.queryClient.ensureQueryData(agentsOptions()),
      context.queryClient.ensureQueryData(mapsOptions()),
      context.queryClient.ensureQueryData(gamemodesOptions()),
      context.queryClient.ensureQueryData(weaponsOptions()),
    ]);
    context.queryClient.prefetchInfiniteQuery(matchHistoryInfiniteOptions(profile.puuid));
    return { id: params.id };
  },
  // pendingComponent / errorComponent / notFoundComponent / component as usual
});

Run prod build and hit a single URL hard:

NODE_OPTIONS="--max-old-space-size=1500" node server.mjs
autocannon -c 50 -d 300 "http://localhost:3001/<route>"

Log process.memoryUsage() every 60s — heapUsed climbs monotonically until OOM at the max-old-space limit.
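The [mem] lines above come from a logger along these lines (a sketch; toMB and formatMem are hypothetical helper names, values in MB):

```typescript
// Sketch of the [mem] logger used to produce the readings in this issue.
type MemSnapshot = {
  rss: number;
  heapUsed: number;
  heapTotal: number;
  external: number;
  arrayBuffers: number;
};

const toMB = (bytes: number): number => Math.round(bytes / (1024 * 1024));

function formatMem(m: MemSnapshot): string {
  return (
    `[mem] rss=${toMB(m.rss)} heapUsed=${toMB(m.heapUsed)} ` +
    `heapTotal=${toMB(m.heapTotal)} external=${toMB(m.external)} ` +
    `arrayBuffers=${toMB(m.arrayBuffers)}`
  );
}

// Sample once a minute; unref() so the timer doesn't keep an idle process alive.
setInterval(() => console.log(formatMem(process.memoryUsage())), 60_000).unref();
```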

Workload-independence proof: In our local repro, the dev backend doesn't have RSO/auth wired up, so the first ensureQueryData(profileOptions(...)) call returns a 403 (PROFILE_IS_PRIVATE) in ~2ms. The loader catches that and returns a { isPrivate: true } sentinel — i.e. the heavy 3.6 MB asset-metadata payload is never dehydrated. Backend logs during the load test show only:

1× GET /api/auth/get-session → 200, ~1ms
1× POST /rpc/valorant/player/search → 403, ~2ms

That's ~3ms of backend work per request, no heavy data flowing. Yet heapUsed still grows 33MB → 2424MB in 5 minutes and OOMs. So this isn't payload-size driven — it's per-request lifecycle retention.

Cross-runtime evidence: Our backend (Hono + oRPC) and queue workers run on a different runtime (Bun, not Node) against the same data shapes and sit happily at 180-200 MB steady-state. Only the Node 24 + TanStack Start frontend shows this growth pattern, which makes the Node + TanStack Start interaction the suspect surface.

Expected behavior

rss / heapUsed should plateau under sustained load, not climb monotonically until OOM.

After load drops off, external / arrayBuffers should drain rather than stay frozen.

Screenshots or Videos

12-hour memory chart from production (Docker Beszel), with default gcTime:


[mem] time series from local autocannon run with gcTime: 0 applied (showing flat V8 heap but growing external):


Platform

  • @tanstack/react-start@1.167.42
  • @tanstack/react-router@1.168.23
  • @tanstack/react-router-ssr-query@1.166.11
  • @tanstack/react-query@5.99.2
  • vite@8.0.9, @vitejs/plugin-react@6
  • React 19.2, Node 24-alpine
  • Custom srvx host (not Nitro), ssr.noExternal: true in vite config

Additional context

Heap snapshot diff (two snapshots ~6 min apart during sustained load):

class              t0       t1       Δ
─────────────────────────────────────────
Router             30       174      5.8x
QueryCache         30       174      5.8x
Query              422      2,391    5.7x
RequestInstance    29       172      5.9x
Request            30       174      5.8x
(string)           314 K    1,491 K  4.7x
V8 heap total      92 MB    260 MB   +168 MB
Native typedArr.   17 MB    17 MB    0

All per-request infra (Router / QueryCache / Query / RequestInstance / Request) scales 1:1, which is what tipped us off to per-completed-request retention.
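The 1:1 scaling can be modeled with a toy sketch (hypothetical classes, not Start's real types): each SSR request allocates a fresh Router + QueryClient pair, so any retainer that outlives the response (a gcTime timer, an async-context entry) pins exactly one of each per completed request, growing every class in lockstep.

```typescript
// Toy model of per-request retention; names are illustrative only.
class ToyQueryClient {}
class ToyRouter {
  constructor(public queryClient: ToyQueryClient) {}
}

// Stands in for whatever outlives the response (timers, async-context entries).
const retainers: ToyRouter[] = [];

function handleSsrRequest(): void {
  const router = new ToyRouter(new ToyQueryClient());
  retainers.push(router); // nothing ever removes it: 1:1 growth per request
}

// 174 completed requests, mirroring the t1 instance counts in the snapshot diff.
for (let i = 0; i < 174; i++) handleSsrRequest();
```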

Retainer chains on the leaked Router objects point back to:

  • {get manifest} closure on the manifest object, or
  • AsyncContextFrame.table → RequestInstance → onShellReady closure (renderToPipeableStream)

Bisect summary (all runs: 50-concurrent autocannon, 5min, --max-old-space-size=1500/2560, same URL):

Variant                         V8 (heapUsed)          Native (external)   Outcome
──────────────────────────────────────────────────────────────────────────────────
baseline (default config)       33 → 2424 MB           28 → 138 MB         OOM in ~5 min
ssr: 'data-only' on the route   33 → 1429 MB           28 → 398 MB         OOM in ~65 s (worse)
gcTime: 0 on queryClient        375 → 569 MB (flat)    1 → 2329 MB         no OOM, external grows

So:

  • gcTime: 0 cleanly flattens the V8 portion → indicates the V8 leak is queryClient retention.
  • ssr: 'data-only' rules out renderToPipeableStream/onShellReady as the primary V8 retainer.
  • The native portion is independent and survives both variants.

Heap snapshots (one near startup, one after sustained load) are available — happy to share via DM if useful, as they contain serialized in-flight request data we'd rather not put on a public issue.

Related: #5289, #6051. PR #5896 (Nov 2025) is in our version but doesn't fully resolve this under sustained load.

router.tsx setup we use (sentry/i18n trimmed for clarity):

export const getRouter = () => {
  const queryClient = new QueryClient({
    defaultOptions: {
      queries: { retry: shouldRetry, staleTime: 60_000 },
      // gcTime defaults to 300_000 (5 min); this is the V8 retainer
    },
  });
  const router = createRouter({
    routeTree,
    context: { queryClient },
    rewrite: { input, output }, // i18n url rewrites
    scrollRestoration: true,
    defaultPreload: "intent",
  });
  setupRouterSsrQueryIntegration({ router, queryClient });
  return router;
};
