
Cache-locality-aware worker claim + reap #451

Merged
benben merged 1 commit into main from ben/node-age-aware-scheduling on Apr 23, 2026

Conversation

benben commented on Apr 23, 2026

Summary

Both worker scheduling paths in `K8sWorkerPool` iterate the `p.workers` map in Go's randomized order, so claims and reaps are non-deterministic. That churns nodes: a new query may land on a cold-cache node while an idle reap may kill the oldest, warmest one, defeating the NVMe cache proxy we just rolled out.

Fix: track the first time this CP saw a worker on each node (`nodeFirstSeen map[string]time.Time`, stamped when a pod reports Ready with `Spec.NodeName` set) and use it to order both paths:

  • `findIdleWorkerLocked`: prefer the idle worker on the oldest-seen node so sessions land where the NVMe cache is warmest.
  • `reapIdleWorkers`: sort reap candidates by `nodeFirstSeen` descending so excess idle workers are evicted from the newest-seen nodes first, letting Karpenter consolidate those nodes and keeping the old, warm ones alive longer.

Workers missing a `nodeName` (an informer race in the first few milliseconds after spawn) sort as "new", so they are reaped first and claimed last, which is the safe direction. A sketch of both orderings follows.
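For concreteness, here is a rough sketch of that ordering. Only `nodeFirstSeen`, `stampNodeFirstSeen`, `findIdleWorkerLocked`, and `reapIdleWorkers` are names from this PR; the `worker` fields and the `firstSeen` helper are illustrative assumptions, not the actual implementation:

```go
package pool

import (
	"sort"
	"sync"
	"time"
)

// Illustrative shapes; the real K8sWorkerPool differs.
type worker struct {
	id       string
	nodeName string // empty until the informer reports Spec.NodeName
	idle     bool
}

type pool struct {
	mu            sync.Mutex
	workers       map[string]*worker
	nodeFirstSeen map[string]time.Time // first time this CP saw a worker on each node
}

// stampNodeFirstSeen records when this CP first saw any worker on a
// node. Idempotent: later workers on the same node never move the stamp.
func (p *pool) stampNodeFirstSeen(node string) {
	if node == "" {
		return // informer has not filled in Spec.NodeName yet
	}
	if _, ok := p.nodeFirstSeen[node]; !ok {
		p.nodeFirstSeen[node] = time.Now()
	}
}

// firstSeen returns a node's stamp. Unknown or empty nodes get the
// caller-supplied "now" fallback, so they sort as newest: reaped
// first, claimed last.
func (p *pool) firstSeen(node string, fallback time.Time) time.Time {
	if t, ok := p.nodeFirstSeen[node]; ok {
		return t
	}
	return fallback
}

// findIdleWorkerLocked prefers the idle worker on the oldest-seen
// node so sessions land where the cache is warmest. Caller holds p.mu.
func (p *pool) findIdleWorkerLocked() *worker {
	now := time.Now()
	var best *worker
	var bestSeen time.Time
	for _, w := range p.workers {
		if !w.idle {
			continue
		}
		if seen := p.firstSeen(w.nodeName, now); best == nil || seen.Before(bestSeen) {
			best, bestSeen = w, seen
		}
	}
	return best
}

// sortReapCandidates orders excess idle workers newest-node-first so
// reapIdleWorkers evicts from fresh nodes, letting Karpenter
// consolidate them while old warm nodes keep their workers.
func (p *pool) sortReapCandidates(candidates []*worker) {
	now := time.Now()
	sort.Slice(candidates, func(i, j int) bool {
		return p.firstSeen(candidates[i].nodeName, now).
			After(p.firstSeen(candidates[j].nodeName, now))
	})
}
```

Passing a single `now` into both comparisons keeps the sort comparator consistent even when several candidates have no stamp yet.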

Design choices

  • Per-CP state, no coordination. Each CP only decides about workers it owns. The shared store already prevents over-spawn; the local `idleCount > minWorkers` gate already prevents over-reap. Guaranteeing the absolute oldest node across CPs would need a shared-store round-trip or a cluster-scoped `Node.Get` (extra RBAC) for marginal benefit.
  • No new k8s API calls, no new RBAC. `nodeName` comes from the pod the informer already watches; no `Nodes().Get` needed.
  • The map is pruned when the last worker on a given `nodeName` retires (an O(N) scan inside the pool mutex, where N is the per-CP worker count, which stays small); see the sketch below.
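Continuing the sketch above (the body is an assumption; only the function name comes from this PR), the prune is a single guarded delete:

```go
// pruneNodeFirstSeen drops a node's stamp once its last worker
// retires. It is an O(N) scan over p.workers with N the per-CP worker
// count, run inside the pool mutex, so it stays cheap.
func (p *pool) pruneNodeFirstSeen(node string) {
	for _, w := range p.workers {
		if w.nodeName == node {
			return // another worker still lives on this node
		}
	}
	delete(p.nodeFirstSeen, node)
}
```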

Test plan

  • 8 new unit tests in `k8s_pool_test.go` covering:
    • `stampNodeFirstSeen` idempotency + empty-node guard
    • `pruneNodeFirstSeen` only drops entries with no remaining worker
    • claim prefers oldest node, falls back cleanly when nodeName is empty
    • reap picks newest-node first
    • reap stops at minWorkers
    • reap skips workers still within idleTimeout
  • Full `go test -tags kubernetes ./controlplane/` passes
  • After merge: watch `duckgres_org_workers_active` by node and Karpenter node churn; expect fewer long-lived nodes being recycled and more of the newest nodes being evicted. A sketch of one test follows.
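As a flavor of the coverage, the idempotency + empty-node-guard case might look roughly like this, reusing the sketch types above plus `import "testing"` (the real tests in `k8s_pool_test.go` may be structured differently):

```go
func TestStampNodeFirstSeen(t *testing.T) {
	p := &pool{
		workers:       map[string]*worker{},
		nodeFirstSeen: map[string]time.Time{},
	}

	p.stampNodeFirstSeen("node-a")
	first := p.nodeFirstSeen["node-a"]

	time.Sleep(time.Millisecond)
	p.stampNodeFirstSeen("node-a") // re-stamp must not move the time
	if got := p.nodeFirstSeen["node-a"]; !got.Equal(first) {
		t.Fatalf("stamp moved: got %v, want %v", got, first)
	}

	p.stampNodeFirstSeen("") // empty-node guard: no entry created
	if _, ok := p.nodeFirstSeen[""]; ok {
		t.Fatal("empty nodeName was stamped")
	}
}
```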

🤖 Generated with Claude Code

Both paths iterated the p.workers map in randomized Go order, so claims
and reaps were non-deterministic. That churned worker nodes — a new
query might claim a worker on a freshly-provisioned node with a cold
cache-proxy, and an idle reap might kill the worker on the longest-
lived node (best cache hit rate).

Track first-seen time per node in the pool (nodeFirstSeen map, populated
when a worker pod reports Ready with a Spec.NodeName) and use it to
order both paths:

  - findIdleWorkerLocked: prefer idle worker on the oldest-seen node,
    so sessions land on the node with the most warmed-up NVMe parquet
    pages.
  - reapIdleWorkers: sort reap candidates by nodeFirstSeen desc so
    excess idle workers are evicted from the newest-seen nodes first,
    letting Karpenter consolidate those and keeping the old nodes warm.

Workers with no known nodeName (informer race on first few ms after
spawn) sort as 'new' — get reaped first, claimed last — which is the
safe direction.

No RBAC changes. Per-CP state, no cross-CP coordination needed (store
already prevents over-spawn; the local idleCount > minWorkers gate
prevents over-reap). Map is pruned on last-worker-retire per node.

Tests cover: stamp idempotency, empty-node guard, prune correctness,
oldest-node-wins claim, unknown-node fallback, newest-node-first reap,
minWorkers floor, idleTimeout gate.
@benben benben requested a review from a team April 23, 2026 11:26
@benben benben merged commit 40c2809 into main Apr 23, 2026
21 checks passed
@benben benben deleted the ben/node-age-aware-scheduling branch April 23, 2026 11:26