Merged
Conversation
Both paths iterated p.workers map in randomized Go order, so claims
and reaps were non-deterministic. That churned worker nodes — a new
query might claim a worker on a freshly-provisioned node with a cold
cache-proxy, and an idle reap might kill the worker on the longest-
lived node (best cache hit rate).
Track first-seen time per node in the pool (nodeFirstSeen map, populated
when a worker pod reports Ready with a Spec.NodeName) and use it to
order both paths:
- findIdleWorkerLocked: prefer idle worker on the oldest-seen node,
so sessions land on the node with the most warmed-up NVMe parquet
pages.
- reapIdleWorkers: sort reap candidates by nodeFirstSeen desc so
excess idle workers are evicted from the newest-seen nodes first,
letting Karpenter consolidate those and keeping the old nodes warm.
Workers with no known nodeName (informer race on first few ms after
spawn) sort as 'new' — get reaped first, claimed last — which is the
safe direction.
No RBAC changes. Per-CP state, no cross-CP coordination needed (store
already prevents over-spawn; the local idleCount > minWorkers gate
prevents over-reap). Map is pruned on last-worker-retire per node.
Tests cover: stamp idempotency, empty-node guard, prune correctness,
oldest-node-wins claim, unknown-node fallback, newest-node-first reap,
minWorkers floor, idleTimeout gate.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Both worker scheduling paths in `K8sWorkerPool` iterate `p.workers` map in Go's randomized order, so claims and reaps are non-deterministic. That churns nodes: a new query may land on a cold-cache node while an idle reap may kill the oldest, warmest one — defeating the NVMe cache proxy we just rolled out.
Fix: track the first time this CP saw a worker on each node (`nodeFirstSeen map[string]time.Time`, stamped when a pod reports Ready with `Spec.NodeName`) and use it to order both paths:
Workers missing a nodeName (informer race on the first few ms after spawn) sort as "new" — reaped first, claimed last — which is the safe direction.
Design choices
Test plan
🤖 Generated with Claude Code