
docs(prod): warn about $(nproc) on Kubernetes; add Downward API guidance #28

Open
Anai-Guo wants to merge 1 commit into BerriAI:main from Anai-Guo:fix/k8s-nproc-docs

Conversation

@Anai-Guo

Summary

Closes BerriAI/litellm#26620.

The "Match Uvicorn Workers to CPU Count" section in the production docs recommends --num_workers $(nproc) for Kubernetes deployments. nproc is incorrect on Kubernetes — it reports the host node's CPU count, not the pod's cgroup CPU request or limit.

Concrete failure mode (from #26620):

| Setting | Value |
| --- | --- |
| Node | e2-standard-16 (16 vCPU) |
| Pod `requests.cpu` | 4 |
| Pod `limits.cpu` | 8 |
| `$(nproc)` returns | 16 |
| Workers spawned | 16 |
| Workers the pod can actually run | 4–8 |

This causes CPU oversubscription, context-switching overhead, and worse latency — the opposite of what the section is trying to achieve.
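For reference, the pod's real CPU budget is visible in the cgroup filesystem even when `nproc` is wrong. A minimal sketch, assuming cgroup v2 mounted at `/sys/fs/cgroup` (the `cgroup_cpus` helper name is illustrative, not part of this PR):

```shell
#!/bin/sh
# Sketch: derive a worker count from the cgroup v2 CPU quota instead of nproc.
# cpu.max contains "<quota> <period>" in microseconds, or "max" when unlimited.
cgroup_cpus() {
  quota=$(echo "$1" | cut -d' ' -f1)
  period=$(echo "$1" | cut -d' ' -f2)
  if [ "$quota" = "max" ]; then
    nproc   # no cgroup limit: the host CPU count is the real budget
  else
    echo $(( (quota + period - 1) / period ))  # round quota/period up
  fi
}

# limits.cpu=8 is written by the kubelet as quota=800000, period=100000,
# so this prints 8 inside such a pod (while nproc would print 16).
if [ -r /sys/fs/cgroup/cpu.max ]; then
  cgroup_cpus "$(cat /sys/fs/cgroup/cpu.max)"
fi
```

Note this reads the CPU *limit*; the `requests.cpu` value the PR recommends sizing workers against is not visible in the cgroup files at all, which is why the Downward API is needed.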

Fix

Restructure section 3 of docs/proxy/prod.md:

  1. Warning admonition explaining the cgroup mismatch with a concrete example.
  2. Recommended: Kubernetes Downward API recipe that exposes requests.cpu as $CPU_REQUEST in the container env, then uses --num_workers ${CPU_REQUEST:-4} in the CMD.
  3. Alternative: hardcoded --num_workers 4 for users who prefer not to wire up the Downward API.
  4. Bare-metal / VM only: keep the original $(nproc) CMD, scoped to environments where the process actually has access to all host CPUs.
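The Downward API wiring in item 2 might look like the following sketch; the pod name, image tag, and resource values are illustrative, not the exact diff in this PR:

```yaml
# Sketch: expose requests.cpu to the container as $CPU_REQUEST (values illustrative).
apiVersion: v1
kind: Pod
metadata:
  name: litellm-proxy
spec:
  containers:
    - name: litellm
      image: ghcr.io/berriai/litellm:main-latest   # illustrative tag
      resources:
        requests:
          cpu: "4"
        limits:
          cpu: "8"
      env:
        - name: CPU_REQUEST
          valueFrom:
            resourceFieldRef:
              resource: requests.cpu
      command: ["sh", "-c"]
      args:
        - litellm --port 4000 --num_workers ${CPU_REQUEST:-4}
```

With the default divisor, `resourceFieldRef` rounds `requests.cpu` up to whole cores, so `$CPU_REQUEST` arrives as a plain integer suitable for `--num_workers`.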

The two follow-up shell blocks (--max_requests_before_restart and --run_gunicorn variants) are updated to use the same ${CPU_REQUEST:-4} pattern so they stay consistent with the recommended K8s setup.
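The consistent pattern across the updated CMD blocks looks roughly like this (a sketch; the port and the `--max_requests_before_restart` value are illustrative, not the exact diff):

```shell
# Recommended K8s CMD: size workers from the Downward API env var.
litellm --port 4000 --num_workers ${CPU_REQUEST:-4}

# Follow-up variants, kept on the same pattern:
litellm --port 4000 --num_workers ${CPU_REQUEST:-4} --max_requests_before_restart 10000
litellm --port 4000 --num_workers ${CPU_REQUEST:-4} --run_gunicorn
```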

Files changed

  • docs/proxy/prod.md — section 3 only

Why this is correct

The Downward API approach is the standard Kubernetes idiom for surfacing pod-level resource requests/limits to the container at runtime. It's documented at https://kubernetes.io/docs/tasks/inject-data-application/environment-variable-expose-pod-information/.

The ${CPU_REQUEST:-4} default keeps the container bootable in non-K8s environments (local Docker, etc.) where the env var won't be injected.
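The `${VAR:-default}` form used here is standard POSIX parameter expansion, so the fallback needs no extra scripting; a quick illustration:

```shell
# POSIX ${VAR:-default}: substitute the default when the variable is unset or empty.
unset CPU_REQUEST
echo "workers=${CPU_REQUEST:-4}"   # prints workers=4 (env var not injected, e.g. local Docker)

CPU_REQUEST=8
echo "workers=${CPU_REQUEST:-4}"   # prints workers=8 (Downward API injected a value)
```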

🤖 Generated with Claude Code

Anai-Guo requested a review from a team on April 27, 2026 at 23:20.

Development

Successfully merging this pull request may close these issues.

Docs: nproc in Kubernetes num_workers recommendation returns host CPU count, not pod limit
