docs(prod): warn about $(nproc) on Kubernetes; add Downward API guidance by Anai-Guo · Pull Request #28 · BerriAI/litellm-docs

Anai-Guo · 2026-04-27T23:20:15Z

Summary

The "Match Uvicorn Workers to CPU Count" section in the production docs recommends --num_workers $(nproc) for Kubernetes deployments. nproc gives the wrong answer there: by default it reports the host node's CPU count, not the pod's CPU request or limit, because Kubernetes enforces those through cgroup shares and quotas that nproc does not read.

Concrete failure mode (from #26620):

| Setting | Value |
| --- | --- |
| Node | `e2-standard-16` (16 vCPU) |
| Pod `requests.cpu` | `4` |
| Pod `limits.cpu` | `8` |
| `$(nproc)` returns | 16 |
| Workers spawned | 16 |
| Workers the pod can actually run | 4–8 |

This causes CPU oversubscription, context-switching overhead, and worse latency — the opposite of what the section is trying to achieve.
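The mismatch can be checked from inside a running container. The following is a minimal sketch, not part of the PR: it assumes cgroup v2, where the CPU quota is exposed at /sys/fs/cgroup/cpu.max (cgroup v1 uses cpu.cfs_quota_us and cpu.cfs_period_us instead), and the function name cgroup_cpu_limit is hypothetical. It only reflects limits.cpu; requests.cpu is enforced through CPU weight and does not appear in this file.

```shell
#!/bin/sh
# Print the CPU limit implied by a cgroup v2 cpu.max file, rounded up to whole CPUs.
# Prints "unlimited" when no quota is set. The default path is an assumption (cgroup v2).
cgroup_cpu_limit() {
    cpu_max_file="${1:-/sys/fs/cgroup/cpu.max}"
    # cpu.max holds "<quota> <period>" in microseconds, or "max <period>" when unlimited.
    read -r quota period < "$cpu_max_file" || return 1
    if [ "$quota" = "max" ]; then
        echo "unlimited"
    else
        # Integer ceiling division, so a 2.5-CPU quota is reported as 3.
        echo $(( (quota + period - 1) / period ))
    fi
}

# Compare what nproc sees with what the cgroup quota allows.
echo "nproc reports:        $(nproc 2>/dev/null || echo 'n/a')"
echo "cgroup quota implies: $(cgroup_cpu_limit 2>/dev/null || echo 'n/a')"
```

In the scenario from the table, the first line would be expected to show 16 and the second 8, since a limit of 8 CPUs corresponds to a quota of 800000 over a 100000 microsecond period.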

Fix

Restructure section 3 of docs/proxy/prod.md:

- Warning admonition explaining the cgroup mismatch with a concrete example.
- Recommended: Kubernetes Downward API recipe that exposes requests.cpu as $CPU_REQUEST in the container env, then uses --num_workers ${CPU_REQUEST:-4} in the CMD.
- Alternative: hardcoded --num_workers 4 for users who prefer not to wire up the Downward API.
- Bare-metal / VM only: keep the original $(nproc) CMD, scoped to environments where the process actually has access to all host CPUs.
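For illustration, the Downward API wiring described in the list above could look roughly like the manifest written by this script. This is a sketch under assumptions rather than the exact text added to prod.md: the pod name, container name, image tag, and output file name are placeholders. Only the env entry with resourceFieldRef and the requests.cpu / limits.cpu values come from the PR description.

```shell
#!/bin/sh
# Write a minimal Pod manifest that surfaces requests.cpu as CPU_REQUEST.
# Pod name, container name, image tag, and file name are placeholders.
MANIFEST="${MANIFEST:-litellm-pod.yaml}"

cat > "$MANIFEST" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: litellm-proxy            # hypothetical name
spec:
  containers:
    - name: litellm              # hypothetical container name
      image: ghcr.io/berriai/litellm:main-latest   # assumed image tag
      resources:
        requests:
          cpu: "4"
        limits:
          cpu: "8"
      env:
        - name: CPU_REQUEST
          valueFrom:
            resourceFieldRef:
              containerName: litellm
              resource: requests.cpu
              divisor: "1"       # whole cores; fractional requests round up
EOF

echo "wrote $MANIFEST"
# Apply with: kubectl apply -f "$MANIFEST"
```

With a divisor of 1, a fractional request such as 500m is exposed as 1, so the injected value is always a whole number suitable for --num_workers.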

The two follow-up shell blocks (--max_requests_before_restart and --run_gunicorn variants) are updated to use the same ${CPU_REQUEST:-4} pattern so they stay consistent with the recommended K8s setup.

Files changed

docs/proxy/prod.md — section 3 only

Why this is correct

The Downward API approach is the standard Kubernetes idiom for surfacing pod-level resource requests/limits to the container at runtime. It's documented at https://kubernetes.io/docs/tasks/inject-data-application/environment-variable-expose-pod-information/.

The ${CPU_REQUEST:-4} default keeps the container bootable in non-K8s environments (local Docker, etc.) where the env var won't be injected.
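The fallback behaviour comes from standard POSIX parameter expansion. The sketch below shows it in isolation; the helper name resolve_workers is hypothetical and not part of the docs change.

```shell
#!/bin/sh
# Resolve the worker count the same way the CMD does.
resolve_workers() {
    # ${VAR:-default} substitutes the default when VAR is unset OR empty.
    echo "${CPU_REQUEST:-4}"
}

# Outside Kubernetes nothing injects CPU_REQUEST, so this prints the fallback.
echo "num_workers = $(resolve_workers)"
```

Because the `:-` form also covers the empty string, a misconfigured env entry that yields an empty CPU_REQUEST still falls back to 4 rather than passing an empty argument to --num_workers.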

🤖 Generated with Claude Code

The $(nproc) recommendation in section 3 is incorrect on Kubernetes because it returns the host node's CPU count, not the pod's cgroup CPU request/limit. On a 16-vCPU node with a pod that has requests.cpu=4 and limits.cpu=8, $(nproc) returns 16, spawning 2-4x more Uvicorn workers than the pod can actually run. The result is CPU oversubscription, worse latency, and the opposite of the stated goal.

Replace the K8s nproc examples with:

- A warning explaining the cgroup mismatch with a concrete example.
- A Downward API recipe (recommended) that exposes requests.cpu as $CPU_REQUEST in the container env, then uses --num_workers ${CPU_REQUEST:-4} in the CMD.
- A hardcoded --num_workers fallback for users who prefer it.
- A bare-metal/VM section that keeps the original nproc CMD, scoped to environments where the process actually has access to all host CPUs (no cgroup limit).

The two follow-up shell blocks (--max_requests_before_restart and --run_gunicorn variants) are updated to use the same ${CPU_REQUEST:-4} pattern so they stay consistent with the recommended K8s setup.

Fixes BerriAI/litellm#26620

vercel · 2026-04-27T23:20:20Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
litellm	Ready	Preview, Comment	Apr 27, 2026 11:21pm

Anai-Guo requested a review from a team April 27, 2026 23:20

Anai-Guo mentioned this pull request Apr 27, 2026

Docs: nproc in Kubernetes num_workers recommendation returns host CPU count, not pod limit BerriAI/litellm#26620

Open

vercel Bot deployed to Preview April 27, 2026 23:21 View deployment

Anai-Guo wants to merge 1 commit into BerriAI:main from Anai-Guo:fix/k8s-nproc-docs
