[Feat][Router] Support multiple API keys for multi-tenant authentication (#759) #937
prashansapkota wants to merge 3 commits into vllm-project:main
…roject#888) Signed-off-by: prashansapkota <prashan.sapkota3456@gmail.com>
…on (vllm-project#759)

Previously, vllmApiKey only accepted a single string, silently discarding any keys after the first comma. This prevented teams from sharing a vLLM deployment with independent credentials.

Changes:
- helm/values.yaml: extend vllmApiKey schema to accept string, list, or existing-secret object. List entries are joined as a comma-separated string before being stored in the Kubernetes Secret, which matches the format vLLM's --api-key flag natively expects.
- helm/templates/secrets.yaml: add kindIs "slice" branch to join a list of keys with commas before base64-encoding into the Secret.
- helm/templates/deployment-vllm-multi.yaml, helm/templates/deployment-router.yaml: extend the kindIs "string" guard to also match "slice" so list-typed keys are correctly mapped to the VLLM_API_KEY env var from the generated Secret.
- src/vllm_router/auth.py (new): self-contained auth module with _parse_api_keys(), get_allowed_api_keys(), and verify_api_key() FastAPI dependency. Reads VLLM_API_KEY, splits on commas, strips whitespace, and returns HTTP 401 for missing or invalid Bearer tokens. No-ops when VLLM_API_KEY is unset.
- src/vllm_router/app.py: wire verify_api_key as a router-level dependency on main_router so every inference endpoint is protected without modifying individual route handlers.
- src/tests/test_multi_api_key_auth.py (new): 24 tests covering key parsing, env var reading, and the FastAPI dependency (valid key, multi-key, invalid key → 401, missing header → 401, wrong scheme → 401, whitespace tolerance, no-auth passthrough).

Signed-off-by: prashansapkota <prashan.sapkota3456@gmail.com>
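To make the Secret handling in the commit message concrete, here is a rough Python simulation of what the chart is described as doing: join a list of keys with commas, base64-encode the result, and let Kubernetes decode it into the VLLM_API_KEY env var. The function names and sample keys are hypothetical and exist only for illustration; the real logic lives in the Helm templates, not in Python.

```python
import base64


def render_secret_value(vllm_api_key):
    """Mimic the described chart behavior (hypothetical helper).

    A list is comma-joined; a plain string is used unchanged.
    The result is base64-encoded, as Secret data is stored.
    """
    if isinstance(vllm_api_key, list):
        raw = ",".join(vllm_api_key)
    else:
        raw = vllm_api_key
    return base64.b64encode(raw.encode("utf-8")).decode("ascii")


def env_value_from_secret(encoded):
    """Kubernetes decodes Secret data before exposing it as an env var."""
    return base64.b64decode(encoded).decode("utf-8")


# Sample keys are made up for the example.
encoded = render_secret_value(["team-a-key", "team-b-key"])
env_value = env_value_from_secret(encoded)
```

Under these assumptions, both the list form and an equivalent comma-separated string end up as the same env var value, which is why the router and vLLM only ever need to parse one format.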
Code Review
This pull request introduces support for multiple API keys to enable multi-tenant access and addresses the issue of stale Prometheus metrics. Key changes include updating Helm templates to handle vllmApiKey as either a string or a list, implementing a new authentication module that parses multiple keys from environment variables, and adding a mechanism to clear label-based gauges in the metrics router. Additionally, a healthy field was added to EndpointInfo for better status tracking. Feedback was provided to improve the robustness of the Authorization header check in the authentication dependency.
    if not allowed_keys:
        return

    auth_header = request.headers.get("Authorization", "")
Because the request.headers.get call defaults to an empty string, auth_header.startswith returns False rather than raising when the header is missing or empty, so the existing logic already handles that case. It is still cleaner to check for the presence of the header explicitly.
    - auth_header = request.headers.get("Authorization", "")
    + auth_header = request.headers.get("Authorization")
    + if not auth_header or not auth_header.startswith("Bearer "):
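A small sketch comparing the two forms of the check may help here. It assumes the original code followed the default-to-empty-string read with a plain startswith("Bearer ") test (the page does not show that line), and the two helper names are hypothetical. Each case pairs the value the original read would produce with the value the suggested read would produce.

```python
from typing import Optional


def rejects_original(auth_header: str) -> bool:
    """Original form: header read with a default of "", so never None."""
    return not auth_header.startswith("Bearer ")


def rejects_suggested(auth_header: Optional[str]) -> bool:
    """Suggested form: header read without a default, so it may be None."""
    return not auth_header or not auth_header.startswith("Bearer ")


# (value seen by the original read, value seen by the suggested read)
cases = [
    ("", None),                      # header missing
    ("", ""),                        # header present but empty
    ("Basic abc", "Basic abc"),      # wrong scheme
    ("Bearer key1", "Bearer key1"),  # well-formed Bearer header
]

results = [(rejects_original(a), rejects_suggested(b)) for a, b in cases]
```

Under that assumption the two forms reject and accept the same requests, so the suggestion is a readability change rather than a behavior change.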
Addressed the review feedback:
…tartswith Signed-off-by: prashansapkota <prashan.sapkota3456@gmail.com>
9ac7ddd to 174b606
This is great, thanks. Would love to see this.
Summary
Closes #759
Previously, vllmApiKey only accepted a single string value. Because vLLM's --api-key flag natively supports a comma-separated list of keys, users tried passing "key1,key2,key3" or a YAML list, but the Helm chart silently discarded everything after the first key, and the router had no enforcement layer at all. This blocked multi-tenant deployments where different teams need independent credentials to share the same vLLM instance.

This PR fixes the full stack, from Helm values through to the router's request validation.
What Changed
Helm (infrastructure layer)
- helm/values.yaml: extended the vllmApiKey schema annotation from type:[string, object] to type:[string, array, object].
- helm/templates/secrets.yaml: added a kindIs "slice" branch. When vllmApiKey is a YAML list, entries are joined with commas (join ",") before being base64-encoded into the Kubernetes Secret. The Secret always stores a single comma-separated string, the format vLLM natively expects for --api-key.
- helm/templates/deployment-vllm-multi.yaml and helm/templates/deployment-router.yaml: extended the kindIs "string" guard to or (kindIs "string") (kindIs "slice") so that list-typed keys also correctly resolve the VLLM_API_KEY env var from the generated Secret.

Router (application layer)
- src/vllm_router/auth.py (new):
  - _parse_api_keys(raw): splits a comma-separated string into a frozenset, stripping whitespace and ignoring empty segments
  - get_allowed_api_keys(): reads VLLM_API_KEY from the environment at request time
  - verify_api_key(request): FastAPI dependency that:
    - no-ops when VLLM_API_KEY is unset (preserves backward compatibility for unauthenticated deployments)
    - returns HTTP 401 if the Authorization header is missing or not a Bearer token
    - returns HTTP 401 if the token is not in the allowed set
- src/vllm_router/app.py: wires verify_api_key as a router-level Depends on main_router, so every inference endpoint (/v1/chat/completions, /v1/completions, /v1/embeddings, etc.) is protected automatically without touching individual route handlers.

Tests
src/tests/test_multi_api_key_auth.py (new, 24 tests):
- _parse_api_keys
- get_allowed_api_keys
- verify_api_key

How It Works End-to-End
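The router-side flow described in this PR can be sketched roughly as follows. This is a framework-free approximation, not the code in src/vllm_router/auth.py: the function names follow the PR description, but verify_api_key here takes a plain headers mapping instead of a FastAPI request, AuthError is a hypothetical stand-in for the HTTP 401 response, and stripping whitespace around the token is an assumption.

```python
import os
from typing import FrozenSet, Mapping


class AuthError(Exception):
    """Hypothetical stand-in for an HTTP 401 response."""

    status_code = 401


def _parse_api_keys(raw: str) -> FrozenSet[str]:
    """Split on commas, strip whitespace, and drop empty segments."""
    return frozenset(part.strip() for part in raw.split(",") if part.strip())


def get_allowed_api_keys() -> FrozenSet[str]:
    """Read VLLM_API_KEY at call time so changes to the env are picked up."""
    return _parse_api_keys(os.environ.get("VLLM_API_KEY", ""))


def verify_api_key(headers: Mapping[str, str]) -> None:
    """Raise AuthError unless the request carries an allowed Bearer token."""
    allowed_keys = get_allowed_api_keys()
    if not allowed_keys:
        # No keys configured: authentication is disabled (no-op).
        return

    auth_header = headers.get("Authorization")
    if not auth_header or not auth_header.startswith("Bearer "):
        raise AuthError("missing or malformed Authorization header")

    # Assumption: surrounding whitespace on the token is tolerated.
    token = auth_header[len("Bearer "):].strip()
    if token not in allowed_keys:
        raise AuthError("invalid API key")
```

With VLLM_API_KEY set to "key1,key2", a request carrying "Authorization: Bearer key2" passes, while a missing header, a non-Bearer scheme, or an unknown token raises AuthError. In the PR itself this check is attached once to main_router as a dependency, so individual route handlers stay unchanged.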
Backward Compatibility
- No vllmApiKey set: no change; verify_api_key no-ops
- {secretName, secretKey}: no change

Test Plan
- pytest src/tests/test_multi_api_key_auth.py (24 tests pass)
- pytest src/tests/test_stale_metrics.py src/tests/test_static_service_discovery.py src/tests/test_utils.py: all existing tests pass (51 total)
- black, isort: all modified files pass linting