Skip to content

Secure vLLM runtime (Implementation)#667

Open
sats-23 wants to merge 23 commits intoIBM:mainfrom
sats-23:vLLMImpl
Open

Secure vLLM runtime (Implementation)#667
sats-23 wants to merge 23 commits intoIBM:mainfrom
sats-23:vLLMImpl

Conversation

@sats-23
Copy link
Copy Markdown
Contributor

@sats-23 sats-23 commented Apr 24, 2026

-When Chatbot is loaded, user is prompted with this menu.
-Upon correct API key, user can proceed to use chatbot as usual
-Key remains valid until page refresh
-Wrong API key redirects to same prompt box expecting API key
image

-Similarly updated Swagger docs as well
image

-Curl requests with and without auth header when auth is enabled
image

}

// Log vLLM authentication status
if instructAPIKey, ok := values["instruct"].(map[string]any); ok {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think somehow we need to push this info to applications/workload, any application specific content in the orchestration is a big NO!

}

// Log vLLM authentication status
if instructAPIKey, ok := values["instruct"].(map[string]any); ok {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think somehow we need to push this info to applications/workload, any application specific content in the orchestration is a big NO!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +44 to +45
# @description API key for vLLM instruct service authentication. If empty, authentication is disabled. Provide a key to enable authentication.
apiKey: ""
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the format the key should be in?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plain text format, there is no validation on length or type of characters as of now

value: ibm-granite/granite-3.3-8b-instruct
- name: RERANKER_ENDPOINT
value: http://reranker-predictor:8080
value: http://reranker-predictor:8000
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why bringing this change? without spyre it runs on 8080

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reverted

Comment on lines +75 to +81
{{- if .Values.instruct.apiKey }}
- name: VLLM_INSTRUCT_API_KEY
valueFrom:
secretKeyRef:
name: vllm-instruct-api-key
key: apiKey
{{- end }}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a query, in case of OpenShift, dont we need to use Service Account?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, this is a static injection similar to opensearch-credentials

Copy link
Copy Markdown
Member

@mayuka-c mayuka-c Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No I meant with RHOAI you could use Token-Based Authentication right?
This uses Seervice Account and you mount the token on service whichever needs communicate with models right?

I havent tried, but I do remember an option of enabling auth while deploying models in RHOAI dashboard

Copy link
Copy Markdown
Member

@mkumatag mkumatag Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yes, OpenShift AI does support an additional layer through some operators, such as authorino. Perhaps you could reach out to the OpenShift AI team to learn more about that approach? However, it would certainly involve more work like install those additional operator part of the application deployment with an additional specs etc..

I would vote for get started with this in this release and explore the operator way be next release(or add option shift support next release)

@sats-23 sats-23 force-pushed the vLLMImpl branch 2 times, most recently from 733b81c to a22d60c Compare April 24, 2026 08:31
@sats-23 sats-23 marked this pull request as ready for review April 24, 2026 09:16
@sats-23 sats-23 force-pushed the vLLMImpl branch 3 times, most recently from 4d0981b to d706a72 Compare April 24, 2026 09:25
@mkumatag
Copy link
Copy Markdown
Member

@sats-23 how are we exposing the api key to the user if they want to consume the key post running?

@sats-23
Copy link
Copy Markdown
Contributor Author

sats-23 commented Apr 27, 2026

@sats-23 how are we exposing the api key to the user if they want to consume the key post running?

Currently we are passing the API key and consuming it both from backend server and vLLM pod side.
Since the /v1/chat/completion or other endpoints to the user are exposed via backend server, the logic will handle passing the api key in auth header by default.

So, currently, there is no exposing/showing api key to user. Only in case they want to consume directly from vLLM pod (pod network interface), they will have to pass the api key which is not the general practice in our app.

Comment thread spyre-rag/src/common/llm_utils.py Outdated
value: "{{ .Values.opensearch.auth.username }}"
- name: OPENSEARCH_PASSWORD
value: "{{ .Values.opensearch.auth.password }}"
{{- if .Values.instruct.apiKey }}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not required for similarity service, since instruct is not used by this service.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, removed

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many methods use /v1/chat/completion API, please analyse all the usage of instruct LLM and add auth header.

  • Table summary
  • Summarize (stream & non-stream)
  • Q&A stream query

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, PTAL

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now, there is a pass through auth for digitize and summarize API.
Only chatbot UI and backend requires user to mention API Key, in all other cases, the key is automatically propagated

@dharaneeshvrd
Copy link
Copy Markdown
Member

how are we exposing the api key to the user if they want to consume the key post running?

There is no neat way to get this now. But it would be great if we can get the values configured for the current deployment via a cli command.

@sats-23
Copy link
Copy Markdown
Contributor Author

sats-23 commented Apr 27, 2026

how are we exposing the api key to the user if they want to consume the key post running?

There is no neat way to get this now. But it would be great if we can get the values configured for the current deployment via a cli command.

I assume we should also explain to the user that they need to use it only in case of access vLLM within pod networking.
Else they may get confused and use it against the backend server

@sats-23 sats-23 marked this pull request as draft April 27, 2026 08:16
@sats-23 sats-23 marked this pull request as ready for review April 28, 2026 09:01
@sats-23
Copy link
Copy Markdown
Contributor Author

sats-23 commented Apr 28, 2026

@dharaneeshvrd @mkumatag @manju956
The approach has been updated, PTAL

sats-23 added 17 commits April 30, 2026 14:44
Signed-off-by: Sathvik <Sathvik.S@ibm.com>
Signed-off-by: Sathvik <Sathvik.S@ibm.com>
Signed-off-by: Sathvik <Sathvik.S@ibm.com>
Signed-off-by: Sathvik <Sathvik.S@ibm.com>
Signed-off-by: Sathvik <Sathvik.S@ibm.com>
Signed-off-by: Sathvik <Sathvik.S@ibm.com>
Signed-off-by: Sathvik <Sathvik.S@ibm.com>
Signed-off-by: Sathvik <Sathvik.S@ibm.com>
Signed-off-by: Sathvik <Sathvik.S@ibm.com>
Signed-off-by: Sathvik <Sathvik.S@ibm.com>
Signed-off-by: Sathvik <Sathvik.S@ibm.com>
Signed-off-by: Sathvik <Sathvik.S@ibm.com>
Signed-off-by: Sathvik <Sathvik.S@ibm.com>
Signed-off-by: Sathvik <Sathvik.S@ibm.com>
Signed-off-by: Sathvik <Sathvik.S@ibm.com>
Signed-off-by: Sathvik <Sathvik.S@ibm.com>
Signed-off-by: Sathvik <Sathvik.S@ibm.com>
Signed-off-by: Sathvik <Sathvik.S@ibm.com>
Comment thread spyre-rag/src/digitize/app.py Outdated
import uvicorn

from fastapi import FastAPI, UploadFile, File, HTTPException, BackgroundTasks, Query, status, Request
from fastapi import FastAPI, UploadFile, File, HTTPException, BackgroundTasks, Query, status, Request, Header
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment thread spyre-rag/src/chatbot/response_utils.py Outdated
}
}
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

pranithraoibm
pranithraoibm previously approved these changes Apr 30, 2026
Signed-off-by: Sathvik <Sathvik.S@ibm.com>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sry, please remove the changes here too

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

mkumatag
mkumatag previously approved these changes Apr 30, 2026
Signed-off-by: Sathvik <Sathvik.S@ibm.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants