LCORE-941: Use proper vLLM image for prow RHOAI tests #1014
Conversation
- Enabled GPU for vLLM model serving in the Prow e2e test environment.
- Added GPU node tolerations to ensure pods are scheduled on GPU-enabled nodes.
- Used the dynamic llama-stack image built from the LCS repository Containerfile.
- Used the latest vLLM image from the Red Hat template.
Walkthrough
This PR extends the RHOAI end-to-end test infrastructure with GPU support: it adds Nvidia GPU Operator and Node Feature Discovery manifests, introduces a GPU-enabled vLLM runtime and inference service, parameterizes container images, disables OpenAI in favor of vLLM with Llama-3.2-1B-Instruct, and enhances the pipeline scripts with comprehensive GPU setup and verification.
Changes
Sequence Diagram
sequenceDiagram
participant Pipeline as Pipeline (Main)
participant Bootstrap as Bootstrap<br/>(Operators & OLM)
participant GPUSetup as GPU Setup<br/>(NFD & GPU Operator)
participant Cluster as Kubernetes<br/>Cluster
participant OLM as OLM<br/>Operator Hub
participant DeployVLLM as Deploy vLLM<br/>(GPU)
Pipeline->>Bootstrap: Run bootstrap.sh<br/>(apply operators)
Bootstrap->>Cluster: Apply OperatorGroups<br/>(nvidia-gpu-operator, openshift-nfd)
Bootstrap->>OLM: Create Subscriptions<br/>(gpu-operator, nfd)
OLM-->>Cluster: Create InstallPlan &<br/>Install Operators
Bootstrap-->>Pipeline: Operators ready
Pipeline->>GPUSetup: Run gpu-setup.sh
GPUSetup->>Cluster: Apply NFD DaemonSet manifest
GPUSetup->>Cluster: Wait for nfd-worker pods
GPUSetup->>Cluster: Patch NFD with GPU tolerations
GPUSetup->>Cluster: Apply GPU ClusterPolicy
GPUSetup->>Cluster: Wait for GPU operator pods
GPUSetup->>Cluster: Wait for GPU node labels<br/>(nvidia.com/gpu.present)
GPUSetup->>Cluster: Monitor GPU capacity
GPUSetup-->>Pipeline: GPU infrastructure ready
Pipeline->>DeployVLLM: Run deploy-vllm.sh
DeployVLLM->>Cluster: Wait for GPU nodes<br/>& capacity > 0
DeployVLLM->>Cluster: Apply vllm-runtime-gpu.yaml
DeployVLLM->>Cluster: Apply vllm-inference-service-gpu.yaml
DeployVLLM-->>Pipeline: vLLM service deployed
Pipeline->>Cluster: Verify service readiness<br/>& run API tests
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In `@tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-gpu.yaml`:
- Around line 17-21: In vllm-runtime-gpu.yaml remove the trailing spaces from
the vLLM CLI flags so Kubernetes doesn't pass malformed args: fix the args
entries that contain "--tool-call-parser " and "--chat-template " to be
"--tool-call-parser" and "--chat-template" (and ensure their corresponding
values like "llama3_json" and
"/mnt/chat-template/tool_chat_template_llama3.2_json.jinja" remain separate
array items or properly paired) so vLLM's argparse recognizes the flags and the
container can start.
In `@tests/e2e-prow/rhoai/scripts/bootstrap.sh`:
- Around line 56-80: The retry after recreating the subscription only checks
once after a 10s sleep, so reset the wait loop state and re-enter the same until
loop to give the recreated subscription the full timeout; specifically, inside
the branch where you delete and reapply the subscription (the block manipulating
subscription gpu-operator-certified and applying
"$BASE_DIR/manifests/operators/operators.yaml"), set elapsed=0 (or set a retried
flag and use continue) so execution jumps back into the existing until oc get
installplan -n nvidia-gpu-operator ... | grep -q . loop and retries for the full
timeout instead of a single one-time check.
- Around line 23-45: operatorgroup.yaml does not create the nvidia-gpu-operator
namespace, so calling oc wait namespace/nvidia-gpu-operator can fail; before
running oc wait or applying operatorgroup.yaml, apply the manifest that creates
the nvidia-gpu-operator namespace (the Namespace manifest that declares
metadata.name: nvidia-gpu-operator), ensure the namespace exists (e.g., oc apply
<namespace-manifest> and optionally oc wait --for=condition=Created
namespace/nvidia-gpu-operator), then continue to apply operatorgroup.yaml,
operators.yaml and the existing oc wait/operatorgroup polling logic.
In `@tests/e2e-prow/rhoai/scripts/deploy-vllm.sh`:
- Around line 32-78: The loop incorrectly checks only the first node via
jsonpath '.items[0].status.capacity.nvidia.com/gpu', causing false timeouts in
multi-node clusters; change the until condition and the capacity assignment to
inspect all nodes (e.g., use jsonpath that ranges over .items[*] or pipe to jq)
and succeed if any node has a non-empty, non-zero GPU value. Update the two oc
calls in the until condition and the capacity=$(...) assignment to collect all
GPU capacities (e.g., '{range .items[*]}{.status.capacity.nvidia\.com/gpu}{"
"}{end}') and test that at least one token != "" and != "0"; keep the existing
debug oc get nodes output unchanged.
In `@tests/e2e-prow/rhoai/scripts/gpu-setup.sh`:
- Around line 163-165: The loop uses an until condition that only checks
capacity via oc get nodes ... status.capacity.nvidia.com/gpu; update that
condition to also check allocatable (status.allocatable.nvidia.com/gpu) is
present and non-zero, so the until stops only when both capacity and allocatable
are non-empty and not "0". Modify the existing until expression (the oc get
nodes calls in the loop) to include the allocatable checks alongside the
capacity checks, leaving the elapsed/timeout break logic (variables elapsed and
timeout) and the existing echo/error handling unchanged.
🧹 Nitpick comments (4)
tests/e2e-prow/rhoai/scripts/gpu-setup.sh (1)
57-74: Avoid overwriting existing NFD tolerations.
Line 57–74 replaces the tolerations array if it already exists, which can drop preconfigured tolerations. Consider appending instead.♻️ Safer append/initialize approach
+# Ensure tolerations array exists, then append
+if ! oc get daemonset nfd-worker -n openshift-nfd -o jsonpath='{.spec.template.spec.tolerations}' | grep -q .; then
+  oc patch daemonset nfd-worker -n openshift-nfd --type=json -p='[
+    { "op": "add", "path": "/spec/template/spec/tolerations", "value": [] }
+  ]'
+fi
+oc patch daemonset nfd-worker -n openshift-nfd --type=json -p='[
+  { "op": "add", "path": "/spec/template/spec/tolerations/-",
+    "value": { "key": "nvidia.com/gpu", "operator": "Exists", "effect": "NoSchedule" } },
+  { "op": "add", "path": "/spec/template/spec/tolerations/-",
+    "value": { "key": "gpu", "operator": "Exists", "effect": "NoSchedule" } }
+]'
-oc patch daemonset nfd-worker -n openshift-nfd --type=json -p='[
-  {
-    "op": "add",
-    "path": "/spec/template/spec/tolerations",
-    "value": [
-      {
-        "key": "nvidia.com/gpu",
-        "operator": "Exists",
-        "effect": "NoSchedule"
-      },
-      {
-        "key": "gpu",
-        "operator": "Exists",
-        "effect": "NoSchedule"
-      }
-    ]
-  }
-]'
tests/e2e-prow/rhoai/pipeline-services.sh (1)
5-9: Guard required env vars before envsubst.
If LLAMA_STACK_IMAGE (or any substituted var) is unset, envsubst will blank it and apply a broken manifest. Consider a fast-fail check.
🛠️ Suggested fix
+: "${LLAMA_STACK_IMAGE:?LLAMA_STACK_IMAGE must be set}"
 envsubst < "$BASE_DIR/manifests/lightspeed/llama-stack.yaml" | oc apply -f -
tests/e2e-prow/rhoai/pipeline.sh (2)
13-21: Consider extracting the last FROM for multi-stage Containerfiles.
grep -m1 '^FROM' selects the first stage; if the Containerfile is multi-stage, this may return a builder image instead of the runtime image. Using the last FROM is safer.
🛠️ Suggested fix
-LLAMA_STACK_IMAGE=$(curl -sL https://raw.githubusercontent.com/lightspeed-core/lightspeed-stack/main/test.containerfile | grep -m1 '^FROM' | awk '{print $2}')
+LLAMA_STACK_IMAGE=$(curl -sL https://raw.githubusercontent.com/lightspeed-core/lightspeed-stack/main/test.containerfile | awk '/^FROM/ {img=$2} END {print img}')
107-154: Recreate the curl pod if it exists but isn't Running.
If a previous run left vllm-test-curl in Failed/Completed/CrashLoop, oc exec will fail repeatedly. Consider verifying the phase and recreating the pod when it isn't Running.
🛠️ Suggested fix
-  # Create a temporary pod for testing (if it doesn't exist)
-  if ! oc get pod vllm-test-curl -n "$NAMESPACE" &>/dev/null; then
+  # Recreate the test pod if it exists but isn't Running
+  if oc get pod vllm-test-curl -n "$NAMESPACE" &>/dev/null; then
+    phase=$(oc get pod vllm-test-curl -n "$NAMESPACE" -o jsonpath='{.status.phase}')
+    if [[ "$phase" != "Running" ]]; then
+      oc delete pod vllm-test-curl -n "$NAMESPACE" --ignore-not-found=true
+    fi
+  fi
+
+  # Create a temporary pod for testing (if it doesn't exist)
+  if ! oc get pod vllm-test-curl -n "$NAMESPACE" &>/dev/null; then
     oc run vllm-test-curl --image=curlimages/curl:latest \
       --restart=Never -n "$NAMESPACE" -- sleep 3600
     oc wait --for=condition=Ready pod/vllm-test-curl -n "$NAMESPACE" --timeout=60s
   fi
- --enable-auto-tool-choice
- --tool-call-parser
- llama3_json
- --chat-template
- /mnt/chat-template/tool_chat_template_llama3.2_json.jinja
🧩 Analysis chain
🏁 Script executed:
# Check if the file exists and examine the exact content at lines 17-21
if [ -f "tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-gpu.yaml" ]; then
echo "=== File content lines 17-21 with special chars shown ==="
sed -n '17,21p' "tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-gpu.yaml" | cat -A
echo ""
echo "=== Full context lines 10-25 ==="
sed -n '10,25p' "tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-gpu.yaml" | cat -n
else
echo "File not found"
find . -name "vllm-runtime-gpu.yaml" -type f
fi
Repository: lightspeed-core/lightspeed-stack
Length of output: 885
🌐 Web query:
vLLM argument parser trailing spaces CLI flags parsing behavior
💡 Result:
Short answer: vLLM’s CLI uses a FlexibleArgumentParser built on Python argparse, so flag-parsing follows standard argparse rules — trailing whitespace inside an argument value is preserved by the parser but the shell will usually strip/unpack whitespace unless you quote the value. For file/path arguments and complex JSON-style args, you must quote or escape spaces; vLLM has reported issues when unescaped spaces appear in local file paths. References and guidance below.
- vLLM’s parser: vLLM provides FlexibleArgumentParser / argparse helpers (behaves like argparse). Preserve/stripping of trailing spaces is therefore determined by the shell and how you quote the value. [1]
- Reported problems: there is a vLLM GitHub issue where inference failed when there was a space character in a local file path — shows you must quote/escape paths. [2]
- CLI docs: many vLLM flags accept JSON or values; docs recommend usual CLI quoting (use --flag="value with space" or --flag=value). Boolean flags are store_true/store_false and are unaffected by trailing-space values. [3]
Practical tips:
- Quote values containing spaces: --model "/path/with space/model" or --model="/path/with space/model".
- Use the equals form to avoid shell word-splitting: --json-arg='{"k":"v with space"}'.
- For boolean flags (store_true/store_false) trailing spaces aren’t meaningful.
- If you see a bug when quoting correctly, report it with a repro; check/open vLLM issues (the file-path/space issue has been reported). [2]
Sources:
[1] vLLM argparse utilities documentation.
[2] vLLM GitHub issue: "Inference fails when there is space character in local file path."
[3] vLLM CLI reference / flags documentation.
Remove trailing spaces in vLLM flags to prevent startup failure.
Lines 19 and 21 contain trailing spaces after --tool-call-parser and --chat-template. Kubernetes passes the args array directly to the container without shell processing, so these flags become --tool-call-parser and --chat-template (with spaces). The vLLM argparse parser won't recognize these as valid flags and will reject them as unrecognized arguments, causing the container to fail on startup.
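To make the failure mode concrete, here is a minimal sketch using plain argparse via a python3 heredoc; vLLM's FlexibleArgumentParser builds on argparse, so this is illustrative rather than a vLLM reproduction, and the option name is reused only for the example:

```bash
# Illustration only: plain argparse, not vLLM's FlexibleArgumentParser.
python3 - <<'EOF'
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--tool-call-parser")

# Clean flag: matched and consumed as expected.
ns, extras = parser.parse_known_args(["--tool-call-parser", "llama3_json"])
print(ns.tool_call_parser, extras)   # -> llama3_json []

# Flag with a trailing space, exactly as Kubernetes passes it from the args array:
# the token no longer matches the registered option, so both items are left over
# (parse_args would exit with "unrecognized arguments").
ns, extras = parser.parse_known_args(["--tool-call-parser ", "llama3_json"])
print(ns.tool_call_parser, extras)   # -> None ['--tool-call-parser ', 'llama3_json']
EOF
```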
Fix
- - --tool-call-parser
+ - --tool-call-parser
- llama3_json
- - --chat-template
+ - --chat-template
- /mnt/chat-template/tool_chat_template_llama3.2_json.jinja
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- --enable-auto-tool-choice
- --tool-call-parser
- llama3_json
- --chat-template
- /mnt/chat-template/tool_chat_template_llama3.2_json.jinja
🤖 Prompt for AI Agents
In `@tests/e2e-prow/rhoai/manifests/vllm/vllm-runtime-gpu.yaml` around lines 17 -
21, In vllm-runtime-gpu.yaml remove the trailing spaces from the vLLM CLI flags
so Kubernetes doesn't pass malformed args: fix the args entries that contain
"--tool-call-parser " and "--chat-template " to be "--tool-call-parser" and
"--chat-template" (and ensure their corresponding values like "llama3_json" and
"/mnt/chat-template/tool_chat_template_llama3.2_json.jinja" remain separate
array items or properly paired) so vLLM's argparse recognizes the flags and the
container can start.
| echo "--> Applying OperatorGroups from operatorgroup.yaml..." | ||
| oc apply -f "$BASE_DIR/manifests/operators/operatorgroup.yaml" | ||
|
|
||
| sleep 10 | ||
|
|
||
| echo "--> Applying Operator Subscriptions from operators.yaml..." | ||
| oc apply -f "$BASE_DIR/manifests/operators/operators.yaml" | ||
|
|
||
| sleep 10 | ||
|
|
||
| # WAIT FOR GPU OPERATOR NAMESPACE AND OPERATORGROUP | ||
| echo "--> Ensuring GPU Operator namespace and OperatorGroup are ready..." | ||
| oc wait --for=jsonpath='{.status.phase}'=Active namespace/nvidia-gpu-operator --timeout=60s | ||
| echo " -> Waiting for GPU OperatorGroup to be created..." | ||
| until oc get operatorgroup nvidia-gpu-operator-group -n nvidia-gpu-operator &>/dev/null; do | ||
| echo " ...still waiting for OperatorGroup" | ||
| sleep 2 | ||
| done | ||
| echo " -> GPU OperatorGroup ready" | ||
|
|
||
| # Give OLM a moment to process the OperatorGroup before checking subscriptions | ||
| sleep 5 | ||
|
|
🧩 Analysis chain
🏁 Script executed:
find . -name "bootstrap.sh" -o -name "operatorgroup.yaml" -o -name "operators.yaml" | head -20Repository: lightspeed-core/lightspeed-stack
Length of output: 239
🏁 Script executed:
cat -n ./tests/e2e-prow/rhoai/scripts/bootstrap.sh | head -50
Repository: lightspeed-core/lightspeed-stack
Length of output: 2193
🏁 Script executed:
cat -n ./tests/e2e-prow/rhoai/manifests/operators/operatorgroup.yaml
Repository: lightspeed-core/lightspeed-stack
Length of output: 802
🏁 Script executed:
find ./tests/e2e-prow/rhoai/manifests -name "*namespace*" -o -name "*namespaces*" 2>/dev/null | head -20
Repository: lightspeed-core/lightspeed-stack
Length of output: 119
🏁 Script executed:
ls -la ./tests/e2e-prow/rhoai/manifests/namespaces/
Repository: lightspeed-core/lightspeed-stack
Length of output: 310
🏁 Script executed:
cat -n ./tests/e2e-prow/rhoai/manifests/namespaces/nvidia-operator.yaml
Repository: lightspeed-core/lightspeed-stack
Length of output: 211
Ensure the GPU operator namespace exists before oc wait.
oc wait namespace/nvidia-gpu-operator will fail if the namespace doesn't exist. Since operatorgroup.yaml only applies OperatorGroup resources and doesn't create the namespace, apply the namespace manifest first.
🛠️ Suggested fix
echo "--> Ensuring GPU Operator namespace and OperatorGroup are ready..."
+if ! oc get namespace nvidia-gpu-operator &>/dev/null; then
+ oc apply -f "$BASE_DIR/manifests/namespaces/nvidia-operator.yaml"
+fi
oc wait --for=jsonpath='{.status.phase}'=Active namespace/nvidia-gpu-operator --timeout=60s
🤖 Prompt for AI Agents
In `@tests/e2e-prow/rhoai/scripts/bootstrap.sh` around lines 23 - 45,
operatorgroup.yaml does not create the nvidia-gpu-operator namespace, so calling
oc wait namespace/nvidia-gpu-operator can fail; before running oc wait or
applying operatorgroup.yaml, apply the manifest that creates the
nvidia-gpu-operator namespace (the Namespace manifest that declares
metadata.name: nvidia-gpu-operator), ensure the namespace exists (e.g., oc apply
<namespace-manifest> and optionally oc wait --for=condition=Created
namespace/nvidia-gpu-operator), then continue to apply operatorgroup.yaml,
operators.yaml and the existing oc wait/operatorgroup polling logic.
# Verify GPU operator InstallPlan was created before waiting for CSV
echo " -> Verifying GPU Operator InstallPlan was created..."
timeout=120
elapsed=0
until oc get installplan -n nvidia-gpu-operator --no-headers 2>/dev/null | grep -q .; do
  if [ $elapsed -ge $timeout ]; then
    echo " ❌ No InstallPlan created for GPU Operator - this is an OLM issue"
    echo " Attempting to fix by recreating subscription..."
    oc delete subscription gpu-operator-certified -n nvidia-gpu-operator
    sleep 5
    oc apply -f "$BASE_DIR/manifests/operators/operators.yaml"
    sleep 10
    # Try one more time
    if ! oc get installplan -n nvidia-gpu-operator --no-headers 2>/dev/null | grep -q .; then
      echo " ❌ Still no InstallPlan - manual intervention required"
      exit 1
    fi
    break
  fi
  echo " ...waiting for InstallPlan ($elapsed/$timeout seconds)"
  sleep 5
  elapsed=$((elapsed + 5))
done
echo " -> InstallPlan created successfully"
🧩 Analysis chain
🏁 Script executed:
# Check if file exists and show the relevant section
wc -l tests/e2e-prow/rhoai/scripts/bootstrap.sh
head -80 tests/e2e-prow/rhoai/scripts/bootstrap.sh | tail -40
Repository: lightspeed-core/lightspeed-stack
Length of output: 1872
Give the InstallPlan retry a real wait window.
After recreating the subscription, the script checks for an InstallPlan only once after 10s. OLM can take longer, so this can still fail even when it would succeed shortly after. Reset elapsed to 0 and loop back to the wait condition to give the retry the same full timeout window as the initial attempt, similar to how the fix suggests using a retried flag and continue.
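A minimal sketch of that pattern, assuming the script's existing variable names ($timeout, $elapsed, $BASE_DIR) and manifests; this is an illustrative rewrite, not the committed code:

```bash
# Sketch only: reuse the same until loop and reset the clock after recreating the subscription.
retried=false
elapsed=0
until oc get installplan -n nvidia-gpu-operator --no-headers 2>/dev/null | grep -q .; do
  if [ "$elapsed" -ge "$timeout" ]; then
    if [ "$retried" = false ]; then
      echo " No InstallPlan yet - recreating subscription and retrying..."
      oc delete subscription gpu-operator-certified -n nvidia-gpu-operator
      oc apply -f "$BASE_DIR/manifests/operators/operators.yaml"
      retried=true
      elapsed=0      # give the recreated subscription the full timeout window
      continue
    fi
    echo " Still no InstallPlan after retry - manual intervention required"
    exit 1
  fi
  sleep 5
  elapsed=$((elapsed + 5))
done
```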
🤖 Prompt for AI Agents
In `@tests/e2e-prow/rhoai/scripts/bootstrap.sh` around lines 56 - 80, The retry
after recreating the subscription only checks once after a 10s sleep, so reset
the wait loop state and re-enter the same until loop to give the recreated
subscription the full timeout; specifically, inside the branch where you delete
and reapply the subscription (the block manipulating subscription
gpu-operator-certified and applying
"$BASE_DIR/manifests/operators/operators.yaml"), set elapsed=0 (or set a retried
flag and use continue) so execution jumps back into the existing until oc get
installplan -n nvidia-gpu-operator ... | grep -q . loop and retries for the full
timeout instead of a single one-time check.
# Wait for GPU nodes to be labeled by NFD
echo "Waiting for GPU nodes to be labeled by NFD..."
timeout=600 # 10 minutes
elapsed=0
until oc get nodes -l nvidia.com/gpu.present=true --no-headers 2>/dev/null | grep -q .; do
  if [ $elapsed -ge $timeout ]; then
    echo "❌ Timeout waiting for GPU nodes to be labeled"
    exit 1
  fi
  echo "No GPU nodes found yet. Waiting... ($elapsed/$timeout seconds)"
  sleep 10
  elapsed=$((elapsed + 10))
done
echo "✅ GPU nodes detected."

# Wait for GPU capacity to be available
echo "Waiting for GPU capacity to be available on nodes..."
timeout=600 # 10 minutes
elapsed=0
until [ "$(oc get nodes -l nvidia.com/gpu.present=true -o jsonpath='{.items[0].status.capacity.nvidia\.com/gpu}' 2>/dev/null)" != "" ] && \
      [ "$(oc get nodes -l nvidia.com/gpu.present=true -o jsonpath='{.items[0].status.capacity.nvidia\.com/gpu}' 2>/dev/null)" != "0" ]; do
  if [ $elapsed -ge $timeout ]; then
    echo "❌ Timeout waiting for GPU capacity"
    echo "DEBUG: Checking GPU status..."
    oc get nodes -l nvidia.com/gpu.present=true -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{"capacity: "}{.status.capacity.nvidia\.com/gpu}{"\t"}{"allocatable: "}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}'
    exit 1
  fi
  capacity=$(oc get nodes -l nvidia.com/gpu.present=true -o jsonpath='{.items[0].status.capacity.nvidia\.com/gpu}' 2>/dev/null || echo "0")
  echo "GPU capacity: $capacity. Waiting... ($elapsed/$timeout seconds)"
  sleep 10
  elapsed=$((elapsed + 10))
done
echo "✅ GPU capacity available."

# Display GPU node info
echo "GPU nodes ready:"
oc get nodes -l nvidia.com/gpu.present=true -o custom-columns=NAME:.metadata.name,GPU:.status.capacity.nvidia\\.com/gpu,INSTANCE:.metadata.labels.node\\.kubernetes\\.io/instance-type

echo "Applying vLLM manifests..."

envsubst < "$BASE_DIR/manifests/vllm/vllm-runtime-gpu.yaml" | oc apply -f -

# Wait a moment for the ServingRuntime to be fully persisted before creating the InferenceService
echo "Waiting for ServingRuntime to be ready..."
sleep 5

oc apply -f "$BASE_DIR/manifests/vllm/inference-service.yaml"
oc apply -f "$BASE_DIR/manifests/vllm/vllm-inference-service-gpu.yaml"
🧩 Analysis chain
🏁 Script executed:
# First, check if the file exists and read the relevant lines
if [ -f "tests/e2e-prow/rhoai/scripts/deploy-vllm.sh" ]; then
echo "=== File found. Reading lines 32-78 ==="
sed -n '32,78p' tests/e2e-prow/rhoai/scripts/deploy-vllm.sh
else
echo "File not found"
fi
Repository: lightspeed-core/lightspeed-stack
Length of output: 2232
🏁 Script executed:
# Also check the file size to understand context
wc -l tests/e2e-prow/rhoai/scripts/deploy-vllm.sh
Repository: lightspeed-core/lightspeed-stack
Length of output: 122
Avoid single-node GPU capacity gating.
The capacity loop only inspects the first matching node (.items[0]). In a multi-node GPU setup, if that node hasn't yet reported capacity while another node is ready, the script will timeout unnecessarily. Check for non-zero capacity on any node instead.
🛠️ Suggested fix
-until [ "$(oc get nodes -l nvidia.com/gpu.present=true -o jsonpath='{.items[0].status.capacity.nvidia\.com/gpu}' 2>/dev/null)" != "" ] && \
- [ "$(oc get nodes -l nvidia.com/gpu.present=true -o jsonpath='{.items[0].status.capacity.nvidia\.com/gpu}' 2>/dev/null)" != "0" ]; do
+until oc get nodes -l nvidia.com/gpu.present=true \
+ -o jsonpath='{range .items[*]}{.status.capacity.nvidia\.com/gpu}{"\n"}{end}' 2>/dev/null \
+ | grep -qE '^[1-9]'; do
if [ $elapsed -ge $timeout ]; then
echo "❌ Timeout waiting for GPU capacity"
@@
- capacity=$(oc get nodes -l nvidia.com/gpu.present=true -o jsonpath='{.items[0].status.capacity.nvidia\.com/gpu}' 2>/dev/null || echo "0")
- echo "GPU capacity: $capacity. Waiting... ($elapsed/$timeout seconds)"
+ capacities=$(oc get nodes -l nvidia.com/gpu.present=true \
+ -o jsonpath='{range .items[*]}{.metadata.name}:{.status.capacity.nvidia\.com/gpu}{"\n"}{end}' 2>/dev/null || true)
+ echo -e "GPU capacity:\n$capacities\nWaiting... ($elapsed/$timeout seconds)"
sleep 10
elapsed=$((elapsed + 10))
 done
🤖 Prompt for AI Agents
In `@tests/e2e-prow/rhoai/scripts/deploy-vllm.sh` around lines 32 - 78, The loop
incorrectly checks only the first node via jsonpath
'.items[0].status.capacity.nvidia.com/gpu', causing false timeouts in multi-node
clusters; change the until condition and the capacity assignment to inspect all
nodes (e.g., use jsonpath that ranges over .items[*] or pipe to jq) and succeed
if any node has a non-empty, non-zero GPU value. Update the two oc calls in the
until condition and the capacity=$(...) assignment to collect all GPU capacities
(e.g., '{range .items[*]}{.status.capacity.nvidia\.com/gpu}{" "}{end}') and test
that at least one token != "" and != "0"; keep the existing debug oc get nodes
output unchanged.
| until [ "$(oc get nodes -l nvidia.com/gpu.present=true -o jsonpath='{.items[0].status.capacity.nvidia\.com/gpu}' 2>/dev/null)" != "" ] && \ | ||
| [ "$(oc get nodes -l nvidia.com/gpu.present=true -o jsonpath='{.items[0].status.capacity.nvidia\.com/gpu}' 2>/dev/null)" != "0" ]; do | ||
| if [ $elapsed -ge $timeout ]; then |
🧩 Analysis chain
🏁 Script executed:
find . -name "gpu-setup.sh" -type fRepository: lightspeed-core/lightspeed-stack
Length of output: 119
🏁 Script executed:
cat -n ./tests/e2e-prow/rhoai/scripts/gpu-setup.sh | sed -n '150,180p'
Repository: lightspeed-core/lightspeed-stack
Length of output: 1944
🏁 Script executed:
cat -n ./tests/e2e-prow/rhoai/scripts/gpu-setup.sh | sed -n '180,220p'
Repository: lightspeed-core/lightspeed-stack
Length of output: 1686
Add allocatable checks to the GPU capacity wait loop.
The until condition on lines 163–164 only checks capacity, but the echo message on line 187 indicates the script intends to wait for both capacity and allocatable to be available. The loop exits as soon as capacity is non-zero, even if allocatable remains 0, which prevents GPU scheduling and causes later failures. Kubernetes requires non-zero allocatable for pods to be schedulable.
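As a quick sanity check alongside the fix (an illustrative command, not part of the suggested change), capacity and allocatable can be compared side by side with a custom-columns query:

```bash
# Show capacity vs. allocatable GPUs per labeled node; allocatable can lag capacity
# (e.g. right after the device plugin registers or restarts), which is the window
# this wait loop should cover.
oc get nodes -l nvidia.com/gpu.present=true \
  -o custom-columns=NAME:.metadata.name,CAPACITY:.status.capacity.nvidia\\.com/gpu,ALLOCATABLE:.status.allocatable.nvidia\\.com/gpu
```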
Proposed fix
-until [ "$(oc get nodes -l nvidia.com/gpu.present=true -o jsonpath='{.items[0].status.capacity.nvidia\.com/gpu}' 2>/dev/null)" != "" ] && \
- [ "$(oc get nodes -l nvidia.com/gpu.present=true -o jsonpath='{.items[0].status.capacity.nvidia\.com/gpu}' 2>/dev/null)" != "0" ]; do
+until [ "$(oc get nodes -l nvidia.com/gpu.present=true -o jsonpath='{.items[0].status.capacity.nvidia\.com/gpu}' 2>/dev/null)" != "" ] && \
+ [ "$(oc get nodes -l nvidia.com/gpu.present=true -o jsonpath='{.items[0].status.capacity.nvidia\.com/gpu}' 2>/dev/null)" != "0" ] && \
+ [ "$(oc get nodes -l nvidia.com/gpu.present=true -o jsonpath='{.items[0].status.allocatable.nvidia\.com/gpu}' 2>/dev/null)" != "" ] && \
+ [ "$(oc get nodes -l nvidia.com/gpu.present=true -o jsonpath='{.items[0].status.allocatable.nvidia\.com/gpu}' 2>/dev/null)" != "0" ]; do📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| until [ "$(oc get nodes -l nvidia.com/gpu.present=true -o jsonpath='{.items[0].status.capacity.nvidia\.com/gpu}' 2>/dev/null)" != "" ] && \ | |
| [ "$(oc get nodes -l nvidia.com/gpu.present=true -o jsonpath='{.items[0].status.capacity.nvidia\.com/gpu}' 2>/dev/null)" != "0" ]; do | |
| if [ $elapsed -ge $timeout ]; then | |
| until [ "$(oc get nodes -l nvidia.com/gpu.present=true -o jsonpath='{.items[0].status.capacity.nvidia\.com/gpu}' 2>/dev/null)" != "" ] && \ | |
| [ "$(oc get nodes -l nvidia.com/gpu.present=true -o jsonpath='{.items[0].status.capacity.nvidia\.com/gpu}' 2>/dev/null)" != "0" ] && \ | |
| [ "$(oc get nodes -l nvidia.com/gpu.present=true -o jsonpath='{.items[0].status.allocatable.nvidia\.com/gpu}' 2>/dev/null)" != "" ] && \ | |
| [ "$(oc get nodes -l nvidia.com/gpu.present=true -o jsonpath='{.items[0].status.allocatable.nvidia\.com/gpu}' 2>/dev/null)" != "0" ]; do | |
| if [ $elapsed -ge $timeout ]; then |
🤖 Prompt for AI Agents
In `@tests/e2e-prow/rhoai/scripts/gpu-setup.sh` around lines 163 - 165, The loop
uses an until condition that only checks capacity via oc get nodes ...
status.capacity.nvidia.com/gpu; update that condition to also check allocatable
(status.allocatable.nvidia.com/gpu) is present and non-zero, so the until stops
only when both capacity and allocatable are non-empty and not "0". Modify the
existing until expression (the oc get nodes calls in the loop) to include the
allocatable checks alongside the capacity checks, leaving the elapsed/timeout
break logic (variables elapsed and timeout) and the existing echo/error handling
unchanged.
tisnik left a comment
LGTM, thank you
Description
Note: some tests fail because the current logic that restarts the lightspeed-stack Docker container needs to be updated (tracked in a separate ticket).
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
PTAL here to see the prow job rehearse logs.
Summary by CodeRabbit
New Features
Configuration
✏️ Tip: You can customize this high-level summary in your review settings.