Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 13 additions & 0 deletions evals/claude-code/eval.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -72,3 +72,16 @@ config:
toolPattern: ".*"
minToolCalls: 1
maxToolCalls: 20
# OpenShift tasks
- glob: ../tasks/openshift/*/*.yaml
labelSelector:
suite: openshift
assertions:
promptsUsed:
- server: kubernetes
prompt: plan_mustgather
toolsUsed:
- server: kubernetes
tool: resource_create_or_update
minToolCalls: 1
maxToolCalls: 15
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
kind: Task
metadata:
labels:
suite: openshift
name: plan-mustgather-audit-logs
difficulty: easy
steps:
setup:
inline: |-
#!/usr/bin/env bash
set -euo pipefail
# Cleanup any existing must-gather resources from previous runs
kubectl get ns -o name 2>/dev/null | grep openshift-must-gather | xargs -r kubectl delete --wait=false 2>/dev/null || true
kubectl get clusterrolebinding -o name 2>/dev/null | grep must-gather-collector | xargs -r kubectl delete 2>/dev/null || true
verify:
inline: |-
#!/usr/bin/env bash
set -euo pipefail
# Find must-gather pod and verify it uses gather_audit_logs command
POD_JSON=$(kubectl get pods -A -o json | jq -r '.items[] | select(.metadata.namespace | startswith("openshift-must-gather"))')
if [ -z "$POD_JSON" ]; then
echo "No must-gather pod found"
exit 1
fi
NS=$(echo "$POD_JSON" | jq -r '.metadata.namespace' | head -1)
POD=$(echo "$POD_JSON" | jq -r '.metadata.name' | head -1)
# Verify pod uses gather_audit_logs command
kubectl get pod "$POD" -n "$NS" -o yaml | grep -q "gather_audit_logs"
cleanup:
inline: |-
#!/usr/bin/env bash
set -euo pipefail
# Delete all must-gather namespaces and clusterrolebindings
kubectl get ns -o name 2>/dev/null | grep openshift-must-gather | xargs -r kubectl delete --wait=false 2>/dev/null || true
kubectl get clusterrolebinding -o name 2>/dev/null | grep must-gather-collector | xargs -r kubectl delete 2>/dev/null || true
prompt:
inline: I need to collect audit logs from my OpenShift cluster for a security investigation. Please set up a must-gather that specifically gathers audit logs and apply it.
assertions:
promptsUsed:
- server: kubernetes
prompt: plan_mustgather
toolsUsed:
- server: kubernetes
tool: resource_create_or_update
minToolCalls: 1
maxToolCalls: 15
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
kind: Task
metadata:
labels:
suite: openshift
name: plan-mustgather-custom-images
difficulty: medium
steps:
setup:
inline: |-
#!/usr/bin/env bash
set -euo pipefail
# Cleanup any existing must-gather resources from previous runs
kubectl get ns -o name 2>/dev/null | grep openshift-must-gather | xargs -r kubectl delete --wait=false 2>/dev/null || true
kubectl get clusterrolebinding -o name 2>/dev/null | grep must-gather-collector | xargs -r kubectl delete 2>/dev/null || true
verify:
inline: |-
#!/usr/bin/env bash
set -euo pipefail
# Find must-gather pod and verify it uses custom images
POD_JSON=$(kubectl get pods -A -o json | jq -r '.items[] | select(.metadata.namespace | startswith("openshift-must-gather"))')
if [ -z "$POD_JSON" ]; then
echo "No must-gather pod found"
exit 1
fi
NS=$(echo "$POD_JSON" | jq -r '.metadata.namespace' | head -1)
POD=$(echo "$POD_JSON" | jq -r '.metadata.name' | head -1)
# Verify pod uses the logging operator image
kubectl get pod "$POD" -n "$NS" -o yaml | grep -q "cluster-logging-rhel9-operator"
Comment on lines +27 to +28
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Verification only checks one of the two required images.

The prompt specifies two images should be configured (ose-must-gather:v4.15 and cluster-logging-rhel9-operator:latest), but the verification only checks for the logging operator image. This could lead to false positives if only one image is configured.

Consider verifying both images
       # Verify pod uses the logging operator image
-      kubectl get pod "$POD" -n "$NS" -o yaml | grep -q "cluster-logging-rhel9-operator"
+      POD_YAML=$(kubectl get pod "$POD" -n "$NS" -o yaml)
+      # Verify pod uses both the platform and logging operator images
+      echo "$POD_YAML" | grep -q "ose-must-gather"
+      echo "$POD_YAML" | grep -q "cluster-logging-rhel9-operator"

Also applies to: 36-37

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@evals/tasks/openshift/plan-mustgather-custom-images/plan-mustgather-custom-images.yaml`
around lines 27 - 28, The verification only checks for
"cluster-logging-rhel9-operator" and misses "ose-must-gather:v4.15"; update the
pod inspection for the commands that use kubectl get pod "$POD" -n "$NS" -o yaml
| grep -q "cluster-logging-rhel9-operator" (and the duplicate at the later
occurrence) to also verify the presence of the ose-must-gather:v4.15
image—either by adding a second grep check for "ose-must-gather:v4.15" after the
existing command or replacing the single grep with a combined check/regex that
ensures both "cluster-logging-rhel9-operator:latest" and "ose-must-gather:v4.15"
are present.

cleanup:
inline: |-
#!/usr/bin/env bash
set -euo pipefail
# Delete all must-gather namespaces and clusterrolebindings
kubectl get ns -o name 2>/dev/null | grep openshift-must-gather | xargs -r kubectl delete --wait=false 2>/dev/null || true
kubectl get clusterrolebinding -o name 2>/dev/null | grep must-gather-collector | xargs -r kubectl delete 2>/dev/null || true
prompt:
inline: I'm debugging logging issues on my OpenShift 4.15 cluster. Set up and apply a must-gather using both the platform image registry.redhat.io/openshift4/ose-must-gather:v4.15 and the logging operator image registry.redhat.io/openshift-logging/cluster-logging-rhel9-operator:latest.
assertions:
promptsUsed:
- server: kubernetes
prompt: plan_mustgather
toolsUsed:
- server: kubernetes
tool: resource_create_or_update
minToolCalls: 1
maxToolCalls: 15
Original file line number Diff line number Diff line change
@@ -0,0 +1,40 @@
kind: Task
metadata:
labels:
suite: openshift
name: plan-mustgather-custom-namespace
difficulty: easy
steps:
setup:
inline: |-
#!/usr/bin/env bash
set -euo pipefail
# Cleanup any existing resources
kubectl delete ns my-debug-namespace --wait=false 2>/dev/null || true
kubectl get clusterrolebinding -o name 2>/dev/null | grep my-debug-namespace-must-gather-collector | xargs -r kubectl delete 2>/dev/null || true
verify:
inline: |-
#!/usr/bin/env bash
set -euo pipefail
# Verify resources exist in the custom namespace
kubectl get ns my-debug-namespace
kubectl get sa must-gather-collector -n my-debug-namespace
kubectl get pod -n my-debug-namespace -o name | grep -q must-gather
cleanup:
inline: |-
#!/usr/bin/env bash
set -euo pipefail
# Delete the custom namespace and clusterrolebinding
kubectl delete ns my-debug-namespace --wait=false 2>/dev/null || true
kubectl get clusterrolebinding -o name 2>/dev/null | grep my-debug-namespace-must-gather-collector | xargs -r kubectl delete 2>/dev/null || true
prompt:
inline: I want to run a must-gather but need it in a specific namespace called "my-debug-namespace" due to our cluster policies. Can you set this up and apply it?
assertions:
promptsUsed:
- server: kubernetes
prompt: plan_mustgather
toolsUsed:
- server: kubernetes
tool: resource_create_or_update
minToolCalls: 1
maxToolCalls: 15
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
kind: Task
metadata:
labels:
suite: openshift
name: plan-mustgather-default
difficulty: easy
steps:
setup:
inline: |-
#!/usr/bin/env bash
set -euo pipefail
# Cleanup any existing must-gather resources from previous runs
kubectl get ns -o name 2>/dev/null | grep openshift-must-gather | xargs -r kubectl delete --wait=false 2>/dev/null || true
kubectl get clusterrolebinding -o name 2>/dev/null | grep must-gather-collector | xargs -r kubectl delete 2>/dev/null || true
verify:
inline: |-
#!/usr/bin/env bash
set -euo pipefail
# Find must-gather pod and verify it exists with default image
POD_JSON=$(kubectl get pods -A -o json | jq -r '.items[] | select(.metadata.namespace | startswith("openshift-must-gather"))')
if [ -z "$POD_JSON" ]; then
echo "No must-gather pod found"
exit 1
fi
NS=$(echo "$POD_JSON" | jq -r '.metadata.namespace' | head -1)
POD=$(echo "$POD_JSON" | jq -r '.metadata.name' | head -1)
# Verify pod uses default must-gather image
kubectl get pod "$POD" -n "$NS" -o yaml | grep -q "ose-must-gather"
cleanup:
inline: |-
#!/usr/bin/env bash
set -euo pipefail
# Delete all must-gather namespaces and clusterrolebindings
kubectl get ns -o name 2>/dev/null | grep openshift-must-gather | xargs -r kubectl delete --wait=false 2>/dev/null || true
kubectl get clusterrolebinding -o name 2>/dev/null | grep must-gather-collector | xargs -r kubectl delete 2>/dev/null || true
prompt:
inline: I need to collect diagnostic data from my OpenShift cluster for a support case. Can you help me plan and apply a must-gather collection?
assertions:
promptsUsed:
- server: kubernetes
prompt: plan_mustgather
toolsUsed:
- server: kubernetes
tool: resource_create_or_update
minToolCalls: 1
maxToolCalls: 15
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
kind: Task
metadata:
labels:
suite: openshift
name: plan-mustgather-host-network
difficulty: easy
steps:
setup:
inline: |-
#!/usr/bin/env bash
set -euo pipefail
# Cleanup any existing must-gather resources from previous runs
kubectl get ns -o name 2>/dev/null | grep openshift-must-gather | xargs -r kubectl delete --wait=false 2>/dev/null || true
kubectl get clusterrolebinding -o name 2>/dev/null | grep must-gather-collector | xargs -r kubectl delete 2>/dev/null || true
verify:
inline: |-
#!/usr/bin/env bash
set -euo pipefail
# Find must-gather pod and verify it has hostNetwork enabled
POD_JSON=$(kubectl get pods -A -o json | jq -r '.items[] | select(.metadata.namespace | startswith("openshift-must-gather"))')
if [ -z "$POD_JSON" ]; then
echo "No must-gather pod found"
exit 1
fi
NS=$(echo "$POD_JSON" | jq -r '.metadata.namespace' | head -1)
POD=$(echo "$POD_JSON" | jq -r '.metadata.name' | head -1)
# Verify pod has hostNetwork: true
kubectl get pod "$POD" -n "$NS" -o yaml | grep -q "hostNetwork: true"
cleanup:
inline: |-
#!/usr/bin/env bash
set -euo pipefail
# Delete all must-gather namespaces and clusterrolebindings
kubectl get ns -o name 2>/dev/null | grep openshift-must-gather | xargs -r kubectl delete --wait=false 2>/dev/null || true
kubectl get clusterrolebinding -o name 2>/dev/null | grep must-gather-collector | xargs -r kubectl delete 2>/dev/null || true
prompt:
inline: I'm troubleshooting network connectivity issues on my cluster. Set up and apply a must-gather that uses host networking to capture network-level diagnostics, and keep the resources around after collection so I can inspect them.
assertions:
promptsUsed:
- server: kubernetes
prompt: plan_mustgather
toolsUsed:
- server: kubernetes
tool: resource_create_or_update
minToolCalls: 1
maxToolCalls: 15
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
kind: Task
metadata:
labels:
suite: openshift
name: plan-mustgather-node-selector
difficulty: medium
steps:
setup:
inline: |-
#!/usr/bin/env bash
set -euo pipefail
# Cleanup any existing must-gather resources from previous runs
kubectl get ns -o name 2>/dev/null | grep openshift-must-gather | xargs -r kubectl delete --wait=false 2>/dev/null || true
kubectl get clusterrolebinding -o name 2>/dev/null | grep must-gather-collector | xargs -r kubectl delete 2>/dev/null || true
verify:
inline: |-
#!/usr/bin/env bash
set -euo pipefail
# Find must-gather pod and verify it has the correct nodeSelector
POD_JSON=$(kubectl get pods -A -o json | jq -r '.items[] | select(.metadata.namespace | startswith("openshift-must-gather"))')
if [ -z "$POD_JSON" ]; then
echo "No must-gather pod found"
exit 1
fi
NS=$(echo "$POD_JSON" | jq -r '.metadata.namespace' | head -1)
POD=$(echo "$POD_JSON" | jq -r '.metadata.name' | head -1)
Comment on lines +20 to +26
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Potential JSON parsing failure when multiple pods match.

When multiple pods match the namespace prefix filter, jq -r '.items[] | select(...)' outputs multiple JSON objects concatenated (not as an array). Piping $POD_JSON back into jq on lines 25-26 will fail with a parse error because it's not valid JSON when there are multiple objects.

Consider using jq -s (slurp) to handle multiple objects, or filter to an array from the start:

Proposed fix
-      POD_JSON=$(kubectl get pods -A -o json | jq -r '.items[] | select(.metadata.namespace | startswith("openshift-must-gather"))')
+      POD_JSON=$(kubectl get pods -A -o json | jq '[.items[] | select(.metadata.namespace | startswith("openshift-must-gather"))] | first')
       if [ -z "$POD_JSON" ]; then
         echo "No must-gather pod found"
         exit 1
       fi
-      NS=$(echo "$POD_JSON" | jq -r '.metadata.namespace' | head -1)
-      POD=$(echo "$POD_JSON" | jq -r '.metadata.name' | head -1)
+      NS=$(echo "$POD_JSON" | jq -r '.metadata.namespace')
+      POD=$(echo "$POD_JSON" | jq -r '.metadata.name')

Note: This same pattern appears in all other task files in this PR.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
POD_JSON=$(kubectl get pods -A -o json | jq -r '.items[] | select(.metadata.namespace | startswith("openshift-must-gather"))')
if [ -z "$POD_JSON" ]; then
echo "No must-gather pod found"
exit 1
fi
NS=$(echo "$POD_JSON" | jq -r '.metadata.namespace' | head -1)
POD=$(echo "$POD_JSON" | jq -r '.metadata.name' | head -1)
POD_JSON=$(kubectl get pods -A -o json | jq '[.items[] | select(.metadata.namespace | startswith("openshift-must-gather"))] | first')
if [ -z "$POD_JSON" ]; then
echo "No must-gather pod found"
exit 1
fi
NS=$(echo "$POD_JSON" | jq -r '.metadata.namespace')
POD=$(echo "$POD_JSON" | jq -r '.metadata.name')
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@evals/tasks/openshift/plan-mustgather-node-selector/plan-mustgather-node-selector.yaml`
around lines 20 - 26, The current extraction stores possibly multiple pod JSON
objects into POD_JSON which concatenates objects and later pipelines into jq for
NS and POD causing parse errors; change the jq pipeline that sets POD_JSON to
produce a single JSON array (use jq -s or wrap items[] selection into an array)
or restrict the selection to the first matching item so subsequent jq calls that
read POD_JSON (used to set NS and POD) receive valid JSON; update the POD_JSON
assignment and/or the NS and POD extraction logic that references POD_JSON to
use jq -s or array indexing to safely handle multiple matches.

# Verify pod has worker node selector
kubectl get pod "$POD" -n "$NS" -o yaml | grep -q "node-role.kubernetes.io/worker"
cleanup:
inline: |-
#!/usr/bin/env bash
set -euo pipefail
# Delete all must-gather namespaces and clusterrolebindings
kubectl get ns -o name 2>/dev/null | grep openshift-must-gather | xargs -r kubectl delete --wait=false 2>/dev/null || true
kubectl get clusterrolebinding -o name 2>/dev/null | grep must-gather-collector | xargs -r kubectl delete 2>/dev/null || true
prompt:
inline: I need to collect diagnostics but only want the must-gather pod to run on worker nodes, not on control plane nodes. The node selector should be node-role.kubernetes.io/worker. Can you set this up and apply it?
assertions:
promptsUsed:
- server: kubernetes
prompt: plan_mustgather
toolsUsed:
- server: kubernetes
tool: resource_create_or_update
minToolCalls: 1
maxToolCalls: 15
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
kind: Task
metadata:
labels:
suite: openshift
name: plan-mustgather-timeout-since
difficulty: medium
steps:
setup:
inline: |-
#!/usr/bin/env bash
set -euo pipefail
# Cleanup any existing must-gather resources from previous runs
kubectl get ns -o name 2>/dev/null | grep openshift-must-gather | xargs -r kubectl delete --wait=false 2>/dev/null || true
kubectl get clusterrolebinding -o name 2>/dev/null | grep must-gather-collector | xargs -r kubectl delete 2>/dev/null || true
verify:
inline: |-
#!/usr/bin/env bash
set -euo pipefail
# Find must-gather pod and verify it has timeout and since configured
POD_JSON=$(kubectl get pods -A -o json | jq -r '.items[] | select(.metadata.namespace | startswith("openshift-must-gather"))')
if [ -z "$POD_JSON" ]; then
echo "No must-gather pod found"
exit 1
fi
NS=$(echo "$POD_JSON" | jq -r '.metadata.namespace' | head -1)
POD=$(echo "$POD_JSON" | jq -r '.metadata.name' | head -1)
POD_YAML=$(kubectl get pod "$POD" -n "$NS" -o yaml)
# Verify pod has MUST_GATHER_SINCE env var
echo "$POD_YAML" | grep -q "MUST_GATHER_SINCE"
# Verify pod has timeout in command
echo "$POD_YAML" | grep -q "/usr/bin/timeout"
cleanup:
inline: |-
#!/usr/bin/env bash
set -euo pipefail
# Delete all must-gather namespaces and clusterrolebindings
kubectl get ns -o name 2>/dev/null | grep openshift-must-gather | xargs -r kubectl delete --wait=false 2>/dev/null || true
kubectl get clusterrolebinding -o name 2>/dev/null | grep must-gather-collector | xargs -r kubectl delete 2>/dev/null || true
prompt:
inline: My cluster had an incident about an hour ago. I need a must-gather that only collects logs and data from the last 2 hours to keep the archive small. Also set a 30 minute timeout so it doesn't run forever. Set this up and apply it.
assertions:
promptsUsed:
- server: kubernetes
prompt: plan_mustgather
toolsUsed:
- server: kubernetes
tool: resource_create_or_update
minToolCalls: 1
maxToolCalls: 15