Version updates #21

Open
marosset wants to merge 2 commits into Azure-Samples:main from marosset:version-updates

@marosset

Purpose

  • ...

Does this introduce a breaking change?

[ ] Yes
[x] No

Pull Request Type

What kind of change does this Pull Request introduce?

This PR updates the versions of several components (Azure APIs, Ray, KubeRay, etc.).

[ ] Bugfix
[ ] Feature
[ ] Code style update (formatting, local variables)
[ ] Refactoring (no functional changes, no api changes)
[ ] Documentation content changes
[x] Other... Please describe: Chore

How to Test

  • Get the code
git clone [repo-address]
cd [repo-name]
git checkout [branch-name]

  • Test the code
# Deploy an AKS cluster (aks-classic example)
cd aks-classic
tofu init && tofu apply

# Get cluster credentials
az aks get-credentials --resource-group <rg-name> --name <cluster-name>

# Install KubeRay operator (should install v1.6.0)
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm install kuberay-operator kuberay/kuberay-operator --version 1.6.0 --namespace kuberay --create-namespace

# Download and run the Ray pytorch-mnist job with Ray 2.54.1
curl -LO https://raw.githubusercontent.com/ray-project/kuberay/master/ray-operator/config/samples/pytorch-mnist/ray-job.pytorch-mnist.yaml
sed -i 's/2\.52\.0/2.54.1/g' ray-job.pytorch-mnist.yaml
kubectl apply -f ray-job.pytorch-mnist.yaml

# Monitor job status (takes ~8-10 minutes for pip install + training)
kubectl get rayjob rayjob-pytorch-mnist -w
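
For an unattended run, the watch step above can be replaced with a polling loop. The following is a minimal sketch, not part of this PR: `wait_for_rayjob` is a hypothetical helper, the 900-second timeout and 15-second interval are arbitrary defaults, and it assumes the RayJob resource reports its state in `.status.jobStatus` with the terminal values `SUCCEEDED`, `FAILED`, and `STOPPED`.

```shell
# wait_for_rayjob NAME [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Hypothetical helper: polls a RayJob until it reaches a terminal status.
# Returns 0 on SUCCEEDED, 1 on FAILED or STOPPED, 2 on timeout.
wait_for_rayjob() {
  local name="$1" timeout="${2:-900}" interval="${3:-15}"
  local waited=0 job_status

  while [ "$waited" -le "$timeout" ]; do
    # Assumption: the job state is exposed at .status.jobStatus.
    # An empty value (resource not ready yet) is treated as "keep waiting".
    job_status="$(kubectl get rayjob "$name" -o jsonpath='{.status.jobStatus}' 2>/dev/null || true)"

    case "$job_status" in
      SUCCEEDED)
        echo "rayjob $name: SUCCEEDED"
        return 0
        ;;
      FAILED|STOPPED)
        echo "rayjob $name: $job_status" >&2
        return 1
        ;;
    esac

    sleep "$interval"
    waited=$((waited + interval))
  done

  echo "rayjob $name: timed out after ${timeout}s (last status: ${job_status:-unknown})" >&2
  return 2
}
```

Calling `wait_for_rayjob rayjob-pytorch-mnist` would then block until the job finishes, which makes the exit code usable in a script or CI step.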

What to Check

Verify the following:

  • KubeRay operator pod is Running in the kuberay namespace with version 1.6.0
  • Ray job completes with status SUCCEEDED
  • Training accuracy is ~87% on Fashion MNIST (consistent with previous versions)
  • Helm chart versions in all deploy.sh scripts reference 1.6.0 (not 1.1.1 or 1.4.2)
  • Ray image references use rayproject/ray:2.54.1 (not ray-ml, which is discontinued)
  • rayVersion fields match 2.54.1 in all YAML manifests
  • AKS API version is 2026-01-01 (GA, not preview) in main.tf files
  • Kubernetes version defaults are 1.33 in all variables.tf files
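
Two of the checks above (old chart versions in deploy.sh scripts, and the discontinued ray-ml image) can be approximated with a quick grep over the checkout. This is a rough sketch rather than anything shipped in the PR: `find_stale` is a hypothetical helper, and the file globs and patterns are assumptions about how the versions appear in this repository, so a clean result does not replace a manual review.

```shell
# find_stale DIR
# Hypothetical helper: prints file:line:text for references that look outdated.
# Returns 0 if nothing stale is found, 1 otherwise.
find_stale() {
  local dir="$1" found=0

  # Old KubeRay chart versions (1.1.1 or 1.4.2) in deploy scripts.
  # The surrounding character classes avoid matching longer dotted numbers.
  if grep -rnE --include='deploy.sh' \
      '(^|[^0-9.])(1\.1\.1|1\.4\.2)([^0-9.]|$)' "$dir"; then
    found=1
  fi

  # Discontinued ray-ml image in manifests or deploy scripts.
  if grep -rnE --include='*.yaml' --include='*.yml' --include='deploy.sh' \
      'rayproject/ray-ml' "$dir"; then
    found=1
  fi

  return "$found"
}
```

Running `find_stale .` from the repository root prints any suspicious lines and returns a non-zero exit code if there are any.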

Other Information

Signed-off-by: Mark Rossetti <marosset@microsoft.com>
Signed-off-by: Mark Rossetti <marosset@microsoft.com>

@mboersma mboersma left a comment

lgtm, I checked out this branch and tested the aks-classic variant locally as per the PR instructions. I verified the "What to Check" bullet points.

Training Accuracy passes: 'accuracy': 0.8716

👍🏻
