[DRAFT] poc seamless worker update #2372

Draft
itechdima wants to merge 1 commit into main from SCHED-1255/poc-seamless-worker-update

Conversation

@itechdima
Collaborator

Problem

Solution

Testing

Release Notes

@itechdima force-pushed the SCHED-1255/poc-seamless-worker-update branch 2 times, most recently from 4f0c567 to d5b6aef, on March 30, 2026 16:33
@Uburro requested a review from Copilot on March 30, 2026 16:34
Contributor

Copilot AI left a comment

Pull request overview

Adds a NodeSet-level update strategy knob for worker (OpenKruise) StatefulSets and introduces a “self-restart” mechanism for workers by deleting the running pod via the Kubernetes API, with corresponding CRD/schema and RBAC updates.

Changes:

  • Add spec.updateStrategy to NodeSet API/CRDs and plumb it through values/rendering.
  • Render Kruise StatefulSet UpdateStrategy / VolumeClaimUpdateStrategy based on the configured strategy.
  • Change worker reboot script to delete its own pod, and grant worker Role permission to delete pods.
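
The strategy plumbing in the first two bullets can be sketched roughly as follows. This is a minimal illustration and not the PR's code: the `UpdateStrategy` type, the two enum values, the default for an empty value, and the `renderedStrategy` struct are all assumptions (the real constants live in internal/consts/statefulset.go and the real renderer emits Kruise StatefulSet types).

```go
package main

import "fmt"

// UpdateStrategy is a hypothetical stand-in for the NodeSet spec.updateStrategy enum.
type UpdateStrategy string

const (
	// Assumed values, borrowed from the standard StatefulSet strategy names.
	UpdateStrategyRollingUpdate UpdateStrategy = "RollingUpdate"
	UpdateStrategyOnDelete      UpdateStrategy = "OnDelete"
)

// renderedStrategy is a simplified stand-in for the strategy block
// that the worker StatefulSet renderer would emit.
type renderedStrategy struct {
	Type string
}

// renderUpdateStrategy maps the configured enum value to a rendered strategy.
// Treating an empty value as RollingUpdate is an assumption about the default.
func renderUpdateStrategy(s UpdateStrategy) (renderedStrategy, error) {
	switch s {
	case "", UpdateStrategyRollingUpdate:
		return renderedStrategy{Type: "RollingUpdate"}, nil
	case UpdateStrategyOnDelete:
		return renderedStrategy{Type: "OnDelete"}, nil
	default:
		// Unknown values are rejected instead of silently falling back.
		return renderedStrategy{}, fmt.Errorf("unsupported update strategy %q", s)
	}
}

func main() {
	for _, s := range []UpdateStrategy{"", UpdateStrategyOnDelete, "Bogus"} {
		rendered, err := renderUpdateStrategy(s)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%q renders as %s\n", s, rendered.Type)
	}
}
```

Returning an error for unknown values keeps the renderer consistent with the enum validation in the CRD schema, so a value that slips past validation fails loudly at render time.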

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 9 comments.

Show a summary per file
| File | Description |
| --- | --- |
| internal/values/slurm_nodeset.go | Plumbs UpdateStrategy from NodeSet spec into rendered values. |
| internal/render/worker/statefulset.go | Switches StatefulSet strategy rendering to be configurable via UpdateStrategy. |
| internal/render/worker/role.go | Expands worker RBAC to allow pod deletion (needed by reboot script). |
| internal/consts/statefulset.go | Introduces UpdateStrategy enum constants. |
| images/common/scripts/reboot.sh | Replaces host reboot with Kubernetes API pod deletion. |
| api/v1alpha1/nodeset_types.go | Adds spec.updateStrategy to the NodeSet API type with validation/default. |
| config/crd/bases/slurm.nebius.ai_nodesets.yaml | CRD schema update for updateStrategy (incl. required/default/enum). |
| helm/soperator/crds/slurmcluster-crd.yaml | Helm-packaged CRD schema update for updateStrategy. |
| helm/soperator-crds/templates/slurmcluster-crd.yaml | Helm template CRD schema update for updateStrategy. |
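
For the reboot.sh and role.go rows, the self-restart boils down to a single Kubernetes API call: `DELETE /api/v1/namespaces/{namespace}/pods/{name}`, authenticated with the pod's service account token. The sketch below shows that request in Go instead of the shell the PR uses, and only builds the request without sending it. The helper name `newDeletePodRequest`, the namespace, the pod name, and the token are hypothetical placeholders.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
)

// newDeletePodRequest builds (but does not send) the API request that deletes
// a single pod. Inside a cluster the token is normally read from
// /var/run/secrets/kubernetes.io/serviceaccount/token.
func newDeletePodRequest(apiServer, namespace, pod, token string) (*http.Request, error) {
	if namespace == "" || pod == "" {
		return nil, fmt.Errorf("namespace and pod name are required")
	}
	endpoint := fmt.Sprintf("%s/api/v1/namespaces/%s/pods/%s",
		strings.TrimRight(apiServer, "/"),
		url.PathEscape(namespace),
		url.PathEscape(pod),
	)
	req, err := http.NewRequest(http.MethodDelete, endpoint, nil)
	if err != nil {
		return nil, err
	}
	// The API server rejects the call unless the bound Role grants
	// the "delete" verb on "pods" in this namespace.
	req.Header.Set("Authorization", "Bearer "+token)
	return req, nil
}

func main() {
	// All argument values here are illustrative placeholders.
	req, err := newDeletePodRequest("https://kubernetes.default.svc", "example-namespace", "worker-0", "example-token")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Sending the request would additionally require trusting the cluster CA certificate, which is omitted here. Once the pod is deleted, the StatefulSet controller recreates it, which is what turns the deletion into a restart.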

Comment thread internal/render/worker/statefulset.go
Comment thread api/v1alpha1/nodeset_types.go
Comment thread internal/consts/statefulset.go
Comment thread config/crd/bases/slurm.nebius.ai_nodesets.yaml
Comment thread internal/render/worker/statefulset.go
Comment thread internal/render/worker/statefulset.go
Comment thread internal/render/worker/statefulset.go
Comment thread images/common/scripts/reboot.sh Outdated
Comment thread internal/render/worker/role.go Outdated
@itechdima force-pushed the SCHED-1255/poc-seamless-worker-update branch 3 times, most recently from 0fff765 to f64ac3b, on March 31, 2026 15:15
@itechdima force-pushed the SCHED-1255/poc-seamless-worker-update branch 5 times, most recently from a37c18c to 59c0114, on April 10, 2026 16:04
@itechdima force-pushed the SCHED-1255/poc-seamless-worker-update branch 2 times, most recently from 0fa5087 to 883f660, on April 24, 2026 16:13
@itechdima force-pushed the SCHED-1255/poc-seamless-worker-update branch from 4581f65 to 4395c64 on May 1, 2026 14:58

2 participants