Skip to content

fix: propagate numCores and extraCores from wekacluster to wekacontainers per role (OP-260)#2395

Open
kristina-solovyova wants to merge 1 commit intomainfrom
03-25-fix_propagate_numcores_and_extracores_from_wekacluster_to_wekacontainers_per_role
Open

fix: propagate numCores and extraCores from wekacluster to wekacontainers per role (OP-260)#2395
kristina-solovyova wants to merge 1 commit intomainfrom
03-25-fix_propagate_numcores_and_extracores_from_wekacluster_to_wekacontainers_per_role

Conversation

@kristina-solovyova
Copy link
Collaborator

No description provided.

@github-actions
Copy link

github-actions bot commented Mar 25, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Snapshot Warnings

⚠️: No snapshots were found for the head SHA 97c6000.
Ensure that dependencies are being submitted on PR branches and consider enabling retry-on-snapshot-warnings. See the documentation for more information and troubleshooting advice.

Scanned Files

None

Copy link
Collaborator Author

kristina-solovyova commented Mar 25, 2026


How to use the Graphite Merge Queue

Add the label main-merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has required the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@kristina-solovyova kristina-solovyova changed the base branch from main to graphite-base/2395 March 25, 2026 12:32
@kristina-solovyova kristina-solovyova force-pushed the 03-25-fix_propagate_numcores_and_extracores_from_wekacluster_to_wekacontainers_per_role branch from abce97b to 50df2fb Compare March 25, 2026 12:32
@kristina-solovyova kristina-solovyova changed the base branch from graphite-base/2395 to 01-21-feat_add_smbw_support March 25, 2026 12:32
@kristina-solovyova kristina-solovyova force-pushed the 03-25-fix_propagate_numcores_and_extracores_from_wekacluster_to_wekacontainers_per_role branch from 50df2fb to 10dbe9f Compare March 25, 2026 12:45
@assafgi assafgi changed the base branch from 01-21-feat_add_smbw_support to graphite-base/2395 March 25, 2026 12:45
@weka weka deleted a comment from claude bot Mar 25, 2026
@weka weka deleted a comment from claude bot Mar 25, 2026
@kristina-solovyova kristina-solovyova force-pushed the 03-25-fix_propagate_numcores_and_extracores_from_wekacluster_to_wekacontainers_per_role branch from 10dbe9f to a5f51e7 Compare March 25, 2026 12:53
@kristina-solovyova kristina-solovyova changed the base branch from graphite-base/2395 to 01-21-feat_add_smbw_support March 25, 2026 12:53
@kristina-solovyova kristina-solovyova marked this pull request as ready for review March 25, 2026 12:53
@kristina-solovyova kristina-solovyova requested a review from a team as a code owner March 25, 2026 12:53
@claude
Copy link

claude bot commented Mar 25, 2026

Claude finished @kristina-solovyova's task in 1m 35s —— View job


Code Review

  • Gather context (parent PR diff, changed files)
  • Review the changed file
  • Post review feedback

Summary

This PR does two things: (1) fixes a real bug where NumCores was never propagated from WekaCluster to WekaContainer specs, and (2) refactors the repeated switch role statements into a unified forRole() helper. The changes are clean and well-motivated.


Observations

Bug fix: NumCores propagation (funcs_upgrade.go:399)

NumCores was simply absent from HandleSpecUpdates. The fix is correct — GetNonZeroOrDefault preserves existing container values when the cluster spec has 0 (unset), which is the right semantic.

container.Spec.NumCores = util.GetNonZeroOrDefault(rv.NumCores, container.Spec.NumCores)

One implicit design decision worth documenting: once NumCores is set on a container, it can never be reduced to 0 via the cluster spec (since 0 is treated as "not set"). This appears intentional but is worth a comment if not already covered elsewhere.


Overrides refactor (funcs_upgrade.go:296–346)

The old pattern fetched overrides multiple times and conditionally wrote back. The new pattern fetches once, mutates, and unconditionally writes back at the end:

overrides := container.Spec.GetOverrides()
// ... modify fields ...
container.Spec.Overrides = overrides  // always assigned

This relies on client.MergeFrom to suppress no-op API calls, which is correct and idiomatic. No functional regression.

One subtle behavior change: UpgradePreventEviction on compute containers is now always written (even when value didn't change). This is fine since the patch mechanism handles it, but the old code's explicit guards made the intent clearer.


forRole() helper (funcs_upgrade.go:45–61)

Clean consolidation. The fallthrough to return containerRoleSpec{} for unknown roles is correct — unknown roles will get zero values and GetNonZeroOrDefault will preserve existing container fields.


dynamicTemplate nil-guard removal (funcs_upgrade.go:127–144)

The old code had a manual nil check:

// old
if spec.Dynamic != nil {
    computeExtraCores = spec.Dynamic.ComputeExtraCores
}

The new code reads from dynamicTemplate, which is always non-nil (falls back to the default template from GetWekaClusterTemplate). When spec.Dynamic == nil, the default template returns 0 for ExtraCores fields, so behavior is preserved.


Potential issue: SmbwHugepages default uses tmpl.Cores.Smbw (funcs_upgrade.go:224)

SmbwHugepages: util.GetNonZeroOrDefault(dynamicTemplate.SmbwFrontendHugepages, 1400*tmpl.Cores.Smbw),

Note tmpl (the default template from GetWekaClusterTemplate(spec.Dynamic)) is used rather than dynamicTemplate for the Smbw core count. This is intentional — you want the core count from the resolved template, not the raw config — but it's subtle. If spec.Dynamic specifies SmbwCores but SmbwFrontendHugepages is unset, the default is computed from the resolved core count, which is correct. No issue here, just worth noting.


Verdict

The refactoring is solid — no regressions visible, the NumCores propagation fix is the key value, and Smbw support is properly wired in throughout. The PR is ready to merge after the stack is resolved.

@graphite-app graphite-app bot requested review from assafgi and tigrawap March 25, 2026 12:54
@graphite-app
Copy link

graphite-app bot commented Mar 25, 2026

Graphite Automations

"Add anton/matt/sergey/kristina as reviwers on operator PRs" took an action on this PR • (03/25/26)

2 reviewers were added to this PR based on Anton Bykov's automation.

@kristina-solovyova kristina-solovyova force-pushed the 03-25-fix_propagate_numcores_and_extracores_from_wekacluster_to_wekacontainers_per_role branch from a5f51e7 to 716e161 Compare March 25, 2026 12:55
@claude
Copy link

claude bot commented Mar 25, 2026

Claude finished @kristina-solovyova's task in 31s —— View job


No changes since the last review — the PR still contains the same single commit (716e1611). See the previous review above for full feedback.

@kristina-solovyova kristina-solovyova changed the title fix: propagate numCores and extraCores from wekacluster to wekacontainers per role fix: propagate numCores and extraCores from wekacluster to wekacontainers per role (OP-260) Mar 25, 2026
@kristina-solovyova kristina-solovyova force-pushed the 03-25-fix_propagate_numcores_and_extracores_from_wekacluster_to_wekacontainers_per_role branch from 716e161 to 97c6000 Compare March 25, 2026 13:44
@claude
Copy link

claude bot commented Mar 25, 2026

Claude finished @kristina-solovyova's task in 1m 39s —— View job


  • Check for new changes since last review
  • Review changes
  • Post review feedback

Code Review (updated commit 97c60007)

The commit was amended since the last review. Key changes from the prior version:


New: explicit NumCores increase-only semantics (funcs_upgrade.go:399–402)

// NumCores can only be increased — decreasing requires manual intervention (container restart/reconfiguration).
if rv.NumCores > container.Spec.NumCores {
    container.Spec.NumCores = rv.NumCores
}

The previous version used GetNonZeroOrDefault (which would silently preserve existing values when cluster spec has 0). This version uses a > guard, which is more explicit and correct: it documents intent, allows reducing to a positive but lower value by hand, and still won't trigger on unset (0) values in the cluster spec. Good improvement.


New: ExtraCores vs NumCores semantic comment (funcs_upgrade.go:394–395)

// ExtraCores allows zero (explicit removal), so uses equality guard.
// NumCores and Hugepages treat zero as "not set" — only update when non-zero.

This directly addresses the implicit design decision flagged in the previous review. No changes needed.


New: Smbw fully wired throughout

SmbwCores, SmbwExtraCores, SmbwHugepages, SmbwHugepagesOffset are now added to UpdatableClusterSpec, populated from dynamicTemplate in NewUpdatableClusterSpec, and included in forRole(). The SmbwHugepages default (1400 * tmpl.Cores.Smbw) uses the resolved template core count, which is correct.

One edge case: if tmpl.Cores.Smbw == 0 (no Smbw cores configured), the default hugepages will be 0. GetNonZeroOrDefault will then fall back to the container's existing value, so no regression — but it's worth confirming this is intended when Smbw is not configured.


DriversBuildId comparison simplified (funcs_upgrade.go:410–412)

oldId, newId := container.Spec.DriversBuildId, updatableSpec.DriversBuildId
if (oldId == nil) != (newId == nil) || (oldId != nil && *oldId != *newId) {

Cleaner than the previous triple-condition. Logically equivalent.


Verdict

All issues from the previous review are addressed. The Smbw propagation is complete, NumCores semantics are correctly documented and implemented, and the refactoring is clean. Ready to merge once the stack is resolved.

Base automatically changed from 01-21-feat_add_smbw_support to main March 25, 2026 17:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant