EAI-6030: Add per-hardware-family AIM model source selection #741
Open
pre wants to merge 2 commits into
Convert sources/aim-cluster-model-source from an ArgoCD directory app into a Helm chart that renders either the legacy generic model sources (default, when hardwareFamilies is empty) or per-hardware-family AIMClusterModelSource resources (cpu, epyc, instinct, radeon). The legacy branch reproduces the existing amd-aim-release-* resources unchanged so ArgoCD does not prune or recreate existing installs. The app's hardwareFamilies value is supplied as a structured YAML list via valuesObject (cluster-bloom injects the selected families at deploy time), so no comma parsing is involved on any hop. The base default is an empty list, preserving legacy behavior. Part of EAI-6030.
6 tasks
For medium/large clusters ArgoCD reads cluster-values/values.yaml, which the gitea-init-job rebuilds from a template rather than copying the seeded complete_values.yaml wholesale. Add an aimHardwareFamily value and emit the apps.aim-cluster-model-source.valuesObject.hardwareFamilies block into cluster-values when set, mirroring the existing airmImageRepository handling. Without it the chart fell back to the legacy install-all branch on medium/large. Part of EAI-6030.
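As a rough illustration of the block this commit describes, the rebuilt cluster-values/values.yaml might end up containing something like the fragment below. The nesting is inferred from the dotted path `apps.aim-cluster-model-source.valuesObject.hardwareFamilies`; the family `instinct` is only an example value, and how `aimHardwareFamily` is expanded into list entries is an assumption, not taken from the template.

```yaml
# Sketch only: structure inferred from the dotted path in the commit message.
# The rest of cluster-values/values.yaml is omitted.
apps:
  aim-cluster-model-source:
    valuesObject:
      # Structured YAML list, so no comma-separated string has to be parsed.
      hardwareFamilies:
        - instinct   # example value; assumed to come from aimHardwareFamily
```

When `aimHardwareFamily` is unset the block would be omitted entirely, leaving the chart on its empty-list default and therefore on the legacy branch.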
This was referenced Jun 11, 2026
Related
Summary
Converts `sources/aim-cluster-model-source` from an ArgoCD directory app into a Helm chart that installs either the legacy generic model sources (default) or per-hardware-family `AIMClusterModelSource` resources, selected by `hardwareFamilies`.

- Legacy (default): when `hardwareFamilies` is empty, renders the existing `amd-aim-release-*` sources (0.8.5, 0.9.0, 0.10.0, 0.11.0) unchanged, so ArgoCD does not prune/recreate existing installs.
- Per-family: otherwise, renders `AIMClusterModelSource` resources for the selected families (`cpu`, `epyc`, `instinct`, `radeon`).
- `hardwareFamilies` is supplied as a structured YAML list via `valuesObject` (cluster-bloom injects it at deploy time, see the companion cluster-bloom PR). No comma-as-list-separator pitfall on any hop; `cluster-apps.yaml` is untouched.

Pairs with cluster-bloom PR for the `AIM_HARDWARE_FAMILY` install flag. Part of EAI-6030. This covers the AIM-catalog portion only; the ROCm 7.13 / GPU Operator profile defaults are separate work.

Design notes
- `instinct`/`radeon` are GPU families; `cpu`/`epyc` are CPU inference targets, hence "hardware family" rather than "GPU family".
- `cpu` and `radeon` are pinned as placeholders on `ghcr.io` (which requires `ghcr-regcred`, not provisioned). This is accepted: their pulls fail until a docker.io release exists.

Test plan
- `helm lint sources/aim-cluster-model-source/`
- `hardwareFamilies=[instinct]` → instinct only
- `hardwareFamilies=[epyc,instinct]` → both, nothing else
- `values:` (default `[]` = legacy)
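For reviewers who want to reproduce the test-plan cases, the chart-level values for each case could be written roughly as below. This is a sketch: only the `hardwareFamilies` key and the family names come from this PR, while splitting the cases into separate YAML documents is purely for illustration.

```yaml
# Case 1: default. An empty list keeps the legacy amd-aim-release-* sources.
hardwareFamilies: []
---
# Case 2: single family. Expected to render instinct sources only.
hardwareFamilies:
  - instinct
---
# Case 3: two families. Expected to render both, and nothing else.
hardwareFamilies:
  - epyc
  - instinct
```

Each document would be passed to the chart as its own values override; the exact invocation is left to whatever rendering command the test plan already uses.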