EAI-6030: Add AIM_HARDWARE_FAMILY flag for model source selection #256

Open
pre wants to merge 3 commits into main from EAI-6030-aim-gpu-family


pre commented Jun 10, 2026

Contributor

Related

Summary

Adds the AIM_HARDWARE_FAMILY install flag that selects which AIM model sources cluster-forge installs, by hardware family.

  • Accepts a comma-separated list of families: cpu, epyc, instinct, radeon. An empty value (the default) installs the full legacy catalog, so existing installs see no behavior change.
  • New hardwareFamilyList schema type validates the value via the existing pattern/examples harness (no new Go validator).
  • cluster-bloom splits the comma-string into a YAML list and injects it into cluster-forge's apps.aim-cluster-model-source.valuesObject.hardwareFamilies:
    • small clusters: --set-json on the ArgoCD render (argocd_deploy.yaml)
    • medium/large: yq into the gitea-persisted values (bootstrap_gitea.yaml)
  • Passing a structured list avoids Helm's comma-as-list-separator pitfall entirely, so no escaping is needed at the bloom layer.
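
For reviewers who want to sanity-check the split-and-inject behavior described above, here is a minimal Go sketch of the same transformation. It is not the code in this PR (the actual split happens in the Ansible filter chain); the function names `splitFamilies` and `setJSONArg` are hypothetical, and the sketch only assumes the key path quoted in the summary.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// splitFamilies is a hypothetical helper: it turns "epyc, instinct" into
// a trimmed list. Empty input yields an empty, non-nil slice so that it
// marshals to [] rather than null.
func splitFamilies(raw string) []string {
	out := []string{}
	for _, part := range strings.Split(raw, ",") {
		if trimmed := strings.TrimSpace(part); trimmed != "" {
			out = append(out, trimmed)
		}
	}
	return out
}

// setJSONArg is a hypothetical helper that builds the value for a Helm
// --set-json flag. Because the list is JSON-encoded, the commas are part
// of the array syntax and need no escaping.
func setJSONArg(raw string) (string, error) {
	encoded, err := json.Marshal(splitFamilies(raw))
	if err != nil {
		return "", err
	}
	return "apps.aim-cluster-model-source.valuesObject.hardwareFamilies=" + string(encoded), nil
}

func main() {
	for _, raw := range []string{"", "epyc", " epyc, instinct "} {
		arg, err := setJSONArg(raw)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%q -> --set-json '%s'\n", raw, arg)
	}
}
```

The empty case matters: emitting `[]` keeps the chart on its legacy install-all path, which is the default behavior the summary promises.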

Pairs with cluster-forge PR #741 (the chart side). Part of EAI-6030.

Notes

  • The plan originally targeted create_cluster_forge_app.yaml, but that task renders only the parent app-of-apps (--show-only templates/cluster-forge.yaml) and cannot carry child-app values. The active small-path applier is argocd_deploy.yaml, so injection lives there instead.
  • Bumped the LoadSchema field-count assertion to 38 (it was already stale before this change).

Test plan

  • go build ./...
  • go vet ./pkg/config/...
  • go test ./pkg/config/... (new TestHardwareFamilyListPattern + validator auto-coverage of valid/invalid examples)
  • Filter chain produces [] for empty input and correct JSON arrays otherwise (including whitespace trimming)
  • yq list write survives the size-merge step
  • Reviewer: end-to-end bloom run on a cluster (ansible/helm not runnable in this env)
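
As a rough illustration of what a pattern-based check for this flag could look like, here is a small Go sketch. The regular expression is an assumption written for this comment, not the pattern shipped in the hardwareFamilyList schema type, and `validFamilyList` is a hypothetical name; whether the real pattern tolerates whitespace around commas is also assumed here.

```go
package main

import (
	"fmt"
	"regexp"
)

// hardwareFamilyList is an assumed pattern: either an empty value, or one
// or more known families separated by commas, with optional whitespace.
var hardwareFamilyList = regexp.MustCompile(
	`^\s*$|^\s*(cpu|epyc|instinct|radeon)(\s*,\s*(cpu|epyc|instinct|radeon))*\s*$`)

// validFamilyList reports whether v matches the assumed pattern.
func validFamilyList(v string) bool {
	return hardwareFamilyList.MatchString(v)
}

func main() {
	valid := []string{"", "cpu", "epyc,instinct", "epyc, radeon"}
	invalid := []string{"gpu", "epyc,", ",epyc", "epyc;instinct", "EPYC"}

	// Mirror the valid/invalid examples style of check: every valid
	// example must match and every invalid example must not.
	for _, v := range valid {
		fmt.Printf("%-16q valid=%v\n", v, validFamilyList(v))
	}
	for _, v := range invalid {
		fmt.Printf("%-16q valid=%v\n", v, validFamilyList(v))
	}
}
```

Keeping the rule as a pattern plus examples means the accepted families can be reviewed in one place, which is the appeal of reusing the existing harness instead of adding a dedicated validator.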

Add the AIM_HARDWARE_FAMILY install flag (comma-separated:
cpu,epyc,instinct,radeon; empty = legacy install-all). A new
hardwareFamilyList schema type validates the value via the existing
pattern/examples harness.

cluster-bloom splits the flag into a YAML list and injects it into
cluster-forge's apps.aim-cluster-model-source.valuesObject.hardwareFamilies:
via --set-json on the small-cluster ArgoCD render (argocd_deploy.yaml) and
via yq into the gitea-persisted values for medium/large (bootstrap_gitea.yaml).
Passing a structured list avoids Helm's comma-as-list-separator pitfall
entirely, so no escaping is needed.

Bump the LoadSchema field-count assertion (was already stale) to 38.

Part of EAI-6030.
pre added 2 commits June 10, 2026 14:48
The gitea-persistence yq task interpolated the JSON array inside a
double-quoted yq expression, so the shell stripped the inner quotes and
yq received a bare [epyc, instinct] it could not lex. Single-quote the
expression so the JSON string quotes survive to yq.

Part of EAI-6030.

The previous yq edit to complete_values.yaml was dropped on medium/large:
the gitea-init-job rebuilds cluster-values from a template and carries
over only enabledApps plus a fixed global block, so apps.*.valuesObject overrides
never reached ArgoCD and the legacy branch rendered. Pass AIM_HARDWARE_FAMILY
to the init job via a values file (a companion cluster-forge change writes it
into cluster-values), matching the disabledApps/airmImageRepository pattern.
Small clusters are unaffected (they use --set-json on the direct root render).

Part of EAI-6030.
pre marked this pull request as ready for review June 10, 2026 15:34
pre requested a review from a team as a code owner June 10, 2026 15:34
pre requested review from brownzebra and oskarasbrink June 10, 2026 15:35
pre force-pushed the EAI-6030-aim-gpu-family branch from 342f410 to 1aeab75 June 11, 2026 08:37
pre requested a review from blankdots June 12, 2026 15:25