
EAI-6866 Skip ClusterForge when CLUSTERFORGE_RELEASE is none #258

Open
pre wants to merge 2 commits into EAI-6866-small-cluster-argocd-double-install from EAI-6866-skip-clusterforge-when-none

@pre pre commented Jun 10, 2026


Related:

Stacked on #257. Review/merge #257 first; this PR's base is the #257 branch, and its diff will collapse to just this change once #257 merges.

Summary

CLUSTERFORGE_RELEASE: none (or "") is meant to install ArgoCD only and skip ClusterForge entirely. The legacy "Deploy ClusterForge via ArgoCD (Small Clusters)" task (argocd_deploy.yaml) contradicted that: even for none, it cloned cluster-forge and applied the full app-of-apps with helm template ... | kubectl apply, including the self-managing argocd Application.

That self-managing app re-templated ArgoCD via the argo-helm chart (sources/argocd/8.3.5, redis 7.2.8) on top of the already-installed ArgoCD core (core-install.yaml, redis 7.0.15). The chart's redis secret-init init container runs as the default ServiceAccount, which lacks secrets:create, so the new redis pod crash-looped (Init:CrashLoopBackOff) and left two stuck argocd-redis ReplicaSets.
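To confirm the missing permission on a live cluster, something like the following sketch could be used. It only wraps `kubectl auth can-i` with impersonation of the default ServiceAccount. The `argocd` namespace and both helper names are assumptions for illustration and are not taken from this repo.

```python
import subprocess
from typing import List


def can_i_create_secrets_cmd(namespace: str, service_account: str = "default") -> List[str]:
    """Build a `kubectl auth can-i` command that impersonates a ServiceAccount."""
    return [
        "kubectl", "auth", "can-i", "create", "secrets",
        "--as", f"system:serviceaccount:{namespace}:{service_account}",
        "-n", namespace,
    ]


def sa_can_create_secrets(namespace: str, service_account: str = "default") -> bool:
    """Run the check; kubectl prints "yes" or "no" on stdout."""
    result = subprocess.run(
        can_i_create_secrets_cmd(namespace, service_account),
        capture_output=True,
        text=True,
        check=False,  # kubectl exits non-zero when the answer is "no"
    )
    return result.stdout.strip() == "yes"


if __name__ == "__main__":
    # "argocd" is an assumed namespace; adjust to wherever ArgoCD is installed.
    print(" ".join(can_i_create_secrets_cmd("argocd")))
```

If the diagnosis above holds, `sa_can_create_secrets("argocd")` would be expected to return False on an affected cluster.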

Fix (Option B: none = ArgoCD only)

  • Remove the Deploy ClusterForge via ArgoCD (Small Clusters) task and the now-unreferenced argocd_deploy.yaml. It was small-only (never ran for medium/large) and its job (deploying ClusterForge) is invalid in the "skip ClusterForge" case.
  • The none/"" path now installs only ArgoCD core via deploy_k8s_apps/argocd.yaml, with no ClusterForge apps and a single redis.
  • Simplify the Parse ClusterForge Version gate, which no longer needs the small + INSTALL_ARGOCD branch (only argocd_deploy.yaml consumed clusterforge_version in that case; argocd.yaml uses ARGOCD_VERSION).

Medium/large and small+release are unaffected: they deploy ClusterForge through Setup ClusterForge -> clusterforge_setup.yaml -> bootstrap_argocd.yaml (the unified, size-aware path), which this PR does not touch.
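For reviewers, the intended gating as described in this summary can be sketched in a few lines of Python. This is a model of the decision only, not the actual Ansible `when` clauses: the function names are hypothetical, exact string matching on "none" and "" is an assumption, and any additional conditions in the real playbook (for example INSTALL_ARGOCD) are not modeled.

```python
from typing import List

# Values the description treats as "skip ClusterForge" (exact match is assumed).
SKIP_VALUES = {"none", ""}


def wants_clusterforge(release: str) -> bool:
    """True when CLUSTERFORGE_RELEASE names a real release."""
    return release not in SKIP_VALUES


def planned_tasks(release: str) -> List[str]:
    """Task names expected to run for a given CLUSTERFORGE_RELEASE value."""
    if wants_clusterforge(release):
        # Unified, size-aware path:
        # Setup ClusterForge -> clusterforge_setup.yaml -> bootstrap_argocd.yaml
        return ["Parse ClusterForge Version", "Setup ClusterForge"]
    # none/"" path: ArgoCD core only, via deploy_k8s_apps/argocd.yaml
    return ["Setup ArgoCD Core"]


if __name__ == "__main__":
    for value in ("none", "", "main"):
        print(repr(value), "->", planned_tasks(value))
```

The point of the sketch is that the two paths are mutually exclusive, so no value of CLUSTERFORGE_RELEASE should lead to two ArgoCD installs.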

Test plan

  • CLUSTER_SIZE: small + CLUSTERFORGE_RELEASE: none on a wiped single node: bloom completes with failed=0.
  • Single argocd-redis ReplicaSet, no Init:CrashLoopBackOff; ArgoCD core pods (app-controller, applicationset, redis, repo-server) Running.
  • Zero ArgoCD Applications created (default AppProject only); Setup ClusterForge skipped.
  • (Covered by EAI-6866 Prevent small-cluster ArgoCD double-install #257) CLUSTER_SIZE: small + CLUSTERFORGE_RELEASE: main still deploys ClusterForge via the unified path with a single ArgoCD install.
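The none-path checks above could be automated roughly as follows. This is a sketch under assumptions: the inputs are the parsed JSON from `kubectl get replicasets`, `kubectl get pods` and `kubectl get applications.argoproj.io` with `-o json` in the ArgoCD namespace, `check_none_path` is a hypothetical helper, and matching ReplicaSets by the `argocd-redis` name prefix is a simplification.

```python
from typing import Any, Dict, List


def check_none_path(
    replicasets: Dict[str, Any],
    pods: Dict[str, Any],
    applications: Dict[str, Any],
) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems: List[str] = []

    # Expect exactly one argocd-redis ReplicaSet.
    redis_sets = [
        rs for rs in replicasets.get("items", [])
        if rs["metadata"]["name"].startswith("argocd-redis")
    ]
    if len(redis_sets) != 1:
        problems.append(f"expected 1 argocd-redis ReplicaSet, found {len(redis_sets)}")

    # kubectl shows Init:CrashLoopBackOff when an init container is waiting
    # with reason CrashLoopBackOff.
    for pod in pods.get("items", []):
        for status in pod.get("status", {}).get("initContainerStatuses", []):
            waiting = status.get("state", {}).get("waiting") or {}
            if waiting.get("reason") == "CrashLoopBackOff":
                problems.append(
                    f"pod {pod['metadata']['name']}: init container "
                    f"{status.get('name')} is in CrashLoopBackOff"
                )

    # Expect zero ArgoCD Applications.
    app_count = len(applications.get("items", []))
    if app_count:
        problems.append(f"expected 0 Applications, found {app_count}")

    return problems
```

Feeding it the three `-o json` outputs after a bloom run and asserting on an empty result would cover the second and third bullets; pod Running status is not checked here.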

CLUSTERFORGE_RELEASE "none"/"" is meant to install ArgoCD only and skip
ClusterForge entirely. The legacy "Deploy ClusterForge via ArgoCD (Small
Clusters)" task contradicted that: it cloned cluster-forge and applied the
full app-of-apps, including the self-managing argocd Application, which
re-templated redis via the argo-helm chart on top of the already-installed
ArgoCD core. The chart's redis secret-init init container runs as the
default ServiceAccount, which lacks secrets:create, so the new redis pod
crash-looped and left two argocd-redis ReplicaSets stuck.

Remove that task (and the now-unreferenced argocd_deploy.yaml) so the none
path installs only ArgoCD core via deploy_k8s_apps/argocd.yaml, with no
ClusterForge apps and a single redis. Simplify the Parse ClusterForge
Version gate, which no longer needs the small+argocd branch.

CLUSTERFORGE_RELEASE "none"/"" is meant to bring up the bare cluster with
no ClusterForge stack. The standalone "Setup ArgoCD Core" task in
deploy_k8s_apps/main.yaml contradicted that: its when-clause fired
precisely on CLUSTERFORGE_RELEASE in ["none", ""], so the none path still
installed ArgoCD.

Remove that task and the now-unreferenced argocd.yaml. ArgoCD is only ever
bootstrapped as part of ClusterForge (clusterforge_setup.yaml), which is
size-aware and runs only when a real release is set, so the want-ClusterForge
case is unaffected. The INSTALL_ARGOCD and ARGOCD_VERSION keys only fed the
removed path, so drop them from the schema, playbook defaults, qemu test and
docs. Fix the stale schema field-count assertion.
@pre pre marked this pull request as ready for review June 11, 2026 10:04
@pre pre requested a review from a team as a code owner June 11, 2026 10:04
@pre pre requested a review from blankdots June 12, 2026 15:19