This document explains how GitOps works in this platform using Flux CD.
GitOps is a way of managing infrastructure and applications where:
- Git is the source of truth: Everything is declared in Git
- Automated synchronization: Controllers continuously reconcile the cluster to match Git
- Declarative: Describe desired state, not imperative steps
- Auditable: Git history provides complete audit trail
- Recoverable: Disaster recovery starts by pointing Flux at Git, which re-applies everything declared there
We chose Flux over other GitOps tools because:
- Kubernetes-native: Uses CRDs and controllers, not external services
- Security: GitHub App authentication, no long-lived tokens
- Dependency management: Built-in support for resource dependencies
- Health checking: Waits for resources to be ready before proceeding
- CNCF project: Graduated CNCF project with a strong community, production-proven
Related: Technology Choices - Flux for GitOps
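Every Kustomization in this document reads its manifests from a GitRepository source named `flux-system`. The following is a minimal sketch of such a source. The repository URL comes from the bootstrap command in this document. The 1-minute interval and the `main` branch are assumptions and may not match the live cluster.

```yaml
# Sketch: the Git source referenced by every Kustomization's sourceRef.
# URL from the bootstrap example; interval and branch are assumptions.
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: flux-system
  namespace: flux-system
spec:
  interval: 1m        # how often source-controller polls Git for new commits
  url: https://github.com/Smana/cloud-native-ref
  ref:
    branch: main      # branch Flux tracks
  # A private repository would also need spec.secretRef (e.g. GitHub App credentials)
```

source-controller packages each new commit as an artifact. The Kustomizations then build and apply the `path` each one points at.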
Flux deploys resources in a specific order based on dependencies. This ensures that foundational components are ready before dependent resources are created.
```mermaid
graph TD;
    Namespaces-->CRDs;
    CRDs-->Crossplane;
    Crossplane-->EPIs["EKS Pod Identities"];
    EPIs["EKS Pod Identities"]-->Security;
    EPIs["EKS Pod Identities"]-->Infrastructure;
    EPIs["EKS Pod Identities"]-->Observability;
    Observability-->Apps["Other apps"];
    Infrastructure-->Apps["Other apps"];
    Security-->Infrastructure;
    Security-->Observability
```
Layer 1: Namespaces
Why first? Most other resources are namespaced, so their namespace must exist before they can be created.
```yaml
# clusters/mycluster-0/namespaces.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: namespaces
  namespace: flux-system
spec:
  interval: 10m
  path: ./infrastructure/base/namespaces
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
```

Layer 2: CRDs
Why second? CRDs must exist before any custom resources can be created.
```yaml
# clusters/mycluster-0/crds.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: crds
  namespace: flux-system
spec:
  interval: 10m
  path: ./crds/base
  prune: false # Never prune CRDs to avoid data loss
  sourceRef:
    kind: GitRepository
    name: flux-system
  dependsOn:
    - name: namespaces # Wait for namespaces
```

Note: `prune: false` stops Flux from deleting CRDs that are removed from Git. Deleting a CRD also deletes every custom resource of that type, so an accidental prune could wipe out live resources.
Layer 3: Crossplane
Why third? Crossplane controllers must be running before infrastructure resources can be created.
```yaml
# clusters/mycluster-0/crossplane.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: crossplane
  namespace: flux-system
spec:
  interval: 10m
  path: ./infrastructure/base/crossplane
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  dependsOn:
    - name: crds # CRDs must exist
  healthChecks:
    - apiVersion: apps/v1
      kind: Deployment
      name: crossplane
      namespace: crossplane-system
```

Layer 4: EKS Pod Identities (EPIs)
Why fourth? IAM roles must exist before applications can use AWS APIs.
```yaml
# clusters/mycluster-0/epis.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: epis
  namespace: flux-system
spec:
  interval: 10m
  path: ./security/base/epis
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  dependsOn:
    - name: crossplane # Crossplane must be ready
  healthChecks:
    - apiVersion: cloud.ogenki.io/v1alpha1
      kind: EKSPodIdentity
      name: '*'
      namespace: '*'
```

Health Check: Uses CEL (Common Expression Language) to ensure that all EPIs report the following condition:
```yaml
# Custom condition checking
- type: Ready
  status: "True"
```

Layer 5: Security
Why fifth? Infrastructure and applications need the security components (External Secrets, cert-manager, Kyverno).
```yaml
# clusters/mycluster-0/security.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: security
  namespace: flux-system
spec:
  interval: 10m
  path: ./security/mycluster-0
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  dependsOn:
    - name: epis # IAM roles for External Secrets
```

Key Components:
- External Secrets Operator: Syncs secrets from AWS Secrets Manager / OpenBao
- cert-manager: TLS certificate management
- Kyverno: Policy enforcement
- ZITADEL: Identity and access management
Layer 6: Infrastructure
Why sixth? Applications need core infrastructure (networking, DNS, storage).
```yaml
# clusters/mycluster-0/infrastructure.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infrastructure
  namespace: flux-system
spec:
  interval: 10m
  path: ./infrastructure/mycluster-0
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  dependsOn:
    - name: security # External Secrets needed
    - name: epis # IAM roles needed
```

Key Components:
- Cilium: Networking and network policies
- External DNS: Route53 synchronization
- Gateway API: Ingress routing through Gateway resources
- Karpenter: Node autoscaling
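Components such as Cilium are typically installed as Flux HelmReleases. The health-check and troubleshooting commands in this document refer to a HelmRelease named `cilium` in `kube-system`. The sketch below shows such a release and assumes the upstream chart repository at https://helm.cilium.io/. The version range, intervals, timeout and retry count are illustrative only, not this repository's values.

```yaml
# Sketch: chart source plus release for Cilium (values are illustrative).
apiVersion: source.toolkit.fluxcd.io/v1
kind: HelmRepository
metadata:
  name: cilium
  namespace: kube-system
spec:
  interval: 1h                # how often the chart index is refreshed
  url: https://helm.cilium.io/
---
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: cilium
  namespace: kube-system
spec:
  interval: 30m               # reconciliation interval
  timeout: 10m                # raise this if upgrades time out
  chart:
    spec:
      chart: cilium
      version: "1.15.x"       # semver range; hypothetical pin
      sourceRef:
        kind: HelmRepository
        name: cilium
  upgrade:
    remediation:
      retries: 3              # retry a failed upgrade before giving up
  values: {}                  # chart values go here
```

The HelmRelease is a child object of the `infrastructure` Kustomization. A health check on the HelmRelease therefore gates every layer that depends on `infrastructure`.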
Layer 7: Observability
Why seventh? The monitoring stack depends on infrastructure and security.
```yaml
# clusters/mycluster-0/observability.yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: observability
  namespace: flux-system
spec:
  interval: 10m
  path: ./observability/mycluster-0
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  dependsOn:
    - name: infrastructure # Gateway, DNS needed
    - name: security # Secrets, certificates needed
```

Key Components:
- VictoriaMetrics: Metrics collection and storage
- VictoriaLogs: Log aggregation
- Grafana Operator: Dashboards and datasources
Last layer: Applications
Why last? Apps depend on all platform services.
```yaml
# clusters/mycluster-0/apps.yaml (example)
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  interval: 10m
  path: ./apps/mycluster-0
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  dependsOn:
    - name: infrastructure # Networking needed
    - name: observability # Monitoring needed
```

Repository layout:

```
.
├── flux/                       # Flux operator and core components
│   ├── base/
│   │   ├── flux-operator/      # Flux installation
│   │   ├── flux-instance/      # Flux configuration
│   │   └── notifications/      # Alert providers
│   └── mycluster-0/
│       └── kustomization.yaml
│
├── clusters/                   # Cluster-specific Kustomizations
│   └── mycluster-0/
│       ├── namespaces.yaml     # Layer 1
│       ├── crds.yaml           # Layer 2
│       ├── crossplane.yaml     # Layer 3
│       ├── epis.yaml           # Layer 4
│       ├── security.yaml       # Layer 5
│       ├── infrastructure.yaml # Layer 6
│       ├── observability.yaml  # Layer 7
│       └── tooling.yaml        # Layer 8
│
├── infrastructure/             # Infrastructure resources
│   ├── base/                   # Base configurations
│   │   ├── namespaces/
│   │   ├── crossplane/
│   │   ├── cilium/
│   │   ├── gapi/               # Gateway API
│   │   └── external-dns/
│   └── mycluster-0/            # Cluster-specific overrides
│       └── kustomization.yaml
│
├── security/                   # Security resources
│   ├── base/
│   │   ├── epis/               # EKS Pod Identities
│   │   ├── external-secrets/
│   │   ├── cert-manager/
│   │   ├── kyverno/
│   │   └── zitadel/
│   └── mycluster-0/
│
├── observability/              # Observability resources
│   ├── base/
│   │   ├── victoriametrics/
│   │   ├── victorialogs/
│   │   └── grafana/
│   └── mycluster-0/
│
└── tooling/                    # Platform tools
    ├── base/
    │   ├── harbor/
    │   ├── headlamp/
    │   └── homepage/
    └── mycluster-0/
```
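Each Flux Kustomization under `clusters/mycluster-0/` points at a directory, and kustomize-controller builds that directory with Kustomize. A hypothetical `infrastructure/mycluster-0/kustomization.yaml` could assemble the base components from the tree above and apply a cluster-specific override, as in the sketch below. The resource list and the patch are assumptions, and the real file may differ. This is a plain Kustomize config file (`kustomize.config.k8s.io`), not a Flux `Kustomization` custom resource.

```yaml
# Hypothetical overlay: infrastructure/mycluster-0/kustomization.yaml
# Plain Kustomize config, built by kustomize-controller (not a Flux CR).
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  # namespaces/ and crossplane/ are applied by their own layers, so not listed here
  - ../base/cilium
  - ../base/gapi
  - ../base/external-dns
patches:
  # Example cluster-specific override (hypothetical): change the Cilium release interval
  - target:
      kind: HelmRelease
      name: cilium
    patch: |-
      - op: replace
        path: /spec/interval
        value: 10m
```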
Flux supports variable substitution from ConfigMaps and Secrets. This enables cluster-specific configuration without duplicating manifests.
Source: clusters/mycluster-0/cluster-vars-configmap.yaml
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: eks-mycluster-0-vars
  namespace: flux-system
data:
  cluster_name: mycluster-0
  cluster_region: eu-west-3
  domain: priv.cloud.ogenki.io
  vpc_cidr: 10.0.0.0/16
```

Usage in Kustomization:
```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: infrastructure
  namespace: flux-system
spec:
  # ... other fields ...
  postBuild:
    substitute:
      cluster_name: "${cluster_name}"
      domain: "${domain}"
    substituteFrom:
      - kind: ConfigMap
        name: eks-mycluster-0-vars
```

Usage in Manifests:
```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: platform-tailscale-general # Or platform-tailscale-admin
  namespace: infrastructure
spec:
  gatewayClassName: cilium-tailscale
  listeners:
    - name: https
      hostname: "*.${domain}" # Substituted to "*.priv.cloud.ogenki.io"
      port: 443
      protocol: HTTPS
      # tls: certificate settings omitted for brevity
```

Substituting from Secrets:
Use case: Sensitive values such as API keys, tokens, and passwords.
```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps
  namespace: flux-system
spec:
  # ... other fields ...
  postBuild:
    substituteFrom:
      - kind: Secret
        name: app-secrets
        optional: false # Fail if the Secret doesn't exist
```

Security Note: Never store these Secrets in Git in plain text. Encrypt them (e.g., with SOPS) or let External Secrets Operator create them.
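Flux post-build substitution also accepts bash-style defaults. A default keeps a manifest deployable when a variable is missing. A resource can also opt out of substitution entirely through an annotation. The sketch below shows both features. The ConfigMap names and the `log_level` variable are hypothetical.

```yaml
# Sketch: defaults and opt-out for Flux post-build substitution.
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-settings            # hypothetical
  namespace: apps
data:
  # Resolves to the value of log_level if defined, otherwise "info"
  LOG_LEVEL: "${log_level:=info}"
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: literal-template        # hypothetical
  namespace: apps
  annotations:
    # Flux skips substitution for this object, so ${domain} stays literal
    kustomize.toolkit.fluxcd.io/substitute: disabled
data:
  template: "https://app.${domain}"
```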
Flux can wait for specific resources to be healthy before proceeding with dependent Kustomizations.
```yaml
healthChecks:
  - apiVersion: apps/v1
    kind: Deployment
    name: crossplane
    namespace: crossplane-system
```

Checks:
- Deployment exists
- Desired replicas match ready replicas
- No failed pods
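Listing every resource by hand can get long. A Kustomization can instead set `spec.wait: true`, which makes kustomize-controller health-check every object it applies, so an explicit `healthChecks` list is not needed. The sketch below shows this setup. The name, path and durations are hypothetical.

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: example-app             # hypothetical
  namespace: flux-system
spec:
  interval: 10m
  path: ./apps/example          # hypothetical
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
  wait: true                    # health-check every applied resource
  timeout: 5m                   # mark the Kustomization not Ready after 5 minutes
  retryInterval: 2m             # retry sooner than interval after a failure
```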
```yaml
healthChecks:
  - apiVersion: helm.toolkit.fluxcd.io/v2
    kind: HelmRelease
    name: cilium
    namespace: kube-system
```

Checks:
- HelmRelease is in `Ready` state
- Underlying resources are healthy
- No rollback occurred
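Flux works out readiness for Deployments and its own objects by itself. Custom resources such as EKSPodIdentity may need explicit rules. Since Flux v2.5, a Kustomization can declare those rules as CEL expressions in `spec.healthCheckExprs`. The sketch below assumes Flux v2.5 or later and EPIs that publish a standard `Ready` condition. It is not necessarily how this repository wires its EPI checks.

```yaml
# Sketch: CEL-based readiness for a custom resource (requires Flux >= v2.5).
# Only the fields relevant to health checking are shown.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: epis
  namespace: flux-system
spec:
  # ... interval, path, sourceRef, dependsOn as in the EPIs layer ...
  wait: true   # health-check all applied objects, using the rule below for EPIs
  healthCheckExprs:
    - apiVersion: cloud.ogenki.io/v1alpha1
      kind: EKSPodIdentity
      # Healthy when the Ready condition reports True
      current: status.conditions.filter(e, e.type == 'Ready').all(e, e.status == 'True')
      # Failed when the Ready condition reports False
      failed: status.conditions.filter(e, e.type == 'Ready').all(e, e.status == 'False')
```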
```yaml
healthChecks:
  - apiVersion: cloud.ogenki.io/v1alpha1
    kind: EKSPodIdentity
    name: '*' # All EPIs in all namespaces
    namespace: '*'
```

Uses CEL (Common Expression Language) to evaluate custom conditions such as:
```yaml
# Reported in the status of each EPI
status:
  conditions:
    - type: Ready
      status: "True"
      reason: Available
```

Checking status:

```bash
# View all Flux resources
flux get all
# View specific types
flux get kustomizations
flux get helmreleases
flux get sources git
```

Forcing reconciliation:

```bash
# Reconcile a specific Kustomization
flux reconcile kustomization infrastructure
# Reconcile a HelmRelease
flux reconcile helmrelease cilium -n kube-system
# Reconcile a GitRepository
flux reconcile source git flux-system
```

Suspending and resuming:
Use case: Prevent Flux from making changes during maintenance.

```bash
# Suspend all Kustomizations
flux suspend kustomization --all
# Suspend a specific Kustomization
flux suspend kustomization apps
# Resume
flux resume kustomization --all
flux resume kustomization apps
```

Debugging a Kustomization:

```bash
# Get detailed status
flux get kustomization infrastructure
# View events
flux events --for Kustomization/infrastructure
# Check controller logs for this object
flux logs --kind=Kustomization --name=infrastructure
```

Inspecting dependencies:

```bash
# List the objects managed by a Kustomization (its inventory)
flux tree kustomization apps
# List the Kustomizations that "apps" waits for (its dependsOn)
kubectl get kustomization apps -n flux-system -o jsonpath='{.spec.dependsOn[*].name}'
# Dependency chain declared in the manifests above:
# apps -> infrastructure, observability
# observability -> infrastructure, security
# infrastructure -> security, epis
# security -> epis -> crossplane -> crds -> namespaces
```

Flux can send notifications on sync events, failures, and warnings.
```yaml
# flux/base/notifications/slack-alert.yaml
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: slack
  namespace: flux-system
spec:
  providerRef:
    name: slack
  eventSeverity: info
  eventSources:
    - kind: Kustomization
      name: '*'
    - kind: HelmRelease
      name: '*'
```

Provider:
```yaml
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Provider
metadata:
  name: slack
  namespace: flux-system
spec:
  type: slack
  channel: "#gitops-notifications"
  secretRef:
    name: slack-webhook-url # From External Secrets
```
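With `eventSeverity: info`, the `slack` Alert shown earlier also forwards routine successful reconciliations, which can be noisy. The sketch below adds a second Alert that forwards failures only and reuses the same `slack` Provider. The name `slack-errors` is hypothetical.

```yaml
# Sketch: forward only error events (name is hypothetical).
apiVersion: notification.toolkit.fluxcd.io/v1beta3
kind: Alert
metadata:
  name: slack-errors
  namespace: flux-system
spec:
  providerRef:
    name: slack               # same Provider, same channel
  eventSeverity: error        # drop info-level events
  eventSources:
    - kind: GitRepository     # catch Git fetch failures too
      name: '*'
    - kind: Kustomization
      name: '*'
    - kind: HelmRelease
      name: '*'
```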
Making a change:

1. Create Branch

   ```bash
   git checkout -b feature/update-app
   ```

2. Make Changes

   ```bash
   # Edit manifests
   vim infrastructure/base/cilium/helmrelease.yaml
   ```

3. Validate Locally

   ```bash
   # Run pre-commit hooks
   pre-commit run --all-files
   # Validate with kubeconform (CRDs such as HelmRelease have no built-in schema)
   kubeconform -ignore-missing-schemas -summary infrastructure/base/cilium/
   ```

4. Commit and Push

   ```bash
   git add .
   git commit -m "feat: update Cilium to v1.15"
   git push origin feature/update-app
   ```

5. Create Pull Request
   - CI validates changes
   - Reviewers approve
   - Merge to main

6. Flux Syncs Automatically

   Flux detects the new commit (default: within 1 minute) and reconciles the affected Kustomizations. Health checks then confirm that the deployment succeeded.
Git-based rollback (recommended):
```bash
# Revert the commit
git revert <commit-hash>
git push origin main
# Flux automatically applies the previous state
```

Manual rollback:
```bash
# Suspend to prevent Flux from re-applying
flux suspend kustomization apps
# Manually revert changes
kubectl rollout undo deployment/myapp -n apps
# Resume once Git reflects the desired state (resuming re-applies what Git declares)
flux resume kustomization apps
```

To bootstrap Flux on a new cluster:
```bash
# 1. Export GitHub token
export GITHUB_TOKEN=<your-token>
# 2. Bootstrap Flux
flux bootstrap github \
  --owner=Smana \
  --repository=cloud-native-ref \
  --branch=main \
  --path=clusters/new-cluster \
  --personal
# 3. Flux installs itself and starts reconciling
```

Alternative: Use the Flux Operator (the current approach in this repo):
```bash
# Install the Flux Operator
kubectl apply -f flux/base/flux-operator/
# Create the FluxInstance
kubectl apply -f flux/base/flux-instance/
```

GitHub App authentication:
Why? A GitHub App is more secure than a personal access token:
- Fine-grained repository access
- Can be scoped to specific repositories
- Easier to rotate and audit
Setup:
- Create a GitHub App (in the account or organization developer settings) and install it on the repository
- Store credentials in AWS Secrets Manager
- External Secrets syncs to Kubernetes Secret
- FluxInstance references Secret
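The last setup step can be sketched as a FluxInstance, the custom resource that the Flux Operator turns into a Flux installation. The field names follow the Flux Operator's `fluxcd.controlplane.io/v1` API as we understand it. The version, component list, sync path and the Secret name `flux-system` are assumptions. The repository's actual `flux/base/flux-instance/` manifests may differ.

```yaml
# Sketch of a FluxInstance (Flux Operator); values are assumptions.
apiVersion: fluxcd.controlplane.io/v1
kind: FluxInstance
metadata:
  name: flux
  namespace: flux-system
spec:
  distribution:
    version: "2.x"              # follow the latest Flux 2 minor release
    registry: ghcr.io/fluxcd
  components:
    - source-controller
    - kustomize-controller
    - helm-controller
    - notification-controller
  sync:
    kind: GitRepository
    url: https://github.com/Smana/cloud-native-ref
    ref: refs/heads/main
    path: clusters/mycluster-0
    # Secret synced by External Secrets (hypothetical name). How GitHub App
    # credentials are consumed depends on the Flux / Flux Operator version.
    pullSecret: flux-system
```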
Flux controllers run with least privilege:
- Can only reconcile resources in allowed namespaces
- Cannot modify cluster-wide resources (except CRDs)
- Service accounts scoped per controller
Never commit secrets to Git!
```yaml
# ❌ BAD: Secret in Git
apiVersion: v1
kind: Secret
metadata:
  name: api-key
data:
  key: bXktc2VjcmV0 # Base64 is not encryption!
---
# ✅ GOOD: External Secret
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: api-key
spec:
  secretStoreRef:
    name: aws-secrets-manager
  target:
    name: api-key
  data:
    - secretKey: key
      remoteRef:
        key: /apps/myapp/api-key
```

Kustomization not ready:

```bash
# Check status
flux get kustomization infrastructure
# Common issues:
# 1. Dependency not ready
#    → Check the dependsOn Kustomizations
# 2. Health check failing
#    → Investigate unhealthy resources
# 3. Validation error
#    → Check flux logs for YAML errors
```

HelmRelease failing:

```bash
# Check HelmRelease
flux get helmrelease cilium -n kube-system
# Common issues:
# 1. Values error
#    → Validate Helm values syntax
# 2. Chart not found
#    → Check HelmRepository sync
# 3. Upgrade timeout
#    → Increase .spec.timeout
```

Resource missing from the cluster:

```bash
# Check whether Flux created it
kubectl get <resource> -n <namespace>
# If not, inspect the Kustomization (the flux CLI has no describe command)
kubectl describe kustomization <name> -n flux-system
# Look for:
# - Prune settings (might delete unexpected resources)
# - Namespace issues (resource in wrong namespace)
# - RBAC (Flux lacks permissions)
```

Best practices:
- Use Dependencies: Ensure proper ordering with `dependsOn`
- Health Checks: Always check that critical resources are ready
- Immutable Infrastructure: Don't manually modify cluster resources
- Small Changes: Commit frequently, deploy incrementally
- Test in Dev: Validate changes in development cluster first
- Monitor Flux: Set up notifications for failures
- Document Decisions: Explain non-obvious dependencies in comments
Related:
- Technology Choices - Why Flux
- Crossplane - Infrastructure compositions
- CI Workflows - Validation before merge
- Flux Documentation