This repo manages cloud infrastructure deployed to AWS via Spacelift CI/CD using OpenTofu. While originally focused on EKS, it now also manages additional AWS services, including S3 and SES.
```
eks-stack/
├── main.tf                      # Root Spacelift administrative stack
├── provider.tf                  # Spacelift provider config
├── common-resources/            # Shared Spacelift policies, contexts, AWS integrations
├── deployments/
│   ├── main.tf                  # Wires Spacelift pipeline configs per environment (dev/staging/prod)
│   ├── spacelift/               # CI pipeline definitions (HOW things get deployed)
│   │   └── dpe-k8s/             # Pipeline config for EKS stacks
│   └── stacks/                  # Cloud resource definitions (WHAT gets deployed)
│       ├── dpe-k8s/             # VPC, EKS cluster, SES, S3 buckets
│       └── dpe-k8s-deployments/ # K8s-internal: ArgoCD, Airflow, monitoring, etc.
├── modules/                     # Reusable Terraform modules
├── docs/                        # Workshop materials
└── scripts/                     # Utility scripts
```
Before modifying modules, stacks, or configs:
- Spacelift access — Request access to the Spacelift UI from Lingling or Bryan
- AWS SSO — Configure SSO profiles for the target AWS accounts (see Connecting to an EKS Cluster)
- OpenTofu — Install OpenTofu for local development/testing
- kubectl (optional) — For Kubernetes cluster access, install kubectl
Note: Spacelift may not pick up changes to `deployments/main.tf` unless your branch is pointed at `main`.

When working on feature branches that target other feature branches (rather than `main`), Spacelift may not recognize changes to `deployments/main.tf` because it reads configuration from the targeted branch. This can cause variable resolution errors in which required variables appear undefined even though they are properly configured in your branch.
Deploying a resource involves a two-step process:

- `deployments/spacelift/<name>/` defines the CI pipeline in Spacelift. This controls which stacks can be deployed, to which AWS account, and with what environment variables. Think of this as the build server job configuration.
- `deployments/stacks/<name>/` defines what cloud resources are provisioned when that pipeline runs. Each stack is a Terraform root module that typically sources reusable modules from `modules/`. Think of this as the infrastructure blueprint.
When Spacelift runs a pipeline, it executes tofu plan/apply against the corresponding stack directory.
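As an illustrative sketch (not the repo's verbatim code), the pipeline side declares a `spacelift_stack` whose `project_root` points at the matching directory on the stack side. The resource name, repository name, and variables below are hypothetical:

```hcl
# Hypothetical sketch: the CI pipeline side (deployments/spacelift/<name>/main.tf).
# The spacelift_stack resource tells Spacelift WHERE the root module lives;
# the stack directory itself defines WHAT gets provisioned.
resource "spacelift_stack" "example" {
  name         = "example-stack"
  repository   = "eks-stack"                   # assumed repository name
  branch       = var.git_branch
  project_root = "deployments/stacks/example"  # hypothetical stack directory
  space_id     = var.parent_space_id

  terraform_workflow_tool = "OPEN_TOFU"
}
```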
```mermaid
graph TD
    ROOT["main.tf<br/><i>Root Administrative Stack</i>"] --> COMMON["module 'common'<br/>common-resources/"]
    ROOT --> REGISTRY["module 'terraform-registry'<br/>modules/"]
    ROOT --> DEPLOY["module 'deployments'<br/>deployments/main.tf"]
    COMMON ~~~ NOTE_C["Shared policies, contexts,<br/>and AWS integrations"]
    REGISTRY ~~~ NOTE_R["Registers reusable modules<br/>in Spacelift's module registry"]
    DEPLOY --> DEV["spacelift_space<br/><b>development</b>"]
    DEPLOY --> STG["spacelift_space<br/><b>staging</b>"]
    DEPLOY --> PROD["spacelift_space<br/><b>production</b>"]
    DEV --> DEV_MOD["module 'dpe-sandbox-spacelift-development'<br/>source = ./spacelift/dpe-k8s"]
    STG --> STG_MOD["module 'dpe-sandbox-spacelift-staging'<br/>source = ./spacelift/dpe-k8s"]
    PROD --> PROD_MOD["module 'dpe-sandbox-spacelift-production'<br/>source = ./spacelift/dpe-k8s"]
    DEV_MOD --> STACKS["deployments/stacks/dpe-k8s/<br/>deployments/stacks/dpe-k8s-deployments/"]
    STG_MOD --> STACKS
    PROD_MOD --> STACKS
    style ROOT fill:#4a90d9,color:#fff
    style STACKS fill:#2ecc71,color:#fff
    style NOTE_C fill:none,stroke:none
    style NOTE_R fill:none,stroke:none
```
| Scenario | Where to Add | Example |
|---|---|---|
| Kubernetes application/service that runs inside the EKS cluster | Add module to `deployments/stacks/dpe-k8s-deployments/` | ArgoCD, Airflow, monitoring tools |
| Core AWS infrastructure or EKS cluster configuration | Add module to `deployments/stacks/dpe-k8s/` | VPC, EKS addons, S3 buckets |
| Independent resource with its own deployment lifecycle | Create a new stack (follow "Adding a New Stack" below) | Standalone SES config, separate API Gateway |
Rule of thumb: If your resource needs to be deployed/destroyed independently from the EKS cluster, create a new stack. If it's a Kubernetes workload, add it to dpe-k8s-deployments. If it's AWS infrastructure that the cluster depends on, add it to dpe-k8s.
Follow these steps to add a new independently-deployed resource:
Create a new directory in modules/<your-module>/ with at minimum:
- `main.tf` - the cloud resources to create
- `variables.tf` - configurable inputs
- `outputs.tf` - values other stacks may need
- `versions.tf` - required providers
See the modules README for guidelines.
Create a new directory in deployments/stacks/<your-stack>/ with:
- `main.tf` - sources your module: `source = "../../../modules/<your-module>"`
- `variables.tf` - environment-specific inputs (passed as `TF_VAR_*` from Spacelift)
- `outputs.tf` - values to export
- `provider.tf` - AWS provider configuration
- `versions.tf` - required providers and OpenTofu version
Create a new directory in deployments/spacelift/<your-stack>/ with:
- `main.tf` - defines:
  - `spacelift_space` - a logical grouping in Spacelift
  - `spacelift_stack` - points `project_root` to your stack directory
  - `spacelift_environment_variable` - passes `TF_VAR_*` variables
  - `spacelift_aws_integration_attachment` - binds AWS credentials
- `variables.tf` - inputs from the parent module
- `outputs.tf` - stack IDs for reference
- `versions.tf` - Spacelift provider
Add a module block in deployments/main.tf that sources your new spacelift config and passes the required variables (parent_space_id, admin_stack_id, aws_integration_id, git_branch, etc.).
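A hedged sketch of that wiring in `deployments/main.tf` (the module name, directory, and exact inputs are hypothetical; follow the existing module blocks in that file for the real conventions):

```hcl
# Hypothetical module block in deployments/main.tf wiring up a new stack's
# Spacelift pipeline config. Input names mirror those listed above.
module "my-new-stack-spacelift" {
  source = "./spacelift/my-new-stack" # hypothetical directory name

  parent_space_id    = var.parent_space_id    # placeholder; may come from a spacelift_space
  admin_stack_id     = var.admin_stack_id
  aws_integration_id = var.aws_integration_id
  git_branch         = "main"
}
```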
Once merged to main, the root administrative stack detects the changes and creates the new Spacelift stacks automatically.
To see your newly created stacks and monitor deployments:
- Spacelift UI — Log into the Spacelift UI to view stacks, runs, and logs. As of February 2026, a common DPE team account has not been created — contact Lingling or Bryan for access.
- AWS Console — Log into the AWS Console via JumpCloud for the target account to view the deployed cloud resources directly.
Reusable Terraform modules in modules/:
AWS Infrastructure
- `sage-aws-vpc` - VPC with public/private subnets
- `sage-aws-eks` - EKS cluster provisioning
- `sage-aws-eks-addons` - Post-creation EKS addons (CoreDNS, EBS CSI, GuardDuty)
- `sage-aws-ses` - Simple Email Service setup
- `s3-bucket` - S3 bucket with optional public access and IRSA
Kubernetes Cluster Management
- `sage-aws-k8s-node-autoscaler` - Node autoscaling via Spot.io Ocean
- `cert-manager` - TLS certificate provisioning
- `envoy-gateway` - API Gateway with TLS termination
Application Deployment (K8s)
- `apache-airflow` - Workflow orchestration
- `argo-cd` - GitOps continuous delivery
- `flux-cd` - Alternative GitOps tool
- `postgres-cloud-native` - PostgreSQL instance via CloudNativePG
- `postgres-cloud-native-operator` - CloudNativePG operator
Monitoring and Security
- `victoria-metrics` - Prometheus-compatible metrics collection
- `trivy-operator` - Container security scanning
API and Messaging
- `aws-api-gateway` - API Gateway resources
- `aws-sqs` - Simple Queue Service
CI/CD
- `spacelift-private-worker` - Spacelift private worker setup
Demos
- `demo-network-policies` - Kubernetes network policy examples
- `demo-pod-level-security-groups-strict` - Pod-level security group examples
The following sections apply specifically to the EKS cluster stacks (dpe-k8s and dpe-k8s-deployments).
The VPC is created with the AWS VPC Terraform module. It contains a number of defaults for our use-case at Sage. See the module definition for details.
AWS EKS is a managed Kubernetes cluster. We provide configurable parameters to run workloads on top of it.
API access to the Kubernetes cluster endpoint is set to Public and private.
Meaning:

- **Public**: Allows connections via `kubectl` from outside the VPC. Access is secured using a combination of AWS IAM and native Kubernetes RBAC.
- **Private**: All communication between worker nodes and the API server stays within the VPC. You can limit the IP addresses that can access the API server from the internet, or completely disable internet access to it.
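For reference, the public-and-private mode corresponds to the following `aws_eks_cluster` settings in the Terraform AWS provider. This is a sketch, not the repo's actual module code; the name, role, subnets, and CIDR list are placeholders:

```hcl
# Sketch of EKS endpoint access settings (not the actual sage-aws-eks module code).
resource "aws_eks_cluster" "example" {
  name     = "dpe-k8"
  role_arn = var.cluster_role_arn # placeholder

  vpc_config {
    subnet_ids              = var.private_subnet_ids # placeholder
    endpoint_public_access  = true # kubectl access from outside the VPC
    endpoint_private_access = true # node <-> API server traffic stays in the VPC

    # Optionally restrict which IPs may reach the public endpoint.
    public_access_cidrs = ["0.0.0.0/0"]
  }
}
```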
The VPC CNI (Container Network Interface) plugin allocates VPC IP addresses to Kubernetes nodes and configures networking for Pods on each node.
Allows assigning EC2 security groups directly to pods running in EKS. This can be used as an alternative or in conjunction with Kubernetes network policies.
See modules/demo-pod-level-security-groups-strict for an example.
Controls network traffic within the cluster (e.g., pod-to-pod traffic).
See modules/demo-network-policies for an example.
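As a minimal illustration of the idea (expressed with the Terraform `kubernetes` provider rather than raw YAML; the namespace and labels are hypothetical, and the demo module is the authoritative example):

```hcl
# Hypothetical policy: only pods labeled app=frontend may reach app=backend pods.
resource "kubernetes_network_policy" "backend_ingress" {
  metadata {
    name      = "backend-allow-frontend"
    namespace = "demo" # placeholder namespace
  }

  spec {
    # Which pods the policy applies to.
    pod_selector {
      match_labels = {
        app = "backend"
      }
    }

    # Allow ingress only from frontend pods.
    ingress {
      from {
        pod_selector {
          match_labels = {
            app = "frontend"
          }
        }
      }
    }

    policy_types = ["Ingress"]
  }
}
```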
Further reading:
- https://docs.aws.amazon.com/eks/latest/userguide/cni-network-policy.html
- https://docs.aws.amazon.com/eks/latest/userguide/security-groups-for-pods.html
- https://aws.amazon.com/blogs/containers/introducing-security-groups-for-pods/
- https://kubernetes.io/docs/concepts/services-networking/network-policies/
We use Spot.io to manage EKS cluster nodes. It has scale-to-zero capabilities and dynamically adds or removes nodes based on demand. The autoscaler is provided as a Terraform module (sage-aws-k8s-node-autoscaler).
Spot.io setup (manual, per AWS account):
- Subscribe through the AWS Marketplace: https://aws.amazon.com/marketplace/saas/ordering?productId=bc241ac2-7b41-4fdd-89d1-6928ec6dae15
- "Set up your account" on the Spot.io website and link it to an existing organization
- Link the account through the AWS UI:
- Create a policy (see the JSON in the Spot.io UI)
- Create a role (see instructions in the Spot.io UI)
- Get an API token:
- Log into the Spot UI: https://console.spotinst.com/settings/v2/tokens/permanent
- Create a new Permanent token named `{AWS-Account-Name}-token`
- Copy the token and create an AWS Secrets Manager Plaintext secret named `spotinst_token` with description `Spot.io token`
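The secret-creation step above is described as manual; as an illustration only, the equivalent definition in Terraform would look roughly like this (the `spotinst_token` variable is hypothetical and would be supplied out-of-band):

```hcl
# Hypothetical Terraform equivalent of the manual Secrets Manager step above.
resource "aws_secretsmanager_secret" "spotinst_token" {
  name        = "spotinst_token"
  description = "Spot.io token"
}

resource "aws_secretsmanager_secret_version" "spotinst_token" {
  secret_id     = aws_secretsmanager_secret.spotinst_token.id
  secret_string = var.spotinst_token # supplied out-of-band; never commit the token
}
```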
To connect via kubectl, ensure you have SSO set up for the target account:
```shell
# Login with your SSO profile (e.g., dpe-prod-admin)
aws sso login --profile dpe-prod-admin

# Update kubeconfig to authenticate using the SSO profile and assume the eks_admin_role
aws eks update-kubeconfig --region us-east-1 --name dpe-k8 --profile dpe-prod-admin
```

AWS GuardDuty provides audit trails for the EKS cluster with two components:
Initial configuration is handled through the securitycentral IT account. Runtime Monitoring is installed via Terraform modules so it can be torn down with the VPC and EKS cluster.
We also use the trivy-operator for Kubernetes-native security scanning. As resources are deployed, Trivy generates vulnerability reports. policy-reporter provides a UI for reviewing results. SBOM (Software Bill of Materials) reports are used to track security advisories.
Deployment of applications to the Kubernetes cluster uses Terraform, Spacelift, and ArgoCD or FluxCD.
See the modules README for supplemental information.
- Create a new directory in
./modulesnamed after what you are deploying - At minimum define
main.tfandversions.tfwith the cloud resources and required providers - Add any additional files and resources as needed
ArgoCD is a declarative GitOps continuous delivery tool for Kubernetes. It continuously monitors Kubernetes resources to align expected state with actual state, with support for Helm charts and Kubernetes YAML files.
The declarative setup uses ArgoCD CRDs. We typically create an Application Specification and use Multiple Sources for an Application to install public Helm charts with custom values.yaml files. See the Apache Airflow module README for a real example.
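A hedged sketch of that pattern, expressed via the Terraform `kubernetes_manifest` resource: one source pulls a public Helm chart, a second source (with `ref`) supplies custom `values.yaml` files. All URLs, names, and namespaces below are placeholders; the Apache Airflow module holds the real spec:

```hcl
# Hypothetical ArgoCD Application using Multiple Sources: a public Helm chart
# plus a Git repo providing custom values.yaml. All URLs/names are placeholders.
resource "kubernetes_manifest" "example_app" {
  manifest = {
    apiVersion = "argoproj.io/v1alpha1"
    kind       = "Application"
    metadata = {
      name      = "example-app"
      namespace = "argocd"
    }
    spec = {
      project = "default"
      destination = {
        server    = "https://kubernetes.default.svc"
        namespace = "example"
      }
      sources = [
        {
          # Public Helm chart (placeholder repo/chart/version).
          repoURL        = "https://example.github.io/helm-charts"
          chart          = "example-chart"
          targetRevision = "1.0.0"
          helm = {
            # "$values" resolves to the source below via its ref.
            valueFiles = ["$values/example/values.yaml"]
          }
        },
        {
          # Git repo holding the custom values files (placeholder URL).
          repoURL        = "https://github.com/example/config-repo"
          targetRevision = "main"
          ref            = "values"
        }
      ]
    }
  }
}
```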
As of August 2024, access to resources on the Kubernetes cluster is through kubectl port-forward sessions. No internet-facing load balancers are available.
Using a tool like K9s, navigate to the pod and start a port-forward session, then open localhost at the specified port in your browser.
(Future work for better secrets management: https://sagebionetworks.jira.com/browse/IBCDPE-1038)
Most resources have a login page requiring username and password stored as base64-encoded Kubernetes secrets:
- ArgoCD: Secret `argocd-initial-admin-secret`, username `admin`
- Grafana: Secret `victoria-metrics-k8s-stack-grafana`, username `admin`
We use a DPE service account (dpesagebionetworks) to authenticate Docker Hub pulls, avoiding anonymous rate limits.
Setup steps:
- Log into Docker Hub with credentials stored in LastPass
- Create a new Personal Access Token
- Add it to the Spacelift "Kubernetes Deployments" stack as `TF_VAR_docker_access_token`
- Add the variable to `variables.tf` in the relevant module
- Add a Kubernetes secret to `main.tf` for each namespace needing authenticated pulls
- Update Helm charts to reference the secret per this guide
- Deploy via Terraform and apply via ArgoCD or FluxCD
variables.tf:
```hcl
variable "docker_server" {
  description = "The docker registry URL"
  default     = "https://index.docker.io/v1/"
  type        = string
}

variable "docker_username" {
  description = "Username to log into docker for authenticated pulls"
  default     = "dpesagebionetworks"
  type        = string
}

variable "docker_access_token" {
  description = "The access token for docker authenticated pulls. Set 'TF_VAR_docker_access_token' within Spacelift as an environment variable."
  type        = string
}

variable "docker_email" {
  description = "The email for the docker account"
  default     = "dpe@sagebase.org"
  type        = string
}
```

`main.tf`:
```hcl
resource "kubernetes_secret" "docker-cfg" {
  metadata {
    name      = "docker-cfg"
    namespace = var.namespace
  }

  type = "kubernetes.io/dockerconfigjson"

  data = {
    ".dockerconfigjson" = jsonencode({
      auths = {
        "${var.docker_server}" = {
          "username" = var.docker_username,
          "password" = var.docker_access_token,
          "email"    = var.docker_email,
          "auth"     = base64encode("${var.docker_username}:${var.docker_access_token}")
        }
      }
    })
  }
}
```

To fully tear down EKS infrastructure, destroy in this order:
- Go into the ArgoCD UI and delete all applications
- Run `tofu destroy --auto-approve` as a task in Spacelift for the Kubernetes Deployments stack
- Run `tofu destroy --auto-approve` as a task in Spacelift for the Infrastructure stack
Reference: https://docs.spacelift.io/integrations/cloud-providers/aws#setup-guide
- Create a new IAM role (e.g., `spacelift-admin-role`) with description: "Role for Spacelift CI/CD to assume when deploying resources managed by Terraform"
- Use this custom trust policy:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::324880187172:root"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringLike": {
          "sts:ExternalId": "sagebionetworks@*"
        }
      }
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::{{AWS ACCOUNT ID}}:root"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

- Attach these policies to the role:
  - `PowerUserAccess`
  - An inline policy for IAM operations (needed if Terraform creates/edits/deletes IAM roles/policies):
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "iam:*Role",
        "iam:*RolePolicy",
        "iam:*RolePolicies",
        "iam:*Policy",
        "iam:*PolicyVersion",
        "iam:*OpenIDConnectProvider",
        "iam:*InstanceProfile",
        "iam:ListPolicyVersions",
        "iam:UpdateOpenIDConnectProviderThumbprint",
        "iam:ListGroupsForUser",
        "iam:ListAttachedUserPolicies"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "iam:CreateUser",
        "iam:AttachUserPolicy",
        "iam:ListPolicies",
        "iam:TagUser",
        "iam:GetUser",
        "iam:DeleteUser",
        "iam:CreateAccessKey",
        "iam:ListAccessKeys",
        "iam:DeleteAccessKey"
      ],
      "Resource": "arn:aws:iam::{{AWS ACCOUNT ID}}:user/smtp_user"
    }
  ]
}
```

- Add a new `spacelift_aws_integration` resource to the `common-resources/aws-integrations` directory.
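As a rough sketch of that resource (the name, account ID, and attribute values are placeholders; follow the existing entries in `common-resources/aws-integrations` for the actual conventions):

```hcl
# Hypothetical spacelift_aws_integration for the new account's role.
resource "spacelift_aws_integration" "new_account" {
  name     = "new-account-integration"
  role_arn = "arn:aws:iam::123456789012:role/spacelift-admin-role" # placeholder account ID

  # Whether run credentials are generated on the worker rather than by Spacelift.
  generate_credentials_in_worker = false
}
```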