Upgrade AKS to Kubernetes 1.33.0 #388

Open
gizmo-rt wants to merge 34 commits into development from k8s-upgrade/azure-1.33

@gizmo-rt
Contributor

Summary

  • Upgrade AKS default Kubernetes version from 1.32.0 to 1.33.0
  • Update k8s_version output to match
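For orientation, the change described above might look roughly like the following. This is only a sketch: the variable name `kubernetes_version` and the output's value expression are assumptions, and only the `k8s_version` output name and the two version numbers come from this PR.

```hcl
# Hypothetical variable name; the PR only states that the default
# moves from 1.32.0 to 1.33.0.
variable "kubernetes_version" {
  description = "Default Kubernetes version for the AKS cluster"
  type        = string
  default     = "1.33.0" # previously "1.32.0"
}

# Output name taken from the PR summary; wiring it to the variable
# is an assumption about how the module keeps the two in sync.
output "k8s_version" {
  description = "Kubernetes version the AKS cluster is configured with"
  value       = var.kubernetes_version
}
```

Deriving the output from the variable, if the module does not already do so, would remove the need to update both places on the next upgrade.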

Test plan

  • tofu validate in k8s/azure/aks/
  • tofu plan to confirm in-place cluster upgrade with no unexpected replacements

jatintalgotra-zd and others added 30 commits January 21, 2026 17:33
* add openobserve support for aws (#347)

Co-authored-by: jatintalgotra-zd <jatin.talgotra@zop.dev>

* Removes deprecated template file

* updates in variables

* indent updates and reorganise

---------

Co-authored-by: Keerthana R <keerthana.rajasekaran@zop.dev>
Co-authored-by: jatintalgotra-zd <jatin.talgotra@zop.dev>
* updates event handling

* checks for event
* fix mimir and tempo for the gcp

* fix tempo for the gcp
* populate output for openobserve in gcp

* populate output for openobserve in gcp
* upgrade fluent-bit version in gcp

* upgrade fluent-bit version in aws, oci and azure
gizmo-rt and others added 4 commits February 17, 2026 22:24
* fix(aws): critical security vulnerabilities across AWS modules

- Remove hardcoded Nessus API key from user-data.tpl, replace with variable
- Set EKS endpoint to private access only (endpoint_private_access=true, endpoint_public_access=false)
- Remove SSH port 22 open to 0.0.0.0/0 from external_worker_group_mgmt and all_worker_mgmt security groups
- Add S3 server-side encryption (KMS) and public access blocks to object-storage and all observability buckets
- Disable force_destroy on all S3 buckets (object-storage, loki, cortex, mimir, tempo, openobserve)
- Set skip_final_snapshot=false with final_snapshot_identifier on RDS primary and read replica
- Change enable_ssl default to true for AWS RDS
- Scope Velero IAM S3 permissions to specific bucket instead of wildcard
- Downgrade zop-system cluster-admin binding to edit role
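The S3 items in the list above could be expressed along these lines. This is a minimal sketch, not the module's actual code: the resource label `loki` and the variable `bucket_name` are hypothetical, and it assumes AWS provider v4 or later, where encryption and public access blocks are separate resources.

```hcl
# Hypothetical input; the real modules derive bucket names differently.
variable "bucket_name" {
  type = string
}

resource "aws_s3_bucket" "loki" {
  bucket        = var.bucket_name
  force_destroy = false # a non-empty bucket can no longer be deleted by destroy
}

# Server-side encryption with KMS (uses the AWS-managed key unless
# kms_master_key_id is set).
resource "aws_s3_bucket_server_side_encryption_configuration" "loki" {
  bucket = aws_s3_bucket.loki.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "aws:kms"
    }
  }
}

# Block every form of public access to the bucket.
resource "aws_s3_bucket_public_access_block" "loki" {
  bucket                  = aws_s3_bucket.loki.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```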

* fix(aws): resolve deployment issues found during validation

- Remove deprecated --container-runtime containerd bootstrap arg (incompatible with AL2023 AMIs, unnecessary for K8s 1.24+)
- Update default PostgreSQL version from 16.1 to 16.12 (16.1 not available in AWS RDS)
- Add depends_on to cert-manager CRD manifests to prevent webhook race condition

* removes duplicate blocks and updates legacy code

* fix(aws): restore public endpoint access for EKS cluster API

Private-only endpoint breaks Terraform provisioning since Helm/kubectl
resources need to reach the cluster API from outside the VPC. Keep both
private and public access enabled.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(aws): add lifecycle prevent_destroy to all S3 data buckets

Addresses PR review feedback to add prevent_destroy lifecycle blocks
to object-storage and observability S3 buckets (cortex, mimir, loki,
tempo, openobserve) to retain data even during destroy operations.
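A `prevent_destroy` lifecycle block of the kind this commit describes might look as follows. The resource label and variable name are hypothetical.

```hcl
# Hypothetical input used only for this illustration.
variable "mimir_bucket_name" {
  type = string
}

resource "aws_s3_bucket" "mimir" {
  bucket = var.mimir_bucket_name

  lifecycle {
    # Any plan that would destroy this bucket fails with an error
    # instead of proceeding.
    prevent_destroy = true
  }
}
```

One caveat: `prevent_destroy` only takes effect while the resource block is still present in the configuration. Deleting the block itself removes the guard along with it.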

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix(gcp): enable SSL by default for GCP Cloud SQL

- Change require_ssl default to true for GCP Cloud SQL
- Change enable_ssl default to true for GCP Cloud SQL
- Ensures database connections require TLS by default

* fix(gcp): security hardening for GCS buckets and RBAC

- Disable force_destroy on all observability GCS buckets (cortex, loki, mimir, tempo, openobserve)
- Enable uniform_bucket_level_access and public_access_prevention on all observability GCS buckets
- Downgrade zop-system ClusterRoleBinding from cluster-admin to edit
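The bucket settings listed above map onto `google_storage_bucket` arguments roughly as shown here. This is a sketch under assumptions: the resource label and both variables are hypothetical, and the real modules may set further arguments.

```hcl
# Hypothetical inputs for illustration.
variable "tempo_bucket_name" {
  type = string
}

variable "bucket_location" {
  type    = string
  default = "US"
}

resource "google_storage_bucket" "tempo" {
  name     = var.tempo_bucket_name
  location = var.bucket_location

  force_destroy               = false      # refuse to delete a bucket that still holds objects
  uniform_bucket_level_access = true       # IAM-only access, no per-object ACLs
  public_access_prevention    = "enforced" # reject any attempt to expose the bucket publicly
}
```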

* fix(gcp): add lifecycle prevent_destroy to all GCS data buckets

Addresses PR review feedback to add prevent_destroy lifecycle blocks
to cortex, mimir, loki, tempo, and openobserve data buckets to retain
observability data even during destroy operations.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix(azure): critical security vulnerabilities in Azure SQL modules

- Restrict Azure PostgreSQL firewall from 0.0.0.0-255.255.255.255 to Azure services only (0.0.0.0-0.0.0.0)
- Restrict Azure MySQL firewall from 0.0.0.0-255.255.255.255 to Azure services only (0.0.0.0-0.0.0.0)
- Change enable_ssl default to true for Azure PostgreSQL
- Add enable_ssl variable to Azure MySQL with default true
- Make MySQL require_secure_transport conditional on enable_ssl variable instead of always OFF
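The firewall and TLS changes above can be sketched as follows, assuming the modules use the Flexible Server resources (the PR does not say which server type is in use). All variable names and resource labels are hypothetical.

```hcl
# Hypothetical inputs; the real modules reference their own server resources.
variable "postgres_server_id" {
  type = string
}

variable "mysql_server_name" {
  type = string
}

variable "resource_group_name" {
  type = string
}

variable "enable_ssl" {
  description = "Require TLS for database connections"
  type        = bool
  default     = true
}

# A rule whose start and end are both 0.0.0.0 is Azure's convention for
# "allow Azure services only", replacing the previous
# 0.0.0.0-255.255.255.255 rule that admitted every address.
resource "azurerm_postgresql_flexible_server_firewall_rule" "azure_services" {
  name             = "allow-azure-services"
  server_id        = var.postgres_server_id
  start_ip_address = "0.0.0.0"
  end_ip_address   = "0.0.0.0"
}

# require_secure_transport now follows enable_ssl instead of being fixed to OFF.
resource "azurerm_mysql_flexible_server_configuration" "secure_transport" {
  name                = "require_secure_transport"
  resource_group_name = var.resource_group_name
  server_name         = var.mysql_server_name
  value               = var.enable_ssl ? "ON" : "OFF"
}
```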

* fix(azure): cert-manager race condition and overly permissive RBAC

- Add depends_on to kubectl_manifest resources to prevent cert-manager
  webhook race condition during cluster provisioning
- Downgrade zop-system ClusterRoleBinding from cluster-admin to edit
  for principle of least privilege
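The two changes above could look something like this. The sketch assumes cert-manager is installed through a `helm_release` and that manifests are applied with the `kubectl_manifest` resource from the gavinbunney/kubectl provider. Resource labels, the issuer manifest, and the binding's subject are all hypothetical; provider configuration is omitted.

```hcl
# Hypothetical release; chart values (including CRD installation) omitted.
resource "helm_release" "cert_manager" {
  name             = "cert-manager"
  repository       = "https://charts.jetstack.io"
  chart            = "cert-manager"
  namespace        = "cert-manager"
  create_namespace = true
}

# Hypothetical manifest. Without depends_on, this can be applied before the
# cert-manager webhook is ready to validate it, which is the race condition
# the commit refers to.
resource "kubectl_manifest" "cluster_issuer" {
  yaml_body = <<-YAML
    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: selfsigned
    spec:
      selfSigned: {}
  YAML

  depends_on = [helm_release.cert_manager]
}

# Binding to the built-in "edit" ClusterRole rather than "cluster-admin".
# The binding name and subject are assumptions.
resource "kubernetes_cluster_role_binding" "zop_system" {
  metadata {
    name = "zop-system-edit"
  }

  role_ref {
    api_group = "rbac.authorization.k8s.io"
    kind      = "ClusterRole"
    name      = "edit"
  }

  subject {
    kind      = "ServiceAccount"
    name      = "default"
    namespace = "zop-system"
  }
}
```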

* Enable SSL for Grafana PostgreSQL connection on Azure
Contributor

The VNet changes need to be promoted together with the corresponding infra-updater changes. Since we're modifying VNet-related variables, promoting either side on its own could cause a variable mismatch and affect cluster creation.

@gizmo-rt gizmo-rt changed the base branch from main to development February 18, 2026 15:26

5 participants