SRE-Keith Bachand: Interview #271

Open
xoibsurferx wants to merge 15 commits into Tekmetric:master from xoibsurferx:sre-kb-demo

xoibsurferx commented Feb 13, 2026

Summary

  • Productionized the backend deployment with GitOps (Argo CD), a production‑leaning Helm chart, and observability endpoints.
  • Added a Kind + Argo CD runbook so the demo can be reproduced end‑to‑end.

Changes

  • Added Spring Boot Actuator + Prometheus registry and exposed health/metrics endpoints (backend/pom.xml, backend/src/main/resources/application.properties).
  • Updated Helm chart to use /actuator/health probes, optional startupProbe, image digest support, and optional ingress TLS (sre/charts/interview-backend).
  • Added Prometheus scrape annotations and an optional ServiceMonitor (sre/charts/interview-backend/templates/service.yaml, .../servicemonitor.yaml).
  • Added system component Kustomize base for ingress‑nginx + metrics‑server (sre/system/...).
  • Added Argo CD install via Kustomize with explicit namespace creation and pinned Argo CD version (sre/argocd/install).
  • Added Argo CD Application for system components and wired into apps kustomization (sre/argocd/applications).
  • Updated kind config to label the node for ingress‑nginx scheduling (sre/kind/kind-config.yaml).
  • Added a dedicated demo runbook with Argo CD UI/login and app status checks (sre/RUNBOOK.md).
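For illustration, the probe wiring described in the chart change could look roughly like the container excerpt below. This is a sketch, not the chart's actual template: the container port (8080), the port name `http`, the timing values, the resource figures, and the image placeholder are all assumptions.

```yaml
# Excerpt of a Deployment pod spec (containers section only); values are illustrative.
containers:
  - name: interview-backend
    # Placeholder image reference pinned by digest instead of a mutable tag.
    image: "<registry>/interview-backend@sha256:<digest>"
    ports:
      - name: http
        containerPort: 8080          # assumption: Spring Boot default port
    # startupProbe holds off the other probes until the JVM has finished booting.
    startupProbe:
      httpGet:
        path: /actuator/health
        port: http
      periodSeconds: 5
      failureThreshold: 30           # allows up to ~150s for startup
    # readinessProbe controls whether the pod receives Service traffic.
    readinessProbe:
      httpGet:
        path: /actuator/health
        port: http
      periodSeconds: 10
    # livenessProbe restarts the container if the app stops responding.
    livenessProbe:
      httpGet:
        path: /actuator/health
        port: http
      periodSeconds: 10
      failureThreshold: 3
    resources:
      requests:
        cpu: 100m
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 512Mi
```

Spring Boot also exposes separate `/actuator/health/liveness` and `/actuator/health/readiness` groups when probe support is enabled; splitting the probes that way is an option if the single health endpoint proves too coarse.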

Why

  • Observability is required; Actuator + Prometheus provides standard health/metrics endpoints.
  • GitOps via Argo CD keeps system components and app deployments consistent and auditable.
  • Startup/readiness/liveness probes and resource limits improve reliability under load.
  • Image digest support and TLS options move the chart closer to production readiness.
  • Kind node label prevents ingress‑nginx from staying Pending in single‑node demos.
  • Runbook satisfies the “clear instructions” requirement without altering the original assignment README.
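As a sketch of the kind node label mentioned above: the kind-specific ingress‑nginx manifest schedules its controller with a node selector, so the single node needs a matching label. The label key `ingress-ready` and the port mappings below are assumptions based on the common kind ingress setup, not a copy of sre/kind/kind-config.yaml.

```yaml
# Illustrative kind cluster config; not the PR's actual file.
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    # Label the node so a nodeSelector of ingress-ready=true can be satisfied.
    labels:
      ingress-ready: "true"
    # Map host ports 80/443 to the node so the ingress controller is reachable locally.
    extraPortMappings:
      - containerPort: 80
        hostPort: 80
        protocol: TCP
      - containerPort: 443
        hostPort: 443
        protocol: TCP
```

Without a matching label the controller pod has no eligible node and stays Pending, which is the failure the label is meant to prevent.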

Validation

Install Argo CD (use server‑side apply to avoid CRD annotation size errors)

kubectl apply --server-side -k sre/argocd/install
kubectl -n argocd rollout status deploy/argocd-server
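A Kustomize install with an explicit namespace and a pinned version might be structured like the sketch below. The file name `namespace.yaml` is hypothetical and `v2.13.0` is only an example tag; the version the PR actually pins is not stated here.

```yaml
# Illustrative kustomization.yaml for sre/argocd/install; not the PR's actual file.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
# Set the namespace on every namespaced resource from the upstream manifest.
namespace: argocd
resources:
  # Hypothetical file declaring the argocd Namespace explicitly.
  - namespace.yaml
  # Pin to a release tag rather than "stable" so installs are reproducible.
  - https://raw.githubusercontent.com/argoproj/argo-cd/v2.13.0/manifests/install.yaml
```

Server-side apply matters here because client-side apply stores the full object in the `last-applied-configuration` annotation, and some of the Argo CD CRDs are too large to fit within the annotation size limit.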

Apply GitOps apps (system + backend)

kubectl apply -k sre/argocd/applications
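For reference, an Argo CD Application pointing at the Helm chart could look something like the following. This is a hedged sketch: the `repoURL` is a placeholder, and the sync policy, target revision, and destination namespace are assumptions rather than the contents of sre/argocd/applications.

```yaml
# Illustrative Argo CD Application; not the PR's actual manifest.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: interview-backend
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/<owner>/<repo>.git   # placeholder
    targetRevision: sre-kb-demo                      # assumption: the PR branch
    path: sre/charts/interview-backend
    helm:
      releaseName: interview-backend
  destination:
    server: https://kubernetes.default.svc           # the cluster Argo CD runs in
    namespace: interview
  syncPolicy:
    automated:
      prune: true        # delete resources that were removed from Git
      selfHeal: true     # revert manual drift in the cluster
    syncOptions:
      - CreateNamespace=true
```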

Check Argo CD app status

argocd app list
argocd app get system
argocd app get interview-backend

Validate API

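# Run the port-forward in a separate terminal (it blocks), then run the curl commands from another one
kubectl -n interview port-forward svc/interview-backend 8080:80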
curl http://localhost:8080/api/welcome
curl http://localhost:8080/actuator/health

AWS/EKS Notes (what I'd change when migrating this to AWS/EKS)

This repo is set up for a local demo on kind. For a production‑grade AWS/EKS deployment,
I would shift infrastructure to Terraform and adjust platform components accordingly.

Infrastructure as Code (Terraform)

  • Provision VPC, subnets, NAT, security groups, EKS cluster, managed node groups.
  • Create IAM OIDC provider + IRSA roles for Argo CD, external‑dns, cert‑manager, etc.
  • Create ECR repos and push images there (update image repo in Helm values).
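On the cluster side, IRSA comes down to annotating a ServiceAccount with the IAM role that Terraform creates. A minimal sketch, assuming external‑dns runs in `kube-system`; the account ID and role name are placeholders.

```yaml
# Illustrative ServiceAccount using IRSA; role ARN and namespace are placeholders.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: external-dns
  namespace: kube-system
  annotations:
    # Pods using this ServiceAccount receive credentials for this IAM role
    # through the cluster's OIDC provider.
    eks.amazonaws.com/role-arn: arn:aws:iam::<ACCOUNT_ID>:role/<external-dns-role>
```

The IAM role's trust policy has to reference the cluster's OIDC provider and this ServiceAccount's namespace and name, which is the part Terraform would own.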

Ingress, TLS, and DNS

  • Replace ingress-nginx with AWS Load Balancer Controller (ALB ingress).
  • Use ACM for TLS certs; add cert‑manager only if other issuers are needed.
  • Use external‑dns to manage Route 53 records automatically.
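An ALB-backed Ingress with an ACM certificate might look roughly like this. It is a sketch under assumptions: the hostname `api.example.com`, the certificate ARN, the `internet-facing` scheme, and `ip` target type are illustrative choices, not decisions made in this PR.

```yaml
# Illustrative Ingress for the AWS Load Balancer Controller; values are placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: interview-backend
  namespace: interview
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS": 443}]'
    # ACM certificate terminated at the ALB.
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:<region>:<account-id>:certificate/<id>
    # Reuse the Actuator endpoint for target group health checks.
    alb.ingress.kubernetes.io/healthcheck-path: /actuator/health
spec:
  ingressClassName: alb
  rules:
    - host: api.example.com        # external-dns can create the Route 53 record from this host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: interview-backend
                port:
                  number: 80
```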

Autoscaling and Observability

  • Use Karpenter or Cluster Autoscaler for node scaling.
  • Wire metrics to Prometheus + Grafana (or AMP/CloudWatch) and enable ServiceMonitor.
  • Add log aggregation to CloudWatch or a centralized logging stack.
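With the Prometheus Operator installed, enabling the ServiceMonitor would render something along these lines. The Service port name `http`, the selector label, the `release` label, and the 30s interval are assumptions; `/actuator/prometheus` is the endpoint Spring Boot Actuator serves when the Prometheus registry is on the classpath and the endpoint is exposed.

```yaml
# Illustrative ServiceMonitor; labels and port name are assumptions about the chart.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: interview-backend
  namespace: interview
  labels:
    release: kube-prometheus-stack   # assumption: must match the Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: interview-backend
  endpoints:
    - port: http                     # name of the Service port, not the number
      path: /actuator/prometheus
      interval: 30s
```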

Data and Secrets

  • Replace H2 with RDS (Postgres/MySQL) and store DB credentials in Secrets Manager or SSM Parameter Store.
  • Update Helm chart to read DB config from Secrets/ConfigMaps.
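One way the chart could read DB config is through environment variables, which Spring Boot maps onto `spring.datasource.*` properties via relaxed binding. A minimal sketch; the ConfigMap and Secret name `interview-backend-db` and its keys are hypothetical, and how the Secret is populated from Secrets Manager or SSM is left out.

```yaml
# Excerpt of a Deployment pod spec (containers section only); names are hypothetical.
containers:
  - name: interview-backend
    env:
      # Non-sensitive connection URL from a ConfigMap.
      - name: SPRING_DATASOURCE_URL
        valueFrom:
          configMapKeyRef:
            name: interview-backend-db
            key: url
      # Credentials from a Kubernetes Secret.
      - name: SPRING_DATASOURCE_USERNAME
        valueFrom:
          secretKeyRef:
            name: interview-backend-db
            key: username
      - name: SPRING_DATASOURCE_PASSWORD
        valueFrom:
          secretKeyRef:
            name: interview-backend-db
            key: password
```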

Security and Reliability

  • Use private subnets, restrict security group ingress, and tighten NetworkPolicies.
  • Enforce Pod Security Standards, add admission policies (OPA/Gatekeeper or Kyverno), and
    image policy enforcement (sigstore/cosign).
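As a rough illustration of the first two items, Pod Security Standards can be enforced with namespace labels, and a NetworkPolicy can limit which sources reach the backend. The `restricted` level, the pod label, the container port 8080, and the `10.0.0.0/16` VPC CIDR are assumptions for the sketch.

```yaml
# Illustrative namespace-level Pod Security Standards enforcement.
apiVersion: v1
kind: Namespace
metadata:
  name: interview
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
---
# Illustrative NetworkPolicy: only allow ingress to the backend from inside the VPC.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: interview-backend-ingress
  namespace: interview
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: interview-backend   # assumption about the chart's labels
  policyTypes:
    - Ingress
  ingress:
    - from:
        - ipBlock:
            cidr: 10.0.0.0/16                     # placeholder VPC CIDR (covers ALB ip targets)
      ports:
        - protocol: TCP
          port: 8080                              # assumption: container port
```

The `restricted` level only admits pods that set fields such as `runAsNonRoot` and drop all capabilities, so the chart's securityContext would need to be tightened before enforcement is switched on. NetworkPolicies also only take effect when the cluster's CNI enforces them.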

These changes keep the demo flow intact locally while outlining a path to
production‑grade deployment on AWS/EKS.

xoibsurferx changed the title from "SRE-Keith Bachand: Initial commit/changes" to "SRE-Keith Bachand: Interview" on Feb 17, 2026