crms-devops · JashwanthMU · Jun 28, 2026 · Jun 28, 2026 · Jun 28, 2026
diff --git a/docs/week-06-notes.md b/docs/week-06-notes.md
@@ -0,0 +1,33 @@
+# Week 6 Notes — Terraform VPC Foundation
+
+## What we built
+- infra/terraform/main.tf: AWS provider ~> 5.0, ap-south-1 region
+- infra/terraform/vpc.tf: VPC 10.0.0.0/16, IGW, public/private subnets, route tables
+- infra/terraform/security_groups.tf: EKS nodes SG, RDS SG
+- infra/terraform/variables.tf: VPC CIDR, subnets, AZs, environment
+- infra/terraform/outputs.tf: vpc_id, subnet IDs
+- infra/terraform/terraform.tfvars: dev environment, ap-south-1
+
+## Key decisions
+- NAT Gateway excluded from dev: saves roughly $32/month in hourly charges, at the cost of no outbound internet from private subnets
+- All resources tagged: Project=CRMS, ManagedBy=Terraform
+- ap-south-1 (Mumbai): nearest default-enabled region to Coimbatore (ap-south-2 Hyderabad is closer but opt-in)
+
+## Commands used
+```bash
+terraform init
+terraform plan       # always plan before apply
+terraform apply      # creates 14 resources in ~30 seconds
+terraform destroy    # tears down to $0 when not in use
+```
+
+## Milestone
+crms-dev-vpc visible in AWS Console ap-south-1
+14 resources created: VPC, IGW, 4 subnets, 2 route tables,
+4 associations, 2 security groups
+
+## What we learned
+- How Terraform providers and resources work
+- Why IaC matters: infrastructure is disposable, code is permanent
+- How VPC subnets, route tables, and IGW connect together
+- terraform plan shows exactly what will change before touching AWS
diff --git a/docs/week-07-notes.md b/docs/week-07-notes.md
@@ -0,0 +1,30 @@
+# Week 7 Notes — S3 Remote Backend for Terraform State
+
+## What we built
+- S3 bucket: crms-terraform-state-055237683990
+- Versioning enabled — full state history preserved
+- AES256 encryption — state files encrypted at rest
+- Public access blocked — no accidental exposure
+- main.tf updated with backend "s3" block
+
+## Why remote state matters
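+Without an S3 backend, terraform.tfstate lives on one laptop.
+If that laptop dies, the state is lost; if two engineers run terraform at the same time, their writes can conflict and corrupt it.
+S3 backend: state is shared, versioned, encrypted, team-accessible. Concurrent runs are only blocked when state locking is also configured (a DynamoDB table or use_lockfile).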
+
+## Commands used
+```bash
+aws s3api create-bucket --bucket crms-terraform-state-055237683990 --region ap-south-1 --create-bucket-configuration LocationConstraint=ap-south-1
+aws s3api put-bucket-versioning --bucket crms-terraform-state-055237683990 --versioning-configuration Status=Enabled
+terraform init -migrate-state   # migrates local state to S3
+aws s3 ls s3://crms-terraform-state-055237683990/dev/
+```
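+
+The encryption and public-access settings listed under "What we built" have no matching command above. A minimal sketch of how they could be applied with the AWS CLI, assuming the same bucket name. The `run` wrapper and the `DRY_RUN` variable are hypothetical helpers for illustration, not part of the project:
+
+```bash
+BUCKET=crms-terraform-state-055237683990
+# Hypothetical helper: print the command unless DRY_RUN=0 is set.
+run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi; }
+
+# Default server-side encryption with S3-managed keys (AES256)
+run aws s3api put-bucket-encryption --bucket "$BUCKET" \
+  --server-side-encryption-configuration \
+  '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'
+
+# Block every form of public access on the bucket
+run aws s3api put-public-access-block --bucket "$BUCKET" \
+  --public-access-block-configuration \
+  BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
+```
+
+With the default DRY_RUN=1 the commands are only printed; set DRY_RUN=0 to execute them against AWS.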
+
+## Milestone
+terraform.tfstate in S3 — team can now collaborate on infra safely
+State never stored in Git — .gitignore covers *.tfstate
+
+## What we learned
+- Why .gitignore must exclude .terraform/, *.tfstate, *.tfstate.backup
+- How terraform backend migration works
+- Remote state enables team collaboration on infrastructure
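+
+The ignore rules above can be sketched as follows, assuming the command is run from the repository root. The `*.tfstate` pattern does not match `terraform.tfstate.backup` on its own, which is why the backup pattern is listed separately:
+
+```bash
+# Append the Terraform state patterns to the repository's .gitignore
+cat >> .gitignore <<'EOF'
+.terraform/
+*.tfstate
+*.tfstate.backup
+EOF
+```
+
+Inside a Git repository, `git check-ignore -v terraform.tfstate` prints the rule that matches, which is a quick way to confirm the file is covered.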
diff --git a/docs/week-08-notes.md b/docs/week-08-notes.md
@@ -0,0 +1,35 @@
+# Week 8 Notes — EKS Cluster on AWS
+
+## What we built
+- infra/terraform/eks.tf: EKS cluster + node group
+- aws_iam_role.eks_cluster: control plane IAM role
+- aws_iam_role.eks_nodes: worker node IAM role
+- 4 IAM policy attachments: AmazonEKSClusterPolicy, AmazonEKSWorkerNodePolicy, AmazonEKS_CNI_Policy, ECR
+- EKS cluster: crms-dev, Kubernetes 1.31, ap-south-1
+- Node group: t3.small x1 (min 1, max 2) — cost optimised
+
+## Cost awareness
+- EKS control plane: $0.10/hour
+- t3.small node: $0.023/hour
+- Total: ~$0.123/hour
+- Always terraform destroy after testing
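+
+To see why the destroy habit matters, the hourly figures above can be projected over a month. A rough sketch using the rates from these notes; the 730 hours per month is an assumption (the average month length), and data transfer, EBS, and load balancer charges are left out:
+
+```bash
+CONTROL_PLANE=0.10   # USD/hour, EKS control plane (from the notes above)
+NODE=0.023           # USD/hour, one t3.small node (from the notes above)
+HOURS_PER_MONTH=730  # assumption: average hours in a month
+
+# awk handles the floating-point arithmetic that plain shell cannot
+HOURLY=$(awk -v c="$CONTROL_PLANE" -v n="$NODE" 'BEGIN { printf "%.3f", c + n }')
+MONTHLY=$(awk -v h="$HOURLY" -v m="$HOURS_PER_MONTH" 'BEGIN { printf "%.2f", h * m }')
+echo "hourly=\$${HOURLY} monthly=\$${MONTHLY}"
+```
+
+A cluster left running for the whole month comes to about $89.79 under these assumptions, most of it the control plane.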
+
+## Commands used
+```bash
+terraform apply -auto-approve          # creates cluster in ~10 mins
+aws eks update-kubeconfig --name crms-dev --region ap-south-1
+kubectl get nodes                       # verify worker node is Ready
+kubectl get namespaces                  # default, kube-node-lease, kube-public, kube-system
+kubectl cluster-info                    # control plane endpoint
+terraform destroy -auto-approve        # destroy after testing
+```
+
+## Milestone
+kubectl get nodes output:
+ip-10-0-x-x.ap-south-1.compute.internal   Ready   v1.31.14-eks-3385e9b
+
+## What we learned
+- How EKS control plane + worker nodes relate
+- Why IAM roles are required for both control plane and nodes
+- t3.small vs t3.medium trade-offs for cost vs capacity
+- aws eks update-kubeconfig wires kubectl to the cluster
diff --git a/docs/week-09-notes.md b/docs/week-09-notes.md
@@ -0,0 +1,47 @@
+# Week 9 Notes — CRMS Deployed on Kubernetes
+
+## What we built
+- k8s/base/namespace.yaml: crms namespace
+- k8s/base/configmap.yaml: non-secret app config
+- k8s/base/secret.yaml: DATABASE_URL + SECRET_KEY
+- k8s/base/postgres-deployment.yaml: PostgreSQL + ClusterIP service
+- k8s/base/backend-deployment.yaml: FastAPI x2 replicas + health probes
+- k8s/base/frontend-deployment.yaml: React/nginx x2 + LoadBalancer service
+- k8s/base/hpa.yaml: HPA backend 2-10 pods, frontend 2-5 pods
+
+## Critical fix applied
+nginx upstream was "backend", but no Service with that name exists in the cluster, so cluster DNS could not resolve it.
+Fixed to: crms-backend-service.crms.svc.cluster.local:8000
+Full Kubernetes DNS format: <service>.<namespace>.svc.cluster.local
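+
+The DNS format above can be sketched as a small shell helper. The function name `k8s_fqdn` is hypothetical and the default cluster domain `cluster.local` is assumed:
+
+```bash
+# Build <service>.<namespace>.svc.cluster.local, with an optional :port suffix
+k8s_fqdn() {
+  local service=$1 namespace=$2 port=${3:-}
+  printf '%s.%s.svc.cluster.local%s\n' "$service" "$namespace" "${port:+:$port}"
+}
+
+k8s_fqdn crms-backend-service crms 8000
+```
+
+This prints the upstream used in the fix. Pods in the same namespace can also reach the Service by its short name, crms-backend-service, because the namespace is part of their DNS search path.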
+
+## Commands used
+```bash
+kubectl apply -f k8s/base/namespace.yaml   # always apply namespace first
+kubectl apply -f k8s/base/
+kubectl get pods -n crms -w
+kubectl get service crms-frontend-service -n crms
+kubectl logs <pod-name> -n crms            # debug crashes
+kubectl describe pod <pod-name> -n crms    # diagnose pending/error
+```
+
+## Milestone
+All 5 pods 1/1 Running:
+- crms-backend x2
+- crms-frontend x2
+- crms-db x1
+
+LoadBalancer URL live:
+ade5ebcd818514ed69e44386d4b3ca72.elb.ap-south-1.amazonaws.com
+
+## What we learned
+- Kubernetes DNS: services reachable by full DNS name within cluster
+- Namespace ordering matters: apply namespace before other resources
+- HPA scales pods based on CPU/memory and only works when metrics-server is running in the cluster
+- LoadBalancer type service = AWS NLB provisioned automatically
+- imagePullPolicy: Always ensures latest image is pulled from GHCR
+
+## Issues and fixes
+- CrashLoopBackOff: nginx couldn't resolve "backend" hostname
+  Fix: use full K8s DNS crms-backend-service.crms.svc.cluster.local
+- Namespace not found on first apply: applied namespace.yaml separately first
+- GHCR images must be public for EKS to pull without auth
diff --git a/docs/week-11-notes.md b/docs/week-11-notes.md
@@ -0,0 +1,53 @@
+# Week 11 Notes — Prometheus + Grafana Observability
+
+## What we built
+- backend/app/main.py: prometheus-fastapi-instrumentator added
+  - Exposes /metrics endpoint automatically
+  - Tracks: request count, latency histograms, in-progress requests
+- observability/prometheus-values.yaml: kube-prometheus-stack Helm values
+  - Prometheus retention: 24h
+  - Scrapes crms-backend:8000/metrics
+  - Grafana enabled with admin password
+- observability/grafana-dashboards/crms-dashboard.json:
+  - Panel 1: API request rate (requests/sec)
+  - Panel 2: p99 API response latency
+  - Panel 3: Pod CPU usage by pod name
+  - Panel 4: Pod memory usage by pod name
+- observability/prometheus-rules.yaml: 3 alert rules
+  - CRMSHighErrorRate: fires if 5xx rate > 10% for 2m
+  - CRMSHighLatency: fires if p99 > 1s for 5m
+  - CRMSPodDown: fires if backend replicas < 1
+
+## Deploy monitoring stack
+```bash
+helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
+helm repo update
+kubectl create namespace monitoring
+helm upgrade --install kube-prometheus-stack \
+  prometheus-community/kube-prometheus-stack \
+  --namespace monitoring \
+  --values observability/prometheus-values.yaml
+kubectl port-forward svc/kube-prometheus-stack-grafana -n monitoring 3000:80
+```
+
+## Access Grafana
+URL: http://localhost:3000
+Username: admin
+Password: crms-grafana-admin
+
+## Key metrics exposed by /metrics
+- http_requests_total: total requests by method, path, status
+- http_request_duration_seconds: latency histogram
+- http_requests_inprogress: concurrent requests
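+
+These metrics are what the dashboard panels and alert rules query. A sketch of two PromQL expressions of the kind the error-rate and p99 rules rely on. The 5m windows, the `status` label name, the `prom_query` helper, and the `PROM_URL` default are assumptions; the actual expressions live in observability/prometheus-rules.yaml:
+
+```bash
+# Fraction of requests answered with a 5xx status over the last 5 minutes
+ERROR_RATIO='sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))'
+# 99th percentile latency in seconds, computed from the histogram buckets
+P99_LATENCY='histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))'
+
+# Hypothetical helper: run an instant query against the Prometheus HTTP API
+prom_query() {
+  curl -sG "${PROM_URL:-http://localhost:9090}/api/v1/query" --data-urlencode "query=$1"
+}
+```
+
+Once Prometheus is reachable at PROM_URL (for example through a port-forward), `prom_query "$ERROR_RATIO"` returns the current value as JSON; a result above 0.1 corresponds to the 10% threshold of CRMSHighErrorRate.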
+
+## What we learned
+- How Prometheus scrapes metrics via pull model
+- PromQL: rate(), histogram_quantile(), kube_* metrics from kube-state-metrics
+- Grafana dashboards: panels, data sources, time ranges
+- Alert rules: expr, for, severity, annotations
+- Why observability matters: you can't fix what you can't measure
+
+## Note on resources
+kube-prometheus-stack needs at least a t3.medium (4 GiB RAM) to run
+alongside the CRMS application pods; a t3.small (2 GiB) runs out of memory.
+Production: use t3.medium or larger nodes.