Merged
33 changes: 33 additions & 0 deletions docs/week-06-notes.md
@@ -0,0 +1,33 @@
# Week 6 Notes — Terraform VPC Foundation

## What we built
- infra/terraform/main.tf: AWS provider ~> 5.0, ap-south-1 region
- infra/terraform/vpc.tf: VPC 10.0.0.0/16, IGW, public/private subnets, route tables
- infra/terraform/security_groups.tf: EKS nodes SG, RDS SG
- infra/terraform/variables.tf: VPC CIDR, subnets, AZs, environment
- infra/terraform/outputs.tf: vpc_id, subnet IDs
- infra/terraform/terraform.tfvars: dev environment, ap-south-1

## Key decisions
- NAT Gateway excluded from dev: saves about $32/month in fixed charges; the trade-off is that private subnets have no outbound internet access
- All resources tagged: Project=CRMS, ManagedBy=Terraform
- ap-south-1 (Mumbai) — closest region to Coimbatore
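
The $32/month figure can be reproduced with simple arithmetic. The sketch below assumes an hourly NAT Gateway rate of $0.045 and a 30-day month; both values are assumptions chosen because they match the figure above, so the current ap-south-1 rate and the per-GB data processing charge should be checked against AWS pricing.

```bash
# Assumed values, not taken from the notes: hourly NAT Gateway rate and a 30-day month.
NAT_HOURLY_USD=0.045
HOURS_PER_MONTH=$((24 * 30))

# awk does the floating-point multiplication that POSIX shell arithmetic lacks.
nat_monthly=$(awk -v rate="$NAT_HOURLY_USD" -v hours="$HOURS_PER_MONTH" \
  'BEGIN { printf "%.2f", rate * hours }')

echo "NAT Gateway fixed cost: \$${nat_monthly}/month"
```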

## Commands used
```bash
terraform init
terraform plan # always plan before apply
terraform apply # creates 14 resources in ~30 seconds
terraform destroy # tears down to $0 when not in use
```

## Milestone
crms-dev-vpc visible in AWS Console ap-south-1
14 resources created: VPC, IGW, 4 subnets, 2 route tables,
4 associations, 2 security groups

## What we learned
- How Terraform providers and resources work
- Why IaC matters: infrastructure is disposable, code is permanent
- How VPC subnets, route tables, and IGW connect together
- terraform plan shows exactly what will change before touching AWS
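
Because `terraform plan` is the safety check, one way to make sure `apply` executes exactly the reviewed changes is to save the plan to a file and apply that file. This is a minimal sketch, not the workflow used in these notes; the function name and the `tfplan` file name are illustrative.

```bash
# Hypothetical helper: review a saved plan, then apply exactly that plan.
plan_then_apply() {
  plan_file="${1:-tfplan}"                      # illustrative default file name
  terraform plan -out="$plan_file" || return 1  # write the proposed changes to a plan file
  terraform show "$plan_file" || return 1       # print the saved plan for review
  terraform apply "$plan_file"                  # applies only what the plan file contains
}

# Usage (from infra/terraform/): plan_then_apply
```

Applying a saved plan file does not prompt for approval, so the review has to happen between the `show` and `apply` steps.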
30 changes: 30 additions & 0 deletions docs/week-07-notes.md
@@ -0,0 +1,30 @@
# Week 7 Notes — S3 Remote Backend for Terraform State

## What we built
- S3 bucket: crms-terraform-state-055237683990
- Versioning enabled — full state history preserved
- AES256 encryption — state files encrypted at rest
- Public access blocked — no accidental exposure
- main.tf updated with backend "s3" block

## Why remote state matters
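Without a remote backend, terraform.tfstate lives on one laptop.
If that laptop dies, the state is lost; if two engineers apply at the same time from separate copies, their state files diverge or overwrite each other.
S3 backend: state is shared, versioned, encrypted, team-accessible.
Protection against concurrent runs additionally needs state locking (a DynamoDB lock table or the backend's `use_lockfile` option), which is not part of the setup listed here.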

## Commands used
```bash
aws s3api create-bucket --bucket crms-terraform-state-055237683990 --region ap-south-1 --create-bucket-configuration LocationConstraint=ap-south-1
aws s3api put-bucket-versioning --bucket crms-terraform-state-055237683990 --versioning-configuration Status=Enabled
terraform init -migrate-state # migrates local state to S3
aws s3 ls s3://crms-terraform-state-055237683990/dev/
```
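
The encryption and public-access settings listed under "What we built" have no matching commands above. One way to apply them with the AWS CLI is sketched below. Whether the bucket was actually configured this way (rather than through the console or Terraform) is an assumption, and the function name is illustrative.

```bash
# Hypothetical helper: apply default encryption and block public access on the state bucket.
harden_state_bucket() {
  bucket="$1"

  # Default server-side encryption with S3-managed keys (AES256).
  aws s3api put-bucket-encryption --bucket "$bucket" \
    --server-side-encryption-configuration \
    '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}' || return 1

  # Turn on all four public-access blocks.
  aws s3api put-public-access-block --bucket "$bucket" \
    --public-access-block-configuration \
    BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true
}

# Usage: harden_state_bucket crms-terraform-state-055237683990
```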

## Milestone
terraform.tfstate in S3 — team can now collaborate on infra safely
State never stored in Git — .gitignore covers *.tfstate

## What we learned
- Why .gitignore must exclude .terraform/, *.tfstate, *.tfstate.backup
- How terraform backend migration works
- Remote state enables team collaboration on infrastructure
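
The ignore rules named above can be written out as follows. This is a sketch: the function name is illustrative, and the three patterns are exactly the ones listed in these notes.

```bash
# Hypothetical helper: print the Terraform ignore rules listed in the notes.
print_tf_ignore_rules() {
  cat <<'EOF'
# Local provider plugins and module cache
.terraform/
# State files can contain secrets and must never be committed
*.tfstate
*.tfstate.backup
EOF
}

# Usage: print_tf_ignore_rules >> .gitignore
# Check afterwards with: git check-ignore -v terraform.tfstate
```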
35 changes: 35 additions & 0 deletions docs/week-08-notes.md
@@ -0,0 +1,35 @@
# Week 8 Notes — EKS Cluster on AWS

## What we built
- infra/terraform/eks.tf: EKS cluster + node group
- aws_iam_role.eks_cluster: control plane IAM role
- aws_iam_role.eks_nodes: worker node IAM role
- 4 IAM policy attachments: EKSClusterPolicy, WorkerNode, CNI, ECR
- EKS cluster: crms-dev, Kubernetes 1.31, ap-south-1
- Node group: t3.small x1 (min 1, max 2) — cost optimised

## Cost awareness
- EKS control plane: $0.10/hour
- t3.small node: $0.023/hour
- Total: ~$0.123/hour
- Always terraform destroy after testing
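
To see why destroying after each session matters, the hourly rates above can be projected over longer periods. The sketch uses only the two rates stated in these notes; it leaves out EBS volumes, data transfer, and load balancers, and the function name is illustrative.

```bash
# Hourly rates as stated in the notes above.
CONTROL_PLANE_HOURLY=0.10
NODE_HOURLY=0.023
NODE_COUNT=1   # node group size from the notes (t3.small x1)

# Print the cost in USD of keeping the cluster up for $1 hours.
cost_for_hours() {
  awk -v cp="$CONTROL_PLANE_HOURLY" -v node="$NODE_HOURLY" -v n="$NODE_COUNT" -v h="$1" \
    'BEGIN { printf "%.2f", (cp + node * n) * h }'
}

echo "2-hour test session:  \$$(cost_for_hours 2)"
echo "Left running 24h:     \$$(cost_for_hours 24)"
echo "Left running 30 days: \$$(cost_for_hours 720)"
```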

## Commands used
```bash
terraform apply -auto-approve # creates cluster in ~10 mins
aws eks update-kubeconfig --name crms-dev --region ap-south-1
kubectl get nodes # verify worker node is Ready
kubectl get namespaces # default, kube-system, kube-public, kube-node-lease
kubectl cluster-info # control plane endpoint
terraform destroy -auto-approve # destroy after testing
```

## Milestone
kubectl get nodes output:
ip-10-0-x-x.ap-south-1.compute.internal Ready v1.31.14-eks-3385e9b

## What we learned
- How EKS control plane + worker nodes relate
- Why IAM roles are required for both control plane and nodes
- t3.small vs t3.medium trade-offs for cost vs capacity
- aws eks update-kubeconfig wires kubectl to the cluster
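
The connect-and-verify steps can be combined so that a script stops if the node never becomes Ready. A minimal sketch; the function name and the 300-second timeout are illustrative choices, not values from these notes.

```bash
# Hypothetical helper: connect kubectl to the cluster and block until nodes are Ready.
connect_and_wait() {
  cluster="$1"
  region="$2"
  aws eks update-kubeconfig --name "$cluster" --region "$region" || return 1
  # Exits non-zero if any node is not Ready within the timeout.
  kubectl wait --for=condition=Ready nodes --all --timeout=300s
}

# Usage: connect_and_wait crms-dev ap-south-1
```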
47 changes: 47 additions & 0 deletions docs/week-09-notes.md
@@ -0,0 +1,47 @@
# Week 9 Notes — CRMS Deployed on Kubernetes

## What we built
- k8s/base/namespace.yaml: crms namespace
- k8s/base/configmap.yaml: non-secret app config
- k8s/base/secret.yaml: DATABASE_URL + SECRET_KEY
- k8s/base/postgres-deployment.yaml: PostgreSQL + ClusterIP service
- k8s/base/backend-deployment.yaml: FastAPI x2 replicas + health probes
- k8s/base/frontend-deployment.yaml: React/nginx x2 + LoadBalancer service
- k8s/base/hpa.yaml: HPA backend 2-10 pods, frontend 2-5 pods

## Critical fix applied
nginx upstream was "backend", but the cluster has no Service with that name, so cluster DNS cannot resolve it.
Fixed to: crms-backend-service.crms.svc.cluster.local:8000
Full Kubernetes DNS format: <service>.<namespace>.svc.cluster.local
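
The DNS format can be expressed as a small helper, which makes it harder to mistype the name in nginx config. A sketch that assumes the default cluster domain `cluster.local`; the function name is illustrative.

```bash
# Build the cluster-internal DNS name for a Service.
# Assumes the default cluster domain "cluster.local".
service_fqdn() {
  service="$1"
  namespace="$2"
  printf '%s.%s.svc.cluster.local\n' "$service" "$namespace"
}

service_fqdn crms-backend-service crms
```

To confirm that the name resolves inside the cluster, a throwaway pod can run a lookup, for example `kubectl run dns-test -n crms --rm -it --restart=Never --image=busybox:1.36 -- nslookup crms-backend-service.crms.svc.cluster.local` (the pod name and image are illustrative).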

## Commands used
```bash
kubectl apply -f k8s/base/namespace.yaml # always apply namespace first
kubectl apply -f k8s/base/
kubectl get pods -n crms -w
kubectl get service crms-frontend-service -n crms
kubectl logs <pod-name> -n crms # debug crashes
kubectl describe pod <pod-name> -n crms # diagnose pending/error
```

## Milestone
All 5 pods 1/1 Running:
- crms-backend x2
- crms-frontend x2
- crms-db x1

LoadBalancer URL live:
ade5ebcd818514ed69e44386d4b3ca72.elb.ap-south-1.amazonaws.com

## What we learned
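- Kubernetes DNS: a Service is reachable cluster-wide by its full DNS name; the bare Service name resolves only from inside the same namespace
- Namespace ordering matters: apply namespace before other resources
- HPA scales pods based on CPU/memory and needs the Kubernetes metrics server to supply those metrics
- LoadBalancer type service = AWS load balancer provisioned automatically (NLB here; without the NLB annotation or the AWS Load Balancer Controller, the default is a Classic Load Balancer)
- imagePullPolicy: Always makes the kubelet check GHCR for the tag on every container start, so a re-pushed tag is picked up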
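
Since the HPA depends on the metrics server, it is worth confirming that metrics actually reach it. The sketch below installs the upstream metrics-server manifest and then lists the HPAs. Whether metrics-server was already present in this cluster is not stated in the notes, so the install step is an assumption, and the function name is illustrative.

```bash
# Hypothetical helper: make sure metrics-server is running, then inspect the HPAs.
check_hpa_metrics() {
  namespace="$1"
  # Upstream metrics-server manifest; pin a specific release for real use.
  kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml || return 1
  # Wait until the metrics-server Deployment has rolled out.
  kubectl rollout status deployment/metrics-server -n kube-system --timeout=120s || return 1
  # The TARGETS column shows <unknown> while metrics are still missing.
  kubectl get hpa -n "$namespace"
}

# Usage: check_hpa_metrics crms
```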

## Issues and fixes
- CrashLoopBackOff: nginx couldn't resolve "backend" hostname
  Fix: use full K8s DNS crms-backend-service.crms.svc.cluster.local
- Namespace not found on first apply: applied namespace.yaml separately first
- GHCR images must be public for EKS to pull without auth
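
If the GHCR images need to stay private, the alternative to making them public is a registry credential in the namespace. A sketch under stated assumptions: the secret name `ghcr-pull` and the function name are hypothetical, and the token is assumed to have the `read:packages` scope.

```bash
# Hypothetical helper: create a registry credential so pods can pull private GHCR images.
create_ghcr_pull_secret() {
  namespace="$1"
  github_user="$2"
  github_token="$3"   # assumed to be a token with read:packages scope
  kubectl create secret docker-registry ghcr-pull \
    --namespace "$namespace" \
    --docker-server=ghcr.io \
    --docker-username="$github_user" \
    --docker-password="$github_token"
}

# Usage: create_ghcr_pull_secret crms <github-user> "$GHCR_TOKEN"
```

The Deployment's pod spec then has to reference the secret under `imagePullSecrets` for the kubelet to use it.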
53 changes: 53 additions & 0 deletions docs/week-11-notes.md
@@ -0,0 +1,53 @@
# Week 11 Notes — Prometheus + Grafana Observability

## What we built
- backend/app/main.py: prometheus-fastapi-instrumentator added
  - Exposes /metrics endpoint automatically
  - Tracks: request count, latency histograms, in-progress requests
- observability/prometheus-values.yaml: kube-prometheus-stack Helm values
  - Prometheus retention: 24h
  - Scrapes crms-backend:8000/metrics
  - Grafana enabled with admin password
- observability/grafana-dashboards/crms-dashboard.json:
  - Panel 1: API request rate (requests/sec)
  - Panel 2: p99 API response latency
  - Panel 3: Pod CPU usage by pod name
  - Panel 4: Pod memory usage by pod name
- observability/prometheus-rules.yaml: 3 alert rules
  - CRMSHighErrorRate: fires if 5xx rate > 10% for 2m
  - CRMSHighLatency: fires if p99 > 1s for 5m
  - CRMSPodDown: fires if backend replicas < 1

## Deploy monitoring stack
```bash
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
kubectl create namespace monitoring
helm upgrade --install kube-prometheus-stack \
  prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --values observability/prometheus-values.yaml
kubectl port-forward svc/kube-prometheus-stack-grafana -n monitoring 3000:80
```

## Access Grafana
URL: http://localhost:3000
Username: admin
Password: crms-grafana-admin

## Key metrics exposed by /metrics
- http_requests_total: total requests by method, handler (route path), status
- http_request_duration_seconds: latency histogram
- http_requests_inprogress: concurrent requests
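
These metrics can be queried directly through the Prometheus HTTP API. The sketch below shows one plausible PromQL expression for the error rate and one for p99 latency; they are illustrations built from the metric names above, not necessarily the expressions in prometheus-rules.yaml. It assumes Prometheus is port-forwarded to localhost:9090, and the function names are hypothetical.

```bash
# PromQL for the fraction of requests answered with a 5xx status over 5 minutes.
# The regex 5.. matches exact codes (500) as well as grouped labels (5xx).
error_rate_query() {
  printf '%s' 'sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))'
}

# PromQL for p99 latency computed from the histogram buckets.
p99_latency_query() {
  printf '%s' 'histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))'
}

# Run an instant query against the Prometheus HTTP API.
# $1 = Prometheus base URL, $2 = PromQL expression
prom_query() {
  curl -sG "$1/api/v1/query" --data-urlencode "query=$2"
}

# Usage: prom_query http://localhost:9090 "$(error_rate_query)"
```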

## What we learned
- How Prometheus scrapes metrics via pull model
- PromQL: rate(), histogram_quantile(), kube_ metrics
- Grafana dashboards: panels, data sources, time ranges
- Alert rules: expr, for, severity, annotations
- Why observability matters: you can't fix what you can't measure

## Note on resources
kube-prometheus-stack needs at least a t3.medium (4 GB RAM) node to run
alongside CRMS application pods. A t3.small (2 GB RAM) runs out of memory.
Production: use t3.medium or larger nodes.
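
A quick way to tell whether a node is too small is to look for Pending pods and at the node's allocated resources. A minimal sketch; the function name is illustrative and the number of context lines passed to `grep` is an arbitrary choice.

```bash
# Hypothetical helper: show scheduling pressure after installing the monitoring stack.
show_capacity_pressure() {
  # Pods stuck in Pending often mean no node has enough free CPU or memory.
  kubectl get pods --all-namespaces --field-selector=status.phase=Pending || return 1
  # Per-node summary of requested versus allocatable resources.
  kubectl describe nodes | grep -A 8 'Allocated resources'
}

# Usage: show_capacity_pressure
```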