A production-grade DevOps implementation of a 3-tier microservices e-commerce backend.
| Layer | Tool |
|---|---|
| Source Control | GitHub |
| CI/CD | GitHub Actions |
| Containers | Docker (multi-stage builds) |
| Registry | AWS ECR |
| Infrastructure | Terraform |
| Orchestration | AWS EKS (Kubernetes 1.32) |
| Database | AWS RDS PostgreSQL 16 |
| Security | Trivy image scanning + Snyk |
| Secrets | Kubernetes Secrets |
- User Service — Node.js + Express (port 3000)
- Product Service — Python + FastAPI (port 8000)
- API Gateway — Nginx reverse proxy (port 80)
```
ecommerce-devops/
├── user-service/        # Node.js API
├── product-service/     # Python FastAPI
├── gateway/             # Nginx config
├── infra/
│   ├── terraform/       # IaC - VPC, EKS, RDS
│   └── k8s/             # Kubernetes manifests
├── .github/workflows/   # CI/CD pipelines
└── docs/                # Architecture diagram
```
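The multi-stage builds called out in the stack table keep runtime images small. A minimal sketch of the pattern for the Node user-service (the base image tag and the `server.js` entrypoint are assumptions, not the repo's actual files):

```dockerfile
# Build stage: full install, so dev dependencies are available for tests/build
FROM node:20-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm prune --omit=dev   # drop dev dependencies before the final copy

# Runtime stage: only what the app needs to run
FROM node:20-alpine
WORKDIR /app
ENV NODE_ENV=production
COPY --from=build /app ./
EXPOSE 3000
CMD ["node", "server.js"]
```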
Every push to main triggers:
- Lint & test both services in parallel
- Trivy security scan — blocks on critical CVEs
- Build Docker images with git SHA tag
- Push to AWS ECR
- Rolling deploy to EKS
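A minimal sketch of how those stages can map onto a workflow file (job layout, secret names, region, and cluster name are assumptions, not the repo's actual pipeline):

```yaml
name: ci-cd
on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        service: [user-service, product-service]   # lint & test in parallel
    steps:
      - uses: actions/checkout@v4
      - name: Lint and test
        working-directory: ${{ matrix.service }}
        run: |
          # placeholder: the real repo may call npm test / pytest directly
          echo "lint + test for ${{ matrix.service }}"

  build-and-deploy:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image tagged with the git SHA
        run: docker build -t user-service:${{ github.sha }} user-service/
      - name: Trivy scan, blocking on critical CVEs
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: user-service:${{ github.sha }}
          severity: CRITICAL
          exit-code: '1'                 # non-zero exit fails the job
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
      - uses: aws-actions/amazon-ecr-login@v2
        id: ecr
      - name: Push to ECR and roll out to EKS
        run: |
          IMG=${{ steps.ecr.outputs.registry }}/user-service:${{ github.sha }}
          docker tag user-service:${{ github.sha }} "$IMG"
          docker push "$IMG"
          aws eks update-kubeconfig --name ecommerce-dev   # assumed cluster name
          kubectl set image deployment/user-service user-service="$IMG"
```

The `kubectl set image` call is what triggers the rolling deploy; adding a `kubectl rollout status` gate afterward would surface failed rollouts in CI.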
Run locally:

```bash
docker compose up --build
curl http://localhost/health
curl http://localhost/users
curl http://localhost/products
```

Provision the infrastructure:

```bash
cd infra/terraform/environments/dev
terraform init
terraform plan -var="db_password=YOUR_PASSWORD" -var="db_username=dbadmin"
terraform apply -var="db_password=YOUR_PASSWORD" -var="db_username=dbadmin"
```

Deploy to the cluster:

```bash
kubectl apply -f infra/k8s/configmaps/
kubectl apply -f infra/k8s/secrets/
kubectl apply -f infra/k8s/gateway/
kubectl apply -f infra/k8s/user-service/
kubectl apply -f infra/k8s/product-service/
```

Verify:

```bash
kubectl get pods
kubectl get svc gateway
curl http://GATEWAY_URL/health
```

Challenges encountered and what they teach
- **Docker DNS on VirtualBox.** The VM couldn't pull images from Docker Hub; fixed by setting `"dns": ["8.8.8.8"]` in `/etc/docker/daemon.json`. What it teaches: networking in containerized environments isn't automatic. DNS resolution differs between host, container, and Kubernetes — understanding the layers matters.
- **Nginx upstream DNS resolution.** The gateway crashed because Nginx resolves upstreams at startup — if a service isn't ready yet, Nginx fails hard. Fixed with `resolver 127.0.0.11` and dynamic `$upstream` variables (see the config sketch after this list). What it teaches: service startup order and dependency management is a real production problem; Kubernetes readiness probes exist for exactly this reason.
- **EKS version + AMI compatibility.** The Kubernetes 1.29 AMI was unsupported, and skipping versions (1.29 → 1.32) also failed; it took a destroy and a fresh apply at 1.32. What it teaches: cloud managed services deprecate versions fast. Pinning versions in IaC and staying current is operational discipline.
- **Terraform single-line variable syntax.** `variable "x" { type = string default = "value" }` fails — HCL requires each argument in a block to end with a newline (see the HCL sketch after this list). What it teaches: IaC has strict syntax rules; small formatting errors break entire deployments, and code review plus `terraform validate` catch them before apply.
- **t3.micro pod limits on EKS.** AWS limits pods per node based on network interfaces: with the VPC CNI, max pods = ENIs * (IPs per ENI - 1) + 2, so a t3.micro (2 ENIs with 2 IPs each) maxes out at 4 pods. System pods consumed most of those slots, leaving no room for the application services. What it teaches: instance sizing in Kubernetes isn't just about CPU/RAM — ENI limits are a real constraint, and a common interview question.
- **PostgreSQL reserved username.** `admin` is reserved in PostgreSQL, so RDS rejected it; changed to `dbadmin`. What it teaches: read the docs before naming things. Reserved words in databases cause subtle failures that aren't obvious until runtime.
- **Rolling deployment deadlock.** `maxUnavailable: 0` plus a node at capacity means new pods can't start and old pods won't terminate: a deadlock (see the manifest sketch after this list). What it teaches: rolling update strategy must be tuned to your cluster capacity. In production, always have headroom — at least 1 spare pod slot per node.
- **VM clock drift breaking AWS signatures.** AWS request signing uses timestamps, so a drifted VM clock caused `SignatureDoesNotMatch` errors across Terraform and kubectl. What it teaches: time synchronization is infrastructure. NTP misconfiguration breaks security-sensitive APIs silently.

Other issues debugged along the way:

- Resolved a missing ConfigMap that caused pod startup failures
- Debugged an AWS VPC CNI IP-exhaustion problem
- Fixed a 502 Bad Gateway caused by an Nginx misconfiguration
- Handled Kubernetes scheduling limits due to node capacity
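The Nginx fix from the gateway challenge, sketched as config. `127.0.0.11` is Docker's embedded DNS (as used in the compose setup); hostnames and the variable names are illustrative:

```nginx
# Re-resolve upstream names at request time instead of once at startup,
# so the gateway no longer crashes when a service isn't up yet.
resolver 127.0.0.11 valid=10s;

server {
    listen 80;

    location /users {
        # A variable forces Nginx to resolve the name per request
        set $upstream_users http://user-service:3000;
        proxy_pass $upstream_users;
    }

    location /products {
        set $upstream_products http://product-service:8000;
        proxy_pass $upstream_products;
    }
}
```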
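The Terraform syntax pitfall, side by side (the variable name is illustrative):

```hcl
# Fails: HCL requires each argument in a block to end with a newline
variable "db_username" { type = string default = "dbadmin" }

# Works: one argument per line
variable "db_username" {
  type    = string
  default = "dbadmin"
}
```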
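And the rolling-update deadlock, with strategy settings that avoid it. A sketch only — replica count, labels, and the image tag are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # let one old pod stop first, freeing a node slot
      maxSurge: 1         # maxUnavailable: 0 is safe only with spare capacity
  selector:
    matchLabels:
      app: user-service
  template:
    metadata:
      labels:
        app: user-service
    spec:
      containers:
        - name: user-service
          image: user-service:latest   # placeholder tag
          ports:
            - containerPort: 3000
```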
Known limitations

Microservices Architecture
- User Service and Product Service share a single RDS instance — each service should own its own schema in a production system
- No async communication between services — order processing would require an event bus (SQS/SNS)
- Only 2 services implemented — a real e-commerce backend needs Order, Cart, Payment, and Notification services
Security
- Kubernetes Secrets are base64-encoded, not encrypted at rest — production should use AWS Secrets Manager with IRSA (IAM Roles for Service Accounts)
- No TLS termination — production needs AWS Load Balancer Controller with cert-manager
Observability
- No metrics stack (Prometheus/Grafana) — skipped due to lab cost constraints
- No distributed tracing (AWS X-Ray or Jaeger)
- No centralized logging (ELK or Loki)
Networking
- Raw Nginx pod used as gateway — production should use NGINX Ingress Controller or AWS ALB Ingress Controller
- No Network Policies restricting service-to-service traffic
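A minimal example of the kind of Network Policy that's missing, restricting product-service ingress to the gateway (labels are assumptions matching the service list above):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: product-service-ingress
spec:
  podSelector:
    matchLabels:
      app: product-service    # policy applies to product-service pods
  policyTypes: [Ingress]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: gateway    # only the gateway may connect
      ports:
        - protocol: TCP
          port: 8000
```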
Planned improvements

- Split RDS into per-service schemas (`user_db`, `product_db`)
- Add SQS event bus between services for async order processing
- Replace Kubernetes Secrets with AWS Secrets Manager + IRSA (sketched after this list)
- Add Prometheus + Grafana monitoring stack
- Implement NGINX Ingress Controller with TLS
- Add Order Service and Cart Service
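For the Secrets Manager + IRSA item, the Kubernetes side is a service account annotated with an IAM role that is allowed to read the secret (the account ID and role name are placeholders):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: user-service
  annotations:
    # IAM role with secretsmanager:GetSecretValue permission,
    # trusted via the cluster's OIDC provider
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/user-service-secrets
```

The deployment then sets `serviceAccountName: user-service`, and the app fetches secrets through the AWS SDK instead of mounting base64-encoded Kubernetes Secrets.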