A comprehensive, structured guide to mastering AWS DevOps practices with hands-on projects, free resources, and real-world examples.
- Introduction
- Prerequisites
- Learning Path
- 1. AWS Fundamentals
- 2. IAM - Identity and Access Management
- 3. EC2 - Elastic Compute Cloud
- 4. VPC - Virtual Private Cloud
- 5. AWS Security Best Practices
- 6. Route 53 - DNS Management
- 7. S3 - Simple Storage Service
- 8. AWS CLI and Automation
- 9. CloudFormation - Infrastructure as Code
- 10. Terraform on AWS
- 11. AWS Developer Tools - CodeCommit, CodeBuild, CodeDeploy
- 12. AWS CodePipeline - CI/CD Automation
- 13. CloudWatch - Monitoring and Logging
- 14. Lambda - Serverless Computing
- 15. EventBridge - Event-Driven Architecture
- 16. CloudFront - Content Delivery Network
- 17. ECR - Elastic Container Registry
- 18. ECS - Elastic Container Service
- 19. EKS - Elastic Kubernetes Service
- 20. RDS - Relational Database Service
- 21. Systems Manager and Secrets Manager
- 22. Elastic Load Balancer
- 23. AWS Cost Optimization
- 24. CloudTrail and Config - Compliance
- 25. AWS Migration Strategies
- Hands-On Projects
- AWS DevOps Tools Comparison
- Recommended Resources
- Interview Preparation
- Contributing
- License
AWS (Amazon Web Services) is one of the most widely adopted cloud platforms, used by millions of customers worldwide. This roadmap provides a structured, hands-on learning path designed specifically for DevOps engineers who want to master AWS.
- ☁️ Core AWS services for DevOps workflows
- 🔐 Security best practices and IAM management
- 🚀 CI/CD automation with AWS native tools
- 📦 Container orchestration with ECS and EKS
- 🏗️ Infrastructure as Code with CloudFormation and Terraform
- 📊 Monitoring, logging, and observability
- 💰 Cost optimization strategies
- 🎯 Real-world project implementations
- ✅ Structured Learning - Progressive difficulty from basics to advanced
- ✅ Hands-On Focus - Every section includes practical projects
- ✅ Free Resources - Prioritizes free learning materials
- ✅ Real-World Scenarios - Based on actual DevOps use cases
- ✅ Interview Ready - Includes interview questions and answers
- ✅ Cost-Conscious - Learn within AWS Free Tier limits
Before starting this roadmap, you should have:
- ✅ Basic understanding of Linux command line
- ✅ Familiarity with Git and version control
- ✅ Basic knowledge of networking concepts (IP, DNS, HTTP)
- ✅ Understanding of containerization (Docker basics)
- ✅ Programming/scripting knowledge (Python or Bash preferred)
New to these? Check out our general DevOps Roadmap first!
- Create AWS Account: aws.amazon.com
- Enable MFA on your root account (CRITICAL for security!)
- Set up billing alerts to avoid unexpected charges
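- Review the Free Tier terms - offers vary by service (for example 12-month, always-free, and short-term trial offers), and AWS revises them for new accounts, so check the current Free Tier page before launching resources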
- Install AWS CLI:
# macOS
brew install awscli
# Linux
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
# Verify installation
aws --version
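Some commands used later in this roadmap, such as `aws logs tail`, exist only in AWS CLI v2. A small guard like the one below can make your own scripts fail fast when an older CLI is on the PATH. It is a sketch that assumes `aws --version` prints a string beginning with `aws-cli/<major>.<minor>.<patch>`, as current releases do; the function names `aws_cli_major_version` and `require_aws_cli_v2` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: extract the major version from `aws --version`
# output and fail fast unless it is 2 or newer.

# Prints the major version from a string such as
# "aws-cli/2.15.30 Python/3.11.8 Linux/6.1.0 exe/x86_64".
aws_cli_major_version() {
  local first_word="${1%% *}"              # e.g. "aws-cli/2.15.30"
  local version="${first_word#aws-cli/}"   # e.g. "2.15.30"
  [[ "$version" != "$first_word" ]] || return 1   # no "aws-cli/" prefix
  local major="${version%%.*}"             # e.g. "2"
  [[ "$major" =~ ^[0-9]+$ ]] || return 1
  echo "$major"
}

# Returns non-zero with a message unless AWS CLI v2+ is installed.
require_aws_cli_v2() {
  local major
  if ! command -v aws >/dev/null 2>&1; then
    echo "AWS CLI not found on PATH" >&2
    return 1
  fi
  # Capture both streams in case the version is printed to stderr
  major="$(aws_cli_major_version "$(aws --version 2>&1)")" || {
    echo "Unrecognized 'aws --version' output" >&2
    return 1
  }
  if (( major < 2 )); then
    echo "AWS CLI v2 required, found v$major" >&2
    return 1
  fi
}
```

Calling `require_aws_cli_v2 || exit 1` near the top of a script stops it before any v2-only command runs.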
Estimated Time: 4-6 hours
- Cloud computing concepts (IaaS, PaaS, SaaS)
- AWS Global Infrastructure (Regions, Availability Zones, Edge Locations)
- AWS Management Console navigation
- AWS Free Tier and billing basics
- Core AWS services overview
- 📺 AWS Certified Cloud Practitioner - Full Course - freeCodeCamp
- 📖 AWS Getting Started Resource Center
- 📚 AWS Cloud Practitioner Essentials
- 🎮 AWS Educate - Free self-paced training and labs
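- Regions: Separate geographic areas, each made up of multiple Availability Zones
- Availability Zones (AZs): One or more discrete data centers within a Region, isolated from other AZs with independent power and networking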
- Edge Locations: CDN endpoints for CloudFront
- AWS Management Console: Web interface for AWS services
- AWS Free Tier: Limited free usage of selected services; the type and duration of each offer vary by service
# Configure AWS CLI
aws configure
# Enter your Access Key ID, Secret Access Key, region (us-east-1), output format (json)
# Test your configuration
aws sts get-caller-identity
# List available regions
aws ec2 describe-regions --output table
# Check your AWS account ID
aws sts get-caller-identity --query Account --output text
Estimated Time: 6-8 hours
- IAM users, groups, roles, and policies
- Principle of least privilege
- MFA (Multi-Factor Authentication) setup
- IAM best practices
- Access keys vs IAM roles
- Policy evaluation logic
- Cross-account access
- 📺 AWS IAM Tutorial for Beginners - Stephane Maarek
- 📖 IAM Best Practices - AWS Docs
- 🎯 IAM Policy Simulator
- 📚 AWS IAM Workshop
- Users: Individual identities with long-term credentials
- Groups: Collections of users with shared permissions
- Roles: Identities that users, AWS services, or federated users assume to obtain temporary credentials
- Policies: JSON documents defining permissions
- MFA: Additional security layer using time-based codes
- Access Keys: Programmatic access credentials (avoid when possible!)
- IAM Role for EC2: Best practice for granting EC2 instances AWS permissions
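For a single account with only identity-based policies, IAM policy evaluation reduces to three rules: an explicit Deny always wins, otherwise any matching Allow grants access, and otherwise the request is implicitly denied. The sketch below encodes just those rules. It deliberately ignores SCPs, permissions boundaries, session policies, and resource-based policies, which add further checks in real IAM evaluation. The function name `iam_decision` and its input format (one matching statement's Effect per argument) are illustrative.

```bash
#!/usr/bin/env bash
# Simplified IAM decision: takes the Effect ("Allow" or "Deny") of every
# statement that matches the request and prints the final decision.
iam_decision() {
  local effect saw_allow=false
  for effect in "$@"; do
    case "$effect" in
      Deny)  echo "ExplicitDeny"; return 0 ;;   # an explicit Deny overrides any Allow
      Allow) saw_allow=true ;;
      *)     echo "unknown effect: $effect" >&2; return 2 ;;
    esac
  done
  if [ "$saw_allow" = true ]; then
    echo "Allow"
  else
    echo "ImplicitDeny"   # nothing matched: everything is denied by default
  fi
}
```

For example, `iam_decision Allow Deny` prints `ExplicitDeny`, which is why a single Deny statement in any attached policy blocks an action that other policies allow.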
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject"
],
"Resource": "arn:aws:s3:::my-bucket/*",
"Condition": {
"IpAddress": {
"aws:SourceIp": "203.0.113.0/24"
}
}
}
]
}
Goal: Create a secure IAM structure for a development team
Steps:
- Create IAM groups (Admins, Developers, ReadOnly)
- Create IAM users and assign to groups
- Attach appropriate managed policies
- Create custom policy for S3 bucket access
- Enable MFA for all users
- Create an IAM role for EC2 instances
- Test permissions using IAM Policy Simulator
# Create an IAM group
aws iam create-group --group-name Developers
# Attach a policy to the group
aws iam attach-group-policy \
--group-name Developers \
--policy-arn arn:aws:iam::aws:policy/PowerUserAccess
# Create an IAM user
aws iam create-user --user-name john-developer
# Add user to group
aws iam add-user-to-group \
--user-name john-developer \
--group-name Developers
# Create custom policy
cat > developer-s3-policy.json << EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": ["s3:*"],
"Resource": "arn:aws:s3:::dev-bucket/*"
}
]
}
EOF
aws iam create-policy \
--policy-name DeveloperS3Access \
--policy-document file://developer-s3-policy.json
- ✅ Enable MFA on all user accounts
- ✅ Use IAM roles for EC2 instead of access keys
- ✅ Rotate credentials regularly
- ✅ Apply least privilege principle
- ✅ Use AWS Organizations for multi-account management
- ✅ Enable CloudTrail to log all IAM actions
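The developer S3 policy in the project grants `s3:*`, which is broader than most developers need. Because its Resource is `dev-bucket/*`, it also only covers object-level actions: listing the bucket (`s3:ListBucket`) requires the bucket ARN itself. A tighter policy can be generated for any bucket name. This is a sketch; the function name `s3_bucket_policy` and the chosen action list are assumptions to adapt to your team.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print a least-privilege IAM policy for one bucket.
# Bucket-level actions (ListBucket) target the bucket ARN; object-level
# actions (Get/Put/DeleteObject) target the "bucket/*" ARN.
s3_bucket_policy() {
  local bucket="${1:?usage: s3_bucket_policy <bucket-name>}"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListBucket",
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::${bucket}"
    },
    {
      "Sid": "ReadWriteObjects",
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::${bucket}/*"
    }
  ]
}
EOF
}
```

Usage: `s3_bucket_policy dev-bucket > developer-s3-policy.json`, then run the same `aws iam create-policy` command as before.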
Estimated Time: 8-10 hours
- EC2 instance types and families
- Amazon Machine Images (AMIs)
- Security Groups and Network ACLs
- SSH key pairs and instance access
- User Data scripts for automation
- EC2 pricing models (On-Demand, Reserved Instances, Savings Plans, Spot)
- Auto Scaling Groups
- Elastic IP addresses
- Instance metadata and user data
- 📺 AWS EC2 Tutorial - TechWorld with Nana
- 📖 EC2 User Guide
- 🎮 EC2 Hands-On Labs
- 📺 EC2 Instance Types Explained
| Type | Use Case | Example |
|---|---|---|
| t3, t4g | General purpose, burstable | Web servers, dev environments |
| m5, m6i | Balanced compute/memory | Application servers |
| c5, c6i | Compute optimized | High-performance computing |
| r5, r6i | Memory optimized | Databases, caching |
| p3, p4 | GPU instances | Machine learning, rendering |
- AMI (Amazon Machine Image): Template for EC2 instances
- Instance Type: vCPU, memory, and network capacity
- Security Group: Virtual firewall for EC2 instances
- Key Pair: SSH authentication for Linux instances
- User Data: Script executed at instance launch
- Elastic IP: Static public IP address
- Placement Groups: Control how instances are placed (cluster for low latency, spread or partition for fault isolation)
Goal: Launch an EC2 instance and install Jenkins for CI/CD
Steps:
- Launch EC2 Instance:
# Create key pair
aws ec2 create-key-pair \
--key-name jenkins-key \
--query 'KeyMaterial' \
--output text > jenkins-key.pem
chmod 400 jenkins-key.pem
# Create security group (in the default VPC)
aws ec2 create-security-group \
--group-name jenkins-sg \
--description "Security group for Jenkins"
# Allow SSH (port 22) and Jenkins (port 8080)
# 0.0.0.0/0 opens these ports to the whole internet: acceptable for a short-lived lab, otherwise use your own IP as x.x.x.x/32
aws ec2 authorize-security-group-ingress \
--group-name jenkins-sg \
--protocol tcp --port 22 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress \
--group-name jenkins-sg \
--protocol tcp --port 8080 --cidr 0.0.0.0/0
- User Data Script (save as jenkins-install.sh; installs Jenkins automatically on Amazon Linux 2023):
#!/bin/bash
# Update system
yum update -y
# Install Java 21 (current Jenkins releases require Java 17 or newer) and fontconfig
yum install -y fontconfig java-21-amazon-corretto
# Add Jenkins repository and signing key (the key file is rotated periodically; check pkg.jenkins.io)
curl -fsSL -o /etc/yum.repos.d/jenkins.repo https://pkg.jenkins.io/redhat-stable/jenkins.repo
rpm --import https://pkg.jenkins.io/redhat-stable/jenkins.io-2023.key
# Install Jenkins
yum install -y jenkins
# Start Jenkins now and on every boot
systemctl enable --now jenkins
# Wait (up to ~5 minutes) for Jenkins to generate the initial admin password
for _ in $(seq 1 30); do
[ -f /var/lib/jenkins/secrets/initialAdminPassword ] && break
sleep 10
done
# Save the initial admin password
echo "Jenkins Initial Password:" > /tmp/jenkins-init.txt
cat /var/lib/jenkins/secrets/initialAdminPassword >> /tmp/jenkins-init.txt
- Launch Instance (the AMI is resolved from the public SSM parameter for the latest Amazon Linux 2023 image, since AMI IDs differ per region and change over time):
aws ec2 run-instances \
--image-id resolve:ssm:/aws/service/ami-amazon-linux-latest/al2023-ami-kernel-default-x86_64 \
--count 1 \
--instance-type t2.micro \
--key-name jenkins-key \
--security-groups jenkins-sg \
--user-data file://jenkins-install.sh \
--tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=Jenkins-Server}]'
- Access Jenkins:
- Get public IP:
aws ec2 describe-instances --filters "Name=tag:Name,Values=Jenkins-Server" "Name=instance-state-name,Values=running" --query 'Reservations[].Instances[].PublicIpAddress' --output text
- Open browser: http://<PUBLIC_IP>:8080
- SSH to get password: ssh -i jenkins-key.pem ec2-user@<PUBLIC_IP>
- Run: sudo cat /var/lib/jenkins/secrets/initialAdminPassword
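Opening SSH and port 8080 to 0.0.0.0/0 is convenient for a lab, but it exposes the instance to the entire internet. A common alternative is to allow only your current public IP as a /32 CIDR. The sketch below validates an IPv4 address and formats it that way. The helper name `ipv4_to_cidr32` is hypothetical; fetching the address from `https://checkip.amazonaws.com` (an AWS endpoint that returns the caller's public IP) is shown in the usage note.

```bash
#!/usr/bin/env bash
# Hypothetical helper: validate a dotted-quad IPv4 address and print it as a /32 CIDR.
ipv4_to_cidr32() {
  local ip="$1" octet
  local -a parts
  # Require exactly four dot-separated numeric fields
  [[ "$ip" =~ ^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}$ ]] || return 1
  IFS='.' read -r -a parts <<< "$ip"
  for octet in "${parts[@]}"; do
    # 10# forces base 10 so values like "08" are not parsed as octal
    (( 10#$octet <= 255 )) || return 1
  done
  echo "${ip}/32"
}
```

Usage: `MY_IP=$(curl -s https://checkip.amazonaws.com)`, then pass `--cidr "$(ipv4_to_cidr32 "$MY_IP")"` to `aws ec2 authorize-security-group-ingress` instead of `--cidr 0.0.0.0/0`. Remember to update the rule if your IP changes.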
- Use current-generation burstable instances such as t3, or Graviton-based (ARM) t4g, for cost savings
- Leverage Spot Instances for non-critical workloads (up to 90% savings)
- Enable Auto Scaling to match capacity with demand
- Use Reserved Instances or Savings Plans for predictable workloads (up to 72% savings)
- Set up CloudWatch alarms to stop idle instances
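The "up to X% savings" figures are easiest to compare as a monthly cost. The sketch below multiplies an hourly rate by 730 hours (365 × 24 / 12, the usual monthly approximation) and applies a discount percentage. The function name `monthly_cost` is hypothetical, and the rates you pass in are your own inputs: look up real prices for your region and instance type, and keep in mind that Spot discounts fluctuate.

```bash
#!/usr/bin/env bash
# Hypothetical helper: estimate monthly cost for an hourly rate after a discount.
# Usage: monthly_cost <hourly_rate_usd> [discount_percent]
HOURS_PER_MONTH=730   # 365 days * 24 hours / 12 months

monthly_cost() {
  local rate="$1" discount="${2:-0}"
  # awk handles the floating-point math that Bash arithmetic cannot
  awk -v r="$rate" -v d="$discount" -v h="$HOURS_PER_MONTH" \
    'BEGIN { printf "%.2f\n", r * h * (100 - d) / 100 }'
}
```

For an illustrative $0.10/hour instance, `monthly_cost 0.10` gives 73.00 at On-Demand rates, while `monthly_cost 0.10 72` gives 20.44 at the maximum Reserved/Savings Plans discount.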
Estimated Time: 10-12 hours
- VPC fundamentals and CIDR blocks
- Subnets (public vs private)
- Internet Gateway and NAT Gateway
- Route Tables and routing
- Security Groups vs Network ACLs
- VPC Peering and Transit Gateway
- VPC Endpoints for private AWS service access
- VPN and Direct Connect
- 📺 AWS VPC Beginner to Pro - Stephane Maarek
- 📖 VPC User Guide
- 🎮 VPC Hands-On Workshop
- 📚 VPC Design Best Practices
┌──────────────── VPC (10.0.0.0/16) ─────────────────┐
│                 [Internet Gateway]                 │
│                                                    │
│  ┌─────── AZ 1 ───────┐    ┌─────── AZ 2 ───────┐  │
│  │  Public Subnet     │    │  Public Subnet     │  │
│  │  10.0.1.0/24       │    │  10.0.2.0/24       │  │
│  │  [NAT Gateway, ALB]│    │  [NAT Gateway, ALB]│  │
│  ├────────────────────┤    ├────────────────────┤  │
│  │  Private Subnet    │    │  Private Subnet    │  │
│  │  10.0.11.0/24      │    │  10.0.12.0/24      │  │
│  │  [App servers]     │    │  [App servers]     │  │
│  ├────────────────────┤    ├────────────────────┤  │
│  │  Database Subnet   │    │  Database Subnet   │  │
│  │  10.0.21.0/24      │    │  10.0.22.0/24      │  │
│  │  [RDS, ElastiCache]│    │  [RDS, ElastiCache]│  │
│  └────────────────────┘    └────────────────────┘  │
│                                                    │
└────────────────────────────────────────────────────┘
- CIDR Block: IP address range (e.g., 10.0.0.0/16 = 65,536 IPs)
- Public Subnet: Has route to Internet Gateway
- Private Subnet: Uses NAT Gateway for outbound internet
- Internet Gateway: Allows internet access for public subnets
- NAT Gateway: Enables private subnets to access internet (one-way)
- Route Table: Controls traffic routing within VPC
- Security Group: Stateful firewall at instance level
- Network ACL: Stateless firewall at subnet level
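The "10.0.0.0/16 = 65,536 IPs" figure comes from 2^(32 − prefix length). Inside each subnet, AWS reserves five addresses: the network address, the VPC router, the DNS server, one reserved for future use, and the broadcast address. That leaves 251 usable addresses in a /24. A small Bash sketch of this arithmetic (the function names are illustrative):

```bash
#!/usr/bin/env bash
# Total addresses in an IPv4 CIDR block: 2^(32 - prefix)
cidr_total_ips() {
  local prefix="$1"
  echo $(( 1 << (32 - prefix) ))
}

# Usable addresses in an AWS subnet: total minus the 5 addresses AWS reserves
# (network, VPC router, DNS, future use, broadcast)
aws_subnet_usable_ips() {
  local prefix="$1"
  # IPv4 subnet sizes in a VPC range from /16 to /28
  if (( prefix < 16 || prefix > 28 )); then
    echo "AWS subnets must be between /16 and /28" >&2
    return 1
  fi
  echo $(( $(cidr_total_ips "$prefix") - 5 ))
}
```

For example, `cidr_total_ips 16` prints 65536 and `aws_subnet_usable_ips 28` prints 11. Because the smallest subnets lose a large share of their addresses, keep this in mind before carving a VPC into many tiny subnets.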
Goal: Design and deploy a secure, highly available VPC for a web application
Architecture:
- Web Tier: Public subnets with load balancer
- Application Tier: Private subnets with app servers
- Database Tier: Private subnets with RDS
# Create VPC
aws ec2 create-vpc --cidr-block 10.0.0.0/16 --tag-specifications 'ResourceType=vpc,Tags=[{Key=Name,Value=Production-VPC}]'
# Enable DNS hostnames
aws ec2 modify-vpc-attribute --vpc-id vpc-xxx --enable-dns-hostnames '{"Value":true}'
# Create Internet Gateway
aws ec2 create-internet-gateway --tag-specifications 'ResourceType=internet-gateway,Tags=[{Key=Name,Value=Production-IGW}]'
# Attach IGW to VPC
aws ec2 attach-internet-gateway --vpc-id vpc-xxx --internet-gateway-id igw-xxx
# Create Public Subnets (2 AZs)
aws ec2 create-subnet --vpc-id vpc-xxx --cidr-block 10.0.1.0/24 --availability-zone us-east-1a --tag-specifications 'ResourceType=subnet,Tags=[{Key=Name,Value=Public-Subnet-1A}]'
aws ec2 create-subnet --vpc-id vpc-xxx --cidr-block 10.0.2.0/24 --availability-zone us-east-1b --tag-specifications 'ResourceType=subnet,Tags=[{Key=Name,Value=Public-Subnet-1B}]'
# Create Private Subnets (Application)
aws ec2 create-subnet --vpc-id vpc-xxx --cidr-block 10.0.11.0/24 --availability-zone us-east-1a --tag-specifications 'ResourceType=subnet,Tags=[{Key=Name,Value=Private-App-Subnet-1A}]'
aws ec2 create-subnet --vpc-id vpc-xxx --cidr-block 10.0.12.0/24 --availability-zone us-east-1b --tag-specifications 'ResourceType=subnet,Tags=[{Key=Name,Value=Private-App-Subnet-1B}]'
# Create Private Subnets (Database)
aws ec2 create-subnet --vpc-id vpc-xxx --cidr-block 10.0.21.0/24 --availability-zone us-east-1a --tag-specifications 'ResourceType=subnet,Tags=[{Key=Name,Value=Private-DB-Subnet-1A}]'
aws ec2 create-subnet --vpc-id vpc-xxx --cidr-block 10.0.22.0/24 --availability-zone us-east-1b --tag-specifications 'ResourceType=subnet,Tags=[{Key=Name,Value=Private-DB-Subnet-1B}]'
# Create NAT Gateway in a PUBLIC subnet (requires an Elastic IP)
aws ec2 allocate-address --domain vpc
aws ec2 create-nat-gateway --subnet-id subnet-xxx --allocation-id eipalloc-xxx
# Create Route Tables
aws ec2 create-route-table --vpc-id vpc-xxx --tag-specifications 'ResourceType=route-table,Tags=[{Key=Name,Value=Public-RT}]'
aws ec2 create-route-table --vpc-id vpc-xxx --tag-specifications 'ResourceType=route-table,Tags=[{Key=Name,Value=Private-RT}]'
# Add routes
aws ec2 create-route --route-table-id rtb-xxx --destination-cidr-block 0.0.0.0/0 --gateway-id igw-xxx
aws ec2 create-route --route-table-id rtb-yyy --destination-cidr-block 0.0.0.0/0 --nat-gateway-id nat-xxx
# Associate subnets with route tables (repeat for each public and private subnet)
aws ec2 associate-route-table --subnet-id subnet-public --route-table-id rtb-xxx
aws ec2 associate-route-table --subnet-id subnet-private --route-table-id rtb-yyy
- ✅ Use multiple AZs for high availability
- ✅ Separate tiers with private subnets
- ✅ Use NAT Gateway (not NAT Instance) for production
- ✅ Enable VPC Flow Logs for network monitoring
- ✅ Use Security Groups as primary firewall
- ✅ Implement Network ACLs as secondary layer
- ✅ Use VPC Endpoints to avoid internet traffic for AWS services
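The addressing plan in this project follows a simple pattern: the third octet is the tier times ten plus the AZ number (public 1 and 2, application 11 and 12, database 21 and 22). Encoding that rule once makes the `create-subnet` commands easier to script. A sketch, with hypothetical function names:

```bash
#!/usr/bin/env bash
# Hypothetical helper reproducing the plan above: third octet = tier * 10 + AZ number.
# Tiers: 0 = public, 1 = application, 2 = database; AZ numbers start at 1.
subnet_cidr() {
  local vpc_prefix="$1" tier="$2" az="$3"   # vpc_prefix like "10.0"
  echo "${vpc_prefix}.$(( tier * 10 + az )).0/24"
}

# Prints "<tier-name> <az> <cidr>" for every tier/AZ combination.
print_subnet_plan() {
  local vpc_prefix="$1"; shift
  local azs=("$@")                 # e.g. us-east-1a us-east-1b
  local tiers=(public app db) tier i
  for tier in 0 1 2; do
    for i in "${!azs[@]}"; do
      echo "${tiers[$tier]} ${azs[$i]} $(subnet_cidr "$vpc_prefix" "$tier" $(( i + 1 )))"
    done
  done
}
```

`print_subnet_plan 10.0 us-east-1a us-east-1b` prints the six subnets from the diagram (for example `app us-east-1b 10.0.12.0/24`). Each line can then be fed into `aws ec2 create-subnet` with `--availability-zone` and `--cidr-block`.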
Estimated Time: 6-8 hours
- AWS Shared Responsibility Model
- Security Groups and NACLs
- Encryption at rest and in transit
- AWS KMS (Key Management Service)
- AWS Secrets Manager
- GuardDuty for threat detection
- AWS WAF (Web Application Firewall)
- Security Hub for compliance
- 📺 AWS Security Fundamentals
- 📖 AWS Security Best Practices
- 📚 AWS Security Workshops
- 🎮 AWS Well-Architected Security Pillar
┌──────────────────────────────────────────┐
│ Customer Responsibility │
│ • Data │
│ • IAM │
│ • Application Security │
│ • OS Patching │
│ • Network Configuration │
│ • Firewall │
└──────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────┐
│ AWS Responsibility │
│ • Hardware │
│ • Global Infrastructure │
│ • Compute, Storage, Network │
│ • Regions, AZs, Edge Locations │
│ • Managed Services │
└──────────────────────────────────────────┘
- Encryption at Rest: Data encrypted when stored (EBS, S3, RDS)
- Encryption in Transit: Data encrypted during transfer (TLS/SSL)
- AWS KMS: Managed encryption key service
- Secrets Manager: Store and rotate credentials automatically
- GuardDuty: Threat detection that analyzes sources such as CloudTrail events, VPC Flow Logs, and DNS logs
- CloudTrail: API call logging and auditing
- AWS Config: Configuration compliance monitoring
Goal: Implement multiple security layers for a web application
Tasks:
- Enable encryption on all EBS volumes
- Set up AWS Secrets Manager for database credentials
- Configure Security Groups with least privilege
- Enable GuardDuty for threat detection
- Set up CloudTrail for audit logging
- Create CloudWatch alarms for security events
# Enable EBS encryption by default (a per-region setting; repeat for every region you use)
aws ec2 enable-ebs-encryption-by-default --region us-east-1
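# Create secret in Secrets Manager (generate the password instead of typing it, so it stays out of shell history)
DB_PASSWORD=$(aws secretsmanager get-random-password --password-length 32 --exclude-punctuation --query RandomPassword --output text)
aws secretsmanager create-secret \
--name prod/db/credentials \
--secret-string "{\"username\":\"admin\",\"password\":\"${DB_PASSWORD}\"}"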
# Enable GuardDuty
aws guardduty create-detector --enable
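# Create CloudTrail (the S3 bucket must already exist with a bucket policy that lets CloudTrail write to it)
aws cloudtrail create-trail \
--name security-trail \
--s3-bucket-name my-cloudtrail-bucket \
--is-multi-region-trail
aws cloudtrail start-logging --name security-trail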
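# Enable AWS Config (assumes the AWSServiceRoleForConfig service-linked role exists)
aws configservice put-configuration-recorder \
--configuration-recorder name=default,roleARN=arn:aws:iam::ACCOUNT_ID:role/aws-service-role/config.amazonaws.com/AWSServiceRoleForConfig
# The recorder cannot start until a delivery channel (an S3 bucket Config can write to) exists
aws configservice put-delivery-channel \
--delivery-channel name=default,s3BucketName=my-config-bucket
aws configservice start-configuration-recorder --configuration-recorder-name default
- MFA enabled on all user accounts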
- Root account not used for daily operations
- IAM roles used instead of access keys
- Encryption enabled for all data stores
- Security Groups follow least privilege
- CloudTrail enabled in all regions
- GuardDuty enabled for threat detection
- Regular security audits performed
- Automated patch management configured
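Part of the "Security Groups follow least privilege" check can be automated. The sketch below flags ingress rules that open an administrative port (22 for SSH, 3389 for RDP) to the whole internet. It reads whitespace-separated lines of `<from-port> <to-port> <cidr>`. The function name `find_open_admin_rules`, this input format, and the port list are all assumptions; the usage note shows one way to produce the input.

```bash
#!/usr/bin/env bash
# Hypothetical check: read "<from_port> <to_port> <cidr>" lines on stdin and
# report rules that expose SSH (22) or RDP (3389) to 0.0.0.0/0 or ::/0.
ADMIN_PORTS=(22 3389)

find_open_admin_rules() {
  local from to cidr port status=1
  while read -r from to cidr; do
    [[ -z "$from" ]] && continue
    # Only world-open sources are of interest here
    [[ "$cidr" == "0.0.0.0/0" || "$cidr" == "::/0" ]] || continue
    # "All traffic" rules have no ports; the CLI's text output shows them as None
    [[ "$from" == "None" ]] && from=0
    [[ "$to" == "None" ]] && to=65535
    for port in "${ADMIN_PORTS[@]}"; do
      # Port ranges such as 0-65535 also cover the admin port
      if (( from <= port && port <= to )); then
        echo "OPEN: port $port (rule $from-$to from $cidr)"
        status=0
      fi
    done
  done
  return "$status"   # 0 when at least one risky rule was found
}
```

One option is to pipe `aws ec2 describe-security-groups --query 'SecurityGroups[].IpPermissions[].[FromPort,ToPort,IpRanges[0].CidrIp]' --output text` into the function. That query only looks at the first IPv4 range of each rule, so treat it as a starting point rather than a full audit. AWS Config managed rules or Security Hub cover the same ground more thoroughly.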
Estimated Time: 4-6 hours
- Domain registration and management
- DNS record types (A, AAAA, CNAME, MX, TXT)
- Hosted zones (public vs private)
- Routing policies (Simple, Weighted, Latency, Failover, Geolocation)
- Health checks and monitoring
- Traffic flow and geoproximity routing
- Route 53 integration with other AWS services
- 📺 AWS Route 53 Tutorial - Stephane Maarek
- 📖 Route 53 Developer Guide
- 📚 DNS Fundamentals
- 🎮 Route 53 Hands-On Lab
| Record Type | Purpose | Example |
|---|---|---|
| A | IPv4 address | example.com → 192.0.2.1 |
| AAAA | IPv6 address | example.com → 2001:0db8::1 |
| CNAME | Alias for another domain name (not allowed at the zone apex) | www.example.com → example.com |
| MX | Mail server (with priority) | example.com → 10 mail.example.com |
| TXT | Text information | SPF, DKIM records |
| NS | Name server | Delegation to name servers |
| Alias (Route 53 extension) | Maps to an AWS resource, also at the zone apex | example.com → ELB |
- Simple: Single resource
- Weighted: Traffic distribution by percentage
- Latency: Route based on lowest latency
- Failover: Active-passive failover
- Geolocation: Route based on user location
- Geoproximity: Route based on resource and user location
- Multi-value: Return multiple IPs with health checks
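With weighted routing, Route 53 answers each record in proportion to its weight divided by the sum of the weights in the group. Weights are therefore relative, not percentages: the 90/10 Blue/Green example later in this section would behave the same with 9 and 1. If every weight is 0, Route 53 routes to all records with equal probability. The shares apply to DNS responses, so resolver caching means real traffic only approximates them. A quick sketch of the calculation (the function name is illustrative):

```bash
#!/usr/bin/env bash
# Print each weight's share of DNS responses: weight / sum(weights) * 100.
weighted_shares() {
  local w total=0 n=$#
  (( n > 0 )) || return 1
  for w in "$@"; do total=$(( total + w )); done
  for w in "$@"; do
    awk -v w="$w" -v t="$total" -v n="$n" 'BEGIN {
      # All-zero weights: responses are spread evenly across the records
      share = (t == 0) ? 100 / n : w * 100 / t
      printf "%d -> %.1f%%\n", w, share
    }'
  done
}
```

`weighted_shares 90 10` prints `90 -> 90.0%` and `10 -> 10.0%`. To shift traffic gradually during a deployment, update the weights with UPSERT changes rather than deleting and recreating records.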
Goal: Register a domain and configure DNS for a web application
# Create hosted zone
aws route53 create-hosted-zone \
--name example.com \
--caller-reference $(date +%s) \
--hosted-zone-config Comment="Production domain"
# Create A record pointing to EC2
cat > change-batch.json << EOF
{
"Changes": [{
"Action": "CREATE",
"ResourceRecordSet": {
"Name": "example.com",
"Type": "A",
"TTL": 300,
"ResourceRecords": [{"Value": "203.0.113.1"}]
}
}]
}
EOF
aws route53 change-resource-record-sets \
--hosted-zone-id Z1234567890ABC \
--change-batch file://change-batch.json
# Create health check
aws route53 create-health-check \
--health-check-config IPAddress=203.0.113.1,Port=80,Type=HTTP,ResourcePath=/health \
--caller-reference $(date +%s)
# Create weighted routing (Blue/Green deployment)
aws route53 change-resource-record-sets --hosted-zone-id Z1234567890ABC --change-batch '{
"Changes": [
{
"Action": "CREATE",
"ResourceRecordSet": {
"Name": "app.example.com",
"Type": "A",
"SetIdentifier": "Blue",
"Weight": 90,
"TTL": 60,
"ResourceRecords": [{"Value": "203.0.113.1"}]
}
},
{
"Action": "CREATE",
"ResourceRecordSet": {
"Name": "app.example.com",
"Type": "A",
"SetIdentifier": "Green",
"Weight": 10,
"TTL": 60,
"ResourceRecords": [{"Value": "203.0.113.2"}]
}
}
]
}'
Estimated Time: 6-8 hours
- S3 buckets and objects
- Storage classes (Standard, IA, Glacier, etc.)
- Versioning and lifecycle policies
- S3 encryption (SSE-S3, SSE-KMS, SSE-C)
- Bucket policies and ACLs
- S3 static website hosting
- Cross-region replication
- S3 Transfer Acceleration
- S3 event notifications
- 📺 AWS S3 Masterclass - Stephane Maarek
- 📖 S3 User Guide
- 🎮 S3 Hands-On Labs
- 📚 S3 Best Practices
| Class | Use Case | Designed Availability | Relative Cost |
|---|---|---|---|
| S3 Standard | Frequently accessed | 99.99% | $$$ |
| S3 Intelligent-Tiering | Unknown or changing access patterns | 99.9% | $$ (automatic tiering, small monitoring fee) |
| S3 Standard-IA | Infrequently accessed | 99.9% | $$ |
| S3 One Zone-IA | Infrequently accessed, reproducible data | 99.5% | $ |
| S3 Glacier Instant Retrieval | Archive, millisecond retrieval | 99.9% | $ |
| S3 Glacier Flexible Retrieval | Archive, retrieval in minutes to hours | 99.99% | ¢ |
| S3 Glacier Deep Archive | Long-term archive, retrieval within 12-48 hours | 99.99% | ¢ |
- Bucket: Container for objects (globally unique name)
- Object: File with metadata (max 5TB)
- Versioning: Keep multiple versions of objects
- Lifecycle Policy: Automate transitions between storage classes
- Encryption: Server-side or client-side
- Pre-signed URL: Temporary access to private objects
- S3 Select: Query data with SQL (no longer available to new customers since mid-2024; Amazon Athena is one alternative)
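Because bucket names are global and must follow naming rules, it helps to validate a generated name before calling `aws s3 mb`. The sketch below checks a subset of the documented rules: 3 to 63 characters; only lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit; no two adjacent dots; and not formatted like an IPv4 address. Other rules, such as reserved prefixes and suffixes, are not covered. The function name `valid_bucket_name` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical pre-check for S3 bucket names (subset of the AWS naming rules).
valid_bucket_name() {
  local name="$1"
  # 3-63 chars, lowercase letters/digits/dots/hyphens, alphanumeric at both ends
  [[ "$name" =~ ^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$ ]] || return 1
  # No two adjacent periods
  [[ "$name" != *..* ]] || return 1
  # Must not look like an IPv4 address (e.g. 192.168.5.4)
  [[ ! "$name" =~ ^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$ ]] || return 1
}
```

A name that passes can still be taken by another account, so `aws s3 mb` may fail with `BucketAlreadyExists`. Appending a timestamp or your account ID makes collisions unlikely.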
Goal: Host a static website on S3 with automated deployments
# Create S3 bucket (names are global, so append a timestamp for uniqueness and reuse the variable below)
BUCKET="my-website-bucket-$(date +%s)"
aws s3 mb "s3://$BUCKET" --region us-east-1
# Enable static website hosting
aws s3 website "s3://$BUCKET" \
--index-document index.html \
--error-document error.html
# New buckets block public access by default; allow a public bucket policy on this bucket only
# (account-level Block Public Access settings must also permit it)
aws s3api put-public-access-block \
--bucket "$BUCKET" \
--public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=false,RestrictPublicBuckets=false
# Create bucket policy for public read (unquoted EOF so $BUCKET is expanded)
cat > bucket-policy.json << EOF
{
"Version": "2012-10-17",
"Statement": [{
"Sid": "PublicReadGetObject",
"Effect": "Allow",
"Principal": "*",
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::${BUCKET}/*"
}]
}
EOF
aws s3api put-bucket-policy \
--bucket "$BUCKET" \
--policy file://bucket-policy.json
# Enable versioning
aws s3api put-bucket-versioning \
--bucket "$BUCKET" \
--versioning-configuration Status=Enabled
# Create lifecycle policy (current versions to Standard-IA after 30 days, noncurrent versions to Glacier after 30 days)
cat > lifecycle.json << EOF
{
"Rules": [{
"ID": "MoveToIA",
"Filter": {"Prefix": ""},
"Status": "Enabled",
"Transitions": [{
"Days": 30,
"StorageClass": "STANDARD_IA"
}],
"NoncurrentVersionTransitions": [{
"NoncurrentDays": 30,
"StorageClass": "GLACIER"
}]
}]
}
EOF
aws s3api put-bucket-lifecycle-configuration \
--bucket "$BUCKET" \
--lifecycle-configuration file://lifecycle.json
# Upload website files
aws s3 sync ./website "s3://$BUCKET/" --delete
# Get website URL (us-east-1 endpoint format; some regions use s3-website.<region> instead)
echo "Website URL: http://${BUCKET}.s3-website-us-east-1.amazonaws.com"
- ✅ Enable versioning for critical data
- ✅ Use lifecycle policies to reduce costs
- ✅ Enable encryption by default
- ✅ Use S3 Intelligent-Tiering for unpredictable access
- ✅ Enable S3 access logging for auditing
- ✅ Use CloudFront for better performance
- ✅ Implement least privilege bucket policies
Estimated Time: 4-6 hours
- AWS CLI installation and configuration
- CLI profiles for multiple accounts
- Common CLI commands for all services
- Output formatting (JSON, table, text)
- Query and filter results with JMESPath
- CLI pagination and wait commands
- AWS CLI v2 features
- Scripting and automation with Bash/Python
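The AWS CLI already retries many throttling and transient errors on its own, and this behavior can be tuned with the `AWS_RETRY_MODE` and `AWS_MAX_ATTEMPTS` environment variables. Automation scripts sometimes still need to retry a whole step, for example while waiting for a newly created IAM role to become usable (IAM is eventually consistent). A minimal exponential backoff wrapper might look like this; the function name `retry` and its argument order are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command, retrying with exponential backoff.
# Usage: retry <max_attempts> <initial_delay_seconds> <command> [args...]
retry() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  until "$@"; do
    if (( attempt >= max_attempts )); then
      echo "retry: '$*' failed after $attempt attempts" >&2
      return 1
    fi
    echo "retry: attempt $attempt failed, sleeping ${delay}s" >&2
    sleep "$delay"
    attempt=$(( attempt + 1 ))
    delay=$(( delay * 2 ))   # double the wait after each failure
  done
}
```

For example, `retry 5 2 aws iam get-role --role-name my-new-role` tries up to five times, waiting 2, 4, 8, and 16 seconds between attempts. Only wrap idempotent commands this way, since a retried create call may fail or duplicate work.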
# Configuration
aws configure # Initial setup
aws configure list # Show current config
aws configure --profile prod # Configure named profile
aws sts get-caller-identity # Verify credentials
# EC2
aws ec2 describe-instances # List all instances
aws ec2 start-instances --instance-ids i-xxx # Start instance
aws ec2 stop-instances --instance-ids i-xxx # Stop instance
aws ec2 terminate-instances --instance-ids i-xxx # Terminate instance
# S3
aws s3 ls # List buckets
aws s3 ls s3://bucket-name # List objects
aws s3 cp file.txt s3://bucket/ # Upload file
aws s3 sync ./local s3://bucket/ # Sync directory
aws s3 rm s3://bucket/file.txt # Delete object
# IAM
aws iam list-users # List users
aws iam create-user --user-name john # Create user
aws iam attach-user-policy --user-name john --policy-arn xxx
# Lambda
aws lambda list-functions # List functions
aws lambda invoke --function-name my-func output.txt
# CloudFormation
aws cloudformation create-stack --stack-name my-stack --template-body file://template.yaml
aws cloudformation describe-stacks # List stacks
aws cloudformation delete-stack --stack-name my-stack
# Logs
aws logs tail /aws/lambda/my-function --follow # Tail logs
aws logs describe-log-groups # List log groups
# Query and Filter (JMESPath)
aws ec2 describe-instances --query 'Reservations[*].Instances[*].[InstanceId,State.Name,InstanceType]' --output table
aws s3api list-buckets --query "Buckets[?starts_with(Name, 'prod-')].Name" --output text
aws ec2 describe-instances --filters "Name=tag:Environment,Values=production" --query 'Reservations[*].Instances[*].[InstanceId,Tags[?Key==`Name`].Value|[0]]'
Goal: Create a Bash script to automate AWS resource management
#!/bin/bash
# aws-automation.sh - Manage AWS resources
# Requires AWS CLI v2 and GNU date (on macOS, install coreutils and use gdate)
set -euo pipefail
# Configuration (override with environment variables)
PROFILE="${AWS_PROFILE:-default}"
REGION="${AWS_REGION:-us-east-1}"
# Wrapper so every call uses the same profile and region
awscli() {
  aws --profile "$PROFILE" --region "$REGION" "$@"
}
# Function: List running EC2 instances
list_running_instances() {
  echo "=== Running EC2 Instances ==="
  awscli ec2 describe-instances \
    --filters "Name=instance-state-name,Values=running" \
    --query 'Reservations[*].Instances[*].[InstanceId,InstanceType,State.Name,Tags[?Key==`Name`].Value|[0]]' \
    --output table
}
# Function: Stop instances with specific tag
stop_dev_instances() {
  echo "=== Stopping Development Instances ==="
  INSTANCE_IDS=$(awscli ec2 describe-instances \
    --filters "Name=tag:Environment,Values=development" "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[].InstanceId' \
    --output text)
  if [ -n "$INSTANCE_IDS" ]; then
    # Unquoted on purpose: each ID must be passed as a separate argument
    awscli ec2 stop-instances --instance-ids $INSTANCE_IDS
    echo "Stopped instances: $INSTANCE_IDS"
  else
    echo "No running development instances found"
  fi
}
# Function: Create backup of S3 bucket
backup_s3_bucket() {
  BUCKET="${1:?Usage: $0 backup <bucket>}"
  BACKUP_BUCKET="${BUCKET}-backup-$(date +%Y%m%d)"
  echo "=== Backing up $BUCKET to $BACKUP_BUCKET ==="
  # Fails if the backup bucket already exists (e.g., a second run on the same day)
  awscli s3 mb "s3://$BACKUP_BUCKET"
  awscli s3 sync "s3://$BUCKET" "s3://$BACKUP_BUCKET" --delete
  echo "Backup completed"
}
# Function: Generate cost report for the previous calendar month
cost_report() {
  echo "=== Monthly Cost Report ==="
  # Start from the 1st of this month so "-1 month" cannot overflow (e.g., on March 31)
  START_DATE=$(date -d "$(date +%Y-%m-01) -1 month" +%Y-%m-%d)
  END_DATE=$(date +%Y-%m-01)
  # Cost Explorer is served from us-east-1
  aws --profile "$PROFILE" --region us-east-1 ce get-cost-and-usage \
    --time-period Start="$START_DATE",End="$END_DATE" \
    --granularity MONTHLY \
    --metrics "UnblendedCost" \
    --group-by Type=DIMENSION,Key=SERVICE \
    --query 'ResultsByTime[0].Groups[*].[Keys[0],Metrics.UnblendedCost.Amount]' \
    --output table
}
# Function: Delete snapshots older than 30 days (destructive: review the list before using in production)
cleanup_snapshots() {
  echo "=== Cleaning up snapshots older than 30 days ==="
  CUTOFF_DATE=$(date -d "30 days ago" +%Y-%m-%d)
  # --output text returns all IDs tab-separated on one line, so split them with a for loop
  for SNAPSHOT_ID in $(awscli ec2 describe-snapshots \
      --owner-ids self \
      --query "Snapshots[?StartTime<='${CUTOFF_DATE}'].SnapshotId" \
      --output text); do
    echo "Deleting snapshot: $SNAPSHOT_ID"
    # Snapshots still used by an AMI cannot be deleted; report and continue
    awscli ec2 delete-snapshot --snapshot-id "$SNAPSHOT_ID" || echo "Could not delete $SNAPSHOT_ID"
  done
}
# Main menu
case "${1:-help}" in
  list)
    list_running_instances
    ;;
  stop-dev)
    stop_dev_instances
    ;;
  backup)
    backup_s3_bucket "${2:-}"
    ;;
  cost)
    cost_report
    ;;
  cleanup)
    cleanup_snapshots
    ;;
  *)
    echo "Usage: $0 {list|stop-dev|backup <bucket>|cost|cleanup}"
    exit 1
    ;;
esac
Usage:
chmod +x aws-automation.sh
./aws-automation.sh list # List running instances
./aws-automation.sh stop-dev # Stop dev instances
./aws-automation.sh backup my-bucket # Backup S3 bucket
./aws-automation.sh cost # Cost report
./aws-automation.sh cleanup # Clean old snapshots
Estimated Time: 8-10 hours
- CloudFormation templates (YAML/JSON)
- Stacks and stack operations
- Parameters, mappings, and outputs
- Intrinsic functions (Ref, GetAtt, Join, etc.)
- Nested stacks and cross-stack references
- StackSets for multi-account deployment
- Change sets for safe updates
- Drift detection
- cfn-lint for validation
- 📺 AWS CloudFormation Tutorial - Stephane Maarek
- 📖 CloudFormation User Guide
- 📚 CloudFormation Best Practices
- 🎮 CloudFormation Workshop
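Stacks are usually created from the command line. `aws cloudformation create-stack` takes parameters as `ParameterKey=<name>,ParameterValue=<value>` pairs, while `aws cloudformation deploy` takes `--parameter-overrides <name>=<value>`. If you keep parameters as simple `NAME=value` pairs, a sketch like the one below converts them for `create-stack`. The function name `to_cfn_parameters` and its input convention are illustrative, and it does not escape values that contain commas or spaces.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn NAME=value arguments into create-stack shorthand.
# Example: to_cfn_parameters EnvironmentName=Staging InstanceType=t3.small
to_cfn_parameters() {
  local pair key value
  for pair in "$@"; do
    if [[ "$pair" != *=* ]]; then
      echo "expected NAME=value, got: $pair" >&2
      return 1
    fi
    key="${pair%%=*}"     # text before the first "="
    value="${pair#*=}"    # everything after the first "="
    echo "ParameterKey=${key},ParameterValue=${value}"
  done
}
```

Usage with the infrastructure template: `aws cloudformation create-stack --stack-name prod-infra --template-body file://infrastructure.yaml --parameters $(to_cfn_parameters KeyPairName=my-key InstanceType=t3.small)`. The substitution is left unquoted on purpose so each pair becomes a separate argument.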
AWSTemplateFormatVersion: '2010-09-09'
Description: 'Template description'
Parameters:
# Input parameters
Mappings:
# Static variables
Conditions:
# Conditional resource creation
Resources:
# AWS resources to create (the only required section)
Outputs:
# Values to export
Goal: Create a complete infrastructure stack
# infrastructure.yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: 'Production 3-Tier Web Application Infrastructure'
Parameters:
  EnvironmentName:
    Type: String
    Default: Production
    Description: Environment name prefix
  VpcCIDR:
    Type: String
    Default: 10.0.0.0/16
    Description: VPC CIDR block
  KeyPairName:
    Type: AWS::EC2::KeyPair::KeyName
    Description: EC2 Key Pair for SSH access
  InstanceType:
    Type: String
    Default: t3.micro
    AllowedValues:
      - t3.micro
      - t3.small
      - t3.medium
    Description: EC2 instance type
  # Resolves the latest Amazon Linux 2023 AMI for the stack's region at deploy time,
  # instead of hardcoding region-specific AMI IDs that go stale
  LatestAmiId:
    Type: AWS::SSM::Parameter::Value<AWS::EC2::Image::Id>
    Default: /aws/service/ami-amazon-linux-latest/al2023-ami-kernel-default-x86_64
    Description: AMI ID for the web servers
Conditions:
  CreateProdResources: !Equals [!Ref EnvironmentName, Production]
Resources:
  # VPC
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: !Ref VpcCIDR
      EnableDnsHostnames: true
      EnableDnsSupport: true
      Tags:
        - Key: Name
          Value: !Sub '${EnvironmentName}-VPC'
  # Internet Gateway
  InternetGateway:
    Type: AWS::EC2::InternetGateway
    Properties:
      Tags:
        - Key: Name
          Value: !Sub '${EnvironmentName}-IGW'
  AttachGateway:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref VPC
      InternetGatewayId: !Ref InternetGateway
  # Public Subnets
  PublicSubnet1:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: 10.0.1.0/24
      AvailabilityZone: !Select [0, !GetAZs '']
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Name
          Value: !Sub '${EnvironmentName}-Public-1A'
  PublicSubnet2:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: 10.0.2.0/24
      AvailabilityZone: !Select [1, !GetAZs '']
      MapPublicIpOnLaunch: true
      Tags:
        - Key: Name
          Value: !Sub '${EnvironmentName}-Public-1B'
  # Private Subnets
  PrivateSubnet1:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: 10.0.11.0/24
      AvailabilityZone: !Select [0, !GetAZs '']
      Tags:
        - Key: Name
          Value: !Sub '${EnvironmentName}-Private-1A'
  PrivateSubnet2:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: 10.0.12.0/24
      AvailabilityZone: !Select [1, !GetAZs '']
      Tags:
        - Key: Name
          Value: !Sub '${EnvironmentName}-Private-1B'
  # NAT Gateway
  NATGatewayEIP:
    Type: AWS::EC2::EIP
    DependsOn: AttachGateway
    Properties:
      Domain: vpc
  NATGateway:
    Type: AWS::EC2::NatGateway
    Properties:
      AllocationId: !GetAtt NATGatewayEIP.AllocationId
      SubnetId: !Ref PublicSubnet1
      Tags:
        - Key: Name
          Value: !Sub '${EnvironmentName}-NAT'
  # Route Tables
  PublicRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Name
          Value: !Sub '${EnvironmentName}-Public-RT'
  DefaultPublicRoute:
    Type: AWS::EC2::Route
    DependsOn: AttachGateway
    Properties:
      RouteTableId: !Ref PublicRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref InternetGateway
  PublicSubnet1RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref PublicRouteTable
      SubnetId: !Ref PublicSubnet1
  PublicSubnet2RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref PublicRouteTable
      SubnetId: !Ref PublicSubnet2
  PrivateRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref VPC
      Tags:
        - Key: Name
          Value: !Sub '${EnvironmentName}-Private-RT'
  DefaultPrivateRoute:
    Type: AWS::EC2::Route
    Properties:
      RouteTableId: !Ref PrivateRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      NatGatewayId: !Ref NATGateway
  PrivateSubnet1RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref PrivateRouteTable
      SubnetId: !Ref PrivateSubnet1
  PrivateSubnet2RouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      RouteTableId: !Ref PrivateRouteTable
      SubnetId: !Ref PrivateSubnet2
  # Security Groups
  ALBSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security group for Application Load Balancer
      VpcId: !Ref VPC
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          CidrIp: 0.0.0.0/0
        - IpProtocol: tcp
          FromPort: 443
          ToPort: 443
          CidrIp: 0.0.0.0/0
      Tags:
        - Key: Name
          Value: !Sub '${EnvironmentName}-ALB-SG'
  WebServerSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: Security group for web servers
      VpcId: !Ref VPC
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          SourceSecurityGroupId: !Ref ALBSecurityGroup
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp: 10.0.0.0/16
      Tags:
        - Key: Name
          Value: !Sub '${EnvironmentName}-Web-SG'
  # Application Load Balancer
  ApplicationLoadBalancer:
    Type: AWS::ElasticLoadBalancingV2::LoadBalancer
    Properties:
      Name: !Sub '${EnvironmentName}-ALB'
      Subnets:
        - !Ref PublicSubnet1
        - !Ref PublicSubnet2
      SecurityGroups:
        - !Ref ALBSecurityGroup
      Tags:
        - Key: Name
          Value: !Sub '${EnvironmentName}-ALB'
  ALBTargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      Name: !Sub '${EnvironmentName}-TG'
      VpcId: !Ref VPC
      Port: 80
      Protocol: HTTP
      HealthCheckPath: /health
      HealthCheckIntervalSeconds: 30
      HealthCheckTimeoutSeconds: 5
      HealthyThresholdCount: 2
      UnhealthyThresholdCount: 3
      TargetType: instance
  ALBListener:
    Type: AWS::ElasticLoadBalancingV2::Listener
    Properties:
      LoadBalancerArn: !Ref ApplicationLoadBalancer
      Port: 80
      Protocol: HTTP
      DefaultActions:
        - Type: forward
          TargetGroupArn: !Ref ALBTargetGroup
  # Launch Template
  LaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateName: !Sub '${EnvironmentName}-LaunchTemplate'
      LaunchTemplateData:
        ImageId: !Ref LatestAmiId
        InstanceType: !Ref InstanceType
        KeyName: !Ref KeyPairName
        SecurityGroupIds:
          - !Ref WebServerSecurityGroup
        UserData:
          Fn::Base64: !Sub |
            #!/bin/bash
            yum update -y
            yum install -y httpd
            systemctl start httpd
            systemctl enable httpd
            echo "<h1>Hello from ${EnvironmentName} - $(hostname -f)</h1>" > /var/www/html/index.html
            echo "OK" > /var/www/html/health
        TagSpecifications:
          - ResourceType: instance
            Tags:
              - Key: Name
                Value: !Sub '${EnvironmentName}-WebServer'
  # Auto Scaling Group
  AutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    # Wait for the NAT route so user data can reach the package repositories
    DependsOn: DefaultPrivateRoute
    Properties:
      AutoScalingGroupName: !Sub '${EnvironmentName}-ASG'
      VPCZoneIdentifier:
        - !Ref PrivateSubnet1
        - !Ref PrivateSubnet2
      LaunchTemplate:
        LaunchTemplateId: !Ref LaunchTemplate
        Version: !GetAtt LaunchTemplate.LatestVersionNumber
      MinSize: 2
      MaxSize: 6
      DesiredCapacity: 2
      # Register instances with the ALB target group so the ALB (and the ELB health check) has targets
      TargetGroupARNs:
        - !Ref ALBTargetGroup
      HealthCheckType: ELB
      HealthCheckGracePeriod: 300
TargetGroupARNs:
- !Ref ALBTargetGroup
Tags:
- Key: Name
Value: !Sub '${EnvironmentName}-ASG-Instance'
PropagateAtLaunch: true
Outputs:
VPCId:
Description: VPC ID
Value: !Ref VPC
Export:
Name: !Sub '${EnvironmentName}-VPC-ID'
ALBDNSName:
Description: Application Load Balancer DNS Name
Value: !GetAtt ApplicationLoadBalancer.DNSName
Export:
Name: !Sub '${EnvironmentName}-ALB-DNS'
LoadBalancerURL:
Description: URL of the load balancer
Value: !Sub 'http://${ApplicationLoadBalancer.DNSName}'
Deploy the Stack:
# Validate template
aws cloudformation validate-template --template-body file://infrastructure.yaml
# Create stack
aws cloudformation create-stack \
--stack-name production-app \
--template-body file://infrastructure.yaml \
--parameters \
ParameterKey=EnvironmentName,ParameterValue=Production \
ParameterKey=KeyPairName,ParameterValue=my-key-pair \
--capabilities CAPABILITY_IAM
# Monitor stack creation
aws cloudformation wait stack-create-complete --stack-name production-app
aws cloudformation describe-stacks --stack-name production-app
# Get outputs
aws cloudformation describe-stacks \
--stack-name production-app \
--query 'Stacks[0].Outputs'
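# Update stack (use change sets for safety); parameters you don't change must be passed with UsePreviousValue=true
aws cloudformation create-change-set \
--stack-name production-app \
--change-set-name update-instances \
--template-body file://infrastructure.yaml \
--capabilities CAPABILITY_IAM \
--parameters \
ParameterKey=EnvironmentName,UsePreviousValue=true \
ParameterKey=KeyPairName,UsePreviousValue=true \
ParameterKey=InstanceType,ParameterValue=t3.small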
# Review changes
aws cloudformation describe-change-set \
--stack-name production-app \
--change-set-name update-instances
# Execute change set
aws cloudformation execute-change-set \
--stack-name production-app \
--change-set-name update-instances
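# Detect drift (runs asynchronously; poll its status with the returned detection ID)
DRIFT_ID=$(aws cloudformation detect-stack-drift --stack-name production-app --query StackDriftDetectionId --output text)
aws cloudformation describe-stack-drift-detection-status --stack-drift-detection-id "$DRIFT_ID"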
# Delete stack
aws cloudformation delete-stack --stack-name production-app
- ✅ Use parameters for reusable templates
- ✅ Leverage mappings for region-specific values
- ✅ Use change sets before updating production
- ✅ Enable termination protection on critical stacks
- ✅ Use nested stacks for modularity
- ✅ Tag all resources consistently
- ✅ Use a dedicated IAM service role for CloudFormation (passed with --role-arn) to limit what stacks can change
- ✅ Enable drift detection regularly
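One detail behind the change-set commands above: on an update, a parameter you leave out is not automatically kept. CloudFormation typically falls back to the template default, or rejects the request if there is none, unless you pass UsePreviousValue=true. The sketch below is a hypothetical helper (change_set_parameters, not part of any AWS tool) that builds the --parameters arguments so that only the keys you override change. It assumes the template declares just EnvironmentName, KeyPairName, and InstanceType, the three parameters used in the commands above.

```python
"""Sketch: build change-set parameters that keep unchanged values."""
from typing import Dict, Iterable, List


def change_set_parameters(all_keys: Iterable[str], overrides: Dict[str, str]) -> List[str]:
    """Hypothetical helper: ParameterKey=... strings for create-change-set --parameters.

    Keys in `overrides` get a new value; every other key is sent with
    UsePreviousValue=true so the stack keeps its current value.
    """
    keys = list(all_keys)  # materialize once; the keys are read twice below
    unknown = set(overrides) - set(keys)
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    args = []
    for key in keys:
        if key in overrides:
            args.append(f"ParameterKey={key},ParameterValue={overrides[key]}")
        else:
            args.append(f"ParameterKey={key},UsePreviousValue=true")
    return args


if __name__ == "__main__":
    # Assumed parameter list: the three names used in the CLI commands above
    params = change_set_parameters(
        ["EnvironmentName", "KeyPairName", "InstanceType"],
        {"InstanceType": "t3.small"},
    )
    print(" ".join(params))
```

Running it prints the three ParameterKey arguments on one line, ready to paste after --parameters. Unknown override keys raise an error instead of being silently ignored, which catches typos before the change set is created.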
Estimated Time: 10-12 hours
- Terraform basics and HCL syntax
- Terraform providers (AWS provider)
- Resources, data sources, and modules
- State management (local and remote)
- Variables and outputs
- Terraform workspaces
- Import existing infrastructure
- Terraform Cloud and Enterprise
- 📺 Terraform Course - Full Tutorial - freeCodeCamp
- 📖 Terraform AWS Provider Docs
- 📚 HashiCorp Learn - Terraform
- 🎮 Terraform AWS Workshop
- 💎 Terraform: Up & Running - Book by Yevgeniy Brikman
| Aspect | Terraform | CloudFormation |
|---|---|---|
| Language | HCL (HashiCorp Configuration Language) | YAML/JSON |
| Multi-Cloud | Yes (AWS, Azure, GCP, etc.) | AWS only |
| State Management | Explicit (local or remote) | Managed by AWS |
| Module Registry | Extensive public registry | CloudFormation Registry (public extensions and modules) |
| Community | Large open-source community | AWS official support |
| Cost | Free (Terraform Cloud paid) | Free (AWS service) |
| Learning Curve | Moderate | Moderate |
terraform-aws-project/
├── main.tf # Main configuration
├── variables.tf # Input variables
├── outputs.tf # Output values
├── providers.tf # Provider configuration
├── backend.tf # Remote state configuration
├── terraform.tfvars # Variable values
├── modules/
│ ├── vpc/
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ ├── ec2/
│ └── rds/
└── environments/
├── dev/
├── staging/
└── prod/
Goal: Create the same 3-tier infrastructure using Terraform
providers.tf:
terraform {
required_version = ">= 1.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
backend "s3" {
bucket = "my-terraform-state-bucket"
key = "production/terraform.tfstate"
region = "us-east-1"
encrypt = true
dynamodb_table = "terraform-state-lock"
}
}
provider "aws" {
region = var.aws_region
default_tags {
tags = {
Environment = var.environment
ManagedBy = "Terraform"
Project = var.project_name
}
}
}
variables.tf:
variable "aws_region" {
description = "AWS region"
type = string
default = "us-east-1"
}
variable "environment" {
description = "Environment name"
type = string
default = "production"
}
variable "project_name" {
description = "Project name"
type = string
default = "webapp"
}
variable "vpc_cidr" {
description = "VPC CIDR block"
type = string
default = "10.0.0.0/16"
}
variable "availability_zones" {
description = "Availability zones"
type = list(string)
default = ["us-east-1a", "us-east-1b"]
}
variable "instance_type" {
description = "EC2 instance type"
type = string
default = "t3.micro"
}
variable "key_name" {
description = "EC2 key pair name"
type = string
}
main.tf:
# Data source for latest Amazon Linux 2 AMI
data "aws_ami" "amazon_linux_2" {
most_recent = true
owners = ["amazon"]
filter {
name = "name"
values = ["amzn2-ami-hvm-*-x86_64-gp2"]
}
}
# VPC
resource "aws_vpc" "main" {
cidr_block = var.vpc_cidr
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "${var.environment}-vpc"
}
}
# Internet Gateway
resource "aws_internet_gateway" "main" {
vpc_id = aws_vpc.main.id
tags = {
Name = "${var.environment}-igw"
}
}
# Public Subnets
resource "aws_subnet" "public" {
count = length(var.availability_zones)
vpc_id = aws_vpc.main.id
cidr_block = cidrsubnet(var.vpc_cidr, 8, count.index)
availability_zone = var.availability_zones[count.index]
map_public_ip_on_launch = true
tags = {
Name = "${var.environment}-public-${count.index + 1}"
Tier = "Public"
}
}
# Private Subnets
resource "aws_subnet" "private" {
count = length(var.availability_zones)
vpc_id = aws_vpc.main.id
cidr_block = cidrsubnet(var.vpc_cidr, 8, count.index + 10)
availability_zone = var.availability_zones[count.index]
tags = {
Name = "${var.environment}-private-${count.index + 1}"
Tier = "Private"
}
}
# Elastic IP for NAT Gateway
resource "aws_eip" "nat" {
domain = "vpc"
tags = {
Name = "${var.environment}-nat-eip"
}
}
# NAT Gateway
resource "aws_nat_gateway" "main" {
allocation_id = aws_eip.nat.id
subnet_id = aws_subnet.public[0].id
tags = {
Name = "${var.environment}-nat"
}
depends_on = [aws_internet_gateway.main]
}
# Route Tables
resource "aws_route_table" "public" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.main.id
}
tags = {
Name = "${var.environment}-public-rt"
}
}
resource "aws_route_table" "private" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
nat_gateway_id = aws_nat_gateway.main.id
}
tags = {
Name = "${var.environment}-private-rt"
}
}
# Route Table Associations
resource "aws_route_table_association" "public" {
count = length(aws_subnet.public)
subnet_id = aws_subnet.public[count.index].id
route_table_id = aws_route_table.public.id
}
resource "aws_route_table_association" "private" {
count = length(aws_subnet.private)
subnet_id = aws_subnet.private[count.index].id
route_table_id = aws_route_table.private.id
}
# Security Groups
resource "aws_security_group" "alb" {
name = "${var.environment}-alb-sg"
description = "Security group for ALB"
vpc_id = aws_vpc.main.id
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.environment}-alb-sg"
}
}
resource "aws_security_group" "web" {
name = "${var.environment}-web-sg"
description = "Security group for web servers"
vpc_id = aws_vpc.main.id
ingress {
from_port = 80
to_port = 80
protocol = "tcp"
security_groups = [aws_security_group.alb.id]
}
ingress {
from_port = 22
to_port = 22
protocol = "tcp"
cidr_blocks = [var.vpc_cidr]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.environment}-web-sg"
}
}
# Application Load Balancer
resource "aws_lb" "main" {
name = "${var.environment}-alb"
internal = false
load_balancer_type = "application"
security_groups = [aws_security_group.alb.id]
subnets = aws_subnet.public[*].id
enable_deletion_protection = false
tags = {
Name = "${var.environment}-alb"
}
}
resource "aws_lb_target_group" "main" {
name = "${var.environment}-tg"
port = 80
protocol = "HTTP"
vpc_id = aws_vpc.main.id
health_check {
path = "/health"
healthy_threshold = 2
unhealthy_threshold = 3
timeout = 5
interval = 30
matcher = "200"
}
tags = {
Name = "${var.environment}-tg"
}
}
resource "aws_lb_listener" "http" {
load_balancer_arn = aws_lb.main.arn
port = "80"
protocol = "HTTP"
default_action {
type = "forward"
target_group_arn = aws_lb_target_group.main.arn
}
}
# Launch Template
resource "aws_launch_template" "web" {
name_prefix = "${var.environment}-web-"
image_id = data.aws_ami.amazon_linux_2.id
instance_type = var.instance_type
key_name = var.key_name
vpc_security_group_ids = [aws_security_group.web.id]
user_data = base64encode(<<-EOF
#!/bin/bash
yum update -y
yum install -y httpd
systemctl start httpd
systemctl enable httpd
echo "<h1>Hello from ${var.environment} - $(hostname -f)</h1>" > /var/www/html/index.html
echo "OK" > /var/www/html/health
EOF
)
tag_specifications {
resource_type = "instance"
tags = {
Name = "${var.environment}-web-server"
}
}
}
# Auto Scaling Group
resource "aws_autoscaling_group" "web" {
name = "${var.environment}-asg"
vpc_zone_identifier = aws_subnet.private[*].id
target_group_arns = [aws_lb_target_group.main.arn]
health_check_type = "ELB"
health_check_grace_period = 300
min_size = 2
max_size = 6
desired_capacity = 2
launch_template {
id = aws_launch_template.web.id
version = "$Latest"
}
tag {
key = "Name"
value = "${var.environment}-asg-instance"
propagate_at_launch = true
}
}
# Auto Scaling Policies
resource "aws_autoscaling_policy" "scale_up" {
name = "${var.environment}-scale-up"
scaling_adjustment = 1
adjustment_type = "ChangeInCapacity"
cooldown = 300
autoscaling_group_name = aws_autoscaling_group.web.name
}
resource "aws_autoscaling_policy" "scale_down" {
name = "${var.environment}-scale-down"
scaling_adjustment = -1
adjustment_type = "ChangeInCapacity"
cooldown = 300
autoscaling_group_name = aws_autoscaling_group.web.name
}
# CloudWatch Alarms
resource "aws_cloudwatch_metric_alarm" "high_cpu" {
alarm_name = "${var.environment}-high-cpu"
comparison_operator = "GreaterThanThreshold"
evaluation_periods = "2"
metric_name = "CPUUtilization"
namespace = "AWS/EC2"
period = "120"
statistic = "Average"
threshold = "80"
alarm_description = "This metric monitors ec2 cpu utilization"
alarm_actions = [aws_autoscaling_policy.scale_up.arn]
dimensions = {
AutoScalingGroupName = aws_autoscaling_group.web.name
}
}
resource "aws_cloudwatch_metric_alarm" "low_cpu" {
alarm_name = "${var.environment}-low-cpu"
comparison_operator = "LessThanThreshold"
evaluation_periods = "2"
metric_name = "CPUUtilization"
namespace = "AWS/EC2"
period = "120"
statistic = "Average"
threshold = "20"
alarm_description = "This metric monitors ec2 cpu utilization"
alarm_actions = [aws_autoscaling_policy.scale_down.arn]
dimensions = {
AutoScalingGroupName = aws_autoscaling_group.web.name
}
}
outputs.tf:
output "vpc_id" {
description = "VPC ID"
value = aws_vpc.main.id
}
output "alb_dns_name" {
description = "Application Load Balancer DNS name"
value = aws_lb.main.dns_name
}
output "load_balancer_url" {
description = "URL of the load balancer"
value = "http://${aws_lb.main.dns_name}"
}
output "public_subnet_ids" {
description = "Public subnet IDs"
value = aws_subnet.public[*].id
}
output "private_subnet_ids" {
description = "Private subnet IDs"
value = aws_subnet.private[*].id
}
terraform.tfvars:
aws_region = "us-east-1"
environment = "production"
project_name = "webapp"
vpc_cidr = "10.0.0.0/16"
availability_zones = ["us-east-1a", "us-east-1b"]
instance_type = "t3.micro"
key_name = "my-key-pair"
Terraform Commands:
# Initialize Terraform
terraform init
# Format code
terraform fmt -recursive
# Validate configuration
terraform validate
# Plan (see what will be created)
terraform plan
# Apply (create infrastructure)
terraform apply
# Show current state
terraform show
# List resources
terraform state list
# Get specific output
terraform output alb_dns_name
# Import existing resource
terraform import aws_vpc.main vpc-xxx
# Refresh state (terraform refresh is deprecated; use a refresh-only apply)
terraform apply -refresh-only
# Destroy infrastructure
terraform destroy
# Use workspaces (dev, staging, prod)
terraform workspace new dev
terraform workspace select dev
terraform workspace list
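# Target specific resource (for exceptional cases only; normal runs should apply the full plan)
terraform apply -target=aws_vpc.main
# Download modules referenced in the configuration (terraform init also does this)
terraform get
- ✅ Use remote state (S3 + DynamoDB)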
- ✅ Enable state locking
- ✅ Use modules for reusability
- ✅ Implement workspaces for environments
- ✅ Use variables for configurability
- ✅ Store sensitive data in AWS Secrets Manager
- ✅ Use data sources instead of hardcoding
- ✅ Tag all resources consistently
- ✅ Use terraform fmt and validate
- ✅ Review plans before applying
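The subnet layout in main.tf comes from cidrsubnet(var.vpc_cidr, 8, index). That call adds 8 bits to the /16 prefix and picks the index-th /24, with private subnets offset by 10. The sketch below reproduces that arithmetic with Python's ipaddress module so you can preview CIDRs before running a plan. The function name mirrors Terraform's, but this is our own approximation, not HashiCorp code.

```python
"""Sketch: approximate Terraform's cidrsubnet() to preview subnet CIDRs."""
import ipaddress
from itertools import islice


def cidrsubnet(prefix: str, newbits: int, netnum: int) -> str:
    """Hypothetical helper: the netnum-th block after extending prefix by newbits."""
    network = ipaddress.ip_network(prefix)
    if not 0 <= netnum < 2 ** newbits:
        raise ValueError(f"netnum {netnum} does not fit in {newbits} bits")
    # subnets() yields blocks in ascending address order, like Terraform's netnum
    subnets = network.subnets(prefixlen_diff=newbits)
    return str(next(islice(subnets, netnum, None)))


if __name__ == "__main__":
    vpc_cidr = "10.0.0.0/16"
    azs = ["us-east-1a", "us-east-1b"]
    for i, az in enumerate(azs):
        public = cidrsubnet(vpc_cidr, 8, i)        # aws_subnet.public
        private = cidrsubnet(vpc_cidr, 8, i + 10)  # aws_subnet.private
        print(az, "public:", public, "private:", private)
```

With the default variables this yields 10.0.0.0/24 and 10.0.1.0/24 for the public subnets and 10.0.10.0/24 and 10.0.11.0/24 for the private ones. In Terraform itself, terraform console lets you check the same value directly: cidrsubnet("10.0.0.0/16", 8, 10) evaluates to "10.0.10.0/24".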
Estimated Time: 10-12 hours
- AWS CodeCommit - Git repository hosting
- AWS CodeBuild - Build and test automation
- AWS CodeDeploy - Deployment automation
- Integration with GitHub and other VCS
- Build specifications and deployment configurations
- Artifact management
- 📺 AWS CodeCommit Tutorial
- 📺 AWS CodeBuild Deep Dive
- 📺 AWS CodeDeploy Tutorial
- 📖 AWS Developer Tools Documentation
See detailed guide in Section 12 - AWS CodePipeline for complete CI/CD implementation.
Estimated Time: 10-12 hours
- End-to-end CI/CD pipelines
- Pipeline stages (Source, Build, Test, Deploy)
- Integration with third-party tools
- Manual approval gates
- Pipeline notifications with SNS
- Cross-region deployments
- Blue/Green and Canary deployments
- 📺 AWS CodePipeline Complete Tutorial
- 📖 CodePipeline User Guide
- 🎮 CI/CD Workshop
- 📚 CodePipeline Best Practices
┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ CodeCommit │────▶│ CodeBuild │────▶│ CodeDeploy │────▶│ EC2 │
│ (Source) │ │ (Build) │ │ (Deploy) │ │ (Production)│
└──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘
│ │ │ │
└─────────────────────┴─────────────────────┴─────────────────────┘
AWS CodePipeline Orchestration
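In this pipeline, stages are linked only through named artifacts. SourceOutput flows from Source into Build, and BuildOutput flows from Build into Deploy. The Approval stage consumes nothing. The hypothetical checker below walks a pipeline definition shaped like pipeline.json and reports any inputArtifacts name that no earlier stage produced. This kind of naming mismatch is easy to introduce when editing the JSON by hand. The checker assumes actions within a stage run in parallel, which holds for the single-action stages used here.

```python
"""Sketch: check that each input artifact in a CodePipeline definition is produced upstream."""
from typing import Any, Dict, List


def check_artifact_flow(pipeline: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: one message per action input that no earlier stage produced.

    Simplification: actions inside a stage are treated as parallel (same runOrder),
    so an action cannot consume an artifact produced in its own stage.
    """
    produced = set()
    problems = []
    for stage in pipeline["stages"]:
        stage_outputs = set()
        for action in stage["actions"]:
            for artifact in action.get("inputArtifacts", []):
                if artifact["name"] not in produced:
                    problems.append(
                        f"{stage['name']}/{action['name']}: missing input {artifact['name']}"
                    )
            for artifact in action.get("outputArtifacts", []):
                stage_outputs.add(artifact["name"])
        produced |= stage_outputs  # outputs become visible to later stages
    return problems


if __name__ == "__main__":
    # Trimmed copy of the Source -> Build -> Approval -> Deploy chain
    pipeline = {
        "stages": [
            {"name": "Source", "actions": [
                {"name": "SourceAction", "outputArtifacts": [{"name": "SourceOutput"}]}]},
            {"name": "Build", "actions": [
                {"name": "BuildAction", "inputArtifacts": [{"name": "SourceOutput"}],
                 "outputArtifacts": [{"name": "BuildOutput"}]}]},
            {"name": "Approval", "actions": [{"name": "ManualApproval"}]},
            {"name": "Deploy", "actions": [
                {"name": "DeployAction", "inputArtifacts": [{"name": "BuildOutput"}]}]},
        ]
    }
    print(check_artifact_flow(pipeline) or "artifact flow OK")
```

To check the real file, load pipeline.json with json.load and pass its "pipeline" object to the same function.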
buildspec.yml (for CodeBuild):
version: 0.2
phases:
  install:
    runtime-versions:
      nodejs: 18
    commands:
      - echo "Installing dependencies..."
      - npm install
  pre_build:
    commands:
      - echo "Running tests..."
      - npm test
      - echo "Running linter..."
      - npm run lint
  build:
    commands:
      - echo "Building application..."
      - npm run build
      - echo "Build completed on `date`"
  post_build:
    commands:
      - echo "Creating deployment package..."
      - zip -r application.zip . -x "*.git*" "node_modules/*" "tests/*"
artifacts:
  files:
    - '**/*'
  name: BuildArtifact
cache:
  paths:
    - 'node_modules/**/*'
appspec.yml (for CodeDeploy):
version: 0.0
os: linux
files:
  - source: /
    destination: /var/www/html
hooks:
  BeforeInstall:
    - location: scripts/install_dependencies.sh
      timeout: 300
      runas: root
  AfterInstall:
    - location: scripts/configure_app.sh
      timeout: 300
      runas: root
  ApplicationStart:
    - location: scripts/start_server.sh
      timeout: 300
      runas: root
  ValidateService:
    - location: scripts/validate_service.sh
      timeout: 300
Create Pipeline (CLI):
# Create S3 bucket for artifacts
aws s3 mb s3://my-codepipeline-artifacts-$(aws sts get-caller-identity --query Account --output text)
# Create CodePipeline
aws codepipeline create-pipeline --cli-input-json file://pipeline.json
pipeline.json:
{
"pipeline": {
"name": "NodeJS-CI-CD-Pipeline",
"roleArn": "arn:aws:iam::ACCOUNT_ID:role/CodePipelineServiceRole",
"artifactStore": {
"type": "S3",
"location": "my-codepipeline-artifacts-ACCOUNT_ID"
},
"stages": [
{
"name": "Source",
"actions": [
{
"name": "SourceAction",
"actionTypeId": {
"category": "Source",
"owner": "AWS",
"provider": "CodeCommit",
"version": "1"
},
"outputArtifacts": [{"name": "SourceOutput"}],
"configuration": {
"RepositoryName": "my-app-repo",
"BranchName": "main",
"PollForSourceChanges": "false"
}
}
]
},
{
"name": "Build",
"actions": [
{
"name": "BuildAction",
"actionTypeId": {
"category": "Build",
"owner": "AWS",
"provider": "CodeBuild",
"version": "1"
},
"inputArtifacts": [{"name": "SourceOutput"}],
"outputArtifacts": [{"name": "BuildOutput"}],
"configuration": {
"ProjectName": "my-build-project"
}
}
]
},
{
"name": "Approval",
"actions": [
{
"name": "ManualApproval",
"actionTypeId": {
"category": "Approval",
"owner": "AWS",
"provider": "Manual",
"version": "1"
},
"configuration": {
"CustomData": "Please review and approve deployment to production",
"NotificationArn": "arn:aws:sns:us-east-1:ACCOUNT_ID:pipeline-approvals"
}
}
]
},
{
"name": "Deploy",
"actions": [
{
"name": "DeployAction",
"actionTypeId": {
"category": "Deploy",
"owner": "AWS",
"provider": "CodeDeploy",
"version": "1"
},
"inputArtifacts": [{"name": "BuildOutput"}],
"configuration": {
"ApplicationName": "my-application",
"DeploymentGroupName": "production"
}
}
]
}
]
}
}
Estimated Time: 8-10 hours
- CloudWatch Metrics and custom metrics
- CloudWatch Logs and Logs Insights
- CloudWatch Alarms and notifications
- CloudWatch Dashboards
- CloudWatch Events/EventBridge
- Container Insights
- Lambda Insights
- Application Insights
- Metrics: Time-ordered data points (CPU, Memory, Disk, Network)
- Logs: Application and system logs
- Alarms: Automated notifications based on thresholds
- Dashboards: Visualizations of metrics
- Events: Event-driven automation
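An alarm such as the HighCPU example below changes state by comparing each period's statistic with the threshold. With --period 300 and --evaluation-periods 2, it enters ALARM only after two consecutive 5-minute averages exceed 80%. The sketch below models that rule in Python. alarm_state is a hypothetical helper, and it deliberately ignores features that real CloudWatch alarms support, such as datapoints-to-alarm (M out of N) and configurable missing-data treatment.

```python
"""Sketch: simplified CloudWatch alarm evaluation (all N recent periods must breach)."""
from typing import List, Optional

COMPARATORS = {
    "GreaterThanThreshold": lambda value, threshold: value > threshold,
    "GreaterThanOrEqualToThreshold": lambda value, threshold: value >= threshold,
    "LessThanThreshold": lambda value, threshold: value < threshold,
    "LessThanOrEqualToThreshold": lambda value, threshold: value <= threshold,
}


def alarm_state(
    datapoints: List[Optional[float]],
    threshold: float,
    evaluation_periods: int,
    comparison: str = "GreaterThanThreshold",
) -> str:
    """Hypothetical helper: ALARM, OK, or INSUFFICIENT_DATA for the latest periods.

    Each datapoint is one period's statistic (e.g. Average CPU over 300 s).
    None marks a period with no data; any gap yields INSUFFICIENT_DATA here,
    which is stricter than CloudWatch's configurable handling.
    """
    if evaluation_periods < 1:
        raise ValueError("evaluation_periods must be at least 1")
    breaches = COMPARATORS[comparison]
    window = datapoints[-evaluation_periods:]
    if len(window) < evaluation_periods or any(v is None for v in window):
        return "INSUFFICIENT_DATA"
    return "ALARM" if all(breaches(v, threshold) for v in window) else "OK"


if __name__ == "__main__":
    # Mirrors the HighCPU alarm: Average CPU, threshold 80, 2 evaluation periods
    print(alarm_state([70.0, 85.0, 91.0], threshold=80, evaluation_periods=2))  # ALARM
    print(alarm_state([95.0, 78.0], threshold=80, evaluation_periods=2))        # OK
```

A single spike therefore does not trigger the alarm. Lowering --evaluation-periods makes the alarm react faster at the cost of more false positives.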
# Create log group
aws logs create-log-group --log-group-name /aws/myapp/production
# Put custom metric
aws cloudwatch put-metric-data \
--namespace MyApp \
--metric-name PageViewCount \
--value 1 \
--timestamp $(date -u +"%Y-%m-%dT%H:%M:%SZ")
# Create alarm
aws cloudwatch put-metric-alarm \
--alarm-name HighCPU \
--alarm-description "CPU exceeds 80%" \
--metric-name CPUUtilization \
--namespace AWS/EC2 \
--statistic Average \
--period 300 \
--threshold 80 \
--comparison-operator GreaterThanThreshold \
--evaluation-periods 2 \
--dimensions Name=InstanceId,Value=i-xxx \
--alarm-actions arn:aws:sns:us-east-1:ACCOUNT_ID:alerts
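# Query logs with Insights (GNU date syntax; the query runs asynchronously)
QUERY_ID=$(aws logs start-query \
--log-group-name /aws/lambda/my-function \
--start-time $(date -d "1 hour ago" +%s) \
--end-time $(date +%s) \
--query-string 'fields @timestamp, @message | filter @message like /ERROR/ | sort @timestamp desc | limit 20' \
--query queryId --output text)
# Fetch results (repeat until the returned status is Complete)
aws logs get-query-results --query-id "$QUERY_ID"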
# Tail logs
aws logs tail /aws/lambda/my-function --follow --format short
# Create dashboard
aws cloudwatch put-dashboard \
--dashboard-name MyAppDashboard \
--dashboard-body file://dashboard.json
Estimated Time: 8-10 hours
- Lambda function basics
- Event sources and triggers
- Lambda layers and dependencies
- Environment variables and secrets
- Lambda@Edge for CloudFront
- VPC integration
- Lambda performance optimization
- Cost optimization strategies
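The handler below reads the bucket, key, and size from each record in an S3 event. S3 URL-encodes object keys in event notifications (a space arrives as +), so keys should be decoded before they are passed to S3 APIs. The hypothetical helper below isolates that parsing so it can be tested locally without AWS credentials. The sample event is trimmed to the fields the handler uses, and its bucket and key are made up.

```python
"""Sketch: pull (bucket, key, size) out of an S3 event for local testing."""
from typing import Any, Dict, Iterator, Tuple
from urllib.parse import unquote_plus


def extract_objects(event: Dict[str, Any]) -> Iterator[Tuple[str, str, int]]:
    """Hypothetical helper: yield one tuple per S3 record, decoding the object key."""
    for record in event.get("Records", []):
        s3 = record["s3"]
        key = unquote_plus(s3["object"]["key"])  # '+' -> space, %28 -> '('
        yield s3["bucket"]["name"], key, s3["object"].get("size", 0)


if __name__ == "__main__":
    # Hypothetical event, trimmed to the fields the handler reads
    sample_event = {
        "Records": [
            {
                "eventSource": "aws:s3",
                "eventName": "ObjectCreated:Put",
                "s3": {
                    "bucket": {"name": "my-bucket"},
                    "object": {"key": "reports/Q1+summary%282024%29.pdf", "size": 1024},
                },
            }
        ]
    }
    for bucket, key, size in extract_objects(sample_event):
        print(bucket, key, size)  # my-bucket reports/Q1 summary(2024).pdf 1024
```

Saved as JSON, the same structure can serve as the aws lambda invoke payload for the real function, provided the bucket and key point at an object that actually exists.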
import json
import os
import urllib.parse
import boto3
# Initialize AWS clients once per execution environment (reused across invocations)
s3 = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table(os.environ['TABLE_NAME'])
def lambda_handler(event, context):
    """
    Process S3 events and store metadata in DynamoDB
    """
    try:
        # Parse S3 event
        for record in event['Records']:
            bucket = record['s3']['bucket']['name']
            # Object keys arrive URL-encoded in S3 event notifications
            key = urllib.parse.unquote_plus(record['s3']['object']['key'])
            size = record['s3']['object']['size']
            # Get object metadata
            response = s3.head_object(Bucket=bucket, Key=key)
            # Store in DynamoDB
            table.put_item(
                Item={
                    'file_name': key,
                    'bucket': bucket,
                    'size': size,
                    'last_modified': response['LastModified'].isoformat(),
                    'content_type': response.get('ContentType', 'unknown')
                }
            )
            print(f"Processed: {key} from {bucket}")
        return {
            'statusCode': 200,
            'body': json.dumps({'message': 'Success'})
        }
    except Exception as e:
        print(f"Error: {str(e)}")
        # Re-raise so the invocation is recorded as failed and Lambda's
        # asynchronous retry behavior applies to S3-triggered events
        raise
# Create deployment package
zip function.zip lambda_function.py
# Create Lambda function
aws lambda create-function \
--function-name ProcessS3Events \
--runtime python3.11 \
--role arn:aws:iam::ACCOUNT_ID:role/LambdaExecutionRole \
--handler lambda_function.lambda_handler \
--zip-file fileb://function.zip \
--environment Variables={TABLE_NAME=file-metadata} \
--timeout 30 \
--memory-size 256
# Add S3 trigger
aws lambda add-permission \
--function-name ProcessS3Events \
--statement-id s3-trigger \
--action lambda:InvokeFunction \
--principal s3.amazonaws.com \
--source-arn arn:aws:s3:::my-bucket
# Configure S3 event notification
aws s3api put-bucket-notification-configuration \
--bucket my-bucket \
--notification-configuration file://notification.json
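# Invoke function manually (AWS CLI v2 needs --cli-binary-format to send a raw JSON payload)
# This test payload has no 'Records' key, so the handler reports an error; use a sample S3 event to test the success path
aws lambda invoke \
--function-name ProcessS3Events \
--cli-binary-format raw-in-base64-out \
--payload '{"key1":"value1"}' \
response.json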
# Update function code
aws lambda update-function-code \
--function-name ProcessS3Events \
--zip-file fileb://function.zip
# View logs
aws logs tail /aws/lambda/ProcessS3Events --follow
Estimated Time: 6-8 hours
- Event buses and event patterns
- Event rules and targets
- Scheduled events (cron expressions)
- Custom events and event schemas
- Cross-account events
- Integration with SaaS providers
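EventBridge matches an event when every field named in the pattern is present and the event's value equals one of the values listed for that field. Fields the pattern does not mention are ignored. The sketch below implements only that exact-match subset, which is enough to reason about the EC2StateChange rule that follows. matches is a hypothetical helper, not the service's matcher, and content filters such as prefix or numeric comparisons are out of scope.

```python
"""Sketch: simplified EventBridge pattern matching (exact values only)."""
from typing import Any, Dict


def matches(pattern: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Hypothetical helper: True if `event` satisfies `pattern`.

    Supports nested objects and arrays of literal values only; content filters
    (prefix, numeric, anything-but, ...) are not handled.
    """
    for field, expected in pattern.items():
        if field not in event:
            return False
        actual = event[field]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif isinstance(expected, list):
            # An event array matches if any of its elements is listed in the pattern
            candidates = actual if isinstance(actual, list) else [actual]
            if not any(value in expected for value in candidates):
                return False
        else:
            raise ValueError(f"pattern leaf for {field!r} must be an array")
    return True


EC2_TERMINATED = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["terminated"]},
}

if __name__ == "__main__":
    # Hypothetical event; real EC2 events carry more fields, which the pattern ignores
    event = {
        "source": "aws.ec2",
        "detail-type": "EC2 Instance State-change Notification",
        "detail": {"instance-id": "i-0123456789abcdef0", "state": "terminated"},
    }
    print(matches(EC2_TERMINATED, event))  # True
```

Adding "stopped" to the state array would widen the rule to both transitions without a second rule.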
# Create rule that triggers Lambda on EC2 state change
aws events put-rule \
--name EC2StateChange \
--event-pattern '{
"source": ["aws.ec2"],
"detail-type": ["EC2 Instance State-change Notification"],
"detail": {
"state": ["terminated"]
}
}' \
--state ENABLED
# Add Lambda as target
aws events put-targets \
--rule EC2StateChange \
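--targets Id=1,Arn=arn:aws:lambda:us-east-1:ACCOUNT_ID:function:NotifyTeam
# Allow EventBridge to invoke the function (required for Lambda targets)
aws lambda add-permission \
--function-name NotifyTeam \
--statement-id eventbridge-ec2-state-change \
--action lambda:InvokeFunction \
--principal events.amazonaws.com \
--source-arn arn:aws:events:us-east-1:ACCOUNT_ID:rule/EC2StateChange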
# Create scheduled rule (run daily at 9 AM UTC)
aws events put-rule \
--name DailyBackup \
--schedule-expression 'cron(0 9 * * ? *)' \
--state ENABLED
Estimated Time: 6-8 hours
- CloudFront distributions
- Origin configuration (S3, ALB, custom)
- Cache behaviors and TTL
- SSL/TLS certificates with ACM
- Geo-restriction
- Lambda@Edge for edge computing
- Signed URLs and cookies
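Cache behaviors and TTLs interact with the origin's headers. As we read the CloudFront documentation, when the origin sends Cache-Control max-age, CloudFront caches the object for that long, but never for less than MinTTL or more than MaxTTL. When no caching header is present, it uses DefaultTTL. The hypothetical helper below encodes that rule. MinTTL 0 matches the distribution below. The 86400-second and 31536000-second defaults for DefaultTTL and MaxTTL are assumptions for when those fields are omitted. The helper leaves out s-maxage, Expires, and no-cache handling.

```python
"""Sketch: effective CloudFront cache lifetime for one object (simplified)."""
from typing import Optional


def effective_ttl(
    max_age: Optional[int],
    min_ttl: int = 0,          # MinTTL from the cache behavior
    default_ttl: int = 86400,  # assumed value when DefaultTTL is omitted
    max_ttl: int = 31536000,   # assumed value when MaxTTL is omitted
) -> int:
    """Hypothetical helper: seconds CloudFront serves an object before checking the origin again.

    Ignores s-maxage, Expires, and no-cache/no-store/private directives.
    """
    if not min_ttl <= default_ttl <= max_ttl:
        raise ValueError("expected MinTTL <= DefaultTTL <= MaxTTL")
    if max_age is None:
        return default_ttl  # origin sent no Cache-Control max-age
    return min(max_ttl, max(min_ttl, max_age))  # clamp max-age into [MinTTL, MaxTTL]


if __name__ == "__main__":
    print(effective_ttl(None))             # 86400: falls back to DefaultTTL
    print(effective_ttl(300))              # 300: origin's max-age wins
    print(effective_ttl(60, min_ttl=300))  # 300: raised to MinTTL
```

With MinTTL at 0, the origin's max-age effectively controls freshness. An invalidation like the one below is then mainly needed for objects that were cached with long lifetimes.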
# Create distribution for S3 static website
aws cloudfront create-distribution --cli-input-json '{
"DistributionConfig": {
"CallerReference": "my-website-'$(date +%s)'",
"Comment": "CDN for static website",
"DefaultRootObject": "index.html",
"Origins": {
"Quantity": 1,
"Items": [{
"Id": "S3-my-website",
"DomainName": "my-website-bucket.s3.amazonaws.com",
"S3OriginConfig": {
"OriginAccessIdentity": ""
}
}]
},
"DefaultCacheBehavior": {
"TargetOriginId": "S3-my-website",
"ViewerProtocolPolicy": "redirect-to-https",
"TrustedSigners": {
"Enabled": false,
"Quantity": 0
},
"ForwardedValues": {
"QueryString": false,
"Cookies": {"Forward": "none"}
},
"MinTTL": 0
},
"Enabled": true
}
}'
# Create invalidation (clear cache)
aws cloudfront create-invalidation \
--distribution-id E1234567890ABC \
--paths "/*"
Estimated Time: 4-6 hours
- Docker image registry on AWS
- Image scanning for vulnerabilities
- Lifecycle policies
- Cross-region replication
- IAM policies for ECR
- Integration with ECS/EKS
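The lifecycle policy below uses countType imageCountMoreThan with countNumber 10. This keeps the ten most recently pushed images and marks the rest for expiry. The hypothetical helper below mimics that single rule so you can preview its effect on a list of (digest, pushed-at) pairs. The digests and timestamps are invented. Rule priorities, tag filters, and ECR's asynchronous cleanup are not modeled.

```python
"""Sketch: which images a single imageCountMoreThan rule would expire."""
from datetime import datetime, timedelta
from typing import List, Tuple


def images_to_expire(images: List[Tuple[str, datetime]], count_number: int) -> List[str]:
    """Hypothetical helper: digests beyond the newest `count_number` images by push time.

    Models one rule with tagStatus "any"; ECR itself applies expiry asynchronously.
    """
    newest_first = sorted(images, key=lambda image: image[1], reverse=True)
    return [digest for digest, _pushed_at in newest_first[count_number:]]


if __name__ == "__main__":
    base = datetime(2025, 1, 1)
    # Hypothetical digests: 12 pushes, one per hour
    images = [(f"sha256:{i:02d}", base + timedelta(hours=i)) for i in range(12)]
    print(images_to_expire(images, count_number=10))  # ['sha256:01', 'sha256:00']
```

Because the rule counts every image, tagged or not, release tags like v1.0.0 can also expire. A separate higher-priority rule with a tag filter is the usual way to protect them.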
# Create ECR repository
aws ecr create-repository \
--repository-name my-app \
--image-scanning-configuration scanOnPush=true
# Get login password
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com
# Build Docker image
docker build -t my-app:latest .
# Tag image
docker tag my-app:latest ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/my-app:latest
docker tag my-app:latest ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/my-app:v1.0.0
# Push to ECR
docker push ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/my-app:latest
docker push ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/my-app:v1.0.0
# List images
aws ecr describe-images --repository-name my-app
# Set lifecycle policy (delete old images)
aws ecr put-lifecycle-policy \
--repository-name my-app \
--lifecycle-policy-text file://lifecycle-policy.json
lifecycle-policy.json:
{
"rules": [
{
"rulePriority": 1,
"description": "Keep last 10 images",
"selection": {
"tagStatus": "any",
"countType": "imageCountMoreThan",
"countNumber": 10
},
"action": {
"type": "expire"
}
}
]
}
Estimated Time: 10-12 hours
- ECS clusters and services
- Task definitions and containers
- Fargate vs EC2 launch types
- Service auto-scaling
- Load balancer integration
- ECS Exec for debugging
- Blue/Green deployments
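On Fargate, the cpu and memory values in a task definition must form a supported pair. The "256"/"512" combination used below is the smallest. The sketch below encodes the Linux combinations up to 4 vCPU as we understand them from the ECS documentation, and it checks a pair before you register the definition. Larger sizes exist and are omitted. is_valid_fargate_size is a hypothetical helper, and it only accepts the numeric-string form used in this guide, not forms like "1 vCPU".

```python
"""Sketch: check a Fargate task's cpu/memory pair against supported combinations."""

# CPU units -> allowed memory (MiB) for Linux tasks up to 4 vCPU (larger sizes omitted)
FARGATE_MEMORY = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
}


def is_valid_fargate_size(cpu: str, memory: str) -> bool:
    """Hypothetical helper: accepts numeric strings such as "256" and "512"."""
    return int(memory) in FARGATE_MEMORY.get(int(cpu), [])


if __name__ == "__main__":
    print(is_valid_fargate_size("256", "512"))   # True: the task definition below
    print(is_valid_fargate_size("256", "4096"))  # False: 0.25 vCPU tops out at 2 GB
```

Running a check like this in CI before aws ecs register-task-definition gives a faster failure than waiting for the API to reject the definition.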
{
"family": "web-app",
"networkMode": "awsvpc",
"requiresCompatibilities": ["FARGATE"],
"cpu": "256",
"memory": "512",
"containerDefinitions": [
{
"name": "web",
"image": "ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/my-app:latest",
"portMappings": [
{
"containerPort": 3000,
"protocol": "tcp"
}
],
"environment": [
{
"name": "NODE_ENV",
"value": "production"
}
],
"secrets": [
{
"name": "DB_PASSWORD",
"valueFrom": "arn:aws:secretsmanager:us-east-1:ACCOUNT_ID:secret:db-password"
}
],
"logConfiguration": {
"logDriver": "awslogs",
"options": {
"awslogs-group": "/ecs/web-app",
"awslogs-region": "us-east-1",
"awslogs-stream-prefix": "ecs"
}
},
"healthCheck": {
"command": ["CMD-SHELL", "curl -f http://localhost:3000/health || exit 1"],
"interval": 30,
"timeout": 5,
"retries": 3
}
}
]
}
# Create cluster
aws ecs create-cluster --cluster-name production
# Register task definition
aws ecs register-task-definition --cli-input-json file://task-definition.json
# Create service
aws ecs create-service \
--cluster production \
--service-name web-service \
--task-definition web-app:1 \
--desired-count 2 \
--launch-type FARGATE \
--network-configuration "awsvpcConfiguration={subnets=[subnet-xxx,subnet-yyy],securityGroups=[sg-xxx],assignPublicIp=DISABLED}" \
--load-balancers targetGroupArn=arn:aws:elasticloadbalancing:us-east-1:ACCOUNT_ID:targetgroup/web-tg,containerName=web,containerPort=3000
# Update service (new deployment)
aws ecs update-service \
--cluster production \
--service web-service \
--task-definition web-app:2 \
--force-new-deployment
# Scale service
aws ecs update-service \
--cluster production \
--service web-service \
--desired-count 4
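# Enable ECS Exec (for debugging); the task role also needs SSM Session Manager (ssmmessages) permissions
# --force-new-deployment replaces running tasks so they pick up the setting
aws ecs update-service \
--cluster production \
--service web-service \
--enable-execute-command \
--force-new-deployment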
# Execute command in running container
aws ecs execute-command \
--cluster production \
--task TASK_ID \
--container web \
--interactive \
--command "/bin/bash"
Estimated Time: 12-15 hours
- Kubernetes fundamentals
- EKS cluster creation and management
- Node groups (managed and self-managed)
- IAM roles for service accounts (IRSA)
- kubectl and eksctl usage
- Helm package manager
- EKS add-ons (VPC CNI, CoreDNS, kube-proxy)
- Monitoring with Container Insights
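The HorizontalPodAutoscaler in the deployment manifest below targets 70% average CPU utilization, measured against each pod's CPU request (250m). Kubernetes documents its core rule as desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization). The rule is skipped when the ratio is within a 0.1 tolerance, and the result is clamped to minReplicas and maxReplicas. The hypothetical function below applies that formula with the manifest's values. Stabilization windows and pod-readiness handling are not modeled.

```python
"""Sketch: the core HPA scaling formula for the web-app-hpa manifest."""
import math


def hpa_desired_replicas(
    current_replicas: int,
    current_utilization: float,
    target_utilization: float = 70,  # averageUtilization in the manifest
    min_replicas: int = 3,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Hypothetical helper: desired = ceil(current * current/target), clamped to [min, max]."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target: no change
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    print(hpa_desired_replicas(3, 140))  # 6: double the load, double the pods
    print(hpa_desired_replicas(3, 72))   # 3: within tolerance
    print(hpa_desired_replicas(8, 140))  # 10: capped at maxReplicas
```

Because the container's CPU limit (500m) is twice its request, utilization can exceed 100%, so a single evaluation can more than double the replica count. HPA also needs a metrics source such as metrics-server running in the cluster to read CPU usage at all.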
# Install eksctl
curl --silent --location "https://github.com/eksctl-io/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
sudo mv /tmp/eksctl /usr/local/bin
# Create cluster
eksctl create cluster \
--name production-eks \
--version 1.28 \
--region us-east-1 \
--nodegroup-name standard-workers \
--node-type t3.medium \
--nodes 3 \
--nodes-min 2 \
--nodes-max 6 \
--managed
# Configure kubectl
aws eks update-kubeconfig --name production-eks --region us-east-1
# Verify cluster
kubectl get nodes
kubectl get pods --all-namespaces
# Deploy application
kubectl create deployment nginx --image=nginx:latest
kubectl expose deployment nginx --port=80 --type=LoadBalancer
# Scale deployment
kubectl scale deployment nginx --replicas=5
# View logs
kubectl logs -f deployment/nginx
# Create namespace
kubectl create namespace production
# Apply manifest
kubectl apply -f deployment.yaml
# Port forward for local testing
kubectl port-forward deployment/nginx 8080:80
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  namespace: production
  labels:
    app: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      serviceAccountName: web-app-sa
      containers:
        - name: web
          image: ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/my-app:latest
          ports:
            - containerPort: 3000
          env:
            - name: NODE_ENV
              value: "production"
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: web-app-service
  namespace: production
spec:
  selector:
    app: web
  type: LoadBalancer
  ports:
    - protocol: TCP
      port: 80
      targetPort: 3000
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
Estimated Time: 8-10 hours
- RDS instance types and engines
- Multi-AZ deployments for HA
- Read replicas for scaling
- Automated backups and snapshots
- Parameter groups and option groups
- RDS Proxy for connection pooling
- Performance Insights
- Database migration with DMS
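The create-db-instance command below sets a daily backup window (03:00-04:00 UTC) and a weekly maintenance window (sun:04:00-sun:05:00 UTC). RDS requires that the two do not overlap. The hypothetical checker below tests that rule, including windows that cross midnight or the end of the week. It assumes the documented hh:mm-hh:mm and ddd:hh:mm-ddd:hh:mm formats, treats windows that merely touch as non-overlapping, and does not check the 30-minute minimum length.

```python
"""Sketch: detect overlap between an RDS backup window and maintenance window."""
from typing import Tuple

# Hypothetical helper module; all times are UTC minutes.
DAY = 24 * 60
WEEK = 7 * DAY
DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def _minutes(hhmm: str) -> int:
    """'03:30' -> minutes after midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def _weekly(ddd_hhmm: str) -> int:
    """'sun:04:00' -> minutes after Monday 00:00."""
    day, hhmm = ddd_hhmm.split(":", 1)
    return DAYS.index(day.lower()) * DAY + _minutes(hhmm)


def _span(start: int, end: int, period: int) -> Tuple[int, int]:
    """Move `end` forward one period when the window wraps (e.g. 23:30-00:30)."""
    return (start, end + period) if end <= start else (start, end)


def windows_conflict(backup: str, maintenance: str) -> bool:
    """backup: 'hh:mm-hh:mm' (daily); maintenance: 'ddd:hh:mm-ddd:hh:mm' (weekly)."""
    b_from, b_to = backup.split("-")
    b_start, b_end = _span(_minutes(b_from), _minutes(b_to), DAY)
    m_from, m_to = maintenance.split("-")
    m_start, m_end = _span(_weekly(m_from), _weekly(m_to), WEEK)
    length = m_end - m_start
    if length >= DAY:
        return True  # a 24-hour maintenance window always meets the daily backup
    # Backups repeat every day, so only the maintenance window's time of day matters.
    m_start %= DAY
    m_end = m_start + length
    # Half-open intervals; compare with the backup window on the previous, same, and next day.
    return any(
        b_start + k * DAY < m_end and m_start < b_end + k * DAY for k in (-1, 0, 1)
    )


if __name__ == "__main__":
    print(windows_conflict("03:00-04:00", "sun:04:00-sun:05:00"))  # False: windows only touch
    print(windows_conflict("04:30-05:30", "sun:04:00-sun:05:00"))  # True
```

Under this sketch's assumptions, the windows used in this guide touch at 04:00 but do not overlap, so the check returns False for them.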
# Create DB subnet group
aws rds create-db-subnet-group \
--db-subnet-group-name production-db-subnet \
--db-subnet-group-description "Production database subnets" \
--subnet-ids subnet-xxx subnet-yyy
# Create security group for RDS
aws ec2 create-security-group \
--group-name rds-sg \
--description "Security group for RDS" \
--vpc-id vpc-xxx
aws ec2 authorize-security-group-ingress \
--group-id sg-xxx \
--protocol tcp \
--port 3306 \
--source-group sg-app
# Create RDS MySQL instance (RDS generates the master password and stores it in Secrets Manager)
aws rds create-db-instance \
--db-instance-identifier production-db \
--db-instance-class db.t3.medium \
--engine mysql \
--engine-version 8.0.35 \
--master-username admin \
--manage-master-user-password \
--allocated-storage 100 \
--storage-type gp3 \
--storage-encrypted \
--vpc-security-group-ids sg-xxx \
--db-subnet-group-name production-db-subnet \
--backup-retention-period 7 \
--preferred-backup-window "03:00-04:00" \
--preferred-maintenance-window "sun:04:00-sun:05:00" \
--multi-az \
--no-publicly-accessible \
--enable-cloudwatch-logs-exports '["error","general","slowquery"]' \
--enable-performance-insights \
--performance-insights-retention-period 7
# Create read replica
aws rds create-db-instance-read-replica \
--db-instance-identifier production-db-replica \
--source-db-instance-identifier production-db \
--db-instance-class db.t3.medium
# Create snapshot
aws rds create-db-snapshot \
--db-instance-identifier production-db \
--db-snapshot-identifier production-db-snapshot-$(date +%Y%m%d)
# Restore from snapshot
aws rds restore-db-instance-from-db-snapshot \
--db-instance-identifier restored-db \
--db-snapshot-identifier production-db-snapshot-20250114
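Manual snapshots such as `production-db-snapshot-$(date +%Y%m%d)` are kept until you delete them, unlike automated backups, which are removed once they age past the backup retention period. A minimal sketch of a retention filter, assuming the `-YYYYMMDD` suffix naming used above: it reads snapshot identifiers on stdin and prints those dated before a cutoff. `snapshots_older_than` is a hypothetical helper, and the usage comment assumes GNU `date` for the `-d` option.

```shell
#!/usr/bin/env bash
# Print snapshot identifiers (one per line on stdin) whose trailing
# YYYYMMDD suffix is earlier than the cutoff date given as $1.
snapshots_older_than() {
  local cutoff="$1" id stamp
  while read -r id || [[ -n "$id" ]]; do
    stamp="${id##*-}"                          # text after the last hyphen
    [[ "$stamp" =~ ^[0-9]{8}$ ]] || continue   # skip ids without a date suffix
    # Digit strings of equal length compare correctly as strings.
    if [[ "$stamp" < "$cutoff" ]]; then
      echo "$id"
    fi
  done
}

# Usage (prints candidates only; review before calling delete-db-snapshot):
# aws rds describe-db-snapshots --db-instance-identifier production-db \
#   --snapshot-type manual --query 'DBSnapshots[].DBSnapshotIdentifier' --output text |
#   tr '\t' '\n' | snapshots_older_than "$(date -d '30 days ago' +%Y%m%d)"
```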
# Modify instance
aws rds modify-db-instance \
--db-instance-identifier production-db \
--db-instance-class db.t3.large \
--apply-immediately
# Extend automated backup retention to 30 days
aws rds modify-db-instance \
--db-instance-identifier production-db \
--backup-retention-period 30 \
--preferred-backup-window "03:00-04:00"
Estimated Time: 6-8 hours
- AWS Systems Manager Parameter Store
- AWS Secrets Manager
- Secret rotation
- Session Manager for secure access
- Patch Manager
- Run Command
- State Manager
# Create secret
aws secretsmanager create-secret \
--name production/db/credentials \
--description "Production database credentials" \
--secret-string '{
"username": "admin",
"password": "MySecurePassword123!",
"host": "production-db.xxx.rds.amazonaws.com",
"port": 3306,
"dbname": "myapp"
}'
# Retrieve secret
aws secretsmanager get-secret-value \
--secret-id production/db/credentials \
--query SecretString \
--output text | jq -r .password
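# Update secret (update-secret replaces the entire secret string, so resend every field;
# this changes only the stored value, not the database user's actual password)
aws secretsmanager update-secret \
--secret-id production/db/credentials \
--secret-string '{"username":"admin","password":"NewPassword456!","host":"production-db.xxx.rds.amazonaws.com","port":3306,"dbname":"myapp"}'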
# Enable automatic rotation
aws secretsmanager rotate-secret \
--secret-id production/db/credentials \
--rotation-lambda-arn arn:aws:lambda:us-east-1:ACCOUNT_ID:function:RotateSecret \
--rotation-rules AutomaticallyAfterDays=30
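# Parameter Store (for non-sensitive config)
# Note: AWS CLI v1 fetches the content of http(s):// values by default; set
# cli_follow_urlparam = false in ~/.aws/config to store the URL itself (CLI v2 does not follow URLs)
aws ssm put-parameter \
--name /myapp/config/api-url \
--value "https://api.example.com" \
--type String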
aws ssm put-parameter \
--name /myapp/config/api-key \
--value "sensitive-key-value" \
--type SecureString
# Get parameter
aws ssm get-parameter --name /myapp/config/api-url --query Parameter.Value --output text
# Get parameter with decryption
aws ssm get-parameter --name /myapp/config/api-key --with-decryption --query Parameter.Value --output text
# Session Manager (SSH alternative; requires the Session Manager plugin for the AWS CLI)
aws ssm start-session --target i-1234567890abcdef0
Estimated Time: 6-8 hours
- Application Load Balancer (ALB)
- Network Load Balancer (NLB)
- Gateway Load Balancer (GWLB)
- Target groups and health checks
- SSL/TLS termination
- Path-based and host-based routing
- Sticky sessions
- Cross-zone load balancing
# Create target groups
aws elbv2 create-target-group \
--name api-targets \
--protocol HTTP \
--port 80 \
--vpc-id vpc-xxx \
--health-check-path /api/health
aws elbv2 create-target-group \
--name web-targets \
--protocol HTTP \
--port 80 \
--vpc-id vpc-xxx \
--health-check-path /health
# Create ALB
aws elbv2 create-load-balancer \
--name production-alb \
--subnets subnet-xxx subnet-yyy \
--security-groups sg-xxx \
--scheme internet-facing \
--type application \
--ip-address-type ipv4
# Create listener
aws elbv2 create-listener \
--load-balancer-arn arn:aws:elasticloadbalancing:... \
--protocol HTTP \
--port 80 \
--default-actions Type=forward,TargetGroupArn=arn:aws:elasticloadbalancing:...
# Add path-based routing rules
aws elbv2 create-rule \
--listener-arn arn:aws:elasticloadbalancing:... \
--priority 10 \
--conditions Field=path-pattern,Values='/api/*' \
--actions Type=forward,TargetGroupArn=arn:aws:elasticloadbalancing:.../api-targets
aws elbv2 create-rule \
--listener-arn arn:aws:elasticloadbalancing:... \
--priority 20 \
--conditions Field=path-pattern,Values='/*' \
--actions Type=forward,TargetGroupArn=arn:aws:elasticloadbalancing:.../web-targets
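ALB listener rules are evaluated in ascending priority order: the first rule whose conditions match handles the request, and the listener's default action applies only when no rule matches. Path patterns are case-sensitive, and `*` matches any run of characters. A minimal sketch that simulates this for the two rules above, using bash glob matching as a stand-in for the ALB matcher (they agree on `*` for these patterns, but this is an illustration, not the ALB implementation). `route_request` and the `default-action` label are hypothetical.

```shell
#!/usr/bin/env bash
# Simulate ALB listener-rule evaluation for a request path.
# Rules mirror the create-rule commands: "priority path-pattern target-group".
route_request() {
  local path="$1"
  local rules=(
    "20 /* web-targets"
    "10 /api/* api-targets"
  )
  local priority pattern target
  # Evaluate rules from the lowest priority number to the highest.
  while read -r priority pattern target; do
    # $pattern is unquoted on purpose so [[ == ]] performs glob matching.
    if [[ "$path" == $pattern ]]; then
      echo "$target"
      return 0
    fi
  done < <(printf '%s\n' "${rules[@]}" | sort -n)
  # No rule matched: fall back to the listener's default action.
  echo "default-action"
}

# Usage: route_request /api/users
```

Note that `/api/*` does not match `/api` itself, and that the priority-20 `/*` rule matches every path, so with these two rules the listener's default action is never reached.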
# Register targets
aws elbv2 register-targets \
--target-group-arn arn:aws:elasticloadbalancing:.../api-targets \
--targets Id=i-xxx Id=i-yyy
Estimated Time: 4-6 hours
- AWS Cost Explorer
- AWS Budgets
- Cost allocation tags
- Reserved Instances and Savings Plans
- Spot Instances
- Right-sizing recommendations
- S3 Intelligent-Tiering
- Cost optimization best practices
- 📺 AWS Cost Optimization Strategies
- 📖 AWS Cost Management User Guide
- 📚 AWS Well-Architected Cost Optimization
# Get cost and usage for January 2025 (End is exclusive)
aws ce get-cost-and-usage \
--time-period Start=2025-01-01,End=2025-02-01 \
--granularity MONTHLY \
--metrics "UnblendedCost" "UsageQuantity" \
--group-by Type=DIMENSION,Key=SERVICE
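Cost Explorer treats the `End` date of `--time-period` as exclusive, so a whole-month query has to end on the first day of the following month. A minimal sketch of a helper that builds that value for any `YYYY-MM`, including the December rollover; `month_period` is a hypothetical name, and the function makes no AWS calls itself.

```shell
#!/usr/bin/env bash
# Print a Cost Explorer --time-period value covering one calendar month.
# End is exclusive, so it is the first day of the next month.
month_period() {
  local ym="$1"                       # expected format: YYYY-MM
  local y="${ym%%-*}" m="${ym##*-}"
  # 10# forces base 10 so months like 08 and 09 are not read as octal.
  local ny=$((10#$y)) nm=$((10#$m + 1))
  if (( nm > 12 )); then
    nm=1
    ny=$((ny + 1))
  fi
  printf 'Start=%s-01,End=%04d-%02d-01\n' "$ym" "$ny" "$nm"
}

# Usage:
# aws ce get-cost-and-usage --time-period "$(month_period 2025-01)" \
#   --granularity MONTHLY --metrics UnblendedCost --group-by Type=DIMENSION,Key=SERVICE
```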
# Create budget
aws budgets create-budget \
--account-id ACCOUNT_ID \
--budget file://budget.json \
--notifications-with-subscribers file://notifications.json
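`create-budget` above reads its budget definition and alert settings from `budget.json` and `notifications.json`, which this section does not show. A minimal sketch that writes both files, assuming a fixed monthly USD cost budget with one email alert when actual spend passes 80% of the limit; the budget name, amount, threshold, and address are placeholders, and `write_budget_files` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Write budget.json and notifications.json for `aws budgets create-budget`.
# Arguments: output directory, monthly limit in USD, alert email address.
write_budget_files() {
  local dir="$1" amount="$2" email="$3"
  # Monthly cost budget; BudgetLimit.Amount is passed as a string.
  cat > "$dir/budget.json" <<EOF
{
  "BudgetName": "monthly-cost-budget",
  "BudgetLimit": { "Amount": "$amount", "Unit": "USD" },
  "BudgetType": "COST",
  "TimeUnit": "MONTHLY"
}
EOF
  # Email the subscriber when actual spend exceeds 80% of the limit.
  cat > "$dir/notifications.json" <<EOF
[
  {
    "Notification": {
      "NotificationType": "ACTUAL",
      "ComparisonOperator": "GREATER_THAN",
      "Threshold": 80,
      "ThresholdType": "PERCENTAGE"
    },
    "Subscribers": [
      { "SubscriptionType": "EMAIL", "Address": "$email" }
    ]
  }
]
EOF
}

# Usage: write_budget_files . 1000 ops@example.com
```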
# Get savings plans recommendations
aws ce get-savings-plans-purchase-recommendation \
--savings-plans-type COMPUTE_SP \
--term-in-years ONE_YEAR \
--payment-option PARTIAL_UPFRONT \
--lookback-period-in-days SIXTY_DAYS
# Get rightsizing recommendations
aws ce get-rightsizing-recommendation \
--service AmazonEC2
# Set up cost anomaly detection (a dimensional monitor that tracks spend per AWS service)
aws ce create-anomaly-monitor \
--anomaly-monitor '{"MonitorName":"ProductionMonitor","MonitorType":"DIMENSIONAL","MonitorDimension":"SERVICE"}'
- ✅ Use Auto Scaling to match capacity with demand
- ✅ Purchase Reserved Instances or Savings Plans for predictable workloads (up to 72% savings)
- ✅ Use Spot Instances for fault-tolerant workloads (up to 90% savings)
- ✅ Right-size EC2 instances based on actual usage
- ✅ Use S3 Intelligent-Tiering or lifecycle policies
- ✅ Delete unattached EBS volumes and old snapshots
- ✅ Use CloudFront to reduce data transfer costs
- ✅ Enable S3 Transfer Acceleration only when needed
- ✅ Delete unused Elastic IPs
- ✅ Use NAT Gateway efficiently (consolidate or use NAT instances)
- ✅ Tag all resources for cost allocation tracking
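The "delete unattached EBS volumes" item can be scripted, because volumes in the `available` state are not attached to any instance. A minimal sketch, assuming configured AWS CLI credentials and region, that lists such volumes and only prints what it would delete unless `DRY_RUN=false` is set. `cleanup_unattached_volumes` and the `DRY_RUN` convention are hypothetical; deletion is permanent, so confirm that no listed volume holds data you still need.

```shell
#!/usr/bin/env bash
# List EBS volumes in the "available" state (not attached to any instance) and
# print what would be deleted. Volumes are deleted only when DRY_RUN=false.
cleanup_unattached_volumes() {
  local ids id
  ids="$(aws ec2 describe-volumes \
    --filters Name=status,Values=available \
    --query 'Volumes[].VolumeId' \
    --output text)"
  for id in $ids; do
    # Skip anything that is not a volume ID (e.g. an empty or "None" result).
    [[ "$id" == vol-* ]] || continue
    if [[ "${DRY_RUN:-true}" == "false" ]]; then
      aws ec2 delete-volume --volume-id "$id"
    else
      echo "would delete: $id"
    fi
  done
}

# Usage: review the dry-run output first, then run
# DRY_RUN=false cleanup_unattached_volumes
```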
Estimated Time: 6-8 hours
- CloudTrail for API logging
- CloudTrail Insights
- AWS Config rules
- Compliance frameworks
- Remediation actions
- Multi-region and multi-account setups
- Security Hub integration
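The `create-trail` command in this section fails unless the target bucket's policy lets the CloudTrail service check the bucket ACL and write log files under `AWSLogs/<account-id>/`. A minimal sketch that prints such a policy, based on the basic statements in the CloudTrail documentation; the bucket name and `ACCOUNT_ID` are the placeholders used in this section, and `cloudtrail_bucket_policy` is a hypothetical helper. The current CloudTrail documentation also adds an `aws:SourceArn` condition to scope the policy to a specific trail.

```shell
#!/usr/bin/env bash
# Print an S3 bucket policy that lets CloudTrail deliver logs to a bucket.
# Arguments: bucket name, AWS account ID.
cloudtrail_bucket_policy() {
  local bucket="$1" account_id="$2"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AWSCloudTrailAclCheck",
      "Effect": "Allow",
      "Principal": { "Service": "cloudtrail.amazonaws.com" },
      "Action": "s3:GetBucketAcl",
      "Resource": "arn:aws:s3:::$bucket"
    },
    {
      "Sid": "AWSCloudTrailWrite",
      "Effect": "Allow",
      "Principal": { "Service": "cloudtrail.amazonaws.com" },
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::$bucket/AWSLogs/$account_id/*",
      "Condition": { "StringEquals": { "s3:x-amz-acl": "bucket-owner-full-control" } }
    }
  ]
}
EOF
}

# Usage (before create-trail):
# cloudtrail_bucket_policy my-cloudtrail-bucket ACCOUNT_ID > cloudtrail-bucket-policy.json
# aws s3api put-bucket-policy --bucket my-cloudtrail-bucket --policy file://cloudtrail-bucket-policy.json
```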
# Create CloudTrail
aws cloudtrail create-trail \
--name organization-trail \
--s3-bucket-name my-cloudtrail-bucket \
--is-multi-region-trail \
--enable-log-file-validation \
--include-global-service-events
# Start logging
aws cloudtrail start-logging --name organization-trail
# Enable CloudTrail Insights
aws cloudtrail put-insight-selectors \
--trail-name organization-trail \
--insight-selectors '[{"InsightType": "ApiCallRateInsight"}]'
# Query CloudTrail logs
aws cloudtrail lookup-events \
--lookup-attributes AttributeKey=EventName,AttributeValue=RunInstances \
--max-results 10
# AWS Config setup
aws configservice put-configuration-recorder \
--configuration-recorder name=default,roleARN=arn:aws:iam::ACCOUNT_ID:role/aws-service-role/config.amazonaws.com/AWSServiceRoleForConfig \
--recording-group allSupported=true,includeGlobalResourceTypes=true
aws configservice put-delivery-channel \
--delivery-channel name=default,s3BucketName=my-config-bucket
aws configservice start-configuration-recorder --configuration-recorder-name default
# Add Config rule (ensure all EBS volumes are encrypted)
aws configservice put-config-rule --config-rule '{
"ConfigRuleName": "encrypted-volumes",
"Description": "Checks whether EBS volumes are encrypted",
"Source": {
"Owner": "AWS",
"SourceIdentifier": "ENCRYPTED_VOLUMES"
}
}'
Estimated Time: 6-8 hours
- 7 R's of migration (Rehost, Replatform, Refactor, etc.)
- AWS Migration Hub
- AWS Application Migration Service (MGN)
- AWS Database Migration Service (DMS)
- AWS DataSync
- AWS Snow Family
- Migration planning and assessment
- Retire: Decommission unnecessary applications
- Retain: Keep applications on-premises (for now)
- Rehost (Lift and Shift): Move as-is to AWS
- Relocate: Move servers to AWS at the hypervisor level without changing applications or buying new hardware (e.g., VMware Cloud on AWS)
- Repurchase: Move to SaaS
- Replatform (Lift, Tinker, and Shift): Make minor optimizations
- Refactor/Re-architect: Redesign using cloud-native features
# Create replication instance
aws dms create-replication-instance \
--replication-instance-identifier my-replication-instance \
--replication-instance-class dms.t3.medium \
--allocated-storage 50 \
--vpc-security-group-ids sg-xxx \
--availability-zone us-east-1a \
--engine-version 3.4.7
# Create source endpoint (on-premises MySQL)
aws dms create-endpoint \
--endpoint-identifier source-mysql \
--endpoint-type source \
--engine-name mysql \
--username admin \
--password password \
--server-name 10.0.1.100 \
--port 3306 \
--database-name mydb
# Create target endpoint (RDS)
aws dms create-endpoint \
--endpoint-identifier target-rds \
--endpoint-type target \
--engine-name mysql \
--username admin \
--password password \
--server-name production-db.xxx.rds.amazonaws.com \
--port 3306 \
--database-name mydb
# Create replication task
aws dms create-replication-task \
--replication-task-identifier migrate-db \
--source-endpoint-arn arn:aws:dms:... \
--target-endpoint-arn arn:aws:dms:... \
--replication-instance-arn arn:aws:dms:... \
--migration-type full-load-and-cdc \
--table-mappings file://table-mappings.json
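`create-replication-task` above loads its table selection from `table-mappings.json`. A minimal sketch that writes a mapping with a single selection rule including every table (the `%` wildcard) in one schema; for a MySQL source the schema name is the database name, `mydb` in this section. `write_table_mappings` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Write a DMS table-mapping file that includes every table in one schema.
# Arguments: schema (database) name, output file path.
write_table_mappings() {
  local schema="$1" out="$2"
  cat > "$out" <<EOF
{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "include-all-$schema-tables",
      "object-locator": {
        "schema-name": "$schema",
        "table-name": "%"
      },
      "rule-action": "include"
    }
  ]
}
EOF
}

# Usage: write_table_mappings mydb table-mappings.json
```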
# Start replication task
aws dms start-replication-task \
--replication-task-arn arn:aws:dms:... \
--start-replication-task-type start-replication
Duration: 4-6 hours
Goal: Set up a highly available Jenkins CI/CD server
Skills: EC2, Auto Scaling, ALB, EBS, IAM
Detailed Guide
Duration: 6-8 hours
Goal: Design and deploy production-grade VPC
Skills: VPC, Subnets, NAT Gateway, Security Groups, Route Tables
Detailed Guide
Duration: 4-6 hours
Goal: Host website with global CDN distribution
Skills: S3, CloudFront, Route 53, ACM
Detailed Guide
Duration: 8-10 hours
Goal: Automate entire infrastructure deployment
Skills: CloudFormation, YAML, Stack management
Detailed Guide
Duration: 10-12 hours
Goal: Create dev, staging, prod environments with Terraform
Skills: Terraform, Modules, Workspaces, Remote State
Detailed Guide
Duration: 10-12 hours
Goal: Build end-to-end automated deployment pipeline
Skills: CodeCommit, CodeBuild, CodeDeploy, CodePipeline
Detailed Guide
Duration: 6-8 hours
Goal: Build REST API without managing servers
Skills: Lambda, API Gateway, DynamoDB, IAM
Detailed Guide
Duration: 10-12 hours
Goal: Deploy microservices using containers
Skills: Docker, ECR, ECS, Fargate, ALB
Detailed Guide
Duration: 12-15 hours
Goal: Deploy and manage apps on managed Kubernetes
Skills: EKS, kubectl, Helm, IRSA, Kubernetes
Detailed Guide
Duration: 8-10 hours
Goal: Set up highly available database infrastructure
Skills: RDS, Multi-AZ, Read Replicas, Backups, Security
Detailed Guide
| Feature | CloudFormation | Terraform | AWS CDK |
|---|---|---|---|
| Language | YAML/JSON | HCL | TypeScript/Python/Java |
| Provider Support | AWS only | Multi-cloud | AWS (with adapters) |
| State Management | AWS-managed | Self-managed state file (e.g., S3 backend with locking) | CloudFormation backend |
| Learning Curve | Moderate | Moderate | Moderate-High |
| Community | AWS official | Large open-source | Growing |
| Best For | AWS-only projects | Multi-cloud | Developers preferring code |
| Feature | CodePipeline | Jenkins | GitLab CI | GitHub Actions |
|---|---|---|---|---|
| Hosting | Managed by AWS | Self-hosted | SaaS or self-hosted | SaaS |
| Cost | Pay per active pipeline (V1) or per action-execution minute (V2) | Infrastructure cost | Free tier + paid | Free tier + paid |
| AWS Integration | Native | Plugins required | Good | Good |
| Flexibility | Moderate | Very High | High | High |
| Learning Curve | Low | High | Moderate | Low-Moderate |
| Feature | ECS | EKS | Fargate |
|---|---|---|---|
| Control | High | Full (Kubernetes) | Limited |
| Complexity | Low | High | Very Low |
| Cost | Low | Moderate-High | Moderate |
| Portability | AWS-specific | Cloud-agnostic | AWS-specific |
| Best For | Simple containers | Complex microservices | Serverless containers |
- AWS Certified DevOps Engineer Professional Study Guide - Comprehensive exam prep
- Terraform: Up & Running - Yevgeniy Brikman
- Amazon Web Services in Action - Andreas Wittig and Michael Wittig (Manning)
- The DevOps Handbook - Gene Kim et al.
- Kubernetes: Up & Running - Brendan Burns, Joe Beda, Kelsey Hightower, et al.
EC2 & VPC
- What is the difference between Security Groups and NACLs?
- Explain EC2 instance types and when to use each
- How does Auto Scaling work?
- What is the difference between public and private subnets?
- How do you troubleshoot connectivity issues in VPC?
IAM & Security
- Explain the principle of least privilege
- What are IAM roles and when should you use them vs access keys?
- How does MFA improve security?
- What is the difference between authentication and authorization?
- How do you implement cross-account access?
CI/CD
- Explain the stages of a typical CI/CD pipeline
- What is Blue/Green deployment?
- How does CodeDeploy handle rollbacks?
- What is the difference between CodeBuild and CodeDeploy?
- How do you implement approval gates in CodePipeline?
Containers
- What is the difference between ECS and EKS?
- When would you use Fargate over EC2 launch type?
- How does Kubernetes service discovery work?
- What are Kubernetes namespaces?
- Explain horizontal pod autoscaling in Kubernetes
Infrastructure as Code
- CloudFormation vs Terraform - pros and cons?
- How do you manage Terraform state?
- What is CloudFormation drift detection, and how do you use it?
- Explain Terraform modules
- How do you handle secrets in IaC?
Monitoring & Logging
- How do you set up custom CloudWatch metrics?
- What is the difference between CloudWatch Logs and CloudTrail?
- How do you create CloudWatch alarms?
- Explain log aggregation strategies
- What is distributed tracing with X-Ray?
- Scenario: Your application is experiencing high latency. How would you troubleshoot?
- Scenario: You need to migrate a monolithic application to AWS. What's your approach?
- Scenario: Your EC2 instances keep running out of memory. What's your solution?
- Scenario: You need to deploy a new version with zero downtime. How?
- Scenario: Your AWS bill has suddenly increased. How do you investigate?
Be prepared to:
- Write CloudFormation/Terraform templates on the spot
- Debug failing CI/CD pipelines
- Configure security groups and networking
- Set up monitoring and alarms
- Explain architecture diagrams
- Optimize costs in given scenarios
Contributions are welcome! If you have:
- 📚 Additional resources or tutorials
- 🐛 Corrections or improvements
- 💡 New project ideas
- 📝 Better explanations
Please open an issue or submit a pull request on GitHub.
This roadmap was inspired by:
- Abhishek Veeramalla's AWS DevOps Zero to Hero
- DevOps Roadmap by anugurthi
- AWS official documentation and best practices
- The amazing DevOps community
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
⭐ If you find this roadmap helpful, please give it a star!
📧 Questions or suggestions? Open an issue!
💼 Ready to become an AWS DevOps Engineer? Start with AWS Fundamentals!
Last Updated: November 2025
Maintained by: @anugurthi
