Skip to content

Ecs patch monitoring#21

Open
SquidRings1 wants to merge 2 commits into
mainfrom
ECS-patch-monitoring
Open

Ecs patch monitoring#21
SquidRings1 wants to merge 2 commits into
mainfrom
ECS-patch-monitoring

Conversation

@SquidRings1
Copy link
Copy Markdown
Owner

This pull request introduces a complete monitoring stack to the ECS infrastructure using Prometheus, Grafana, and cAdvisor. It also enables and configures new ECR repositories and updates network security rules to allow access to the monitoring tools. The changes automate the deployment of monitoring services and ensure the necessary network access is in place.

Monitoring Stack Deployment:

  • Adds local variables and user data in terraform/modules/ECS/main.tf to automate installation and configuration of Prometheus, Grafana, and cAdvisor on ECS nodes using Docker Compose. This includes custom configuration files and ensures services start on boot.
  • Updates the user data script to start cAdvisor as a Docker container for collecting container metrics.

Network Security Updates:

  • Adds new security group ingress rules in terraform/modules/network/main.tf to allow external access to Grafana (port 3000), Prometheus (port 9090), and cAdvisor (port 8083) for ECS.

ECR Repository Configuration:

  • Uncomments and enables S3 and ECR module blocks in bootstrap/main.tf for admin, api, and adds a new ECR repository for monitoring.

@GatedMonsori
Copy link
Copy Markdown
Collaborator

Hey — went through this. The intent is right but there are a couple of issues that would keep it from working as-is. Flagging them so you can decide:

1. The new locals (prometheus_yml, compose_yml, user_data) aren't wired to either launch template. Both aws_launch_template.ecs_asg_api and aws_launch_template.ecs_asg_admin still embed their original user_data inline. So if this merged, only the docker run cadvisor patch you added to the admin launch template would actually take effect — prometheus and grafana would never launch on any host.

2. The locals reference vars that don't exist on the ECS module. var.api_host_private_ip, var.admin_host_private_ip, var.grafana_admin_password aren't declared in terraform/modules/ECS/variables.tf. terraform validate would fail the moment these locals are actually used.

3. Architectural question. As written (once wired), prometheus + grafana would run on every ECS host. That gives you two Grafanas — one on the api EC2 and one on the admin EC2 — each scraping only itself. Usually you want a single central Grafana scraping both. I think a dedicated monitoring EC2 (separate ASG) is cleaner.

4. ECR_monitoring repo — prometheus/grafana/cadvisor are public Docker Hub / GCR images, so no ECR push is needed. The new repo would be unused.

5. bootstrap/main.tf uncomment — those S3 + ECR modules were commented because the resources already exist in the dev account; re-enabling them risks BucketAlreadyOwnedByYou or duplicate ECR creation. Worth verifying with whoever bootstrapped originally.

6. Open ingress on ECS SG :3000 / :9090 from 0.0.0.0/0 — that puts the Grafana login UI directly on the api/admin EC2 public IPs (no Cloudflare proxy on those ports). Anyone with the IP can find it.

I have an alternative open in #20 that addresses these (dedicated monitoring host, Iban's actual dashboard JSON provisioned via S3, single Grafana). Going to close this in favor of that — happy to iterate further if you'd prefer a different shape.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants