chore(ecs-patterns): fix aws-ecs-patterns integration tests #37106
mergify[bot] merged 12 commits into aws:main
…SG, replace stale images
- Remove explicit capacityProviderName from 5 EC2 tests to prevent NAME_COLLISION
- Add service.connections.allowFrom(loadBalancer) for NLB tests to fix health check failures
caused by networkLoadBalancerWithSecurityGroupByDefault feature flag
- Replace deprecated abiosoft/caddy image with amazon/amazon-ecs-sample in special-listener
- Add destroy: { expectError: true } for capacity provider teardown issues (aws#19275)
…iner only listens on 80)
…file The Lambda runtime image (public.ecr.aws/lambda/python:3.6) has a Lambda-specific entrypoint that expects a handler argument. This causes the container to exit with code 142 (SIGALRM) when used as a regular ECS service, failing health checks. Container logs: 'entrypoint requires the handler name to be the first argument'
…me-platform EC2 services use bridge networking so service.connections.allowFrom doesn't create SG rules. Instead, create an explicit NLB SG with egress scoped to the instance SG on ephemeral ports (32768-65535).
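The bridge-networking workaround described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual test code: the stack name, construct IDs, instance type, and the use of `connections.allowTo` (which adds both the NLB egress rule and the matching instance ingress rule) are assumptions made for the example. Client ingress to the NLB is left out for brevity.

```typescript
import { App, Stack } from 'aws-cdk-lib';
import { Template } from 'aws-cdk-lib/assertions';
import * as autoscaling from 'aws-cdk-lib/aws-autoscaling';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as ecsPatterns from 'aws-cdk-lib/aws-ecs-patterns';
import * as elbv2 from 'aws-cdk-lib/aws-elasticloadbalancingv2';

const app = new App();
const stack = new Stack(app, 'NlbEc2BridgeSketch'); // hypothetical stack name
const vpc = new ec2.Vpc(stack, 'Vpc', { maxAzs: 2 });
const cluster = new ecs.Cluster(stack, 'Cluster', { vpc });

// Container instances. Their security group is what the NLB has to reach.
const asg = new autoscaling.AutoScalingGroup(stack, 'Asg', {
  vpc,
  instanceType: new ec2.InstanceType('t3.micro'), // assumed instance type
  machineImage: ecs.EcsOptimizedImage.amazonLinux2(),
});

// No capacityProviderName here, so a unique name is generated
// and parallel test runs in one region cannot collide.
cluster.addAsgCapacityProvider(new ecs.AsgCapacityProvider(stack, 'CapacityProvider', {
  autoScalingGroup: asg,
}));

// Explicit NLB security group without the default allow-all egress.
const nlbSecurityGroup = new ec2.SecurityGroup(stack, 'NlbSecurityGroup', {
  vpc,
  allowAllOutbound: false,
});

// Bridge networking maps containers to dynamic host ports, so the NLB
// needs egress to the instance SG on the ephemeral range (32768-65535).
nlbSecurityGroup.connections.allowTo(asg, ec2.Port.tcpRange(32768, 65535));

const loadBalancer = new elbv2.NetworkLoadBalancer(stack, 'Nlb', {
  vpc,
  internetFacing: true,
  securityGroups: [nlbSecurityGroup],
});

new ecsPatterns.NetworkLoadBalancedEc2Service(stack, 'Service', {
  cluster,
  loadBalancer,
  memoryLimitMiB: 256,
  taskImageOptions: {
    image: ecs.ContainerImage.fromRegistry('amazon/amazon-ecs-sample'),
    containerPort: 80, // the sample image listens on 80
  },
});

// Synthesize so the generated security group rules can be inspected.
const template = Template.fromStack(stack);
```

Scoping egress to the instance security group, rather than opening all outbound traffic, keeps the rule as narrow as the dynamic host-port mapping allows.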
d6ed4f1 to ab79754
Abogical left a comment
A linting error will need to be fixed: https://github.com/aws/aws-cdk/actions/runs/22499907090/job/65184032622?pr=37106#step:9:5082
Thank you for contributing! Your pull request will be updated from main and then merged automatically (do not update manually, and be sure to allow changes to be pushed to your fork).
Merge Queue Status
This pull request spent 6 hours 58 minutes 57 seconds in the queue, including 4 hours 1 minute 23 seconds running CI.
Reason: The merge conditions cannot be satisfied due to failing checks.
Hint: You may have to fix your CI before adding the pull request to the queue again.
@Mergifyio queue
Merge Queue Status: 🛑 Queue command has been cancelled
@Mergifyio refresh
✅ Pull request refreshed
Merge Queue Status
This pull request spent 12 seconds in the queue, with no time running CI.
Reason: The pull request can't be updated.
Hint: You should update or rebase your pull request manually. If you do, this pull request will automatically be requeued once the queue conditions match again.
Merge Queue Status
This pull request spent 1 hour 48 seconds in the queue, including 32 minutes 35 seconds running CI.
Issue # (if applicable)
Related: #19275 (ECS capacity provider deletion)
Reason for this change
20 integration tests in aws-ecs-patterns were failing. This PR fixes 16 of them. The other 4 require external resources (certificates, hosted zones) and cannot be fixed without pre-existing AWS resources.
Description of changes
Fixed Tests
- ec2/integ.alb-ecs-service-command-entry-point
  Error: DELETE_FAILED: The specified capacity provider is in use and cannot be removed (ResourceInUseException)
  Root cause and fix: Removed the explicit capacityProviderName: 'test-capacity-provider'; hardcoded names collide when tests run in parallel in the same region. Added destroy: { expectError: true } for the known teardown issue #19275 ((aws-ecs): Deleting a stack with a Cluster and an ASG capacity provider fails).
- ec2/integ.application-load-balanced-ecs-service
  Error: DELETE_FAILED: The specified capacity provider is in use and cannot be removed (ResourceInUseException)
  Root cause and fix: Removed the explicit capacityProviderName: 'first-capacity-provider' and 'second-capacity-provider'. Added destroy: { expectError: true }.
- ec2/integ.healthchecks-multiple-application-load-balanced-ecs-service
  Error: DELETE_FAILED: Resource timed out waiting for completion, Group did not stabilize (NotStabilized)
  Root cause and fix: Removed the explicit capacityProviderName: 'my-capacity-provider'. Added destroy: { expectError: true }.
- ec2/integ.healthchecks-multiple-network-load-balanced-ecs-service
  Error: DELETE_FAILED: Resource timed out waiting for completion, Group did not stabilize (NotStabilized)
  Root cause and fix: Removed the explicit capacityProviderName: 'my-capacity-provider'. Added explicit NLB security group with egress scoped to instance SG on ephemeral ports (see NLB SG fix below). Added destroy: { expectError: true }.
- ec2/integ.network-load-balanced-ecs-service
  Error: DELETE_FAILED: The specified capacity provider is in use and cannot be removed (ResourceInUseException)
  Root cause and fix: Removed the explicit capacityProviderName: 'first-capacity-provider' and 'second-capacity-provider'. Added explicit NLB SG. Added destroy: { expectError: true }.
- fargate/integ.asset-image
  Error: ROLLBACK_COMPLETE: Exceeded attempts to wait (NotStabilized); container exiting with code 142 (SIGALRM)
  Root cause and fix: demo-image/Dockerfile used FROM public.ecr.aws/lambda/python:3.6, which has a Lambda-specific entrypoint that expects a handler argument. When used as a regular ECS service the container exits immediately. Replaced with FROM public.ecr.aws/docker/library/python:3.12-slim with exec-form CMD.
- fargate/integ.healthchecks-multiple-application-load-balanced-fargate-service
  Error: ROLLBACK_COMPLETE: Exceeded attempts to wait (NotStabilized); ALB health check on port 90 failing
  Root cause and fix: The test set containerPort: 90, but amazon/amazon-ecs-sample only listens on port 80. Changed containerPort: 90 → containerPort: 80.
- fargate/integ.healthchecks-multiple-network-load-balanced-fargate-service
  Error: ROLLBACK_COMPLETE: Exceeded attempts to wait (NotStabilized); NLB health check failing
  Root cause and fix: Changed containerPort: 90 → 80. Added service.connections.allowFrom(lb, Port.allTcp()) for each NLB.
- fargate/integ.l3
  Error: ROLLBACK_COMPLETE: Exceeded attempts to wait (NotStabilized); NLB health check failing
  Root cause and fix: The networkLoadBalancerWithSecurityGroupByDefault feature flag creates an NLB SG with "Disallow all traffic" egress. The NLB patterns don't automatically create ingress/egress rules between the NLB SG and service SG (unlike ALB patterns). Added nlbFargateService.service.connections.allowFrom(nlbFargateService.loadBalancer, ec2.Port.tcp(80)).
- fargate/integ.l3-autocreate
  Error: ROLLBACK_COMPLETE: Exceeded attempts to wait (NotStabilized); NLB health check failing
  Root cause and fix: Same as integ.l3.
- fargate/integ.l3-capacity-provider-strategies
  Error: ROLLBACK_FAILED: Exceeded attempts to wait (NotStabilized), The specified capacity provider is in use
  Root cause and fix: Added destroy: { expectError: true } for capacity provider teardown.
- fargate/integ.l3-vpconly
  Error: ROLLBACK_COMPLETE: Exceeded attempts to wait (NotStabilized); NLB health check failing
  Root cause and fix: Same as integ.l3.
- fargate/integ.multiple-network-load-balanced-fargate-service
  Error: ROLLBACK_COMPLETE: Exceeded attempts to wait (NotStabilized); NLB health check failing on port 90
  Root cause and fix: Changed containerPort: 90 → 80. Added service.connections.allowFrom for each NLB.
- fargate/integ.network-load-balanced-fargate-service-custom-health
  Error: ROLLBACK_COMPLETE: Exceeded attempts to wait (NotStabilized); NLB health check failing
- fargate/integ.runtime-platform-application-load-balanced-fargate-service
  Error: DELETE_FAILED: The Cluster cannot be deleted while Tasks are active (ClusterContainsTasksException)
  Root cause and fix: ScheduledFargateTask keeps tasks running during stack deletion. Same demo-image Dockerfile issue. Added destroy: { expectError: true }.
- fargate/integ.special-listener
  Error: ROLLBACK_COMPLETE: Exceeded attempts to wait (NotStabilized); container not starting
  Root cause and fix: abiosoft/caddy (Docker Hub, last updated 2019) is unreliable. Container port 2015 not matching. Replaced with amazon/amazon-ecs-sample (ECR-hosted). Changed containerPort: 2015 → 80 (image listens on 80). Kept listenerPort: 2015 to preserve test intent. Added NLB SG fix.
NLB Security Group Fix (detail)
When @aws-cdk/aws-elasticloadbalancingv2:networkLoadBalancerWithSecurityGroupByDefault is enabled, NLBs get a security group with "Disallow all traffic" egress. The NetworkLoadBalancedFargateService and NetworkMultipleTargetGroupsFargateService constructs don't automatically create the necessary SG rules (unlike ALB patterns).
For Fargate (awsvpc networking), the service's security group is opened to the load balancer with service.connections.allowFrom(loadBalancer, port).
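A minimal sketch of the Fargate variant, assuming a fresh app with the feature flag set through context. The stack name and construct IDs are hypothetical; the `allowFrom` call is the one quoted for integ.l3 in the table above.

```typescript
import { App, Stack } from 'aws-cdk-lib';
import { Template } from 'aws-cdk-lib/assertions';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as ecsPatterns from 'aws-cdk-lib/aws-ecs-patterns';

// Assumption: the flag is enabled via app context, as cdk.json would do.
const app = new App({
  context: {
    '@aws-cdk/aws-elasticloadbalancingv2:networkLoadBalancerWithSecurityGroupByDefault': true,
  },
});
const stack = new Stack(app, 'NlbFargateSketch'); // hypothetical stack name
const vpc = new ec2.Vpc(stack, 'Vpc', { maxAzs: 2 });
const cluster = new ecs.Cluster(stack, 'Cluster', { vpc });

const nlbFargateService = new ecsPatterns.NetworkLoadBalancedFargateService(stack, 'Service', {
  cluster,
  memoryLimitMiB: 512,
  taskImageOptions: {
    image: ecs.ContainerImage.fromRegistry('amazon/amazon-ecs-sample'),
    containerPort: 80, // the sample image listens on 80
  },
});

// With awsvpc networking each task has its own ENI and security group,
// so the rule is expressed directly between the service and the NLB.
nlbFargateService.service.connections.allowFrom(
  nlbFargateService.loadBalancer,
  ec2.Port.tcp(80),
);

// Synthesize so the resulting template can be inspected.
const template = Template.fromStack(stack);
```

On CDK versions where the load balancer has no security group, the `allowFrom` call should be harmless, since there is no peer group to write rules against.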
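For EC2 (bridge networking, where service.connections has no SG), the tests create an explicit NLB security group with egress scoped to the instance SG on ephemeral ports (32768-65535).
Skipped Tests (4): require external resources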
- ec2/integ.tls-network-load-balanced-ecs-service
  Error: AssemblyError: Subprocess exited with error 1 (throws at synth if CERT_ARN not set)
- fargate/integ.tls-network-load-balanced-fargate-service
  Error: AssemblyError: Subprocess exited with error 1 (throws at synth if CERT_ARN not set)
- fargate/integ.alb-fargate-service-https
  Error: DNS Record Set is not available. Certificate is in FAILED status
  Reason: fromHostedZoneAttributes with fake ID; requires real Route53 hosted zone
- ec2/integ.multiple-application-load-balanced-ecs-service-idle-timeout
  Error: InvalidDomainNameException - example.com. is reserved by AWS!
  Reason: PublicHostedZone with example.com, which is reserved by AWS
Describe any new or updated permissions being added
None.
Description of how you validated changes
All 16 fixed tests were deployed and validated on AWS account 325066840661 across 16 regions.
Checklist
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license