[infra] relax ELB health check settings for faucet and insight #726
vivekgsharma merged 1 commit into v1.0-dev from
Walkthrough
Updated health check configuration values for two AWS Elastic Load Balancer resources. For both the web and insight load balancers, interval changed from 20 to 60 seconds, timeout from 3 to 10 seconds, and unhealthy_threshold from 2 to 5.
Estimated code review effort: 🎯 1 (Trivial) | ⏱️ ~3 minutes
Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 0
🧹 Nitpick comments (1)
terraform/aws/main.tf (1)
136-142: Approve syntax, but clarify operational implications of slower health detection. The configuration changes are syntactically correct and align with the PR objective. However, this substantially alters the health check responsiveness:
- Old behavior: ~40 seconds to mark unhealthy (2 failures × 20s interval)
- New behavior: ~300 seconds to mark unhealthy (5 failures × 60s interval)
This 7.5× increase in detection latency gives up fast failover in exchange for fewer false positives during traffic spikes. Ensure that incident response procedures and alerting rules account for the longer detection window.
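The arithmetic behind the two estimates above can be checked with a short sketch. It uses the same simplification as the review comment (consecutive failures multiplied by the check interval) and ignores the per-check timeout and where in the interval the first failure lands, so the real window may differ somewhat. The function name is hypothetical and only for illustration.

```python
def seconds_to_unhealthy(interval_s: int, unhealthy_threshold: int) -> int:
    """Approximate time for an instance to be marked unhealthy.

    Simplified model: the instance must fail `unhealthy_threshold`
    consecutive checks, and checks run every `interval_s` seconds.
    """
    return interval_s * unhealthy_threshold


# Values taken from the PR: old settings vs. new settings.
old_window = seconds_to_unhealthy(interval_s=20, unhealthy_threshold=2)
new_window = seconds_to_unhealthy(interval_s=60, unhealthy_threshold=5)
ratio = new_window / old_window

print(f"old: ~{old_window}s, new: ~{new_window}s, increase: {ratio}x")
```

Under this model the old settings give roughly 40 seconds and the new ones roughly 300 seconds, which is where the 7.5× figure comes from.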
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
terraform/aws/main.tf (2 hunks)
🔇 Additional comments (1)
terraform/aws/main.tf (1)
180-186: Configuration mirrors web load balancer consistently. The insight ELB's health check updates match the web ELB identically, maintaining configuration symmetry across both resources. This is appropriate since both serve similar purposes (external-facing HTTP/HTTPS endpoints).
Before merging, confirm:
- These changes have been validated in a staging environment under load conditions.
- Runbooks and alerting thresholds have been updated to reflect the ~300-second detection window.
- The 10-second timeout is based on actual p95+ response time metrics for faucet and insight services.
The relaxed health check settings are intended to prevent intermittent 503s during high load or slow responses.
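One way to approach the timeout item in the checklist above is to compare the 10-second timeout against a p95 computed from recorded response times. The following is a minimal sketch, assuming latency samples have already been exported in seconds; the sample values and function names are hypothetical and are not measurements from faucet or insight.

```python
import statistics

ELB_TIMEOUT_S = 10.0  # new health check timeout from this PR


def p95(samples: list[float]) -> float:
    """Return the 95th percentile of the given samples.

    statistics.quantiles(n=100) returns 99 cut points; index 94 is the
    95th percentile. At least two samples are required.
    """
    return statistics.quantiles(samples, n=100, method="inclusive")[94]


def timeout_has_headroom(samples: list[float], timeout_s: float = ELB_TIMEOUT_S) -> bool:
    """True if the observed p95 response time is below the timeout."""
    return p95(samples) < timeout_s


# Hypothetical response times in seconds, for illustration only.
example_samples = [0.4, 0.6, 0.5, 1.2, 0.8, 2.5, 0.7, 0.9, 3.1, 0.6]
print(timeout_has_headroom(example_samples))
```

If the check fails against real metrics, health checks would time out on healthy but slow instances, which is the false-positive case the PR is trying to avoid.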