fix(nginx): add resolver and upstream resolve to prevent stale IP rou… #4295
ilias-115 wants to merge 1 commit into getsentry:master from
This was discussed in #4079, and the decision at the time was not to add it.
I see the workaround with the I get the point with the DNS server which can vary depending on host, IMO we have 2 ways:
This PR comes from a real production incident on our side, not from a theoretical optimization. According to Docker docs, For DNS portability, we can avoid hardcoding by retrieving resolver IPs from References:
Sorry, I've mentioned the wrong pull request.
Using DNS servers from
Simply bad practice, you should use
Agreed, but as I've stated in #3894, the real fix would be to understand why
Answered above, it does not work.
You're correct. Current But I'm not a maintainer of this project and I only state my opinions here, my word is not final :) @aldy505 What do you think?
Thanks for the detailed feedback, this makes sense. Just to clarify one point: when I mentioned retrieving DNS from
Also, we hit this not only after crashes, but during normal operations too (VM reboot / Docker daemon restart). So this is not only a “fix crash root cause” topic; it is also an upstream DNS resilience issue for expected runtime events where container IPs can change.
I agree root-cause work is still needed, but it does not remove the need for robust upstream re-resolution in nginx. If changing default behavior is out of scope, I’m happy to re-scope this as:
Why
In the default self-hosted Docker Compose deployment, nginx proxies traffic to
the relay and web services by hostname.
When those containers are recreated and receive a new container IP, nginx may
continue using a stale upstream address until nginx itself is reloaded or
restarted.
Observed symptom:
connect() failed (...) while connecting to upstream
What changed
nginx.conf
relay:3000 resolve
web:9000 resolve
zone directives required by nginx for dynamic upstream resolution
Config details
resolver 127.0.0.11 valid=30s;
resolver_timeout 5s;
zone relay 64k;
zone sentry 64k;
Why these values
127.0.0.11: Docker embedded DNS in the default self-hosted container network
valid=30s: balances recovery time and DNS lookup overhead
resolver_timeout=5s: avoids long stalls on DNS issues
64k zone: sufficient for these small upstream groups
Test plan
relay and web
relay, nginx no longer keeps using a stale upstream IP
Notes
This change is intended to make nginx more resilient to backend container IP
changes in the default self-hosted Docker Compose setup.
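For readers who want to see how the directives listed under "Config details" might fit together, here is a minimal sketch rather than the actual diff. The upstream block names `relay` and `sentry` are an assumption inferred from the `zone` names, and placing `resolver` at the `http` level is likewise assumed. Note also that, to my knowledge, the `resolve` parameter is only available in open-source nginx from 1.27.3 onward (earlier it was limited to the commercial subscription), so the nginx image version matters.

```nginx
http {
    # Docker's embedded DNS server; cache answers for at most 30s
    # so a recreated container's new IP is picked up without a reload.
    resolver 127.0.0.11 valid=30s;
    # Give up on a DNS lookup after 5s instead of stalling.
    resolver_timeout 5s;

    # Assumed upstream name, inferred from "zone relay 64k;".
    upstream relay {
        # "resolve" requires the upstream to live in a shared memory zone.
        zone relay 64k;
        server relay:3000 resolve;
    }

    # Assumed upstream name, inferred from "zone sentry 64k;".
    upstream sentry {
        zone sentry 64k;
        server web:9000 resolve;
    }
}
```

Without `resolve`, nginx looks up `relay` and `web` once when the configuration is loaded and keeps those addresses, which matches the stale upstream behaviour described under "Why".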
Legal Boilerplate
Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. and is gonna need some rights from me in order to utilize my contributions in this here PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.