Conversation
sqlx-core/src/rt/mod.rs
Outdated
| } | ||
|
|
||
| #[cfg(not(feature = "_rt-async-std"))] | ||
| missing_rt((duration, f)) |
There was a problem hiding this comment.
When playing around with this PR locally (to see if it fixes an acquire timeout issue, which it unfortunately doesn't), I found that this caused a compile error. I think it should be
| missing_rt((duration, f)) | |
| missing_rt((deadline, f)) |
There was a problem hiding this comment.
@jplatte if you have a solid repro for acquire timeouts, I'd love to add it as a test here.
There was a problem hiding this comment.
I wish. It's in the proprietary version of the main work codebase, and somehow only happens w/ hyper 1.0 / axum 0.7. But if other debugging approaches don't work out, I can try the hyper upgrade on the much smaller OSS version of the codebase and reduce from there next week.
There was a problem hiding this comment.
One thing that Axum does is cancel the handler future if the client disconnects. I wonder if it's triggering a cancellation bug somewhere.
Do you have a before_acquire callback set?
There was a problem hiding this comment.
I did some digging a few weeks back and realized that connections could potentially get stuck in return_to_pool because there's no timeout: estuary/flow#1676 (comment)
That's a change I was meaning to add to this PR but hadn't gotten to yet. There's a timeout when it goes to close the connection, but no timeout for the task as a whole.
There was a problem hiding this comment.
I don't think it's a cancelation bug. It happens in a test that does a bunch of requests in parallel (50 originally, I can turn it down to 20 and still reliably reproduce the hang, but at 18 it succeded).
There was a problem hiding this comment.
What's the max size of the test pool?
And what's the acquire timeout set at?
There was a problem hiding this comment.
Hmmm, the max size of the pool is exactly 20, and once I use that amount of parallelism it breaks. Tried 19 too and that works. Acquire timeout is 20s, much longer than it takes the test to run to completion with up to 19 parallel requests.
I also tried raising the pool size to 50, exactly same thing: Once the number of parallel requests is at least as big as the pool size, it hangs (until timeout).
Further, I was using a tokio::sync::Barrier and separate reqwest::Clients, tokio tasks for the requests to happen as closely together as possible (this test was originally written to catch another race). If I don't make the tasks wait on the barrier before making the request, that seems to already mix things up sufficiently for the test succeed, even at a pool size of 20 and 50 parallel requests.
There was a problem hiding this comment.
I found the bug, it has nothing to do with SQLx itself. The test was deadlocking the server in a really weird way (related to the DB pool).
8420b43 to
d671b98
Compare
d671b98 to
656e751
Compare
d500c21 to
2be94a9
Compare
2be94a9 to
c58291b
Compare
|
Just tested this with #2805 and this PR seems to fix that issue 🎉 |
|
Yup, pretty sure it's not flaky. If I test with If I run it with the |
|
The only reason I even checked is this item in your list above:
|
|
It's perfectly possible that this PR doesn't fix all races that can lead to the issue in #2805, but it seems to have at least fixed the race that the reproducer is triggering. |
b6d4757 to
7bb8521
Compare
97c3af6 to
0dd92b4
Compare
d602782 to
0baa295
Compare
acquire()call is cancelled.acquire()should now be completely cancel-safe.PoolConnectortrait superceding bothbefore_connect(requested but not yet implemented) andafter_connectcallbacks.Future, albeit with a'staticrequirement for the returnedFuture(instead ofBoxFuture).usizefor all connection counts to get rid of weird inconsistencies.Breaking Changes
Pool::set_connect_options()andget_connect_options()have been removed. Instead, implement the newPoolConnectortrait (or use a closure) using something likeArc<RwLock<impl ConnectOptions>>.PoolOptions::after_connect()has been removed. Instead, implementPoolConnector(or use a closure), open a connection and then apply any operations necessary.PoolOptions::min_connections(),PoolOptions::max_connections()andPool::size()now useusizeinstead ofu32.Fixes #3513
Fixes #3315
Fixes #3132
Fixes #3117
Fixes #2848