WIP pool changes #3582
base: main
Conversation
sqlx-core/src/rt/mod.rs (Outdated)

```rust
}

#[cfg(not(feature = "_rt-async-std"))]
missing_rt((duration, f))
```
When playing around with this PR locally (to see if it fixes an acquire timeout issue, which it unfortunately doesn't), I found that this caused a compile error. I think it should be
```diff
-missing_rt((duration, f))
+missing_rt((deadline, f))
```
@jplatte if you have a solid repro for acquire timeouts, I'd love to add it as a test here.
I wish. It's in the proprietary version of the main work codebase, and somehow only happens w/ hyper 1.0 / axum 0.7. But if other debugging approaches don't work out, I can try the hyper upgrade on the much smaller OSS version of the codebase and reduce from there next week.
One thing that Axum does is cancel the handler future if the client disconnects. I wonder if it's triggering a cancellation bug somewhere.
Do you have a `before_acquire` callback set?
I did some digging a few weeks back and realized that connections could potentially get stuck in `return_to_pool` because there's no timeout: estuary/flow#1676 (comment)
That's a change I was meaning to add to this PR but hadn't gotten to yet. There's a timeout when it goes to close the connection, but no timeout for the task as a whole.
I don't think it's a cancellation bug. It happens in a test that makes a bunch of requests in parallel (50 originally; I can turn it down to 20 and still reliably reproduce the hang, but at 18 it succeeded).
What's the max size of the test pool?
And what's the acquire timeout set at?
Hmmm, the max size of the pool is exactly 20, and once I use that amount of parallelism it breaks. Tried 19 too, and that works. The acquire timeout is 20s, much longer than it takes the test to run to completion with up to 19 parallel requests.
I also tried raising the pool size to 50: exactly the same thing. Once the number of parallel requests is at least as big as the pool size, it hangs (until timeout).
Further, I was using a `tokio::sync::Barrier`, separate `reqwest::Client`s, and tokio tasks to make the requests happen as closely together as possible (this test was originally written to catch another race). If I don't make the tasks wait on the barrier before making the request, that seems to already mix things up sufficiently for the test to succeed, even at a pool size of 20 and 50 parallel requests.
I found the bug, it has nothing to do with SQLx itself. The test was deadlocking the server in a really weird way (related to the DB pool).
- … `acquire()` call is cancelled. `acquire()` should now be completely cancel-safe.
- New `PoolConnector` trait, superseding both `before_connect` (requested but not yet implemented) and `after_connect` callbacks. … `Future`, albeit with a `'static` requirement for the returned `Future` (instead of `BoxFuture`).
- `usize` for all connection counts, to get rid of weird inconsistencies.

Breaking Changes

- `Pool::set_connect_options()` and `get_connect_options()` have been removed. Instead, implement the new `PoolConnector` trait (or use a closure) using something like `Arc<RwLock<impl ConnectOptions>>`.
- `PoolOptions::after_connect()` has been removed. Instead, implement `PoolConnector` (or use a closure), open a connection, and then apply any operations necessary.
- `PoolOptions::min_connections()`, `PoolOptions::max_connections()`, and `Pool::size()` now use `usize` instead of `u32`.

Fixes #3513
Fixes #3315
Fixes #3132
Fixes #3117
Fixes #2848