-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: deadlock when sending ping #193
base: master
Are you sure you want to change the base?
Conversation
@@ -219,7 +219,7 @@ fn write_deadlock() { | |||
// fact that the sum of receive windows of all open streams can easily | |||
// be larger than the send capacity of the connection at any point in time. | |||
// Using such a low capacity here therefore yields a more reproducible test. | |||
let capacity = 1024; | |||
let capacity = 10024; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those changes seem to make the test flaky and the deadlock happens sometimes. The test output when it happens is:
running 1 test
[2024-06-11T08:40:15Z DEBUG yamux::connection] new connection: 6405a268 (Server)
[2024-06-11T08:40:15Z DEBUG yamux::connection] new connection: d58957af (Client)
[2024-06-11T08:40:15Z DEBUG yamux::connection] d58957af: new outbound (Stream d58957af/1) of (Connection d58957af Client (streams 0))
[2024-06-11T08:40:15Z DEBUG yamux::connection::rtt] sending ping 3047298023
[2024-06-11T08:40:15Z DEBUG yamux::connection::rtt] sending ping 3182731544
[2024-06-11T08:40:15Z DEBUG yamux::connection::rtt] received pong 3047298023, estimated round-trip-time 322.416µs
[2024-06-11T08:40:15Z DEBUG yamux::connection::rtt] received pong 3182731544, estimated round-trip-time 564.167µs
[2024-06-11T08:40:15Z DEBUG yamux::connection::stream::flow_control] old window_max: 0.25 mb, new window_max: 0.5 mb
[2024-06-11T08:40:15Z DEBUG yamux::connection::stream::flow_control] old window_max: 0.25 mb, new window_max: 0.5 mb
[2024-06-11T08:40:15Z DEBUG yamux::connection::stream::flow_control] old window_max: 0.5 mb, new window_max: 1 mb
[2024-06-11T08:40:25Z DEBUG yamux::connection::rtt] sending ping 3157641594
[2024-06-11T08:40:25Z DEBUG yamux::connection::rtt] sending ping 1136718128
[2024-06-11T08:40:25Z DEBUG yamux::connection::rtt] received pong 3157641594, estimated round-trip-time 459.708µs
[2024-06-11T08:40:25Z DEBUG yamux::connection::rtt] received pong 1136718128, estimated round-trip-time 145.75µs
[2024-06-11T08:40:25Z DEBUG yamux::connection::stream::flow_control] old window_max: 0.5 mb, new window_max: 1 mb
[2024-06-11T08:40:35Z DEBUG yamux::connection::rtt] sending ping 1970012150
[2024-06-11T08:40:35Z DEBUG yamux::connection::rtt] sending ping 2852812461
[2024-06-11T08:40:35Z DEBUG yamux::connection::rtt] received pong 1970012150, estimated round-trip-time 180.5µs
[2024-06-11T08:40:35Z DEBUG yamux::connection::rtt] received pong 2852812461, estimated round-trip-time 87.458µs
[2024-06-11T08:40:45Z DEBUG yamux::connection::rtt] sending ping 2531287629
[2024-06-11T08:40:45Z DEBUG yamux::connection::rtt] sending ping 1628284672
test write_deadlock has been running for over 60 seconds
Would it also be possible to achieve the same using smaller numbers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using
let msg = vec![1u8; 30 * 1024 * 1024];
let capacity = 102;
pub const DEFAULT_CREDIT: u32 = 25600;
const PING_INTERVAL: Duration = Duration::from_millis(500);
I can usually reproduce it at least once when running 5 times. But not when capacity is 100. Maybe it can give some idea about the cause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can I override DEFAULT_CREDIT
and PING_INTERVAL
for this test? I changed the values where they are defined as I didn't know how to override them in the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how would we benefit from changing those values? DEFAULT_CREDIT
is part of the spec and PING_INTERVAL
is not user configurable.
CC @mxinden |
Can you expand how that would lead to a dead-lock? Once the local node has sent the ping, it would then proceed to reading pending frames. Also who's receive window? |
I don't know much about the internals, but my theory is that when the remote's receive window has been exhausted, we try to send a ping before reading the frame which increases our credit again. If both peers do this at the same time, the deadlock happens as shown here #193 (comment). I couldn't reproduce the deadlock anymore after this change that reads the frame before sending the ping. I apologize, but this PR is mostly empirical, I don't have enough knowledge about the system yet to make a better analysis. I hope it can be helpful somehow. |
any update on this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @diegomrsantos! Thanks for this, LGTM, left a comment.
Can you expand how that would lead to a dead-lock? Once the local node has sent the ping, it would then proceed to reading pending frames. Also who's receive window?
rust-yamux/yamux/src/connection.rs
Lines 393 to 412 in 7063268
if self.socket.poll_ready_unpin(cx).is_ready() { | |
// Note `next_ping` does not register a waker and thus if not called regularly (idle | |
// connection) no ping is sent. This is deliberate as an idle connection does not | |
// need RTT measurements to increase its stream receive window. | |
if let Some(frame) = self.rtt.next_ping() { | |
self.socket.start_send_unpin(frame.into())?; | |
continue; | |
} | |
// Privilege pending `Pong` and `GoAway` `Frame`s | |
// over `Frame`s from the receivers. | |
if let Some(frame) = self | |
.pending_read_frame | |
.take() | |
.or_else(|| self.pending_write_frame.take()) | |
{ | |
self.socket.start_send_unpin(frame)?; | |
continue; | |
} | |
} |
Hi @mxinden it doesn't proceed to read the pending frames because it won't enter the if
bracket as the socket
is not ready, the Ping
has not been sent.
By switching the order we privilege sending Frame
s that clear pending_read_frame
and allow us to make progress reading.
We were already privileging clearing pending_read_frame
s over pending_write_frame
s, this change makes it so that sending a Ping
has a lower priority than the those.
If you agree we can merge this PR.
Thanks
@@ -210,7 +210,7 @@ fn write_deadlock() { | |||
// We make the message to transmit large enough s.t. the "server" | |||
// is forced to start writing (i.e. echoing) the bytes before | |||
// having read the entire payload. | |||
let msg = vec![1u8; 1024 * 1024]; | |||
let msg = vec![1u8; 100024 * 100024]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we maintain the previous values?
I feel only Increasing the flakiness rate isn't much valuable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't mean to merge those changes. Do you happen to know how to test this PR properly?
I am not sure I fully understand the deadlock. It is at the TCP flow-control level?
Prioritizing The optimization should not be prioritized over correct behavior. In other words, if this can indeed lead to a dead-lock, then yes, your change to the order makes sense. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prioritizing Ping frames is an optimization, leading to better RTT measurements only.
The optimization should not be prioritized over correct behavior. In other words, if this can >indeed lead to a dead-lock, then yes, your change to the order makes sense.
Thanks for elaborating Max, I also spent a couple minutes today trying to reproduce the issue but I could only do it changing DEFAULT_CREDIT
and PING_INTERVAL
which don't say much as the former is part of the spec and the latter is not user configurable.
just increasing the payload on write_deadlock
test I couldn't reproduce the aforementioned issue.
@indietyp do you have a reproducible case for this issue?
I suggest we wait for more conclusive data
@@ -219,7 +219,7 @@ fn write_deadlock() { | |||
// fact that the sum of receive windows of all open streams can easily | |||
// be larger than the send capacity of the connection at any point in time. | |||
// Using such a low capacity here therefore yields a more reproducible test. | |||
let capacity = 1024; | |||
let capacity = 10024; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how would we benefit from changing those values? DEFAULT_CREDIT
is part of the spec and PING_INTERVAL
is not user configurable.
Sadly I don't have a reliable reproducible case for this, I tried or quite a while. I just found that this would semi-reliably happen when using yamux 1.3 - so it's some kind of edge case. I'll try again maybe I'll find a reproducible case, tho I am not filled with confidence 😅 |
@dariusc93 was able to found a reproducible example that I pushed to https://github.com/jxs/rust-libp2p/tree/yamux-deadlock, |
I ran it on macOS and I don't see any echo. Could you confirm this behavior is caused by the deadlock? I'm not entirely sure I'm testing it correctly, but I still can't see any echo after applying the fix in this PR. |
The potential solution proposed here doesn't fix the issue, as the cause doesn't seem related to the ping. The code here https://github.com/jxs/rust-libp2p/blob/yamux-deadlock/examples/stream/src/main.rs successfully reproduces the issue. It happens when the client streams around 500 KB of data to a server which echoes it back to the client. The server decides no window update needs to be sent, as the condition is satisfied, but right after neither the client nor the server has more credits left. It's still unclear why that's the case, but it should probably be addressed in a new PR. Thanks to everyone who helped clarify the issue. |
See libp2p/rust-libp2p#5410 and #112
The problem might be that we try to send a ping before reading pending frames. If the receive window is exhausted, no progress can be made.