Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: deadlock when sending ping #193

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

diegomrsantos
Copy link

@diegomrsantos diegomrsantos commented Jun 11, 2024

See libp2p/rust-libp2p#5410 and #112

The problem might be that we try to send a ping before reading pending frames. If the receive window is exhausted, no progress can be made.

@@ -219,7 +219,7 @@ fn write_deadlock() {
// fact that the sum of receive windows of all open streams can easily
// be larger than the send capacity of the connection at any point in time.
// Using such a low capacity here therefore yields a more reproducible test.
let capacity = 1024;
let capacity = 10024;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those changes seem to make the test flaky and the deadlock happens sometimes. The test output when it happens is:

running 1 test
[2024-06-11T08:40:15Z DEBUG yamux::connection] new connection: 6405a268 (Server)
[2024-06-11T08:40:15Z DEBUG yamux::connection] new connection: d58957af (Client)
[2024-06-11T08:40:15Z DEBUG yamux::connection] d58957af: new outbound (Stream d58957af/1) of (Connection d58957af Client (streams 0))
[2024-06-11T08:40:15Z DEBUG yamux::connection::rtt] sending ping 3047298023
[2024-06-11T08:40:15Z DEBUG yamux::connection::rtt] sending ping 3182731544
[2024-06-11T08:40:15Z DEBUG yamux::connection::rtt] received pong 3047298023, estimated round-trip-time 322.416µs
[2024-06-11T08:40:15Z DEBUG yamux::connection::rtt] received pong 3182731544, estimated round-trip-time 564.167µs
[2024-06-11T08:40:15Z DEBUG yamux::connection::stream::flow_control] old window_max: 0.25 mb, new window_max: 0.5 mb
[2024-06-11T08:40:15Z DEBUG yamux::connection::stream::flow_control] old window_max: 0.25 mb, new window_max: 0.5 mb
[2024-06-11T08:40:15Z DEBUG yamux::connection::stream::flow_control] old window_max: 0.5 mb, new window_max: 1 mb
[2024-06-11T08:40:25Z DEBUG yamux::connection::rtt] sending ping 3157641594
[2024-06-11T08:40:25Z DEBUG yamux::connection::rtt] sending ping 1136718128
[2024-06-11T08:40:25Z DEBUG yamux::connection::rtt] received pong 3157641594, estimated round-trip-time 459.708µs
[2024-06-11T08:40:25Z DEBUG yamux::connection::rtt] received pong 1136718128, estimated round-trip-time 145.75µs
[2024-06-11T08:40:25Z DEBUG yamux::connection::stream::flow_control] old window_max: 0.5 mb, new window_max: 1 mb
[2024-06-11T08:40:35Z DEBUG yamux::connection::rtt] sending ping 1970012150
[2024-06-11T08:40:35Z DEBUG yamux::connection::rtt] sending ping 2852812461
[2024-06-11T08:40:35Z DEBUG yamux::connection::rtt] received pong 1970012150, estimated round-trip-time 180.5µs
[2024-06-11T08:40:35Z DEBUG yamux::connection::rtt] received pong 2852812461, estimated round-trip-time 87.458µs
[2024-06-11T08:40:45Z DEBUG yamux::connection::rtt] sending ping 2531287629
[2024-06-11T08:40:45Z DEBUG yamux::connection::rtt] sending ping 1628284672
test write_deadlock has been running for over 60 seconds

Would it also be possible to achieve the same using smaller numbers?

Copy link
Author

@diegomrsantos diegomrsantos Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using

let msg = vec![1u8; 30 * 1024 * 1024];
let capacity = 102;
pub const DEFAULT_CREDIT: u32 = 25600;
const PING_INTERVAL: Duration = Duration::from_millis(500);

I can usually reproduce it at least once when running 5 times. But not when capacity is 100. Maybe it can give some idea about the cause.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How can I override DEFAULT_CREDIT and PING_INTERVAL for this test? I changed the values where they are defined as I didn't know how to override them in the test.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how would we benefit from changing those values? DEFAULT_CREDIT is part of the spec and PING_INTERVAL is not user configurable.

@diegomrsantos diegomrsantos marked this pull request as draft June 11, 2024 09:44
@diegomrsantos diegomrsantos changed the title fix: deadlock fix: deadlock when sending ping Jun 11, 2024
@diegomrsantos diegomrsantos marked this pull request as ready for review June 12, 2024 17:07
@dariusc93
Copy link
Member

CC @mxinden

@mxinden
Copy link
Member

mxinden commented Jul 1, 2024

The problem might be that we try to send a ping before reading pending frames. If the receive window is exhausted, no progress can be made.

Can you expand how that would lead to a dead-lock? Once the local node has sent the ping, it would then proceed to reading pending frames. Also who's receive window?

@diegomrsantos
Copy link
Author

diegomrsantos commented Jul 1, 2024

The problem might be that we try to send a ping before reading pending frames. If the receive window is exhausted, no progress can be made.

Can you expand how that would lead to a dead-lock? Once the local node has sent the ping, it would then proceed to reading pending frames. Also who's receive window?

I don't know much about the internals, but my theory is that when the remote's receive window has been exhausted, we try to send a ping before reading the frame which increases our credit again. If both peers do this at the same time, the deadlock happens as shown here #193 (comment). I couldn't reproduce the deadlock anymore after this change that reads the frame before sending the ping. I apologize, but this PR is mostly empirical, I don't have enough knowledge about the system yet to make a better analysis. I hope it can be helpful somehow.

@dariusc93
Copy link
Member

any update on this?

CC @diegomrsantos

@dariusc93 dariusc93 mentioned this pull request Oct 23, 2024
Copy link
Member

@jxs jxs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @diegomrsantos! Thanks for this, LGTM, left a comment.

Can you expand how that would lead to a dead-lock? Once the local node has sent the ping, it would then proceed to reading pending frames. Also who's receive window?

if self.socket.poll_ready_unpin(cx).is_ready() {
// Note `next_ping` does not register a waker and thus if not called regularly (idle
// connection) no ping is sent. This is deliberate as an idle connection does not
// need RTT measurements to increase its stream receive window.
if let Some(frame) = self.rtt.next_ping() {
self.socket.start_send_unpin(frame.into())?;
continue;
}
// Privilege pending `Pong` and `GoAway` `Frame`s
// over `Frame`s from the receivers.
if let Some(frame) = self
.pending_read_frame
.take()
.or_else(|| self.pending_write_frame.take())
{
self.socket.start_send_unpin(frame)?;
continue;
}
}

Hi @mxinden it doesn't proceed to read the pending frames because it won't enter the if bracket as the socket is not ready, the Ping has not been sent.

By switching the order we privilege sending Frames that clear pending_read_frame and allow us to make progress reading.
We were already privileging clearing pending_read_frames over pending_write_frames, this change makes it so that sending a Ping has a lower priority than the those.
If you agree we can merge this PR.

Thanks

@@ -210,7 +210,7 @@ fn write_deadlock() {
// We make the message to transmit large enough s.t. the "server"
// is forced to start writing (i.e. echoing) the bytes before
// having read the entire payload.
let msg = vec![1u8; 1024 * 1024];
let msg = vec![1u8; 100024 * 100024];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we maintain the previous values?
I feel only Increasing the flakiness rate isn't much valuable

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't mean to merge those changes. Do you happen to know how to test this PR properly?

@mxinden
Copy link
Member

mxinden commented Oct 27, 2024

I am not sure I fully understand the deadlock. It is at the TCP flow-control level?

We were already privileging clearing pending_read_frames over pending_write_frames, this change makes it so that sending a Ping has a lower priority than the those.

Prioritizing Ping frames is an optimization, leading to better RTT measurements only.

The optimization should not be prioritized over correct behavior. In other words, if this can indeed lead to a dead-lock, then yes, your change to the order makes sense.

Copy link
Member

@jxs jxs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prioritizing Ping frames is an optimization, leading to better RTT measurements only.

The optimization should not be prioritized over correct behavior. In other words, if this can >indeed lead to a dead-lock, then yes, your change to the order makes sense.

Thanks for elaborating Max, I also spent a couple minutes today trying to reproduce the issue but I could only do it changing DEFAULT_CREDITand PING_INTERVAL which don't say much as the former is part of the spec and the latter is not user configurable.

just increasing the payload on write_deadlock test I couldn't reproduce the aforementioned issue.
@indietyp do you have a reproducible case for this issue?
I suggest we wait for more conclusive data

@@ -219,7 +219,7 @@ fn write_deadlock() {
// fact that the sum of receive windows of all open streams can easily
// be larger than the send capacity of the connection at any point in time.
// Using such a low capacity here therefore yields a more reproducible test.
let capacity = 1024;
let capacity = 10024;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how would we benefit from changing those values? DEFAULT_CREDIT is part of the spec and PING_INTERVAL is not user configurable.

@indietyp
Copy link

Sadly I don't have a reliable reproducible case for this, I tried or quite a while.

I just found that this would semi-reliably happen when using yamux 1.3 - so it's some kind of edge case.

I'll try again maybe I'll find a reproducible case, tho I am not filled with confidence 😅

@jxs
Copy link
Member

jxs commented Nov 30, 2024

@dariusc93 was able to found a reproducible example that I pushed to https://github.com/jxs/rust-libp2p/tree/yamux-deadlock,
to reproduce it, one has to checkout that branch, yamux-deadlock and run the stream example, both the listener instance and the connecting instance.
Contrary to the example on master the one on yamux-deadlock eventually stops echoing.

@diegomrsantos
Copy link
Author

@dariusc93 was able to found a reproducible example that I pushed to https://github.com/jxs/rust-libp2p/tree/yamux-deadlock, to reproduce it, one has to checkout that branch, yamux-deadlock and run the stream example, both the listener instance and the connecting instance. Contrary to the example on master the one on yamux-deadlock eventually stops echoing.

I ran it on macOS and I don't see any echo. Could you confirm this behavior is caused by the deadlock? I'm not entirely sure I'm testing it correctly, but I still can't see any echo after applying the fix in this PR.

@diegomrsantos
Copy link
Author

diegomrsantos commented Dec 5, 2024

The potential solution proposed here doesn't fix the issue, as the cause doesn't seem related to the ping.

The code here https://github.com/jxs/rust-libp2p/blob/yamux-deadlock/examples/stream/src/main.rs successfully reproduces the issue. It happens when the client streams around 500 KB of data to a server which echoes it back to the client.
After a deeper investigation, @jxs and I found the cause seems to be related to flow control, specifically these lines https://github.com/diegomrsantos/rust-yamux/blob/master/yamux/src/connection/stream/flow_control.rs#L52:L54.

The server decides no window update needs to be sent, as the condition is satisfied, but right after neither the client nor the server has more credits left.

It's still unclear why that's the case, but it should probably be addressed in a new PR. Thanks to everyone who helped clarify the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants