SubscribeSocket does not reconnect on network reconnection #845
Comments
Which side is the listener and which is the connector? If you can, try making the publisher the connector and the subscriber the listener. You can also try enabling TcpKeepAlive; it might solve it as well.
Bottom line, I'm not sure it is a NetMQ problem rather than a TCP one. TCP keepalive is off by default. I'm working on the ZMTPv3 PR, which should be merged in a couple of days; I will make the heartbeat commit now. If you can test it and see whether the problem is solved, that would be great.
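For readers unfamiliar with TCP keepalive, here is a minimal sketch of what "enabling keepalive" means at the OS socket level. It uses Python's standard `socket` module rather than NetMQ, and the function name and default timings are assumptions chosen for illustration only.

```python
import socket

def make_keepalive_socket(idle_s=10, interval_s=5, probes=3):
    """Create a TCP socket with OS-level keepalive probes enabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Keepalive is off by default; without it an idle connection whose
    # peer vanished ungracefully can look healthy indefinitely.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The tuning knobs are platform specific, so set only those that
    # this platform exposes and accepts.
    tuning = (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
        ("TCP_KEEPCNT", probes),        # failed probes before the drop
    )
    for name, value in tuning:
        opt = getattr(socket, name, None)
        if opt is None:
            continue
        try:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
        except OSError:
            pass  # option known to Python but rejected by this OS
    return sock
```

With probes enabled, the OS eventually reports a dead peer as a socket error even when the application never sends anything, which is the situation a subscriber is in.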
Thanks @somdoron. The publisher binds and the subscriber connects, since I have one server and many occasionally connected clients that all need to be notified of some events. If you look at my issue on AsyncIO, somdoron/AsyncIO#35, you will see I diagnosed that GetQueuedCompletionStatusEx never returns (it's not a timeout, I waited 10 minutes) 😄 And it happens only on some computers, not all 😭 I'll try the new bits as soon as they are released!
Yes, I saw the issue with AsyncIO, I'm not sure it is related. Give me 10 minutes.
@valeriob, can you check with the following PR: You need to enable HeartbeatInterval; check out the socket option: Also check out the test: Both the publisher and the subscriber need the new version.
Thanks @somdoron, I tried that PR, but when I simulated bad network conditions I got an unhandled exception on a timer. I'll be able to reproduce the problem on Monday.
I think I fixed the issue, please check:
Thanks @somdoron, now I get this exception:
Oops, I think I fixed that now: I will try to simulate a broken network later today.
I think I fixed that. You can check again.
@valeriob any updates? It is now part of master. If you can confirm that it works, I will release a beta version to NuGet.
:(
I was able to reproduce and fix: Branch:
Any chance you are receiving from multiple sockets?
Yes, it's possible; I tried restarting the publisher after the crash to see whether the client would recover.
As far as I can see, protocol v3 looks more complicated to implement correctly. I'll keep helping with the tests, but what do you think about taking a look at what @wmjordan said in this issue? somdoron/AsyncIO#35 I guess it would benefit many people.
Only the heartbeat is a bit complicated. Anyway, I don't think it is an AsyncIO issue, but a TCP by-design thing. TCP doesn't have a heartbeat (not by default, at least), so if a connection is closed ungracefully, the other side won't know about it until it tries to send. This is why we need a heartbeat. Can you share the code you are using for testing the pub and sub?
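To make the heartbeat idea concrete, here is a small sketch of the receiving side's bookkeeping. It is plain Python, not the ZMTPv3 heartbeat implementation discussed in this thread; `HeartbeatMonitor` and its methods are hypothetical names, and the clock is injectable only so the logic can be checked without waiting.

```python
import time

class HeartbeatMonitor:
    """Tracks when the peer was last heard from and flags long silences."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock
        self._last_seen = clock()

    def on_traffic(self):
        # Call for every message or heartbeat frame received from the peer.
        self._last_seen = self._clock()

    def is_alive(self):
        # A subscriber never sends, so silence is its only failure signal.
        return (self._clock() - self._last_seen) <= self._timeout_s
```

The peer is expected to send something (data or a ping) more often than `timeout_s`; once `is_alive()` returns False, the connection can be torn down and re-established instead of waiting on a socket that will never deliver again.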
Of course, I'll extract the bits; maybe I'm misusing something.
Here it is: https://github.com/valeriob/NetMqNetworkFailures
Hi,
Hi, Valerio |
I have experienced the same issue. I then tested the same scenario with https://github.com/zeromq/clrzmq4 and got the same behavior. It seems this is by design on the zmq side: it just stops reconnecting after a while. Then I tried to add manual reconnection, hoping to use the Monitor feature to intercept the Disconnected event. But I never get this event; I only get Connected events. I tried something like this:
So I have no idea how to reconnect the sub socket. I don't even see a flag like State on the socket that I could check in a loop.
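One possible workaround when no Disconnected event arrives is to treat prolonged silence as a disconnect and rebuild the socket. The sketch below is plain Python with hypothetical callables, not NetMQ or clrzmq4 API: `connect` is assumed to return an object whose `recv(timeout)` returns `None` on timeout and which has `close()`. It only makes sense if the publisher sends something at least once per `silence_timeout_s`.

```python
def receive_with_reconnect(connect, handle, silence_timeout_s, max_reconnects):
    """Receive messages, rebuilding the socket whenever it goes silent.

    Returns the number of reconnects performed before giving up.
    """
    reconnects = 0
    sock = connect()
    while True:
        msg = sock.recv(silence_timeout_s)  # hypothetical: None on timeout
        if msg is not None:
            handle(msg)
            continue
        # Silence for too long: assume the link is dead and discard the socket.
        sock.close()
        if reconnects >= max_reconnects:
            return reconnects
        reconnects += 1
        sock = connect()  # fresh socket; the caller re-subscribes inside connect()
```

In a real client `max_reconnects` would likely be unbounded and each reconnect would be followed by a short delay; the limit is here so the loop terminates in a test.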
Environment
I created a console application that connects to a server via SubscribeSocket. I run the program with wifi ON and start receiving messages. If I disable wifi, the messages stop. If I re-enable wifi within ~10 seconds, the messages start coming again; if I wait longer, no more messages ever arrive.
On some computers with the same software the problem does not manifest itself. I investigated the problem here, but I do not know how to work around it :( somdoron/AsyncIO#35
We chose NetMQ precisely for its network resiliency, since the application will work over an intermittent wifi connection 😢