Do not give up on HOST_CANDIDATE / SERVER_REFLEXIVE_CANDIDATE #53

Crotalus · 2023-11-20T07:25:32Z

Seems like it gives up HOST_CANDIDATE after one failed connection attempt and then don't try that again until 2 minutes have passed
(same with SERVER_REFLEXIVE_CANDIDATE after 2 attempts)

I wonder why that logic is needed, shouldn't ice4j have the chance to try all methods at the time of reconnection, regardless of earlier attempts? For instance when someone temporarily lost their internet access but got it back.

Sheikah45

I guess this was some attempt at assuming the relay connection was the most reliable and other types would block it.

Good catch!

Geosearchef · 2023-11-20T11:08:06Z

There is a bug on Windows / some AV causing sockets to be randomly closed when you open to many, which is more likely to occur with multiple network interfaces.

To reduce the number of sockets created, the ice adapter will disable all host candidates after the first try and all reflective candidates after the second try. The only guaranteed method that should work for everybody is the relay connection, which is why that is kept enabled.

Do not merge this unless you want 5% of your player base to be unable to connect.

Brutus5000 · 2023-11-20T12:18:26Z

Do not merge this unless you want 5% of your player base to be unable to connect.

Well there's currently a big amount of playerbase who cannot (re)connect right now - even on relay connections - and this was identified as one of the causes and we are looking desperate for causes.

So I get the problem and wonder if there is alternative approaches not to spam to many sockets.
From my understanding the amount of sockets per time comes from roughly

Amount of network adapters (1 candidate set per ip address for HOST, SFRFLX, RELAY)
x (1 HOST + Amount of coturns)
x Reconnect attempts per minute

So alternative ideas:

On first retry use the last working pair.
We could drop HOST connections on reconnect only if no HOST connections were used before.
We could restrict the amount of network adapters by having the user whitelist/blacklist ethernet adapters (can be configure via weird env variable), or maybe auto-blacklist known devices that do not contribute (e.g. Hamachi, VirtualBox or whatever)

Sheikah45 · 2023-11-20T13:39:36Z

There is a bug on Windows / some AV causing sockets to be randomly closed when you open to many, which is more likely to occur with multiple network interfaces.

To reduce the number of sockets created, the ice adapter will disable all host candidates after the first try and all reflective candidates after the second try. The only guaranteed method that should work for everybody is the relay connection, which is why that is kept enabled.

Do not merge this unless you want 5% of your player base to be unable to connect.

Do you still have a link to some kind of documentation of this bug so that we can look at it and see if there is a better mitigation today.

Ravandel95 · 2023-11-20T18:37:42Z

There is a bug on Windows / some AV causing sockets to be randomly closed when you open to many, which is more likely to occur with multiple network interfaces.

To reduce the number of sockets created, the ice adapter will disable all host candidates after the first try and all reflective candidates after the second try. The only guaranteed method that should work for everybody is the relay connection, which is why that is kept enabled.

Do not merge this unless you want 5% of your player base to be unable to connect.

Hey @Geosearchef!

Good to see you around :)

Do you happen to have any idea why (during reconnects) there doesn't seem to be any Relay candidates offered?

Currently indeed it seems to reject srflx (and hosts?) candidate after 1 failed attempt, but from some relatively simple but reproducible reconnection testing, we find that it never comes with any RELAY candidates. Thus we end up with an infinite sequence of empty candidate lists after the first attempt. During initial lobby connection it does have RELAY candidates

I call it Failure mode B.
Here's an example of that:

So currently I had the following questions we'd ideally understand to fix reconnecting:

Geosearchef · 2023-11-20T19:20:05Z

Do you still have a link to some kind of documentation of this bug so that we can look at it and see if there is a better mitigation today.

At the end of ice development, we had issues on a few windows systems that we couldn't explain. The system was randomly closing ice's sockets. This occurred on around 3-10% of windows systems.

I somehow managed to reproduce it on my win7 machine. Investigating the issue with a debugger lead me through the Java stdlib down to its native code. Going through Openjdk code, and watching it using a debugger, the socket close wasn't issued in any part of the JVM but instead occurred after the WinAPI syscall in Windows Code.
The status is literally just "Socket closed", no reason at all.

Temporarily, I could fix the issue by enabling! a VPN connection. Other people have sometimes seen better results by disabling unused network adapters.

By disabling host and srflx, I got a connection in almost all of those instances, which is why this workaround exists.
It regards the relay as the safest, always working option.

What is the current issue? To me it sounds like the problem is with the relay, not the connectivity establishment.

@Brutus5000 One more idea: Instead of completely disabling the fallback mechanism if host succeeded once, how about you do both, first all, if that fails, relay only, if that fails, host/srflx only, if that fails, back to the start.

(@Ravandel95 I will take a look later.)

Do not give up on HOST_CANDIDATE / SERVER_REFLEXIVE_CANDIDATE

14799e8

Sheikah45 approved these changes Nov 20, 2023

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Do not give up on HOST_CANDIDATE / SERVER_REFLEXIVE_CANDIDATE #53

Do not give up on HOST_CANDIDATE / SERVER_REFLEXIVE_CANDIDATE #53

Crotalus commented Nov 20, 2023

Sheikah45 left a comment

Geosearchef commented Nov 20, 2023

Brutus5000 commented Nov 20, 2023 •

edited

Loading

Sheikah45 commented Nov 20, 2023

Ravandel95 commented Nov 20, 2023 •

edited

Loading

Geosearchef commented Nov 20, 2023

Do not give up on HOST_CANDIDATE / SERVER_REFLEXIVE_CANDIDATE #53

Are you sure you want to change the base?

Do not give up on HOST_CANDIDATE / SERVER_REFLEXIVE_CANDIDATE #53

Conversation

Crotalus commented Nov 20, 2023

Sheikah45 left a comment

Choose a reason for hiding this comment

Geosearchef commented Nov 20, 2023

Brutus5000 commented Nov 20, 2023 • edited Loading

Sheikah45 commented Nov 20, 2023

Ravandel95 commented Nov 20, 2023 • edited Loading

Geosearchef commented Nov 20, 2023

Brutus5000 commented Nov 20, 2023 •

edited

Loading

Ravandel95 commented Nov 20, 2023 •

edited

Loading