Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not give up on HOST_CANDIDATE / SERVER_REFLEXIVE_CANDIDATE #53

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Crotalus
Copy link
Member

Seems like it gives up HOST_CANDIDATE after one failed connection attempt and then don't try that again until 2 minutes have passed
(same with SERVER_REFLEXIVE_CANDIDATE after 2 attempts)

I wonder why that logic is needed, shouldn't ice4j have the chance to try all methods at the time of reconnection, regardless of earlier attempts? For instance when someone temporarily lost their internet access but got it back.

Copy link
Member

@Sheikah45 Sheikah45 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this was some attempt at assuming the relay connection was the most reliable and other types would block it.

Good catch!

@Geosearchef
Copy link
Member

There is a bug on Windows / some AV causing sockets to be randomly closed when you open to many, which is more likely to occur with multiple network interfaces.

To reduce the number of sockets created, the ice adapter will disable all host candidates after the first try and all reflective candidates after the second try. The only guaranteed method that should work for everybody is the relay connection, which is why that is kept enabled.

Do not merge this unless you want 5% of your player base to be unable to connect.

@Brutus5000
Copy link
Member

Brutus5000 commented Nov 20, 2023

Do not merge this unless you want 5% of your player base to be unable to connect.

Well there's currently a big amount of playerbase who cannot (re)connect right now - even on relay connections - and this was identified as one of the causes and we are looking desperate for causes.

So I get the problem and wonder if there is alternative approaches not to spam to many sockets.
From my understanding the amount of sockets per time comes from roughly

Amount of network adapters (1 candidate set per ip address for HOST, SFRFLX, RELAY)
x (1 HOST + Amount of coturns)
x Reconnect attempts per minute

So alternative ideas:

  • On first retry use the last working pair.
  • We could drop HOST connections on reconnect only if no HOST connections were used before.
  • We could restrict the amount of network adapters by having the user whitelist/blacklist ethernet adapters (can be configure via weird env variable), or maybe auto-blacklist known devices that do not contribute (e.g. Hamachi, VirtualBox or whatever)

@Sheikah45
Copy link
Member

There is a bug on Windows / some AV causing sockets to be randomly closed when you open to many, which is more likely to occur with multiple network interfaces.

To reduce the number of sockets created, the ice adapter will disable all host candidates after the first try and all reflective candidates after the second try. The only guaranteed method that should work for everybody is the relay connection, which is why that is kept enabled.

Do not merge this unless you want 5% of your player base to be unable to connect.

Do you still have a link to some kind of documentation of this bug so that we can look at it and see if there is a better mitigation today.

@Ravandel95
Copy link

Ravandel95 commented Nov 20, 2023

There is a bug on Windows / some AV causing sockets to be randomly closed when you open to many, which is more likely to occur with multiple network interfaces.

To reduce the number of sockets created, the ice adapter will disable all host candidates after the first try and all reflective candidates after the second try. The only guaranteed method that should work for everybody is the relay connection, which is why that is kept enabled.

Do not merge this unless you want 5% of your player base to be unable to connect.

Hey @Geosearchef!

Good to see you around :)

Do you happen to have any idea why (during reconnects) there doesn't seem to be any Relay candidates offered?

Currently indeed it seems to reject srflx (and hosts?) candidate after 1 failed attempt, but from some relatively simple but reproducible reconnection testing, we find that it never comes with any RELAY candidates. Thus we end up with an infinite sequence of empty candidate lists after the first attempt. During initial lobby connection it does have RELAY candidates

I call it Failure mode B.
Here's an example of that:
image

So currently I had the following questions we'd ideally understand to fix reconnecting:

image

@Geosearchef
Copy link
Member

Do you still have a link to some kind of documentation of this bug so that we can look at it and see if there is a better mitigation today.

At the end of ice development, we had issues on a few windows systems that we couldn't explain. The system was randomly closing ice's sockets. This occurred on around 3-10% of windows systems.

I somehow managed to reproduce it on my win7 machine. Investigating the issue with a debugger lead me through the Java stdlib down to its native code. Going through Openjdk code, and watching it using a debugger, the socket close wasn't issued in any part of the JVM but instead occurred after the WinAPI syscall in Windows Code.
The status is literally just "Socket closed", no reason at all.

Temporarily, I could fix the issue by enabling! a VPN connection. Other people have sometimes seen better results by disabling unused network adapters.

By disabling host and srflx, I got a connection in almost all of those instances, which is why this workaround exists.
It regards the relay as the safest, always working option.

What is the current issue? To me it sounds like the problem is with the relay, not the connectivity establishment.

@Brutus5000 One more idea: Instead of completely disabling the fallback mechanism if host succeeded once, how about you do both, first all, if that fails, relay only, if that fails, host/srflx only, if that fails, back to the start.

(@Ravandel95 I will take a look later.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants