Erratic behaviour with SSL #173
Comments
TL;DR: The problem is in pion/turn, and we won't be able to address it in STUNner.
Analysis
The problem seems to come from pion/turn, the open-source TURN implementation that underlies STUNner. The issue itself is related to a fundamental impedance mismatch between TURN, which is message based, and TCP/TLS, which provides a bytestream abstraction. Mapping between the two can be done, but I suspect that in this case the mapping is somehow what creates the problem. Plus TLS. Or the cloud provider. Or the load balancer. There are lots of moving parts here.
Anyway, the layer responsible for mapping the TLS/TCP bytestream to the message-based semantics of TURN is called STUNConn. It implements various heuristics that let the TURN server consume the TCP/TLS bytestream one message at a time: we read the stream into an internal buffer until it holds a full TURN message (the message length is known from the length field of the TURN or ChannelData header), then return that message and truncate the internal buffer.
The error itself is triggered in the TURN server's read loop, after it tries to decode a TURN message using the encoder/decoder of pion/stun (TURN is a glorified version of STUN that uses the same message header structure, hence the call into pion/stun). The first thing the decoder does is check whether it has enough bytes to decode the STUN/TURN header, which is supposed to be 20 bytes; if not, we get the above error.
So it seems to me that STUNConn somehow returns an incomplete TURN message that then triggers this error in the message decoder, but at this point it is impossible to know how and why this can happen. Can you post a full
Hey @rg0now, thanks a lot for the detailed explanation.
While performing additional tests, I realized that I get very different behaviour depending on whether I test in Firefox or in Chrome, so I'm starting to believe several issues are mixed up here.
First of all, the video in my initial bug report was made with Firefox. I can reproduce the same behaviour 100% of the time. I enabled the full
First of all, I see the "not enough bytes to read header" message all the time, even when I'm not running any test. So I'm starting to wonder whether the message originates from some liveness probe or something similar. Here are some logs:
Logs
As you can see, it always receives just one byte, and this is while I'm doing nothing (not trying to connect in any way).
When testing with Firefox, when Firefox does not return a relay server, I see this in the logs:
Logs
When testing with Firefox, when Firefox does return a relay server, I see this in the logs:
Logs
Surprisingly, I don't see anything related to TCP in the logs.
When testing in Chrome, I see this in the logs:
Logs
Description
Connections using turns: (TURN over TLS) seem to work only randomly in Trickle ICE.
I'm trying to set up STUNner as a replacement for my Coturn server.
When I try SSL, the relay server appears in Trickle ICE only some of the time (roughly one try in three).
Here is the video of what I see:
Capture.video.du.2024-10-30.11-58-38.mp4
This happens only for TCP/SSL connections.
If I connect via UDP, it works every time (so I know my credentials are OK).
Steps to Reproduce
I'm deploying on an OVH managed cluster, which uses Octavia load balancers.
Expected behavior: Relay server is reachable on every try.
Actual behavior: Relay server is reachable only sometimes
I'm not sure what to do from here. Any idea how I could troubleshoot this further?
BTW, thanks for this awesome project!
Versions
v0.21.0
Info
Gateway API status
Gateway settings
Operator logs
Of interest: in the STUNner logs, I regularly see messages like this one: