Option to limit max NACKs per lost packet #208
Conversation
Looks good @adriancable. Left a couple of comments.
Also, trying to understand how the infinite NACKing happens. Does it happen if the receive log size is small? Even if the RTT is higher, let's say 150ms, the sequence could be
- Send NACK at 0 ms
- Remote receives it roughly at 75 ms (assuming symmetric delay)
- Remote, let's say, sends the retransmission back at 80 ms
- Send another NACK at 100 ms
- Local receives the retransmitted packet at 155 ms.
- Local marks the sequence number as received in the receive log.
- Local checks at 200 ms and that sequence number is no longer missing.
Local would have sent one extra NACK and potentially gotten an extra re-transmission from the remote. As long as the receive log is big enough, it looks like infinite NACKing should not happen?
BTW, I think the option to limit number of tries is great and a good addition. Just trying to understand the infinite NACK case.
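To make that timeline concrete, here is a minimal back-of-the-envelope sketch (not from the PR's code), assuming one NACK per generator interval starting at t = 0, of how many NACKs go out for one lost packet before the retransmission lands:

```go
package main

import (
	"fmt"
	"time"
)

// nacksBeforeRetransmission estimates how many NACKs the generator fires for
// a single lost packet before the retransmitted packet arrives and is marked
// as received, assuming one NACK per interval starting at t = 0.
func nacksBeforeRetransmission(rtt, remoteDelay, interval time.Duration) int {
	arrival := rtt + remoteDelay // when the retransmission reaches us
	nacks := 0
	for t := time.Duration(0); t < arrival; t += interval {
		nacks++
	}
	return nacks
}

func main() {
	// RTT 150 ms, 5 ms remote processing delay, 100 ms generator interval:
	// NACKs go out at 0 ms and 100 ms, the retransmission lands at 155 ms,
	// so 2 NACKs total - one "extra", matching the timeline above.
	fmt.Println(nacksBeforeRetransmission(150*time.Millisecond, 5*time.Millisecond, 100*time.Millisecond))
}
```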
@@ -147,14 +151,43 @@ func (n *GeneratorInterceptor) loop(rtcpWriter interceptor.RTCPWriter) {
	for ssrc, receiveLog := range n.receiveLogs {
		missing := receiveLog.missingSeqNumbers(n.skipLastN)

		if len(missing) == 0 || n.nackCountLogs[ssrc] == nil {
			n.nackCountLogs[ssrc] = map[uint16]uint16{}
Is creating the new map on every loop a concern? The majority case is going to be len(missing) == 0, so we'd be re-creating a map for all SSRCs every 100 ms. Is that okay?
My understanding is that an empty map is actually nil internally (i.e. no allocation), so I think this is OK.
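For what it's worth, one shape a lazy-initialisation alternative could take - a sketch only, not the PR's code, with names (nackCountLogs, ssrc, missing) mirroring the diff context above. Reads on a nil map return the zero value, so the inner map only needs to exist once there is actually something to count:

```go
// Sketch only: clear counts without allocating when nothing is missing,
// and allocate the inner map lazily when a count actually needs to be kept.
func resetOrInitNackCounts(nackCountLogs map[uint32]map[uint16]uint16, ssrc uint32, missing []uint16) {
	if len(missing) == 0 {
		nackCountLogs[ssrc] = nil // drops old counts, no allocation
	} else if nackCountLogs[ssrc] == nil {
		nackCountLogs[ssrc] = make(map[uint16]uint16)
	}
}
```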
		}
		if !isMissing {
			delete(n.nackCountLogs[ssrc], nackSeq)
		}
	}
Can't think of an easy way to spread this load. Wondering if the N^2 behaviour could cause performance issues. It could happen that a network brown-out drops hundreds of packets; multiple SSRCs experiencing this at the same time could generate a bunch of NACKs, and all of them could still be missing while running this loop. I don't think it is a huge concern, but I'm also thinking about ways of amortising it - maybe handling only one SSRC for clean-up per loop, though that makes the code more complicated. I guess this can be checked for performance issues and improved later.
Yes, I thought of this too but can't think of an easy improvement.
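For the record, roughly what the "one SSRC per loop" amortisation idea above could look like - an illustrative sketch only, not the PR's code, with a hypothetical cleanupCursor field and an isStillMissing stand-in for the real receive-log check:

```go
// Illustrative sketch of spreading the cleanup across generator ticks.
type nackCleanup struct {
	nackCountLogs map[uint32]map[uint16]uint16
	cleanupOrder  []uint32 // SSRCs in a stable order
	cleanupCursor int      // hypothetical new field
}

// cleanupOne prunes the NACK counts of a single SSRC per call, so a burst of
// losses across many SSRCs is cleaned up over several ticks rather than all
// at once.
func (c *nackCleanup) cleanupOne(isStillMissing func(ssrc uint32, seq uint16) bool) {
	if len(c.cleanupOrder) == 0 {
		return
	}
	ssrc := c.cleanupOrder[c.cleanupCursor%len(c.cleanupOrder)]
	c.cleanupCursor++
	for seq := range c.nackCountLogs[ssrc] {
		if !isStillMissing(ssrc, seq) {
			delete(c.nackCountLogs[ssrc], seq)
		}
	}
}
```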
@boks1971 - here is one scenario we have seen happen. For 'internet' reasons, for several seconds we get significant congestion/loss/delay on the downlink (i.e. from the remote sender to us), but everything is OK on the uplink (i.e. from us to the remote). The packet sequence we receive is something like 100, 101, 102, 204, then nothing for a couple of seconds. So the NACK interceptor starts sending NACKs for packets 103 to 203 over and over and over. A couple of seconds later we receive dozens of repeated retransmissions of these hundreds of packets. This then causes additional congestion/loss, and so we NACK even more packets. Everything settles down eventually (10+ s later), but the whole situation can be avoided with a sensible NACK resend limit. (NB, we originally patched the NACK generator to immediately mark missing packets as received, so that at most one NACK is sent per missing packet - this 'solved' the problem, but we really wanted to implement this properly, and 1 is probably not the right limit for most use cases.)
Got it Adrian. Yeah, a multi-second brown-out can happen and cause that flooding. Thank you for explaining. I think this looks good. Will bonk it. But it might be good to get another pair of 👀. I will tag @davidzhao also to check if he is available to review.
OK, absent any further comments I will merge this in a couple of days' time.
lg
Currently the NACK generator interceptor has no limit on the number of NACKs produced for the same lost packet. If the remote takes a while to resend the packet (e.g. if the RTT is larger than the NACK generator interval), this can result in the remote resending a NACKed packet an unlimited number of times (or at least until sufficient later packets are received that the NACK receive log wraps around). If there's congestion on the link, this will make the congestion even worse. Also, after a certain amount of time, receiving a lost packet is no longer useful (e.g. beyond the length of the jitter buffer on the receive side).
This PR adds a new generator option to limit the maximum number of times a given lost packet is NACKed. For example, if this option is set to 4, a lost packet is NACKed at most 4 times (so, with a generator interval of 100 ms, packets will not be NACKed more than 400 ms after they are first presumed missing). The option's default is 0, which preserves the current behaviour (no NACK limit).
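As a usage sketch (not taken from the PR itself - the option name GeneratorMaxNacksPerPacket is assumed from the description above; the rest uses the existing pion/interceptor NACK generator API):

```go
package main

import (
	"log"
	"time"

	"github.com/pion/interceptor"
	"github.com/pion/interceptor/pkg/nack"
)

func main() {
	// With a 100 ms generator interval and a limit of 4, a lost packet is
	// NACKed for at most ~400 ms after it is first presumed missing.
	// GeneratorMaxNacksPerPacket is the assumed name of the new option.
	generator, err := nack.NewGeneratorInterceptor(
		nack.GeneratorInterval(100*time.Millisecond),
		nack.GeneratorMaxNacksPerPacket(4),
	)
	if err != nil {
		log.Fatal(err)
	}

	registry := &interceptor.Registry{}
	registry.Add(generator)
	// The registry would then be passed to the WebRTC API,
	// e.g. via webrtc.WithInterceptorRegistry(registry).
}
```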