Option to limit max NACKs per lost packet #208

Merged · adriancable merged 1 commit into pion:master · Oct 3, 2023
Conversation

adriancable (Contributor):

Currently the NACK generator interceptor has no limit on the number of NACKs produced for the same lost packet. If the remote takes a while to resend the packet (e.g. if the RTT is larger than the NACK generator interval), this can result in the remote resending a NACKed packet an unlimited number of times (or at least until sufficient later packets are received that the NACK receive log wraps around). If there's congestion on the link, this will make the congestion even worse. Also, after a certain amount of time, receiving a lost packet is no longer useful (e.g. beyond the length of the jitter buffer on the receive side).

This PR adds a new generator option to limit the maximum number of times a given lost packet is NACKed. For example, if this option is set to 4, a lost packet is NACKed at most 4 times (so with a 100 ms generator interval, a packet will not be NACKed more than 400 ms after it is first presumed missing). The option defaults to 0, which preserves the current behaviour (no NACK limit).
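As an illustration, here is a hedged usage sketch (an editorial addition: GeneratorInterval is an existing pkg/nack option, while GeneratorMaxNacksPerPacket is assumed to be the name of the option this PR adds):

```go
package main

import (
	"time"

	"github.com/pion/interceptor/pkg/nack"
)

func main() {
	// Assumed option name: GeneratorMaxNacksPerPacket (from this PR).
	generator, err := nack.NewGeneratorInterceptor(
		nack.GeneratorInterval(100*time.Millisecond), // run a NACK pass every 100 ms
		nack.GeneratorMaxNacksPerPacket(4),           // give up on a packet after 4 NACKs (~400 ms)
	)
	if err != nil {
		panic(err)
	}
	_ = generator // in a real pipeline, register this with an interceptor.Registry
}
```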

codecov bot commented Oct 1, 2023

Codecov Report

Attention: 20 lines in your changes are missing coverage. Please review.

Files                                Coverage Δ
pkg/nack/generator_option.go         80.00% <0.00%> (-20.00%) ⬇️
pkg/nack/generator_interceptor.go    68.21% <55.88%> (-6.79%) ⬇️

... and 1 file with indirect coverage changes


boks1971 (Contributor) left a comment:

Looks good, @adriancable. Left a couple of comments.

Also, I'm trying to understand how the infinite NACKing happens. Does it happen if the receive log size is small? Even if the RTT is higher, say 150 ms, the sequence could be:

  1. Local sends a NACK at 0 ms.
  2. Remote receives it at roughly 75 ms (assuming symmetric delay).
  3. Remote sends the packet back at, say, 80 ms.
  4. Local sends another NACK at 100 ms.
  5. Local receives the retransmitted packet at 155 ms.
  6. Local marks in the receive log that the sequence number was received.
  7. Local checks at 200 ms and the sequence number is no longer missing.

Local would have sent one extra NACK and potentially gotten one extra retransmission from the remote. As long as the receive log is big enough, it looks like the infinite NACKing should not happen?

BTW, I think the option to limit the number of tries is a great addition. Just trying to understand the infinite NACK case.
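(Editorial aside, not from the thread: the timeline above generalises to roughly ceil(RTT / interval) - 1 redundant NACKs per lost packet, as in this hedged sketch.)

```go
package main

import (
	"fmt"
	"time"
)

// redundantNacks estimates how many extra NACKs go out for one lost packet
// before its retransmission arrives: NACKs are sent at 0, interval,
// 2*interval, ..., and only the first one is needed.
func redundantNacks(rtt, interval time.Duration) int {
	passes := int((rtt + interval - 1) / interval) // ceil(rtt / interval)
	if passes < 1 {
		return 0
	}
	return passes - 1
}

func main() {
	// Timeline above: the retransmission lands ~155 ms after the first NACK,
	// with a 100 ms generator interval -> exactly 1 redundant NACK.
	fmt.Println(redundantNacks(155*time.Millisecond, 100*time.Millisecond))
}
```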

```go
// @@ -147,14 +151,43 @@ func (n *GeneratorInterceptor) loop(rtcpWriter interceptor.RTCPWriter) {

for ssrc, receiveLog := range n.receiveLogs {
	missing := receiveLog.missingSeqNumbers(n.skipLastN)

	if len(missing) == 0 || n.nackCountLogs[ssrc] == nil {
		n.nackCountLogs[ssrc] = map[uint16]uint16{}
```
boks1971 (Contributor):

Is creating a new map on every loop a concern? The majority case is going to be len(missing) == 0, so this re-creates a map for every SSRC every 100 ms. Is that okay?

adriancable (Contributor Author):

My understanding is that an empty map literal only allocates a tiny map header (no buckets), so I think this is OK.
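For what it's worth, this is easy to measure with an allocation-reporting micro-benchmark (an editorial sketch, not part of the PR):

```go
package nack

import "testing"

// sink is package-level so the map escapes and the allocation cannot be
// optimised away.
var sink map[uint16]uint16

// BenchmarkEmptyMapLiteral measures what re-creating an empty map costs,
// as the loop above does once per SSRC per pass.
// Run with: go test -bench=EmptyMapLiteral -benchmem
func BenchmarkEmptyMapLiteral(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = map[uint16]uint16{} // allocates a small map header, no buckets
	}
}
```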

```go
	}
	if !isMissing {
		delete(n.nackCountLogs[ssrc], nackSeq)
	}
}
```
boks1971 (Contributor):

I can't think of an easy way to spread this load, and I'm wondering if the N^2 behaviour could cause performance issues. A network brown-out could drop hundreds of packets, and multiple SSRCs experiencing that at the same time could generate a bunch of NACKs, all of which could still be missing while this loop runs. I don't think it is a huge concern, but I'm also thinking about ways of amortising it, e.g. handling clean-up for only one SSRC per loop, though that makes the code more complicated. I guess this can be checked for performance issues and improved later.

adriancable (Contributor Author):

Yes, I thought of this too, but I can't think of an easy improvement.
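(Editorial aside: one hedged sketch of the amortisation idea mentioned above, not code from the PR. cleanupCursor is an invented field; nackCountLogs, receiveLogs, missingSeqNumbers, and skipLastN mirror the diff.)

```go
// Hypothetical amortised clean-up: instead of sweeping every SSRC's
// nackCountLogs on each pass, sweep a single SSRC per pass, rotating
// through them with a cursor.
func (n *GeneratorInterceptor) cleanupOneSSRC(ssrcs []uint32) {
	if len(ssrcs) == 0 {
		return
	}
	ssrc := ssrcs[n.cleanupCursor%len(ssrcs)] // cleanupCursor: invented int field
	n.cleanupCursor++

	// Build a set of the sequence numbers currently missing for this SSRC.
	missingSet := map[uint16]struct{}{}
	for _, seq := range n.receiveLogs[ssrc].missingSeqNumbers(n.skipLastN) {
		missingSet[seq] = struct{}{}
	}

	// Drop NACK counts for packets that are no longer missing, mirroring
	// the diff's delete-when-received logic.
	for nackSeq := range n.nackCountLogs[ssrc] {
		if _, stillMissing := missingSet[nackSeq]; !stillMissing {
			delete(n.nackCountLogs[ssrc], nackSeq)
		}
	}
}
```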

adriancable (Contributor Author):

@boks1971 - here is one scenario we have seen happen. For 'internet' reasons, we get significant congestion/loss/delay on the downlink (i.e. from the remote sender to us) for several seconds, while everything is fine on the uplink (i.e. from us to the remote). The packet sequence we receive is something like 100, 101, 102, 204, then nothing for a couple of seconds. So the NACK interceptor starts sending NACKs for packets 103 to 203 over and over. A couple of seconds later we receive dozens of repeated retransmissions of those hundreds of packets, which causes additional congestion/loss, so we NACK even more packets. Everything settles down eventually (10+ seconds later), but the whole situation can be avoided with a sensible NACK resend limit.

(NB: we originally patched the NACK generator to immediately mark missing packets as received, so that at most one NACK is sent per missing packet. This 'solved' the problem, but we wanted to implement it properly, and 1 is probably not the right limit for most use cases.)
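That one-shot patch could have looked roughly like this (an editorial sketch; it assumes the receive log exposes an add(seq) method, as pkg/nack's receive log does):

```go
// One-shot NACK workaround (not the merged change): after NACKing, mark
// every missing sequence number as received, so the next pass no longer
// reports it and at most one NACK is ever sent per packet.
for _, seq := range missing {
	receiveLog.add(seq)
}
```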

boks1971 (Contributor) commented Oct 2, 2023:

Got it, Adrian. Yeah, a multi-second brown-out can happen and cause that flooding. Thank you for explaining. I think this looks good; will bonk it. But it might be good to get another pair of 👀, so I will tag @davidzhao as well to check if he is available to review.

boks1971 requested a review from davidzhao on October 2, 2023 at 05:15.
adriancable (Contributor Author):

OK, absent any further comments I will merge this in a couple of days' time.

davidzhao (Member) left a comment:

lg

adriancable merged commit 0191d54 into pion:master on Oct 3, 2023. 14 of 15 checks passed.