Option to limit max NACKs per lost packet #208

Merged · adriancable merged 1 commit into pion:master · Oct 3, 2023
Conversation

adriancable (Contributor):

Currently the NACK generator interceptor has no limit on the number of NACKs produced for the same lost packet. If the remote takes a while to resend the packet (e.g. if the RTT is larger than the NACK generator interval), this can result in the remote resending a NACKed packet an unlimited number of times (or at least until sufficient later packets are received that the NACK receive log wraps around). If there's congestion on the link, this will make the congestion even worse. Also, after a certain amount of time, receiving a lost packet is no longer useful (e.g. beyond the length of the jitter buffer on the receive side).

This PR adds a new generator option to limit the maximum number of times a given lost packet is NACKed. For example, if this option is set to 4, a lost packet is NACKed at most 4 times (so with a 100 ms generator interval, a packet will not be NACKed more than 400 ms after it is first presumed missing). The option defaults to 0, which preserves the current behaviour (no NACK limit).
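As an illustration, here is a hedged usage sketch (an editorial addition: GeneratorInterval is an existing pkg/nack option, while GeneratorMaxNacksPerPacket is assumed to be the name of the option this PR adds):

```go
package main

import (
	"time"

	"github.com/pion/interceptor/pkg/nack"
)

func main() {
	// Assumed option name: GeneratorMaxNacksPerPacket (from this PR).
	generator, err := nack.NewGeneratorInterceptor(
		nack.GeneratorInterval(100*time.Millisecond), // run a NACK pass every 100 ms
		nack.GeneratorMaxNacksPerPacket(4),           // give up on a packet after 4 NACKs (~400 ms)
	)
	if err != nil {
		panic(err)
	}
	_ = generator // in a real pipeline, register this with an interceptor.Registry
}
```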

codecov bot commented Oct 1, 2023

Codecov Report

Attention: 20 lines in your changes are missing coverage. Please review.

Files                                Coverage Δ
pkg/nack/generator_option.go         80.00% <0.00%> (-20.00%) ⬇️
pkg/nack/generator_interceptor.go    68.21% <55.88%> (-6.79%) ⬇️

... and 1 file with indirect coverage changes


boks1971 (Contributor) left a comment:

Looks good, @adriancable. Left a couple of comments.

Also, I'm trying to understand how the infinite NACKing happens. Does it happen if the receive log size is small? Even if the RTT is higher, say 150 ms, the sequence could be:

  1. Local sends a NACK at 0 ms.
  2. Remote receives it at roughly 75 ms (assuming symmetric delay).
  3. Remote sends the packet back at, say, 80 ms.
  4. Local sends another NACK at 100 ms.
  5. Local receives the retransmitted packet at 155 ms.
  6. Local marks in the receive log that the sequence number was received.
  7. Local checks at 200 ms and the sequence number is no longer missing.

Local would have sent one extra NACK and potentially gotten one extra retransmission from the remote. As long as the receive log is big enough, it looks like the infinite NACKing should not happen?

BTW, I think the option to limit the number of tries is a great addition. Just trying to understand the infinite NACK case.
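(Editorial aside, not from the thread: the timeline above generalises to roughly ceil(RTT / interval) - 1 redundant NACKs per lost packet, as in this hedged sketch.)

```go
package main

import (
	"fmt"
	"time"
)

// redundantNacks estimates how many extra NACKs go out for one lost packet
// before its retransmission arrives: NACKs are sent at 0, interval,
// 2*interval, ..., and only the first one is needed.
func redundantNacks(rtt, interval time.Duration) int {
	passes := int((rtt + interval - 1) / interval) // ceil(rtt / interval)
	if passes < 1 {
		return 0
	}
	return passes - 1
}

func main() {
	// Timeline above: the retransmission lands ~155 ms after the first NACK,
	// with a 100 ms generator interval -> exactly 1 redundant NACK.
	fmt.Println(redundantNacks(155*time.Millisecond, 100*time.Millisecond))
}
```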

```go
// @@ -147,14 +151,43 @@ func (n *GeneratorInterceptor) loop(rtcpWriter interceptor.RTCPWriter) {

for ssrc, receiveLog := range n.receiveLogs {
	missing := receiveLog.missingSeqNumbers(n.skipLastN)

	if len(missing) == 0 || n.nackCountLogs[ssrc] == nil {
		n.nackCountLogs[ssrc] = map[uint16]uint16{}
```
boks1971 (Contributor):

Is creating a new map on every loop a concern? The majority case is going to be len(missing) == 0, so this re-creates a map for every SSRC every 100 ms. Is that okay?

adriancable (Contributor Author):

My understanding is that an empty map literal only allocates a tiny map header (no buckets), so I think this is OK.
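For what it's worth, this is easy to measure with an allocation-reporting micro-benchmark (an editorial sketch, not part of the PR):

```go
package nack

import "testing"

// sink is package-level so the map escapes and the allocation cannot be
// optimised away.
var sink map[uint16]uint16

// BenchmarkEmptyMapLiteral measures what re-creating an empty map costs,
// as the loop above does once per SSRC per pass.
// Run with: go test -bench=EmptyMapLiteral -benchmem
func BenchmarkEmptyMapLiteral(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = map[uint16]uint16{} // allocates a small map header, no buckets
	}
}
```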

```go
	}
	if !isMissing {
		delete(n.nackCountLogs[ssrc], nackSeq)
	}
}
```
boks1971 (Contributor):

I can't think of an easy way to spread this load, and I'm wondering if the N^2 behaviour could cause performance issues. A network brown-out could drop hundreds of packets, and multiple SSRCs experiencing that at the same time could generate a bunch of NACKs, all of which could still be missing while this loop runs. I don't think it is a huge concern, but I'm also thinking about ways of amortising it, e.g. handling clean-up for only one SSRC per loop, though that makes the code more complicated. I guess this can be checked for performance issues and improved later.

adriancable (Contributor Author):

Yes, I thought of this too, but I can't think of an easy improvement.
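(Editorial aside: one hedged sketch of the amortisation idea mentioned above, not code from the PR. cleanupCursor is an invented field; nackCountLogs, receiveLogs, missingSeqNumbers, and skipLastN mirror the diff.)

```go
// Hypothetical amortised clean-up: instead of sweeping every SSRC's
// nackCountLogs on each pass, sweep a single SSRC per pass, rotating
// through them with a cursor.
func (n *GeneratorInterceptor) cleanupOneSSRC(ssrcs []uint32) {
	if len(ssrcs) == 0 {
		return
	}
	ssrc := ssrcs[n.cleanupCursor%len(ssrcs)] // cleanupCursor: invented int field
	n.cleanupCursor++

	// Build a set of the sequence numbers currently missing for this SSRC.
	missingSet := map[uint16]struct{}{}
	for _, seq := range n.receiveLogs[ssrc].missingSeqNumbers(n.skipLastN) {
		missingSet[seq] = struct{}{}
	}

	// Drop NACK counts for packets that are no longer missing, mirroring
	// the diff's delete-when-received logic.
	for nackSeq := range n.nackCountLogs[ssrc] {
		if _, stillMissing := missingSet[nackSeq]; !stillMissing {
			delete(n.nackCountLogs[ssrc], nackSeq)
		}
	}
}
```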

adriancable (Contributor Author):

@boks1971 - here is one scenario we have seen happen. For 'internet' reasons, we get significant congestion/loss/delay on the downlink (i.e. from the remote sender to us) for several seconds, while everything is fine on the uplink (i.e. from us to the remote). The packet sequence we receive is something like 100, 101, 102, 204, then nothing for a couple of seconds. So the NACK interceptor starts sending NACKs for packets 103 to 203 over and over. A couple of seconds later we receive dozens of repeated retransmissions of those hundreds of packets, which causes additional congestion/loss, so we NACK even more packets. Everything settles down eventually (10+ seconds later), but the whole situation can be avoided with a sensible NACK resend limit.

(NB: we originally patched the NACK generator to immediately mark missing packets as received, so that at most one NACK is sent per missing packet. This 'solved' the problem, but we wanted to implement it properly, and 1 is probably not the right limit for most use cases.)
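That one-shot patch could have looked roughly like this (an editorial sketch; it assumes the receive log exposes an add(seq) method, as pkg/nack's receive log does):

```go
// One-shot NACK workaround (not the merged change): after NACKing, mark
// every missing sequence number as received, so the next pass no longer
// reports it and at most one NACK is ever sent per packet.
for _, seq := range missing {
	receiveLog.add(seq)
}
```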

boks1971 (Contributor) commented Oct 2, 2023:

Got it, Adrian. Yeah, a multi-second brown-out can happen and cause that flooding. Thank you for explaining. I think this looks good; will bonk it. But it might be good to get another pair of 👀, so I will tag @davidzhao as well to check if he is available to review.

boks1971 requested a review from davidzhao on October 2, 2023 at 05:15.
adriancable (Contributor Author):

OK, absent any further comments I will merge this in a couple of days' time.

davidzhao (Member) left a comment:

lg

adriancable merged commit 0191d54 into pion:master on Oct 3, 2023. 14 of 15 checks passed.