Improve throughput performance #4 (rx defragmentation focus) #754
To improve rx performance, we need to reduce the number of copies. Defragmentation used to make 3 copies; it now makes 1. It could have been reduced to 0, but that would have made the PR much heavier and would have slowed down non-fragmented data, which is still the main use case.
Even so, this cuts the time spent copying data by 25% on a 150 kB payload in our throughput test.
Fragment payloads are now aliased, then copied once into the defragmentation tx_buffer, which is then cheaply converted into an rx_buffer before being decoded (previously: copy, copy, and copy). The user can take ownership of this buffer.
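A minimal C sketch of that pipeline, assuming simplified buffer types: each fragment is referenced in place (aliased), copied exactly once into a growable write buffer, and the write buffer is then turned into a read buffer by moving its storage rather than copying it. The names `slice_t`, `wbuf_t`, `rbuf_t`, `fragment_alias`, `wbuf_append`, and `wbuf_into_rbuf` are hypothetical and are not this project's API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical non-owning view into a received frame (an "alias"). */
typedef struct {
    const uint8_t *start;
    size_t len;
} slice_t;

/* Hypothetical growable write buffer used for defragmentation. */
typedef struct {
    uint8_t *data;
    size_t len;
    size_t cap;
} wbuf_t;

/* Hypothetical read buffer handed to the decoder. */
typedef struct {
    uint8_t *data; /* owned storage, taken over from a wbuf_t */
    size_t len;
    size_t pos;    /* read cursor for the decoder */
} rbuf_t;

/* Alias a fragment payload inside a frame: no bytes are copied. */
static slice_t fragment_alias(const uint8_t *frame, size_t offset, size_t len) {
    slice_t s = { frame + offset, len };
    return s;
}

/* Append an aliased fragment: this memcpy is the single copy. */
static int wbuf_append(wbuf_t *wb, slice_t frag) {
    if (frag.len == 0) {
        return 0; /* avoid memcpy on a possibly NULL buffer */
    }
    if (wb->len + frag.len > wb->cap) {
        size_t new_cap = wb->cap ? wb->cap : 64;
        while (new_cap < wb->len + frag.len) {
            new_cap *= 2;
        }
        uint8_t *p = realloc(wb->data, new_cap);
        if (p == NULL) {
            return -1; /* old buffer is still valid and owned by wb */
        }
        wb->data = p;
        wb->cap = new_cap;
    }
    memcpy(wb->data + wb->len, frag.start, frag.len);
    wb->len += frag.len;
    return 0;
}

/* Convert write buffer to read buffer by moving ownership: no copy. */
static rbuf_t wbuf_into_rbuf(wbuf_t *wb) {
    rbuf_t rb = { wb->data, wb->len, 0 };
    wb->data = NULL;
    wb->len = 0;
    wb->cap = 0;
    return rb;
}
```

After `wbuf_into_rbuf`, the read buffer owns the storage, so a caller that takes ownership simply keeps `rb.data` and eventually calls `free` on it.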
Added a state that allocates the buffer only when needed and records when it has overflowed.
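One way such a state could look, as a C sketch under assumptions: the buffer is allocated only when the first fragment arrives, and once a hypothetical maximum size is exceeded the buffer is freed and the state stays in overflow until it is reset, so the remaining fragments of that message are dropped. The type `defrag_t`, the `DEFRAG_*` states, the `max` field, and the helper functions are illustrative names, not the code in this PR.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef enum {
    DEFRAG_EMPTY,    /* no buffer allocated yet */
    DEFRAG_ACTIVE,   /* buffer allocated, fragments being accumulated */
    DEFRAG_OVERFLOW, /* size limit exceeded: drop fragments until reset */
} defrag_state_t;

typedef struct {
    defrag_state_t state;
    uint8_t *data;
    size_t len;
    size_t max; /* hypothetical upper bound on a reassembled message */
} defrag_t;

/* Free any buffer and return to the initial, unallocated state. */
static void defrag_reset(defrag_t *d) {
    free(d->data);
    d->data = NULL;
    d->len = 0;
    d->state = DEFRAG_EMPTY;
}

/* Returns 0 if the fragment was stored, -1 if it was dropped. */
static int defrag_push(defrag_t *d, const uint8_t *frag, size_t len) {
    if (d->state == DEFRAG_OVERFLOW) {
        return -1; /* keep dropping until the caller resets */
    }
    if (len > d->max - d->len) {
        defrag_reset(d);            /* release memory right away */
        d->state = DEFRAG_OVERFLOW; /* remember that we overflowed */
        return -1;
    }
    if (d->state == DEFRAG_EMPTY) {
        d->data = malloc(d->max); /* lazy allocation on first fragment */
        if (d->data == NULL) {
            return -1;
        }
        d->state = DEFRAG_ACTIVE;
    }
    memcpy(d->data + d->len, frag, len);
    d->len += len;
    return 0;
}
```

Non-fragmented traffic never calls `defrag_push`, so under this design it pays no allocation cost for the defragmentation buffer.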
Fixed some undefined behavior (UB) and memory leaks.