rumqttc: Outgoing publications are sent one by one #810
Hey @flxo, so basically we want to buffer the outgoing packets and then flush / write them to the network at once to avoid multiple syscalls, correct?
note: I haven't dived in depth into the poc but i think an easier solution might be:
```rust
o = Self::next_requests(
    ..
), if .. => match o {
    Ok(requests) => {
        for req in requests {
            self.state.handle_outgoing_packet(req)?;
        }
        match time::timeout(network_timeout, network.flush(&mut self.state.write)).await {
            Ok(inner) => inner?,
            Err(_) => return Err(ConnectionError::FlushTimeout),
        };
        Ok(self.state.events.pop_front().unwrap())
    }
    Err(_) => Err(ConnectionError::RequestsDone),
},
```
^ this may have different disadvantages which i haven't thought about though haha. btw, can you elaborate more details of the use case? as batching publishes would mean we are okie with messages being published late right? Batching would be a cool feature to have though 💯 Thank you so much! |
Thanks for your thoughts!
Yes.
Currently the alives are flushed when the loop breaks or finishes. We could easily use
Good catch. This is definitely missing in my hack. One option could be to retrieve the sending buffer from
You cannot await a timeout in the select branch. This would stall the processing of incoming frames as well.

```rust
async fn next_requests(..) -> Result<Vec<Request>, ...> {
    let mut result = Vec::with_capacity(10);
    let next = requests_rx.recv_async().await?;
    result.push(next);
    while result.len() < result.capacity() {
        match requests_rx.try_recv() {
            Ok(r) => result.push(r),
            Err(TryRecvError::Empty) => return Ok(result),
            Err(e) => return Err(e),
        }
    }
    Ok(result)
}
```
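For reference, a minimal standalone version of that drain idea (assuming a flume channel, as the `recv_async` above suggests, and using `Vec<u8>` as a stand-in for `Request`; not the actual rumqttc event loop) could look like this:

```rust
use flume::{Receiver, RecvError, TryRecvError};

const BATCH_SIZE: usize = 10;

// Block on the first request, then opportunistically take whatever is
// already queued, up to BATCH_SIZE, without awaiting any timeout.
async fn next_requests(rx: &Receiver<Vec<u8>>) -> Result<Vec<Vec<u8>>, RecvError> {
    let mut batch = Vec::with_capacity(BATCH_SIZE);
    batch.push(rx.recv_async().await?);
    while batch.len() < BATCH_SIZE {
        match rx.try_recv() {
            Ok(req) => batch.push(req),
            // Nothing more queued right now: send what we have.
            Err(TryRecvError::Empty) | Err(TryRecvError::Disconnected) => break,
        }
    }
    Ok(batch)
}

#[tokio::main]
async fn main() {
    let (tx, rx) = flume::bounded::<Vec<u8>>(100);
    tokio::spawn(async move {
        for i in 0u8..25 {
            let _ = tx.send_async(vec![i; 16]).await;
        }
    });
    // In the event loop, each batch would be written to the network buffer
    // and flushed with a single syscall.
    while let Ok(batch) = next_requests(&rx).await {
        println!("got a batch of {} requests", batch.len());
    }
}
```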
Not really late. The tx buffer is filled as long as there are requests pending from the client (and up to a certain limit). There's no timeout awaited in between; the flush happens right after. The use case is that we have a client with a high output rate. We're currently starving on the publish side because there's also inflow traffic that is processed with (currently) 10 times more iterations per poll than the outgoing requests.
Agree :-)
|
That is exactly what I meant when I said modify ^ just like the pseudo code you mentioned in the comment! can you also please confirm the perf impact once (likely there won't be much with the above approach) and also if it's actually helping in reducing syscalls. thanks 😊 |
I refined the pseudocode here
Can you specify what exactly you want? Profiling data, e.g. flame graphs? The biggest advantage is to avoid starvation on the upstream path. |
A colleague tested the patch in one of our products. It's an arm64 Linux on an A57 with a constant publication rate of ~800/s to a local broker (test setup). We see a reduction in the CPU usage of the client application of ~8.5%. Is there a reproducible benchmark somewhere around here? |
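For what it's worth, a minimal load generator for such a setup (a rough sketch, not an official benchmark; it assumes a broker on 127.0.0.1:1883 and uses the public rumqttc v4 async API, the v5 client under `rumqttc::v5` being analogous) could look like:

```rust
use rumqttc::{AsyncClient, MqttOptions, QoS};
use std::time::Duration;

#[tokio::main]
async fn main() {
    let mut options = MqttOptions::new("batch-bench", "127.0.0.1", 1883);
    options.set_keep_alive(Duration::from_secs(30));

    let (client, mut eventloop) = AsyncClient::new(options, 100);

    // ~800 publications per second, QoS 0, 128 byte payloads.
    tokio::spawn(async move {
        let mut ticker = tokio::time::interval(Duration::from_micros(1250));
        let payload = vec![0u8; 128];
        loop {
            ticker.tick().await;
            client
                .publish("bench/topic", QoS::AtMostOnce, false, payload.clone())
                .await
                .unwrap();
        }
    });

    // The event loop must be polled for anything to reach the network;
    // CPU usage of this process is what changes with the batching patch.
    loop {
        if let Err(e) = eventloop.poll().await {
            eprintln!("connection error: {e}");
            break;
        }
    }
}
```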
niceee!
not that i'm aware of 😅 i had a doubt regarding the patch, why are we doing |
I changed the inflow just to align it in style with the outflow. The current implementation processes the messages in |
right, was I would say thanks! |
Sorry this was imprecise.
You're right regarding the allocations. Shall I draft a PR with the update to the rx path only? Just out of curiosity: There's the other branch that uses tokio_util::codec::Framed. It's missing the feature to retain the tx buffer over reconnects but otherwise is complete and passes the tests. The drawback here is that it introduces some |
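For context, the batching-relevant part of a `Framed`-based approach is the `Sink` side: `feed` only encodes into the codec's internal buffer, and `flush` hands the whole buffer to the socket. A tiny sketch of just that mechanism (using the stock `LinesCodec` and a placeholder address to stay self-contained, not the actual MQTT codec from that branch):

```rust
use futures::SinkExt;
use tokio::net::TcpStream;
use tokio_util::codec::{FramedWrite, LinesCodec};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let stream = TcpStream::connect("127.0.0.1:7878").await?;
    let mut framed = FramedWrite::new(stream, LinesCodec::new());

    // Buffer a batch of frames; for small frames this stays in userspace...
    for i in 0..10 {
        framed.feed(format!("frame {i}")).await?;
    }
    // ...and a single flush performs the actual write on the socket.
    framed.flush().await?;
    Ok(())
}
```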
still kinda went over me 😅 can you please explain wdym by full rx path and what mistakes?
let's keep this issue related to batching; we will have different PRs/issues/conversations for anything related to the handling of incoming packets. this will make it easier to review and reason about, wdyt?
iirc, we are already re-using the buffer to read packets. So i think it will be the same as you mentioned (also due to how mqttbytes parses packets). this might be biased (and due to the branch containing other refactors) but for readability, I find the current code more maintainable / readable than with the `Framed` approach. thanks! |
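The reuse pattern in question is essentially "read into one long-lived `BytesMut` and split complete frames off the front". A generic illustration with a made-up length-prefixed frame format (not the mqttbytes parser):

```rust
use bytes::{Buf, BytesMut};
use tokio::io::{AsyncRead, AsyncReadExt};

// Made-up frame format: 1 length byte followed by the payload.
fn try_parse(buf: &mut BytesMut) -> Option<BytesMut> {
    if buf.is_empty() {
        return None;
    }
    let len = buf[0] as usize;
    if buf.len() < 1 + len {
        return None; // incomplete frame, wait for more bytes
    }
    buf.advance(1);
    Some(buf.split_to(len)) // no new allocation, slice of the shared buffer
}

async fn read_frames<R: AsyncRead + Unpin>(mut io: R) -> std::io::Result<()> {
    // One buffer, reused across all reads; its capacity is retained.
    let mut buf = BytesMut::with_capacity(8 * 1024);
    loop {
        while let Some(frame) = try_parse(&mut buf) {
            println!("frame of {} bytes", frame.len());
        }
        if io.read_buf(&mut buf).await? == 0 {
            return Ok(()); // EOF
        }
    }
}

#[tokio::main]
async fn main() -> std::io::Result<()> {
    // Two frames: [3, 'a', 'b', 'c'] and [2, 'x', 'y'].
    let data: &[u8] = &[3, b'a', b'b', b'c', 2, b'x', b'y'];
    read_frames(data).await
}
```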
Thanks for your answer.
When modifying the read related code in
Sorry - my mistake - should have been tx.
Fully agree.
Fine for me. |
Collect up to BATCH_SIZE requests from the client channel before flushing the network. Fixes bytebeamio#810
Expected Behavior
Outgoing publications are sent to the network io in batches and not in inefficient writes per publish.
Current Behavior
The (v5) client code "flushes" the outgoing queue after each request from the application here. This leads to one `write` or `sendto` system call per publication. This is inefficient.
Proposal
Queue up to n requests from the request channel and flush, similar to the processing of the incoming packets where up to 10 packets are processed in a batch.
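In syscall terms the proposal boils down to "flush once per batch instead of once per packet". A small sketch against a generic buffered tokio writer (not the actual rumqttc network code) to make the difference concrete:

```rust
use tokio::io::{AsyncWrite, AsyncWriteExt, BufWriter};

// Current behaviour: one flush, i.e. one write/sendto, per publication.
async fn send_one_by_one<W: AsyncWrite + Unpin>(
    out: &mut BufWriter<W>,
    packets: &[Vec<u8>],
) -> std::io::Result<()> {
    for p in packets {
        out.write_all(p).await?; // copy into the userspace buffer
        out.flush().await?;      // syscall for every single packet
    }
    Ok(())
}

// Proposed behaviour: write up to n packets, then flush once per batch.
async fn send_batched<W: AsyncWrite + Unpin>(
    out: &mut BufWriter<W>,
    packets: &[Vec<u8>],
    n: usize,
) -> std::io::Result<()> {
    for chunk in packets.chunks(n) {
        for p in chunk {
            out.write_all(p).await?;
        }
        out.flush().await?; // one syscall per batch of up to n packets
    }
    Ok(())
}

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let packets = vec![vec![0u8; 64]; 20];
    let mut out = BufWriter::new(tokio::io::sink());
    send_one_by_one(&mut out, &packets).await?; // 20 flushes
    send_batched(&mut out, &packets, 10).await  // 2 flushes
}
```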
I implemented a small poc that loops over requests and processes up to n incoming or outgoing packets before flushing the io. We're currently evaluating the performance impact in our scenario, which has publications from the client to the broker at a high frequency (a less common use case).
Let me know if this is a possible way to tackle the problem.