-
Notifications
You must be signed in to change notification settings - Fork 232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature Addition: Port EventsExecutor concept to Python #1389
Comments
Wow! Very interesting stuff. I think two things to make this conversation easier.
|
|
I probably wouldn't even worry about getting it into pristine (or even buildable shape). It's just a lot easier to read and talk about when it's already in git. |
Yes, thanks for opening this, this is great stuff.
Yes, that is accurate. It's just too big of a dependency. |
@bmartin427 this is really interesting! please consider to open the draft PR 👍 |
Sorry for the delay on this, I've been busy (and sick) around the holidays. I finally have that draft PR up, and I've done a little bit of beating the code into shape, though it doesn't quite work yet. |
Signed-off-by: Brad Martin <[email protected]>
Feature request
Feature description
TL;DR: Port the rclcpp EventsExecutor concept to Python for use in rclpy.
Motivation: My employer, Merlin Labs, has recently completed the porting of a large existing codebase from ROS1 to ROS2. We found this effort to be much more difficult than we anticipated, and the end result to be much less performant than we expected. After analysis, I found that the single biggest contribution to this issue was the fact that every Python node in our system was using something like five times the CPU that it had with ROS1! As a small toy example, I made a simple test node with a 100Hz timer, which published a small fixed-content message in the timer callback. In ROS1 Noetic, this node consumed 3% of a CPU core (Ubuntu on x86-64); in ROS2 Iron, this jumped to 16%!
I looked for existing discussions of this problem, and while there seem to be a few, there didn't seem to be much interest in doing anything about it. It seems the prevailing opinion is just "don't use Python if you care about CPU usage", which is maybe understandable in the context of new development, but was an unsuitable answer for us, since we were porting an existing project consisting of a mix of Python and C++ nodes and high rate publications, which all worked fine with ROS1.
The solution I ended up developing at Merlin was a port of the EventsExecutor concept to Python. To cite my earlier simple example case again, this implementation dropped the usage from 16% to 6%, which while still greater than ROS1's 3%, was much more tolerable for us. Furthermore, I have gotten approval from within Merlin to publish my work on this for upstream use.
This is not a request for development effort or code review, at least not yet. For now, I am just posting publicly that internal code for which I have approval to do so. This code is not suitable for merging to rclpy as-is, because it was designed to exist outside of rclpy and not require us to maintain a modified version of rclpy for it to function. As the next step, I myself in my personal capacity intend to rework this code into a suitable PR against rclpy itself. I do however have a couple design-level questions I'll get into below.
Implementation considerations
In addition to all of the arguments relating to wait sets that iRobot made in favor of their implementation over rclcpp's SingleThreadedExecutor, rclpy executors suffer the additional overhead of actually being implemented in Python. I chose to implement my executor in C++ instead to mitigate this additional overhead.
My implementation of this concept ended up somewhat simpler than iRobot's, because I based it on
boost::asio
instead of recreating their EventsQueue. I think I understand ROS prefers to avoid use of Boost at almost any cost, is this accurate? If this is the case, I believe I can probably preserve much of the simplicity I got out of using bound callbacks instead of their ExecutorEvent objects, even if I have to implement my own callback dispatch queue.One improvement I've been able to make in this implementation relative to rclcpp is that the TimersManager component now supports debug time. It is however potentially more limited in terms of the number of simultaneous timers it can support in a performant manner, because I eschewed their 'heap' approach to timer maintenance in favor of just iterating through the entire list each update. (None of Merlin's nodes contain a large number of timers, so the simplicity of this approach outweighed the complexity of the heap for us.)
This implementation is single-threaded only, and completely ignores all CallbackGroups. Adding multi-threading would be significant additional work that we at Merlin didn't find a need to do.
This implementation does not yet properly support async callbacks. It was on my TODO list to work on this, but unfortunately I never got to it. Would this be a blocker for upstream merging? I can make sure to properly implement and test this first if so (and may do so anyway).
This code was implemented and deployed against Iron. I'll forward port to Rolling before I stage a PR.
merlin_python_events_executor.tar.gz
The text was updated successfully, but these errors were encountered: