
[REP-2014] Benchmarking performance in ROS 2 #364

Merged
merged 37 commits into ros-infrastructure:master on Aug 24, 2023

Conversation

vmayoral
Contributor

Signed-off-by: Víctor Mayoral Vilches <[email protected]>

@vmayoral vmayoral marked this pull request as draft September 29, 2022 15:10
Signed-off-by: Víctor Mayoral Vilches <[email protected]>
Signed-off-by: Víctor Mayoral Vilches <[email protected]>
Signed-off-by: Víctor Mayoral Vilches <[email protected]>
Signed-off-by: Víctor Mayoral Vilches <[email protected]>
Signed-off-by: Víctor Mayoral Vilches <[email protected]>
@ros-discourse

This pull request has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2022-9-15/27394/2

Signed-off-by: Víctor Mayoral Vilches <[email protected]>
@vmayoral vmayoral marked this pull request as ready for review October 13, 2022 21:30
@ros-discourse

This pull request has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/robotic-processing-unit-and-robotcore/27562/21

@ros-discourse

This pull request has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/rep-2014-rfc-benchmarking-performance-in-ros-2/27770/1

@vmayoral
Contributor Author

@christophebedard, @iluetkeb, @anup-pem, @dejanpan, @kylemarcey (and surely many others), this work is heavily inspired by your prior work. As pointed out here, I'd love to have your views on this and wouldn't mind co-authoring it together.

rep-2014.rst Outdated

- `benchmarking`: a method of comparing the performance of various systems by running a common test.

From these definitions, inherently one can determine that both benchmarking and tracing are connected in the sense that the test/benchmark will use a series of measurements for comparison. These measurements will come from tracing probes. In other words, tracing will collect data that will then be fed into a benchmark program for comparison.
Contributor


Measurements don't have to come from tracing, right? The "black box performance testing" section above seems to say that.

Contributor


The definition of tracing that is given above is so broad that it encompasses any kind of procedure "used to understand what goes on in a running software system". I haven't seen such a broad definition before; usually, tracing is defined as logging (partial) execution information while the system is running. Common usage is that these probes try to be very low overhead and log only information available directly at the tracepoint. This more narrow definition of tracing would not encompass all kinds of performance data capture I could envision (for example, another common approach is to compute internal statistics and make them available in various ways).

Unless it is really intended to restrict data capture to tracing, which would need a justification, I would suggest using the more common, narrow definition of the term in the definition section, and then edit this paragraph to encompass diverse methods of capturing data.

Contributor Author


@sloretz and @iluetkeb, see fa942f9. Let me know if that addresses your concerns or if you'd like further changes. Feel free to suggest them if appropriate.

Member


I agree with @iluetkeb. I usually mention something like "fast/low-overhead, low-level logging ..."

@vmayoral I'd explicitly add "low-overhead, low-level" before "logging ..." just to make sure people don't get it mixed up with typical user-comprehensible string logging.

rep-2014.rst Outdated
Methodology for benchmarking performance in ROS 2
=================================================

In this REP, we **recommend adopting a grey-box and non-functional benchmarking approach** to measure performance and allow to evaluate ROS 2 individual nodes as well as complete computational graphs. To realize it in an architecture-neutral, representative, and reproducible manner, we also recommend using the Linux Tracing Toolkit next generation (`LTTng <https://lttng.org/>`_) through the `ros-tracing` project, which leverages probes already inserted in the ROS 2 core layers and tools to facilitate benchmarking ROS 2 abstractions.
Contributor


This section seems to be the most important part of the REP. I'd recommend moving this to the top. I also recommend removing any definitions not used here.

Contributor Author


I agree about the importance but struggle to see how this could help a person who is not experienced in the topic. This paragraph leverages prior definitions, including what benchmarking is and what the grey-box and non-functional approaches to it are. Without properly defining these things, understanding them might be very hard.

We somewhat bumped into such a scenario at ros-navigation/navigation2#2788, wherein after a few interactions, even for experienced ROS developers, it became somewhat clear to me (due to the lack of familiarity with this approach) that the value of benchmarking things this way wasn't being understood properly. Lots of unnecessary demands were set because the approach was simply not understood, only to later take a much less rigorous approach to benchmark the nav stack. In my experience the topics introduced here are hard to digest and require proper context.

Maybe a compromise would be to grab these bits and rewrite them without introducing too much terminology in the abstract of the REP? @sloretz feel free to make a suggestion.

Contributor

@flynneva flynneva left a comment


@vmayoral really like this effort here - great work!

vmayoral and others added 7 commits October 15, 2022 08:03
Co-authored-by: Ingo Lütkebohle <[email protected]>
Co-authored-by: Shane Loretz <[email protected]>
Signed-off-by: Víctor Mayoral Vilches <[email protected]>
Co-authored-by: Shane Loretz <[email protected]>
Signed-off-by: Víctor Mayoral Vilches <[email protected]>
Co-authored-by: Ingo Lütkebohle <[email protected]>
Co-authored-by: Ingo Lütkebohle <[email protected]>
Co-authored-by: Ingo Lütkebohle <[email protected]>
Co-authored-by: Evan Flynn <[email protected]>

A quantitative approach
-----------------------
Performance must be studied with real examples and measurements on real robotic computations, rather than simply as a collection of definitions, designs and/or marketing actions. When creating benchmarks, prefer to use realistic data and situations rather than data designed to test one edge case. Follow the quantitative approach [1]_ when designing your architecture.


Could you elaborate on this comparison? It makes sense to study performance with real examples and measurements. Could you clarify what this is recommending against? What does it mean to study performance as a collection of definitions, designs, and/or marketing actions?


1. instrument both the target ROS 2 abstraction/application using `LTTng <https://lttng.org/>`_. Refer to `ros2_tracing <https://github.com/ros2/ros2_tracing>`_ for tools, documentation and ROS 2 core layers tracepoints;

@ErikAtApex ErikAtApex Jun 15, 2023


Suggested change
1. instrument both the target ROS 2 abstraction/application using `LTTng <https://lttng.org/>`_. Refer to `ros2_tracing <https://github.com/ros2/ros2_tracing>`_ for tools, documentation and ROS 2 core layers tracepoints;
1. instrument the target ROS 2 application using `LTTng <https://lttng.org/>`_. Refer to `ros2_tracing <https://github.com/ros2/ros2_tracing>`_ for tools, documentation and ROS 2 core layers tracepoints;
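For readers unfamiliar with ros2_tracing, here is a minimal sketch of what this instrumentation/tracing step can look like at launch time, using the Trace action from tracetools_launch. Argument names such as session_name and events_ust follow the ros2_tracing documentation and may differ between releases; demo_nodes_cpp is only a stand-in workload, not something prescribed by the REP.

    # Launch file sketch: start an LTTng tracing session alongside the node(s)
    # under test. Assumes tracetools_launch from the ros2_tracing repository.
    from launch import LaunchDescription
    from launch_ros.actions import Node
    from tracetools_launch.action import Trace


    def generate_launch_description():
        return LaunchDescription([
            # Start the tracing session first so node startup is captured too.
            Trace(
                session_name='rep2014_example',
                events_ust=['ros2:*'],  # enable the ROS 2 userspace tracepoints
            ),
            # Stand-in workload; replace with the graph to benchmark.
            Node(package='demo_nodes_cpp', executable='talker'),
        ])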

Methodology for benchmarking performance in ROS 2
=================================================

In this REP, we **recommend adopting a grey-box and non-functional benchmarking approach** to measure performance and allow to evaluate ROS 2 individual nodes as well as complete computational graphs. To realize it in an architecture-neutral, representative, and reproducible manner, we also recommend using the Linux Tracing Toolkit next generation (`LTTng <https://lttng.org/>`_) through the `ros2_tracing` project, which leverages probes already inserted in the ROS 2 core layers and tools to facilitate benchmarking ROS 2 abstractions.


... benchmarking ROS 2 abstractions.

What does it mean to benchmark a ROS 2 abstraction? Should this instead be "applications"?

@ros-discourse

This pull request has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2023-06-15/32038/1

@SteveMacenski
Contributor

SteveMacenski commented Jun 21, 2023

First off, I'm sorry to be harsh, but I think you'd rather I be honest and concise than verbose while still communicating the same thoughts in 10x as many words of fluff.

The first 178 lines of the REP are just background context, which is genuinely useful, but doesn't actually culminate in any prescriptive testing / evaluation procedures or test-stand setup standardization. There is no standard within this proposed standard. The best I see are 2 non-tangible recommendations without any specificity.

The proposed REP points to a package, but a package is not a standard nor is it a proxy for a written set of instructions, explicit aims and processes, and requirements which might have been used to create a particular implementation. This is a real problem if you're proposing a standard to fit an implementation rather than writing a standard based on requirements and needs and then proposing an implementation alongside it that meets the standard.

Personally, I would not approve this since the document contains no standard.


Separately, I think more or less this sentence included in the REP explains why this probably shouldn't be an REP

There are no globally accepted industry standards for benchmarking robotic systems.

If there are not even generally understood standards or benchmarking outlines for robotics, it seems like we're just making Standard #14 rather than consolidating a generally recognized set of things that are useful in the application of robotics benchmarking to ROS. All other REPs are more or less the consolidation or proposition of best practices in robotics as ROS-specifics. This doesn't seem sufficiently mature to be admitted as an REP until there is a set of globally accepted standards which we're adapting for ROS uses.

This feels better suited as a paper or something in the literature. I applaud the effort; something like this to standardize how we take performance metrics of ROS-based systems has great utility, but I think a standard needs more detail on the process and fulfilled requirements, and it should be based on agreed-upon best practices.

@vmayoral
Contributor Author

vmayoral commented Jul 5, 2023

First off, I'm sorry to be harsh, but I think you'd rather I be honest and concise than verbose while still communicating the same thoughts in 10x as many words of fluff.

The first 178 lines of the REP are just background context, which is genuinely useful, but doesn't actually culminate in any prescriptive testing / evaluation procedures or test-stand setup standardization. There is no standard within this proposed standard. The best I see are 2 non-tangible recommendations without any specificity.

The proposed REP points to a package, but a package is not a standard nor is it a proxy for a written set of instructions, explicit aims and processes, and requirements which might have been used to create a particular implementation. This is a real problem if you're proposing a standard to fit an implementation rather than writing a standard based on requirements and needs and then proposing an implementation alongside it that meets the standard.

Personally, I would not approve this since the document contains no standard.

Thanks for the input @SteveMacenski. Processing this now. Let me take your feedback and iterate on the content, then throw it back to you and others for review before the next meeting.

@vmayoral
Contributor Author

vmayoral commented Jul 5, 2023

Separately, I think more or less this sentence included in the REP explains why this probably shouldn't be an REP

There are no globally accepted industry standards for benchmarking robotic systems.

If there are not even generally understood standards or benchmarking outlines for robotics, it seems like we're just making Standard #14 rather than consolidating a generally recognized set of things that are useful in the application of robotics benchmarking to ROS. All other REPs are more or less the consolidation or proposition of best practices in robotics as ROS-specifics. This doesn't seem sufficiently mature to be admitted as an REP until there is a set of globally accepted standards which we're adapting for ROS uses.

I disagree with this input @SteveMacenski.

As discussed during the last TSC meeting, it's relevant to understand that there are different types of REPs (see REP-0001). What you refer to are Standards Track REPs. This is not one of them, and it was never meant to be. This has always been an Informational REP, and its content/structure follows various other existing/approved ones. Let me quote that for you:

An Informational REP describes a ROS design issue, or provides general guidelines or information to the ROS community, but does not propose a new feature. Informational REPs do not necessarily represent a ROS community consensus or recommendation, so users and implementors are free to ignore Informational REPs or follow their advice.

Having said all of this, and as pointed out above, I think there's value in your input. Thanks for that. Let me see what I can do about it to improve this REP PR.

@SteveMacenski
Contributor

SteveMacenski commented Jul 6, 2023

Ah I see, that is my mistake then. I was not aware of the different REP tracks 😄

Given that context, I think this is sufficient, but some additional high-level description of what ros2_tracing is measuring / doing, in particular, would add value for the reader. But that's more of a request for some useful context than the standards-level formality I was thinking of previously.

I retract my previous hard-line position!


The reader is referred to `ros2_tracing <https://github.com/ros2/ros2_tracing>`_ and `LTTng <https://lttng.org/>`_ for the tools that enable a grey-boxed performance benchmarking approach in line with what the ROS 2 stack has been using (ROS 2 common packages come instrumented with LTTng probes). In addition, [3]_ and [4]_ present comprehensive descriptions of the `ros2_tracing <https://github.com/ros2/ros2_tracing>`_ tools and the `LTTng <https://lttng.org/>`_ infrastructure.

Reference implementations complying with the recommendations of this REP can be found in literature for applications like perception and mapping [5]_, hardware acceleration [6]_ [7]_ or self-driving mobility [8]_. A particular example of interest for the reader is the instrumentation of the `image_pipeline <https://github.com/ros-perception/image_pipeline/tree/humble/>`_ ROS 2 package [12]_, which is a set of nodes for processing image data in ROS 2. The `image_pipeline <https://github.com/ros-perception/image_pipeline/tree/humble/>`_ package has been instrumented with LTTng probes available in the ROS 2 `Humble` release, which results in various perception Components (e.g. the `RectifyNode <https://github.com/ros-perception/image_pipeline/blob/ros2/image_proc/src/rectify.cpp#L82/>`_ *Component*) leveraging instrumentation which, if enabled, can help trace the computational graph information flow of a ROS 2 application using such a Component. The results of benchmarking the performance of `image_pipeline <https://github.com/ros-perception/image_pipeline/tree/humble/>`_ are available in [13]_, and launch scripts to both trace and analyze perception graphs are available in [14]_.
Contributor


I suggest we make this section more explicit via a bulleted list. E.g. "Here are reference implementations for particular application domains. Developers in these domains are encouraged to format their results in a similar style."
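As a rough illustration of how traces from such reference implementations are typically post-processed, the sketch below follows the style of the sample notebooks shipped with tracetools_analysis. The module and function names (load_file, Ros2Handler, Ros2DataModelUtil, get_callback_symbols, get_callback_durations) and the trace path are assumptions taken from those samples and may vary between releases.

    # Sketch: compute per-callback durations from a recorded LTTng trace.
    import os

    from tracetools_analysis.loading import load_file
    from tracetools_analysis.processor.ros2 import Ros2Handler
    from tracetools_analysis.utils.ros2 import Ros2DataModelUtil

    # Assumed location of the trace recorded with ros2_tracing.
    events = load_file(os.path.expanduser('~/.ros/tracing/rep2014_example'))
    handler = Ros2Handler.process(events)      # replay events into a ROS 2 data model
    data_util = Ros2DataModelUtil(handler.data)

    # Print the mean execution duration of each callback as a simple metric.
    for callback_object, symbol in data_util.get_callback_symbols().items():
        durations = data_util.get_callback_durations(callback_object)['duration']
        print(f'{symbol}: mean duration {durations.mean()}')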

Methodology for benchmarking performance in ROS 2
=================================================

In this REP, we **recommend adopting a grey-box and non-functional benchmarking approach** to measure performance and allow to evaluate ROS 2 individual nodes as well as complete computational graphs. To realize it in an architecture-neutral, representative, and reproducible manner, we also recommend using the Linux Tracing Toolkit next generation (`LTTng <https://lttng.org/>`_) through the `ros2_tracing` project, which leverages probes already inserted in the ROS 2 core layers and tools to facilitate benchmarking ROS 2 abstractions.
Contributor


There is a lot of lead up to this section, which I think is important, but this is the "meat" of the REP. I think it might be helpful to make this section more self-contained and explicitly state what is meant by gray-box and non-functional.

Personally, I don't find these definitions particularly clear:

"Grey-box performance testing: More application-specific measure which is capable of watching internal states of the system and can measure (probe) certain points in the system, thus generate the measurement data with minimal interference. Requires to instrument the complete application."

Just to clarify, gray box here means the code is instrumented at compile time with an external tracer, as opposed to, say, just creating a time object and measuring the elapsed time?

Non-functional performance testing: The measurable performance of those aspects that don't belong to the system's functions. In the motion planning example, the latency of the planner, the memory consumption, the CPU usage, etc.

Just to clarify this section, do we mean that tests should be "end-to-end", as in from the time from when a "function" is called until it returns? This would be in contrast to say measuring every sub-routine of a "function" and only evaluating the "meaty" part of the system.

Since this is a recommendation to the community, many of whom are not experts in this sub-domain, we need to be clear, plain-spoken, and direct with our recommendations. A smart undergrad should be able to read this REP and clearly understand how to approach profiling.
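One way to make the black-box vs. grey-box distinction concrete (an illustrative sketch only, not text from the REP; the plan() function below is a stand-in for whatever is under test): black-box timing only wraps a call with a clock from the outside, while grey-box benchmarking reads probes compiled into the system, such as the LTTng tracepoints in the ROS 2 core layers.

    # Black-box timing: measure only what is observable from outside the call.
    import time

    def plan(start_pose, goal_pose):
        # Stand-in for the system under test (e.g. a motion planner call).
        time.sleep(0.01)
        return [start_pose, goal_pose]

    start = time.perf_counter()
    path = plan((0.0, 0.0), (1.0, 1.0))
    print(f'end-to-end planning latency: {time.perf_counter() - start:.6f} s')

    # Grey-box testing instead (or additionally) relies on tracepoints compiled
    # into the stack (e.g. the ros2_tracing/LTTng probes), which expose internal
    # stages with low overhead while the system runs.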

Contributor

@kscottz kscottz left a comment


I did a quick re-read before the TSC meeting. I left some feedback related to document clarity.

Abstract
========

This REP describes some principles and guidelines for benchmarking performance in ROS 2.


Can you make this "This Informational REP" to make it clear this is not a Standard REP?

Also, can you add "This REP then provides recommendations for tools to use when benchmarking ROS 2, and reference material for prior work done using those tools."


[Quoted ASCII diagram from rep-2014.rst: two side-by-side boxes labeled "Transparent" and "Opaque"]


You don't describe the difference between transparent and opaque. You should either add that or remove the diagrams.

- `performance_test <https://gitlab.com/ApexAI/performance_test/>`_: Tool is designed to measure inter and intra-process communications. Runs at least one publisher and at least one subscriber, each one in one independent thread or process and records different performance metrics. It also provides a way to generate a report with the results through a different package.
- `reference_system <https://github.com/ros-realtime/reference-system/>`_: Tool designed to provide a framework for creating reference systems that can represent real-world distributed systems in order to more fairly compare various configurations of each system (e.g. measuring performance of different ROS 2 executors). It also provides a way to generate reports as well.
- `ros2-performance <https://github.com/irobot-ros/ros2-performance/>`_: Another framework to evaluate ROS communications, inspired by `performance_test`. There's a decent rationale in the form of a proposal, a good evaluation of prior work and a well-documented set of experiments.
- `system_metrics_collector <https://github.com/ros-tooling/system_metrics_collector/>`_: A lightweight and *real-time* metrics collector for ROS 2. Automatically collects and aggregates *CPU* % used and *memory* % performance metrics used by both system and ROS 2 processes. Data is aggregated in order to provide constant time average, min, max, sample count, and standard deviation values for each collected metric. *Deprecated*.


I would put the Deprecated comment at the beginning of the description to make it more visible.

Methodology for benchmarking performance in ROS 2
=================================================

In this REP, we **recommend adopting a grey-box and non-functional benchmarking approach** to measure performance and allow to evaluate ROS 2 individual nodes as well as complete computational graphs. To realize it in an architecture-neutral, representative, and reproducible manner, we also recommend using the Linux Tracing Toolkit next generation (`LTTng <https://lttng.org/>`_) through the `ros2_tracing` project, which leverages probes already inserted in the ROS 2 core layers and tools to facilitate benchmarking ROS 2 abstractions.


You state the recommendation without giving a summary statement of "why" you recommend it. example: We recommend using grey box testing because of reasons 1, 2 and 3.

@ros-discourse

This pull request has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2023-07-20/32525/1

@ros-discourse

This pull request has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/project-proposal-ros2-tool-to-collect-topic-metrics/32790/2

@kscottz
Contributor

kscottz commented Aug 24, 2023

Hi Everyone,

This REP was reviewed and voted on by the ROS 2 TSC in our August 2023 meeting. The vote was in favor of rejecting REP-2014. We plan to merge REP-2014 with status "rejected" so the information is still available.

The anonymous feedback from the TSC members was that the REP contains valuable information, but does not constitute a community standard. Some TSC members suggested REP-2014 would better serve the community as part of the ROS 2 Docs or as a stand-alone blog post.

Signed-off-by: Chris Lalancette <[email protected]>
@clalancette
Contributor

Based on the TSC decision, I've marked this REP as "Rejected", and will merge it as such.

@clalancette clalancette merged commit 18c0a69 into ros-infrastructure:master Aug 24, 2023
@ros-discourse

This pull request has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2023-08-17/33061/4
https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2023-08-17/33061/6
https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2023-08-17/33061/10
https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2023-08-17/33061/11
https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2023-08-17/33061/12
https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2023-08-17/33061/16
https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2023-08-17/33061/17
https://discourse.ros.org/t/ros-2-tsc-meeting-minutes-2023-08-17/33061/25
https://discourse.ros.org/t/ros-2-hardware-acceleration-working-group-2023-summary-and-new-directions-for-2024/35454/1
