Audit QoS settings in Nav2; Create consistent internal profiles #4888
Comments
I commend the idea. When not using Best Effort, I think I have run into the problem of slow subscribers effectively live-locking the publishers and other subscribers. I try to mitigate that somewhat, especially when networking is involved (for example, costmap visualization over WiFi), using some features of the zenoh-ros2dds-bridge (on Humble): https://github.com/eclipse-zenoh/zenoh-plugin-ros2dds/blob/b662a95730f8daeaa71669406d0e00bef7898bd5/DEFAULT_CONFIG.json5#L95 Is it just me and my misguided superstition? Do you have go-to tools/techniques to discover/measure/pinpoint slow subscribers?
I believe that is essentially the behavior of DDS' synchronous publishing (the default): since the subscriber sends ACKs to the publisher, the publisher applies flow control to its publication rate depending on packet loss and network bandwidth. There is some relationship going on under the hood, which is good when you need to reliably guarantee delivery. I've recently been auditing the DDS implementations' documentation and tuning guides, and this is the reason they recommend synchronous publication for robotics / critical systems (asynchronous publication, where messages are sent from a separate thread, is better for high-rate streaming data and lower overhead, but comes at the cost of weaker reliability and fewer tools for critical data guarantees). So no, I don't think that's superstition, and reliable transport over WiFi is probably bad for that (and likely other) reasons. This is a benefit of a Best Effort publisher: there is no ACK or flow control, and it instead sends at full rate without regard for the subscriber's ability to keep up or for the networking load.
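A minimal rclcpp sketch of the two ends of that tradeoff, purely illustrative (the topic name and message type below are assumptions, not anything specific in Nav2):

```cpp
#include "rclcpp/rclcpp.hpp"
#include "nav_msgs/msg/occupancy_grid.hpp"

// Best Effort: no ACKs and no flow control, so a lossy WiFi link simply drops
// samples instead of back-pressuring the publisher. Good for visualization
// and other high-rate streams where the latest sample is all that matters.
rclcpp::QoS make_visualization_qos()
{
  return rclcpp::QoS(rclcpp::KeepLast(1)).best_effort().durability_volatile();
}

// Reliable: ACK-based retransmission and flow control. Use when every sample
// must arrive, accepting that slow subscribers or bad links throttle the publisher.
rclcpp::QoS make_reliable_qos()
{
  return rclcpp::QoS(rclcpp::KeepLast(10)).reliable();
}

// Illustrative usage inside a node:
// auto pub = node->create_publisher<nav_msgs::msg::OccupancyGrid>(
//   "costmap", make_visualization_qos());
```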
I don't have any, but I know of some. ROS 2 Tracing was built to put tracepoints across the software so you can get timing information while things are processing. I haven't used it, but I know it is well loved. There are also the DDS logs, but those are a firehose and I've never personally succeeded in debugging a problem from them given how much is going on in Nav2. I'd start by adding timer wrappers around the callbacks I'm suspicious of, so that you can measure and log callback timing and see what's taking longer than you expect. It's like tracing, I suppose, but personally easier for me to implement and manage when testing one area at a time. A templated utility function could pretty easily be made that owns the subscriber's callback via a provided lambda and wraps it in a chrono start/end pair, logging the difference to a file or the screen; a sketch of that is below.
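A minimal sketch of such a wrapper, under the assumption of a LaserScan subscription for illustration (the helper name `make_timed_callback` and the topic are not existing Nav2 API):

```cpp
#include <chrono>
#include <string>
#include <utility>

#include "rclcpp/rclcpp.hpp"
#include "sensor_msgs/msg/laser_scan.hpp"

// Wrap a subscription callback so every invocation is timed and logged.
template<typename MsgT, typename CallbackT>
auto make_timed_callback(rclcpp::Logger logger, std::string name, CallbackT callback)
{
  return [logger, name = std::move(name), cb = std::move(callback)](
    typename MsgT::SharedPtr msg)
    {
      const auto start = std::chrono::steady_clock::now();
      cb(msg);  // run the real callback
      const auto elapsed = std::chrono::steady_clock::now() - start;
      RCLCPP_INFO(
        logger, "%s callback took %ld us", name.c_str(),
        static_cast<long>(
          std::chrono::duration_cast<std::chrono::microseconds>(elapsed).count()));
    };
}

// Illustrative usage when creating a subscription:
// auto sub = node->create_subscription<sensor_msgs::msg::LaserScan>(
//   "scan", rclcpp::SensorDataQoS(),
//   make_timed_callback<sensor_msgs::msg::LaserScan>(
//     node->get_logger(), "scan",
//     [](sensor_msgs::msg::LaserScan::SharedPtr msg) { /* actual processing */ }));
```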
It would be good to do some auditing on QoS settings in Nav2 and make sure that they are sensible. In particular:
Consider also adding support for the newer features like:
It might also be good to have launch files that bring up a discovery server process, to reduce network traffic.
DDS configs like transport layer, sync vs async, localhost only, buffer / fragment sizes, unicast, etc. should be left to the user's setup.
Considerations
I think we could have a set of Nav2-specific QoS policies for "publishers", "subscribers", and "latched" topics so that these are portable and consistent across the code base. During the audit, rather than fixing each one individually, we can move each to use our default profiles unless there is a compelling reason for some to be differentiated; a sketch of what such shared profiles might look like is below.
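A minimal, purely illustrative sketch (the `nav2_util::qos` namespace and function names are assumptions, not existing API):

```cpp
#include <cstddef>

#include "rclcpp/qos.hpp"

namespace nav2_util
{
namespace qos
{

// Default profile for ordinary pub/sub topics: reliable delivery, volatile
// durability, shallow history so slow consumers don't buffer unbounded data.
inline rclcpp::QoS StandardTopicQoS(std::size_t depth = 10)
{
  return rclcpp::QoS(rclcpp::KeepLast(depth)).reliable().durability_volatile();
}

// Profile for high-rate / lossy-link data (sensor streams, visualization):
// best effort, so there is no ACK-driven flow control back onto the publisher.
inline rclcpp::QoS SensorStreamQoS(std::size_t depth = 5)
{
  return rclcpp::QoS(rclcpp::KeepLast(depth)).best_effort().durability_volatile();
}

// "Latched" profile (maps, static info): transient local durability so
// late-joining subscribers still receive the last published sample.
inline rclcpp::QoS LatchedTopicQoS()
{
  return rclcpp::QoS(rclcpp::KeepLast(1)).reliable().transient_local();
}

}  // namespace qos
}  // namespace nav2_util
```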
We could even wrap the create_XYZ() factories in a Nav2 version that also does things under the hood with respect to QoS override acceptance, deadline/liveliness callbacks (for nav2_utils::LifecycleNode to handle), and perhaps even lifecycle management for subscriptions. A sketch of such a wrapper is below.
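A rough sketch of what a wrapped factory could look like, assuming the profile helpers sketched above; the function name, defaults, and the specific use of QoS-override options and the deadline event here are illustrative rather than an existing Nav2 API:

```cpp
#include <memory>
#include <string>

#include "rclcpp/rclcpp.hpp"
#include "rclcpp_lifecycle/lifecycle_node.hpp"

// Hypothetical Nav2-flavored create_publisher: applies the shared default QoS,
// opts in to user QoS overrides via parameters, and hooks the deadline event.
template<typename MsgT>
typename rclcpp_lifecycle::LifecyclePublisher<MsgT>::SharedPtr
create_nav2_publisher(
  rclcpp_lifecycle::LifecycleNode & node,
  const std::string & topic,
  const rclcpp::QoS & qos = nav2_util::qos::StandardTopicQoS())
{
  rclcpp::PublisherOptions options;
  // Let users override reliability/durability/depth from parameters at launch time.
  options.qos_overriding_options = rclcpp::QosOverridingOptions::with_default_policies();
  // Surface missed-deadline events instead of silently ignoring them.
  options.event_callbacks.deadline_callback =
    [logger = node.get_logger()](rclcpp::QOSDeadlineOfferedInfo & info) {
      RCLCPP_WARN(logger, "Publisher missed its offered deadline (total: %d)", info.total_count);
    };
  return node.create_publisher<MsgT>(topic, qos, options);
}
```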