FYI, the RabbitMQ core team pretty routinely sees this monitoring tool causing unreasonably high load on nodes because it uses GET /api/queues to fetch all metrics of all queues, which can generate very large payloads that max out network links in the short term.
Consider 100K queues with 60 metrics each, all in a single JSON collection: that is 6M key-value pairs. If we assume that each pair is on average 30 bytes long, that's roughly 180 MB of data, and transferring it within a second requires 180M * 8 ≈ 1.4 Gbit/s.
Add a frequent check on top and you can see how this tool can wreak havoc on the system it monitors.
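As a rough illustration, the back-of-the-envelope arithmetic above can be sketched as follows; the queue count, metric count, and per-pair size are assumptions for illustration, not measurements:

```python
# Estimate the GET /api/queues payload size and the sustained bandwidth
# implied by different polling intervals. All inputs are illustrative.
queues = 100_000          # queues in the cluster
metrics_per_queue = 60    # key-value pairs per queue in the JSON response
bytes_per_pair = 30       # rough average size of one serialized pair

payload_bytes = queues * metrics_per_queue * bytes_per_pair  # ~180 MB

for interval_s in (60, 15, 1):
    gbit_per_s = payload_bytes * 8 / interval_s / 1e9
    print(f"polled every {interval_s:>2}s -> ~{gbit_per_s:.2f} Gbit/s sustained")
```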
Consider using the Prometheus format.
Prometheus metrics are scraped from each node individually and support an aggregated metrics mode specifically for this kind of problem.
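For comparison, a per-node scrape might look like the sketch below. This is a minimal sketch, assuming the rabbitmq_prometheus plugin is enabled on its default port 15692; the host names are placeholders, and in practice Prometheus itself performs the scraping rather than hand-written code:

```python
# Minimal sketch: read each RabbitMQ node's Prometheus endpoint directly
# instead of asking the management API for every queue's metrics at once.
# Assumes the rabbitmq_prometheus plugin is enabled on its default port
# (15692); the node hostnames below are hypothetical.
import requests

nodes = ["rabbit-1.example.com", "rabbit-2.example.com", "rabbit-3.example.com"]

for node in nodes:
    # /metrics serves aggregated metrics by default, which keeps the
    # response size independent of the number of queues.
    resp = requests.get(f"http://{node}:15692/metrics", timeout=10)
    resp.raise_for_status()
    print(f"{node}: {len(resp.text.splitlines())} metric lines")
```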
michaelklishin changed the title from "Extremely inefficient metric querying" to "Extremely inefficient metric querying can produce significant load on the monitored cluster" on Sep 6, 2023.