Packet Loss Monitoring? #22

Open
clone1018 opened this issue Apr 17, 2021 · 4 comments
Labels
enhancement New feature or request

Comments

@clone1018
Member

Should we be monitoring packet loss between our Ingest / Edge servers? Should we be tracking NACK stats in prometheus as well?

@clone1018 clone1018 added the enhancement New feature or request label Apr 17, 2021
@LukeHandle
Contributor

Interesting idea: we could run the blackbox_exporter on every node, ICMP pinging (and/or doing an FTL or HTTP check?) every other node. The issue is really how we would usefully use this data. We would have end-to-end, both-ways tracking of everything (great!). But how to helpfully visualise it..!

If a node or region had an issue, we would see latency/loss on all/most pings to it, and potentially pings out would never be recorded/captured. Maybe some sort of aggregation based on a target label and tracking deviations from averages?
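
To make the aggregation idea concrete, here is a minimal Python sketch, not a proposed implementation. It assumes probe results are already available as `(source, target, loss_ratio)` tuples and that we keep a short history of past per-target loss averages; the function names, host names, and thresholds are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev


def loss_by_target(samples):
    """Average the packet-loss ratio per target over (source, target, loss) samples."""
    grouped = defaultdict(list)
    for _source, target, loss in samples:
        grouped[target].append(loss)
    return {target: mean(values) for target, values in grouped.items()}


def deviating_targets(current, baseline, threshold=3.0, min_spread=0.005):
    """Return targets whose current loss sits more than `threshold` spreads above
    their historical mean. `min_spread` (an assumed floor) avoids dividing by a
    near-zero deviation when the history is very flat."""
    flagged = []
    for target, loss in sorted(current.items()):
        history = baseline.get(target)
        if not history:
            continue  # no baseline yet, nothing to compare against
        spread = max(pstdev(history), min_spread)
        if (loss - mean(history)) / spread > threshold:
            flagged.append(target)
    return flagged


if __name__ == "__main__":
    # Hypothetical host names and loss ratios, for illustration only.
    samples = [
        ("edge-1", "ingest-1", 0.0),
        ("edge-2", "ingest-1", 0.01),
        ("edge-1", "ingest-2", 0.20),
        ("edge-2", "ingest-2", 0.30),
    ]
    baseline = {"ingest-1": [0.0, 0.01, 0.0], "ingest-2": [0.0, 0.01, 0.02]}
    print(deviating_targets(loss_by_target(samples), baseline))
```

Grouping by target means a single bad node shows up as one flagged entry instead of a wall of individual failing pings. In practice the same grouping could probably be done in the query layer rather than in code.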

@LukeHandle
Contributor

https://github.com/benjojo/sping might be another interesting tool to further investigate. Handily exposes /metrics as well.

@LukeHandle
Contributor

I had seen https://grafana.com/grafana/plugins/grafana-synthetic-monitoring-app/ before, but had assumed it was Grafana Cloud only.

In fact, it's all open source (plugin is AGPLv3, client is Apache 2), though the reality is that it's just blackbox_exporter + logs, and pretty graphs. We could just use blackbox_exporter and take the graphs as inspiration:

image

And it's the graphs that give at least some idea how we could use the data. We would have a dashboard like that ^^ with var selectors for each instance.

Nice world map would highlight issues between regions.

Would we opt for every edge tracking every ingress + every ingress tracking every edge? That's the internal flow of data, so most relevant.
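
A rough Python sketch of what that probe matrix could look like, assuming a blackbox_exporter runs on each node. The host names are made up, and the port (9115) and `icmp` module name are taken from blackbox_exporter's defaults and example config, so they may not match our deployment.

```python
from itertools import product
from urllib.parse import urlencode


def probe_pairs(ingests, edges):
    """Return (source, target) pairs covering edge -> ingest and ingest -> edge."""
    pairs = list(product(edges, ingests))   # every edge probes every ingest
    pairs += list(product(ingests, edges))  # every ingest probes every edge
    return pairs


def probe_url(source, target, module="icmp", port=9115):
    """Build the /probe URL to scrape on the source node's blackbox_exporter.
    The module name and port are assumptions about how the exporter is configured."""
    query = urlencode({"module": module, "target": target})
    return f"http://{source}:{port}/probe?{query}"


if __name__ == "__main__":
    # Hypothetical inventory, for illustration only.
    for source, target in probe_pairs(["ingest-1"], ["edge-1", "edge-2"]):
        print(probe_url(source, target))
```

With N ingests and M edges this gives 2 × N × M probes, which stays much smaller than a full every-node-to-every-node mesh.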

I expect it'd be cheap to add more locations on diff hosting providers if we wanted an external perspective.

==

Thoughts, @clone1018?

@clone1018
Member Author

> Would we opt for every edge tracking every ingress + every ingress tracking every edge? That's the internal flow of data, so most relevant.

I think this sounds like a good approach!
