Packet Loss Monitoring? #22

clone1018 · 2021-04-17T16:28:49Z

Should we be monitoring packet loss between our Ingest / Edge servers? Should we be tracking NACK stats in prometheus as well?

LukeHandle · 2021-04-17T16:39:01Z

An interesting idea would be running blackbox_exporter on every node, ICMP pinging (and/or running an FTL or HTTP check against) every other node. The real issue is how we would helpfully use this data. We would have end-to-end, both-ways tracking of everything (great!). But how to helpfully visualise it..!

If a node or region had an issue, we would see latency/loss on all/most pings to it, and its outbound pings would potentially never be recorded/captured. Maybe some sort of aggregation based on a target label and tracking deviations from averages?
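One way the "aggregate by target label, track deviations from averages" idea could look is sketched below. This is only an illustration, not something from blackbox_exporter: the sample tuples, the function names `loss_by_target` and `deviating_targets`, and the thresholds `min_delta` and `sigmas` are all assumptions made up for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def loss_by_target(samples):
    """Group loss ratios by target and return the mean loss per target.

    samples: iterable of (source, target, loss) where loss is 0.0-1.0.
    The tuple shape is hypothetical, not an exporter format.
    """
    grouped = defaultdict(list)
    for _source, target, loss in samples:
        grouped[target].append(loss)
    return {target: mean(values) for target, values in grouped.items()}


def deviating_targets(current, baseline, min_delta=0.02, sigmas=3.0):
    """Return targets whose current mean loss sits well above their history.

    baseline maps target -> list of past mean loss values. A target is
    flagged when its current loss exceeds the historical mean by more
    than max(min_delta, sigmas * population stddev). Both thresholds
    are arbitrary assumptions for this sketch.
    """
    flagged = {}
    for target, loss in current.items():
        history = baseline.get(target)
        if not history:
            continue  # no history yet, nothing to compare against
        avg = mean(history)
        threshold = avg + max(min_delta, sigmas * pstdev(history))
        if loss > threshold:
            flagged[target] = (loss, avg)
    return flagged
```

Because the grouping is on the target, a node that is struggling shows up even when its own outbound probes are missing, since every other node's probes toward it still count.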

LukeHandle · 2021-04-23T08:53:15Z

https://github.com/benjojo/sping might be another interesting tool to further investigate. Handily exposes /metrics as well.

LukeHandle · 2021-05-06T16:33:04Z

I had seen https://grafana.com/grafana/plugins/grafana-synthetic-monitoring-app/ before, but had assumed it was Grafana Cloud only.

In fact, it's all open source (plugin is AGPLv3, client is Apache 2), though the reality is that it's just blackbox_exporter + logs, and pretty graphs. We could just use blackbox_exporter and take the graphs as inspiration.

And it's the graphs that give at least some idea of how we could use the data. We would have a dashboard like the Synthetic Monitoring one, with variable selectors for each instance.

A nice world map would highlight issues between regions.

Would we opt for every edge tracking every ingress + every ingress tracking every edge? That's the internal flow of data, so most relevant.
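As a rough sketch of what "every edge tracking every ingress + every ingress tracking every edge" means in terms of probe targets, the pairs could be generated like this. The node names and the helper functions `probe_pairs` and `targets_for` are hypothetical, purely for illustration:

```python
from itertools import product


def probe_pairs(edges, ingests):
    """Return (source, target) pairs for bidirectional edge/ingest probing.

    Every edge probes every ingest, and every ingest probes every edge.
    Edge-to-edge and ingest-to-ingest pairs are deliberately left out,
    since they are not part of the internal flow of data.
    """
    pairs = list(product(edges, ingests))
    pairs += list(product(ingests, edges))
    return pairs


def targets_for(node, pairs):
    """Return the sorted list of targets a given node should probe."""
    return sorted(target for source, target in pairs if source == node)
```

With E edges and I ingests this gives 2 * E * I probes, rather than the N * (N - 1) of a full mesh across all N nodes.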

I expect it'd be cheap to add more locations on different hosting providers if we wanted an external perspective.

==

Thoughts @clone1018

clone1018 · 2021-05-06T16:49:05Z

> Would we opt for every edge tracking every ingress + every ingress tracking every edge? That's the internal flow of data, so most relevant.

I think this sounds like a good approach!

clone1018 added the enhancement New feature or request label Apr 17, 2021
