make handler process more robust
There was a timing window in which the handler gRPC client was available but the handler was not yet ready to send back metrics. If metrics were gathered during that window, the whole HTTP metrics response would be something like this:
```
An error has occurred while serving metrics:

[from Gatherer #2] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /tmp/EGH_ME3wf4gMCrac/service_rpc.sock: connect: no such file or directory"
```

That came from `prometheus/registry.go:761`, which returns the error from the `Gatherers.Gather()` invocation; the Prometheus HTTP handler then serves that error as the entire response. What I've done is log errors from the handler and then quietly swallow them, returning an empty slice of metric families and no error, so a single handler that isn't ready yet can no longer break the whole metrics response.
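A minimal, standard-library-only sketch of this pattern follows. All names in it are hypothetical stand-ins: `MetricFamily` and `Gatherer` only mirror the shapes of `dto.MetricFamily` and `prometheus.Gatherer`, `fetchFunc` stands in for the gRPC `GetMetrics` call plus parsing, and `gatherAll` is a simplified model of `prometheus.Gatherers.Gather`, not its real implementation.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"log"
)

// MetricFamily is a simplified stand-in for dto.MetricFamily.
type MetricFamily struct {
	Name string
}

// Gatherer has the same method shape as prometheus.Gatherer.
type Gatherer interface {
	Gather() ([]*MetricFamily, error)
}

// fetchFunc is a hypothetical stand-in for the IPC call to one handler
// (GetMetrics followed by text parsing).
type fetchFunc func(ctx context.Context) ([]*MetricFamily, error)

// handlerGatherer wraps one handler process. Errors are logged and swallowed,
// so a handler that is not ready yet is skipped instead of failing the scrape.
type handlerGatherer struct {
	egressID string
	fetch    fetchFunc
}

func (h *handlerGatherer) Gather() ([]*MetricFamily, error) {
	families, err := h.fetch(context.Background())
	if err != nil {
		log.Printf("error obtaining metrics from handler, skipping: egress_id=%s err=%v", h.egressID, err)
		return make([]*MetricFamily, 0), nil // skip this handler, don't fail the scrape
	}
	return families, nil
}

// gatherAll is a simplified model of prometheus.Gatherers.Gather: it merges
// all results and returns a non-nil error if any gatherer reported one.
func gatherAll(gs []Gatherer) ([]*MetricFamily, error) {
	var all []*MetricFamily
	var errs []error
	for i, g := range gs {
		fams, err := g.Gather()
		if err != nil {
			errs = append(errs, fmt.Errorf("[from Gatherer #%d] %w", i+1, err))
		}
		all = append(all, fams...)
	}
	return all, errors.Join(errs...) // nil when errs is empty
}

// notReadyFetch simulates a handler whose gRPC socket is not listening yet.
func notReadyFetch(ctx context.Context) ([]*MetricFamily, error) {
	return nil, errors.New("rpc error: code = Unavailable")
}

// readyFetch simulates a handler that returns one metric family.
func readyFetch(ctx context.Context) ([]*MetricFamily, error) {
	return []*MetricFamily{{Name: "handler_metric"}}, nil
}

func main() {
	fams, err := gatherAll([]Gatherer{
		&handlerGatherer{egressID: "EG_a", fetch: notReadyFetch},
		&handlerGatherer{egressID: "EG_b", fetch: readyFetch},
	})
	fmt.Println(len(fams), err) // prints: 1 <nil>
}
```

Because `handlerGatherer` turns the failure into an empty result, the ready handler's metrics are still served and no error reaches the HTTP layer. The trade-off is that a failing handler silently drops out of that scrape and is visible only in the logs. A broader alternative would be `promhttp.HandlerOpts{ErrorHandling: promhttp.ContinueOnError}`, which serves whatever was gathered instead of an error page, but it applies to every gatherer behind the handler rather than only to handler processes.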
dandoug committed Nov 6, 2023
1 parent 25fc2bd commit 9cab6b0
Showing 1 changed file with 5 additions and 5 deletions: `pkg/service/process.go`
```diff
@@ -201,18 +201,18 @@ func getSocketAddress(handlerTmpDir string) string {
 // Gather implements the prometheus.Gatherer interface on server-side to allow aggregation of handler metrics
 func (p *Process) Gather() ([]*dto.MetricFamily, error) {
 	// Get the metrics from the handler via IPC
-	logger.Debugw("gathering metrics from handler process", "handlerID", p.handlerID)
+	logger.Debugw("gathering metrics from handler process", "egress_id", p.req.EgressId)
 	metricsResponse, err := p.grpcClient.GetMetrics(context.Background(), &ipc.MetricsRequest{})
 	if err != nil {
-		logger.Errorw("Error obtaining metrics from handler", err)
-		return make([]*dto.MetricFamily, 0), err
+		logger.Warnw("Error obtaining metrics from handler, skipping", err, "egress_id", p.req.EgressId)
+		return make([]*dto.MetricFamily, 0), nil // don't return an error, just skip this handler
 	}
 	// Parse the result to match the Gatherer interface
 	parser := &expfmt.TextParser{}
 	families, err := parser.TextToMetricFamilies(strings.NewReader(metricsResponse.Metrics))
 	if err != nil {
-		logger.Errorw("Error parsing metrics from handler", err)
-		return make([]*dto.MetricFamily, 0), err
+		logger.Warnw("Error parsing metrics from handler, skipping", err, "egress_id", p.req.EgressId)
+		return make([]*dto.MetricFamily, 0), nil // don't return an error, just skip this handler
 	}
 
 	// Add an egress_id label to every metric all the families, if it doesn't already have one
```
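The hunk ends at the step that adds an `egress_id` label to every metric lacking one; the code for that step is not shown. As a hedged sketch of what it could look like, the block below uses hypothetical, simplified stand-ins for `dto.LabelPair`, `dto.Metric` and `dto.MetricFamily` (the real protobuf types use pointer fields such as `*string`), and `addEgressIDLabel` and `hasLabel` are illustrative names, not functions from `pkg/service/process.go`.

```go
package main

import "fmt"

// Simplified stand-ins for dto.LabelPair, dto.Metric and dto.MetricFamily.
// The real protobuf types use pointer fields such as *string.
type LabelPair struct {
	Name  string
	Value string
}

type Metric struct {
	Labels []*LabelPair
}

type MetricFamily struct {
	Name    string
	Metrics []*Metric
}

// addEgressIDLabel adds egress_id=egressID to every metric in every family,
// leaving metrics that already carry an egress_id label untouched.
func addEgressIDLabel(families []*MetricFamily, egressID string) {
	for _, f := range families {
		for _, m := range f.Metrics {
			if !hasLabel(m, "egress_id") {
				m.Labels = append(m.Labels, &LabelPair{Name: "egress_id", Value: egressID})
			}
		}
	}
}

// hasLabel reports whether m already has a label with the given name.
func hasLabel(m *Metric, name string) bool {
	for _, l := range m.Labels {
		if l.Name == name {
			return true
		}
	}
	return false
}

func main() {
	families := []*MetricFamily{{
		Name: "handler_metric",
		Metrics: []*Metric{
			{}, // no labels yet: gets egress_id=EG_new
			{Labels: []*LabelPair{{Name: "egress_id", Value: "EG_existing"}}}, // kept as-is
		},
	}}
	addEgressIDLabel(families, "EG_new")
	for _, m := range families[0].Metrics {
		fmt.Println(m.Labels[0].Value) // prints EG_new, then EG_existing
	}
}
```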