Commit

Add separate alert for HTTP gateway timeout responses
lkstz authored Jun 21, 2024
1 parent 28b8ab3 commit 1c2d3b7
Showing 4 changed files with 19 additions and 3 deletions.
3 changes: 2 additions & 1 deletion charts/generic-service/README.md
Expand Up @@ -157,7 +157,8 @@ app:
| `alerting.http.referenceInterval` | `1w` | The time interval to compare with the sample interval to detect changes |
| `alerting.http.maxSlowdown` | `2.5` | The maximum HTTP response slowdown in the sample interval compared to the reference interval |
| `alerting.http.max4xxRatio` | `2.5` | The maximum HTTP 4xx ratio increase in the sample interval compared to the reference interval |
| `alerting.http.max5xxCount` | `0` | The maximum number of HTTP 5xx responses in the sample interval |
| `alerting.http.max5xxCount` | `0` | The maximum number of HTTP 5xx responses (except 504) in the sample interval |
| `alerting.http.maxTimeoutCount` | `0` | The maximum number of HTTP gateway timeout responses (504) in the sample interval |
| `alerting.grpc.requestsMetric` | `grpc_server_handled_total` | The name of the Prometheus metric counting gRPC requests |
| `alerting.grpc.sampleInterval` | `20m` | The time interval in which to measure gRPC responses |
| `alerting.grpc.referenceInterval` | `1w` | The time interval to compare with the sample interval to detect changes |
11 changes: 10 additions & 1 deletion charts/generic-service/templates/alerts.yaml
Expand Up @@ -178,12 +178,21 @@ spec:

- alert: Http5xx
expr: |
sum(round(increase({{ include "generic-service.request-code-count-metric" . }}"5.."}[{{ .Values.alerting.http.sampleInterval }}])))
sum(round(increase({{ include "generic-service.request-code-count-metric" . }}"5.[^4]"}[{{ .Values.alerting.http.sampleInterval }}])))
> {{ .Values.alerting.http.max5xxCount }}
labels:
{{- include "generic-service.alert-labels" . | nindent 12 }}
severity: critical
topic: ingress
annotations:
{{- include "generic-service.alert-annotations" . | nindent 12 }}
summary: HTTP 5xx responses
description: '{{ include "generic-service.fullname" . }} gave {{"{{ $value }}"}} HTTP 5xx responses in the last {{ .Values.alerting.http.sampleInterval }}.'

- alert: HttpTimeout
expr: |
sum(round(increase({{ include "generic-service.request-code-count-metric" . }}"504"}[{{ .Values.alerting.http.sampleInterval }}])))
> {{ .Values.alerting.http.maxTimeoutCount }}
labels:
{{- include "generic-service.alert-labels" . | nindent 12 }}
severity: critical
topic: ingress
annotations:
{{- include "generic-service.alert-annotations" . | nindent 12 }}
summary: HTTP gateway timeout responses
description: '{{ include "generic-service.fullname" . }} gave {{"{{ $value }}"}} HTTP gateway timeout responses in the last {{ .Values.alerting.http.sampleInterval }}.'
{{- end }}

{{- if or (eq .Values.ingress.protocol "grpc") (eq .Values.ingress.protocol "grpcs") }}
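
The split between the two alerts depends on how the status-code patterns match. The following is a rough Python sketch of that behavior, not part of the chart. It assumes the `generic-service.request-code-count-metric` helper ends in a regex label matcher, and it uses `re.fullmatch` to imitate the way Prometheus anchors label regexes at both ends. The function names, the sample counts, and the keyword arguments are hypothetical and only serve the illustration.

```python
import re

# Patterns copied from the two alert expressions in this diff.
# Prometheus anchors label regexes at both ends; re.fullmatch imitates that.
HTTP_5XX_PATTERN = re.compile(r"5.[^4]")
HTTP_TIMEOUT_PATTERN = re.compile(r"504")


def split_counts(counts_by_code):
    """Return (http5xx_total, timeout_total) for a {status_code: count} mapping."""
    http5xx_total = sum(
        count for code, count in counts_by_code.items()
        if HTTP_5XX_PATTERN.fullmatch(code)
    )
    timeout_total = sum(
        count for code, count in counts_by_code.items()
        if HTTP_TIMEOUT_PATTERN.fullmatch(code)
    )
    return http5xx_total, timeout_total


def firing_alerts(counts_by_code, max_5xx_count=0, max_timeout_count=0):
    """List the alerts whose totals exceed their thresholds (hypothetical helper)."""
    http5xx_total, timeout_total = split_counts(counts_by_code)
    alerts = []
    if http5xx_total > max_5xx_count:
        alerts.append("Http5xx")
    if timeout_total > max_timeout_count:
        alerts.append("HttpTimeout")
    return alerts


if __name__ == "__main__":
    sample = {"200": 120, "500": 2, "503": 1, "504": 4}
    print(split_counts(sample))
    print(firing_alerts(sample, max_timeout_count=10))
```

Under these assumptions, `[^4]` excludes every 5xx code whose last digit is 4, not only 504. A code such as 524 would therefore be counted by neither alert, which may be worth keeping in mind if a proxy in front of the service emits non-standard codes.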
7 changes: 6 additions & 1 deletion charts/generic-service/values.schema.json
Expand Up @@ -889,7 +889,12 @@
"max5xxCount": {
"type": "number",
"default": 0,
"description": "The maximum number of HTTP 5xx responses in the sample interval"
"description": "The maximum number of HTTP 5xx responses (except 504) in the sample interval"
},
"maxTimeoutCount": {
"type": "number",
"default": 0,
"description": "The maximum number of HTTP gateway timeout responses (504) in the sample interval"
}
},
"additionalProperties": false
1 change: 1 addition & 0 deletions charts/generic-service/values.yaml
Expand Up @@ -173,6 +173,7 @@ alerting:
maxSlowdown: 2.5
max4xxRatio: 2.5
max5xxCount: 0
maxTimeoutCount: 0
grpc:
requestsMetric: grpc_server_handled_total
sampleInterval: 20m