Update examples with setting to forget unhealthy ingestors (rackerlab…
…s#157)

If one of the loki-write pods moves to a different node, the hash ring can
become unhealthy. When that happens, logs are not sent to the backend. The
other write pods then start filling up the volumes that they use, which
eventually causes dropped logs.

Example error:
```
ubuntu@overseer01:~$ k -n grafana logs daemonset.apps/loki-logs --tail 2 -f
Found 37 pods, using pod/loki-logs-m9xvs
ts=2024-03-15T14:42:17.533917001Z caller=client.go:419 level=warn component=logs logs_config=grafana/loki component=client host=loki-gateway.grafana.svc.cluster.local msg="error sending batch, will retry" status=500 tenant= error="server returned HTTP status 500 Internal Server Error (500): at least 2 live replicas required, could only find 1 - unhealthy instances: 10.233.82.56:9095,10.233.82.57:9095"
ts=2024-03-15T14:44:14.190670342Z caller=client.go:419 level=warn component=logs logs_config=grafana/loki component=client host=loki-gateway.grafana.svc.cluster.local msg="error sending batch, will retry" status=500 tenant= error="server returned HTTP status 500 Internal Server Error (500): at least 2 live replicas required, could only find 1 - unhealthy instances: 10.233.82.56:9095,10.233.82.54:9095"
ts=2024-03-15T14:47:22.746099384Z caller=client.go:419 level=warn component=logs logs_config=grafana/loki component=client host=loki-gateway.grafana.svc.cluster.local msg="error sending batch, will retry" status=500 tenant= error="server returned HTTP status 500 Internal Server Error (500): at least 2 live replicas required, could only find 1 - unhealthy instances: 10.233.82.56:9095,10.233.82.57:9095"
ts=2024-03-15T14:47:22.746172806Z caller=client.go:430 level=error component=logs logs_config=grafana/loki component=client host=loki-gateway.grafana.svc.cluster.local msg="final error sending batch" status=500 tenant= error="server returned HTTP status 500 Internal Server Error (500): at least 2 live replicas required, could only find 1 - unhealthy instances: 10.233.82.56:9095,10.233.82.57:9095"
ts=2024-03-15T14:47:23.806786166Z caller=client.go:419 level=warn component=logs logs_config=grafana/loki component=client host=loki-gateway.grafana.svc.cluster.local msg="error sending batch, will retry" status=500 tenant= error="server returned HTTP status 500 Internal Server Error (500): at least 2 live replicas required, could only find 1 - unhealthy instances: 10.233.82.56:9095,10.233.82.54:9095"
ts=2024-03-15T14:47:24.644865006Z caller=client.go:419 level=warn component=logs logs_config=grafana/loki component=client host=loki-gateway.grafana.svc.cluster.local msg="error sending batch, will retry" status=500 tenant= error="server returned HTTP status 500 Internal Server Error (500): at least 2 live replicas required, could only find 1 - unhealthy instances: 10.233.82.56:9095,10.233.82.57:9095"
ts=2024-03-15T14:47:25.886090072Z caller=client.go:419 level=warn component=logs logs_config=grafana/loki component=client host=loki-gateway.grafana.svc.cluster.local msg="error sending batch, will retry" status=500 tenant= error="server returned HTTP status 500 Internal Server Error (500): at least 2 live replicas required, could only find 1 - unhealthy instances: 10.233.82.56:9095,10.233.82.54:9095"
ts=2024-03-15T14:47:29.833266958Z caller=client.go:419 level=warn component=logs logs_config=grafana/loki component=client host=loki-gateway.grafana.svc.cluster.local msg="error sending batch, will retry" status=500 tenant= error="server returned HTTP status 500 Internal Server Error (500): at least 2 live replicas required, could only find 1 - unhealthy instances: 10.233.82.56:9095,10.233.82.54:9095"
ts=2024-03-15T14:47:34.541167878Z caller=client.go:419 level=warn component=logs logs_config=grafana/loki component=client host=loki-gateway.grafana.svc.cluster.local msg="error sending batch, will retry" status=500 tenant= error="server returned HTTP status 500 Internal Server Error (500): at least 2 live replicas required, could only find 1 - unhealthy instances: 10.233.82.56:9095,10.233.82.57:9095"
ts=2024-03-15T14:47:44.494616126Z caller=client.go:419 level=warn component=logs logs_config=grafana/loki component=client host=loki-gateway.grafana.svc.cluster.local msg="error sending batch, will retry" status=500 tenant= error="server returned HTTP status 500 Internal Server Error (500): at least 2 live replicas required, could only find 1 - unhealthy instances: 10.233.82.56:9095,10.233.82.54:9095"
ts=2024-03-15T14:48:08.686557194Z caller=client.go:419 level=warn component=logs logs_config=grafana/loki component=client host=loki-gateway.grafana.svc.cluster.local msg="error sending batch, will retry" status=500 tenant= error="server returned HTTP status 500 Internal Server Error (500): at least 2 live replicas required, could only find 1 - unhealthy instances: 10.233.82.56:9095,10.233.82.54:9095"
```
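To see which ring members the errors keep pointing at, the log output can be tallied. The following is a minimal sketch, not part of this commit: the function name `count_unhealthy` and the regular expression are illustrative assumptions, and the sample lines are shortened versions of the ones above.

```python
import re
from collections import Counter

# Captures the comma-separated "ip:port" list that follows "unhealthy instances: ".
UNHEALTHY_RE = re.compile(r"unhealthy instances: ([0-9.:,]+)")


def count_unhealthy(lines):
    """Return a Counter mapping each instance address to the number of lines naming it."""
    counts = Counter()
    for line in lines:
        match = UNHEALTHY_RE.search(line)
        if match:
            counts.update(match.group(1).split(","))
    return counts


# Shortened sample lines; only the tail of each error message matters here.
sample = [
    'level=warn msg="error sending batch, will retry" error="... unhealthy instances: 10.233.82.56:9095,10.233.82.57:9095"',
    'level=warn msg="error sending batch, will retry" error="... unhealthy instances: 10.233.82.56:9095,10.233.82.54:9095"',
]

for address, hits in sorted(count_unhealthy(sample).items()):
    print(address, hits)
```

An address that shows up in every error line is a likely candidate for a stale ring entry left behind by a pod that moved.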

Signed-off-by: Chris Blumentritt <[email protected]>
cblument authored Mar 18, 2024
1 parent 984d125 commit 0df2e22
Showing 3 changed files with 6 additions and 0 deletions.
2 changes: 2 additions & 0 deletions helm-configs/loki/loki-helm-minio-overrides-example.yaml
@@ -5,3 +5,5 @@ minio:
 loki:
   auth_enabled: false
   configStorageType: Secret
+  ingester:
+    autoforget_unhealthy: true
2 changes: 2 additions & 0 deletions helm-configs/loki/loki-helm-s3-overrides-example.yaml
@@ -5,6 +5,8 @@ minio:
 loki:
   auth_enabled: false
   configStorageType: Secret
+  ingester:
+    autoforget_unhealthy: true
   storage:
     bucketNames:
       chunks: < CHUNKS BUCKET NAME > # TODO: Update with relevant bucket name for chunks
2 changes: 2 additions & 0 deletions helm-configs/loki/loki-helm-swift-overrides-example.yaml
@@ -5,6 +5,8 @@ minio:
 loki:
   auth_enabled: false
   configStorageType: Secret
+  ingester:
+    autoforget_unhealthy: true
   storage:
     bucketNames:
       chunks: chunks
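
The effect of `autoforget_unhealthy: true` can be pictured with a small model. This is a conceptual sketch only, not Loki's implementation: it assumes the ring is a mapping from instance address to last heartbeat time and that entries older than a heartbeat timeout are dropped. The one-minute timeout, the function name `forget_unhealthy`, and the heartbeat ages are all made up for illustration; the addresses are borrowed from the error log.

```python
from datetime import datetime, timedelta, timezone

# Assumed timeout for this sketch; the real value comes from the ring configuration.
HEARTBEAT_TIMEOUT = timedelta(minutes=1)


def forget_unhealthy(ring, now, timeout=HEARTBEAT_TIMEOUT):
    """Return a new ring without entries whose last heartbeat is older than timeout."""
    return {addr: seen for addr, seen in ring.items() if now - seen <= timeout}


now = datetime(2024, 3, 15, 14, 47, 22, tzinfo=timezone.utc)

# Hypothetical ring state: one entry belongs to a pod that moved and stopped heartbeating.
ring = {
    "10.233.82.54:9095": now - timedelta(seconds=5),
    "10.233.82.56:9095": now - timedelta(minutes=10),  # stale entry
    "10.233.82.57:9095": now - timedelta(seconds=20),
}

healthy = forget_unhealthy(ring, now)
print(sorted(healthy))
```

With the stale entry gone, the ring contains only members that are still heartbeating, so writes are no longer routed to an address that cannot answer.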
