catch and display disk usage issue in wis2box-api #674

Closed
maaikelimper opened this issue May 6, 2024 · 5 comments

Labels
blocker Critical issue that should be given high priority monitoring Monitoring

Comments

@maaikelimper
Collaborator

While debugging and supporting users I have noted that a lack of disk space results in the error:
elasticsearch.exceptions.TransportError: TransportError(429, 'cluster_block_exception', 'index [stations] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded flood-stage watermark, index has read-only-allow-delete block];')
which is raised inside the wis2box-api container.

This issue often goes unnoticed as the status of the wis2box-containers does not change and the error is not displayed in the Grafana Dashboard.

We should display this issue in the main Grafana dashboard and document it in the FAQ.
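
As an illustration of how this specific failure could be told apart from other Elasticsearch errors, here is a minimal Python sketch. It assumes the caller has already extracted the three values shown in the error above (status code, error type, reason string); the helper name `is_flood_stage_block` is hypothetical and is not part of wis2box-api.

```python
import re

# Matches the reason text quoted in the error above.
FLOOD_STAGE_RE = re.compile(r"disk usage exceeded flood-stage watermark")


def is_flood_stage_block(status_code: int, error_type: str, reason: str) -> bool:
    """Return True when an error looks like the read-only block caused by a full disk.

    Hypothetical helper: the three arguments mirror the values printed in
    TransportError(429, 'cluster_block_exception', '...').
    """
    return (
        status_code == 429
        and error_type == "cluster_block_exception"
        and FLOOD_STAGE_RE.search(reason) is not None
    )


reason = (
    "index [stations] blocked by: [TOO_MANY_REQUESTS/12/disk usage exceeded "
    "flood-stage watermark, index has read-only-allow-delete block];"
)
print(is_flood_stage_block(429, "cluster_block_exception", reason))  # True
```

A check like this could let the API log or surface a dedicated "disk full" message instead of a generic transport error.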

@maaikelimper maaikelimper added the monitoring Monitoring label May 6, 2024
@maaikelimper maaikelimper added this to the sprint-015 milestone May 6, 2024
@maaikelimper maaikelimper self-assigned this May 6, 2024
@tomkralidis
Collaborator

2024-07-24:

@maaikelimper maaikelimper added the blocker Critical issue that should be given high priority label Jul 25, 2024
@maaikelimper
Collaborator Author

wis2box silently failing due to the disk usage issue resulted in 5 days of data loss in the CMO-server, so I'm raising the priority of this issue.
I think instead of just displaying the disk usage issue we should consider disabling the wis2box as a whole when we detect that Elasticsearch stops functioning.
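
One possible way to detect the condition before Elasticsearch blocks writes is to compare disk usage against Elasticsearch's default disk watermarks (85% low, 90% high, 95% flood stage). The sketch below is only an assumption about how such a check might look, not the fix that was implemented: the function names are hypothetical, the thresholds may be overridden in a given deployment, and the path checked must be the filesystem that actually holds the Elasticsearch data.

```python
import shutil

# Elasticsearch default disk watermarks, as fractions of disk used.
# Assumption: the deployment has not overridden these settings.
WATERMARKS = [("flood_stage", 0.95), ("high", 0.90), ("low", 0.85)]


def classify_disk_usage(used_fraction: float) -> str:
    """Return the highest watermark reached, or 'ok' if none is reached."""
    for name, threshold in WATERMARKS:
        if used_fraction >= threshold:
            return name
    return "ok"


def check_path(path: str = "/") -> str:
    """Classify the filesystem holding `path` (hypothetical default path)."""
    usage = shutil.disk_usage(path)
    # Use free space so reserved blocks count as used.
    return classify_disk_usage(1 - usage.free / usage.total)


print(classify_disk_usage(0.96))  # flood_stage
print(classify_disk_usage(0.50))  # ok
```

A caller could refuse to ingest new data, or raise a visible alert, whenever `check_path` returns `"flood_stage"`.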

@tomkralidis
Collaborator

2024-08-14:

  • PR in Issue 674 #721 (needs to resolve conflicts)
  • needs deploy/test/validate

@alimand
Collaborator

alimand commented Aug 14, 2024

Thanks for checking. There are now no conflicts with the main branch.

@maaikelimper
Collaborator Author

maaikelimper commented Aug 18, 2024

I've tested and reviewed the fix for this and merged it in #729.

This fix adds a new panel to Grafana to display alerts and includes an alert triggered by the elasticsearch-exporter:

image


3 participants