Issue 674 #721

Closed
wants to merge 7 commits into from

@alimand alimand commented Aug 5, 2024

Fixes issue 674 and issue 683:
Adds a new Grafana dashboard to monitor Elasticsearch health using these metrics:

  1. elasticsearch_indices_settings_stats_read_only_indices (with an alert)
  2. elasticsearch_filesystem_data_available_bytes
  3. elasticsearch_filesystem_data_free_bytes

Adds disk space usage monitoring and an alert to the new Grafana dashboard.
Optimizes the container relationships in wis2box so that wis2box behaves well within 1-2 minutes of a machine restart.

@alimand alimand requested a review from maaikelimper August 5, 2024 14:08
@@ -108,6 +125,14 @@ services:
<<: *logging
elasticsearch:
<<: *logging
container_name: elasticsearch
Collaborator

Why did you add another "elasticsearch" service to docker-compose.monitoring.yml? I think this is a mistake and it should be removed.

Collaborator Author

Yes, that's an error! I removed it.

@maaikelimper maaikelimper left a comment

Hi Peiwen, I see several changes in the code that I think are related to local development and should be removed. Could you please add a screenshot of the new dashboard that you made?

@@ -86,6 +87,8 @@ services:
- "ES_JAVA_OPTS=-Xms512m -Xmx512m"
- cluster.name=es-wis2box
- xpack.security.enabled=false
ports:
- "9200:9200"
Collaborator

This additional port mapping should not be required; please remove it.

@@ -147,6 +150,8 @@ services:

wis2downloader:
container_name: wis2downloader
image: wis2box_project-wis2downloader
Collaborator

Why did you define an image name for the downloader? This should be removed.


- job_name: 'prometheus'
static_configs:
- targets: ['localhost:9090']
Collaborator

What does this job do?

Collaborator Author

I removed this part; it's useless.

static_configs:
- targets: ['localhost:9090']

- job_name: 'node_exporter'
Collaborator

remove the node_exporter for now

ipv4_address: 10.5.0.2
default:

node_exporter:
Collaborator

remove the node-exporter for now

alimand commented Aug 5, 2024

Yes, sure, Maaike. Thanks for the corrections; let me change this part!

alimand commented Aug 5, 2024

Of course, here is the screenshot of the new dashboard in Grafana: [image]

@maaikelimper
Collaborator

Hi Peiwen, thank you for the screenshot.

I think you can improve this dashboard for wis2box users:

"alarmtracking dashboard" -> This is the title of the dashboard, but the JSON file that defines it is named "metricsmonitoring.json". Please be consistent and choose a proper title and filename; mention "Elasticsearch" if the metrics are based on Elasticsearch.

"disk space usage" -> What is the unit? The Y-axis shows 27 three times, which is confusing. Can you please improve this?

"count of indices that have read_only_allow_delete=true" -> It is sufficient to display the current value for this metric; anything above 0 should be displayed in red.

What is the difference between "available space" and "free space"? Are they both required for the user? Please display bytes using a human-friendly unit (Grafana has a built-in option to auto-convert bytes into kB/MB).

The screenshot shows a range of "last 5 minutes". Please define a sensible default time range.

Finally, think about the use of colors in your dashboard. For example, you can set thresholds in Grafana so that a bad value is displayed in red, while good values are displayed in green.
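
For illustration only, the suggestions above (current value only, red above 0, human-friendly byte units, a sensible default time range) could be expressed in the dashboard JSON roughly as follows. This is a minimal sketch that builds the JSON model as Python dictionaries; it is not the dashboard from this PR. The helper names, the dashboard title, the panel titles, the 24-hour default range and the 10 GiB color threshold are all assumptions chosen for the example.

```python
from typing import Any, Dict, List, Optional

Panel = Dict[str, Any]


def stat_panel(title: str, expr: str, steps: List[Dict[str, Any]],
               unit: Optional[str] = None) -> Panel:
    """Build a Grafana "stat" panel that shows only the latest value."""
    defaults: Dict[str, Any] = {
        # Threshold steps drive the color; the first step is the base color.
        "thresholds": {"mode": "absolute", "steps": steps},
    }
    if unit is not None:
        # "bytes" lets Grafana auto-scale the value to kB/MB/GB.
        defaults["unit"] = unit
    return {
        "type": "stat",
        "title": title,
        "targets": [{"expr": expr, "refId": "A"}],
        "options": {
            "reduceOptions": {"calcs": ["lastNotNull"]},
            "colorMode": "value",
        },
        "fieldConfig": {"defaults": defaults},
    }


def build_dashboard() -> Dict[str, Any]:
    """Assemble a hypothetical dashboard with two panels."""
    read_only = stat_panel(
        "Read-only indices",  # hypothetical title
        "elasticsearch_indices_settings_stats_read_only_indices",
        # Green at 0, red for anything from 1 upwards.
        steps=[{"color": "green", "value": None},
               {"color": "red", "value": 1}],
    )
    available = stat_panel(
        "Available disk space",  # hypothetical title
        "elasticsearch_filesystem_data_available_bytes",
        # Red by default, green once at least 10 GiB is available (assumed limit).
        steps=[{"color": "red", "value": None},
               {"color": "green", "value": 10 * 1024**3}],
        unit="bytes",
    )
    return {
        "title": "Elasticsearch disk monitoring",  # hypothetical title
        "time": {"from": "now-24h", "to": "now"},  # assumed default range
        "panels": [read_only, available],
    }
```

Passing the result of `build_dashboard()` to `json.dumps` gives a fragment that can be compared against the JSON exported from Grafana; a real dashboard file carries additional fields (datasource, grid positions, schema version) that are omitted here.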

alimand commented Aug 5, 2024

Sure, thanks for your review, Maaike. Let me improve it based on the comments. I will provide a new version soon.

…y to user's personal emails/friendly dashboard to users

alimand commented Aug 11, 2024

Some updates based on the suggestions, plus further improvements:
All the suggested changes have been made.
The alert list is shown at the top of the dashboard (only alerts that are currently firing).
When an alert is triggered, notifications can be sent to personal email addresses.
Test showcase (screenshots): WechatIMG735, WechatIMG736, WechatIMG737, WechatIMG738, WechatIMG8601, WechatIMG8605

@alimand alimand requested a review from maaikelimper August 11, 2024 21:04

alimand commented Aug 11, 2024

Current Alerts Overview:
Purpose: Displays a list of currently active alerts.

Elasticsearch Disk Space Utilization:
Purpose: This panel shows the percentage of disk space used on the Elasticsearch node. It is an important metric to monitor the available disk space and ensure that the Elasticsearch cluster has sufficient capacity.
Alert Condition: Triggers if disk usage exceeds 80% for 5 minutes.
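
As a rough sketch of what this condition means, the logic can be written as two small Python functions. It assumes a total-size value for the data filesystem is available alongside the available-bytes metric, and it approximates the "for 5 minutes" rule by checking that every recent sample stayed above the threshold; Grafana's own alert evaluation may differ in detail. The function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def disk_used_percent(size_bytes: float, available_bytes: float) -> float:
    """Percentage of the data filesystem that is in use."""
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    return 100.0 * (size_bytes - available_bytes) / size_bytes


def sustained_above(samples: Sequence[Tuple[float, float]],
                    threshold: float,
                    hold_seconds: float) -> bool:
    """Return True if the latest run of samples stayed above the threshold
    for at least hold_seconds.

    samples: (unix_timestamp, value) pairs in ascending time order.
    """
    if not samples:
        return False
    latest_ts = samples[-1][0]
    breach_start: Optional[float] = None
    # Walk backwards from the newest sample until the value drops
    # to or below the threshold.
    for ts, value in reversed(samples):
        if value > threshold:
            breach_start = ts
        else:
            break
    return breach_start is not None and latest_ts - breach_start >= hold_seconds
```

With one sample per minute, `sustained_above(samples, 80.0, 300)` only returns True once six consecutive samples (spanning 300 seconds) are above 80%, so a single short spike does not trigger the alert.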

Available Disk Space (GB):
Purpose: This panel shows the amount of free disk space available on the block device where Elasticsearch stores its data. Monitoring this helps prevent the disk from running out of space.
Alert Condition: Triggers if free space falls below 10 GB for 5 minutes.

Elasticsearch Disk Usage Thresholds:
Purpose: This panel shows the high and low disk watermarks configured for Elasticsearch in kilobytes (KB). These watermarks determine when Elasticsearch should start taking action to prevent nodes from running out of disk space, such as stopping new data allocations.

Disk Read and Write Activity:
Purpose: This panel displays the rate of filesystem read and write operations per second. It helps to monitor how frequently data is being read from or written to the disk, which can be indicative of disk performance or potential bottlenecks.

Read-Only Indices Count:
Purpose: This panel shows the number of indices that have read_only_allow_delete=true. This setting is often used to prevent further writes to indices that are running out of disk space.
Alert Condition: Triggers if any read-only indices are detected (count > 0).
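
To show how the watermark and read-only panels relate, here is a small sketch that classifies a disk usage percentage against Elasticsearch's default watermarks (low 85%, high 90%, flood stage 95%). Exceeding the flood-stage watermark is what causes Elasticsearch to apply the read_only_allow_delete block to affected indices. The values are the Elasticsearch defaults, not values read from this deployment, and the function name is hypothetical.

```python
# Elasticsearch default disk watermarks, highest first (assumed unchanged).
DEFAULT_WATERMARKS = (
    ("flood_stage", 95.0),  # indices on the node become read-only
    ("high", 90.0),         # shards are relocated away from the node
    ("low", 85.0),          # no new shards are allocated to the node
)


def watermark_state(used_percent: float) -> str:
    """Return the highest default watermark reached, or "ok" if none."""
    for name, limit in DEFAULT_WATERMARKS:
        if used_percent >= limit:
            return name
    return "ok"
```

Under these defaults, the 80% alert described earlier fires before the low watermark is reached, which leaves time to free disk space before any index turns read-only.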

make sure there is no conflict with main branch
remove the whitespace

alimand commented Aug 16, 2024

The updates are in the issue-674-new branch.

@alimand alimand closed this Aug 16, 2024
@tomkralidis tomkralidis deleted the issue-674 branch August 16, 2024 11:46
3 participants