Add write_load to _cat/shards #117947

Open
henrikno opened this issue Dec 3, 2024 · 1 comment
Labels
:Data Management/Stats (Statistics tracking and retrieval APIs), >enhancement, Team:Data Management (Meta label for data/management team)

Comments

@henrikno
Contributor

henrikno commented Dec 3, 2024

Description

I was looking at a cluster where a couple of nodes were working really hard and others weren't, so a case of imbalanced shards. I wanted a way to figure out which shards might be contributing the most to the load on the particular nodes that were showing 90%+ CPU.
My first approach was to capture index stats with shard-level stats twice, a few minutes apart, and diff them, e.g. sorting by the difference in indexing_index_time_in_millis. It worked OK, but it requires multiple API calls and a script to compute the diff.
But then I noticed there's already a write_load at the per-shard level. I had only seen it at e.g. the node level and data-stream level, but it correlates with the highest indexing time per shard, and the top shards by write load match the nodes that have high CPU.
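
For reference, the snapshot-diff approach described above could look roughly like the sketch below. It is not the script used here. It assumes two JSON responses from an index stats call with level=shards have already been loaded into dicts, and that each response is laid out as indices -> shards -> a list of shard copies, each with `routing` and `indexing` sections (with the indexing time under `indexing.index_time_in_millis`). The function names are hypothetical.

```python
from typing import Dict, List, Tuple

# (index name, shard number, is primary, node id)
ShardKey = Tuple[str, int, bool, str]


def indexing_time_by_shard(stats: dict) -> Dict[ShardKey, int]:
    """Map each shard copy to its cumulative indexing time in milliseconds.

    Assumes the shard-level layout described in the lead-in; missing
    sections are treated as empty rather than raising.
    """
    times: Dict[ShardKey, int] = {}
    for index, index_stats in stats.get("indices", {}).items():
        for shard_id, copies in index_stats.get("shards", {}).items():
            for copy in copies:
                routing = copy.get("routing", {})
                key = (
                    index,
                    int(shard_id),
                    bool(routing.get("primary")),
                    routing.get("node"),
                )
                indexing = copy.get("indexing", {})
                times[key] = indexing.get("index_time_in_millis", 0)
    return times


def diff_indexing_time(before: dict, after: dict) -> List[Tuple[ShardKey, int]]:
    """Return (shard key, delta in ms) pairs, largest delta first.

    Only shard copies present in both snapshots are compared, so shards
    that relocated or rolled over between the two calls are skipped.
    """
    old = indexing_time_by_shard(before)
    new = indexing_time_by_shard(after)
    deltas = [(key, new[key] - old[key]) for key in new if key in old]
    return sorted(deltas, key=lambda item: item[1], reverse=True)
```

The drawback mentioned above still applies: this needs two API calls spaced a few minutes apart, plus the diff step.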

I made a script that calls GET /_stats/docs,indexing,merge?level=shards, shows the write load per shard, and sorts by it:

python3 shard_write_load.py
                                                                                index shard  primary                   node  write_load
                                             .ds-traces-apm-default-2024.12.03-001590    10    False A0z-R5vDSvOl8ywdKdHAJA      1.9748
                                             .ds-traces-apm-default-2024.12.03-001590     9     True A0z-R5vDSvOl8ywdKdHAJA      1.9660
                                             .ds-traces-apm-default-2024.12.03-001590     2    False ZyoEhPYfTJa3vi8jsNQzJg      1.5078
           .ds-metrics-elasticsearch.stack_monitoring.index-default-2024.11.29-000118     1     True 7U_93ikgR6WRtfnf0KiCmg      1.4968
           .ds-metrics-elasticsearch.stack_monitoring.index-default-2024.11.29-000118     2    False gS7uMutcRYKmRkA0AAD86w      1.4273
                                             .ds-traces-apm-default-2024.12.03-001590     6     True gS7uMutcRYKmRkA0AAD86w      1.4027
                                             .ds-traces-apm-default-2024.12.03-001590    11    False gS7uMutcRYKmRkA0AAD86w      1.4007

Indeed the top nodes here are the ones with high CPU, and now it's easier to see which indices/shards to move.
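
The script itself isn't included here, but a minimal sketch of the same idea might look like the following. It assumes the JSON response from GET /_stats/docs,indexing,merge?level=shards has already been fetched and parsed into a dict, and that each shard copy exposes `routing.primary`, `routing.node`, and `indexing.write_load`. The helper names `shard_write_loads` and `format_rows` are hypothetical.

```python
from typing import Dict, List


def shard_write_loads(stats: dict) -> List[Dict]:
    """Flatten a shard-level index stats response into rows sorted by write_load.

    Assumed layout: indices -> <index> -> shards -> <shard id> -> [copies].
    Copies that do not report a write_load are skipped.
    """
    rows = []
    for index, index_stats in stats.get("indices", {}).items():
        for shard_id, copies in index_stats.get("shards", {}).items():
            for copy in copies:
                write_load = copy.get("indexing", {}).get("write_load")
                if write_load is None:
                    continue
                routing = copy.get("routing", {})
                rows.append({
                    "index": index,
                    "shard": int(shard_id),
                    "primary": bool(routing.get("primary")),
                    "node": routing.get("node"),
                    "write_load": float(write_load),
                })
    # Highest write load first, like the table above.
    rows.sort(key=lambda row: row["write_load"], reverse=True)
    return rows


def format_rows(rows: List[Dict]) -> str:
    """Render rows as a plain-text table with one line per shard copy."""
    lines = ["index shard primary node write_load"]
    for row in rows:
        lines.append(
            f"{row['index']} {row['shard']} {row['primary']} "
            f"{row['node']} {row['write_load']:.4f}"
        )
    return "\n".join(lines)
```

With the parsed response in a variable such as `stats`, `print(format_rows(shard_write_loads(stats)))` would produce a listing similar to the one above, though without the column alignment.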

It would be awesome if we had this as a column in _cat/shards, so I could just add ?s=write_load:desc to make this easier.

henrikno added the >enhancement and needs:triage labels Dec 3, 2024
dnhatn added the :Data Management/Stats label Dec 4, 2024
elasticsearchmachine added the Team:Data Management label and removed the needs:triage label Dec 4, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)
