[BUG] Cluster manager node attempts to calculate indices cache metric #305
Comments
Essentially, when we have a dedicated cluster manager node, we should skip collecting all metrics along dimension (see The Dedicated Cluster Manager is already an overloaded node w.r.t. Performance Analyzer RCA; disabling metric collection for a set of metrics will help bring down the overall footprint of the PA-RCA component. Identify a Dedicated Master Node:
This would require changing the code logic in In case of
As mentioned in the comment above and in #308, these errors can be prevented by limiting certain RCA node executions to certain OS node roles. Note that both propose a framework change, which will not prevent "incorrect" usage of tags or of the framework itself, so exceptions like the ones in the issue description would still be possible. Those scenarios should also be handled gracefully, so let this remain a separate issue from #308.
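To illustrate what "handled gracefully" could mean on the reader side, here is a minimal sketch in which a lookup for an absent metric yields an empty result instead of an exception. `MetricSnapshot` and its `read` method are hypothetical names invented for this example and are not taken from the PA-RCA code base; the metric names in the usage note are likewise only illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for one snapshot of metrics read by the reader.
// This is NOT a class from PA-RCA; it only illustrates the idea.
final class MetricSnapshot {
    private final Map<String, Double> values;

    MetricSnapshot(Map<String, Double> values) {
        // Defensive copy so later changes to the caller's map are not visible here.
        this.values = new HashMap<>(values);
    }

    // Returns Optional.empty() when the metric was never written
    // (for example, cache metrics on a cluster manager node),
    // so callers can skip the calculation instead of failing.
    Optional<Double> read(String metricName) {
        return Optional.ofNullable(values.get(metricName));
    }
}
```

A caller could then write `snapshot.read("Cache_FieldData_Size").ifPresent(...)` and simply do nothing when the value is missing, which keeps the log free of errors for metrics that a node is not expected to emit.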
I mostly agree with the proposal mentioned here, i.e. to not collect these metrics on a dedicated cluster manager node. But we should also consider non-dedicated cluster manager nodes, right? It doesn't make sense there either. Identifying cluster manager nodes should be possible in this package, as we already have mechanisms in place where RCA has domain-level info, i.e. which nodes have the cluster manager or data roles. The RCA framework itself relies on it.
@sgup432 you're right. I talked about that in #308. Currently we have to choose between no-effect execution on dedicated cluster manager nodes if we want correctness, and no execution on non-dedicated cluster manager nodes if we want performance. #308 proposes a framework change to achieve both correctness and performance without tradeoffs.
As the node's decision for which tags to apply comes down to RCA
It is important to note that the solutions from points 2 and 3 currently have unclear feasibility and consequences, because this higher level of automation was neither implemented nor planned by the original creators of the framework, but left to be hand-configured.
What is the bug?
We saw the errors below in the PA log on one of the master/clusterManager nodes.
It attempts to calculate indices cache related metrics, which are not present on cluster manager nodes. The PA plugin does not write this data into shared memory.
The PA plugin writes the data below on data nodes but not on cluster manager nodes, as expected.
How can one reproduce the bug?
Steps to reproduce the behavior:
What is the expected behavior?
We should not calculate indices cache related metrics on cluster manager nodes. This might require changes around ReaderMetricProcessor.
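As a rough sketch of the expected behavior, the check below decides from a node's roles whether indices cache metrics should be processed at all. It assumes that a dedicated cluster manager is a node that has the cluster manager role and does not hold the data role. `NodeRole` and `CacheMetricGate` are hypothetical names made up for this example; they are not part of ReaderMetricProcessor or of any PA-RCA API.

```java
import java.util.Set;

// Hypothetical role model for this sketch only.
enum NodeRole { CLUSTER_MANAGER, DATA, INGEST }

final class CacheMetricGate {
    private CacheMetricGate() {}

    // Assumption: "dedicated" means the node has the cluster manager role
    // and does not hold the data role.
    static boolean isDedicatedClusterManager(Set<NodeRole> roles) {
        return roles.contains(NodeRole.CLUSTER_MANAGER) && !roles.contains(NodeRole.DATA);
    }

    // Indices cache metrics are skipped only on dedicated cluster manager nodes;
    // every other node is processed as before.
    static boolean shouldProcessIndicesCacheMetrics(Set<NodeRole> roles) {
        return !isDedicatedClusterManager(roles);
    }
}
```

With such a gate, a reader could call `CacheMetricGate.shouldProcessIndicesCacheMetrics(roles)` before touching the cache metric tables and return early when it is false. Whether the check belongs in the reader or in the collector is a design choice this sketch does not settle.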
What is your host/environment?
Do you have any screenshots?
If applicable, add screenshots to help explain your problem.
Do you have any additional context?
Add any other context about the problem.