[BUG] Investigate long execution time of eventlogs #1461
Comments
Signed-off-by: Ahmed Hussein (amahussein) <[email protected]> Contributes to NVIDIA#1461 AppSparkMetricsAnalyzer was calling `aggregateSparkMetricsBySql` twice. This code change eliminates this redundancy to save CPU time and memory allocations.
Thanks @amahussein for investigating into this! QQ: |
Good question! |
Signed-off-by: Ahmed Hussein (amahussein) <[email protected]> Contributes to #1461 AppSparkMetricsAnalyzer was calling `aggregateSparkMetricsBySql` twice. This code change eliminates this redundancy to save CPU time and memory allocations.
Signed-off-by: Ahmed Hussein (amahussein) <[email protected]> Contributes to NVIDIA#1461 This commit improves the implementation of aggregation across raw metrics by replacing the built-in Scala collections with accumulators.
Signed-off-by: Ahmed Hussein (amahussein) <[email protected]> Contributes to NVIDIA#1461 This commit improves the implementation of aggregation across raw metrics by replacing the built-in Scala collections with accumulators.
* Optimize implementation of getAggregateRawMetrics in core-tools * Address reviews and fix issues in aggregateDiagnostic Contributes to #1461 This commit improves the implementation of aggregation across raw metrics by replacing the built-in Scala collections with accumulators. --------- Signed-off-by: Ahmed Hussein (amahussein) <[email protected]>
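The commit message above does not show what the accumulators look like, so the following is only a minimal sketch of the general idea: keep running totals in one pass instead of building intermediate collections and traversing them once per statistic. It is written in Python for brevity (the tool itself works with Scala collections), and `MetricAccumulator` and `aggregate_durations` are hypothetical names, not identifiers from core-tools.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class MetricAccumulator:
    """Running count/sum/min/max for one metric (hypothetical helper)."""
    count: int = 0
    total: int = 0
    minimum: int = 0
    maximum: int = 0

    def add(self, value: int) -> None:
        # The first value initializes min and max; later values update them.
        if self.count == 0:
            self.minimum = self.maximum = value
        else:
            self.minimum = min(self.minimum, value)
            self.maximum = max(self.maximum, value)
        self.count += 1
        self.total += value

    @property
    def mean(self) -> float:
        # Defined as 0.0 for an empty accumulator to avoid dividing by zero.
        return self.total / self.count if self.count else 0.0


def aggregate_durations(task_durations: Iterable[int]) -> MetricAccumulator:
    """Aggregate in a single pass without materializing intermediate lists."""
    acc = MetricAccumulator()
    for duration in task_durations:
        acc.add(duration)
    return acc
```

Because the accumulator holds a fixed number of fields, its memory use does not grow with the number of tasks, which is the property that matters when allocation dominates the profile.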
Signed-off-by: Ahmed Hussein (amahussein) <[email protected]> Contributes to NVIDIA#1461 Adds in-place median finding to improve the performance of the metric aggregates. We used to sort a sequence to create StatisticsMetrics, which turned out to be very expensive for large eventlogs.
Signed-off-by: Ahmed Hussein (amahussein) <[email protected]> Fixes NVIDIA#1461 Adds in-place median finding to improve the performance of the metric aggregates. We used to sort a sequence to create StatisticsMetrics, which turned out to be very expensive for large eventlogs.
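The commits above do not say which in-place median algorithm was used. One common choice is quickselect, sketched below under that assumption. It finds the k-th smallest element in expected linear time by partitioning the array in place, instead of sorting a copy in O(n log n). The function names are hypothetical, the code is Python rather than the project's Scala, and taking the lower middle element for even-length input is also an assumption.

```python
import random
from typing import List


def select_in_place(values: List[int], k: int) -> int:
    """Return the k-th smallest element (0-based); reorders `values` in place."""
    if not 0 <= k < len(values):
        raise IndexError("k out of range")
    lo, hi = 0, len(values) - 1
    while True:
        # A random pivot keeps the expected running time linear.
        pivot = values[random.randint(lo, hi)]
        # Three-way partition of values[lo..hi]: < pivot, == pivot, > pivot.
        lt, i, gt = lo, lo, hi
        while i <= gt:
            if values[i] < pivot:
                values[lt], values[i] = values[i], values[lt]
                lt += 1
                i += 1
            elif values[i] > pivot:
                values[i], values[gt] = values[gt], values[i]
                gt -= 1
            else:
                i += 1
        if k < lt:
            hi = lt - 1      # target lies among the smaller elements
        elif k > gt:
            lo = gt + 1      # target lies among the larger elements
        else:
            return pivot     # target falls inside the run equal to the pivot


def median_in_place(values: List[int]) -> int:
    """Lower median of a non-empty list, without a full sort or a copy."""
    if not values:
        raise ValueError("median of empty list")
    return select_in_place(values, (len(values) - 1) // 2)
```

The three-way partition is a deliberate choice here: task metrics often contain many repeated values (for example zeros), and a two-way partition can degrade to quadratic time on such input.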
Describe the bug
After addressing the memory configuration in #1382, we need to narrow down where the core-tools spend most of their time when analyzing eventlogs.
Analysis result for an eventlog that takes up to 28 minutes:
jvm Args:
total CPU: 1,695,427ms
total time: 11,919,747 ms
total allocation: 6.14 TB
Methods with the highest resource utilization (ProfileMain_2024_12_13_162335.zip):
* getAggRawMetrics: 1,396,931 ms | 82% of all | 5.78 TB (94% of all)
* aggregateSparkMetricsByJob: 498,101 ms (36% of parent, 29% of all) | 2.06 TB (36% of parent, 34% of all)
* aggregateSparkMetricsBySql: 448,030 ms (32% of parent, 27% of all) | 1.86 TB (32% of parent, 30% of all)
Tasks