Deduplicate calls to aggregateSparkMetricsBySql #1464
Merged
Signed-off-by: Ahmed Hussein (amahussein) [email protected]
Contributes to #1461
AppSparkMetricsAnalyzer was calling aggregateSparkMetricsBySql twice. This change eliminates that redundancy to save CPU time and memory allocations. aggregateSparkMetricsBySql was responsible for more than 53% of total CPU time. The change caches the result of the first call and passes it to the second method.

Running the same eventlog as in the issue description, the profile shows the following results:
total CPU: 1,290,033 ms -> improved by 24%
total time: 9,096,942 ms -> improved by 23.6%
total allocation: 4.28 TB -> improved by 30.2%
getAggregateRawMetrics: CPU time 954,620 ms (improved by 31%) | 3.93 TB
aggregateSparkMetricsBySql: 496,760 ms; 35% of total, 48% of parent | 1.86 TB (44% of all; 47% of parent)
aggregateSparkMetricsByJob: 496,560 ms; 38% of total, 52% of parent | 2.06 TB (48% of all, 53% of parent)

ProfileMain_2024_12_13_194959.zip
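
The caching described above (compute the per-SQL aggregation once, then hand the result to the second consumer) can be illustrated with a minimal sketch. This is not the project's code: the function names, the metric fields (`sql_id`, `duration_ms`), and the call counter are all hypothetical and exist only to show the pattern.

```python
from typing import Dict, List

# Hypothetical counter, used only to show how often the expensive step runs.
call_count = 0


def aggregate_by_sql(task_metrics: List[dict]) -> Dict[int, int]:
    """Hypothetical stand-in for an expensive per-SQL aggregation."""
    global call_count
    call_count += 1
    totals: Dict[int, int] = {}
    for metric in task_metrics:
        sql_id = metric["sql_id"]
        totals[sql_id] = totals.get(sql_id, 0) + metric["duration_ms"]
    return totals


def max_sql_duration(sql_totals: Dict[int, int]) -> int:
    """Second consumer: takes the already-aggregated result as a parameter
    instead of recomputing it from the raw task metrics."""
    return max(sql_totals.values(), default=0)


def analyze(task_metrics: List[dict]) -> dict:
    # Compute the aggregation once and keep the result in a local variable.
    sql_totals = aggregate_by_sql(task_metrics)
    # Pass the cached result on rather than calling aggregate_by_sql again.
    return {
        "by_sql": sql_totals,
        "max_sql_duration_ms": max_sql_duration(sql_totals),
    }


metrics = [
    {"sql_id": 1, "duration_ms": 10},
    {"sql_id": 1, "duration_ms": 20},
    {"sql_id": 2, "duration_ms": 5},
]
result = analyze(metrics)
```

Passing the aggregated value as an argument keeps the dependency explicit and avoids adding mutable cache state. The trade-off is that the second method's signature changes.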