
Analysis of Emission Pipeline Top 20% and Bottom 80% #1098

Open
TeachMeTW opened this issue Dec 4, 2024 · 3 comments

@TeachMeTW

This issue introduces two distinct performance analysis methods for evaluating function-level metrics in our emission dataset. The objective is to identify which functions significantly impact performance and which do not, enabling targeted optimizations and improvements.

Analysis Types

1. Individual Entry Categorization

  • Purpose: Categorizes each individual data.reading entry into Top 20% or Bottom 80% based on the 80th percentile within each data.name group.
  • Use Case: Identifies specific high-impact executions of functions, allowing for pinpointing problematic instances.

Features

  • Exclusions: Specific functions are excluded because they are parents of smaller functions and provide no additional insight:
    • TRIP_SEGMENTATION/segment_into_trips
    • TRIP_SEGMENTATION/segment_into_trips_dist/loop
  • Sorting: Both Top 20% and Bottom 80% categories are sorted in descending order of data.reading for easy identification of high-impact entries.
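The individual categorization described above can be sketched roughly as follows. This is a minimal illustration, not the actual script from the issue (which has not been linked); the column names `data.name` and `data.reading` come from the issue text, while the sample values are made up for demonstration.

```python
import pandas as pd

# Hypothetical sample data; the real input would be the emission
# timeseries, with "data.name" (function) and "data.reading" (cost).
df = pd.DataFrame({
    "data.name":    ["f1"] * 5 + ["f2"] * 5,
    "data.reading": [1.0, 2.0, 3.0, 4.0, 10.0, 0.1, 0.2, 0.3, 0.4, 5.0],
})

# Exclude parent functions that only wrap smaller ones
EXCLUDED = {
    "TRIP_SEGMENTATION/segment_into_trips",
    "TRIP_SEGMENTATION/segment_into_trips_dist/loop",
}
df = df[~df["data.name"].isin(EXCLUDED)]

# 80th percentile of data.reading within each data.name group
p80 = df.groupby("data.name")["data.reading"].transform(lambda s: s.quantile(0.8))

# Entries above their group's 80th percentile land in the Top 20%
df["category"] = (df["data.reading"] > p80).map({True: "Top 20%", False: "Bottom 80%"})

# Sort each category descending by reading to surface high-impact entries
top20 = df[df["category"] == "Top 20%"].sort_values("data.reading", ascending=False)
bottom80 = df[df["category"] == "Bottom 80%"].sort_values("data.reading", ascending=False)
```

With this sample data, only the outlier reading in each group exceeds its group's 80th percentile, so `top20` holds two rows and `bottom80` the remaining eight.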

2. Aggregated Entry Categorization

  • Purpose: Aggregates data.reading metrics (both sum and mean) for each data.name and categorizes the aggregated values into Top 20% and Bottom 80% based on their respective 80th percentiles.
  • Use Case: Determines which functions are consistently resource-intensive on average or cumulatively, providing a broader view of performance impact.

Features

  • Aggregation Types:
    • Sum Aggregation: Total data.reading per function.
    • Mean Aggregation: Average data.reading per function.
  • Sorting: Both Top 20% and Bottom 80% categories are sorted in descending order of aggregated data.reading.

bottom80_function_level_individual_sorted.csv
bottom80_function_level_mean_sorted.csv
bottom80_function_level_sum_sorted.csv
top20_function_level_individual_sorted.csv
top20_function_level_mean_sorted.csv
top20_function_level_sum_sorted.csv

@TeachMeTW (Author)

Under both the mean and sum aggregations, the same functions show up in the Top 20% and Bottom 80%, albeit in a different order. I will go ahead and prune them from the pipeline.

@shankari (Contributor) commented Dec 17, 2024

@TeachMeTW please put the results in graphs or inline tables so that we don't have to keep downloading files to see the results.

Where are the scripts that you used to generate this split? Please link to them here.
Without transparency in the scripts, you have no proof of the results

@shankari (Contributor)

I also don't see any updates in here after including the time filter. The Top 20% contains only the dist filter results.
And the files were uploaded "two weeks ago", which was when the dist filter analysis was done.
