Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removed Bottom 80% functions from latest snapshot. #998

Merged
merged 2 commits into from
Dec 17, 2024

Conversation

TeachMeTW
Copy link
Contributor

@TeachMeTW TeachMeTW commented Dec 5, 2024

Summary

Removes the bottom 80% of timers within functions where recorded times were consistently below 0.5 seconds, rendering the data effectively meaningless.

Changes

  • Removed low-value timers:

    • Affected areas:
      • Dist Filter
      • General Trip Segmentation
    • All removed timers had times < 0.5s.
      • The following timers were removed:
    data.name average_reading
    TRIP_SEGMENTATION/handle_out_of_order_points 0.021091872
    TRIP_SEGMENTATION/segment_into_trips_dist/has_trip_ended 0.014075215
    TRIP_SEGMENTATION/segment_into_trips_dist/post_loop 0.008481669
    TRIP_SEGMENTATION/get_time_range_for_segmentation 0.007696437
    TRIP_SEGMENTATION/segment_into_trips_dist/check_transitions_post_loop 0.00346295
    TRIP_SEGMENTATION/get_filters_in_df 0.001910879
    TRIP_SEGMENTATION/segment_into_trips_dist/handle_trip_end 0.000763951
    TRIP_SEGMENTATION/segment_into_trips_dist/continue_just_ended 0.000531671
    TRIP_SEGMENTATION/segment_into_trips_dist/mark_valid 0.000481227
    TRIP_SEGMENTATION/segment_into_trips_dist/get_last_trip_end_point 0.00035556
    TRIP_SEGMENTATION/get_time_series 1.906338366e-05
    TRIP_SEGMENTATION/create_time_filter 6.114703865e-06
    TRIP_SEGMENTATION/create_dist_filter 3.921490133e-06
    TRIP_SEGMENTATION/segment_into_trips_dist/set_new_trip_start_point 1.380765752e-06

See: e-mission/e-mission-docs#1098

Rationale

  • Data irrelevance: Timers with times < 0.5s provided no actionable insights or meaningful data for analysis.

Notes

  • This update currently applies only to Dist Filter and General Trip Segmentation.
  • Further analysis and additional removals will follow after receiving more data from production.

Testing

  • Visual Inspection
  • Ran pipeline on a test user

Next Steps

  • Analyze new production data to identify and remove additional low-value timers.

@shankari
Copy link
Contributor

shankari commented Dec 7, 2024

@TeachMeTW I don't want to see

Ran pipeline on a test user

I want to see what you checked when you ran the pipeline on a test user and what you expected to see.

Same for "visual inspection"

@shankari
Copy link
Contributor

shankari commented Dec 7, 2024

All removed timers had times < 0.5s.

Why is the cutoff 0.5 seconds? Where are the notebooks for this analysis checked in?

@TeachMeTW
Copy link
Contributor Author

All removed timers had times < 0.5s.

Why is the cutoff 0.5 seconds? Where are the notebooks for this analysis checked in?

The cut off is not explicity 0.5 seconds, it is just an observation I made

@TeachMeTW
Copy link
Contributor Author

Summary of Changes

  • Removed time filters as part of this update.
  • Added a new removal step in the distance filter after the discovery of TRIP_SEGMENTATION/segment_into_trips_dist/get_transition_df in the dataset.

These changes improve alignment with the latest dataset insights and enhance the accuracy of the segmentation process.

@TeachMeTW
Copy link
Contributor Author

Procedure:

Found overlap in the bottom 80% in both datasets for uniformity.

data.name cc_ebike smart_commute Notes
TRIP_SEGMENTATION/segment_into_trips_dist/get_transition_df 0.1217199190416269 0.060158295 new discovery
TRIP_SEGMENTATION/segment_into_trips_time/get_filtered_points_pre_ts_diff_df 0.061023128 0.030414367 removed
TRIP_SEGMENTATION/segment_into_trips_time/get_transition_df 0.048967199 0.02199962 removed
TRIP_SEGMENTATION/segment_into_trips_dist/post_loop 0.009637999 0.00584949 already removed in 1st round
TRIP_SEGMENTATION/segment_into_trips_time/calculations_per_iteration 0.009030038 0.007722907 removed
TRIP_SEGMENTATION/get_time_range_for_segmentation 0.008812385 0.00703603 already removed in 1st round
TRIP_SEGMENTATION/segment_into_trips_dist/check_transitions_post_loop 0.005481929 0.005358234 already removed in 1st round
TRIP_SEGMENTATION/handle_out_of_order_points 0.004389608 0.002818483 already removed in 1st round
TRIP_SEGMENTATION/segment_into_trips_time/post_loop 0.001586465 0.001302009 removed
TRIP_SEGMENTATION/segment_into_trips_time/filter_bogus_points 0.000983266 0.001009628 removed
TRIP_SEGMENTATION/segment_into_trips_dist/handle_trip_end 0.000659627 0.000634671 already removed in 1st round
TRIP_SEGMENTATION/segment_into_trips_dist/mark_valid 0.00039257 0.000374603 already removed in 1st round
TRIP_SEGMENTATION/segment_into_trips_dist/get_last_trip_end_point 0.000344171 0.000319264 already removed in 1st round
TRIP_SEGMENTATION/segment_into_trips_dist/continue_just_ended 0.000306323 9.399714569250744e-05 already removed in 1st round
TRIP_SEGMENTATION/get_filters_in_df 0.000283253 0.000289175 already removed in 1st round
TRIP_SEGMENTATION/get_time_series 1.9164622159294e-05 1.903555732077131e-05 already removed in 1st round
TRIP_SEGMENTATION/create_dist_filter 4.974840094848555e-06 4.705851770018878e-06 already removed in 1st round
TRIP_SEGMENTATION/create_time_filter 4.684489195038672e-06 4.562600324415181e-06 already removed in 1st round
TRIP_SEGMENTATION/segment_into_trips_dist/set_new_trip_start_point 1.5871754537026088e-06 1.2169592082500458e-06 already removed in 1st round

@shankari
Copy link
Contributor

@TeachMeTW I am merging this for now, but once we are done with identifying the set of timers that are relevant, I would like you to rebase the commits so that we don't have a lot of churn caused by adding and removing timers.

Since the timers use with, the indentation of the lines that are being timed changes, which results in a lot of back and forth changes for the "fluff" timers instead of what should be a NOP. I am concerned that we will inadvertently delete an important line (similar to e-mission/e-mission-phone#1189) and end up with a subtle and hard-to-investigate regression.

@shankari shankari merged commit 4034533 into e-mission:master Dec 17, 2024
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Tasks completed
Development

Successfully merging this pull request may close these issues.

2 participants