
Reduce Timing Measurement Scope to Bottom 80% of Tracked Functions #996

Closed
TeachMeTW wants to merge 2 commits


TeachMeTW
Contributor

Summary

This PR is scoped to the bottom 80% of tracked functions, i.e. those contributing least to overall execution time, and removes their timing measurements.

Changes

  • Functions removed from timing measurement (bottom 80%):

    • TRIP_SEGMENTATION/create_dist_filter
    • TRIP_SEGMENTATION/create_time_filter
    • TRIP_SEGMENTATION/get_data_df
    • TRIP_SEGMENTATION/get_filters_in_df
    • TRIP_SEGMENTATION/get_time_range_for_segmentation
    • TRIP_SEGMENTATION/get_time_series
    • TRIP_SEGMENTATION/handle_out_of_order_points
    • TRIP_SEGMENTATION/segment_into_trips_dist/check_transitions_post_loop
    • TRIP_SEGMENTATION/segment_into_trips_dist/continue_just_ended
    • TRIP_SEGMENTATION/segment_into_trips_dist/get_transition_df
    • TRIP_SEGMENTATION/segment_into_trips_dist/mark_valid
    • TRIP_SEGMENTATION/segment_into_trips_dist/post_loop
    • TRIP_SEGMENTATION/segment_into_trips_dist/set_new_trip_start_point
  • Retained function timings for:

    • ACCURACY_FILTERING
    • CLEAN_RESAMPLING
    • CREATE_COMPOSITE_OBJECTS
    • CREATE_CONFIRMED_OBJECTS
    • EXPECTATION_POPULATION
    • JUMP_SMOOTHING
    • LABEL_INFERENCE
    • MODE_INFERENCE
    • STORE_USER_STATS
    • USER_INPUT_MATCH_INCOMING
    • TRIP_SEGMENTATION/segment_into_trips_dist/get_filtered_points_df
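The PR removes the timer calls themselves. As a rough illustration of the effect only, the sketch below gates recording on an allowlist of retained names. `timed`, `TIMINGS`, and `RETAINED` are hypothetical stand-ins, not the pipeline's real timing utilities, and `RETAINED` is just a small subset of the retained list above.

```python
import time
from contextlib import contextmanager

# Hypothetical in-memory store; the real pipeline persists timings elsewhere.
TIMINGS: list[tuple[str, float]] = []

# Hypothetical allowlist: a subset of the "Retained" names above.
RETAINED = {
    "ACCURACY_FILTERING",
    "CLEAN_RESAMPLING",
    "TRIP_SEGMENTATION/segment_into_trips_dist/get_filtered_points_df",
}

@contextmanager
def timed(name: str):
    """Time the enclosed block; record the reading only if `name` is still tracked."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if name in RETAINED:
            TIMINGS.append((name, elapsed))

with timed("TRIP_SEGMENTATION/create_dist_filter"):  # removed by this PR: not recorded
    sum(range(1000))
with timed("TRIP_SEGMENTATION/segment_into_trips_dist/get_filtered_points_df"):  # retained
    sum(range(1000))

print([name for name, _ in TIMINGS])
```

Only the retained name ends up in `TIMINGS`. Deleting the timer call outright, as the PR does, also avoids the small overhead of starting a timer that is never stored.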

Context

In the staging dataset, only the dist_filter function was triggered. I'll test locally to determine whether the time_filter function can be triggered in other scenarios.

The focus of this PR is exclusively on functions in the bottom 80% of tracked execution times. An exploration of the top 20% will follow in a subsequent PR.
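The thread does not spell out how the 80/20 cut is made. Assuming it means ranking functions by their reading and splitting by function count, a minimal sketch looks like this. `split_by_rank` is a hypothetical helper, and the readings are a few entries copied from the timing data posted later in this thread.

```python
def split_by_rank(readings: dict[str, float], top_percent: int = 20):
    """Rank names by reading (descending); return (top, bottom) split by count.

    Assumption: the split is by number of functions, not by cumulative time.
    """
    ranked = sorted(readings, key=readings.get, reverse=True)
    # Integer ceiling of top_percent% of the count, avoiding float rounding.
    n_top = (top_percent * len(ranked) + 99) // 100
    return ranked[:n_top], ranked[n_top:]

readings = {
    "TRIP_SEGMENTATION/segment_into_trips_time/loop": 51.063456,
    "TRIP_SEGMENTATION/segment_into_trips_time/has_trip_ended": 24.049044,
    "TRIP_SEGMENTATION/segment_into_trips_dist/loop": 8.255366,
    "TRIP_SEGMENTATION/segment_into_trips_dist/has_trip_ended": 4.809482,
    "TRIP_SEGMENTATION/segment_into_trips_dist/get_filtered_points_df": 0.217923,
}
top, bottom = split_by_rank(readings)
print("top 20%:", top)
print("bottom 80%:", bottom)
```

If the intended cut is by cumulative time instead (the smallest set of functions covering 20% of the total), the ranking stays the same and only the stopping rule changes.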

Testing Plan

  • Conduct local testing to confirm:
    • Functionality is unaffected by the removal of timing measurements.
    • Triggers for time_filter and other functions behave as expected in local and staging environments.
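For the staging check, one hedged way to confirm the removals took effect is to compare the names that still appear in stored timing entries against the removed list. How those names are read back from the pipeline's storage is not shown here: `recorded_names` is assumed to be any iterable of strings, and `find_stale_entries` is a hypothetical helper.

```python
# Names removed from timing measurement in this PR (copied from the list above).
REMOVED = {
    "TRIP_SEGMENTATION/create_dist_filter",
    "TRIP_SEGMENTATION/create_time_filter",
    "TRIP_SEGMENTATION/get_data_df",
    "TRIP_SEGMENTATION/get_filters_in_df",
    "TRIP_SEGMENTATION/get_time_range_for_segmentation",
    "TRIP_SEGMENTATION/get_time_series",
    "TRIP_SEGMENTATION/handle_out_of_order_points",
    "TRIP_SEGMENTATION/segment_into_trips_dist/check_transitions_post_loop",
    "TRIP_SEGMENTATION/segment_into_trips_dist/continue_just_ended",
    "TRIP_SEGMENTATION/segment_into_trips_dist/get_transition_df",
    "TRIP_SEGMENTATION/segment_into_trips_dist/mark_valid",
    "TRIP_SEGMENTATION/segment_into_trips_dist/post_loop",
    "TRIP_SEGMENTATION/segment_into_trips_dist/set_new_trip_start_point",
}

def find_stale_entries(recorded_names, removed=REMOVED):
    """Return (sorted) removed names that still show up in recorded timing entries."""
    return sorted(set(recorded_names) & removed)

# Example: names observed after a pipeline run on the new code.
recorded = ["ACCURACY_FILTERING", "TRIP_SEGMENTATION/segment_into_trips_dist/get_filtered_points_df"]
print(find_stale_entries(recorded))
```

An empty result means none of the removed timers are still writing entries; note that it says nothing about whether the pipeline's output is unchanged, which still needs its own comparison.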

Next Steps

  • Monitor performance in staging with reduced scope.
  • Prepare a follow-up PR to address timing measurements for the top 20% of tracked functions.

- Removed timing for less significant functions in TRIP_SEGMENTATION pipeline
- Focused retained measurements on key contributors
- Prepare for local testing to validate `create_time_filter` triggering
- Will explore top 20% timing optimizations in a follow-up commit
- Removed tracking for additional functions identified as low-impact during iOS and Android local testing.
- Retained tracking for key contributors to overall execution time.
- Suggested broader retesting on staging, production, or different datasets to validate changes.
@TeachMeTW
Contributor Author

Follow-up: Refine Timing Measurement for Additional Functions

Summary

Performed additional local testing with both iOS and Android users to refine the timing measurements further, and identified and removed timing from more functions that do not need to be tracked. Retesting on staging, production, or a different dataset is suggested to validate the changes more broadly.

Changes

  • Functions removed from timing measurement:

    • TRIP_SEGMENTATION/segment_into_trips_dist/continue_just_ended
    • TRIP_SEGMENTATION/segment_into_trips_dist/get_last_trip_end_point
    • TRIP_SEGMENTATION/segment_into_trips_dist/handle_trip_end
    • TRIP_SEGMENTATION/segment_into_trips_time/filter_bogus_points
    • TRIP_SEGMENTATION/segment_into_trips_time/get_filtered_points_pre_ts_diff_df
    • TRIP_SEGMENTATION/segment_into_trips_time/get_transition_df
    • TRIP_SEGMENTATION/segment_into_trips_time/post_loop
  • Retained function timings for:

    • ACCURACY_FILTERING
    • CREATE_COMPOSITE_OBJECTS
    • CREATE_CONFIRMED_OBJECTS
    • EXPECTATION_POPULATION
    • JUMP_SMOOTHING
    • LABEL_INFERENCE
    • SECTION_SEGMENTATION
    • STORE_USER_STATS
    • TRIP_SEGMENTATION/create_places_and_trips
    • TRIP_SEGMENTATION/segment_into_trips_dist/get_filtered_points_df
    • TRIP_SEGMENTATION/segment_into_trips_dist/has_trip_ended
    • TRIP_SEGMENTATION/segment_into_trips_dist/loop
    • TRIP_SEGMENTATION/segment_into_trips_time/calculations_per_iteration
    • USERCACHE
    • USER_INPUT_MATCH_INCOMING

Context

  • Local tests with iOS and Android users identified several low-significance functions that no longer require tracking.
  • Suggested retesting on staging, production, or with a different dataset to ensure these changes generalize across environments and data.
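One way to check that the removals generalize is to apply the same cut to each dataset and keep only the functions that land in the bottom group everywhere. The sketch assumes the split is by function count; `low_everywhere` is a hypothetical helper, and the datasets use toy names and values rather than real measurements.

```python
def low_everywhere(datasets: list[dict[str, float]], top_percent: int = 20) -> set[str]:
    """Names outside the top `top_percent` (by count) in every dataset.

    A name missing from a dataset (e.g. never triggered there) is not in that
    dataset's bottom group, so it is conservatively excluded from the result.
    """
    result = None
    for readings in datasets:
        ranked = sorted(readings, key=readings.get, reverse=True)
        n_top = (top_percent * len(ranked) + 99) // 100
        bottom = set(ranked[n_top:])
        result = bottom if result is None else result & bottom
    return result or set()

# Toy readings for two hypothetical datasets.
staging = {"a": 5.0, "b": 1.0, "c": 0.5, "d": 0.1, "e": 0.01}
local = {"a": 1.0, "b": 0.2, "c": 3.0, "d": 0.1, "e": 0.02}
print(sorted(low_everywhere([staging, local])))
```

Here "c" is low on staging but the top contributor locally, so it is not a safe removal candidate, which is the kind of dataset dependence the retesting is meant to catch.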

Next Steps

  • Deploy to staging or production for broader testing.
  • Collect additional feedback and refine tracking scope as needed.
  • Continue optimization for top contributing functions in subsequent iterations.

@TeachMeTW
Contributor Author

| Data Name | Data Reading |
|---|---:|
| TRIP_SEGMENTATION | 72.293984 |
| TRIP_SEGMENTATION/segment_into_trips | 60.819502 |
| TRIP_SEGMENTATION/segment_into_trips_time/loop | 51.063456 |
| MODE_INFERENCE | 47.423461 |
| TRIP_SEGMENTATION/segment_into_trips_time/has_trip_ended | 24.049044 |
| CLEAN_RESAMPLING | 21.492011 |
| TRIP_SEGMENTATION/segment_into_trips_time/calculations_per_iteration | 16.452914 |
| SECTION_SEGMENTATION | 11.086053 |
| CREATE_CONFIRMED_OBJECTS | 8.868512 |
| TRIP_SEGMENTATION/segment_into_trips_dist/loop | 8.255366 |
| TRIP_SEGMENTATION/create_places_and_trips | 5.120428 |
| TRIP_SEGMENTATION/segment_into_trips_dist/has_trip_ended | 4.809482 |
| JUMP_SMOOTHING | 4.493899 |
| CREATE_COMPOSITE_OBJECTS | 2.728828 |
| USER_INPUT_MATCH_INCOMING | 0.750176 |
| TRIP_SEGMENTATION/segment_into_trips_time/get_transition_df | 0.713448 |
| USERCACHE | 0.635678 |
| TRIP_SEGMENTATION/segment_into_trips_time/get_filtered_points_pre_ts_diff_df | 0.444665 |
| LABEL_INFERENCE | 0.226110 |
| TRIP_SEGMENTATION/segment_into_trips_dist/get_filtered_points_df | 0.217923 |
| EXPECTATION_POPULATION | 0.081470 |
| ACCURACY_FILTERING | 0.010735 |
| TRIP_SEGMENTATION/segment_into_trips_dist/get_last_trip_end_point | 0.014056 |
| TRIP_SEGMENTATION/segment_into_trips_dist/handle_trip_end | 0.029570 |
| STORE_USER_STATS | 0.005631 |
| TRIP_SEGMENTATION/segment_into_trips_dist/continue_just_ended | 0.004914 |
| TRIP_SEGMENTATION/segment_into_trips_time/filter_bogus_points | 0.000875 |
| TRIP_SEGMENTATION/segment_into_trips_time/post_loop | 0.000045 |
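The table mixes top-level stages with nested sub-timings, so summing every row would double count. A small sketch of how to read it, assuming (from the naming, not confirmed in the thread) that a name containing "/" is nested inside the stage named by its first component; the readings are copied from the table.

```python
# Readings copied from the table above: all top-level stages plus the two loop entries.
readings = {
    "TRIP_SEGMENTATION": 72.293984,
    "TRIP_SEGMENTATION/segment_into_trips_time/loop": 51.063456,
    "TRIP_SEGMENTATION/segment_into_trips_dist/loop": 8.255366,
    "MODE_INFERENCE": 47.423461,
    "CLEAN_RESAMPLING": 21.492011,
    "SECTION_SEGMENTATION": 11.086053,
    "CREATE_CONFIRMED_OBJECTS": 8.868512,
    "JUMP_SMOOTHING": 4.493899,
    "CREATE_COMPOSITE_OBJECTS": 2.728828,
    "USER_INPUT_MATCH_INCOMING": 0.750176,
    "USERCACHE": 0.635678,
    "LABEL_INFERENCE": 0.226110,
    "EXPECTATION_POPULATION": 0.081470,
    "ACCURACY_FILTERING": 0.010735,
    "STORE_USER_STATS": 0.005631,
}

# Assumption: names with "/" are nested sub-timings, so only names without "/"
# are summed to get a total that does not double count.
top_level = {name: value for name, value in readings.items() if "/" not in name}
total = sum(top_level.values())

for name, value in sorted(top_level.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:28s} {value:10.6f} {100 * value / total:6.2f}% of top-level total")

# Share of the TRIP_SEGMENTATION reading taken by each loop entry.
for loop in ("segment_into_trips_time/loop", "segment_into_trips_dist/loop"):
    share = readings[f"TRIP_SEGMENTATION/{loop}"] / readings["TRIP_SEGMENTATION"]
    print(f"{loop:32s} {100 * share:6.2f}% of TRIP_SEGMENTATION")
```

Under that nesting assumption, the three largest stages (TRIP_SEGMENTATION, MODE_INFERENCE, CLEAN_RESAMPLING) account for about 83% of the top-level total, and segment_into_trips_time/loop alone is about 71% of TRIP_SEGMENTATION.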

Insights:

  1. Loops Dominate Time Usage:

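    • The loop entries have some of the highest readings: segment_into_trips_time/loop (51.06) is exceeded only by the TRIP_SEGMENTATION and TRIP_SEGMENTATION/segment_into_trips totals, while segment_into_trips_dist/loop (8.26) is smaller but still among the larger entries.
    • This is expected: loops repeat the same operations, so small per-iteration costs accumulate into significant overall time.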
  2. High Time Usage Outside Loops:

    • segment_into_trips_time/has_trip_ended is a notable contributor (24.05) despite operating outside the loop.
    • Because it takes significant time for a non-loop operation, it may need finer-grained instrumentation so its internal operations can be analyzed and optimized.
  3. Opportunities for Optimization:

    • Loops could be optimized further by reducing unnecessary operations or improving data structures to minimize iteration overhead.
    • Instrumentation for has_trip_ended might reveal redundant calculations or inefficiencies.
  4. Smaller Contributors:

    • Smaller contributors such as get_filtered_points_df and get_transition_df take less time (each under 1 in this data), but their cumulative impact should still be reviewed in the context of overall system efficiency. Their readings also vary by dataset.
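The instrumentation suggested for has_trip_ended could take the form of per-step timers that accumulate across calls, so the dominant step shows up in the totals. This is a generic sketch: `StepTimer` and `has_trip_ended_example` are hypothetical, and the two steps inside the example are placeholders, not the real has_trip_ended logic.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class StepTimer:
    """Accumulates wall-clock seconds and call counts per labeled step."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def step(self, label):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[label] += time.perf_counter() - start
            self.counts[label] += 1

timer = StepTimer()

def has_trip_ended_example(points, threshold):
    # Hypothetical stand-in with two placeholder steps, each timed separately.
    with timer.step("has_trip_ended/filter_recent"):
        recent = [p for p in points if p["ts"] >= points[-1]["ts"] - 300]
    with timer.step("has_trip_ended/max_distance"):
        max_dist = max(p["dist"] for p in recent)
    return max_dist < threshold

points = [{"ts": i * 30, "dist": float(i % 7)} for i in range(100)]
for _ in range(50):
    has_trip_ended_example(points, threshold=100.0)

for label, total in sorted(timer.totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label:32s} {total:.6f}s over {timer.counts[label]} calls")
```

Accumulating per label, rather than storing every call, keeps the overhead low enough to leave in place while the function runs many times.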

TeachMeTW closed this Dec 3, 2024