Research Request - switch dask.delayed to dask.from_map #1299

Closed
2 tasks done
tiffanychu90 opened this issue Nov 21, 2024 · 0 comments · Fixed by #1316
Labels
research request: Issues that serve as a request for research (summary and handoff)
tooling: Work related to the management of our tooling and shared modules

Comments

Member

tiffanychu90 commented Nov 21, 2024

Complete the sections below when receiving a research request, and keep adding to this issue as you receive additional details and produce deliverables. Be sure to also add the appropriate project-level label to this issue (e.g., gtfs-rt, DLA).

Research Question

Single sentence description: Concatenating segment speeds (many segments plus geometry) over the last 2 years takes quite a while when producing year averages. The concatenation relies on dask.delayed, but the docs indicate there's a dask.from_map syntax that may be preferable.

Detailed description:

  • Start with the rt_segment_speeds/scripts/quarter_year_averages.py script and see if its dask.delayed concatenation can move to dask.from_map.
    • The year averages currently take ~25 min, which feels too long for squashing down a gdf. segment_geometry is present and merged in for every single date we have, which is not necessarily desirable; we should dedupe it more efficiently.
  • Look at time_series_utils to see if we can switch out the concatenation step and generalize it to take any processed dataframe going into GTFS digest.
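One way to avoid carrying segment_geometry through every date, sketched under the assumption that a segment's geometry is identical across dates: aggregate the attribute columns first, then attach geometry once from a deduplicated lookup. The helper year_average_with_geometry, its parameters, and the toy data are hypothetical, and geometry is stored as WKT strings only to keep the sketch dependency-free. Because the grouping and value columns are arguments, the same shape could be reused for other processed dataframes going into GTFS digest.

```python
import pandas as pd

# Hypothetical per-date output: every date repeats the same segment geometry.
daily = pd.DataFrame({
    "segment_id": ["a", "b", "a", "b"],
    "service_date": ["2024-10-01", "2024-10-01", "2024-10-02", "2024-10-02"],
    "speed_mph": [10.0, 20.0, 14.0, 22.0],
    "geometry": ["LINESTRING (0 0, 1 1)", "LINESTRING (1 1, 2 2)"] * 2,
})

def year_average_with_geometry(df, group_cols, value_col, geom_col="geometry"):
    """Average value_col over group_cols, attaching geometry only once (hypothetical helper)."""
    # 1. Aggregate without carrying the geometry column through the groupby.
    averages = (
        df.drop(columns=geom_col)
        .groupby(group_cols, as_index=False)[value_col]
        .mean()
    )
    # 2. Build a one-row-per-group geometry lookup (assumes geometry is constant per group).
    geoms = df[group_cols + [geom_col]].drop_duplicates(subset=group_cols)
    # 3. Merge geometry back a single time.
    return averages.merge(geoms, on=group_cols, how="left")

result = year_average_with_geometry(daily, ["segment_id"], "speed_mph")
```

With real geometries the merged result is typically a plain pandas DataFrame; wrapping it with geopandas.GeoDataFrame(result, geometry="geometry", crs=...) restores the GeoDataFrame.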

Data sources

  • Cal-ITP data sources: GTFS analytics pipeline; start with the outputs that feed into GTFS digest and longer-term averages

Deliverables

Utility functions + updated scripts

tiffanychu90 added the research request and tooling labels Nov 21, 2024
tiffanychu90 self-assigned this Nov 21, 2024
tiffanychu90 linked a pull request Dec 3, 2024 that will close this issue