feat: lazy taskgraph generation, multifills for dask-boost-histograms #125

Closed
wants to merge 2 commits into from

lgray
Collaborator

@lgray lgray commented Feb 9, 2024

This is mostly just logging for posterity, since it shows there is at least one solution to the issue. I'll get the problematic code to @martindurant as well so that we can properly characterize it.

So far:

  • build task graphs for dask-boost-histograms only when asked for, caching the result (delicate!)
  • prototype passing tuples of arguments to dask-boost-histogram fills, so that multiple fills happen in a single staged layer
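
To illustrate the first point, here is a minimal sketch of lazy graph building with a cached result. It is a toy stand-in, not the code in this PR: `LazyStagedHistogram`, `builds`, and the dict-based "graph" are hypothetical names invented for the example. It assumes the delicate part of caching is invalidating the cached graph whenever a new fill is staged.

```python
class LazyStagedHistogram:
    """Toy stand-in for a staged histogram (hypothetical, for illustration only)."""

    def __init__(self):
        self._staged_fills = []  # fills recorded so far, not yet turned into a graph
        self._graph = None       # cached task graph, built on first request
        self.builds = 0          # how many times the graph was actually built

    def fill(self, **kwargs):
        # Only record the fill; no graph layer is created here.
        self._staged_fills.append(kwargs)
        # Any new fill makes a previously cached graph stale.
        self._graph = None
        return self

    @property
    def graph(self):
        # Build the graph only when asked for, then reuse the cached result.
        if self._graph is None:
            self.builds += 1
            self._graph = {
                f"fill-{i}": ("fill", kw)
                for i, kw in enumerate(self._staged_fills)
            }
        return self._graph


h = LazyStagedHistogram()
for i in range(1000):
    h.fill(x=i, weight=1.0)   # cheap: nothing is built yet

graph = h.graph               # first access builds the graph once
graph_again = h.graph         # second access reuses the cache
```

In this sketch a thousand fills cost a thousand list appends, and the graph is built once on first access rather than being extended at every fill.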

This appears to have some nice scaling benefits, but we are still figuring out why.

Largely posting this PR to demonstrate what solves the memory and task-graph problems when approaching O(50k) fills.
Not a real solution yet.

@lgray lgray marked this pull request as draft February 9, 2024 23:26
@lgray lgray force-pushed the lazy_hist_taskgraph branch from b83222b to e9eaa54 February 9, 2024 23:28

lgray commented Feb 9, 2024

Example of multi-fill syntax:

# "weight" and "systematic" are tuples with one entry per systematic variation,
# so a single fill call covers all of the variations.
axes_fill_info_dict = {
    dense_axis_name: dense_variables_array_with_cuts["lep_chan_lst"][sr_cat][dense_axis_name],
    "weight": tuple(masked_weights),
    "process": histAxisName,
    "category": sr_cat,
    "systematic": tuple(wgt_var_lst),
}
hout[dense_axis_name].fill(**axes_fill_info_dict)

This shows a fill where we pass multiple weights corresponding to systematic variations.
It takes a task graph that was ending with 6GB memory usage (per dataset) and brings it to O(1GB). The time to build the task graph is also significantly reduced: ~1600 fill calls, down from ~41k, many fewer layers, etc.
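
As a rough illustration of why the number of fill calls drops, here is a toy sketch in plain Python. `ToyHist` and its `fill_calls` counter are hypothetical and are not the dask-histogram API. The sketch assumes each entry of the `weight` tuple is paired by position with the matching entry of the `systematic` tuple, which is my reading of the example above rather than something stated in this PR.

```python
from collections import defaultdict


class ToyHist:
    """Toy categorical histogram (hypothetical, for illustration only)."""

    def __init__(self):
        self.sums = defaultdict(float)  # (systematic, bin) -> sum of weights
        self.fill_calls = 0             # stands in for the number of staged layers

    def fill(self, values, weight, systematic):
        self.fill_calls += 1
        # Accept either one weight list and one label, or tuples of them.
        if not isinstance(systematic, tuple):
            weight, systematic = (weight,), (systematic,)
        if len(weight) != len(systematic):
            raise ValueError("need one weight list per systematic label")
        # Assumed pairing: the i-th weight list belongs to the i-th label.
        for label, weights in zip(systematic, weight):
            for v, w in zip(values, weights):
                self.sums[(label, int(v))] += w


values = [0.5, 1.5, 1.7]
weights = {
    "nominal": [1.0, 1.0, 1.0],
    "up": [1.1, 1.2, 0.9],
    "down": [0.9, 0.8, 1.1],
}

# One fill call per systematic variation.
one_by_one = ToyHist()
for label, w in weights.items():
    one_by_one.fill(values, weight=w, systematic=label)

# One fill call carrying all variations as tuples.
multi = ToyHist()
multi.fill(values, weight=tuple(weights.values()), systematic=tuple(weights))
```

Both histograms end up with identical contents, but the multi-fill version gets there with one call instead of one call per variation.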


lgray commented Feb 13, 2024

Multifill moved to #126, which supersedes this PR.

@lgray lgray closed this Feb 13, 2024