feat: when adding dask_histogram.boost.Histograms delay creation of task graph and use multi-source tree reduction #126
Conversation
@martindurant in any case, between this one and #125 we have some things that improve the situation when filling lots of histograms. I have a feeling this + #125 (the multi-fill part) will address most of the issues analysis users are running into. This alteration definitely improves the structure of the resulting task graph, but it doesn't mitigate the time to construct many thousands of variations. The multi-fill thing in the other draft PR takes care of that rather handily, though, and results in rather snappy performance. Getting the interface reasonable may be a challenge. If we can figure out how to make multiple fills turn into something like the multi-fill interface behind the scenes, that would also work quite well.
Example of multi-fill syntax:

axes_fill_info_dict = {
    dense_axis_name: dense_variables_array_with_cuts["lep_chan_lst"][sr_cat][dense_axis_name],
    "weight": tuple(masked_weights),
    "process": histAxisName,
    "category": sr_cat,
    "systematic": tuple(wgt_var_lst),
}
hout[dense_axis_name].fill(**axes_fill_info_dict)

Here showing a fill where we pass multiple weights corresponding to systematic variations.
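For context, a minimal non-dask sketch (with hypothetical axis names and made-up values) of what one such keyword-unpacked fill looks like with the hist library, before any of the multi-weight batching discussed in this PR:

import numpy as np
import hist

# Hypothetical histogram: one dense axis plus categorical axes, weighted storage.
h = hist.Hist(
    hist.axis.Regular(40, 0, 400, name="ptbl"),
    hist.axis.StrCategory([], name="process", growth=True),
    hist.axis.StrCategory([], name="category", growth=True),
    hist.axis.StrCategory([], name="systematic", growth=True),
    storage=hist.storage.Weight(),
)

# Collect all of the fill arguments in one dict, then unpack into a single fill call.
axes_fill_info = {
    "ptbl": np.random.exponential(100.0, size=1000),
    "weight": np.ones(1000),
    "process": "ttbar",
    "category": "sr_2l",
    "systematic": "nominal",
}
h.fill(**axes_fill_info)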
After talking to some users, it seems another way to do this that's ~reasonable is to pass a function that specifies exactly the filling we would like to do, and we can pass a list of the arguments to each fill call.
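A rough sketch of that idea (the names here are hypothetical, and h is the histogram sketched above): the user provides one fill function plus a list of argument sets, one per intended fill, and the framework is then free to batch the calls however it likes.

import numpy as np

vals = np.random.exponential(100.0, size=1000)
weights = {
    "nominal": np.ones(1000),
    "btag_up": np.random.normal(1.0, 0.05, size=1000),
}

def do_fill(h, dense_vals, weight, systematic):
    # One ordinary fill; a batching layer could collect these calls and
    # dispatch them together instead of issuing one graph layer per call.
    h.fill(ptbl=dense_vals, weight=weight, systematic=systematic,
           process="ttbar", category="sr_2l")

# The user hands over the function plus the per-call arguments.
fill_args = [
    dict(dense_vals=vals, weight=w, systematic=name)
    for name, w in weights.items()
]
for kwargs in fill_args:
    do_fill(h, **kwargs)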
Of course you can do that, but it's circumventing our public API. Fine if this is really the exception... but dealing with the repeated fill (or repeated with_field) seems like it might be worthwhile.
Yeah there's a few patterns that come out of this. It is a bit chicken and egg. |
If... the question is, which is easier: dealing with forcing users into doing their own batching, or making our own batching mechanism? I haven't reviewed the code in this PR yet, but the fact that it already exists means that the latter might be the case. Are there other cases of repeatedly applying the same operation that this pattern would apply to? Or other cases where "bury these in a single map_partitions" is the best current advice we can give?
The other-other choice is to write an optimization layer that fuses the histogram fills after the fact into something reasonable (and thus assembles the batched fills). But then you pay in time calling awkward functions to do all the fills... I think the pattern to search for is actually straightforward.
This doesn't help with graph build time or memory use during build. |
We need to dig into why you're not seeing the memory improvements I see with this PR. I think it might be macos. But yeah it won't remove the time-domain part of the problem unless the awkward array broadcast_and_apply time is improved. |
Batching with_field calls should help, but not hists.
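For the repeated with_field case, a minimal sketch of the "bury these in a single map_partitions" advice mentioned above, assuming dask-awkward is available (the field names are made up):

import awkward as ak
import dask_awkward as dak

events = dak.from_awkward(
    ak.Array({"pt": [10.0, 20.0, 30.0], "eta": [0.1, -0.2, 0.3]}),
    npartitions=1,
)

def add_derived_fields(part):
    # All of the repeated with_field calls happen eagerly inside one
    # partition-level function, so the collection gains a single graph layer
    # rather than one layer per with_field call.
    part = ak.with_field(part, part.pt * 2.0, "pt_scaled")
    part = ak.with_field(part, abs(part.eta), "abs_eta")
    return part

events = dak.map_partitions(add_derived_fields, events)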
Lemme give a simple |
@martindurant this one is largely well-focused now. Please give it a try. It looks like with this we can either avoid batching the histogram fills or do a much better job of it.
@martindurant is there anything we should add to this one? I think it's a clear improvement on what's there and doesn't rock the boat too much.
It also paves the way for pooling the fills when we generate the task graph (which still needs study).
@martindurant here's the real first try at multifill. It works for dask-awkward arrays. Here's a recipe that should work to see the stalling issues and recursion error: |
This is definitely a prototype, but it outlines a nice pattern that allows the interface to scale a little further.
This results in significantly simpler graphs for single histograms with lots of fills (i.e. over systematic uncertainty categories).
Now, instead of multiple layers of tree reduces for boost histograms, there is just one that aggregates over all hist-on-block operations generated on each histogram fill; i.e. this can handle a tree reduce over multiple input collections (a rough sketch of this is given below, after the before/after comparison). Memory use is a little bit less. The graph is pleasantly cleaner.
before:
after:
It may not look like a big difference with a smaller graph, but it becomes very apparent as you increase the number of fills.
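Just to illustrate the shape of that single aggregation (this is not the actual dask-histogram code, only a toy sketch using dask.delayed and hist): partial histograms from every fill of every block feed one pairwise reduction tree.

from functools import reduce

import dask
import hist

@dask.delayed
def hist_on_block(values):
    # Toy stand-in for the per-block fill: one small histogram per block.
    h = hist.Hist(hist.axis.Regular(10, 0, 10, name="x"))
    h.fill(x=values)
    return h

def tree_sum(parts, split_every=2):
    # Combine partial histograms pairwise (split_every at a time) until a
    # single delayed result remains, no matter how many fills produced them.
    sum_group = dask.delayed(lambda hs: reduce(lambda a, b: a + b, hs))
    while len(parts) > 1:
        parts = [sum_group(parts[i:i + split_every])
                 for i in range(0, len(parts), split_every)]
    return parts[0]

# Partial results from several "fills" over several blocks go into one tree.
blocks = [[1, 2, 3], [4, 5], [6, 7, 8], [2, 2]]
total = tree_sum([hist_on_block(b) for b in blocks])
print(total.compute())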
This PR also now implements multi-fill syntax. With a more concise implementation, this appears not to be necessary.