-
Notifications
You must be signed in to change notification settings - Fork 129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf: use fast uproot.num_entries
method instead of TTree.num_entries
#1197
base: master
Are you sure you want to change the base?
Conversation
@pfackeldey I was wondering because I find preprocessing slow (many minutes for large datasets), did you try it and have seen speed improvement? |
@ikrommyd I have not measured it, because I don't have an analysis setup to do so. If you can, I'd be happy to see the difference! |
uproot.num_entries
method instead of TTree.num_entries
uproot.num_entries
method instead of TTree.num_entries
So the reason I used TTree rather than Since we're already opening the file and deserializing the TTree metadata to get the base form and greatest common basket offsets anyway on line 69 of preprocess. IIUC, this also deserializes the I'd be happy to see a benchmark that shows this change is faster though! |
That being said - if you want to make a version of this that's optimized for when all you're doing is just getting the number of entries (i.e. |
Hi @lgray, |
I'll leave it open if you want to try to the one optimized pathway I mentioned. If you don't then I'll close it. |
uproot.num_entries
is a fast path to get the number of events. It should always be faster thanTTree.num_entries
. This may become noticeable for preprocessing very large filesets.The nanoevents factory should use
uproot.num_entries
aswell (at some point), in this PR I just removed the overhead of calculating it twice.