Import time #245

henryiii · 2020-05-21T01:18:17Z

I've noticed that a recent change has caused the import time to skyrocket (this could even be limited to my branch). Running this in Python 3.7 with python -X importtime shows a whopping 10+ seconds devoted to particle.particle.literals!
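For anyone wanting to reproduce the measurement, here is a minimal sketch that runs the interpreter with -X importtime in a subprocess and sorts the reported modules by cumulative time. The helper name import_times is made up for illustration; the parsing assumes the "import time: self | cumulative | name" line format that CPython 3.7+ writes to stderr.

```python
import subprocess
import sys


def import_times(module):
    """Return (cumulative_us, module_name) pairs from -X importtime, slowest first.

    Hypothetical helper, not part of any library.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", "import " + module],
        stderr=subprocess.PIPE,  # -X importtime reports on stderr
        universal_newlines=True,
        check=True,
    )
    rows = []
    for line in proc.stderr.splitlines():
        if not line.startswith("import time:"):
            continue
        parts = line[len("import time:"):].split("|")
        if len(parts) != 3:
            continue
        try:
            cumulative = int(parts[1])
        except ValueError:
            continue  # the header line has text instead of numbers
        rows.append((cumulative, parts[2].strip()))
    return sorted(rows, reverse=True)
```

Something like import_times("particle")[:5] would then list the five most expensive imports, which is where a module such as particle.particle.literals would show up.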


henryiii · 2020-05-21T01:34:20Z

Loading the nuclei table takes 10 seconds. @eduardo-rodrigues we need to address this before releasing. You could load the entire numpy library almost 100 times in this timeframe. Particle was designed to load a few hundred particles in a table, not 6,500. (though the speed per object is rather dismal...)

henryiii · 2020-05-21T01:53:57Z

Ran py-spy and the flame graph showed that all the time was spent in the search for an existing particle. I've changed the internal cls._table from a list to a set and used MyPy to help me make all the necessary changes. (Particle was always designed to be hashable via PDGID number.) It now runs in 0.1 seconds instead of 10 seconds, and it worked the first time (after the MyPy assistance).
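To illustrate why the container type matters here, a rough sketch with a toy class (ToyParticle and the two loader functions are hypothetical, not the library's code). Checking "is this particle already present" scans the whole list each time, so loading n particles costs on the order of n² comparisons, while a set does a hash lookup per particle.

```python
import timeit


class ToyParticle:
    """Hypothetical stand-in: equality and hash depend on the PDG ID only."""

    def __init__(self, pdgid, name):
        self.pdgid = pdgid
        self.name = name

    def __eq__(self, other):
        return isinstance(other, ToyParticle) and self.pdgid == other.pdgid

    def __hash__(self):
        return hash(self.pdgid)


def load_into_list(particles):
    table = []
    for p in particles:
        if p not in table:  # linear scan over everything loaded so far
            table.append(p)
    return table


def load_into_set(particles):
    table = set()
    for p in particles:
        if p not in table:  # hash lookup, roughly constant time
            table.add(p)
    return table


def compare(n=2000):
    """Return (list_seconds, set_seconds) for loading n distinct toy particles."""
    particles = [ToyParticle(i, "p%d" % i) for i in range(n)]
    t_list = timeit.timeit(lambda: load_into_list(particles), number=1)
    t_set = timeit.timeit(lambda: load_into_set(particles), number=1)
    return t_list, t_set
```

The absolute numbers from compare() depend on the machine, but the gap should widen quickly as n grows toward the several thousand entries mentioned above.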


eduardo-rodrigues · 2020-05-21T08:56:07Z

@henryiii, these timing issues you are seeing are something I had sorted out a while back, see https://github.com/scikit-hep/particle/pull/231/commits and 55486ea specifically. I'm very surprised you are seeing the same kind of issues again. It was all working fine. I have no idea what happened between those developments and now. Good anyway that you found another solution! I will look closer at the updates today ...

eduardo-rodrigues · 2020-05-21T08:58:14Z

Just confirming - on my local Windows installation with Python 3.7.7 everything runs fast. Hmm …

eduardo-rodrigues · 2020-05-21T09:05:43Z

> Loading the nuclei table takes 10 seconds. @eduardo-rodrigues we need to address this before releasing. You could load the entire numpy library almost 100 times in this timeframe. Particle was designed to load a few hundred particles in a table, not 6,500. (though the speed per object is rather dismal...)

The timing issue was indeed something I saw when adding information on nuclei, hence the discussion at the time: causing trouble over information that >95% of our users will not care about was a no-go. Indeed, many isotopes were added …

BTW, I'm working on a complete rewrite of the data files, which will be much easier to maintain and rather comprehensive. This also relates to #118 and will be much nicer. News as soon as I find more time to add the new data per category - quarks, leptons, light mesons, charm baryons, etc.; you get it.

eduardo-rodrigues · 2020-05-21T09:06:31Z

> Ran py-spy and the flame graph showed that all the time was in the search for an existing particle. I've moved the internal cls._table to a set instead of a list and used MyPy to help me fix all the necessary changes. (Particle was always designed to be hashable via PDGID number). It now runs in 0.1 second instead of 10 seconds and worked the first time (after the MyPy assistance).

I have not played with this, so if you have some commands to share so that I can give it a go, I would be very interested. Thanks a lot!

henryiii · 2020-05-21T12:53:47Z

The problem was that I had to remove the file chaining that was being done, and that removed your speed hack. I might be able to restore the speed boost there too, and get even more! (Greedy cackling.) I will check. The set nicely speeds up user code too, while the old hack only helped the initial file loading.

eduardo-rodrigues · 2020-05-21T13:05:34Z

Agreed that using set is neat :-).

Yep, we may be able to be even faster ...

henryiii · 2020-05-21T13:42:32Z

Set membership checking is O(1); I can even drop the initial check before appending entirely. (If you have multiple particles with the same PDGID, the last one will always be present.) Turning the check off makes no performance difference at all.
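One detail worth keeping in mind when dropping the pre-check: a plain Python set keeps the element it already holds when an equal one is added, so if "last one wins" is the desired behaviour, the old entry has to be discarded first. A minimal sketch of that pattern, using a hypothetical ToyParticle and add_last_wins helper (how the library itself does this is not shown in this thread):

```python
class ToyParticle:
    """Hypothetical stand-in: equality and hash depend on the PDG ID only."""

    def __init__(self, pdgid, name):
        self.pdgid = pdgid
        self.name = name

    def __eq__(self, other):
        return isinstance(other, ToyParticle) and self.pdgid == other.pdgid

    def __hash__(self):
        return hash(self.pdgid)


def add_last_wins(table, particle):
    """Insert particle, replacing any stored entry with the same PDG ID."""
    table.discard(particle)  # no-op if nothing equal is stored
    table.add(particle)


# Plain add(): the entry that was stored first is kept.
plain = set()
plain.add(ToyParticle(211, "first"))
plain.add(ToyParticle(211, "second"))
plain_names = [p.name for p in plain]

# discard() + add(): the most recent entry replaces the old one.
replaced = set()
add_last_wins(replaced, ToyParticle(211, "first"))
add_last_wins(replaced, ToyParticle(211, "second"))
replaced_names = [p.name for p in replaced]
```

Both operations are hash lookups, so neither variant brings back the linear scan.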

* Some ignores for backports for typing
* Fixing a bunch of unclosed file warnings
* Starting on mypy support
* Adding MyPy support to PDGID
* Full mypy support added to pdgid/functions. As a result, anything that supports __int__ can now be used in the functions; this includes Particle directly!
* Adding pre-commit changes
* Basic types for Particle (with several skips)
* Fix for extra __main__ being dumped in the wrong place
* Fix for Python 2
* Fix for loading from zipfiles
* Make typing optional for the ZipApp
* Prepare to push ZipApp as well
* ZipApp requires Python 3.7+
* ZipApp compression and earlier Python 3 support
* Minor optimizations
* No need to skip a file that is not present
* Change internal table to a set. 100x faster load, faster searches. Fixes #245
* Add GHA badge
* Update docs/CHANGELOG.md
* Update docs/CHANGELOG.md
* Update docs/CHANGELOG.md
* Update docs/CHANGELOG.md
* Update docs/CHANGELOG.md
* Remove old performance addition - not needed with sets!
* Mention PDGID works on any SupportsInt

Co-authored-by: Eduardo Rodrigues <eduardo.rodrigues@cern.ch>
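As a rough illustration of the "anything that supports __int__" point, here is a sketch of a function typed with typing.SupportsInt. The names ToyParticle and abs_pdgid are hypothetical and only show the pattern; they are not the library's actual functions.

```python
from typing import SupportsInt


class ToyParticle:
    """Hypothetical object that converts to its PDG ID via __int__."""

    def __init__(self, pdgid: int) -> None:
        self.pdgid = pdgid

    def __int__(self) -> int:
        return self.pdgid


def abs_pdgid(pdgid: SupportsInt) -> int:
    """Hypothetical helper: accepts an int or any object defining __int__."""
    return abs(int(pdgid))


from_int = abs_pdgid(-211)
from_object = abs_pdgid(ToyParticle(-211))
```

Because the function only calls int() on its argument, a type checker accepts both plain integers and particle-like objects without any special casing.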

henryiii added a commit that referenced this issue May 21, 2020

Change internal table to a set. 100x faster load, faster searches. Fixes #245 (c9574e3)

henryiii mentioned this issue May 21, 2020

(internal) Typing support #244

Merged

henryiii closed this as completed in #244 May 21, 2020
