-
Notifications
You must be signed in to change notification settings - Fork 230
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
General speedup #78
Comments
I just ran the examples/rmg/methlyformate case under the profiler. It ran for ~17 hours on my iMac before crashing (with issue #77). Looking into the first of these: Function called...
ncalls tottime cumtime
model.py:1250(updateUnimolecularReactionNetworks) -> 11 0.000 0.002 __init__.py:1584(warning)
225 0.001 0.056 __init__.py:1594(info)
44 0.000 0.000 __init__.py:1602(debug)
65 0.004 0.004 pdep.py:231(merge)
3289804 40.901 1361.483 pdep.py:352(update)
214354576 41.754 41.754 {isinstance}
43678774188 2486.316 2486.316 {len}
280 0.002 0.002 {method 'format' of 'str' objects}
1172 0.011 0.011 {method 'getEquilibriumConstant' of 'rmgpy.reaction.Reaction' objects}
2344 0.165 0.165 {method 'getRateCoefficient' of 'rmgpy.reaction.Reaction' objects}
282 0.000 0.000 {method 'insert' of 'list' objects}
933 1.681 1.681 {method 'remove' of 'list' objects}
160 0.000 0.000 {sum} the actual updating is done in pdep.py:352(update) which only takes 2% total time (1,361 seconds). The rest of it is presumably the loops to remove reverse reactions from the core (for example, |
I've tried to make updateUnimolecularReactionNetworks() go faster by reducing both the merging of partial pdep networks and the removal of reverse pdep reactions to O(N) instead of O(N^2). It's hard to tell which of these was causing it to be slow - maybe both? - so I tried to address each of them. Either way I've removed many of the calls to len() so hopefully it won't be called 43 billion times any more. I've tested the changes on the 1,3-hexadiene model, and the generated model doesn't appear to have changed. See my new pdep branch if you want to try it. |
Excellent. Look forward to trying it. |
I ran it again on your pdep branch. I suggest we merge that branch to master and look for some more fruit to pick! New profile data is here |
This gives a three-fold speed up in the methylformate test case! See issue #78 for details.
Excellent. Looks like "pdep is slow" may no longer be true. Not bad for ~2 hours of work. The most obvious thing in the current profile is the solver. I'm starting to consider making this a pure Fortran code (very much like dasslAUTO in RMG-Java) and wrapping it with f2py or similar. The thinking is that any Python overhead becomes magnified a lot in the ODE solve because of all the function calls required. An ancillary benefit is that we could build DASSL right into the solver and thereby eliminate the separate PyDAS requirement. One downside is that this would be a step away from using something like Cantera, and could make it harder to add multiple reactor types in the future if that is ever something we want. I don't think the current approach would scale well to multiple reactor types anyway. There's probably more low-hanging fruit in the model enlargement code, but since all the boxes are currently blue it would be hard to see with this example until we make the solver faster. Parallelization is probably not too far away, but since we could in principle do that to RMG-Java as well, it probably makes sense to do a bit more single-process optimization first. |
Perhaps now is a good time to reconside using Cantera, as we did in early versions of RMG-Py? I think Ray has done lots of work on it recently - perhaps ask him what he thinks. Parallelizing the ODE solves should be relatively simple, as each reaction system is independent. Getting a generally robust kinetic model requires many reaction systems, so this would be a decent help. |
I recently ran a Python vs. Java speed comparison for the graph isomorphism algorithm. To do this I ported the current isomorphism algorithm from Python/Cython to Java. The speed test involves 3000 randomly-generated graphs of the same size, with no semantic information. The timing results (fastest of 4 runs in each case):
You can immediately see why I was originally excited about Cython, as the speed gain it affords is tremendous (~50x!) Interestingly, there appears to be a noticeable time penalty of ~25% in maintaining a pure Python implementation. Unfortunately, the Java implementation is consistently ~2x faster than the Cython implementation. Note that I designed the new graph isomorphism algorithm to minimize the most expensive calls into the Python/C API, which means that it is basically tailored to making the Cython version as fast as possible. However, when you look at the generated C code from Cython, there are still a lot of places with significant overhead. For example, the code for iterating over the vertices
expands to (in rough pseudocode)
In general we would want all of these extra checks, but in specific cases such as this we know that they are not necessary. For instance, the above could be simplified greatly by assuming that the list does not change within the for loop, that all items are valid vertices, and that the list items do not get deleted within the for loop. Cython cannot do this for us, unfortunately, so we would probably have to hand-code (or maybe use typed memoryviews?) to make up the difference between Cython and Java. I don't think this would need to be done much, just in the absolutely critical bits of code. A summary of some thoughts:
P.S. For the record, I am running Python 2.7.3, Cython 0.16, and Java 1.7 on a 64-bit machine running Arch Linux. |
Latest profiling data for a ~8 hour run on methylformate in the neighbourhood of aeb076a. It reached:
Updating RMG execution statistics...:
|
Hopefully 44f742b speeds up the chemkin duplicate checking. I can't see how running the simulations can be trivially parallelized without multiplying the memory requirements by the number of processors (currently each simulation needs the entire core and edge), and as we're currently constrained almost as much by memory as by CPU, that's a problem. I have some ideas, but they'll take a little more effort. The Chemkin and HTML outputting is still a little slow, but we're already talking single-digit percentage improvements. |
Previously, every time we wrote a chemkin file (twice per iteration) we checked every reaction in it for to see if it was a duplicate, in a method that scaled with N^2 and ended up taking almost a third of a 7-hour RMG-Py run (See issue #78). This commit changes it so that reactions being added to the core in an enlarge step are checked for duplicate status at that time, and the entire set is checked after adding a seed mechanism or reaction library. This way we should be sure to catch all the duplicates, but only need to check new reactions once at each iteration, instead of *all* reactions twice per iteration. The saveChemkinFile method can now be called without checking for duplicates. Hopefully this speeds things up. If called without the 'checkForDuplicates=False' option, then saveChemkinFile still does the check, so that other utilities (eg. for merging reaction sets) should not be messed up (nor sped up) by this.
NB. profiling tutorial for cython http://docs.cython.org/src/tutorial/profiling_tutorial.html |
Job for Cineole finally completed. This job took 3 days (after going from a restart, so we're cheating a bit with the saved QM thermo files.) Final model enlargement: It's a bit surprising that we're spending quite a bit of time on saving the pickle file (20%). This is troubling- can we make this faster? |
Do we know how fast Java would be for the Cineole job? Based on my experience with JP10, I would expect less than a day, maybe even less than 6 hours. It might be good to run it and see. |
If you can't make things faster, you can try doing them less often. Saving restart is a good example for this type of "optimization". It may be possible to cythonize some of the pickle helper functions, and/or make it save fewer derived properties and recalculate more when unpicking. (Which we do less often). I had a look at Molecule.copy(deep=True) the other day. It seemed pretty well cythonized, so not a lot of low hanging fruit there. May be able to squeeze a little out. Or try to think of ways to avoid needing it. |
This issue is being automatically marked as stale because it has not received any interaction in the last 90 days. Please leave a comment if this is still a relevant issue, otherwise it will automatically be closed in 30 days. |
This "issue" is to collect thoughts and efforts to speed up RMG-Py.
The text was updated successfully, but these errors were encountered: