-
Notifications
You must be signed in to change notification settings - Fork 653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implementation of Parallelization to analysis.gnm #4700
Conversation
Added GNM
Added GNM client into tests
Added Change to Changelog
added name to changelog
Hello @talagayev! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
Comment last updated at 2024-09-09 17:42:45 UTC |
Linter Bot Results:Hi @talagayev! Thanks for making this PR. We linted your code and found the following: Some issues were found with the formatting of your code.
Please have a look at the Please note: The |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #4700 +/- ##
===========================================
- Coverage 93.87% 93.84% -0.04%
===========================================
Files 173 185 +12
Lines 21422 22494 +1072
Branches 3979 3980 +1
===========================================
+ Hits 20110 21109 +999
- Misses 858 931 +73
Partials 454 454 ☔ View full report in Codecov by Sentry. |
overall the tests all work for the exception of with the error being: |
@orbeckst would you be able to review here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Azure has timeout errors in multiprocessing
=================================== FAILURES ===================================
_ Test_Backends.test_all_backends_give_correct_results[square-iterable1-answer1] _
...
self = <multiprocessing.popen_fork.Popen object at 0x7efe2d29ba30>, flag = 0
def poll(self, flag=os.WNOHANG):
if self.returncode is None:
try:
> pid, sts = os.waitpid(self.pid, flag)
E Failed: Timeout >200.0s
I am going to assume that this is related to #4209 so I don't think that this PR can do anything about it.
All other tests pass for me. I didn't see a failure for test_gnm_SVD_fail
.
Changes
For actual changes:
- add a versionchanged to the GNMAnalysis class to state that parallelization is now enabled
- change the entry in CHANGELOG
Comment on test duration increase
Note: The GNM tests are some of the slowest ones that are running:
============================= slowest 50 durations =============================
9.90s call MDAnalysisTests/analysis/test_gnm.py::test_gnm[client_GNMAnalysis0]
8.92s call MDAnalysisTests/analysis/test_base.py::test_AnalysisFromFunction[client_AnalysisFromFunction1-None-50-2-25]
8.81s call MDAnalysisTests/analysis/test_base.py::test_AnalysisFromFunction[client_AnalysisFromFunction1-20-50-2-15]
8.79s call MDAnalysisTests/analysis/test_base.py::test_AnalysisFromFunction[client_AnalysisFromFunction1-20-50-None-30]
8.67s call MDAnalysisTests/analysis/test_base.py::test_AnalysisFromFunction[client_AnalysisFromFunction1-None-None-2-49]
8.57s call MDAnalysisTests/analysis/test_gnm.py::test_gnm[client_GNMAnalysis1]
8.24s call MDAnalysisTests/coordinates/test_pdb.py::test_conect_bonds_all
8.08s call MDAnalysisTests/analysis/test_contacts.py::TestContacts::test_n_initial_contacts[datafiles1-41814]
8.06s call MDAnalysisTests/analysis/test_gnm.py::test_closeContactGNMAnalysis[client_GNMAnalysis1]
7.67s call MDAnalysisTests/analysis/test_gnm.py::test_closeContactGNMAnalysis_weights_None[client_GNMAnalysis1]
7.36s call MDAnalysisTests/analysis/test_encore.py::TestEncore::test_ces_error_estimation_ensemble_bootstrap
7.11s call MDAnalysisTests/analysis/test_gnm.py::test_closeContactGNMAnalysis[client_GNMAnalysis0]
6.99s call MDAnalysisTests/analysis/test_gnm.py::test_closeContactGNMAnalysis_weights_None[client_GNMAnalysis0]
6.93s call MDAnalysisTests/utils/test_pickleio.py::test_fileio_pickle
6.14s call MDAnalysisTests/coordinates/test_gro.py::TestGROLargeWriter::test_writer_large
5.70s call MDAnalysisTests/analysis/test_gnm.py::test_closeContactGNMAnalysis[client_GNMAnalysis2]
5.64s call MDAnalysisTests/analysis/test_gnm.py::test_gnm_run_step[client_GNMAnalysis1]
Previously (from PR #4682), GNM was slow but not the slowest.
============================= slowest 50 durations =============================
9.49s call MDAnalysisTests/analysis/test_base.py::test_AnalysisFromFunction[client_AnalysisFromFunction1-20-50-None-30]
9.02s call MDAnalysisTests/analysis/test_base.py::test_AnalysisFromFunction[client_AnalysisFromFunction1-20-50-2-15]
8.90s call MDAnalysisTests/analysis/test_base.py::test_AnalysisFromFunction[client_AnalysisFromFunction1-None-None-2-49]
8.63s call MDAnalysisTests/coordinates/test_pdb.py::test_conect_bonds_all
8.58s call MDAnalysisTests/analysis/test_base.py::test_AnalysisFromFunction[client_AnalysisFromFunction1-None-50-2-25]
8.11s call MDAnalysisTests/analysis/test_contacts.py::TestContacts::test_n_initial_contacts[datafiles1-41814]
7.41s call MDAnalysisTests/analysis/test_encore.py::TestEncore::test_ces_error_estimation_ensemble_bootstrap
6.97s call MDAnalysisTests/utils/test_pickleio.py::test_fileio_pickle
6.26s call MDAnalysisTests/coordinates/test_gro.py::TestGROLargeWriter::test_writer_large
5.79s call MDAnalysisTests/analysis/test_rms.py::TestRMSD::test_weights_mass_is_mass_weighted[client_RMSD1]
5.73s call MDAnalysisTests/analysis/test_rms.py::TestRMSD::test_rmsd_atomgroup_selections[client_RMSD1]
5.60s call MDAnalysisTests/analysis/test_gnm.py::test_closeContactGNMAnalysis_weights_None
5.44s call MDAnalysisTests/analysis/test_gnm.py::test_closeContactGNMAnalysis
It looks as if the tests in parallel take longer than the serial tests — possibly setup and then competition with the parallel runners.
Yes for this PR I didn't add the |
As for the other classes, since I tried also adding the parallelization implementation to them, is there a certain way to say when they are unparallelizable? since I am quite hesitant to mark them as unparallelizable, since maybe I could overlook an option to implement the parallelization with some adjustments. |
moved the parallelization below Issue MDAnalysis#4158
Addiditon of versionchanged with the mention of the support for multiprocessing and dask backends
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am ok with not testing parallelization in test_gnm_SVD_fail
.
Can you please update the versionchanged (see my suggestion)?
Otherwise looking ready to me.
Co-authored-by: Oliver Beckstein <[email protected]>
Nice, then we can leave it like this for the tests :) Yes adjusted it, yes was just copy-pasting from the RMSD class versionchanged :) |
If you're unsure, start a discussion on the issue. There are plenty of people around who can look at an algorithm and given an opinion. Perhaps sometimes we come up with an idea how to do it. For example, RMSF is in principle parallelizable with split-apply-combine, one just has to do some extra work (see https://www.mdanalysis.org/pmda/api/rmsf.html ) but at the moment only serial is enable mdanalysis/package/MDAnalysis/analysis/rms.py Lines 756 to 757 in 7618e05
|
Ah I see, that sounds good, since I was sure for the implementation for this class and in so that it has the following case which is written in the
Thus I would tend to mark it as unparallelizable and create a PR for it :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks ready to me.
The Azure failure is a multiprocessing timeout failure and likely not due to this PR.
A concern is the increased run time of the tests but that's a general thing with the parallel tests, I think.
Thanks @talagayev ! |
Happy to help :) |
* Fixes MDAnalysis#4672 * Changes made in this Pull Request: - Parallelization of the class GNMAnalysis in analysis.gnm.py - Addition of parallelization tests (including fixtures in analysis/conftest.py) - update of CHANGELOG --------- Co-authored-by: Oliver Beckstein <[email protected]>
Fixes #4672
Changes made in this Pull Request:
GNMAnalysis
inanalysis.gnm.py
CHANGELOG
PR Checklist
Developers certificate of origin
📚 Documentation preview 📚: https://mdanalysis--4700.org.readthedocs.build/en/4700/