-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix/links dict #443
base: master
Are you sure you want to change the base?
Fix/links dict #443
Conversation
Based on SARS-COV2 Spike protein it gives about 40% speedup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's an impressive speed increase!
I do have few questions/comments. How do you feel about moving this to Molecule
instead? And I think there's a few (small) issues with orders of operations. Finally, I don't see why the "scheduled changes" need to be instance variables, rather than local variables.
for edge in self.current_link.edges: | ||
molecule.add_edge(match[edge[0]], match[edge[1]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does nothing, since links must be isomorphic to molecule anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess so but wasn't 100% sure.
for link in links: | ||
self.current_link = link |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not pass link
as argument to the functions/methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mehh I thought it was a clever choice today but I see the point
self.add_edges(molecule, match) | ||
|
||
# we remove the nodes scheduled for removal | ||
molecule.remove_nodes_from(self.nodes_to_remove) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add interactions first, then remove nodes. That will be slower, but it could be link 1 adds an interaction on a node that link 2 removes.
self.applied_links = defaultdict(dict) | ||
self.current_match = None | ||
self.current_link = None | ||
self.nodes_to_remove = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a fan. Why not create local dicts, and update those with the results of the methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because you've to hand down 4 extra variables each time ... There must be a more pythonic way to clear them though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed_nodes = self.replace_attributes(molecule, link, match)
nodes_to_remove.extend(removed_nodes)
And update for dicts, etc.
Co-authored-by: Peter C Kroon <[email protected]>
Co-authored-by: Peter C Kroon <[email protected]>
Co-authored-by: Peter C Kroon <[email protected]>
Co-authored-by: Peter C Kroon <[email protected]>
Co-authored-by: Peter C Kroon <[email protected]>
Co-authored-by: Peter C Kroon <[email protected]>
Co-authored-by: Peter C Kroon <[email protected]>
Co-authored-by: Peter C Kroon <[email protected]>
@pckroon How about making interactions a custom dict that essentially emulates defaultdict(list) in interface except when |
Sure, but don't underestimate how much work it's going to be. I think the cleanest way is not to mess with dict methods, but rather to make an actual def __hash__(self):
return hash((self.atoms, reversed(self.atoms)) This means that in that dict I can ask for current_bond = mol.interactions['bonds'][Bond(1, 2)] Thinking about this I don't really like this yet. Currently you do What's your take? As for maintaining compatibility with the currently existing libraries I suggest turning |
@pckroon I think the problem with the organisation of interactions is the following: We don't actually know, how they will be most useful outside of the martinize2 context, which is in fact very limited. We only add or remove interactions, but don't actually do anything else with them. In lieu of having to finish the paper, I propose we stick with the local dict and implement a special comparison function that handles the edge cases for now. Not so pretty but pretty safe ;) We have some other project related to polyply development and other vermouth based applications that actually do operations on the interactions, select and filter them. In #329 I've raised this issue before to some extend even though I wasn't sure what would be the best thing to do. I think we can move the discussion in this PR to a issue and take the time to thoroughly think this through. Good ideas in my opinion are:
|
Agreed. If you could collect required/desired features from a polyply point of view that would be great. In the meantime I'll reopen #329 with some more ideas I had on the bike |
This PR implements a temporary interactions dict instead of using the replace_interaction function in DoLinks. The benefit is that do-links does not iterate over the complete interactions list every time an interaction is added. In addition it now automatically takes care of symmetric interactions. I also refactored the code a little bit to make it a little more readable and added doc-strings.
The cysteine problem is fixed in this PR as well. In the end the most simple fix is to allow the oder match to be true whenever a '*' wildcard is involved irrespective of the reisds. That might seem counterintuitive, however, the graph matching basically makes sure that we never find a node connected to itself, which is the only thing this check checks. I've tested in on the relent PDB as well and it produces the correct number of disulphide bridges. I see how this could be considered dangerous. In that case I think it would be best to use the mol_idx for making sure the nodes are different rather than the chain ID, which is really only used for proteins.