Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/links dict #443

Open
wants to merge 15 commits into
base: master
Choose a base branch
from
Open

Fix/links dict #443

wants to merge 15 commits into from

Conversation

fgrunewald
Copy link
Member

This PR implements a temporary interactions dict instead of using the replace_interaction function in DoLinks. The benefit is that do-links does not iterate over the complete interactions list every time an interaction is added. In addition it now automatically takes care of symmetric interactions. I also refactored the code a little bit to make it a little more readable and added doc-strings.

The cysteine problem is fixed in this PR as well. In the end the most simple fix is to allow the oder match to be true whenever a '*' wildcard is involved irrespective of the reisds. That might seem counterintuitive, however, the graph matching basically makes sure that we never find a node connected to itself, which is the only thing this check checks. I've tested in on the relent PDB as well and it produces the correct number of disulphide bridges. I see how this could be considered dangerous. In that case I think it would be best to use the mol_idx for making sure the nodes are different rather than the chain ID, which is really only used for proteins.

@fgrunewald fgrunewald requested a review from pckroon July 6, 2022 13:49
@fgrunewald
Copy link
Member Author

Based on SARS-COV2 Spike protein it gives about 40% speedup

Copy link
Member

@pckroon pckroon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's an impressive speed increase!
I do have few questions/comments. How do you feel about moving this to Molecule instead? And I think there's a few (small) issues with orders of operations. Finally, I don't see why the "scheduled changes" need to be instance variables, rather than local variables.

vermouth/processors/do_links.py Outdated Show resolved Hide resolved
vermouth/processors/do_links.py Show resolved Hide resolved
vermouth/processors/do_links.py Outdated Show resolved Hide resolved
vermouth/processors/do_links.py Outdated Show resolved Hide resolved
vermouth/processors/do_links.py Outdated Show resolved Hide resolved
Comment on lines +373 to +374
for edge in self.current_link.edges:
molecule.add_edge(match[edge[0]], match[edge[1]])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does nothing, since links must be isomorphic to molecule anyway?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess so but wasn't 100% sure.

for link in links:
self.current_link = link
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not pass link as argument to the functions/methods?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mehh I thought it was a clever choice today but I see the point

self.add_edges(molecule, match)

# we remove the nodes scheduled for removal
molecule.remove_nodes_from(self.nodes_to_remove)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add interactions first, then remove nodes. That will be slower, but it could be link 1 adds an interaction on a node that link 2 removes.

vermouth/processors/do_links.py Outdated Show resolved Hide resolved
Comment on lines +404 to +407
self.applied_links = defaultdict(dict)
self.current_match = None
self.current_link = None
self.nodes_to_remove = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a fan. Why not create local dicts, and update those with the results of the methods?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because you've to hand down 4 extra variables each time ... There must be a more pythonic way to clear them though

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed_nodes = self.replace_attributes(molecule, link, match)
nodes_to_remove.extend(removed_nodes)

And update for dicts, etc.

@fgrunewald
Copy link
Member Author

@pckroon How about making interactions a custom dict that essentially emulates defaultdict(list) in interface except when __getitem__ etc. methods get an additional keyword? In general making interactions dict custom would also be a way to implement the custom type based atom-key selection. For example, for angles (A, B, C) would return the same value as (C, B, A) and something like (C, B, A) in interactions_dict would evaluate to true as well as using (A, B, C). However, for other types like VS or exclusions a different evaluator would be used. It's kinda complicated but could make the API quite simple to use and robust. Of course it kinda limits interactions again to GROMACS style, but we could have a class creator method that potentially sets types with special key considerations??

@pckroon
Copy link
Member

pckroon commented Jul 13, 2022

would you be open to doing this in this PR?

Sure, but don't underestimate how much work it's going to be.

I think the cleanest way is not to mess with dict methods, but rather to make an actual Interaction(tuple) (and SymmetricInteraction(Interaction)) class, to be subclassed as needed. Then give those classes the appropriate __eq__ and __hash__ methods. Different interactions can have more intelligent methods, think of dihedrals with versions, or vsites. For the hash, I think the best way would be e.g.

def __hash__(self):
    return hash((self.atoms, reversed(self.atoms))

This means that in that dict I can ask for Bond(1, 2) or Bond(2, 1) and get back the same item.
Example in code:

current_bond = mol.interactions['bonds'][Bond(1, 2)]

Thinking about this I don't really like this yet. Currently you do mol.interactions['bonds'][(1, 2)], and then get back an Interaction object. As I propose you'd use an Interaction object as key, which begs the question what the value should be. (InteractionParameters?)
Implementation-wise I do think this is cleaner and easier to maintain then subclassing (default)dict with an intelligent __getitem__. Subclassing dict results in a nicer API though. How would this dict decide which comparison to use?

What's your take?

As for maintaining compatibility with the currently existing libraries I suggest turning Molecule.interactions into a property, which simply yields from the appropriate things.
For future compat with interaction types we don't know about (yet), you could either make thin subclasses of Interaction, or just use Interaction itself.

@fgrunewald
Copy link
Member Author

fgrunewald commented Jul 14, 2022

@pckroon I think the problem with the organisation of interactions is the following: We don't actually know, how they will be most useful outside of the martinize2 context, which is in fact very limited. We only add or remove interactions, but don't actually do anything else with them. In lieu of having to finish the paper, I propose we stick with the local dict and implement a special comparison function that handles the edge cases for now. Not so pretty but pretty safe ;)

We have some other project related to polyply development and other vermouth based applications that actually do operations on the interactions, select and filter them. In #329 I've raised this issue before to some extend even though I wasn't sure what would be the best thing to do. I think we can move the discussion in this PR to a issue and take the time to thoroughly think this through.

Good ideas in my opinion are:

  • making interactions a property of molecule
  • having them collected in a way that is hashable by the atom-keys/indices
  • making the InteractionsTuple an actual class with potential subclasses

@pckroon
Copy link
Member

pckroon commented Jul 14, 2022

I propose we stick with the local dict and implement a special comparison function that handles the edge cases for now.

Agreed.

If you could collect required/desired features from a polyply point of view that would be great. In the meantime I'll reopen #329 with some more ideas I had on the bike

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants