Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

iterator method for interactions dict #329

Open
fgrunewald opened this issue Dec 13, 2020 · 5 comments
Open

iterator method for interactions dict #329

fgrunewald opened this issue Dec 13, 2020 · 5 comments
Labels
easy One banana problem enhancement

Comments

@fgrunewald
Copy link
Member

Currently the interactions dict needs to be iterated over in a two-fold fashion (i.e. first over inter_types the the interactions themselves). API wise I think we would greatly benefit from a iter_over_ineractions method that simply returns inter_type, interaction when looped over. Perhaps it can also be useful for book-keeping if there is a special order this function loops over the dictionary. @pckroon @jbarnoud any thoughts on this?

@fgrunewald fgrunewald added enhancement easy One banana problem labels Dec 13, 2020
@jbarnoud
Copy link
Collaborator

I do not mind the additionnal method if there is a use for it. At first glance, it looks a bit wastefull to create such a very redundant tuple at every iteration, and to deal with the book keeping required to check if the type changed; but that may be a matter of use case. All of that to ask: what is your use case?

@fgrunewald
Copy link
Member Author

Hi the test case is pretty common for what I do. Basically if one wants to use the vermouth molecule to do some computations, you frequently encounter loops over bond, angle etc. interactions. Most of the loops look the following nature:

for inter_type in molecule.interactions:
     for interaction in molecule.interactions[inter_type]:
             DISPATCH_COMPUTATION[inter_type](interaction)

At the same time the information that really matters are the inter_type and what is contained in the interaction tuple. Considering that the molecule has a special iterator (i.e. molecule.atoms) only for looping over node-keys, I thought having something for interactions is also justified. In addition there is the "problem" that the order over which the dict is iterated is not defined. I think we could gain some significant speed-up both in vermouth but also poyply or other projects by defining a normal order over which the interaction dict is looped over (i.e. first bonds, constraints, then angles and higher order interactions). I'd propose the following strategy:

class InteractionDict(dict):

          def __iter__(self):
          self.max_interactions = max_interactions # maybe get this when creating the instances and updating when adding
         # somehow we can make sure here that only types that are in the dict are mentioned
          self.normal_order = ["bonds", "constraints" "angles", ...]

          def __next__(self):
          idx = 0
          for inter_type in self.normal_order:
                for interaction in self[inter_type]:
                     if idx < self.max_interactions:
                        # return something like this; can also be just inter_type and interaction tuple
                        return inter_type, interaction.atoms, interaction.parameters, interaction.meta 

@pckroon
Copy link
Member

pckroon commented Jan 18, 2021

Why did this get closed?

I don't see the immediate advantage of this, but I don't hate it either. Creating a separate class is definitely overkill though.

class Molecule(nx.Graph):
    ...
    def interactions_iter(self):
        order = self.sort_interactions()  # Something like this already exists! See __str__.
        for i_type in order:
            for interaction in self.interactions[i_type]:
                yield i_type, interaction

No need to unpack the interaction, Especially when in the far far future we may turn Interaction into a class capable of evaluating energies (stares into the distance)

@fgrunewald
Copy link
Member Author

I closed this because I didn't really see it go anywhere in near future and it seemed just polluting the issues for now. But I am already using vermouth to evaluate interactions both in polyply but also the mapping program for AA COG that I wrote. But I hacked a new class to overcome the issues here.

@pckroon
Copy link
Member

pckroon commented Jul 14, 2022

Ok, in light of #443 it's time to reopen this.

To summarize the discussion there: we need to be able to index interactions by atom indices in O(1) to quickly figure out which interactions already exist, and which are to be added. This means that the interactions need to be a dict with atom indices as keys. The catch is that some interactions (e.g. bonds) are symmetric, and this dict should somehow be clever enough to take that into account when determining membership.

IMO the best way of doing that is to promote the namedtuple Interaction to an actual class (is this really needed actually? Or can we subclass namedtuple instances already?), and subclass it to create a SymmetricInteraction which has appropriate __eq__ and __hash__ methods (subclass further as appropriate). That way, you have your dict keys. Dicts also need values though. We could split interactions into interaction atoms and parameter classes/objects/..., but that feels icky. It's more natural to talk about an interaction as a whole. We could do something like mol.interactions[Bond(atoms=(3, 5), params=None)] = Bond(atoms=(5, 3), params=['1', '0.2', '100']), but now you're duplicating the atoms attribute, and you as programmer need to make sure everything remains sane. Not ideal either.

Sidenote: "let's use a set?" This doesn't work, since you'll be able to quickly see if a bond between atoms 3 and 5 exists, you won't be able to retrieve the associated parameters.

To recap: we want a dict mapping atom indices in a symmetry-aware way to interactions which also contain/describe the atoms.
I posit that we rarely, if ever, want to do mol.interactions[Bond((3, 5))] = Bond((3, 5), params=...). Instead, we want to do mol.add_interaction(Bond((3, 5), params=...) (or replace, or remove. Add logic for version as appropriate). If mol.interactions is an object with a __getitem__, but without a __setitem__ method we can produce the appropriate key/value pair in an add_interaction method (or we mangle __setitem__ beyond recognition). One of the ideas I had along this line ended up at https://docs.python.org/3/library/weakref.html, where the keys could be actual references to the atom attributes, or vice-versa.

Or, we go for the simple option where key = value. That sounds relatively clean?

To summarize:

  • Create SymmetricInteraction class with appropriate __eq__ and __hash__ methods. Subclass that into Bond, Angle, etc.
  • Molecule._interactions becomes a normal, flat, dict. By convention key == value, both are instances of Interaction.
  • Molecule.interactions can be used to iterate over _interactions in a sane order, and also as a dict to test membership and get interaction parameters.

Example code:

class SymmetricInteraction(Interaction):
    def __hash__(self):
        return hash((self.__class__.__name__, frozenset((self.atoms, tuple(reversed(self.atoms))), self.version))

    def __eq__(self, other):
        return hash(self) == hash(other)
molecule.add_interaction(Bond(3, 5, params=['1', '0.2', '140'])
molecule.add_interaction(Angle(1, 3, 5, params=['1', '45', '25'])

bond_params = molecule.interactions[Bond(5, 3)].params
if Angle(5, 3, 1) in molecule.interactions: ...

for interaction_with_params in molecule.interactions: ...

Advantages:

  • Symmetry rules defined at the level of interactions
  • Clean and fast
  • You only need to provide parameters when adding interactions. For retrieval the atom indices (and version) suffice.

Drawbacks:

  • Interaction equality doesn't take parameters into account. Bond(3, 5, params=['1', '0.2', '140']) == Bond(3, 5, params=['2', '0.25', '140'])
  • Change in API, since interactions is currently a dict[str, list[Interaction]], while I here propose a dict[Interaction, Interaction]. We could keep this intact with a property and using a different name than interactions.
  • __setitem__ of interactions is not hidden away, and it is possible to do molecule._interactions[Bond(3, 5)] = Angle(1, 3, 4, params=...). Don't do that then! If we really want we can sculpt the external API (e.g. mol.interactions) such that this is not possible.

@pckroon pckroon reopened this Jul 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
easy One banana problem enhancement
Projects
None yet
Development

No branches or pull requests

3 participants