Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add export to RDKIT to allow for chemdraw style visualization. #1137

Merged
merged 23 commits into from
Oct 20, 2023

Conversation

chrisiacovella
Copy link
Contributor

@chrisiacovella chrisiacovella commented Jul 31, 2023

PR Summary:

This adds in a function to export to an RDKIT RWMol (See issue #1138). This will allow a user to display a chemdraw style version of an atomistic compound. To enable structures to be drawn with correct bond order, we can now store the bond order in the bond graph as a data attribute. If you initialize a compound with RDKIt (i.e., from smiles) it will set the bond order into the bond list. the add_bond routine now accepts bond order as an argument, so one could manually define a molecule with correct bond order as well. If bond order is not defined, the data will be set to 'default' (which to_rdkit will assume means bond order 1). I didn't want to set this to 1 so as to make it clear to a user that this value was not actually set during construction.

I've added in tests to check the conversion and ensure that bond order is properly propagated when, e.g., we clone a compound. The default behavior is not changed (e.g., calling bonds() will just return the pair unless return_bond_order is set to true), so this should not break any existing functionality.

PR Checklist


  • Includes appropriate unit test(s)
  • Appropriate docstring(s) are added/updated
  • Code is (approximately) PEP8 compliant
  • Issue(s) raised/addressed?

@chrisiacovella chrisiacovella changed the title Rdkit vis Add export to RDKIT to allow for chemdraw style visualization. Jul 31, 2023
mbuild/compound.py Fixed Show fixed Hide fixed
@codecov
Copy link

codecov bot commented Jul 31, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (7142f41) 87.13% compared to head (0dd8fa6) 87.19%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1137      +/-   ##
==========================================
+ Coverage   87.13%   87.19%   +0.06%     
==========================================
  Files          62       62              
  Lines        6457     6490      +33     
==========================================
+ Hits         5626     5659      +33     
  Misses        831      831              
Files Coverage Δ
mbuild/compound.py 97.13% <100.00%> (+0.01%) ⬆️
mbuild/conversion.py 89.80% <100.00%> (+0.39%) ⬆️

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@chrisiacovella chrisiacovella added WIP Work in Progress, not ready for merging enhancement labels Jul 31, 2023
@chrisiacovella
Copy link
Contributor Author

I was just thinking that since force overlap can create a bond, I need to update that function to allow bond order to be set when calling force overlap to create a bond. Again, mBuild won't necessarily do anything with bond order at this point, but I think adding the ability to set this consistently.

Since this isn't very high priority, I think adding in a simple "infer_bond_order" could be useful; I tried to use the function in rdkit that supposedly does this, but ran into an error, so I still need to look more closely into that.

I haven't yet tried to convert a non-atomistic particle to rdkit. I feel as this is doable, since we will explicitly set the bonds (i.e., these aren't a function of atomic species or something) and in the function I'm overwriting the names anyway to include an index. Basically just need to add some logic to know what to do if element isn't set and the name can't be parsed into an element.

Copy link
Member

@daico007 daico007 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Only some minor comments about type check and where the conversion method itself should stay.

Comment on lines 1270 to 1274
if bond_order is None:
bond_order = "default"
self.root.bond_graph.add_edge(
particle_pair[0], particle_pair[1], bo=bond_order
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we need a type check here to make sure the bond_order provided is of the right type (not bogus string)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think hard checking these would be useful. The options should be:
"default", "single", "double", "triple", "aromatic", "unspecified". It seems the current options are:
"default", 1.0, 2.0, 3.0, 1.5, 0.0.
I think being cleaner with using all strings would be nice, since we wouldn't have to deal with odd behavior for floats vs ints.

Also, these options should be all listed in the doc strings to_rdkit(), regardless of what they are.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a check for bond_order using the types you suggested.

mbuild/compound.py Outdated Show resolved Hide resolved
mbuild/compound.py Show resolved Hide resolved
mbuild/compound.py Outdated Show resolved Hide resolved
3.0: Chem.BondType.TRIPLE,
1.5: Chem.BondType.AROMATIC,
0.0: Chem.BondType.UNSPECIFIED,
"default": Chem.BondType.SINGLE,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should raise a single warning here that the "default" bond with be converted to single bond in rdkit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, it's not clear to me what the use of having "default" is with the changes discussed of how to specify bond order types. Do we need to have "default" as an option here, or in add_bond if the default is always a single bond?

@daico007
Copy link
Member

daico007 commented Aug 1, 2023

I think we can just have the rdkit for atomistic system for now (initial check for particle element as you have now should be sufficient). The conversion to CG systems could follow later. I think the force_overlap will call the add_bond method itself, so for now it will be set as "default", but having the option to set the bond order when calling force_overlap would be nice (and not too hard to implement I believe).

@@ -1749,6 +1757,59 @@
return pybelmol


def to_rdkit(compound):
"""Create an RDKit RWMol from an mBuild Compound."""
rdkit = import_("rdkit")

Check notice

Code scanning / CodeQL

Unused local variable

Variable rdkit is not used.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doc strings can probably get added here as well. Should add that we're returning Chem.RWmol, and maybe some examples of useful things to do with the mol and references to rdkit.

Possible examples.

from rdkit.Chem import Draw
img = Draw.MolToImage(temp_mol) # draw a 2D structure
Chem.MolToSmiles(temp_mol) # get smiles string
Chem.RemoveHs(temp_mol) # remove Hydrogens
# list of possible rdkit writers
# link to other working with molecules info (https://www.rdkit.org/docs/GettingStartedInPython.html#working-with-molecules)

Copy link
Contributor

@chrisjonesBSU chrisjonesBSU Oct 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I only added the draw example and the link since mBuild has functionality to remove particles and convert a compound to smiles.

@chrisjonesBSU
Copy link
Contributor

@chrisiacovella I'm taking over this PR :)

I've addressed some of the comments from @daico007 and @CalCraven. I haven't looked closely yet at the unit tests, but I can work on adding a few more (and making sure the changes made aren't breaking the ones you added)

@chrisiacovella
Copy link
Contributor Author

Thanks @chrisjonesBSU I'll take a look at this. Not going to lie...I thought we merged this a while ago!

@chrisjonesBSU
Copy link
Contributor

Ok, I think this should be ready to go. I added a unit test for the missing lines in to_rdkit that checked for particle elements and threw an error if none are found. It seems like converting to rdkit should only be done for atomistic systems, so I added a note about that in to_rdkit in Compound

@chrisjonesBSU chrisjonesBSU removed the WIP Work in Progress, not ready for merging label Oct 19, 2023
@chrisjonesBSU chrisjonesBSU linked an issue Oct 19, 2023 that may be closed by this pull request
@chrisiacovella
Copy link
Contributor Author

Thanks for finishing this off. Everything looks good. I think we are ready to merge.

Copy link
Contributor

@CalCraven CalCraven left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Chris, this looks done to me as well.

@daico007 daico007 merged commit c426c3a into mosdef-hub:main Oct 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add to_rdkit functionality
4 participants