Post translational modifications example/tutorial #18
Replies: 16 comments
-
Hi, Automated handling of modified amino acids is one of the major items on our roadmap, however it's probably a few months out at least. This is a deceptively difficult area. We will likely take the approach of using the bespoke workflow to re-parameterize the entire AA+PTM molecule, and then merge it into an unmodified protein backbone. However, this will require
In the shorter term, it may be possible to do this manually (or in a semi-automated fashion). I'd think that the trickiest part of doing this by hand would be figuring out how to patch in the bonds, angles, and torsions at the interface of the OpenFF ligand and AMBER protein with reasonable values. A first pass at an automated solution to this might
I don't think I'll have time to implement the above algorithm any time soon, but we'd very much appreciate a PR if you're feeling up to the task :-)
-
Hi J-wags, Thank you so much for your answer. I am relieved to hear that I wasn't missing something embarrassingly obvious! I am glad to hear that this capability is under active development and really excited to see how it turns out :) With respect to your automated solution, it looks like a great workaround! I think I follow it, except for this step:
Would Molecule.is_isomorphic() be the right tool for this job? It seems like this thread covers the usage? Thank you!
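To make the atom-matching question concrete, here is a minimal, standard-library-only sketch of the underlying idea: find where the heavy atoms of a canonical side chain sit inside a modified residue by matching elements and bonds. It is not the toolkit's implementation or API; the graph representation, the function names, and the toy ASN-like atom lists are all assumptions made up for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# A molecule as a labeled graph: one element symbol per atom index,
# plus a list of undirected bonds between atom indices.
Graph = Tuple[List[str], List[Tuple[int, int]]]


def adjacency(graph: Graph) -> Dict[int, Set[int]]:
    """Build a neighbor lookup from the bond list."""
    elements, bonds = graph
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(elements))}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def match_substructure(small: Graph, large: Graph) -> Optional[Dict[int, int]]:
    """Return one mapping {small_index: large_index} that preserves
    elements and bonds, or None if the small graph does not fit."""
    small_elements, large_elements = small[0], large[0]
    small_adj, large_adj = adjacency(small), adjacency(large)

    def extend(mapping: Dict[int, int]) -> Optional[Dict[int, int]]:
        if len(mapping) == len(small_elements):
            return dict(mapping)
        i = len(mapping)  # place small atoms in index order
        used = set(mapping.values())
        for j in range(len(large_elements)):
            if j in used or large_elements[j] != small_elements[i]:
                continue
            # Every already-placed neighbor of i must be bonded to j.
            if all(mapping[n] in large_adj[j] for n in small_adj[i] if n in mapping):
                mapping[i] = j
                found = extend(mapping)
                if found is not None:
                    return found
                del mapping[i]  # backtrack
        return None

    return extend({})


# Hypothetical heavy-atom graphs: an ASN-like side chain (CB, CG, OD1, ND2)
# and a modified residue (N, CA, C, O, CB, CG, OD1, ND2, plus one extra carbon on ND2).
side_chain: Graph = (["C", "C", "O", "N"], [(0, 1), (1, 2), (1, 3)])
modified: Graph = (
    ["N", "C", "C", "O", "C", "C", "O", "N", "C"],
    [(0, 1), (1, 2), (2, 3), (1, 4), (4, 5), (5, 6), (5, 7), (7, 8)],
)
mapping = match_substructure(side_chain, modified)
no_match = match_substructure((["S"], []), modified)
```

A real workflow would also need to compare bond orders, formal charges, and hydrogens, which is where a cheminformatics toolkit earns its keep; this sketch only shows the shape of the matching step.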
-
Also, apologies that the full documentation for |
-
If you do decide to head this direction, @DNA2RNA, please keep us posted, as having a worked example would probably help us get something halfway decent working. As a postdoc, I did something like this with GAFF/AMBER FFs and tleap, so I know the general idea works. Here, the interface region should be easier because you basically can just use the small molecule FF for any protein atoms it needs to cover, so you don't have to worry that parameters will be missing (though parameter quality won't have been assessed). (Tangentially, it's also worth noting that Hideaki Fujitani used to use GAFF for simulations of whole proteins and ligands with somewhat reasonable results, so in that sense, doing something that patches small molecule parameters onto a protein is not necessarily a crazy thing to do.)
-
Hi Folks, Thank you so much for your recommendations! I just wanted to write a quick note that I am pecking away at this. So far I have managed to:
It looks like the next steps are to parse the serialized system and then map the amino acid portion onto the amino acid scaffold (unless such a parser already exists?). Thank you again! PS: Just in case it's helpful to anyone, I've copy-pasted the standard AMBER FF and PDB chunk for ASN as well as an OpenFF parameterized equivalent below. Here is the SMILES string for ASN-NAG:
Here is a chunk of PDB for a regular ASN and the corresponding AMBER XML:
I have copied the corresponding PDB and Serialized system for ASN-NAG below:
-
Congrats on the progress you've made!
I'd be a bit wary of starting with PDB -- A SMILES contains more information than a PDB (like bond orders), so something in the toolchain would have to be using a heuristic to do the PDB --> SMILES conversion. An illustrative example is that a carbon bound to an oxygen could either have the SMILES
Ahh, I see. There are two parameter-generation strategies that could be pursued here:
A) Treat the entire modified AA as a small molecule, just pulling in backbone/peptide bond parameters from the parent FF
B would seem to be more accurate to me, since it uses more of the original protein FF. But achieving either would be an accomplishment. @davidlmobley may have some input on which of these is preferable. He's on vacation this week, so we may not hear from him for a while.
There are also two parameter application strategies that could be pursued:
My initial comment was assuming approach 2, though now I see you might be pursuing approach 1. Either one should work in the long run, and in fact openforcefield/openff-toolkit#676 may have some hints for approach 1.
Yes, I think this is right. I'd strongly recommend using ParmEd for this -- Handling the XML directly will be really hard. Two important questions to answer are:
-
Sorry for the huge delay. The normal approach for modified sidechains, etc., is to chop the sidechain off of the backbone, cap it with a methyl, then charge/parameterize the small molecule. Then remove the methyl and graft it back onto the backbone. However, I think this roughly describes both approaches (A) AND (B). When I've confronted this before, I've never had to make the choice between those two approaches (or gotten to make the choice) because my modified sidechain used GAFF types and the backbone used AMBER protein FF types, and I didn't have a way to switch one set of types to the other, even if I wanted to. Here, I suppose, one has more liberty. I think the approach you would canonically want is to use the protein parameters (which supposedly are better) for as much as possible, and use the small molecule parameters more sparingly -- at least, I believe that'd be the AMBER philosophy. This would primarily apply to dihedrals.
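As a rough illustration of the "protein parameters first, small molecule parameters only where needed" policy for dihedrals, here is a small sketch. Everything in it is hypothetical: the atom-name keys, the two lookup tables, and the force constants are placeholders invented for the example, not values from any real force field.

```python
from typing import Dict, Optional, Tuple

Torsion = Tuple[str, str, str, str]


def lookup(table: Dict[Torsion, float], torsion: Torsion) -> Optional[float]:
    """A torsion reads the same in both directions, so try the reversed key too."""
    if torsion in table:
        return table[torsion]
    return table.get(torsion[::-1])


def choose_source(
    torsion: Torsion,
    protein_table: Dict[Torsion, float],
    small_molecule_table: Dict[Torsion, float],
) -> Tuple[str, float]:
    """Prefer the protein force field; fall back to the small molecule one."""
    value = lookup(protein_table, torsion)
    if value is not None:
        return "protein", value
    value = lookup(small_molecule_table, torsion)
    if value is not None:
        return "small_molecule", value
    raise KeyError(f"no torsion parameter available for {torsion}")


# Placeholder tables keyed by atom names (made-up force constants).
protein_torsions = {("N", "CA", "CB", "CG"): 1.0}
small_molecule_torsions = {
    ("N", "CA", "CB", "CG"): 0.8,     # also covered, but the protein FF wins
    ("CB", "CG", "ND2", "C1"): 2.5,   # only the small molecule FF covers this
}

backbone_side = choose_source(("CG", "CB", "CA", "N"), protein_torsions, small_molecule_torsions)
interface_side = choose_source(("CB", "CG", "ND2", "C1"), protein_torsions, small_molecule_torsions)
```

Raising on a torsion that neither table covers is deliberate: a silently skipped dihedral is exactly the kind of missing physics that is hard to spot later.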
-
Hi Folks, Sorry for the delay! I have been slowly pecking away at this. I have managed to parameterize canonical amino acids as well as post-translationally modified amino acids, then used networkx to match the atoms up to one another. Credit goes to these folks: https://github.com/Networks-Learning/nevae/blob/master/nevae_rl/convert_to_nx.py However, I am having trouble with some of the more basic topology editing. Specifically, I am having trouble with the mechanics of deleting atoms/bonds and forming new bonds between custom ligands and OpenMM amino acids. I got the sense that this requires use of ParmEd and OpenMM? I had trouble finding any examples. Do any good examples come to mind? In the meantime, I used the approach described here as a temporary measure: openmm/openmm#2731
Thanks!
-
This is really cool. I'm not that much of an expert in modifying ParmEd systems, but could try to reproduce your work if you attach the files for your example above (you should be able to drag+drop into the comment text box). Two breadcrumbs that I can provide, though:
I kinda get the sense that OpenMM will be easiest for this task after ParmEd has combined the systems (so basically in the way that you're showing above). In my experience, ParmEd gets hung up on atom types -- like, if I try to modify a bond, it sometimes tries to modify all bonds of that type and/or all bonds between atoms of those types, which is really confusing. It becomes even more so for angles and torsions, which involve even more atom types.
Also, not directly related to your approach, but maybe an easier alternative: We've released our first shot at representing AMBER's ff14SB in offxml format. It can be found here. This opens up a new approach of simply loading both Parsley and ff14SB into the same force field, and letting the toolkit handle everything. However, I'm pretty sure this will fail on protein structures, since it'll need charges for the unnatural AA, and will attempt to run AM1-BCC on the whole molecule. So one would need to at least generate a |
-
There's probably a way to do it with only OpenMM, but this would be the route I take as somebody more familiar with ParmEd. I can't say for sure which route is better ahead of time. (Something built off of the Unlike the linked OpenMM issue, I think you actually do want to make a new bond on the topology - that is to say, the chemical connectivity graph, without physics - before adding a force in. I think the place to put this would be in between the steps where you combine the ParmEd structures and convert them back into OpenMM world.
This should handle adding a new bond between atoms in the case that you know which atoms you want to make a new bond between, you know what parameters to use for that bond, etc. If you're already in a state in which the non-natural amino acids are parametrized, and a ligand is parametrized, I think this is the last remaining step? As Jeff hints at above, doing too much topology modification with ParmEd gets tricky, but if you're just trying to add a single bond in, I'd be fairly hopeful that could work. https://parmed.github.io/ParmEd/html/api/parmed/parmed.html?highlight=bondtype#parmed.BondType
-
Thank you all so much @j-wags, @davidlmobley, and @mattwthompson! The ParmEd approach worked!!! This is super exciting! Using that approach, I was able to:
Here is the basic approach I am taking:
After loading up the system comes a bunch of regex and text parsing in order to sift through the ParmEd system (still a work in progress). As stated above, the basic approach is to loop through the ParmEd bonds, regex match the amino acids and ligands we want to link, figure out the indices of those bonds in the ParmEd system, stuff all of that into a dictionary, then loop through that dictionary and apply the bonds and delete the extra hydrogens. Here is the critical piece of code, which is a copy-paste of @mattwthompson's example code with a few extra bits tacked on:
From there, everything follows the standard workflow for OpenMM/OpenFF simulations. Et voila! MD sims of spike protein with post translational modifications of arbitrary molecules: Referencing an earlier comment, it looks like the challenge ahead is parsing the system and keeping track of which bonds/atoms to add/delete. The working code requires close supervision. What could be helpful would be something kind of like PDBfixer.py (PTMfixer.py perhaps?).
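For anyone following along, the bookkeeping part (which atoms to link, which to delete, and how indices shift afterwards) can be sketched without any MD library. This is not the ParmEd code from this comment; it is a plain-Python stand-in, and the atom labels, the regex patterns, and the choice of which atoms leave are assumptions for illustration only.

```python
import re
from typing import List, Tuple

Bond = Tuple[int, int]


def find_atoms(labels: List[str], pattern: str) -> List[int]:
    """Indices of all atoms whose full label matches the regex."""
    rx = re.compile(pattern)
    return [i for i, label in enumerate(labels) if rx.fullmatch(label)]


def link_and_strip(
    labels: List[str],
    bonds: List[Bond],
    new_bonds: List[Tuple[str, str]],
    delete_patterns: List[str],
) -> Tuple[List[str], List[Bond]]:
    """Add the requested bonds, drop the matched atoms, and renumber."""
    doomed = set()
    for pattern in delete_patterns:
        doomed.update(find_atoms(labels, pattern))

    added: List[Bond] = []
    for pattern_a, pattern_b in new_bonds:
        # Unpacking raises ValueError unless each pattern matches exactly one atom.
        (a,) = find_atoms(labels, pattern_a)
        (b,) = find_atoms(labels, pattern_b)
        added.append((a, b))

    # Deleting atoms shifts every later index, so build an old -> new map.
    new_index = {}
    for old in range(len(labels)):
        if old not in doomed:
            new_index[old] = len(new_index)

    kept_labels = [label for i, label in enumerate(labels) if i not in doomed]
    kept_bonds = [
        (new_index[a], new_index[b])
        for a, b in bonds + added
        if a not in doomed and b not in doomed
    ]
    return kept_labels, kept_bonds


# Hypothetical fragment: part of an ASN side chain and part of a sugar.
labels = ["ASN17:CG", "ASN17:ND2", "ASN17:HD21", "ASN17:HD22",
          "NAG1:C1", "NAG1:H1", "NAG1:O1", "NAG1:HO1"]
bonds = [(0, 1), (1, 2), (1, 3), (4, 5), (4, 6), (6, 7)]

linked_labels, linked_bonds = link_and_strip(
    labels,
    bonds,
    new_bonds=[(r"ASN17:ND2", r"NAG1:C1")],
    delete_patterns=[r"ASN17:HD22", r"NAG1:O1", r"NAG1:HO1"],
)
```

Resolving every pattern against the original numbering before deleting anything avoids the classic mistake of deleting an atom and then using stale indices for the next edit.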
-
Sorry for the slow response -- it's been a crazy few weeks. This is super cool, and I'm glad it's starting to work for you! One question I have is where this combined system is getting the angle and torsion parameters that involve the new bond. It's possible that no such angle and torsion parameters are being assigned. This would cause there to be missing physics at the connection points, which could be visible in the trajectories as unphysical angle bending or twisting around the new bond. It may be hard to spot since this effect would be subtle, especially since sterics will keep anything too-obviously-weird from happening. But this case would be trouble because the underlying energetics would be badly inaccurate (i.e., entirely absent). Unfortunately, the solution to this might be somewhat hard. One way or another, it'll require parameterizing a fragment that includes the new bond, and a substantial amount of the chemical environment around it.
This would be really neat. We usually start off new functionality like this in a notebook so that it's easy for lots of folks to tinker with, and package it into a more stable tool as time goes on.
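On the missing angle and torsion question above, one cheap sanity check is to enumerate every angle and proper torsion that passes through the new bond and compare that list against what the system actually contains. The sketch below does the enumeration on a plain bond list; the function names and the toy five-atom fragment are assumptions for illustration, and pulling the assigned terms out of a real system is left to whichever library holds it.

```python
from typing import Dict, List, Set, Tuple

Bond = Tuple[int, int]


def build_adjacency(n_atoms: int, bonds: List[Bond]) -> Dict[int, Set[int]]:
    adj: Dict[int, Set[int]] = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def terms_through_bond(adj: Dict[int, Set[int]], a: int, b: int):
    """Angles and proper torsions that include the bond a-b."""
    left = sorted(adj[a] - {b})   # neighbors of a other than b
    right = sorted(adj[b] - {a})  # neighbors of b other than a

    angles = [(n, a, b) for n in left] + [(a, b, n) for n in right]

    # Torsions with a-b as the central bond.
    torsions = [(i, a, b, l) for i in left for l in right if i != l]
    # Torsions with a-b as a terminal bond on either end.
    torsions += [(b, a, i, x) for i in left for x in sorted(adj[i] - {a}) if x != b]
    torsions += [(a, b, l, x) for l in right for x in sorted(adj[l] - {b}) if x != a]
    return angles, torsions


def missing_terms(terms, assigned):
    """Terms absent from `assigned` in both forward and reversed order."""
    return [t for t in terms if t not in assigned and t[::-1] not in assigned]


# Toy fragment: 0=CG, 1=ND2, 2=HD21, 3=C1, 4=H1; the new bond is 1-3.
fragment_bonds = [(0, 1), (1, 2), (3, 4), (1, 3)]
adj = build_adjacency(5, fragment_bonds)
angles, torsions = terms_through_bond(adj, 1, 3)

# Pretend the system only has this one angle across the junction.
unassigned_angles = missing_terms(angles, {(3, 1, 0)})
```

If either missing list is non-empty for a real system, those are the terms that would need parameters from a fragment containing the new bond.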
-
Hello, I am trying to simulate modified gelatin strands and have run into many of the same problems. Using the SMIRNOFF template generator, an .XML file was generated for the parameters of the custom residue based on the non-polymerized amino acid, and now the issue is to patch this new force field with the existing AMBER force field. I was wondering if it would be possible to get an estimate of the release date of openff-toolkit 1.0? Will this include biopolymer force field support that can handle parameter generation for non-canonical residues? Thank you in advance.
-
Hi @sncr0, The OpenFF Toolkit release with biopolymer support will be version 0.11.0. I'm hoping to have an alpha out in the next few weeks, followed by a release candidate which will be available for at least a month, and then finally the full release. The 0.11.0 toolkit release will provide the machinery to apply OFFXML force fields to biopolymers. In terms of protein force fields, we currently have a port of AMBER ff14SB to OFFXML format here: https://github.com/openforcefield/amber-ff-porting/releases. In the future (tentatively late 2022 if things go well), Open Force Field will release the Rosemary force field, which will be applicable to both small molecules and biopolymers, and this will provide valence terms for modified amino acids. In the shorter term, you'll be able to load both the
-
Hi j-wags, How far along are you with the development of biopolymer support for the OpenFF Toolkit (0.11.0)? We were able to generate library charges for non-canonical amino acids in the meanwhile, but we would still be interested in an example notebook of the recommended workflow as described in your previous post. Is it possible to give an indication when this will be available?
We tried to load in In case anything is unclear or extra input is required, feel free to ask. Thank you in advance for your help. |
-
I'll let @j-wags handle the question about the alpha testing notebook, but
This is the right thing to do - we're actually soon going to have a release of our ff14SB port that does exactly that.
-
Hi Folks,
Is your feature request related to a problem? Please describe.
Feature/documentation request
We are running simulations of various SARS-CoV-2 proteins. There are certain simulations in which a covalent bond between a standard amino acid and an OpenFF parameterized ligand might be necessary. Examples include the spike protein glycosylations or covalent inhibitors.
Describe the solution you'd like
Parameterizing either a molecule that gets added onto an amino acid as a post translational modification or parameterizing the entire modified residue is clearly possible. However, in both cases, these molecules need to be covalently coupled to the standard amino acids. A simple example showing how to set up such a simulation with Amber would be very helpful.
These examples were particularly helpful: https://github.com/openforcefield/openforcefield/tree/master/examples/using_smirnoff_with_amber_protein_forcefield
Describe alternatives you've considered
I have considered parameterizing the entire post translationally modified residue, then converting it to a system, then serializing the system to XML, then attempting to manually reformat the ligand system's XML into Amber format protein XML file and saving it as amber14/protein_posttransmod.ff14SB.xml. This seemed rather error prone.
Another option that seems possible (and which might also benefit from a simple or easier to find example) could be plugging OpenFF into OpenMM's residue generator.
Another option could be LibraryChargeHandler?