Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of the Martini3 Go-model #550

Merged
merged 102 commits into from
Dec 14, 2023
Merged

Implementation of the Martini3 Go-model #550

merged 102 commits into from
Dec 14, 2023

Conversation

fgrunewald
Copy link
Member

@fgrunewald fgrunewald commented Oct 8, 2023

work in progress but almost finished;

To Do

  • Write the atomtypes file for the Go Potential
  • Check why MAD and Poma server + Martinize2 yield different Go models
  • Check IDRs are correctly biased
  • Try implement Chris's dihedral fix for pure IDPs
  • align ouput file names with CreateGoVirt script
  • fix the CLI arguments to pass all necessary information for the IDP/IDR functionality
  • write tests
  • write integration tests
  • make gmx topfile writer a thing
  • have the domain argument like for En

@pckroon I've added the missing Go functionality so far that it can now read a contact map, but not do the contact analysis. Because the functionality missing from the Go code concerns interactions written to the atomtypes/nonbondparams directives it is not a processor in the original sense. Hence I've dumped the files in an extra toplevel directory. Does that make sense or should we rather have it as a processor?

@Lp0lp Can you generate a reference contact map using the CreateGoVirt script for lysozyme? The canonical AA structure can be found here. Unfortunately, it doesn't run on Mac (maybe just my Mac). Do you know if the format of the contact map in terms of columns is always the same?

Also this project has a hard deadline October 10.

Copy link
Member

@pckroon pckroon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, great! I'm very happy to have this functionality in vermouth/martinize2.

I do think it should be a Processor (which can live in its own folder, see dssp), the question is what to do with the non-bonded parameter info. I'm tempted to overengineer this, but YAGNI. So maybe just dump them in a format that's convenient for you in either System or Forcefield. I think my preference would be in System (analogous to Molecule.interactions). Add a comment to that to indicate we'll engineer something proper when we need to touch non-bonded parameters and atom types for a second time.

align ouput file names with CreateGoVirt script

This is super optional to me. I'm more than happy to make backward breaking changes wrt that script.

bin/martinize2 Outdated Show resolved Hide resolved
bin/martinize2 Outdated Show resolved Hide resolved
bin/martinize2 Outdated Show resolved Hide resolved
bin/martinize2 Outdated Show resolved Hide resolved
bin/martinize2 Outdated Show resolved Hide resolved
vermouth/rcsu/get_go.py Outdated Show resolved Hide resolved
vermouth/rcsu/get_go.py Outdated Show resolved Hide resolved
vermouth/rcsu/get_go.py Outdated Show resolved Hide resolved
vermouth/rcsu/get_go.py Outdated Show resolved Hide resolved
vermouth/rcsu/get_go.py Outdated Show resolved Hide resolved
@pckroon pckroon added the hacktoberfest-accepted Accepted Hacktoberfest contribution label Oct 9, 2023
@Lp0lp
Copy link
Collaborator

Lp0lp commented Oct 9, 2023

Heyo! I've generated the contact map for lysozyme using the old, new, and locally executable contact map generator. Annoyingly, the format is not mega-consistent across the 3 versions. They have pretty different headers (not a big deal, I think) but more annoying is their substantially different column layouts. I think the create_goVirt script can only use the map from the old webserver currently.

On top of that... besides the different file layout the output is also quite different..... but that is a whole other deal...

I've attached below the maps for you to have a look at:
exec.txt
webservernew.txt
webserverold.txt

EDIT: Also note that this is probably why MAD gives different go-models. Since mad uses the locally executable version of the contact map generator, which besides having a different layout seems to also spit out a different map altogether.

@Lp0lp
Copy link
Collaborator

Lp0lp commented Oct 9, 2023

align ouput file names with CreateGoVirt script

This is super optional to me. I'm more than happy to make backward breaking changes wrt that script.

The file names out of creategovirt are a mess, if we can clean them up now then all the better I think.

@Lp0lp
Copy link
Collaborator

Lp0lp commented Oct 9, 2023

They have pretty different headers (not a big deal, I think) but more annoying is their substantially different column layouts. I think the create_goVirt script can only use the map from the old webserver currently.

I just revised the contact maps again, the executable and old webservers are actually consistent. I was wrongly looking at the first table when it is the second one (line 5454 and onwards) that the script reads. This residue-residue contact table is in the same format as the executable it just has more stuff printed out before it. IMO, we drop support for the output out of the new webserver for now, and I can send them an email later asking them to make it more consistent perhaps.

@fgrunewald
Copy link
Member Author

@pckroon why do we want the atomtypes to be dumped into system or molecules? They can be written to file directly because they are coupled to molecule i.e. two molecules cannot have the same set of atomtypes

@pckroon
Copy link
Member

pckroon commented Oct 11, 2023

Because it fits better into the general workflow imho. It also makes the include statements easier in the top file. I do understand your point though.

@fgrunewald
Copy link
Member Author

@Lp0lp Last but not least I'm refactoring the water biasing a little, such that it can be applied without the Go model. I think that's also the original idea of the IDPs. One question I have is whether you think someone would only want to bias IDRs in a structured protein and NOT the rest of the protein (i.e. applying idp fix only). My idea was a CLI like so:

martinize2 -water-bias -> applies the auto water bias
martinize2 -water-bias-eps -> you can set the epsilon for all sec-struc types including an "idr" key for IDPs/IDRs
martinize2 -id-regions -> define which regions are IDRs

In this scenario, any IDR biasing overwrites the default auto biasing. If we have default setting for the sec-struc biases the only tricky part would be to only apply biases to the IDP regions because that would require setting the specific biases to 0.

Any thoughts?

@pckroon
Copy link
Member

pckroon commented Oct 16, 2023

all sec-struc types including an "idr" key for IDPs/IDRs

What do you mean by this? Would it make sense to allow users to set which sec-struct types to apply the bias to as well (with a sane default)?

@fgrunewald
Copy link
Member Author

fgrunewald commented Oct 16, 2023

like as follows:

-water-bias-eps H:2.1 C:2.3 idr:2.1

This means applying a bias of 2.1 kJ/mol to helix but 2.3 to coil. All intrinsically disordered regions as defined by the region argument are biased with 2.1. Regions always superseded the other assignments.

@pckroon
Copy link
Member

pckroon commented Oct 16, 2023

Alright, that looks cool.

@Lp0lp
Copy link
Collaborator

Lp0lp commented Oct 16, 2023

@Lp0lp Last but not least I'm refactoring the water biasing a little, such that it can be applied without the Go model. I think that's also the original idea of the IDPs.

Perfect! Would be helpful to also be able to combine the water biasing with elastic networks for peptides, etc.

One question I have is whether you think someone would only want to bias IDRs in a structured protein and NOT the rest of the protein (i.e. applying idp fix only). My idea was a CLI like so:

martinize2 -water-bias -> applies the auto water bias
martinize2 -water-bias-eps -> you can set the epsilon for all sec-struc types including an "idr" key for IDPs/IDRs
martinize2 -id-regions -> define which regions are IDRs

In this scenario, any IDR biasing overwrites the default auto biasing. If we have default setting for the sec-struc biases the only tricky part would be to only apply biases to the IDP regions because that would require setting the specific biases to 0.

Could be a possibility for some I guess... If I understood correctly you define 4 assignments (helix, coil, strand and idr) with specific eps' and where idr overrides the other 3. What is there to stop you from setting helix coil and strand to 0 with -water-bias-eps and leaving only the IDR with a eps != 0? That would let users only apply the bias to the IDP region, no? Bit ugly CLI wise and writes a few useless BB-W interactions with a bunch of (harmless, I think) zeros though.

Copy link
Member

@pckroon pckroon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll continue reviewing later today

bin/martinize2 Outdated Show resolved Hide resolved
bin/martinize2 Show resolved Hide resolved
bin/martinize2 Show resolved Hide resolved
pckroon and others added 2 commits December 6, 2023 11:16
Copy link
Member

@pckroon pckroon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went over the code again and still found some things. Most are very small, some are really just nitpicks.
I also noticed some of my old comments didn't get addressed or resolved, could you also go back and have a look at those.

As for the signature of the GoPipeline, leave it as is for now.

vermouth/gmx/topology.py Show resolved Hide resolved
vermouth/gmx/topology.py Show resolved Hide resolved
Comment on lines 141 to 150
# First we write the atomtypes directive
if "atomtypes" in system.gmx_topology_params:
_path = itp_paths.pop()
write_atomtypes(system, _path, C6C12)
include_string += f'\n #include "{_path}"'
# Next we write the nonbond_params directive
if "nonbond_params" in system.gmx_topology_params:
_path = itp_paths.pop()
write_nonbond_params(system, _path, C6C12)
include_string += f'\n #include "{_path}"\n'
Copy link
Member

@pckroon pckroon Dec 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we ever reach a conclusion on this? My original comment mentioned generating the _path s based on moltypes. IIRC the outcome was that we can't do that since things like nbparams are system-wide attributes.

vermouth/processors/water_bias.py Outdated Show resolved Hide resolved
vermouth/processors/water_bias.py Outdated Show resolved Hide resolved
vermouth/tests/test_water_bias.py Outdated Show resolved Hide resolved
vermouth/tests/test_water_bias.py Outdated Show resolved Hide resolved
vermouth/tests/test_water_bias.py Outdated Show resolved Hide resolved
vermouth/tests/test_vs_generation.py Outdated Show resolved Hide resolved
vermouth/tests/test_vs_generation.py Outdated Show resolved Hide resolved
fgrunewald and others added 3 commits December 11, 2023 09:15
changes not relating to tests

Co-authored-by: Peter C Kroon <[email protected]>
Copy link
Member

@pckroon pckroon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work!

vermouth/gmx/topology.py Show resolved Hide resolved
@fgrunewald fgrunewald merged commit 71710c5 into master Dec 14, 2023
7 of 8 checks passed
@fgrunewald fgrunewald deleted the Go branch December 14, 2023 14:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hacktoberfest-accepted Accepted Hacktoberfest contribution
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants