Skip to content

Commit

Permalink
Merge pull request #635 from marrink-lab/martinize2_tutorials
Browse files Browse the repository at this point in the history
Martinize2 tutorials
  • Loading branch information
pckroon authored Nov 19, 2024
2 parents 83409f1 + 876f48c commit 778aa88
Show file tree
Hide file tree
Showing 7 changed files with 593 additions and 1 deletion.
187 changes: 187 additions & 0 deletions doc/source/tutorials/basic_usage.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,187 @@
===========
Basic usage
===========

Overly basic usage
==================

Without any other additions, ``martinize2`` can take your protein, and make a ready coarse
grained model with some martini
forcefield:

``martinize2 -f protein.pdb -o topol.top -x cg_protein.pdb``

This command will (try) and convert your protein from the atomistic input to one
in the Martini3 force field.

Other force fields are available to convert your protein to. To view them you
can use ``martinize2 -list-ff``.

Minimal physical reality usage
==============================

Our knowledge of proteins and Martini tells us that we need to add some more
information to the topology to account for secondary structure.

Let martinize2 deal with secondary structure for you
----------------------------------------------------

Martinize2 can deal with secondary structure intelligently using dssp in one of two ways:

1) Use a dssp executable installed on your machine
2) Use the dssp implementation in `mdtraj <https://mdtraj.org/1.9.4/api/generated/mdtraj.compute_dssp.html>`_

To explain these more:

1) If you have dssp installed locally, note that martinize2 is only validated for particular versions of dssp.
Currently the versions supported are 2.2.1 and 3.0.0.
If a non-validated version is used, a warning will be raised and nothing is written.
If you know what you're doing and are confident with what's been produced, you can override the warning
with the ``-maxwarn`` flag. Otherwise, dssp can be used using the ``-dssp`` flag in martinize.

Where you have a local installation, replace ``/path/to/dssp`` in the following command with the
location of your installation:

``martinize2 -f protein.pdb -o topol.top -x cg_protein.pdb -ff martini3001 -dssp /path/to/dssp``

2) If you have `mdtraj <https://mdtraj.org/1.9.4/api/generated/mdtraj.compute_dssp.html>`_ installed in
your python environment, the ``-dssp`` flag can be left blank, and martinize2 will attempt to use
dssp from there:

``martinize2 -f protein.pdb -o topol.top -x cg_protein.pdb -ff martini3001 -dssp``

If you want to check how the secondary structure has been assigned, martinize2 will write the
secondary structure sequence into the header information of the output topology files, along
with the input command used.

User knows best
---------------

If you already know the secondary structure of your protein and don't want to worry about
dssp assigning it correctly, the ``-ss`` flag can be used instead.

The ``-ss`` flag must be one of either:

1) The same length as the number of residues in your protein, with a dssp code for each residue
2) A single letter (eg. '``H``'), which will apply the same secondary structure parameters to your entire protein.

For example:

``martinize2 -f protein.pdb -o topol.top -x cg_protein.pdb -ss HHHHHHHHHH``

will read the ``HHHHHHHHHH`` dssp formatted string, indicating that there are exactly 10 residues in the
input pdb which should all be treated as part of a helix. If the string provided does not contain the same
number of residues as the input file, an error will be raised.

Alternatively in this case, as we know everything should be a helix, the same result can be achieved through
using a single letter as described above:

``martinize2 -f protein.pdb -o topol.top -x cg_protein.pdb -ss H``

If instead we needed to assert that only the first five residues are in a helix, and the final five are coiled,
we must we the full length string again:

``martinize2 -f protein.pdb -o topol.top -x cg_protein.pdb -ss HHHHHCCCCC``

Other basic features of martinize2
==================================

Side chain fixing
-----------------

One important addition for the generation of correct Martini protein topologies is side chain fixing.
Side chain fixing is essential for ensuring correct sampling of side chain dynamics, and involves adding
additional bonded terms into the structure of the protein, relating to the angles and dihedrals around
side chain and backbone atoms. For further information on the background and motivation for these terms,
please read the paper by `Herzog et. al <https://pubs.acs.org/doi/full/10.1021/acs.jctc.6b00122>`_.

From martinize2 version ≥ 0.12.0, side chain fixing is done automatically. For martinize2 ≤ 0.11.0,
side chain fixing must be done for the martini 3 forcefield manually:

``martinize2 -f protein.pdb -o topol.top -x cg_protein.pdb -ff martini3001 -dssp -scfix``

You can check the version of martinize2 that you have installed by running:

``martinize2 --version``

In martinize2 ≥ 0.12.0, side chain fixing is done by default. If you want to turn this behaviour off
for the forcefield that you're using, the `-noscfix` flag may be used instead.

Secondary and tertiary structure considerations
-----------------------------------------------

The examples given on this page show how to generate basic coarse grained topologies for the
martini force field using martinize2. Martinize2 has many further features to
transform your simulation from a physically naive coarse grained model to one that really
reproduces underlying atomistic behaviour. One of the most important considerations in this
is how to treat secondary structure in the absence of hydrogen bonding. Two such methods
are described in both the
`Martini protein tutorial <https://cgmartini.nl/docs/tutorials/Martini3/ProteinsI/>`_, which
should be an essential route in to conducting simulations with the martini force field.

We cover the documentation of these features in greater detail in the pages about
:doc:`Elastic Networks <elastic_networks>` and :doc:`Gō models <go_models>`.

Cysteine bridges
----------------

If your protein contains cysteine bridges, martinize2 will attempt to identify linked residues
and add correct Martini parameters between them. When bridged residue pairs
are identified, a constraint of length 0.24 nm will be added between the side chains of the two
residues.

The `-cys` flag can read one of two types of argments. The default value `-cys auto` will look
for pairs of residues within a short cutoff. This is assumed by default, so if your protein
contains disulfide bridges at the correct distance, then they'll be found automatically just using:

``martinize2 -f protein.pdb -o topol.top -x cg_protein.pdb -ff martini3001 -dssp``

You can check if the correct bridges have been identified and added in the `[ constraints ]` directive
of the output itp file. Disulfide bonds are written at the top of the directive like so::

[ constraints ]
5 25 1 0.24 ; Disulfide bridge
30 50 1 0.24 ; Disulfide bridge

Alternatively if you need to assert the identification of the bridges over a distance that isn't
automatically identified, a distance in nm can be supplied to `-cys`, e.g.:

``martinize2 -f protein.pdb -o topol.top -x cg_protein.pdb -ff martini3001 -dssp -cys 5``

will look for cysteines within 5 nm of each other and apply the same disulfide bond as before.

Molecule naming
---------------

By default, the molecules in your system will be named `molecule_{0..n}`
(*i.e.* `molecule_0`, `molecule_1`, etc.), where `n` is the number of molecules you have in the system.

To change this, the `-name` flag can be used so that your the prefix is specified how you want. For
example, the heterotetrameric protein `Sorbitol dehydrogenase <https://www.rcsb.org/structure/1pl7>`_
would produce 4 molecules called `molecule_0`, `molecule_1`, `molecule_2`, and `molecule_3`, each
with their own `.itp` file (unless the `-merge` flag was used). If one were to use this topology
along with proteins martinized separately, there will be multiple molecules called `molecule_0` in
the system, which will raise an error in gromacs at the preprocessing (grompp) stage. To solve this,
the `-name` flag can be used:

``martinize2 -f protein.pdb -o topol.top -x cg_protein.pdb -ff martini3001 -dssp -name SDH``

In this case, the molecules will be renamed to `SDH_{0..n}` to indicate that they form Sorbitol
dehydrogenase. Note that this feature is also important when the Gō model is used to ensure
unique atom names for the Gō sites are generated.

Citations
---------

At the end of the execution of martinize2, the final output log writes general information with
requests to citate relevant papers. Martinize2 collects paper citation information dynamically
based on what features have been used, such as force fields, extra parameters,
how secondary structure has been determined, and so on. For posterity and to ensure ease of
reference, the same paper citations are also printed to the header information of the output
topology files.

As the correct references are collected dynamically, all the papers printed here by martinize2
should be cited, to ensure that relevant authors and features are credited. Please do so!
Martinize2 is both free and open source, and continued citations help us to keep it this way.


Binary file added doc/source/tutorials/elastic_examples.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
136 changes: 136 additions & 0 deletions doc/source/tutorials/elastic_networks.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,136 @@
================
Elastic Networks
================

The first method applied to maintain the secondary and tertiary structure
of Martini proteins was `Elastic Networks <https://doi.org/10.1021/ct9002114>`_.
In Martini and other coarse-grained models, extra restraints are necessary to
retain folded protein structure in the absense of hydrogen bonding. As the
`Martini protein tutorial <https://cgmartini.nl/docs/tutorials/Martini3/ProteinsI/>`_
suggests, try simulating a protein without them and seeing what happens!

Elastic networks are formed by finding contacts between protein backbone
beads within a particular cutoff distance, and applying harmonic bonds between them,
restraining them at the initial distance throughout your simulation.

Elastic networks are applied in Martinize2 as per the section of the help::


-elastic Write elastic bonds (default: False)
-ef RB_FORCE_CONSTANT
Elastic bond force constant Fc in kJ/mol/nm^2 (default: 700)
-el RB_LOWER_BOUND Elastic bond lower cutoff: F = Fc if rij < lo (default: 0)
-eu RB_UPPER_BOUND Elastic bond upper cutoff: F = 0 if rij > up (default: 0.9)
-ermd RES_MIN_DIST The minimum separation between two residues to have an RB the default value is set by the force-field. (default: None)
-ea RB_DECAY_FACTOR Elastic bond decay factor a (default: 0)
-ep RB_DECAY_POWER Elastic bond decay power p (default: 1)
-em RB_MINIMUM_FORCE Remove elastic bonds with force constant lower than this (default: 0)
-eb RB_SELECTION Comma separated list of bead names for elastic bonds (default: None)
-eunit RB_UNIT Establish what is the structural unit for the elastic network. Bonds are only created within a unit. Options are molecule, chain,
all, or aspecified region defined by resids, with followingformat: <start_resid_1>:<end_resid_1>, <start_resid_2>:<end_resid_2>...
(default: molecule)

Let's have a look at how to combine these in more detail.


Basic usage
-----------
Without any further consideration, an elastic network can be added to your martinize2 command easily:

``martinize2 -f protein.pdb -o topol.top -x cg_protein.pdb -ff martini3001 -dssp -elastic``

which will apply the default harmonic bond constant (700 kJ/mol/nm^2) between non-successive BB beads
which are < 0.9 nm apart in space.

NOTE! For proteins in martini 3, the default constant is 700 kJ/mol/nm^2. For proteins in martini 2,
a value of 500 kJ/mol/nm^2 may be more appropriate.


Customising cutoffs
-------------------

If your system requires more of a custom cutoff to better reproduce the dynamics of your protein,
the region for the force to be applied in can be customised using the ``-el`` and ``-eu`` flags:

``martinize2 -f protein.pdb -o topol.top -x cg_protein.pdb -ff martini3001 -dssp -elastic -el 0.1 -eu 0.5``

In this example, the elastic network will only be applied between backbone beads which are between 0.1 and 0.5 nm
apart.

Using decays
------------

The strength of the elastic bond can be tuned with distance using an exponential decay function,
which uses the ``-ea`` and ``-ep`` flags as input parameters:


.. math::
:label: decay
decay = e^{(- f * ((x - l) ^ p)}
where:

- ``l`` = lower bound (``-el``)
- ``f`` = decay factor (``-ea``)
- ``p`` = decay power (``-ep``)

Combining parameters
--------------------


.. image:: elastic_examples.png

Here we see several examples of how the strength of elastic network harmonic bonds can be tuned.

The first example uses default parameters (albeit a shorter cutoff), such that a constant force is
applied between all backbone beads within the cutoff.

The second and third examples use a slightly longer cutoff, and apply a gentle decay function
to the strength of the network. In the first case, the decay is applied naively, and as such its
strength decays from 0 distance. In the second case, combining the decay with a lower cutoff means that
for backbone beads that are close the elastic network strength is constand, but is lower between pairs slightly
further away.

The fourth example shows a similar function to the second example, but with a longer cutoff and a stronger decay.
(note the form of the exponential decay above)

The fifth example adds an additional parameter ``-em`` into the function. As described in the help, if forces are
calculated to be lower than this force, they are removed and set to zero. Note how the input values are almost identical
to the fourth example, which would otherwise get cutoff at 0.9 nm. Because the decay function reduces the force below
the minimum before the cutoff, it overrides it and the force is zeroed before the upper cutoff anyway.


Defining structural units
-------------------------

By default, martinize2 will look at the structure given in the input file, and construct a distance-based elastic
network, filtered by each molecule. This behaviour is controlled by the `-eunit` flag. If you have multiple molecules
within your input file and would like the way the elastic network is written to be changed, this can be achieved
through different specifications as described in the help above.

For example:
``martinize2 -f protein.pdb -o topol.top -x cg_protein.pdb -ff martini3001 -dssp -elastic -eunit chain``

will limit the unit to individual chains in the system. *i.e.* chain A of your protein will *not* have any elastic
bonds with chain B, and so on.

Conversely,
``martinize2 -f protein.pdb -o topol.top -x cg_protein.pdb -ff martini3001 -dssp -elastic -eunit all``

will write elastic bonds between every molecule in your system in the positions that have been found.

Finally:

``martinize2 -f protein.pdb -o topol.top -x cg_protein.pdb -ff martini3001 -dssp -elastic -eunit 1:100 150:200``

Will write elastic networks internally between residues 1 to 100, and residues 150 to 200, but *not* between either of
these domains, nor between either of these domains and residues 101 to 149.


Visualising elastic networks
----------------------------

If you want to look at your elastic network in VMD to confirm that it's been constructed in the
way that you're expecting, the `MartiniGlass <https://github.com/Martini-Force-Field-Initiative/MartiniGlass>`_
package can help write visualisable topologies to view.
Loading

0 comments on commit 778aa88

Please sign in to comment.