Skip to content

Commit

Permalink
Merge pull request #24 from jmenglund/develop
Browse files Browse the repository at this point in the history
Add the ability to manually set base frequences
  • Loading branch information
jmenglund authored Jul 11, 2019
2 parents e82c45b + afd9ee2 commit 39ad307
Show file tree
Hide file tree
Showing 5 changed files with 88 additions and 44 deletions.
18 changes: 14 additions & 4 deletions CHANGELOG.rst
Original file line number Diff line number Diff line change
Expand Up @@ -5,20 +5,30 @@ All notable changes to this project will be documented in this file.
This project adheres to `Semantic Versioning <http://semver.org/>`_.


v0.7.0 - 2019-07-11
-------------------

* Added flags ``-f`` and ``--freqs`` to manually set base frequences.
As a consequence, the flags ``-o`` and ``--out-format`` have to be
used when specifying the output format.

`View commits <https://github.com/jmenglund/predsim/compare/v0.6.0...v0.7.0>`_


v0.6.0 - 2019-05-05
-------------------

Added
~~~~~

* Ability to write used trees to a file with the ``--trees-file`` option.
* Ability to write used trees to a file with the ``--trees-file`` flag.
* Simple test data for various substitution models.


Changed
~~~~~~~

* Command-line option ``-o`` and ``--output-format`` replaced with ``-f`` and
* Command-line flags ``-o`` and ``--output-format`` replaced with ``-f`` and
``--format``, respectively.
* The file with seed numbers may now contain more numbers than being used.
* Content is written to output after completing each simulation (instead
Expand Down Expand Up @@ -66,8 +76,8 @@ v0.4.0 - 2019-03-09
Added
~~~~~

* The command-line options ``-o`` and ``--output-format`` now allow
writing output to either the PHYLIP or the NEXUS format.
* The command-line flags ``-o`` and ``--output-format`` now allow
writing output to either Phylip or Nexus format.


Changed
Expand Down
20 changes: 12 additions & 8 deletions README.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ predsim
|Build-Status| |Coverage-Status| |PyPI-Status| |License| |DOI-URI|

**predsim** is a simple command-line tool for simulating predictive
datasets from `MrBayes <http://mrbayes.sourceforge.net>`_ output files.
datasets from `MrBayes' <http://mrbayes.sourceforge.net>`_ output files.
Datasets can be simulated under the GTR+G+I substitution model or any nested
variant available in MrBayes (JC69, HKY85 etc.). The code is contained
within a single module that can be imported using Python's import mechanism.
Expand Down Expand Up @@ -65,9 +65,9 @@ Usage
.. code-block::
$ predsim --help
usage: predsim [-h] [-V] [-l N] [-g N] [-s N] [-n N] [-f {nexus,phylip}]
[-p FILE] [--seeds-file FILE] [--commands-file FILE]
[--trees-file FILE]
usage: predsim [-h] [-V] [-l N] [-f #A #C #G #T] [-g N] [-s N] [-n N]
[-o {nexus,phylip}] [-p FILE] [--seeds-file FILE]
[--commands-file FILE] [--trees-file FILE]
pfile tfile
A command-line utility that reads posterior output of MrBayes and simulates
Expand All @@ -81,12 +81,15 @@ Usage
-h, --help show this help message and exit
-V, --version show program's version number and exit
-l N, --length N sequence lenght (default: 1000)
-f #A #C #G #T, --freqs #A #C #G #T
base frequences (overrides any base frequences in
MrBayes' output)
-g N, --gamma-cats N number of gamma rate categories (default: continuous)
-s N, --skip N number of records (trees) to skip at the beginning of
the sample (default: 0)
-n N, --num-records N
number of records (trees) to use in the simulation
-f {nexus,phylip}, --format {nexus,phylip}
-o {nexus,phylip}, --out-format {nexus,phylip}
output format (default: "nexus")
-p FILE, --seqgen-path FILE
path to a Seq-Gen executable (default: "seq-gen")
Expand All @@ -95,9 +98,10 @@ Usage
--commands-file FILE path to output file with commands used by Seq-Gen
--trees-file FILE path to output file with trees used by Seq-Gen
* It is recommended that you use the ``--commands-file`` and the ``--trees-file``
options to check the input given to Seq-Gen.
* If base frequences are missing from MrBayes' output, these must be set manually
with the ``-f`` (or ``--freqs``) flag.
* It is recommended that you use the ``--commands-file`` and ``--trees-file``
flags to check the input given to Seq-Gen.


Running the tests
Expand Down
56 changes: 34 additions & 22 deletions predsim.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@

__author__ = 'Markus Englund'
__license__ = 'MIT'
__version__ = '0.6.0'
__version__ = '0.7.0'


def main(args=None):
Expand All @@ -41,7 +41,7 @@ def main(args=None):

result_iterator = iter_seqgen_results(
simulation_input, seq_len=parser.length, gamma_cats=parser.gamma_cats,
seqgen_path=parser.sg_filepath)
basefreqs=parser.basefreqs, seqgen_path=parser.sg_filepath)

if parser.out_format == 'nexus':
schema_kwargs = {'schema': 'nexus', 'simple': False}
Expand Down Expand Up @@ -75,6 +75,12 @@ def parse_args(args):
parser.add_argument(
'-l', '--length', action='store', default=1000, type=int,
help='sequence lenght (default: 1000)', metavar='N', dest='length')
parser.add_argument(
'-f', '--freqs', action='store', type=float, nargs=4,
help=(
'base frequences (overrides any base frequences '
'in MrBayes\' output)'),
metavar=('#A', '#C', '#G', '#T'), dest='basefreqs')
parser.add_argument(
'-g', '--gamma-cats', action='store', type=int,
help='number of gamma rate categories (default: continuous)',
Expand All @@ -88,7 +94,7 @@ def parse_args(args):
help='number of records (trees) to use in the simulation',
metavar='N', dest='num_records')
parser.add_argument(
'-f', '--format', default='nexus', choices=['nexus', 'phylip'],
'-o', '--out-format', default='nexus', choices=['nexus', 'phylip'],
help='output format (default: "nexus")', dest='out_format')
parser.add_argument(
'-p', '--seqgen-path', default='seq-gen', type=str,
Expand All @@ -111,7 +117,7 @@ def parse_args(args):
help='path to a MrBayes p-file', metavar='pfile')
parser.add_argument(
'tfile_path', action=StoreExpandedPath, type=is_file,
help='path to a MrBayes t-file', metavar='tfile', )
help='path to a MrBayes t-file', metavar='tfile')

return parser.parse_args(args)

Expand Down Expand Up @@ -200,35 +206,41 @@ def kappa_to_titv(kappa, piA, piC, piG, piT):
return titv


def get_seqgen_params(mrbayes_params):
def get_seqgen_params(mrbayes_params, basefreqs=None):
"""
Adapt MrBayes parameter values for use with Seq-Gen.
Paramters
---------
mrbayes_prams : dict
mrbayes_params : dict
Parameter values from a single row in a MrBayes p-file.
basefreqs : list of floats
Frequences for the four nucleotides A, C, G, and T to use
if missing from MrBayes output.
Returns
-------
seqgen_params : dict
"""
seqgen_params = {}
try:
seqgen_params['state_freqs'] = (
str(mrbayes_params['pi(A)']) + ',' +
str(mrbayes_params['pi(C)']) + ',' +
str(mrbayes_params['pi(G)']) + ',' +
str(mrbayes_params['pi(T)']))
except KeyError:
seqgen_params['state_freqs'] = '0.25,0.25,0.25,0.25'

if basefreqs is None:
try:
basefreqs = [
float(mrbayes_params['pi(A)']),
float(mrbayes_params['pi(C)']),
float(mrbayes_params['pi(G)']),
float(mrbayes_params['pi(T)'])]
except KeyError as exc:
msg = (
'Base frequences must be provided since they '
'are not present in MrBayes\' output.')
raise KeyError(msg) from exc

seqgen_params = {'state_freqs': ','.join([str(v) for v in basefreqs])}

try:
seqgen_params['ti_tv'] = kappa_to_titv(
float(mrbayes_params['kappa']),
float(mrbayes_params['pi(A)']),
float(mrbayes_params['pi(C)']),
float(mrbayes_params['pi(G)']),
float(mrbayes_params['pi(T)']))
float(mrbayes_params['kappa']), *basefreqs)
except KeyError:
pass
try:
Expand Down Expand Up @@ -267,11 +279,11 @@ def combine_simulation_input(tree_list, p_dicts, rng_seeds=None):


def iter_seqgen_results(
simulation_input, seq_len=1000, gamma_cats=None,
simulation_input, seq_len=1000, gamma_cats=None, basefreqs=None,
seqgen_path='seq-gen'):
"""Iterate over multiple simulations."""
for tree, p_dict, rng_seed in simulation_input:
seqgen_params = get_seqgen_params(p_dict)
seqgen_params = get_seqgen_params(p_dict, basefreqs=basefreqs)
result = simulate_matrix(
tree, seq_len=seq_len, rng_seed=rng_seed,
seqgen_path=seqgen_path, **seqgen_params)
Expand Down
2 changes: 1 addition & 1 deletion setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@

setup(
name='predsim',
version='0.6.0',
version='0.7.0',
description=(
'Command-line tool for simulating predictive datasets '
'from MrBayes\' output.'),
Expand Down
36 changes: 27 additions & 9 deletions test_predsim.py
Original file line number Diff line number Diff line change
Expand Up @@ -118,7 +118,8 @@ def test_equal_basefreqs(self):
assert get_seqgen_params(self.d1) == self.d2

def test_missing_basefreq(self):
assert get_seqgen_params(self.d3) == self.d2
with pytest.raises(KeyError):
get_seqgen_params(self.d3)


@seqgen_required
Expand Down Expand Up @@ -253,14 +254,17 @@ def test_parser(self):
seeds_filepath = get_testfile_path('seeds_1.txt')

parser = parse_args([
'-l', '2', '-s', '1', '-g', '5', '-n', '1', '-f', 'phylip',
'-p', 'sg', '--seeds-file', seeds_filepath,
'-l', '2', '-s', '1',
'-g', '5', '--freqs', '0.25', '0.25', '0.25', '0.25',
'-n', '1', '--out-format', 'phylip', '-p', 'sg',
'--seeds-file', seeds_filepath,
'--commands-file', self.commands_fo.name,
'--trees-file', self.trees_fo.name,
pfile_path, tfile_path])
assert parser.length == 2
assert parser.skip == 1
assert parser.gamma_cats == 5
assert parser.basefreqs == [0.25, 0.25, 0.25, 0.25]
assert parser.num_records == 1
assert parser.out_format == 'phylip'
assert parser.sg_filepath == 'sg'
Expand Down Expand Up @@ -338,7 +342,7 @@ def test_hky_outfile(self, outfile_arg):

def test_hky_phylip(self, capsys):
main([
'-l', '2', '-s', '2', '-f', 'phylip',
'-l', '2', '-s', '2', '--out-format', 'phylip',
'--seeds-file', get_testfile_path('seeds_1.txt'),
get_testfile_path('hky.p'),
get_testfile_path('hky.t')])
Expand All @@ -351,16 +355,30 @@ def test_hky_phylip(self, capsys):
'jc',
'jc_gamma',
'jc_propinvar',
'jc_invgamma',
'gtr'])
def test_subst_model(self, capsys, model_string):
'jc_invgamma'])
def test_jc(self, capsys, model_string):
pfile_path, tfile_path, expected_path = get_model_testfile_paths(
model_string, num_records=1, ext='.nex')
with open(expected_path, 'r') as fo:
expected_out = fo.read()
main([
'-l', '2', '-n', '1', '--seeds-file',
get_testfile_path('seeds_1.txt'), pfile_path, tfile_path])
'-l', '2', '-n', '1',
'--freqs', '0.25', '0.25', '0.25', '0.25',
'--seeds-file', get_testfile_path('seeds_1.txt'),
pfile_path, tfile_path])
out, err = capsys.readouterr()
assert out == expected_out
assert err == ''

def test_gtr(self, capsys):
pfile_path, tfile_path, expected_path = get_model_testfile_paths(
'gtr', num_records=1, ext='.nex')
with open(expected_path, 'r') as fo:
expected_out = fo.read()
main([
'-l', '2', '-n', '1',
'--seeds-file', get_testfile_path('seeds_1.txt'),
pfile_path, tfile_path])
out, err = capsys.readouterr()
assert out == expected_out
assert err == ''

0 comments on commit 39ad307

Please sign in to comment.