- DESCRIPTION
- AVAILABILITY
- INSTALLATION
- EXAMPLE ANALYSES
- GETTING STARTED
- RUN
- DOCUMENTATION
- CITING
- FURTHER READING
- COPYRIGHT AND LICENSE
sowhat automates the SOWH test, a statistical test of phylogenetic topologies using a parametric bootstrap. It works on amino acid, nucleotide, and binary character state datasets.
A peer-reviewed manuscript describing sowhat is available at Systematic Biology: http://sysbio.oxfordjournals.org/content/early/2015/07/30/sysbio.syv055.abstract
sowhat includes several features that provide flexibility and aid in the interpretation and assessment of SOWH test results, including:
- The test is performed with the adjustment suggested by Susko 2014 (http://dx.doi.org/10.1093/molbev/msu039).
- Partitions, including partitions by codon position, can be used.
- Missing data (gaps in alignment) are propagated from the original dataset to the simulated dataset.
- Confidence intervals are estimated for the p-value, which helps the investigator assess if a sufficient number of bootstrap replicates have been sampled.
sowhat is in active development. Please use with caution. We appreciate hearing about your experience with the program via the issue tracker.
https://github.com/josephryan/sowhat (click the "Download ZIP" button at the bottom of the right column).
sowhat is available in a Docker container (thanks to @xqua for troubleshooting). To load a container with sowhat and the required dependencies, use the following:
docker pull shchurch/sowhat
Create a conda environment called sowhat
conda create -n sowhat
conda activate sowhat
conda install -c conda-forge perl-app-cpanminus
conda install -c bioconda seq-gen
cpanm Statistics::R
Download a fresh distribution, cd to the directory (e.g., sowhat-1.0), and install sowhat.
cd sowhat-1.0/
perl Makefile.PL
make
make test
make install
You will need to activate this environment whenever you want to run sowhat.
To install sowhat and documentation, use the following:
perl Makefile.PL
make
make test
sudo make install
To install without root privileges, try:
perl Makefile.PL PREFIX=/home/myuser/scripts
make
make test
make install
You can install SOWHAT and all the required dependencies listed above on a clean Ubuntu 15.04 machine with the following commands (executables will be placed in /usr/local/bin):
sudo apt-get update
sudo apt-get install -y r-base-core cpanminus unzip gcc git
sudo cpanm Statistics::R
sudo cpanm JSON
sudo Rscript -e "install.packages('ape', dependencies = T, repos='http://cran.rstudio.com/')"
cd ~
git clone https://github.com/josephryan/sowhat.git
cd sowhat/
# To work on the development branch (not recommended) execute: git checkout -b Development origin/Development
sudo ./build_3rd_party.sh
perl Makefile.PL
make
make test
sudo make install
Note that build_3rd_party.sh installs some dependencies from versions that are cached in this repository. They may be out of date.
Additional information on system requirements and dependencies is listed below.
Several test datasets are provided in the examples/ directory. To run example analyses on these datasets, execute:
./examples.sh
See examples.sh and the resulting test.output/ directory for more on the specifics of sowhat use.
Warning: Some of the examples take time (especially those that use Garli). For a quick example, run make test and see the output in the test.output directory.
Format: non-interleaved PHYLIP format
This can be DNA, amino acid, or binary characters. Often, you would have performed phylogenetic analyses on this alignment and recovered a result that was in conflict with an a priori hypothesis.
Format: Newick format
The constraint tree represents a hypothesis that you would like to compare to the ML tree or some alternative hypothesis. In most cases you will want a tree that is mostly unresolved except for the clade being tested.
For example, if your ML tree showed a sister relationship between two taxa 'A' and 'B' and you want to compare this result to a topology with a sister relationship between 'A' and 'C', you would create the following constraint tree:
((A,C),B,D,E,F);
Note that the relationship among B, D, E, and F is unresolved.
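As a minimal sketch, the constraint tree above could be written to a file and given a quick sanity check from the shell. The file name constraint.tre and the temporary directory are placeholders chosen for illustration. The check only counts parentheses and looks for the trailing semicolon, so it is not a full Newick validator.

```shell
# Write the example constraint tree to a Newick file.
# The directory and file name are placeholders for illustration.
workdir=$(mktemp -d)
printf '%s\n' '((A,C),B,D,E,F);' > "$workdir/constraint.tre"

# Rough sanity check: balanced parentheses and a trailing semicolon.
tree=$(cat "$workdir/constraint.tre")
opens=$(( $(printf '%s' "$tree" | tr -cd '(' | wc -c) ))
closes=$(( $(printf '%s' "$tree" | tr -cd ')' | wc -c) ))
if [ "$opens" -eq "$closes" ] && [ "${tree%;}" != "$tree" ]; then
    echo "constraint tree looks well formed"
else
    echo "constraint tree is malformed" >&2
fi
```

The resulting file is what would be passed to the --constraint option.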
The only other required parameter when using RAxML is --raxml_model. This option can specify any of the models that are available to RAxML. Running sowhat with the option --raxml_model=available will provide a list of all possible models.
Other RAxML parameters (including number of threads) can be specified with the option:
--rax
for example:
--rax='/usr/local/bin/raxmlHPC-PTHREADS -T 20'
See examples.sh for examples of sowhat commands.
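Putting the required options together, a basic RAxML-based run might look like the sketch below. The options are the ones listed in this README. The function name run_sowhat, the SOWHAT_BIN override, and the file names in the example call are hypothetical and exist only for illustration. GTRGAMMA is a standard RAxML nucleotide model used here as an example; pick the model that suits your data.

```shell
# Wrap the required sowhat options in a small helper.
# SOWHAT_BIN is a hypothetical override for this sketch; it defaults to
# the sowhat executable found on PATH.
run_sowhat() {
    "${SOWHAT_BIN:-sowhat}" \
        --constraint="$1" \
        --aln="$2" \
        --raxml_model="$3" \
        --dir="$4" \
        --name="$5"
}

# Example call (commented out; the file and directory names are placeholders):
# run_sowhat constraint.tre alignment.phy GTRGAMMA sowhat_output example_run
```

Setting SOWHAT_BIN=echo prints the assembled options instead of starting an analysis, which can be handy for checking a command before a long run.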
By default sowhat samples 1000 bootstrap replicates. This can be adjusted with --reps=[sample size]. A sufficient sample size can be assessed by checking the reported confidence interval around the p-value.
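To see why the number of replicates matters, the sketch below computes an approximate 95% confidence interval for a p-value estimated from n replicates, using the normal approximation to the binomial. This is an illustration under that assumption only; it is not necessarily how sowhat computes the interval it reports, and the helper name pvalue_ci is hypothetical.

```shell
# Approximate 95% confidence interval for a p-value estimated from n
# replicates (normal approximation to the binomial, clipped to [0, 1]).
# Illustrative assumption: sowhat may compute its interval differently.
pvalue_ci() {
    awk -v p="$1" -v n="$2" 'BEGIN {
        se = sqrt(p * (1 - p) / n)
        lo = p - 1.96 * se; if (lo < 0) lo = 0
        hi = p + 1.96 * se; if (hi > 1) hi = 1
        printf "%.4f %.4f\n", lo, hi
    }'
}

# Same estimated p-value, two different sample sizes.
pvalue_ci 0.03 100
pvalue_ci 0.03 1000
```

With an estimated p-value of 0.03, the interval from 100 replicates extends past 0.05, while the interval from 1000 replicates lies entirely below it. An interval that spans your significance threshold suggests that more replicates are needed.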
The results of the SOWH test are included in a file called sowhat.results.txt, which can be found in the directory specified with the --dir option.
At the bottom of sowhat.results.txt is a p-value: the probability of observing a test statistic at least as extreme as the one obtained if the null hypothesis were true.
A run that has been cut short can be restarted using the --restart
option. In this case the null distribution will be recalculated iteratively using the previously simulated samples. Only the two most recent generations of sequence simulation and tree estimation will be repeated, to prevent errors from an unfinished tree estimation.
Additional outputs include:
- detailed information on the model used for simulating new alignments, in the file sowhat.model.txt
- information on the null distribution, in sowhat.distribution.txt
- the trace file for the run, printed to sowhat.trace.txt
- program files, printed to a directory sowhat_scratch. Within this directory, the files ending in...i.0.0 represent the initial search of the empirical alignment file.
Results can be printed to a file sowhat.results.json using the option --json
The SOWH test can take a lot of time, especially on datasets where a single tree search can take many hours. Threads can be incorporated into RAxML as described above with the --rax option, which can speed up the tree searches considerably.
In some cases, though, the user may want to further parallelize the sowhat test. The following option allows a user to run the tree searches on simulated datasets simultaneously, for example on a cluster.
To use this option, you must specify the following options:
--print_tree_scripts --reps=[sample size, default=1000]
The initial two tree searches on the observed data will be performed. Subsequently sowhat will generate simulated alignments and print a series of scripts to execute the tree searches to the folder [--dir]/sowhat_scratch/tree_scripts/.
Each of these scripts must be executed externally, and they can be run simultaneously. After they have all been completed, the user reruns sowhat with the following options:
--print_tree_scripts --reps=[same number of reps] --restart
One note: if the initial sample size is too low (the confidence interval around the p-value indicates that the results are not definitive), the user can generate additional tree scripts by rerunning the sowhat command with the following options:
--print_tree_scripts --reps=[some higher number of reps] --restart
sowhat will not calculate the statistics until as many tree scripts as specified by --reps have been executed successfully.
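On a single multi-core machine, the generated tree scripts could be run several at a time with find and xargs, as in the sketch below. It assumes that every file in the tree_scripts directory is a script that can be run with sh and that the directory contains nothing else. The helper name run_tree_scripts is hypothetical, and on a cluster you would submit the scripts through your scheduler instead.

```shell
# Run every script in a tree_scripts directory, several at a time.
#   $1: path to the tree_scripts directory
#   $2: number of scripts to run concurrently (default 4)
# Assumes each file in the directory is a script runnable with sh.
run_tree_scripts() {
    scripts_dir=$1
    jobs=${2:-4}
    find "$scripts_dir" -type f -print0 | xargs -0 -n 1 -P "$jobs" sh
}

# Example call (commented out; DIR stands for the value passed to --dir):
# run_tree_scripts DIR/sowhat_scratch/tree_scripts 8
```

Once all scripts have finished, rerun sowhat with --print_tree_scripts, the same --reps value, and --restart, as described above.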
See this page for descriptions of additional options and how to use more complex models.
sowhat
--constraint=NEWICK_CONSTRAINT_TREE
--aln=PHYLIP_ALIGNMENT
--name=NAME_FOR_REPORT
--dir=DIR
[--debug]
[--garli=GARLI_BINARY_OR_PATH_PLUS_OPTIONS]
[--garli_conf=PATH_TO_GARLI_CONF_FILE]
[--help]
[--initial]
[--json]
[--max]
[--raxml_model=MODEL_FOR_RAXML]
[--nogaps]
[--partition=PARTITION_FILE]
[--pb=PB_BINARY_OR_PATH_PLUS_OPTIONS]
[--pb_burn=BURNIN_TO_USE_FOR_PB_TREE_SIMULATIONS]
[--plot]
[--ppred=PPRED_BINARY_OR_PATH_PLUS_OPTIONS]
[--print_tree_scripts]
[--rax=RAXML_BINARY_OR_PATH_PLUS_OPTIONS]
[--reps=NUMBER_OF_REPLICATES]
[--resolved]
[--rerun]
[--restart]
[--runs=NUMBER_OF_TESTS_TO_RUN]
[--seqgen=SEQGEN_BINARY_OR_PATH_PLUS_OPTIONS]
[--treetwo=NEWICK_ALTERNATIVE_TO_CONST_TREE]
[--usepb]
[--usegarli]
[--usegentree=NEWICK_TREE_FOR_SIMULATING_DATA]
[--version]
Extensive documentation is embedded inside of sowhat in POD format and can be viewed by running any of the following:
sowhat --help
perldoc sowhat
man sowhat # available after installation
A peer-reviewed manuscript describing sowhat is available at Systematic Biology:
Church, Samuel H., Joseph F. Ryan, and Casey W. Dunn. "Automation and Evaluation of the SOWH Test with SOWHAT" Systematic Biology 2015 Nov;64(6):1048-58. doi: 10.1093/sysbio/syv055
Also see the file sowhat.bibtex
Goldman, Nick, Jon P. Anderson, and Allen G. Rodrigo. "Likelihood-based tests of topologies in phylogenetics." Systematic Biology 49.4 (2000): 652-670. doi:10.1080/106351500750049752
Swofford, David L., Gary J. Olsen, Peter J. Waddell, and David M. Hillis. Phylogenetic inference. (1996): 407-514. http://www.sinauer.com/molecular-systematics.html
We have tested sowhat on OS X 10.9, OS X 10.10, Ubuntu Server 10.04 (Amazon ami-d05e75b8), and Ubuntu Desktop 15.04. It will likely work on a variety of other Unix-like operating systems.
The dependencies listed below are required by sowhat. They must be installed and available in the appropriate PATH. If they are not installed already, follow the installation instructions in the links provided for each tool. We have tested sowhat with the indicated dependency versions. Other versions may be incompatible and should be used with caution. These external tools are the result of a considerable amount of work by other investigators; please also cite them when you cite sowhat.
Phylogenetic programs:
General system tools:
- Perl, which comes with most operating systems
- R
- The Statistics::R Perl module.
Statistics::R has additional requirements, as described at http://search.cpan.org/dist/Statistics-R/README. Use the local::lib option to install Statistics::R without sudo. See the bootstrap method described at http://search.cpan.org/~haarg/local-lib-2.000004/lib/local/lib.pm for installation instructions. Once local::lib has been installed, and with R installed, install the Statistics::R package as you would normally. The use local::lib option must be activated in the program as well.
- The IPC::Run Perl module is currently needed for make test to work correctly (optional).
To use alternative models, you will need to install the following optional dependencies:
- GARLI, v2.01.1067 (optional)
- PhyloBayes
To print results to a json file, you will need to install the following optional dependency:
- The JSON Perl module.
Copyright (C) 2015 Samuel H. Church, Joseph F. Ryan, Casey W. Dunn
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program in the file LICENSE. If not, see http://www.gnu.org/licenses/.