plot-ks.pl

This script searches for paralogous genes in a given transcriptome and then creates a histogram of the estimated number of synonymous substitions per synonymous site (Ks) between paralogs.

Dependencies

TransDecoder
- Used to determine likely open reading frames in a given transcriptome and its corresponding protein translation.
Blat
- Used to determine likely paralogs.
KaKs_Calculator
- Used to estimate the number of synonymous substitions per synonymous site (Ks) between paralogs.
R
- Plotting and statistics software used to generate the actual Ks plot.

Script Workflow

Open reading frames and their corresponding translations are first determined using TransDecoder. Once the likely proteome has been obtained, a self-blat (proteome is both the query and the database) is then performed to identify likely paralogs present within the transcriptome. The sequences of each likely set of paralogs are then reverse translated back to their original nucleotide sequence. These nucleotide alignments are then input into KaKs_Calculator to estimate Ks. Once all Ks values have been calculated, R is used to create a histogram which depicts the distribution of Ks between paralogs present in the transcriptome.

Script Usage & Settings

Usage

This script requires a single transcriptome in FASTA format in order to run, the following is an example of proper script invocation:

plot-ks.pl transcriptome.fa --out-dir my_ks_plot

To allow for faster modifications of an output plot, redundant calculations can be avoided by specifying the name of a directory previously generated by usage. For instance, if you previously had run the script with the above invocation but realized after its creation that you would likely to narrow down the Ks of the output plot, you could use the following invocation to avoid unneccessary recalculations:

plot-ks.pl my_ks_plot --ks-max 1

Command Line Options

For further fine-tuning of the script, the following options can also be specified:

Option Flag(s)	Option Descripton	Default
-m, --model	model used by KaKs_Calculator to determine Ks	YN
-l, --min-length	the minimum alignment length of paralogous sequences	300 nucleotides
-x, --exclude-zero	used to exclude Ks = 0 from plot, useful for unfiltered Trinity assemblies	N/A
-b, --bin-size	size of bins used in histogram of Ks plot, set to 0 for a density plot	Ks = 0.05
-o, --out-dir	name of the directory to store output files in	"plot-ks-" + Unix time of script invocation)
--ks-min	lower boundary for x-axis of Ks plot	Ks = 0
--ks-max	upper boundary for x-axis of Ks plot	Ks = 3
--pfam-cpus	the number of CPUs to let hmmscan using during pfam analysis	current number of free CPUs
--pfam-search	full path to pfam binary for usage in pfam search	none
-T, --n-threads	the maximum number of CPUs to use concurrently	current number of free CPUs
-h, --help	display help and exit	N/A

Output Files

The names of the output files of this script varies depending on the settings it was run with, the following identifiers will be used as placeholders which represent the actual values of the specified command line options at script run time:

$MODEL = -m, --model
$MIN_LENGTH = -l, --min-length
$BIN_SIZE = -b, --bin-size
$KS_MIN = --ks-min
$KS_MAX = --ks-max

Assuming that an input file with the name 'transcriptome.fa' was specified to the script, the following files can be found in the output directory upon successful completion of the script:

TransDecoder output:
- transcriptome.fa.transdecoder.bed:
- transcriptome.fa.transdecoder.cds: contains the nucleotide sequences of detected ORFS
- transcriptome.fa.transdecoder.gff3:
- transcriptome.fa.transdecoder.mRNA: essentially the input transcriptome
- transcriptome.fa.transdecoder.pep: contains the protein translation of the .cds file, this file is the one used in the self-blat
self-blat.pslx: contains the output of the self-blat
paralogs-t$MIN_LENGTH.atx: contains the sequences of identified paralogs, used as input for KaKs_Calculator
paralogs-t$MIN_LENGTH-m$MODEL.kaks: contains the estimated Ks as well as additional information calculated for each identified pair of paralogs by KaKs_Calculator
paralogs-t$MIN_LENGTH-m$MODEL.csv: contains only the estimated Ks from the previous file
transcriptome-t$MIN_LENGTH-m$MODEL-b$BIN_SIZE-range[$KS_MIN-$KS_MAX].pdf: the actual histogram generated by this script.
- if run with --bin-size 0, '-b$BIN_SIZE-' will appear as '-density-'
- if run with --exclude-zero (and assuming $KS_MIN still equals 0), '-range[0-$KS_MAX]' will appear as '-range(0-$KS_MAX]'
symlink to the input FASTA files

Name		Name	Last commit message	Last commit date
Latest commit History 34 Commits
README.md		README.md
plot-ks.pl		plot-ks.pl

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

plot-ks.pl

Dependencies

Script Workflow

Script Usage & Settings

Usage

Command Line Options

Output Files

About

Releases

Packages

Languages

nkarimi/plot-ks

Folders and files

Latest commit

History

Repository files navigation

plot-ks.pl

Dependencies

Script Workflow

Script Usage & Settings

Usage

Command Line Options

Output Files

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages