MLST calling with ARIBA
ARIBA can be used for MLST using the typing schemes from PubMLST. A list of available species can be obtained by running:
ariba pubmlstspecies
Download the data (in this example, Staphylococcus aureus) using pubmlstget:
ariba pubmlstget "Staphylococcus aureus" get_mlst
Note that a few species have two typing schemes. For example:
- Escherichia coli#1: Achtman's seven-gene scheme
- Escherichia coli#2: Pasteur Institute's eight-gene scheme
See Issue 185 for how to make a customised MLST database.
Then run MLST using ARIBA with:
ariba run get_mlst/ref_db reads_1.fq reads_2.fq ariba_out
where reads_1.fq and reads_2.fq are the paired reads files for your sample.
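When several samples need typing, the same command can be repeated once per sample. The following is a minimal sketch that builds the ariba run command shown above for each sample. The sample names, the <sample>_1.fq / <sample>_2.fq file naming and the <sample>_ariba_out output directory names are assumptions made for illustration, not ARIBA requirements. By default it only builds the commands (a dry run) so that they can be inspected before anything is executed.

```python
import subprocess


def ariba_run_command(ref_db, reads_1, reads_2, out_dir):
    """Return the argument list for: ariba run ref_db reads_1 reads_2 out_dir."""
    return ["ariba", "run", ref_db, reads_1, reads_2, out_dir]


def run_samples(samples, ref_db="get_mlst/ref_db", dry_run=True):
    """Build (and optionally execute) one ariba run command per sample.

    Assumes reads are named <sample>_1.fq and <sample>_2.fq, which is a
    hypothetical naming convention used only for this example.
    """
    commands = []
    for sample in samples:
        cmd = ariba_run_command(
            ref_db,
            f"{sample}_1.fq",
            f"{sample}_2.fq",
            f"{sample}_ariba_out",
        )
        commands.append(cmd)
        if not dry_run:
            # check=True raises CalledProcessError if ariba exits non-zero
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Dry run: print the commands without executing them
    for cmd in run_samples(["sampleA", "sampleB"]):
        print(" ".join(cmd))
```

Passing dry_run=False would execute each command in turn and stop at the first failure.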
The two important files are mlst_report.tsv and mlst_report.details.tsv.
mlst_report.tsv is a summary of the allele calls and identified sequence type. The format is like this:
ST gene1 gene2 gene3
42 1 4 7
where in this case the sequence type is identified as 42.
A star next to any call indicates that there was some uncertainty. For example:
ST gene1 gene2 gene3
42* 1* 4 7
A star is added if any heterozygous SNPs are detected, the percent of the gene called or percent identity is less than 100, or there is more than one contig in the assembly.
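The star convention can be checked programmatically. The sketch below reads a report in the format shown above and separates each call from its star. It assumes the file is tab-delimited (as the .tsv extension suggests) with one header line followed by one line of calls. Values are kept as strings, since nothing here guarantees that every call is numeric. The function name parse_mlst_report is made up for this example and is not part of ARIBA.

```python
import csv
import io


def parse_mlst_report(handle):
    """Parse an mlst_report.tsv-style file.

    Returns a dict mapping each column name (ST and the gene names)
    to a (value, uncertain) pair, where uncertain is True if the
    call carried a trailing star.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    values = next(reader)
    calls = {}
    for name, value in zip(header, values):
        uncertain = value.endswith("*")
        calls[name] = (value.rstrip("*"), uncertain)
    return calls


# The starred example report from above, written with tab separators
example = "ST\tgene1\tgene2\tgene3\n42*\t1*\t4\t7\n"
calls = parse_mlst_report(io.StringIO(example))
for name, (value, uncertain) in calls.items():
    print(name, value, "uncertain" if uncertain else "ok")
```

For a real run, the handle would be an open file such as open("ariba_out/mlst_report.tsv").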
mlst_report.details.tsv has more details on each allele call. For example, the file corresponding to the previous report could look like this:
gene allele cov pc ctgs depth hetmin hets
gene1 1* 100.00 99.8 1 28.9 . .
gene2 4 100.00 100.0 1 45.9 . .
gene3 7 100.00 100.0 1 54.3 . .
where the columns are as follows.
- gene: the name of the gene
- allele: the allele called
- cov: percent of the gene that was assembled
- pc: percent identity between the gene and assembly
- ctgs: number of contigs in the assembly
- depth: mean read depth of the contig(s)
- hetmin: the minimum, across all identified heterozygous SNPs, of the maximum allele depth as a percent of total depth. For example, if the hets column is 30,10.25,10,5, this would be 100 * min(30/(30+10), 25/(25+10+5)) = 62.5.
- hets: a list of the heterozygous SNP depths. For example, 30,10.25,10,5 corresponds to two heterozygous SNPs, the first with read depths 30 and 10, and the second with depths 25, 10, and 5.
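The relationship between the hets and hetmin columns can be reproduced with a few lines of Python. This is a sketch based only on the column descriptions above, not ARIBA's own code. It assumes SNPs are separated by "." and depths within a SNP by ",", and that a lone "." means no heterozygous SNPs were found, as in the example table. The function names are hypothetical.

```python
def parse_hets(hets):
    """Split a hets field such as '30,10.25,10,5' into per-SNP depth lists."""
    if hets == ".":
        # A lone dot is treated as "no heterozygous SNPs"
        return []
    return [[int(depth) for depth in snp.split(",")] for snp in hets.split(".")]


def hetmin(hets):
    """Return min over SNPs of (max allele depth / total depth), as a percent.

    Returns None when there are no heterozygous SNPs.
    """
    snps = parse_hets(hets)
    if not snps:
        return None
    return 100 * min(max(depths) / sum(depths) for depths in snps)


print(parse_hets("30,10.25,10,5"))  # [[30, 10], [25, 10, 5]]
print(hetmin("30,10.25,10,5"))      # 62.5
```

A low hetmin means that at least one heterozygous SNP has its most common allele supported by only a small majority of reads, which is one reason a call may be worth inspecting by hand.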
All other output files are as described in the run page.