TBtypeR

TBtypeR is an R package for accurate and sensitive quantification of mixtures of M. tuberculosis (MTB) strains from whole genome sequencing data. TBtypeR excels at detecting low-frequency mixed infections that other tools struggle to detect, maintaining a sensitivity of ~85% for minor strain frequencies of 2.5% and ~55% for minor strain frequencies of 1%. TBtypeR is implemented as a standalone R package and as part of an end-to-end Nextflow pipeline, TBtypeNF.

Performance

Extensive benchmarking of TBtypeR against available tools for MTB mixture detection is detailed in our preprint on medRxiv. TBtypeR has the highest accuracy in predicting minor strain fractions, whereas other tools are unable to accurately detect or quantify mixtures below 5%.

The Nextflow Pipeline: TBtypeNF

TBtypeNF is a Nextflow pipeline for TBtypeR that takes FASTQ files as input and performs FASTQ preprocessing with fastp, read alignment with BWA-MEM, variant calling with BCFtools, and quality control report generation using SAMtools and mosdepth. Variant calls from BCFtools are then passed to TBtypeR to identify MTBC lineages and mixtures. The output is an HTML report with the detected MTBC strains and mixture frequencies.

TBtypeNF requires a sample manifest in TSV format with the column names “sample”, “fastq1” and “fastq2”; see the example manifest downloaded in the usage example.
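
A manifest can be written by hand or generated from a directory of paired FASTQ files. The sketch below shows one way to do the latter. It assumes paired files named `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz`, a naming convention chosen here for illustration and not something TBtypeNF requires. The `make_manifest` helper is hypothetical and not part of the pipeline.

```shell
# Hypothetical helper: build a TBtypeNF manifest from a directory of paired FASTQs.
# Assumes files are named <sample>_R1.fastq.gz / <sample>_R2.fastq.gz (illustrative only).
make_manifest() {
  fastq_dir=$1
  # Header row with the three required column names
  printf 'sample\tfastq1\tfastq2\n'
  for r1 in "$fastq_dir"/*_R1.fastq.gz; do
    [ -e "$r1" ] || continue                       # no matching files: header only
    r2=${r1%_R1.fastq.gz}_R2.fastq.gz              # expected path of the mate file
    [ -e "$r2" ] || { echo "missing mate for $r1" >&2; continue; }
    sample=$(basename "$r1" _R1.fastq.gz)
    printf '%s\t%s\t%s\n' "$sample" "$r1" "$r2"
  done
}

# Demo on empty placeholder files in a temporary directory
demo_dir=$(mktemp -d)
touch "$demo_dir/S1_R1.fastq.gz" "$demo_dir/S1_R2.fastq.gz"
make_manifest "$demo_dir" > "$demo_dir/my_manifest.tsv"
cat "$demo_dir/my_manifest.tsv"
```

The resulting file has one header row followed by one row per sample, and can be passed to the pipeline with `--manifest`.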

Requirements

Nextflow (≥ 22.03.0)
Singularity (Apptainer) or Docker

Usage Example

# download example manifest
wget https://raw.githubusercontent.com/bahlolab/TBtypeR/main/TBtypeNF/resources/lung_example_manifest.tsv -O my_manifest.tsv
# run the nextflow pipeline
nextflow run bahlolab/TBtypeR/TBtypeNF/main.nf -r main -profile singularity --manifest my_manifest.tsv

TBtypeNF Parameters

Parameter	Description	Default Value
manifest	Input sample manifest	null
id	Run identifier, for naming output files	'TBtypeNF-run'
outdir	Output files directory	'output'
publish_bams	Save BAM files to output directory	false
fast	Run FastTBtypeNF workflow	false
max_mix	Maximum number of strains in a mixture to be detected	3
min_mix_prop	Minimum mixture proportion to be detected	0.005
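
Parameters are overridden on the command line with `--<parameter> <value>`, in the same way `--manifest` is supplied in the usage example. As a minimal sketch, the hypothetical wrapper below (`tbtypenf_cmd` is not part of TBtypeNF) only prints the assembled command so it can be inspected before running; the chosen values are illustrative.

```shell
# Hypothetical wrapper: print (do not run) a TBtypeNF command with overridden parameters.
tbtypenf_cmd() {
  manifest=$1
  max_mix=$2
  min_mix_prop=$3
  printf 'nextflow run bahlolab/TBtypeR/TBtypeNF/main.nf -r main -profile singularity'
  printf ' --manifest %s --max_mix %s --min_mix_prop %s\n' "$manifest" "$max_mix" "$min_mix_prop"
}

# Allow at most two strains per mixture, with a minimum mixture proportion of 0.01
tbtypenf_cmd my_manifest.tsv 2 0.01
```

Copy the printed line into a terminal, or pipe it to `sh`, to launch the run.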

FastTBtypeNF

A faster workflow, which skips alignment and quality control reporting, is implemented by using Fastlin to generate allele counts for use by TBtypeR. This is generally 10x faster to run and gives near-identical results. To use this mode, supply the --fast parameter on the command line, e.g.:

nextflow run bahlolab/TBtypeR/TBtypeNF/main.nf -r main -profile singularity --manifest my_manifest.tsv --fast

The R package: TBtypeR

The easiest way to use TBtypeR is through the TBtypeNF pipeline. However, additional parameters and customisation are available by using the R package directly. TBtypeR can be installed with devtools as follows:

devtools::install_github("bahlolab/TBtypeR")

Data Requirements

The sensitivity of TBtypeR to detect low-frequency mixtures is dependent on sequencing coverage. Based on our testing, we recommend an average coverage of at least 20x to detect mixtures of 5% and above, at least 40x for mixtures down to 2.5%, and at least 60x for mixtures down to 1%.
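
The coverage guidance above can be expressed as a simple lookup, which may be handy when screening many samples. This is a sketch only: `min_supported_fraction` is a hypothetical helper, not part of TBtypeR or TBtypeNF, and the mean coverage is assumed to come from an existing QC report (for example, mosdepth or SAMtools output).

```shell
# Hypothetical helper encoding the coverage guidance as a lookup.
# Input: mean coverage (may be fractional).
# Output: the lowest mixture fraction the guidance supports at that coverage.
min_supported_fraction() {
  awk -v cov="$1" 'BEGIN {
    if      (cov >= 60) print "0.01"
    else if (cov >= 40) print "0.025"
    else if (cov >= 20) print "0.05"
    else                print "below-recommended-coverage"
  }'
}

min_supported_fraction 45.3
```

For a sample with 45.3x mean coverage this prints 0.025, meaning the guidance supports detection of mixtures down to 2.5% but not down to 1%.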

It is recommended to use either TBtypeNF or BCFtools call to generate VCF files for TBtypeR. VCF files generated by other software may be compatible if the following conditions are met:

Reference Genome: TBtypeR expects variants to be called against the H37Rv reference genome. The chromosome must be named either “AL123456.3” or “NC_000962.3”.
AD Format Field: TBtypeR requires the allelic depth to be stored in the VCF format field AD, consistent with BCFtools call output.
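
The two conditions above can be checked before running TBtypeR. The following is a minimal sketch using only standard shell tools; `check_vcf_for_tbtyper` is a hypothetical helper, not part of TBtypeR. It only confirms that AD is declared in the VCF header and that every record uses an accepted chromosome name; it does not verify that each record actually carries AD values.

```shell
# Hypothetical pre-flight check for the two conditions above (not part of TBtypeR).
# Accepts a plain-text or gzip-compressed VCF; returns non-zero if a condition fails.
check_vcf_for_tbtyper() {
  vcf=$1
  case $vcf in
    *.gz) gzip -cd "$vcf" ;;
    *)    cat "$vcf" ;;
  esac | awk -F '\t' '
    /^##FORMAT=<ID=AD,/ { has_ad = 1 }      # AD declared in the header
    /^#/ { next }                            # skip all header lines
    $1 != "AL123456.3" && $1 != "NC_000962.3" { bad_chrom = 1 }
    END {
      if (!has_ad)   print "FAIL: AD FORMAT field not declared in header"
      if (bad_chrom) print "FAIL: chromosome is not AL123456.3 or NC_000962.3"
      if (!has_ad || bad_chrom) exit 1
      print "OK"
    }'
}

# Demo on a tiny hand-written VCF with one record
demo_vcf=$(mktemp)
{
  printf '##fileformat=VCFv4.2\n'
  printf '##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths">\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n'
  printf 'NC_000962.3\t1000\t.\tA\tG\t50\tPASS\t.\tGT:AD\t0/1:30,5\n'
} > "$demo_vcf"
check_vcf_for_tbtyper "$demo_vcf"
```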

Example usage of TBtypeR:

library(tidyverse)
library(TBtypeR)

# replace with path to your VCF file
vcf_filename <- system.file('vcf/example.vcf.gz', package = 'TBtypeR')

tbtype_result <- 
  # generate TBtypeR results
  tbtype(vcf = vcf_filename) %>% 
  # filter TBtypeR results
  filter_tbtype(max_phylotypes = 3) %>%
  # unnest data so there is 1 row per identified Mtb strain in each sample
  unnest_mixtures()

tbtype_result %>% 
  select(sample_id, n_phy, mix_phylotype, mix_prop) %>% 
  knitr::kable()

sample_id	n_phy	mix_phylotype	mix_prop
SRR13312530	2	4.2.1	0.8579
SRR13312530	2	4.3.3	0.1421
SRR13312531	1	4.3.3	1.0000
SRR13312533	2	4.3.3	0.9192
SRR13312533	2	4.2.1	0.0808

Visualise mixtures:

tbtype_result %>% 
  ggplot(aes(x = sample_id,
             y = mix_prop,
             fill = mix_phylotype)) +
  geom_col() +
  coord_flip() +
  labs(x = 'Sample ID',
       y = 'Mixture Proportion',
       fill = 'Sublineage') +
  theme(text = element_text(size = 6))

Detailed usage guides for the tbtype and filter_tbtype functions are available in the package documentation by running help(tbtype) or help(filter_tbtype).

Citation

TBtypeR is published as a preprint on medRxiv:

Munro, J. E., Coussens, A. K., & Bahlo, M. (2024). TBtypeR: Sensitive detection and sublineage classification of low-frequency Mycobacterium tuberculosis complex mixed infections. medRxiv, 2024-06
