Brendan Daisley edited this page Mar 11, 2024 · 1 revision

Description:

This function performs taxonomic classification by searching query Sanger sequences against a specified database. It takes the CSV output of the isoQC step as input, extracts the query sequences in FASTA format, and globally aligns them against the database with the Needleman-Wunsch algorithm by wrapping the --usearch_global command implemented in VSEARCH. Default taxonomic rank cutoffs for 16S rRNA gene sequences are based on Yarza et al. 2014, Nat Rev Microbiol.
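
As a rough illustration of what wrapping --usearch_global can look like, the sketch below assembles a VSEARCH argument vector in R. The quick_search switch mirrors the documented behaviour (--maxaccepts 100 --maxrejects 100 versus --maxaccepts 0 --maxrejects 0). Everything else is an assumption made for illustration: the helper name build_vsearch_args, the file names, the --id threshold of 0.7, and the choice of --blast6out for output. The arguments that isoTAX actually passes to VSEARCH may differ.

```r
# Hypothetical helper, not part of the package: builds the argument vector
# for a VSEARCH global-alignment search.
build_vsearch_args <- function(query_fasta, db_fasta, out_file,
                               quick_search, iddef = 2, min_id = 0.7) {
  # Quick search caps accepted and rejected targets at 100 each;
  # 0 removes both caps so the whole database is searched.
  limit <- if (quick_search) 100 else 0
  c("--usearch_global", query_fasta,
    "--db", db_fasta,
    "--id", as.character(min_id),        # assumed minimum identity
    "--iddef", as.character(iddef),
    "--maxaccepts", as.character(limit),
    "--maxrejects", as.character(limit),
    "--blast6out", out_file)             # assumed output format
}

# Print the command line for a comprehensive search (file names are placeholders).
args <- build_vsearch_args("query.fasta", "db.fasta", "hits.tsv",
                           quick_search = FALSE)
cat("vsearch", args, "\n")
```

The vector could be handed to system2("vsearch", args) on a machine where VSEARCH is installed; the sketch only prints it so that it runs anywhere.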

Usage:

isoTAX(
  input = NULL,
  export_html = TRUE,
  export_csv = TRUE,
  quick_search = TRUE,
  db = "16S",
  db_path = NULL,
  iddef = 2,
  phylum_cutoff = 75,
  class_cutoff = 78.5,
  order_cutoff = 82,
  family_cutoff = 86.5,
  genus_cutoff = 96.5,
  species_cutoff = 98.7)
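
One way to read the default cutoffs in the usage above is as a ladder: a hit is resolved to the finest rank whose cutoff its percent identity meets. The sketch below illustrates that reading. The helper name finest_rank, the "unclassified" label, and the use of ">=" at the boundaries are assumptions, and the package's internal logic may differ.

```r
# Default cutoffs copied from the usage above, ordered from finest to coarsest rank.
cutoffs <- c(species = 98.7, genus = 96.5, family = 86.5,
             order = 82, class = 78.5, phylum = 75)

# Hypothetical helper: return the finest rank whose cutoff the identity meets.
# Treating the boundary as ">=" is an assumption.
finest_rank <- function(pct_id, cutoffs) {
  met <- names(cutoffs)[pct_id >= cutoffs]
  if (length(met) == 0) "unclassified" else met[1]
}

finest_rank(99.1, cutoffs)  # "species"
finest_rank(97.0, cutoffs)  # "genus"
finest_rank(80.0, cutoffs)  # "class"
```

Under this reading, raising a cutoff makes the corresponding rank harder to reach, so borderline hits fall back to the next coarser rank.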

Arguments:

Parameter Description
input Path to the CSV output file from the isoQC step.
export_html (Default=TRUE) Output the results as an HTML file.
export_csv (Default=TRUE) Output the results as a CSV file.
quick_search (Default=FALSE) Whether to perform a quick search instead of a comprehensive database search (i.e. optimal global alignment). If TRUE, performs a quick search equivalent to setting the VSEARCH parameters "--maxaccepts 100 --maxrejects 100". If FALSE, performs a comprehensive search equivalent to setting the VSEARCH parameters "--maxaccepts 0 --maxrejects 0".
db (Default="16S") Database to search. Options are "16S" (the NCBI RefSeq Targeted Loci 16S rRNA database) and "ITS" (the NCBI RefSeq Targeted Loci ITS database). For a combined database, in cases where input sequences are derived from both bacteria and fungi, select "16S|ITS". Setting 'db' to anything other than NULL or "custom" causes the 'db_path' parameter to be ignored.
db_path (Default=NULL) Path to a FASTA-formatted database sequence file. Ignored if the 'db' parameter is set to anything other than NULL or "custom".
iddef (Default=2, recommended for highest taxonomic accuracy) Pairwise identity definition, as defined by VSEARCH: (0) CD-HIT definition: (matching columns) / (shortest sequence length). (1) Edit distance: (matching columns) / (alignment length). (2) Edit distance excluding terminal gaps: (matching columns) / (alignment length - terminal gaps); this is the VSEARCH default definition. (3) Marine Biological Lab definition, counting each gap opening (internal or terminal) as a single mismatch, whether or not the gap was extended: 1.0 - ((mismatches + gap openings) / (longest sequence length)). (4) BLAST definition, equivalent to --iddef 1 for global pairwise alignments.
phylum_cutoff (Default=75) Percent identity cutoff for phylum rank demarcation.
class_cutoff (Default=78.5) Percent identity cutoff for class rank demarcation.
order_cutoff (Default=82) Percent identity cutoff for order rank demarcation.
family_cutoff (Default=86.5) Percent identity cutoff for family rank demarcation.
genus_cutoff (Default=96.5) Percent identity cutoff for genus rank demarcation.
species_cutoff (Default=98.7) Percent identity cutoff for species rank demarcation.
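
The identity definitions listed for 'iddef' can give quite different values for the same alignment, which matters when a Sanger read covers only part of a reference sequence. The sketch below computes each definition from alignment summary counts. The helper name pairwise_identity and the example counts are invented for illustration. VSEARCH computes these values internally, so this only mirrors the formulas as stated.

```r
# Hypothetical helper: percent identity from alignment summary counts,
# following the five definitions listed for 'iddef'.
pairwise_identity <- function(iddef, matches, mismatches, aln_length,
                              terminal_gaps, gap_openings,
                              shortest_len, longest_len) {
  frac <- switch(as.character(iddef),
    "0" = matches / shortest_len,                        # CD-HIT
    "1" = matches / aln_length,                          # edit distance
    "2" = matches / (aln_length - terminal_gaps),        # excluding terminal gaps
    "3" = 1 - (mismatches + gap_openings) / longest_len, # Marine Biological Lab
    "4" = matches / aln_length,                          # BLAST, same as 1 here
    stop("iddef must be one of 0, 1, 2, 3, 4"))
  100 * frac
}

# Invented example: a 1400 bp read aligned to a 1500 bp reference, with
# 1386 matches, 14 mismatches, and one terminal gap spanning 100 columns.
ids <- sapply(0:4, function(d) {
  pairwise_identity(d, matches = 1386, mismatches = 14, aln_length = 1500,
                    terminal_gaps = 100, gap_openings = 1,
                    shortest_len = 1400, longest_len = 1500)
})
round(ids, 1)  # 99.0 92.4 99.0 99.0 92.4
```

In this example, definitions 1 and 4 penalise the unaligned reference overhang and fall below the default genus cutoff, while definition 2 reports 99%.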