isoTAX
Brendan Daisley edited this page Mar 11, 2024
This function performs taxonomic classification by searching query Sanger sequences against a specified database of interest. It takes the CSV output of the isoQC step as input, extracts the query sequences in FASTA format, and performs global alignment against the selected database with the Needleman-Wunsch algorithm by wrapping the --usearch_global command implemented in VSEARCH. Default taxonomic rank cutoffs for 16S rRNA gene sequences are based on Yarza et al. 2014, Nat Rev Microbiol.
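The search step amounts to a single VSEARCH call. The sketch below assembles such a call in R under stated assumptions: the helper `build_vsearch_args()`, its argument names, the illustrative `--id` threshold and the use of `--blast6out` for tabular output are hypothetical and not taken from the isoTAX source. Only the VSEARCH flags themselves are real options, and the `quick_search` mapping follows the parameter table on this page.

```r
# Hypothetical helper (not part of isoTAX): builds the argument vector for a
# VSEARCH global search. The flags are real VSEARCH options; how isoTAX maps
# its own arguments onto them is an assumption based on this page.
build_vsearch_args <- function(query_fasta, db_fasta, out_file,
                               quick_search, iddef = 2, min_id = 0.75) {
  # Quick search stops after 100 accepted/rejected candidates;
  # 0 disables both limits, so the whole database is searched.
  max_hits <- if (quick_search) 100 else 0
  c("--usearch_global", query_fasta,
    "--db", db_fasta,
    "--id", format(min_id),               # assumed minimum identity to report a hit
    "--iddef", as.character(iddef),       # pairwise identity definition (0-4)
    "--maxaccepts", as.character(max_hits),
    "--maxrejects", as.character(max_hits),
    "--blast6out", out_file)              # tabular (BLAST-like) hit table
}
```

Assuming `vsearch` is on the PATH, `system2("vsearch", build_vsearch_args("query.fasta", "db.fasta", "hits.b6", quick_search = FALSE))` would then run a comprehensive search.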
```r
isoTAX(
  input = NULL,
  export_html = TRUE,
  export_csv = TRUE,
  quick_search = TRUE,
  db = "16S",
  db_path = NULL,
  iddef = 2,
  phylum_cutoff = 75,
  class_cutoff = 78.5,
  order_cutoff = 82,
  family_cutoff = 86.5,
  genus_cutoff = 96.5,
  species_cutoff = 98.7)
```
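The six cutoffs are percent identity thresholds. One plausible way to apply them, sketched below, is to assign each query the most specific rank whose cutoff its best-hit identity meets or exceeds. This is an assumption about how the cutoffs are used, not isoTAX's actual implementation: `deepest_rank()` and `assign_ranks()` are hypothetical helpers, and treating a value exactly at a cutoff as passing (`>=`) is also assumed.

```r
# Default cutoffs (percent identity) copied from the isoTAX signature,
# ordered from least to most specific rank.
default_cutoffs <- c(phylum = 75, class = 78.5, order = 82,
                     family = 86.5, genus = 96.5, species = 98.7)

# Hypothetical helper: returns the most specific rank whose cutoff is met
# by pct_id, or NA if the identity is below the phylum cutoff.
deepest_rank <- function(pct_id, cutoffs = default_cutoffs) {
  met <- names(cutoffs)[pct_id >= cutoffs]
  if (length(met) == 0) NA_character_ else met[length(met)]
}

# Vectorised wrapper for several queries' best-hit identities.
assign_ranks <- function(pct_ids, cutoffs = default_cutoffs) {
  vapply(pct_ids, deepest_rank, character(1), cutoffs = cutoffs)
}
```

For example, best-hit identities of 99.1, 97, 80 and 70 would be resolved to species, genus, class and no rank, respectively.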
Parameter | Description |
---|---|
input | Path to the CSV output file from the isoQC step. |
export_html | (Default=TRUE) Output the results as an HTML file. |
export_csv | (Default=TRUE) Output the results as a CSV file. |
quick_search | (Default=FALSE) Whether to perform a quick database search instead of a comprehensive one (i.e. optimal global alignment). If TRUE, performs a quick search equivalent to setting the VSEARCH parameters "--maxaccepts 100 --maxrejects 100". If FALSE, performs a comprehensive search equivalent to setting the VSEARCH parameters "--maxaccepts 0 --maxrejects 0". |
db | (Default="16S") Select the database option(s): "16S" (for searching against the NCBI RefSeq Targeted Loci 16S rRNA database) or "ITS" (for searching against the NCBI RefSeq Targeted Loci ITS database). For a combined database, in cases where input sequences are derived from both bacteria and fungi, select "16S\|ITS". Setting 'db' to anything other than NULL or "custom" causes the 'db_path' parameter to be ignored. |
db_path | Path to a FASTA-formatted database sequence file. Ignored if the 'db' parameter is set to anything other than NULL or "custom". |
iddef | (Default=2, recommended for highest taxonomic accuracy) Pairwise identity definition, as per VSEARCH definitions. (0) CD-HIT definition: (matching columns) / (shortest sequence length). (1) Edit distance: (matching columns) / (alignment length). (2) Edit distance excluding terminal gaps (default definition). (3) Marine Biological Lab definition, counting each gap opening (internal or terminal) as a single mismatch, whether or not the gap was extended: 1.0 - ((mismatches + gap openings) / (longest sequence length)). (4) BLAST definition, equivalent to --iddef 1 for global pairwise alignments. |
phylum_cutoff | (Default=75) Percent identity cutoff for phylum rank demarcation. |
class_cutoff | (Default=78.5) Percent identity cutoff for class rank demarcation. |
order_cutoff | (Default=82) Percent identity cutoff for order rank demarcation. |
family_cutoff | (Default=86.5) Percent identity cutoff for family rank demarcation. |
genus_cutoff | (Default=96.5) Percent identity cutoff for genus rank demarcation. |
species_cutoff | (Default=98.7) Percent identity cutoff for species rank demarcation. |
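To make the `iddef` options concrete, the following sketch computes percent identity for one pre-aligned pair (gaps written as `-`) under definitions 0 to 4 as worded in the table. `pairwise_identity()` is a hypothetical illustration of the formulas, not VSEARCH code, and it assumes no alignment column is a gap in both sequences.

```r
# Hypothetical illustration of the VSEARCH --iddef formulas for two aligned
# sequences of equal length; '-' marks a gap. Returns percent identity.
pairwise_identity <- function(a, b, iddef = 2) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  stopifnot(length(x) == length(y))
  gap_x <- x == "-"
  gap_y <- y == "-"
  both <- !gap_x & !gap_y               # columns where both residues are present
  matches <- sum(both & x == y)
  len_x <- sum(!gap_x)                  # ungapped sequence lengths
  len_y <- sum(!gap_y)
  # Number of gap runs (gap openings) in a logical gap mask
  openings <- function(g) sum(diff(c(FALSE, g)) == 1)
  id <- switch(as.character(iddef),
    "0" = matches / min(len_x, len_y),
    "1" = , "4" = matches / length(x),  # (4) equals (1) for global alignments
    "2" = {
      # Drop leading/trailing columns where either sequence is a terminal gap
      start <- max(which(!gap_x)[1], which(!gap_y)[1])
      end <- min(max(which(!gap_x)), max(which(!gap_y)))
      matches / (end - start + 1)
    },
    "3" = {
      mismatches <- sum(both & x != y)
      1 - (mismatches + openings(gap_x) + openings(gap_y)) / max(len_x, len_y)
    },
    stop("iddef must be 0, 1, 2, 3 or 4"))
  100 * id
}
```

For the toy pair `"-ACGTTAGC"` and `"TACG-TACC"` (6 matching columns, one terminal and one internal gap), definitions 0 and 2 both give 75, definition 1 gives about 66.7 (6/9), and definition 3 gives 62.5 (1 - 3/8), which shows how the choice of `iddef` shifts identities relative to the rank cutoffs.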