add pipeline to generate, format, filter SILVA database #29
If using NR, the dereplicate step will likely not do anything, unless users are working with the full database or making use of
About the SILVA rep tree: I think we can simply filter the curated SILVA alignment based on the seqIDs that survive processing through RESCRIPt, then run q2-phylogeny on that alignment after masking. Another route would be to manually extract the tree from ARB and prune it based on the remaining seqIDs, just to be consistent. My preference would be to build the tree ourselves, so that the process is automated.
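A minimal sketch of the "filter the curated alignment by surviving seqIDs" idea, assuming the alignment is available as FASTA text and the surviving IDs are a plain list. All function names are hypothetical, and the gap-fraction mask is a deliberate simplification; the masking step used before q2-phylogeny may apply different or additional criteria.

```python
"""Hypothetical helpers: subset a curated alignment to the IDs that survived
processing, then apply a simple gap-fraction column mask before tree building."""
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA lines into {seq_id: sequence}; the ID is the first header token."""
    records: Dict[str, str] = {}
    current: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                records[current] = "".join(chunks)
            current = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if current is not None:
        records[current] = "".join(chunks)
    return records


def filter_alignment(aln: Dict[str, str], keep_ids: Iterable[str]) -> Dict[str, str]:
    """Keep only aligned records whose IDs are in keep_ids."""
    keep = set(keep_ids)
    return {sid: seq for sid, seq in aln.items() if sid in keep}


def mask_gappy_columns(aln: Dict[str, str], max_gap_frac: float = 0.5) -> Dict[str, str]:
    """Drop columns where the fraction of gap characters ('-' or '.') exceeds max_gap_frac."""
    if not aln:
        return {}
    seqs = list(aln.values())
    width = len(seqs[0])
    if any(len(s) != width for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    n = len(seqs)
    keep_cols = [
        col for col in range(width)
        if sum(s[col] in "-." for s in seqs) / n <= max_gap_frac
    ]
    return {sid: "".join(seq[c] for c in keep_cols) for sid, seq in aln.items()}


if __name__ == "__main__":
    fasta = [">A1 Bacteria", "AC-GT", ">B2 Archaea", "A--GT", ">C3 Bacteria", "ACTGT"]
    survivors = ["A1", "B2"]  # e.g., IDs remaining after the RESCRIPt steps
    masked = mask_gappy_columns(filter_alignment(read_fasta(fasta), survivors))
    print(masked)  # {'A1': 'ACGT', 'B2': 'A-GT'}
```

Subsetting before masking matters: a column that is informative only for dropped sequences becomes all-gap in the subset and gets removed.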
I like the idea of pruning the existing alignment.
re: derep, good point. Out of curiosity, how does SILVA dereplicate the taxonomy in the NR database? Do they apply any consensus or majority rule? This issue is for making a Q2 pipeline. We should make another, larger pipeline (using Snakemake, or just a shell script) to build the full formatted release, including databases for V4 and maybe other subregions.
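To make the consensus question concrete, here is a hedged sketch of exact-sequence dereplication with two possible taxonomy rules: LCA (keep ranks only while all collapsed sequences agree) and a rank-wise majority rule. This is not SILVA's procedure or RESCRIPt's implementation; the function names, the representative-ID choice, and the 0.5 threshold are all assumptions for illustration. Because SILVA NR99 is already clustered at 99% identity, most groups here would be singletons, which is why dereplication does little on NR.

```python
"""Hypothetical sketch of exact-sequence dereplication with two consensus rules
for the taxonomy of collapsed sequences (LCA and rank-wise majority)."""
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Lineage = List[str]


def lca(lineages: List[Lineage]) -> Lineage:
    """Keep ranks while every lineage agrees; stop at the first disagreement."""
    consensus: Lineage = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        consensus.append(ranks[0])
    return consensus


def majority(lineages: List[Lineage], threshold: float = 0.5) -> Lineage:
    """Walk down the ranks, keeping the most common label while it is carried by
    more than `threshold` of all lineages in the group."""
    total = len(lineages)
    pool = lineages
    consensus: Lineage = []
    depth = 0
    while True:
        labels = [lin[depth] for lin in pool if len(lin) > depth]
        if not labels:
            break
        label, count = Counter(labels).most_common(1)[0]
        if count / total <= threshold:
            break
        consensus.append(label)
        # Only lineages consistent with the chosen label continue to the next rank.
        pool = [lin for lin in pool if len(lin) > depth and lin[depth] == label]
        depth += 1
    return consensus


def dereplicate(seqs: Dict[str, str], taxa: Dict[str, str],
                mode: str = "majority") -> Tuple[Dict[str, str], Dict[str, str]]:
    """Collapse identical sequences; the lexicographically smallest ID represents each group."""
    rules = {"lca": lca, "majority": majority}
    if mode not in rules:
        raise ValueError(f"unknown mode: {mode!r}")
    groups: Dict[str, List[str]] = defaultdict(list)
    for sid, seq in seqs.items():
        groups[seq.upper()].append(sid)
    rep_seqs: Dict[str, str] = {}
    rep_taxa: Dict[str, str] = {}
    for seq, ids in groups.items():
        rep = min(ids)
        lineages = [[r.strip() for r in taxa[i].split(";")] for i in ids]
        rep_seqs[rep] = seq
        rep_taxa[rep] = "; ".join(rules[mode](lineages))
    return rep_seqs, rep_taxa
```

With three identical sequences labeled `g__Bacillus`, `g__Bacillus`, and `g__Paenibacillus`, the majority rule keeps the genus (2 of 3) while LCA stops at the shared phylum, which is the trade-off behind the "one or more modes" question in the step list.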
I'm not sure specifically how they derive the consensus taxonomy, but there is indeed quite a bit of manual curation involved. See here.
I suppose we could consider making a separate pipeline (or optional steps to insert at step 5) to generate an amplicon-region-specific reference?
Yeah, maybe we could input a list of primer pairs, and the pipeline could (optionally) generate amplicon-specific references for each pair.
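A hedged sketch of the "list of primer pairs in, one amplicon-specific reference per pair out" idea. It assumes exact matching with IUPAC degeneracy (no mismatches allowed), excludes the primers from the output, and converts U to T because SILVA distributes RNA sequences. The function names are hypothetical and the primers in the tests are toy sequences, not real V4 primers; an actual pipeline might instead delegate this step to an existing tool such as q2-feature-classifier's extract-reads.

```python
"""Hypothetical sketch: cut amplicon-specific references out of full-length
sequences for a list of primer pairs. Exact matching with IUPAC degeneracy only;
real primer-trimming tools typically also tolerate mismatches."""
import re
from typing import Dict, Optional, Tuple

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def revcomp(seq: str) -> str:
    """Reverse complement, preserving IUPAC ambiguity codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_pattern(primer: str) -> "re.Pattern[str]":
    """Compile a primer into a regex where each degenerate base becomes a character class."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def extract_amplicon(seq: str, fwd: str, rev: str) -> Optional[str]:
    """Return the region between the forward primer and the reverse complement of the
    reverse primer (primers excluded), or None if either site is missing."""
    seq = seq.upper().replace("U", "T")  # SILVA sequences are RNA (U rather than T)
    f = primer_pattern(fwd).search(seq)
    if f is None:
        return None
    # Look for the reverse site only downstream of the forward primer.
    r = primer_pattern(revcomp(rev)).search(seq, f.end())
    if r is None:
        return None
    return seq[f.end():r.start()]


def build_references(seqs: Dict[str, str],
                     primer_pairs: Dict[str, Tuple[str, str]]) -> Dict[str, Dict[str, str]]:
    """For each named primer pair, map seq_id -> amplicon, skipping sequences without both sites."""
    refs: Dict[str, Dict[str, str]] = {}
    for name, (fwd, rev) in primer_pairs.items():
        refs[name] = {}
        for sid, seq in seqs.items():
            amp = extract_amplicon(seq, fwd, rev)
            if amp is not None:
                refs[name][sid] = amp
    return refs
```

Keying the output by primer-pair name keeps the "for each pair" requirement explicit and makes the step easy to skip when no pairs are supplied.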
inputs:
steps:
- parse_silva_taxonomy (? any way to make auto-download optional?)
- screen_seqs
- filter_seqs_length_by_taxon
- dereplicate (question: use one or more modes for taxonomy derep?)
- evaluate_taxonomy and cross_validate (with kfold disabled)
outputs:
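For discussion of the screen_seqs and filter_seqs_length_by_taxon steps, a minimal sketch of what each might check, assuming sequences and taxonomy strings are held in plain dicts. This is not RESCRIPt's implementation: the degenerate-base and homopolymer limits and the per-taxon minimum lengths are illustrative placeholders, and all function names are hypothetical.

```python
"""Hypothetical sketch of two filtering ideas from the step list: screen out
sequences with many degenerate bases or long homopolymers, then apply a
per-taxon minimum length. Thresholds are illustrative, not project defaults."""
from itertools import groupby
from typing import Dict, Mapping

DEGENERATE = set("RYSWKMBDHVN")


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated character."""
    return max((len(list(run)) for _, run in groupby(seq)), default=0)


def passes_screen(seq: str, degenerate_limit: int = 5, homopolymer_limit: int = 8) -> bool:
    """Keep a sequence only if it has fewer than degenerate_limit ambiguous bases
    and no homopolymer run of homopolymer_limit or longer."""
    seq = seq.upper()
    n_degen = sum(base in DEGENERATE for base in seq)
    return n_degen < degenerate_limit and longest_homopolymer(seq) < homopolymer_limit


def passes_length(seq: str, lineage: str, min_lens: Mapping[str, int],
                  default_min: int = 0) -> bool:
    """Apply the minimum length of the most specific lineage label found in min_lens."""
    for label in reversed([r.strip() for r in lineage.split(";")]):
        if label in min_lens:
            return len(seq) >= min_lens[label]
    return len(seq) >= default_min


def filter_references(seqs: Dict[str, str], taxa: Dict[str, str],
                      min_lens: Mapping[str, int]) -> Dict[str, str]:
    """Run the screen first, then the taxon-aware length filter."""
    return {
        sid: seq for sid, seq in seqs.items()
        if passes_screen(seq) and passes_length(seq, taxa[sid], min_lens)
    }
```

Checking the most specific matching label first lets a genus-level exception override a domain-wide minimum, which is the main reason to filter by taxon rather than with one global length cutoff.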