create sensitive dada2 workflow #28

Open
rmcminds opened this issue Feb 2, 2023 · 0 comments
rmcminds (Contributor) commented Feb 2, 2023

The dada2 authors recommend against merging reads before denoising, because the quality scores reported by the sequencer relate to the actual error rate differently than the quality scores generated by merging algorithms do. I can see this quite clearly in a dataset where I ran learnErrors separately on reads that had been merged with vsearch versus reads that could not be merged and were instead concatenated. (The second category is also biased toward more errors simply because it can contain read pairs that should have merged but didn't; that bias is worth exploring too.)
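A minimal sketch of that comparison, assuming dada2 is installed; the file names are hypothetical placeholders for the two subsets:

```r
library(dada2)

# Hypothetical file names for the two subsets.
merged_fq <- "reads_merged_by_vsearch.fastq.gz"
concat_fq <- "reads_concatenated.fastq.gz"

# Fit a separate error model to each subset.
err_merged <- learnErrors(merged_fq, multithread = TRUE)
err_concat <- learnErrors(concat_fq, multithread = TRUE)

# Comparing these plots shows how differently the reported Q scores
# track the observed error rates in the two subsets.
plotErrors(err_merged, nominalQ = TRUE)
plotErrors(err_concat, nominalQ = TRUE)
```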

Perhaps we could run dada2's learnErrors function on different subsets of the data just to correct the Q scores in each fastq, then pool the subsets for a single denoising run. I considered simply denoising the subsets separately, but there may be cases where reads from different subsets should be pooled into a single ASV. The correction itself could be pretty simple: use the fitted loess error model to remap each quality score and rewrite the fastq, as sketched below.
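Here is a rough, untested sketch of what that remapping might look like. The file names are placeholders, and the rule for collapsing the 16 transition probabilities into a single per-Q error rate is my own assumption, not something dada2 prescribes:

```r
library(dada2)
library(ShortRead)

# Fit an error model to one subset and extract the transition matrix.
err <- learnErrors("subset.fastq.gz", multithread = TRUE)
errmat <- getErrors(err)  # 16 x 41 matrix: P(X2Y | reported Q = 0..40)

# Collapse the 12 mismatch transitions into one error probability per
# reported Q, averaged over the 4 possible true bases (an assumption).
mismatch <- substr(rownames(errmat), 1, 1) != substr(rownames(errmat), 3, 3)
p_err <- colSums(errmat[mismatch, ]) / 4
q_new <- as.integer(pmin(40, pmax(0, -10 * log10(p_err))))

# Rewrite the fastq with the remapped qualities.
fq <- readFastq("subset.fastq.gz")
qmat <- as(quality(fq), "matrix")            # reads x cycles, NA-padded
qmat[] <- q_new[pmin(qmat, 40L) + 1L]        # reported Q is the 0-based index
qstrings <- apply(qmat, 1, function(r)
  rawToChar(as.raw(r[!is.na(r)] + 33L)))     # back to Phred+33 characters
out <- ShortReadQ(sread(fq), FastqQuality(qstrings), id(fq))
writeFastq(out, "subset_requal.fastq.gz", mode = "w")
```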

This workflow could be useful in edge cases like fungal ITS, where there is substantial length variation. We don't want to use only the forward reads, because we would lose the information in the reverse reads; we don't want to run dada2 separately on forward and reverse reads as currently recommended, because in my experience that has produced artificial chimeras; and we don't want to merge reads and discard everything that fails to merge, because many of the unmerged pairs may simply come from amplicons too long to have significant overlap.
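If the per-subset Q-score correction above works, the downstream step could be as simple as pooling the rewritten fastqs into one denoising run. Again a sketch with placeholder file names, not a tested recipe:

```r
library(dada2)

# Placeholder file names: the re-qualified subsets written above.
requal_fqs <- c("merged_requal.fastq.gz", "concat_requal.fastq.gz")

# Learn a single error model on the corrected data, then denoise the
# pooled reads so ASVs can span both subsets.
err_pooled <- learnErrors(requal_fqs, multithread = TRUE)
derep_all <- derepFastq(requal_fqs)
dd <- dada(derep_all, err = err_pooled, pool = TRUE, multithread = TRUE)
```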
