dada2 doesn't recommend merging and then denoising, because the quality scores reported by the sequencer have a different relationship to the actual error rate than the quality scores produced by merging algorithms. I can see this quite clearly in a dataset where I run learnErrors separately on reads that were merged with vsearch versus reads that could not be merged and were instead concatenated (the second category is also biased toward more errors, since it can include reads that should have merged but didn't, but that is worth exploring too...)
Perhaps we could use dada2's learnErrors function on different subsets of the data simply to correct the Q scores in a fastq, then pool the subsets for a single denoising run. I considered denoising the subsets separately, but there may be cases where reads from different subsets should be pooled into a single ASV. This could be fairly simple: use the loess error model to remap each quality score and write out a corrected fastq.
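To make the idea concrete, here is a rough sketch of the remap-and-rewrite step. It's in Python rather than R, and the `model` dict is just a toy stand-in for the loess fit that learnErrors produces (it is not dada2's actual API): the point is only that a fitted nominal-Q-to-observed-error-rate mapping is enough to rewrite the quality string of every record.

```python
import math

def q_to_p(q):
    """Nominal Phred Q -> error probability."""
    return 10 ** (-q / 10)

def p_to_q(p):
    """Error probability -> Phred Q, floored at 1e-5 and capped at 41."""
    return min(41, int(round(-10 * math.log10(max(p, 1e-5)))))

def recalibrate(quals, model):
    """Map each nominal Q through the subset's fitted error model.

    `model` is a dict {nominal_q: observed_error_rate} -- a hypothetical
    stand-in for the loess curve from learnErrors, not dada2's real
    interface.  Scores absent from the model pass through unchanged.
    """
    return [p_to_q(model.get(q, q_to_p(q))) for q in quals]

def rewrite_fastq(in_path, out_path, model, offset=33):
    """Re-write a FASTQ file with recalibrated quality strings."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        while True:
            header = fin.readline()
            if not header:
                break
            seq = fin.readline()
            plus = fin.readline()
            qual = fin.readline().rstrip("\n")
            nominal = [ord(c) - offset for c in qual]
            corrected = recalibrate(nominal, model)
            fout.write(header + seq + plus)
            fout.write("".join(chr(q + offset) for q in corrected) + "\n")
```

After running this per subset (each with its own fitted model), the corrected fastqs could be concatenated and fed to a single dada call, so reads from different subsets can still collapse into one ASV.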
This workflow could be useful in edge cases like fungal ITS, where there is so much length variation. We don't want to use only the forward reads, because we'd lose the information in the reverse reads; we don't want to run dada2 separately on forward and reverse reads as currently recommended, because in my experience that has produced artificial chimeras; and we don't want to merge reads and discard everything that fails to merge, because many of the unmerged reads may simply be too long to have significant overlap.