Clarification: GTDB mixed orientation warning only applies to full-length refs? #16
Howdy! Yes, that's correct: extract reads will orient the sequences (as long as the primers hit the forward or reverse-complement sequences). Would you like to modify this warning to clarify? Personally, I don't see much harm in keeping the "experimental" label (since, to my knowledge, we have not tested the GTDB bespoke weights extensively), but it would be good to clarify. Another future option (for the full-length seqs) would be to use RESCRIPt to re-orient the reads.
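To make the point about primers concrete: if the forward primer only matches after a sequence is reverse-complemented, that sequence was stored in the opposite orientation and can simply be flipped. The Python sketch below illustrates that logic with exact, IUPAC-aware primer matching. The `orient` helper and the example primer are hypothetical illustrations, and this is not the extract reads implementation.

```python
import re
from typing import Optional

# IUPAC nucleotide codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    # Assumes an A/C/G/T-only alphabet; other symbols pass through uncomplemented.
    return seq.translate(COMPLEMENT)[::-1]


def primer_pattern(primer: str):
    """Compile a (possibly degenerate) primer into a regex."""
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def orient(seq: str, fwd_primer: str) -> Optional[str]:
    """Return seq in the primer's orientation, or None if the primer hits neither strand."""
    pattern = primer_pattern(fwd_primer)
    if pattern.search(seq):
        return seq  # already in the forward orientation
    rc = revcomp(seq)
    if pattern.search(rc):
        return rc  # stored reverse-complemented; flip it
    return None  # primer not found on either strand
```

A sequence where the primer hits neither strand is returned as `None` rather than guessed, which mirrors the caveat that orientation is only fixed "as long as the primers hit".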
Sounds good; I added an extra line to this warning in a PR. As for using RESCRIPt to fix the full-length reads, would that need another set of reference reads to align against? If so, that may need some benchmarking to fine-tune the alignment parameters, right?
I agree, orienting everything in the same direction might need a little testing to establish a working protocol, but one could use fairly loose %id and coverage settings to re-align against a small reference db of sequences in a known orientation. I am not sure I would call it benchmarking per se. In my opinion, a database containing both orientations might actually require more benchmarking than orienting all sequences in the same direction, since mixed orientations could change classifier performance.
Gotcha! I didn't realize that would change classifier performance. What would you reckon is a good starting database and %id? Something like 65% GG at 65% coverage?
Yeah, that sounds reasonable. I think that %id is approximately what deblur uses for pre-filtering reads, so maybe we can use that as a precedent.
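The re-orientation idea discussed here (compare each sequence in both orientations against a small reference set of known orientation, and only accept it above loose similarity cutoffs) can be sketched in Python. This is a toy under stated assumptions: it uses the fraction of shared k-mers as a cheap stand-in for the alignment %id/coverage settings, it is not meant to reproduce RESCRIPt, and `orient_by_reference`, `k=8` and `min_fraction=0.5` are hypothetical choices. A k-mer fraction is not directly comparable to a 65% alignment identity, so the cutoff would need its own tuning.

```python
from typing import Iterable, List, Optional, Set

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    # Assumes an A/C/G/T-only alphabet; other symbols pass through uncomplemented.
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> Set[str]:
    """All overlapping k-mers of seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def best_fraction(query: str, ref_kmers: List[Set[str]], k: int) -> float:
    """Largest fraction of the query's k-mers shared with any single reference."""
    q = kmers(query, k)
    if not q:
        return 0.0
    return max((len(q & r) / len(q) for r in ref_kmers), default=0.0)


def orient_by_reference(
    query: str, refs: Iterable[str], k: int = 8, min_fraction: float = 0.5
) -> Optional[str]:
    """Return query in the references' orientation, or None if it matches too poorly."""
    # Hypothetical helper: k and min_fraction are illustrative defaults, not tuned values.
    ref_kmers = [kmers(r, k) for r in refs]
    fwd = best_fraction(query, ref_kmers, k)
    rev = best_fraction(revcomp(query), ref_kmers, k)
    if max(fwd, rev) < min_fraction:
        return None  # too dissimilar to call an orientation
    return query if fwd >= rev else revcomp(query)
```

Because all references share one known orientation, only the relative score of the two strands matters for the flip; the threshold just guards against sequences too divergent to call either way.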
Howdy!
I noticed the disclaimer on the GTDB data page:
Which is of course a valid concern, but this only applies to the full-length refs, right? Since the V4 ones go through the extract reads process initially, which corrects these mixed orientations (as per @nbokulich's note here). Or... does extract reads' --p-read-orientation both only search both orientations without actually correcting the reads in the output? Thanks!