Mask fewer sites, the mask sites include lots of false positives #384
Comments
You are definitely right about this. Many of those recommendations have outlived their usefulness and it is something @AngieHinrichs and I have been thinking about how to clean up. Briefly, a proposed solution is:
This will certainly break some stuff and we'll have to figure that out when we get there. @AngieHinrichs and @corneliusroemer what do you think? I also think it may be a good time to operationalize the branch-specific screwy site detection approach I made.
Sounds good. I'll try to get to the MAPLE-ification and matOptimize soon. Yes, for sure I expect 11083 and some others to still be problematic, at least in some major lineages, but let's find out!
As in, recode in C++ in matUtils so it runs faster than its current Pythonic 19 hours? Or just run it every week or month or so, and mask accordingly?
Cool. Thanks, Angie. Let's not bother with a recode until we decide we really like it and want to run it often. It is not clear to me that branch masking will be something we want to run more than, say, monthly-ish?
UShER's SARS-CoV-2 setup masks quite a lot of sites (around 270, I think, i.e. almost 1% of the genome) based on this VCF: https://raw.githubusercontent.com/W-L/ProblematicSites_SARS-CoV2/master/problematic_sites_sarsCov2.vcf but I think that list includes quite a lot of sites that are no longer problematic.
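To make the "around 270 sites, almost 1% of the genome" estimate easy to check, here is a minimal Python sketch that counts masked positions in a VCF of this kind. It assumes that the FILTER column (the seventh tab-separated field) carries a value such as `mask` for sites that get masked, which should be verified against the actual file. The function names and the sample rows are made up for illustration and are not copied from the real VCF.

```python
GENOME_LENGTH = 29903  # length of the SARS-CoV-2 reference (NC_045512.2)


def masked_positions(vcf_text, filter_value="mask"):
    """Return the set of 1-based positions whose FILTER column equals filter_value.

    Assumption: the VCF marks sites to mask via the FILTER column.
    """
    positions = set()
    for line in vcf_text.splitlines():
        # Skip blank lines and VCF header/meta lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 7:
            continue  # not a complete VCF record
        if fields[6] == filter_value:
            positions.add(int(fields[1]))
    return positions


def masked_fraction(positions, genome_length=GENOME_LENGTH):
    """Fraction of the genome covered by the masked positions."""
    return len(positions) / genome_length


# Illustrative rows only (made up for this sketch, not taken from the real file).
SAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "NC_045512.2\t2091\t.\tC\t.\t.\tmask\t.",
    "NC_045512.2\t11083\t.\tG\t.\t.\tcaution\t.",
    "NC_045512.2\t16887\t.\tC\t.\t.\tmask\t.",
])

if __name__ == "__main__":
    sites = masked_positions(SAMPLE_VCF)
    print(sorted(sites), f"{masked_fraction(sites):.4%}")
```

Running the same functions on the downloaded VCF text would show how many sites are actually masked and what share of the genome that is.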
The last update to that mask list was more than 3 years ago, so it's clearly no longer maintained. It might be worth transitioning away from it. Maybe turn the existing sites into branch-specific masks for old clades, but not for new, recent ones?
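One way to picture the "branch-specific masks for old clades only" idea is the rough Python sketch below. It is not how UShER or matUtils implement masking. The function names, the clade labels, and the `A100G` mutation are hypothetical placeholders chosen for illustration.

```python
import re

# Nucleotide substitutions written like "C2091T": ref base, position, alt base.
MUTATION_RE = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def mutation_position(mutation):
    """Extract the 1-based genome position from a substitution such as 'C2091T'."""
    match = MUTATION_RE.match(mutation)
    if match is None:
        raise ValueError(f"not a simple substitution: {mutation!r}")
    return int(match.group(2))


def visible_mutations(mutations, clade, legacy_mask, legacy_clades):
    """Hide legacy-masked sites only on branches that belong to old clades.

    legacy_mask:   set of positions taken from the old mask list
    legacy_clades: set of clade names for which that list still applies
                   (hypothetical; deciding which clades count is the open question)
    """
    if clade not in legacy_clades:
        # Recent clades: the old mask list is not applied at all.
        return list(mutations)
    return [m for m in mutations if mutation_position(m) not in legacy_mask]


if __name__ == "__main__":
    legacy_mask = {2091, 16887}
    legacy_clades = {"19A", "20A"}  # placeholder labels
    muts = ["C2091T", "C16887T", "A100G"]
    print(visible_mutations(muts, "KS.1.1.1", legacy_mask, legacy_clades))
    print(visible_mutations(muts, "19A", legacy_mask, legacy_clades))
```

With this kind of rule, a recent lineage keeps all of its mutations visible, while the old list still filters sites on the early branches it was built for.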
I noticed this when designating stuff within KS.1.1.1, trying to untangle what happened. Having these two sites masked really makes things more difficult to untangle: C2091T and C16887T. These are the relevant lines: