When a region is left uncovered with zero reads (especially by an amplicon dropout), and that region contains the site of a single SNV that defines a viral variant, the model appears to assume a 50/50 split between the ref and alt alleles. Setting a minimum coverage value appears to ameliorate this. The likely cause is an unexpected assumption, made when min coverage is set to 0, that the uncovered base is evenly split between the known alt alleles and ref.
Is this the desired behavior?
Hi @michael-weinstein,
Sorry for the delay! The core model used for inference actually fully ignores sites with zero reads, rather than assuming a 50/50 split. As you mention, if you don't account for this using the --depthcutoff option, you'll get a 50/50 split between lineages of equal probability, which can potentially be misleading. This is the desired behavior, although a bit confusing when interpreting the data without additional context.
The "ideal" workflow in this case is to explicitly account for these non-covered sites using --depthcutoff (you can just set it to 1 so that sites with low but nonzero depth are included). This will return a secondary output yml file that describes each of these degenerate groupings and their associated MRCA.
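To make the idea of a "degenerate grouping" concrete, here is a toy sketch in Python. It is not the tool's actual code: the function name `group_indistinguishable`, the lineage names, the mutation labels, and the depths are all made up for illustration. It only shows why lineages that differ solely at an uncovered site cannot be told apart once that site is dropped.

```python
from collections import defaultdict

def group_indistinguishable(barcodes, depths, depth_cutoff=1):
    """Group lineages whose defining mutations are identical at every
    site with depth >= depth_cutoff (hypothetical helper, not a real API)."""
    # Keep only the sites that have enough reads to be informative.
    covered = sorted(site for site, depth in depths.items() if depth >= depth_cutoff)
    groups = defaultdict(list)
    for lineage, mutations in barcodes.items():
        # A lineage's signature is its presence/absence pattern over covered sites.
        signature = tuple(site in mutations for site in covered)
        groups[signature].append(lineage)
    return sorted(sorted(members) for members in groups.values())

# Made-up example: "A.1" differs from "A" only at G200A, which has zero reads.
barcodes = {
    "A": {"C100T"},
    "A.1": {"C100T", "G200A"},
    "B": {"T300C"},
}
depths = {"C100T": 50, "G200A": 0, "T300C": 12}

groups = group_indistinguishable(barcodes, depths)
print(groups)
```

In this toy data, "A" and "A.1" fall into the same group because the only site separating them has no coverage, so any split between them would be arbitrary rather than supported by reads.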
That makes sense, although as you say, it can be confusing/misleading. BTW, I am building this into a pipeline for looking at wastewater variant detection. Is this something you are interested in?