I’m encountering an issue while using Sawfish for joint calling.
In the VCF file generated, I noticed some loci appear as completely identical entries, differing from the typical multiallelic case.
Here is an example:
In this case, both entries describe a DEL with the same SVLEN (-340) and identical POS, but the GT results for individual samples differ.
What could be the cause of such duplicate entries in the VCF file, and should I filter out one of them? If so, what criteria should I use to decide which row to keep?
Best wishes
These cases typically occur when the same deletion is found within two (or more) different contextual haplotypes. Sometimes these differences can be biologically interesting, but right now they often appear as the result of sequencing and assembly noise as well. This is another aspect of scaling joint genotyping to larger cohorts (besides runtime) that will need more optimization in the future, since the rate of this phenomenon tends to increase with sample count. The filtration decision will depend on the downstream application. We'll be adding more outputs soon that make it easier to match the full assembly contig to each VCF entry, which should help in understanding these cases.
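Since the filtration decision depends on the downstream application, a useful first step may be simply listing the records that collide so they can be reviewed side by side. The following is a minimal sketch, not part of Sawfish: the function names `parse_info` and `find_duplicate_svs` are hypothetical, and the assumption that "duplicate" means identical CHROM, POS, SVTYPE, END and SVLEN is mine, based on the two records in the question. It does not decide which row to keep.

```python
from collections import defaultdict


def parse_info(info_field):
    """Turn 'A=1;B=2;FLAG' into {'A': '1', 'B': '2', 'FLAG': True}."""
    info = {}
    for item in info_field.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info


def find_duplicate_svs(vcf_lines):
    """Group records that share CHROM, POS, SVTYPE, END and SVLEN.

    Returns {key: [(ID, QUAL, HOMLEN, [GT per sample]), ...]} for every key
    that occurs more than once. The grouping key is an assumption made for
    this sketch, not a rule defined by Sawfish.
    """
    groups = defaultdict(list)
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        # VCF is tab-delimited; split() also tolerates space-separated copies
        fields = line.split()
        chrom, pos, rec_id, _ref, _alt, qual, _filter, info_field = fields[:8]
        info = parse_info(info_field)
        key = (chrom, int(pos), info.get("SVTYPE"), info.get("END"), info.get("SVLEN"))
        # When GT is present it is the first FORMAT subfield
        genotypes = [sample.split(":")[0] for sample in fields[9:]]
        groups[key].append((rec_id, qual, info.get("HOMLEN"), genotypes))
    return {key: recs for key, recs in groups.items() if len(recs) > 1}


# Example use (the file name is a placeholder; use gzip.open(..., "rt") for .vcf.gz):
# with open("joint_call.vcf") as handle:
#     for key, records in find_duplicate_svs(handle).items():
#         print(key, records)
```

Reporting ID, QUAL, HOMLEN and the per-sample GT together makes it easier to see how the colliding records differ (for example in breakpoint homology or in which samples carry a call) before choosing a filter that suits the downstream analysis.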
Here is an example:
chr1 122014316 sawfish:106:1718:0:0 TTTGTAATGTCTGCAAGTGGATATTCAGACCTCTTTGAGGCCTTCGTTGGAAAAGGGATTTCTTCATATTATGCTAGACAGAATAATTCTCAGTAACTTCCTTGTGTTGTGTGTATTCAACTCACAGAGTTGAACGATCCTTTACAGAGAGCAGACTTGAAACACTCTTTTTGTGGAATTTGCAAGTGGAGATTTCAGCCGCTTTGAGGTCAATGGTACAATAGGAAATATCTTCCTATAGAAAATAGACAGAATGATTCTCATAAACTCCTTTGTGATGTGTGCGTTCAACTCACAGAGTTTAACCTTTCTTTTCATAGAGCAGTTAGGAAACACTTTGC T 999 PASS SVTYPE=DEL;END=122014656;SVLEN=-340;HOMLEN=2;HOMSEQ=TT GT:GQ:PL:AD:PS 0/1:32:32,0,232:5,1:. 1/1:12:200,12,0:0,4:. ./.:.:0,0,0:0,0:. ./.:.:0,0,0:0,0:. 0/1:2:702,0,2:1,15:.
chr1 122014316 sawfish:60:1727:1:0 TTTGTAATGTCTGCAAGTGGATATTCAGACCTCTTTGAGGCCTTCGTTGGAAAAGGGATTTCTTCATATTATGCTAGACAGAATAATTCTCAGTAACTTCCTTGTGTTGTGTGTATTCAACTCACAGAGTTGAACGATCCTTTACAGAGAGCAGACTTGAAACACTCTTTTTGTGGAATTTGCAAGTGGAGATTTCAGCCGCTTTGAGGTCAATGGTACAATAGGAAATATCTTCCTATAGAAAATAGACAGAATGATTCTCATAAACTCCTTTGTGATGTGTGCGTTCAACTCACAGAGTTTAACCTTTCTTTTCATAGAGCAGTTAGGAAACACTTTGC T 72 PASS SVTYPE=DEL;END=122014656;SVLEN=-340;HOMLEN=6;HOMSEQ=TTGTAA GT:GQ:PL:AD:PS 0/0:15:0,15,250:5,0:. ./.:.:0,0,0:0,0:. ./.:.:0,0,0:0,0:. ./.:.:0,0,0:0,0:. 0/0:6:0,6,100:2,0:.