Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dramatic change in GTs with smaller size DEL SV #530

Open
tuannguyen8390 opened this issue Nov 20, 2024 · 10 comments
Open

Dramatic change in GTs with smaller size DEL SV #530

tuannguyen8390 opened this issue Nov 20, 2024 · 10 comments
Assignees
Labels

Comments

@tuannguyen8390
Copy link

Hi @fritzsedlazeck @lfpaulin @hermannromanek

We've been comparing results from new v.2.5.2 with older genotypes (v2.2 snf & then v 2.3.3 for joint call/merge with pct-seq=0) on a set of 108 ONT cattle sequences (~10x – 25X).

As expected, with v2.5.2 we found fewer large DEL with 1/1 genotypes & for several known recessive lethals genotypes were corrected to 0/1 as reported on a different issue thread.

However, with v2.5.2 we also found that 1/1 genotypes for smaller DEL (<5 kb) were also substantially changed from 1/1 to 0/1. For example, with DEL < 5kb we record that 15% of the GTs changed from 2.3.3 -> 2.5.2, with DEL 5-10Kb it's 19%. The number was <1% for INS.

We also test bam visualisation for one SV, the 1st and 5 individual was

Now called as 0/1:60:14:12:Sniffles2.DEL.20E2SA in 2.5.2 (was 1/1 in 2.3.3)
Now called as 0/1:60:23:23:Sniffles2.DEL.1969SA in 2.5.2 (was 1/1 in 2.3.3)

image

At the moment we haven't looked at more than a few examples so far, but will do soon tomorrow

Hardy-Weinberg Equilibrium (HWE) analysis indicates that version 2.5.2 also produced a large increase in proportion of smaller DEL deviating from HWE, with an excess of heterozygous calls compared to 2.3.3.

image

Cheers,

Tuan

@tuannguyen8390
Copy link
Author

Oh forgot to mention but we did compared the settings between 2.3.3 & 2.5.2 and notice that 2 option were disabled as default
--mapq 20 --min-alignment-length 1000, we tried to add back these two and run the above test.

@tuannguyen8390
Copy link
Author

Hi again,

image
We have been looking at the bam file recently and found that this issue seems to being affected by SV that has "overhang reads" spanning over it and caused some false positive counting toward the ref. allele.

We've been looking at the original vcf file (not snf) and found that the genotyping there was not correct also.

Cheers,

Tuan

@hermannromanek
Copy link
Collaborator

Hi Tuan,

Thanks for the detailed writeup - I'll take a look to see if I can reproduce this, this certainly looks like an issue with the genotyper.

Is it possible for you to share a snippet of the BAM-Files for the maybe the first two samples in your first post (i.e. just the reads surrounding the SV location)?

Did you generate the incorrect single sample VCF and SNF files with v2.5 as well?

Thanks,
Hermann

@tuannguyen8390
Copy link
Author

Hi @hermannromanek ,

Can you check these two bams, homoz_ind is the 1st individual, hetero is the 2nd.
https://drive.google.com/file/d/1DcqmnOvc2YtPwIsVmJyGvWw2JRNpmwdf/view?usp=drive_link
https://drive.google.com/file/d/1hoHNZhncihFUnFtTEWq7Ccicg4tqHC2V/view?usp=drive_link

Would be great if you can patch this :)

Many thanks,

Tuan

@tuannguyen8390
Copy link
Author

Forgot to mention but we did reproduce the sample VCF and SNF files with v2.5.2

Tuan

@hermannromanek
Copy link
Collaborator

Hi Tuan,

Thank you so much - those were incredibly helpful, I identified the problem and will try to come up with a fix. It looks like the genotyping bug we fixed was masking another bug with coverage calculations, which causes this.

I'll keep you updated,

Thanks,
Hermann

hermannromanek pushed a commit that referenced this issue Nov 28, 2024
(cherry picked from commit 3fb4494)
@hermannromanek hermannromanek self-assigned this Nov 28, 2024
@hermannromanek
Copy link
Collaborator

Hi Tuan,

I just pushed a fix for this, is it possible for you to run the development version (i.e. a checkout of this repo)? We'll try to get this fix into a v2.5.3 as soon as possible.

As you noticed correctly, this bug affected the genotyping of homozygous DELs, although I think it should have affected all DELs independently of length. Do you have any examples of DEL 1/1 calls of Sniffles 2.5.2, because I think there shouldn't have been any due to this bug, so there might be something else wrong there?

Thanks,
Hermann

@tuannguyen8390
Copy link
Author

Hi @hermannromanek ,

Thanks for the quick fix, it fixes the issue

(base) ζ zcat HOLAUS482-2.5.2-**def**.vcf.gz| gawk '$1==11 && $2>104181247 && $2 < 104182036' | head 
11      104181313       Sniffles2.DEL.20E2SA    N       <DEL>   60    PASS    **0/1**:60:14:12

(base) ζ zcat HOLAUS482-2.5.2-**fix**.vcf.gz| gawk '$1==11 && $2>104181247 && $2 < 104182036' | head    
11      104181313       Sniffles2.DEL.20E2SA    N       <DEL>   60      PASS      **1/1**:13:2:12

I looked at the old version and there were a few DEL - 18 vs a whooping 4541 INS, some below.

(base) ζ zcat HOLAUS482-2.5.2-def.vcf.gz | grep "1/1" | grep DEL                   
1       56773993        Sniffles2.DEL.1C09S0    N       <DEL>   60      PASS    PRECISE;SVTYPE=DEL;SVLEN=-27013;END=56801005;SUPPORT=2;COVERAGE=2,0,0,0,7;STRAND=+-;STDEV_LEN=0.000;STDEV_POS=0.000;VAF=1.000       GT:GQ:DR:DV     1/1:5:0:2
2       1418646 Sniffles2.DEL.790S1     N       <DEL>   60      PASS    PRECISE;SVTYPE=DEL;SVLEN=-1726;END=1420371;SUPPORT=9;COVERAGE=10,0,0,0,9;STRAND=+-;STDEV_LEN=0.894;STDEV_POS=0.000;VAF=1.000        GT:GQ:DR:DV     1/1:25:0:9
7       17373292        Sniffles2.DEL.9CDS6     N       <DEL>   60      PASS    PRECISE;SVTYPE=DEL;SVLEN=-32940;END=17406231;SUPPORT=2;COVERAGE=1,0,0,0,2;STRAND=+-;STDEV_LEN=0.000;STDEV_POS=0.000;VAF=1.000       GT:GQ:DR:DV     1/1:5:0:2
15      50856869        Sniffles2.DEL.142ASE    N       <DEL>   60      PASS    PRECISE;SVTYPE=DEL;SVLEN=-30694;END=50887562;SUPPORT=3;COVERAGE=2,0,0,0,4;STRAND=+-;STDEV_LEN=14.503;STDEV_POS=0.000;VAF=1.000      GT:GQ:DR:DV     1/1:8:0:3
17      68057710        Sniffles2.DEL.16A9S10   N       <DEL>   60      PASS    PRECISE;SVTYPE=DEL;SVLEN=-158781;END=68216490;SUPPORT=17;COVERAGE=14,0,10,0,18;STRAND=+-;STDEV_LEN=0.000;STDEV_POS=0.000;VAF=0.850  GT:GQ:DR:DV     1/1:17:3:17

Cheers,

Tuan

@tuannguyen8390
Copy link
Author

tuannguyen8390 commented Nov 28, 2024

Hi @hermannromanek ,

In addition to the checking up per individual, we also look at SV stats globally and retest using our 108 anims set. HWE & ExHet now coming back to the level of 233.

image

We also meet with our consortium partners this morning, and propose a rerun with new SNF version (2.5.2), did you come up with any solution where we can use the old SNF (which @fritzsedlazeck discussed in issue #505) ?

Also, once the release has been approve, could you please update the conda package :D

Cheers,

Tuan

hermannromanek pushed a commit that referenced this issue Dec 16, 2024
@fritzsedlazeck
Copy link
Owner

Hey @tuannguyen8390 we just put out the new version that should fix this GT issue. Conda update was also pushed. Part of the release was a --re-qc parameter for the merge. This resets the filters. Note that this improves calling on old SNF files but there might be some cases where its not equivalent of the new version of SNF files.. Hope that helps
Cheers
Fritz

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants