[Bug] Merging of different variants VarDict and TNscope #1519
Comments
I emailed Sentieon about the possibility of TNscope outputting MNVs in the case of these variants and got this response from Don Freed: "TNscope does not have any argument for merging multiple variants into a single MNP. However, we have an open-source tool that is able to merge variants into MNPs in a post-calling script, https://github.com/Sentieon/sentieon-scripts/tree/master/merge_mnp. Here is an example usage:
The first step to add the PASS filter may or may not be required depending if your input VCF already has PASS variants. Here is a real-world example from some test data that I had on hand. Here are the variants before the merge script:
The merge script command line, sentieon pyexec ./merge_mnp/merge_mnp.py test-somatic/test_merge-mnp.vcf.gz /home/regression/references/b37/hs37d5.fa --max_distance 5. Here is the output:
The script will output a new MNP from the original calls. The original records are also output with a "MERGED" flag. Best regards,"
This sounds like a decent solution, though as it doesn't take the BAM file as input I don't think it can know which variants are actually part of the same molecule and should therefore be interpreted together, and which ones are in different molecules and should be interpreted separately. In the case of the VarDict variants, the adjacent SNVs seem to be part of the same reads and should be interpreted together, as this influences how VEP determines the effect on the protein.
I think for the moment it is best to just keep both representations: the atomised versions from TNscope and the MNVs from VarDict. I don't know how else to merge the variants reliably, other than making a complex tool that reads the BAM file to see which adjacent TNscope variants should be merged, or using VarDict as the reference and merging variants that fit into a VarDict MNV. But it all feels like too much work for little benefit at the moment, when we have so much else that's prioritised.
Update on this: it seems TNscope does output phased variants, so it should be possible to connect adjacent SNVs into an MNV. If the script that Don linked earlier takes this into account, it should be possible to create true MNVs. There's also the possibility that VEP already reads this phasing info and takes it into account, but we would still have the issue of different representations of the same biological event. To start, I'll look into this script and see if it uses the phasing info. (Five seconds later... ok, it does look at phasing info.)
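To make the idea of phase-aware merging concrete, here is a minimal Python sketch. It is not how Sentieon's merge_mnp.py is implemented; the `Snv` record, its `phase_set` field, and the interpretation of `max_distance` as a maximum gap between neighbouring positions are all assumptions made for illustration. It only joins SNVs that share a phase identifier and sit close together, filling the gap with reference bases.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Snv:
    pos: int                   # 1-based position (hypothetical example coordinates)
    ref: str                   # single reference base
    alt: str                   # single alternate base
    phase_set: Optional[str]   # hypothetical phase identifier; None if unphased


def merge_phased_snvs(
    snvs: List[Snv], ref_seq: str, ref_start: int, max_distance: int = 5
) -> List[Tuple[int, str, str]]:
    """Join nearby SNVs that share a phase set into (pos, ref, alt) records.

    ref_seq is the reference sequence beginning at 1-based position ref_start.
    SNVs that are unphased, in another phase set, or too far apart stay separate.
    """
    merged: List[Tuple[int, str, str]] = []
    group: List[Snv] = []

    def flush() -> None:
        # Emit the current group as one record spanning first to last position.
        if not group:
            return
        start, end = group[0].pos, group[-1].pos
        ref = ref_seq[start - ref_start : end - ref_start + 1]
        alt = list(ref)
        for s in group:
            alt[s.pos - start] = s.alt
        merged.append((start, ref, "".join(alt)))
        group.clear()

    for snv in sorted(snvs, key=lambda s: s.pos):
        same_molecule = bool(
            group
            and snv.phase_set is not None
            and snv.phase_set == group[-1].phase_set
            and snv.pos - group[-1].pos <= max_distance
        )
        if not same_molecule:
            flush()
        group.append(snv)
    flush()
    return merged


# Two SNVs in the same (hypothetical) phase set are joined into one MNV.
phased = [Snv(100, "A", "T", "ps1"), Snv(103, "C", "A", "ps1")]
print(merge_phased_snvs(phased, ref_seq="ATTC", ref_start=100))

# The same SNVs in different phase sets are left as separate records.
unphased = [Snv(100, "A", "T", "ps1"), Snv(103, "C", "A", "ps2")]
print(merge_phased_snvs(unphased, ref_seq="ATTC", ref_start=100))
```

A real tool would also need to handle indels, genotype fields, and filters, which this sketch ignores.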
Alternatives:
I looked into the alternative. It's not aiming to solve the issue here, which is to merge variants at different positions if they occur within the same reads, but rather to merge multiple variants at the same position into 1 variant, such as:
REF: TAAAAAAAAA
ALT: T,TAAAAAAAAAAA,TAAAAAAAAAAAAA
I haven't seen any other option from bcftools that seems promising. I'll continue with testing the Sentieon script for now.
I tested the Sentieon merge script on the TWIST reference sample selectbengal and put the results of my test in this Google Sheet: https://docs.google.com/spreadsheets/d/1pocrlClqrNoDNBAIF_hZMyurMxYe5K649e1iLXPkYhI/edit?gid=0#gid=0
As written by Sentieon, the tool also keeps the variants that were merged, setting their filter to MERGED. I checked 4 of the MNVs that it created against the MNVs called by VarDict, and in all 4 cases the final variant was the same as the one from VarDict. This means that if the TNscope VCF had been pre-processed with this script before merging with VarDict in my Python script, all of these variants would have been merged into 1, which is the behaviour we would prefer.
The only remaining issue is how the script handles variants with filters set. If a variant has a filter, it is never merged with any other variant but is simply output as it is. This is an issue for us for at least a couple of reasons that I can think of now:
The script that Sentieon has shared is, however, under a copyright that allows modifications, so maybe this can be fixed with some tweaking. In that case we could create a different solution and simply stack the filters if any occur in the separate SNVs, removing PASS and keeping the unique set.
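The filter-stacking idea could look roughly like the Python sketch below. This is only an illustration of the proposal, not a patch to the Sentieon script: the function name `stack_filters` and the filter names in the example are hypothetical, and treating a missing filter (".") the same as PASS is an assumption.

```python
from typing import Iterable


def stack_filters(filter_fields: Iterable[str]) -> str:
    """Combine the FILTER values of the SNVs that make up one merged MNV.

    Each input is a VCF FILTER column value (semicolon-separated names).
    PASS and missing values are dropped, duplicates are removed while keeping
    first-seen order, and PASS is returned only if no filter remains.
    """
    combined = []
    for field in filter_fields:
        for name in field.split(";"):
            if name in ("PASS", ".", ""):
                continue  # assumption: missing filter is treated like PASS
            if name not in combined:
                combined.append(name)
    return ";".join(combined) if combined else "PASS"


# Hypothetical filter names, for illustration only.
print(stack_filters(["PASS", "low_qual;strand_bias", "strand_bias"]))
print(stack_filters(["PASS", "PASS"]))
```

With this approach a merged MNV carries every filter that was set on any of its constituent SNVs, so no filtering information is lost by the merge.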
Replaced by user story: #1525
Description
The current method in balsamic v16.0.0, which uses bcftools concat to merge the VCFs, doesn't require the variants to match exactly in the ALT column. For instance, if a variant has been called as an MNV by VarDict and as separate SNVs by TNscope, it merges only the first variant.
For example, if this MNV exists in VarDict:
REF: ATTC -> ALT: TTTA
And in TNscope as two separate variants:
REF: A -> ALT: T
REF: C -> ALT: A
Then it would merge the 1st of the TNscope variants into the VarDict one, and keep the second variant.
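To show that the two representations describe the same event, the short Python sketch below atomises the VarDict-style MNV into per-base SNVs. The function name `atomise` and the start position 100 are hypothetical, chosen only for illustration, and the sketch handles equal-length REF/ALT only.

```python
from typing import List, Tuple


def atomise(pos: int, ref: str, alt: str) -> List[Tuple[int, str, str]]:
    """Split an equal-length MNV into the SNVs at each differing base."""
    if len(ref) != len(alt):
        raise ValueError("only equal-length REF/ALT (MNVs) are handled here")
    return [
        (pos + offset, r, a)
        for offset, (r, a) in enumerate(zip(ref, alt))
        if r != a
    ]


# The MNV from the example, placed at a hypothetical position 100.
mnv = (100, "ATTC", "TTTA")
snvs = atomise(*mnv)
print(snvs)

# A comparison keyed on (position, REF, ALT) never sees the MNV record
# as equal to either of its constituent SNVs.
print(mnv in snvs)
```

The first SNV shares its start position with the MNV while the second starts three bases later, which is consistent with only the first TNscope variant being merged into the VarDict record.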
How to reproduce
No response
Expected behaviour
Ideally the two different ways of representing the same variant should be able to be consolidated into 1 event.
But this may require quite a bit of work, and an intermediate solution would be to not merge these variants, but to maintain both representations of the event.
Anything else?
Example from IGV:
Top track is 2 variants from TNscope, bottom is 1 variant from VarDict:
Here are the variants after merging:
Top track is using the current v16.0.0 method bcftools concat, and the bottom one is using a custom python script
Pipeline version
16.0.0