VCF Spec #23

Open
themkdemiiir opened this issue Dec 14, 2023 · 6 comments

Comments

@themkdemiiir

Hi,

I noticed that you currently use version 4.3 of the VCF specification for tandem repeats. Version 4.3 gives no guidance on representing tandem repeats, whereas version 4.4 does. Do you plan to follow the guidelines in version 4.4? Additionally, would you consider splitting complex tandem repeats into separate records? They are hard to annotate when several repeats are combined in a single structure such as (AT)nTCG(GC)n.

Thank you.

@themkdemiiir
Author

Could you also share an example TRGT VCF file with me? Thanks.

@egor-dolzhenko
Collaborator

Thanks for the questions. Note that the <CNV:TR> variants introduced in the 4.4 specification are designed for situations "when the exact [TR] sequence is not known". TRGT outputs full-length TR sequences and so does not currently use this variant representation.

And I agree that splitting complex repeat regions into constituent simple TRs can be helpful. We are planning to create some helper tools to decompose / annotate repeats after the VCFs are generated. (Splitting complex tandem repeats into multiple VCF records can significantly complicate analyses involving multiple samples and also analyses of regions containing large clusters of simple TRs.) What kind of annotation are you interested in?

@themkdemiiir
Author

I had initially planned to use VEP to annotate the consequences of TRs on transcripts. However, VEP could not annotate TRs accurately when they are reported as a combined structure such as (AT)nTCG(GC)n. Handling each TR as a separate VCF line would be much easier, so I would appreciate having an option to split complex repeats accordingly.
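
To make the request concrete, here is a minimal sketch of what such a split could look like for a single allele sequence. It is not TRGT functionality: the function names `parse_structure` and `decompose` are hypothetical, and the sketch assumes the allele matches the structure exactly (perfect motif copies, unchanged interruption), which real alleles often will not.

```python
import re


def parse_structure(structure):
    """Split a structure such as '(AT)nTCG(GC)n' into (sequence, is_repeat) parts."""
    parts = []
    for m in re.finditer(r"\(([ACGT]+)\)n|([ACGT]+)", structure):
        if m.group(1):
            parts.append((m.group(1), True))   # repeated motif, e.g. (AT)n
        else:
            parts.append((m.group(2), False))  # fixed interruption, e.g. TCG
    return parts


def decompose(structure, allele):
    """Return one entry per simple repeat in the allele, or None if it does not fit.

    Hypothetical helper: only exact matches to the structure are handled.
    """
    parts = parse_structure(structure)
    pattern = "".join(
        f"((?:{seq})*)" if is_repeat else f"({seq})" for seq, is_repeat in parts
    )
    match = re.fullmatch(pattern, allele)
    if match is None:
        return None
    segments = []
    for group_index, (seq, is_repeat) in enumerate(parts, start=1):
        if not is_repeat:
            continue
        start, end = match.span(group_index)  # 0-based offsets within the allele
        segments.append(
            {"motif": seq, "start": start, "end": end, "copies": (end - start) // len(seq)}
        )
    return segments


if __name__ == "__main__":
    for segment in decompose("(AT)nTCG(GC)n", "ATATATTCGGCGC"):
        print(segment)
```

Each returned segment carries its offset within the allele, so it could in principle be turned into its own VCF line by adding the record's POS. Alleles with interrupted or impure motifs return `None` here and would need an approximate matcher instead.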

@egor-dolzhenko
Collaborator

Thanks for clarifying. I have very little experience with VEP, but I will try to learn how it works and see what we could do to make TRGT more compatible with this tool. In particular, we will definitely consider providing some kind of option to split complex repeats into constituent simple repeats either during VCF generation or after.

Also, if you are interested in annotating variants with VEP, it might be better to use general-purpose variant calling tools that VEP was designed for. We are working on our own TR-specific annotation engine that would annotate unusual expansions and composition changes within the repeat sequence.

Another idea is to apply variant normalization to TRGT VCFs. The resulting normalized VCFs might be more amenable to analysis with VEP.
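
As a rough illustration of the normalization idea, the sketch below trims the bases shared by a REF/ALT pair so that only the changed portion of a full-length repeat allele remains. The function name `trim_alleles` is hypothetical, and this is a simplification: it works only within the record and does not consult the reference genome, whereas full normalization (for example with `bcftools norm`) also left-aligns indels against the reference.

```python
def trim_alleles(pos, ref, alt):
    """Trim bases shared by REF and ALT, keeping at least one base in each.

    Simplified sketch: no reference lookup, so no left-alignment beyond
    the start of the original record.
    """
    # Trim the shared suffix first so that indels stay anchored on the left.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then trim the shared prefix, advancing POS for each removed base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


if __name__ == "__main__":
    # One extra AT copy in an (AT)nTCG(GC)n region starting at position 100.
    print(trim_alleles(100, "ATATATTCGGCGC", "ATATATATTCGGCGC"))
    # A single-base change inside the interruption.
    print(trim_alleles(10, "ATTCG", "ATACG"))
```

After trimming, an expansion of one motif copy is reported as a short insertion rather than as two long allele strings, which is closer to the kind of record general-purpose annotators expect.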

@themkdemiiir
Author

Hello @egor-dolzhenko, any news on the TRGT annotator?

@egor-dolzhenko
Collaborator

Hi @themkdemiiir. Yes, I believe a pre-print describing a new method for TR prioritization will come out sometime before the end of the year. I think this will be your best option for annotating TRs. I will try to remember to link the paper in this thread when it comes out.


2 participants