How to Generate a VCF File Showing Differences Between HG002 Haplotypes from the HPRC Pangenome #4419
Comments
HG002 was held out of the release HPRC graphs. If you want to make your own HG002-only graph, you can do so quite quickly with minigraph-cactus. You'd feed it something like
And run with
Otherwise, if you already have a graph with HG002 in it, then I think
Thank you for your response. I followed your method and tried it out, but I noticed that the contig lengths in the generated VCF file do not match the original lengths, resulting in positional misalignment. Could this be due to trimming performed during pangenome construction (minigraph-cactus)? Is it possible to obtain the trimmed FASTA file used in pangenome construction?
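To pin down which contigs are affected by the length mismatch described above, the `##contig` lengths in the VCF header can be compared with the lengths in the input FASTA. The following is a minimal sketch, not part of vg or minigraph-cactus. It assumes the contig IDs in the VCF header are spelled exactly like the FASTA record names, and that the `##contig` lines contain no quoted values with commas. The function names are hypothetical.

```python
import re


def vcf_contig_lengths(vcf_lines):
    """Collect {contig_id: length} from the ##contig header lines of a VCF."""
    lengths = {}
    for line in vcf_lines:
        if not line.startswith("##"):
            break  # the meta-information header is over
        match = re.match(r"##contig=<(.*)>", line.strip())
        if match:
            # Split "ID=chr1,length=123" into a dict of key/value pairs.
            fields = dict(
                kv.split("=", 1) for kv in match.group(1).split(",") if "=" in kv
            )
            if "ID" in fields and "length" in fields:
                lengths[fields["ID"]] = int(fields["length"])
    return lengths


def fasta_lengths(fasta_lines):
    """Collect {record_name: sequence_length} from FASTA lines."""
    lengths = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]  # name is the first word after '>'
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def short_contigs(vcf_lines, fasta_lines):
    """Return {contig: (vcf_length, fasta_length)} where the header is too short."""
    vcf = vcf_contig_lengths(vcf_lines)
    fasta = fasta_lengths(fasta_lines)
    return {c: (vcf[c], fasta[c]) for c in vcf if c in fasta and vcf[c] < fasta[c]}
```

For real files, pass open file handles (for example `open("calls.vcf")` and `open("hap2.fa")`, both hypothetical paths) as the two arguments. Contigs whose names differ between the two files are skipped silently, so an empty result does not by itself prove the header is correct.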
Yeah, that's a known issue due to path fragmentation. The VCF itself is valid and the coordinates are correct; it's just that the contig lengths can be too short in the header. This only happens when multiple references are given, and only to references after the first (so hap2 in our example). Your options are:
I used hg38 as the reference, then switched the reference using vg convert and constructed the VCF file with vg deconstruct. However, I noticed that the reference bases in the VCF file at the corresponding coordinates do not match the original bases in the input FASTA file. Can using an unclipped graph solve this problem?
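The REF mismatch described above can be checked mechanically by looking up each record's REF allele in the input FASTA at the VCF position. This is a minimal sketch under stated assumptions, not a vg feature: it loads the whole FASTA into memory, assumes an uncompressed tab-separated VCF, and assumes the CHROM values match the FASTA record names exactly. The function names are hypothetical.

```python
def load_fasta(fasta_lines):
    """Read FASTA lines into {record_name: sequence}."""
    parts, name = {}, None
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]  # name is the first word after '>'
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def ref_mismatches(vcf_lines, seqs):
    """Return (chrom, pos, vcf_ref, fasta_bases) for records whose REF disagrees."""
    bad = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref = line.rstrip("\n").split("\t")[:4]
        if chrom not in seqs:
            continue  # contig naming differs; nothing to compare against
        start = int(pos) - 1  # VCF POS is 1-based, Python slices are 0-based
        expected = seqs[chrom][start:start + len(ref)]
        if expected.upper() != ref.upper():  # ignore soft-masking case
            bad.append((chrom, int(pos), ref, expected))
    return bad
```

If every record mismatches, a contig-name or coordinate-offset problem is more likely than wrong bases. If mismatches appear only after a certain position on a contig, that pattern would be consistent with sequence having been removed upstream of that point.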
Hello VG Team,

I am currently working with the HPRC pangenome and aiming to construct a VCF file that highlights the differences between the two haplotypes (hap1 and hap2) of the HG002 sample. Specifically, I want to generate a VCF file that represents hap2 relative to hap1 for HG002.

So far, I have downloaded the HPRC pangenome data from the HPRC project, which includes multiple haplotypes for various samples, including HG002. I have attempted to use VG tools, such as vg convert to change the reference, but found that it doesn't seem to support operations targeting individual haplotypes, and vg deconstruct to obtain VCF files; however, it appears that it does not allow for processing single haplotypes separately. It seems that the current VG tools do not support operations on individual haplotypes within a sample.

Could you please guide me on how to effectively generate a VCF file that captures the differences between the two haplotypes (hap1 and hap2) of the HG002 sample from the HPRC pangenome?

Thank you for your support!