VCF chromosome contig names
Dave Lawrence edited this page Nov 16, 2023 · 2 revisions
When handling VCFs it's crucial that chromosome names match what we expect. I think it's best to just use contig ids (e.g. "NC_000001.10"), as the versioned accession is explicit and you can't get builds mixed up.
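As a rough illustration of why a versioned accession is unambiguous, the sketch below splits a RefSeq-style contig id into its accession and version. The function name `split_contig_accession` and the regular expression are assumptions made for this example, not part of VG, and the pattern only covers `NC_` accessions.

```python
import re

# Assumed pattern for RefSeq chromosome accessions such as "NC_000001.10".
# Illustration only: it does not cover unplaced scaffolds (e.g. NT_/NW_ ids).
CONTIG_ACCESSION_RE = re.compile(r"^(NC_\d{6})\.(\d+)$")


def split_contig_accession(name):
    """Return (accession, version) for a versioned NC_ id, or None otherwise."""
    match = CONTIG_ACCESSION_RE.match(name)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


# Chromosome 1 has the same accession in both builds but a different version:
# NC_000001.10 is GRCh37 and NC_000001.11 is GRCh38.
print(split_contig_accession("NC_000001.10"))
print(split_contig_accession("NC_000001.11"))
print(split_contig_accession("chr1"))  # a plain chromosome name, not a contig id
```

A name like "1" or "chr1" carries no build information at all, which is the ambiguity that contig ids avoid.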
If you want to convert chromosome names, you can do so via bcftools annotate, passing a mapping file of whitespace-separated old-name/new-name pairs (one per line):
bcftools annotate --rename-chrs chrom_contig.map file.vcf -o converted_file.vcf
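Before running the rename, it can be useful to see which chromosome names in the VCF have no entry in the mapping file, since those would not end up as contig ids. The following is a minimal standard-library sketch; the helper names (`read_chrom_map`, `vcf_chrom_names`, `unmapped_chroms`) are hypothetical and not part of VG or bcftools.

```python
import gzip


def read_chrom_map(path):
    """Read whitespace-separated "old_name new_name" pairs, one per line."""
    mapping = {}
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) >= 2:
                mapping[fields[0]] = fields[1]
    return mapping


def vcf_chrom_names(path):
    """Collect the distinct CHROM values from the records of a VCF file."""
    # Assumption: a ".gz" suffix means the file is gzip/bgzip compressed.
    opener = gzip.open if path.endswith(".gz") else open
    names = set()
    with opener(path, "rt") as f:
        for line in f:
            # Skip header lines ("##..." and "#CHROM...") and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            names.add(line.split("\t", 1)[0])
    return names


def unmapped_chroms(vcf_path, map_path):
    """Return VCF chromosome names that have no entry in the mapping file."""
    mapping = read_chrom_map(map_path)
    return sorted(vcf_chrom_names(vcf_path) - set(mapping))
```

For example, `unmapped_chroms("file.vcf", "chrom_contig.map")` returns an empty list when every chromosome name in the VCF is covered by the map.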
You can generate the mapping files (chromosome name to contig accession) by running the following in the VG Django shell:
from snpdb.models import GenomeBuild


def write_chrom_mapping_file(genome_build):
    # Writes one "name<TAB>RefSeq accession" line per contig
    with open(f"chrom_mapping_{genome_build}.map", "w") as f:
        # role='AM' presumably selects assembled molecules (the main chromosomes)
        for contig in genome_build.contigs.filter(role='AM'):
            f.write("\t".join([contig.name, contig.refseq_accession]) + "\n")


write_chrom_mapping_file(GenomeBuild.grch37())
write_chrom_mapping_file(GenomeBuild.grch38())
I have also added these files in:
snpdb/genome/chrom_mapping_GRCh37.map
snpdb/genome/chrom_mapping_GRCh38.map