GitHub - kbeigel/vcf_ref_extraction: Extracts reference allele for each site SNP substitution site, concatenates the SNP ref alleles to new FASTA

README

The Python script, ‘vcf_ref_extraction.py,’ is available on GitHub (github.com/kbeigel/vcf_ref_extraction). This script takes a .vcf file as input and examines the .vcf file and extracts all of the bases from sites in the reference genome that were identified as SNP sites. These SNPs are kept in the same order as they appear in the .vcf (which is also the same order of SNPs in the PHYLIP file of sample SNP data) and output into a text file. Using Mesquite (v. 3.61), the output reference file (.txt or FASTA) was added back into the PHYLIP file (Maddison and Maddison 2019). As a side note, this script could be tweaked to extract other information stored in the .vcf file such as counts and identities of insertions and deletions that appeared in the variant-calling process (which are not a part of the output SNP dataset; SNPs are the result of TYP == SUB).

1. Extraction of reference sequence from .vcf

The script was written and desgined to be run in the PyCharm IDE Community Edition 2019.3.3 with the Python 3.8 interpreter on Windows 10; execution of the script may take a couple of minutes since .vcf files can be large. The script contains information about how to call the .vcf file and designate an output file name. The script also prints (to the Run window) the total number of characters output, the number of characters omitted, and the total characters.

2. What to do with reference bases after extraction

The reference bases in the output file can then be added to the SNP data PHYLIP file containing the sample data. The order of sites is retained (SNP bases are concatenated). Adding the reference to the SNP data file can be done using a program like Mesquite v3.61: Open the SNP PHYLIP/fasta file in Mesquite and add the reference sequence from a .txt or FASTA file.

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
.gitignore		.gitignore
README.md		README.md
vcf_ref_extraction.py		vcf_ref_extraction.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

README

1. Extraction of reference sequence from .vcf

2. What to do with reference bases after extraction

About

Releases

Packages

Languages

kbeigel/vcf_ref_extraction

Folders and files

Latest commit

History

Repository files navigation

README

1. Extraction of reference sequence from .vcf

2. What to do with reference bases after extraction

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages