Coxsackievirus A16 Nextstrain Analysis

This repository provides a Nextstrain analysis of Coxsackievirus A16 (CVA16). Two builds are available: a VP1 run (sequences of at least 600 base pairs) and a whole genome run (sequences of at least 6400 base pairs).
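
As a rough illustration of the two length thresholds, the sketch below reads FASTA records and reports which build each sequence would qualify for. It assumes raw sequence length is what counts (the pipeline may treat gaps or ambiguous bases differently), and the function names are hypothetical, not part of this repository.

```python
from typing import Iterable, Iterator, List, Tuple

# Minimum lengths quoted in this README, in base pairs.
MIN_LENGTH = {"vp1": 600, "whole_genome": 6400}


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            # Use the first word of the header as the record name.
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def eligible_builds(sequence: str) -> List[str]:
    """Return the builds whose minimum length the sequence meets.

    Assumption: every character of the sequence counts toward its length.
    """
    return [build for build, minimum in MIN_LENGTH.items() if len(sequence) >= minimum]
```

For example, `eligible_builds(seq)` could be called on each record from `read_fasta(open("sequences.fasta"))` to see how many downloaded sequences are long enough for each run.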

For those unfamiliar with Nextstrain or needing installation guidance, please refer to the Nextstrain documentation.

Enhancing the Analysis

Most of the data for this analysis can be obtained from NCBI Virus. Instructions for downloading sequences are provided at the end of this README under Sequences.

Repository Organization

This repository includes the following directories and files:

ingest: Contains Python scripts and the snakefile for automatic downloading of CVA16 sequences and metadata.
scripts: Custom Python scripts called by the snakefile.
snakefile: The entire computational pipeline, managed with Snakemake. See the Snakemake documentation for details.
vp1: Sequences and configuration files for the VP1 run.
whole_genome: Sequences and configuration files for the whole genome run.

Configuration Files

The config, vp1/config, and whole_genome/config directories contain the necessary configuration files:

colors.tsv: Color scheme
geo_regions.tsv: Geographical locations
lat_longs.tsv: Latitude and longitude data
dropped_strains.txt: Dropped strains
clades_genome.tsv: Virus clade assignments
reference_sequence.gb: Reference sequence
auspice_config.json: Auspice configuration file
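
Before starting a build, it can help to confirm that the files listed above are present. The following is a minimal sketch, not part of the repository: the helper name is hypothetical, and it assumes every listed file is expected in each directory you check, which may not hold for all three config directories.

```python
from pathlib import Path

# File names taken from the list in this README.
EXPECTED_FILES = [
    "colors.tsv",
    "geo_regions.tsv",
    "lat_longs.tsv",
    "dropped_strains.txt",
    "clades_genome.tsv",
    "reference_sequence.gb",
    "auspice_config.json",
]


def missing_config_files(config_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are absent from config_dir."""
    config_dir = Path(config_dir)
    return [name for name in expected if not (config_dir / name).is_file()]


if __name__ == "__main__":
    # Directory names as given in this README; run from the repository root.
    for directory in ("config", "vp1/config", "whole_genome/config"):
        print(directory, missing_config_files(directory))
```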

The reference sequence used is G-10, accession number U05876.

Quickstart

Setup

Nextstrain Environment

Install the Nextstrain environment by following the installation instructions in the Nextstrain documentation.

Running a Build

Activate the Nextstrain environment:

conda activate nextstrain

To perform a build, run:

snakemake --cores 1

For specific builds:

VP1 build:

snakemake auspice/cv_a16_vp1.json --cores 1

Whole genome build:

snakemake auspice/cv_a16_whole_genome.json --cores 1
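
If you prefer to launch builds from a script, the commands above can be assembled programmatically. This is only a sketch: the build names used as dictionary keys and the helper name are illustrative, while the target paths and the `--cores` flag are the ones shown in this README.

```python
import shlex

# Targets quoted in this README; the keys are illustrative labels.
TARGETS = {
    "vp1": "auspice/cv_a16_vp1.json",
    "whole_genome": "auspice/cv_a16_whole_genome.json",
}


def snakemake_command(build=None, cores=1):
    """Return the snakemake argument list for one build, or for the default target."""
    command = ["snakemake"]
    if build is not None:
        command.append(TARGETS[build])
    command += ["--cores", str(cores)]
    return command


if __name__ == "__main__":
    # Print the commands instead of running them.
    for build in (None, "vp1", "whole_genome"):
        print(shlex.join(snakemake_command(build)))
```

To actually run a build, the returned list could be passed to `subprocess.run(..., check=True)` from inside the activated Nextstrain environment.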

First steps

To run the ingest, you will need some specific reference files, such as a reference.fasta or annotation.gff3 file.

In the config file, check that the taxid is correct.
To get these files, run the script generate_from_genbank.py manually:
```
python3 ingest/generate_from_genbank.py --reference "U05876.1" --output-dir "whole_genome/config/"
```
- You need to specify a few things: [0];[product];[2].
- It will create the files in the subdirectory data/references.
- These files will be used by the ingest snakefile.
Check that the attributes in data/references/pathogen.json are up to date.
Run the `ingest` snakefile (either manually or using the main snakefile).
- Depending on your system, you may need to run `chmod +x ./vendored/*; chmod +x ./bin/*` first.
Run the main snakefile.
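
To see whether the chmod step mentioned above is needed on your system, you can list the files that lack execute permission. The sketch below is an illustration under assumptions: the helper name is hypothetical, and it assumes the `vendored` and `bin` directories are resolved relative to the directory you run it from.

```python
import stat
from pathlib import Path


def non_executable_files(directory):
    """Return files directly inside `directory` that lack the owner execute bit."""
    directory = Path(directory)
    if not directory.is_dir():
        return []
    return sorted(
        path
        for path in directory.iterdir()
        if path.is_file() and not (path.stat().st_mode & stat.S_IXUSR)
    )


if __name__ == "__main__":
    # Directory names taken from the chmod commands in this README.
    for directory in ("vendored", "bin"):
        for path in non_executable_files(directory):
            print(f"not executable: {path}")
```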

Visualizing the Build

To visualize the build, use Auspice:

auspice view --datasetDir auspice

To run two visualizations simultaneously, you may need to set the port:

export PORT=4001
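
When choosing a value for PORT, you can first check which ports are free. The following sketch uses only Python's standard `socket` module; the helper names are hypothetical, and the starting port of 4000 is an assumption about where a first Auspice instance is usually listening.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


def first_free_port(start=4000, attempts=20):
    """Return the first free port in [start, start + attempts)."""
    for port in range(start, start + attempts):
        if port_is_free(port):
            return port
    raise RuntimeError(f"no free port found in {start}-{start + attempts - 1}")


if __name__ == "__main__":
    print(f"export PORT={first_free_port()}")
```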

Sequences

Sequences can be downloaded manually or automatically.

Manual Download: Visit NCBI Virus, search for CVA16 or Taxid 31704, and download the sequences.
Automated Download: The ingest functionality, included in the main snakefile, handles automatic downloading.

The ingest pipeline is based on the Nextstrain RSV ingest workflow. Running the ingest pipeline produces data/metadata.tsv and data/sequences.fasta.
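
After an ingest run, a quick consistency check between the two output files can catch records that have metadata but no sequence, or the reverse. This is a minimal sketch under assumptions: the identifier column name `accession` is a guess and should be adjusted to the actual metadata header, and the FASTA record name is taken to be the first word of each header line.

```python
import csv


def fasta_ids(path):
    """Return the set of record names (first word of each header) in a FASTA file."""
    with open(path) as handle:
        return {
            line[1:].split()[0]
            for line in handle
            if line.startswith(">") and len(line.strip()) > 1
        }


def metadata_ids(path, id_column="accession"):
    """Return the set of identifiers in a tab-separated metadata file.

    Assumption: `id_column` names the identifier column; change it if needed.
    """
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row[id_column] for row in reader}


def compare_ids(metadata_path="data/metadata.tsv",
                sequences_path="data/sequences.fasta",
                id_column="accession"):
    """Report identifiers present in one file but not the other."""
    meta = metadata_ids(metadata_path, id_column)
    seqs = fasta_ids(sequences_path)
    return {
        "missing_sequence": sorted(meta - seqs),
        "missing_metadata": sorted(seqs - meta),
    }
```

Calling `compare_ids()` from the repository root uses the default paths named above; both lists being empty means the two files agree.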

Updating Vendored Scripts

This repository uses git subrepo to manage copies of ingest scripts in ingest/vendored. To pull new changes from the central ingest repository, first install git subrepo and then follow the instructions in ingest/vendored/README.md.

Feedback

For questions or comments, contact me via GitHub or [email protected].

hodcroftlab/coxsackievirus_a16
