WA-focused ncov Nextstrain builds for genomic surveillance of SARS-CoV-2 in Washington state. The builds are located here:
- Washington-focused SARS-CoV-2 genomic analysis: Past two months
- Washington-focused SARS-CoV-2 genomic analysis: Past four months
- Washington-focused SARS-CoV-2 genomic analysis: Past six months
First, install the ncov Nextstrain pipeline and clone the ncov repository using `git clone https://github.com/nextstrain/ncov` or `gh repo clone nextstrain/ncov`.
Next, clone this repository inside the `ncov` folder. In a command-line terminal, navigate to the `ncov` repository using `cd ncov`, then clone this repository using `git clone https://github.com/DOH-SML1303/wa_sc2.git` or `gh repo clone DOH-SML1303/wa_sc2`.
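The two clone steps above can be sketched as a small shell helper that skips a clone when the target directory already exists, which makes the setup safe to re-run. The function name `clone_if_missing` is a hypothetical helper introduced for illustration; it is not part of either repository.

```shell
# Hypothetical helper: clone a repository unless the target directory exists.
clone_if_missing() {
    url="$1"
    dir="$2"
    if [ -d "$dir" ]; then
        echo "skip: $dir already exists"
    else
        git clone "$url" "$dir"
    fi
}

# Usage, mirroring the steps described above:
#   clone_if_missing https://github.com/nextstrain/ncov ncov
#   cd ncov
#   clone_if_missing https://github.com/DOH-SML1303/wa_sc2.git wa_sc2
```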
The WA-focused build includes county-level metadata for the sequences, which may not be present in GenBank. For Molecular Epi folks, this data can be retrieved from the Y-drive. After you retrieve the file, rename it to `county_metadata.csv` and move it into the `wa_data/` folder before running the `fetch_n_process.smk` workflow.
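As a rough sketch of the rename-and-move step, the helper below places a downloaded file at `wa_data/county_metadata.csv`. The function name `stage_county_metadata` is hypothetical, and the source file name is passed as an argument because the name of the Y-drive export is not given here. The sketch copies rather than moves, on the assumption that you may want to keep the original download.

```shell
# Hypothetical helper: run from the wa_sc2 directory.
# Copies the downloaded file to the name and folder the workflow expects.
stage_county_metadata() {
    src="$1"
    if [ ! -f "$src" ]; then
        echo "error: $src not found" >&2
        return 1
    fi
    mkdir -p wa_data
    cp "$src" wa_data/county_metadata.csv
    echo "staged wa_data/county_metadata.csv"
}

# Usage (the source path is whatever you saved the export as):
#   stage_county_metadata /path/to/downloaded_file.csv
```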
The SARS-CoV-2 datasets are maintained by the Nextstrain team. You can view the list of remote datasets here.
A workflow has been configured to pull the inputs you need for the WA-focused build. First, run `cd wa_sc2` to navigate to the wa_sc2 repository. Then run `snakemake -s wa_workflow/fetch_n_process.smk --cores 6`. You can specify a different number of cores based on your computing capacity.
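If you would rather derive the core count from the machine than hard-code it, one possible approach is sketched below. The helper name `pick_cores` is hypothetical, and leaving one core free for other work is an assumption about what suits your machine, not a requirement of the workflow.

```shell
# Hypothetical helper: print the number of online cores minus one (minimum 1).
pick_cores() {
    total=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    if [ "$total" -gt 1 ]; then
        echo $((total - 1))
    else
        echo 1
    fi
}

# Usage (from the wa_sc2 directory):
#   snakemake -s wa_workflow/fetch_n_process.smk --cores "$(pick_cores)"
```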
Before running the WA-focused build, update your input locations on lines 13 to 20 of `wa_sc2/wa_profiles/wa-subsampled-background-genbank/subsampled-background-builds.yaml`. The `subsampled-background-builds.yaml` file is set up for data stored in an S3 bucket, but you can keep the files locally instead. If you move the data to an S3 bucket, the recommended path is `s3://bucket-name/ncov/data/wa-sequences.tar.xz` for both the metadata and sequence data inputs.
If you decide to keep things locally then the file path might look something like this:
wa_sc2/wa_data/wa-sequences.tar.xz
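The choice between the two locations can be sketched as a small helper that prints the path to put in the YAML file. The function name `input_location` is hypothetical; the two paths are the ones given above, with `bucket-name` replaced by whatever bucket you pass in.

```shell
# Hypothetical helper: print the input location for the builds YAML.
# With a bucket name it prints the S3 path; without one, the local path.
input_location() {
    bucket="${1:-}"
    if [ -n "$bucket" ]; then
        echo "s3://$bucket/ncov/data/wa-sequences.tar.xz"
    else
        echo "wa_sc2/wa_data/wa-sequences.tar.xz"
    fi
}

# Usage (from the ncov directory), for example to upload the local file
# with the AWS CLI:
#   aws s3 cp "$(input_location)" "$(input_location bucket-name)"
```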
This ncov Nextstrain build sources data from GenBank and includes 2-month, 4-month, and 6-month builds. Once you have your inputs and have updated `subsampled-background-builds.yaml` with the input locations, you can run the pipeline. If you're running Nextstrain in a conda environment, first pull the latest ncov updates by running `git pull` in the `ncov` directory, then activate the conda environment using `conda activate nextstrain` and run `nextstrain update` to update Nextstrain. It's recommended to pull updates before each pipeline run. The same goes for this repo as well! :)
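The update steps can be collected into one helper, sketched below. The function name `update_all` is hypothetical, and the sketch assumes that you run it from the `ncov` directory, that this repository was cloned to `ncov/wa_sc2`, and that the conda environment is already active.

```shell
# Hypothetical helper: run from the ncov directory, after
# `conda activate nextstrain`.
update_all() {
    # Update the ncov repository.
    git pull || return 1
    # Update this repository, cloned inside ncov.
    (cd wa_sc2 && git pull) || return 1
    # Update Nextstrain itself.
    nextstrain update
}

# Usage:
#   conda activate nextstrain
#   update_all
```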
You can configure your `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` in your AWS credentials file, which you can open in a terminal using `nano ~/.aws/credentials`, or you can simply export the environment variables upon opening a terminal window using:
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
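If you take the environment-variable route, a quick check like the one below can confirm that both variables are set in the current shell before you start a build. The function name `check_aws_env` is hypothetical. It only inspects the environment, so it will report a variable as missing even when valid credentials exist in `~/.aws/credentials`.

```shell
# Hypothetical helper: fail if either AWS variable is unset or empty.
check_aws_env() {
    for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            return 1
        fi
    done
    echo "AWS credentials found in environment"
}

# Usage:
#   check_aws_env && echo "ready to run the S3-backed build"
```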
If you're running AWS Batch, make sure all of the required information is included in your `~/.nextstrain/config` file. See this documentation for more information.
To run the builds with your data stored in an AWS S3 bucket, navigate to the `ncov` directory and run:
nextstrain build --aws-batch-s3-bucket bucket-name --cpus=6 . --profile wa_sc2/wa_profiles/wa-subsampled-background-genbank
To run the builds with locally stored data instead, run:
nextstrain build --cpus=6 . --profile wa_sc2/wa_profiles/wa-subsampled-background-genbank
You can check your results once the pipeline has finished running using `nextstrain view auspice`.