Analysis of SARS-CoV-2 antigenic drift and evolutionary fitness

Installation

All dependencies can be met by running through the Nextstrain runtime. See docs.nextstrain.org for installation instructions.

Provision metadata locally

Windowed analyses require access to SARS-CoV-2 metadata, which can be acquired with:

aws s3 cp s3://nextstrain-ncov-private/metadata.tsv.gz data/gisaid_metadata.tsv.gz
zstd -c -d data/gisaid_metadata.tsv.gz \
   | tsv-select -H -f strain,date,date_submitted,country,clade_nextstrain,Nextclade_pango,QC_overall_status \
   | gzip -c > data/gisaid_metadata_filtered.tsv.gz

Access to this S3 bucket is restricted based on GISAID data sharing policies.
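
Where tsv-utils is not available, the column selection performed by tsv-select above can be approximated in plain Python. This is a minimal sketch rather than part of the workflow: the helper names select_columns and filter_metadata are hypothetical, and the file-level wrapper assumes the input is gzip-compressed (a zstd-compressed file would need a zstd reader instead).

```python
import gzip

# Columns kept by the tsv-select call above.
KEEP = [
    "strain", "date", "date_submitted", "country",
    "clade_nextstrain", "Nextclade_pango", "QC_overall_status",
]


def select_columns(src, dst, columns=KEEP):
    """Copy only `columns` (matched by header name) from TSV stream `src` to `dst`."""
    header = next(src).rstrip("\r\n").split("\t")
    # list.index raises ValueError if a requested column is missing from the header.
    idx = [header.index(c) for c in columns]
    dst.write("\t".join(columns) + "\n")
    for line in src:
        fields = line.rstrip("\r\n").split("\t")
        dst.write("\t".join(fields[i] for i in idx) + "\n")


def filter_metadata(in_path, out_path):
    """Hypothetical wrapper: assumes `in_path` is gzip-compressed."""
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        select_columns(src, dst)
```

Under those assumptions, filter_metadata("data/gisaid_metadata.tsv.gz", "data/gisaid_metadata_filtered.tsv.gz") would write the same seven columns as the shell pipeline.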

Non-windowed analyses will provision sequence counts from forecasts-ncov. These reduced sequence counts are publicly available.

Windowed analyses require working from detailed metadata as the rules process_metadata and observe_over_period restrict sequence counts to follow what was actually available based on submission dates, rather than what's available at the current moment.
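
The idea behind restricting counts by submission date can be illustrated with a short sketch. It is not the code of process_metadata or observe_over_period; the function name observed_counts is hypothetical, and it assumes each record is a dict keyed by the metadata columns selected above, with date_submitted given as a full ISO date (YYYY-MM-DD).

```python
from collections import Counter
from datetime import date


def observed_counts(records, obs_date):
    """Count sequences per (collection date, country, lineage), using only
    records that had already been submitted on or before `obs_date`."""
    counts = Counter()
    for rec in records:
        # Assumes a complete ISO date; partial dates would need handling first.
        if date.fromisoformat(rec["date_submitted"]) <= obs_date:
            counts[(rec["date"], rec["country"], rec["Nextclade_pango"])] += 1
    return counts
```

Calling this with successive values of obs_date gives the counts as they would have looked at each observation date, which is what distinguishes a windowed analysis from one built on the current snapshot.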

Workflow

Once metadata is provisioned locally, run the workflow with:

nextstrain build . all

This will generate sequence count files for all analyses specified in ./config/config.yaml. Under analysis_period in the config file, you can specify an analysis name along with the minimum and maximum dates for the analysis, the pivot variant of interest, lineages to forcibly include, and predictors to use for the regression-prior model. For example:

analysis_period:
  xbb15:
    min_date: "2023-01-01"
    max_date: "2023-12-01"
    pivot: "XBB.1.5"
    force_include: "defaults/xbb15/force_include_lineages.txt"
    predictor_names:
      - "spike pseudovirus DMS human sera escape relative to XBB.1.5"
      - "spike pseudovirus DMS ACE2 binding relative to XBB.1.5"
      - "RBD yeast-display DMS ACE2 affinity relative to XBB.1.5"
      - "RBD yeast-display DMS RBD expression relative to XBB.1.5"
      - "RBD yeast-display DMS escape relative to XBB.1.5"
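
Before launching a long run, an analysis_period entry like the one above can be sanity-checked. The sketch below is illustrative only: check_analysis_period is a hypothetical helper, the entry is written as a Python dict rather than parsed from YAML, and treating min_date, max_date and pivot as the required keys is an assumption, not something the workflow is known to enforce.

```python
from datetime import date

# Assumption: these three keys are treated as required for this check.
REQUIRED_KEYS = ("min_date", "max_date", "pivot")


def check_analysis_period(name, cfg):
    """Return a list of problems found in one analysis_period entry."""
    problems = [f"{name}: missing '{key}'" for key in REQUIRED_KEYS if key not in cfg]
    if not problems:
        start = date.fromisoformat(cfg["min_date"])
        end = date.fromisoformat(cfg["max_date"])
        if start >= end:
            problems.append(f"{name}: min_date must precede max_date")
    return problems


example = {
    "min_date": "2023-01-01",
    "max_date": "2023-12-01",
    "pivot": "XBB.1.5",
    "force_include": "defaults/xbb15/force_include_lineages.txt",
}
print(check_analysis_period("xbb15", example))  # prints []
```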

Sequence counts and variant relationships

The workflow produces sequence count files for both windowed and non-windowed analyses, collapses lineages into their parents based on count thresholds specified in ./config/config.yaml, and generates variant relationship files so that the innovation model can be fit.

Collapsed sequence counts follow the form data/{analysis_period}/collapsed_seq_counts.tsv and variant relationships follow the form data/{analysis_period}/pango_variant_relationships.tsv.
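
The threshold-based collapsing described above can be sketched as follows. This is a simplified illustration, not the workflow's implementation: parent_of and collapse_lineages are hypothetical names, the parent is found by naively dropping the last dotted component of the Pango name (aliases are not resolved), and the keep argument is an assumed stand-in for forcibly included lineages.

```python
from collections import Counter


def parent_of(lineage):
    """Naive parent: drop the last dotted component ('XBB.1.5' -> 'XBB.1').
    Returns None for a name without a dot. Pango aliases are NOT resolved."""
    head, sep, _ = lineage.rpartition(".")
    return head if sep else None


def collapse_lineages(counts, threshold, keep=()):
    """Fold lineages with fewer than `threshold` sequences into their parents."""
    collapsed = Counter(counts)
    max_depth = max((name.count(".") for name in collapsed), default=0)
    # Work from the deepest lineages upward so small counts can cascade.
    for depth in range(max_depth, 0, -1):
        at_depth = [name for name in collapsed if name.count(".") == depth]
        for lineage in at_depth:
            if lineage in keep or collapsed[lineage] >= threshold:
                continue
            collapsed[parent_of(lineage)] += collapsed.pop(lineage)
    return dict(collapsed)
```

For example, with a threshold of 10, a lineage XBB.1.5.1 observed 3 times would be folded into XBB.1.5, while a name without a dot is always left in place because this sketch has no parent to assign it to.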

MLR innovation model

The collapsed sequence count files and variant relationships are used to estimate relative fitness using the uninformed (normal-prior) innovation model.
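
To make the role of the variant relationships concrete, the sketch below shows one common way an innovation-style parameterisation feeds a multinomial logistic regression. It rests on assumptions not stated in this README: that each variant's fitness is its parent's fitness plus a variant-specific innovation term, and that frequencies follow a softmax of intercept plus fitness times time. The function names are hypothetical, and the actual model is fit by the workflow rather than by this code.

```python
import math


def relative_fitness(parents, innovations, pivot):
    """Sum innovation terms along each variant's ancestry, then express the
    result relative to `pivot` (so the pivot's relative fitness is 0)."""
    cache = {}

    def total(variant):
        if variant not in cache:
            parent = parents.get(variant)
            inherited = total(parent) if parent is not None else 0.0
            cache[variant] = inherited + innovations.get(variant, 0.0)
        return cache[variant]

    base = total(pivot)
    return {variant: total(variant) - base for variant in innovations}


def frequencies(fitness, intercepts, t):
    """Softmax of (intercept + fitness * t) across variants at time `t`."""
    logits = {v: intercepts[v] + fitness[v] * t for v in fitness}
    top = max(logits.values())  # subtract the max for numerical stability
    norm = sum(math.exp(x - top) for x in logits.values())
    return {v: math.exp(x - top) / norm for v, x in logits.items()}
```

Under these assumptions, a variant with a positive innovation term gains frequency over time relative to its parent, and the pivot serves as the reference against which all other variants are measured.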

For each analysis period, this produces results/{analysis_period}/growth_advantage.tsv and results/{analysis_period}/growth_advantage_delta.tsv.

The model posteriors will also be saved under results/{analysis_period}/posteriors/data_{location}.pkl and results/{analysis_period}/posteriors/samples_{location}.pkl.

If predictor names are provided, they are used to fit the regression-prior innovation model. These predictor-informed results are stored under results/{analysis_period}/informed/growth_advantages.tsv.
