Change nomenclature from SNP/SNV to "Mutation" (#439)
* refactor from SNP/SNV -> mutation

* remove more instances of SNV/SNP

* update data

* update VOCs

* use proper config file for local dev

* server cleanup

* fix typos
atc3 authored Nov 12, 2021
1 parent 2abfcea commit 7270830
Showing 1,085 changed files with 187,731 additions and 202,217 deletions.
2 changes: 1 addition & 1 deletion .gitattributes
@@ -1,4 +1,4 @@
*.ipynb binary linguist-detectable=false

# Don't track changes in the example data
-example_data_genbank/* -diff -merge -text
+example_data_genbank/** -diff -merge -text
4 changes: 2 additions & 2 deletions README.md
@@ -159,7 +159,7 @@ $ CONFIGFILE=../../config/config_genbank.yaml ./serve.sh # Run Flask server in d

Data analysis is run with [Snakemake](https://snakemake.readthedocs.io/en/stable/), Python scripts, and bioinformatics tools such as `bowtie2`. Please ensure that the conda environment is configured correctly (See [Pipeline Installation](#Pipeline-Installation)).

-Data analysis is broken up into two snakemake pipelines: 1) ingestion and 2) main. The ingestion pipeline downloads, chunks, and prepares metadata for the main analysis, and the main pipeline analyzes sequences, extracts SNVs, and compiles data for display in the web application.
+Data analysis is broken up into two snakemake pipelines: 1) ingestion and 2) main. The ingestion pipeline downloads, chunks, and prepares metadata for the main analysis, and the main pipeline analyzes sequences, extracts mutations, and compiles data for display in the web application.

Configuration of the pipeline is defined in the `config/config_[workflow].yaml` files.

@@ -213,7 +213,7 @@ cd workflow_main
snakemake --configfile ../config/config_genbank.yaml
```

-This pipeline will align sequences to the reference sequence with `minimap2`, extract SNVs on both the NT and AA level, and combine all metadata and SNV information into one file: `data_package.json.gz`.
+This pipeline will align sequences to the reference sequence with `minimap2`, extract mutations on both the NT and AA level, and combine all metadata and mutation information into one file: `data_package.json.gz`.

To pass this data onto the front-end application, host the `data_package.json.gz` file on an accessible endpoint, then specify that endpoint in the `data_package_url` field in the `config/config_[workflow]` file that you are using.
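
As a quick sanity check before hosting the file, the package can be decompressed and parsed locally. The following is a minimal sketch, not code from this repository: `load_data_package` is a hypothetical helper, and it assumes only that `data_package.json.gz` is gzip-compressed JSON, as the filename suggests. The internal layout of the package is not shown in this commit, so the sketch does not inspect it.

```python
import gzip
import json


def load_data_package(raw_bytes):
    """Decompress a gzipped JSON payload and parse it (hypothetical helper)."""
    # gzip.decompress handles the .gz container; the result is UTF-8 JSON text
    return json.loads(gzip.decompress(raw_bytes).decode("utf-8"))


if __name__ == "__main__":
    # Assumes the pipeline output has been copied to the working directory
    with open("data_package.json.gz", "rb") as fp:
        package = load_data_package(fp.read())
    print(type(package).__name__)
```

The same function works on bytes fetched from whatever endpoint is set in `data_package_url`.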

4 changes: 2 additions & 2 deletions config/config_alpha.yaml
@@ -21,8 +21,8 @@ chunk_size: 100000
# ANALYSIS
# --------------------

-# SNPs with less than this number of global occurrences will be ignored
-snp_count_threshold: 3
+# Mutations with less than this number of global occurrences will be ignored
+mutation_count_threshold: 3

# Threshold of prevalence to report a mutation as being a consensus
# mutation for a group (e.g., clade, lineage)
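
To illustrate what the renamed `mutation_count_threshold` setting describes, here is a minimal sketch of a global-occurrence filter. It is an assumption about the behavior stated in the config comment, not the pipeline's actual implementation: the function name, the input shape (one list of mutation labels per sequence), and the example labels are all hypothetical.

```python
from collections import Counter


def filter_rare_mutations(mutations_per_sequence, mutation_count_threshold=3):
    """Drop mutations seen in fewer than `mutation_count_threshold` sequences."""
    # Count each mutation once per sequence in which it appears
    counts = Counter(
        mutation
        for mutations in mutations_per_sequence
        for mutation in set(mutations)
    )
    # Keep mutations whose global count meets the threshold
    keep = {m for m, n in counts.items() if n >= mutation_count_threshold}
    return [[m for m in mutations if m in keep] for mutations in mutations_per_sequence]


# Hypothetical mutation labels, one list per sequence
sequences = [["A23403G", "C241T"], ["A23403G"], ["A23403G", "G28881A"]]
filtered = filter_rare_mutations(sequences, mutation_count_threshold=3)
```

With a threshold of 3, only a mutation present in at least three sequences survives, matching the comment's "less than this number ... will be ignored".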
4 changes: 2 additions & 2 deletions config/config_custom.yaml
@@ -21,8 +21,8 @@ chunk_size: 100000
# ANALYSIS
# --------------------

-# SNPs with less than this number of global occurrences will be ignored
-snp_count_threshold: 3
+# Mutations with less than this number of global occurrences will be ignored
+mutation_count_threshold: 3

# Threshold of prevalence to report a mutation as being a consensus
# mutation for a group (e.g., clade, lineage)
4 changes: 2 additions & 2 deletions config/config_genbank.yaml
@@ -21,8 +21,8 @@ chunk_size: 100000
# ANALYSIS
# --------------------

-# SNPs with less than this number of global occurrences will be ignored
-snp_count_threshold: 3
+# Mutations with less than this number of global occurrences will be ignored
+mutation_count_threshold: 3

# Threshold of prevalence to report a mutation as being a consensus
# mutation for a group (e.g., clade, lineage)
4 changes: 2 additions & 2 deletions config/config_genbank_example.yaml
@@ -21,8 +21,8 @@ chunk_size: 100000
# ANALYSIS
# --------------------

-# SNPs with less than this number of global occurrences will be ignored
-snp_count_threshold: 3
+# Mutations with less than this number of global occurrences will be ignored
+mutation_count_threshold: 3

# Threshold of prevalence to report a mutation as being a consensus
# mutation for a group (e.g., clade, lineage)
4 changes: 2 additions & 2 deletions config/config_gisaid.yaml
@@ -21,8 +21,8 @@ chunk_size: 100000
# ANALYSIS
# --------------------

-# SNPs with less than this number of global occurrences will be ignored
-snp_count_threshold: 3
+# Mutations with less than this number of global occurrences will be ignored
+mutation_count_threshold: 3

# Threshold of prevalence to report a mutation as being a consensus
# mutation for a group (e.g., clade, lineage)
4 changes: 2 additions & 2 deletions config/config_gisaid_private.yaml
@@ -21,8 +21,8 @@ chunk_size: 100000
# ANALYSIS
# --------------------

-# SNPs with less than this number of global occurrences will be ignored
-snp_count_threshold: 3
+# Mutations with less than this number of global occurrences will be ignored
+mutation_count_threshold: 3

# Threshold of prevalence to report a mutation as being a consensus
# mutation for a group (e.g., clade, lineage)
4 changes: 2 additions & 2 deletions config/config_rsv_custom.yaml
@@ -21,8 +21,8 @@ chunk_size: 100000
# ANALYSIS
# --------------------

-# SNPs with less than this number of global occurrences will be ignored
-snp_count_threshold: 3
+# Mutations with less than this number of global occurrences will be ignored
+mutation_count_threshold: 3

# Threshold of prevalence to report a mutation as being a consensus
# mutation for a group (e.g., clade, lineage)
4 changes: 2 additions & 2 deletions config/config_rsv_genbank.yaml
@@ -21,8 +21,8 @@ chunk_size: 100000
# ANALYSIS
# --------------------

-# SNPs with less than this number of global occurrences will be ignored
-snp_count_threshold: 3
+# Mutations with less than this number of global occurrences will be ignored
+mutation_count_threshold: 3

# Threshold of prevalence to report a mutation as being a consensus
# mutation for a group (e.g., clade, lineage)
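
Because every shipped config file renames `snp_count_threshold` to `mutation_count_threshold`, a locally maintained config that still uses the old key would likely need the same rename. The sketch below is a hypothetical migration helper, not part of this commit: it rewrites the key textually so that comments and formatting in the YAML file are preserved, and it assumes the key appears at the start of a line, as in the files above.

```python
import re

# Match the old key only where it starts a line (optionally indented),
# so occurrences inside comments are left untouched.
_OLD_KEY = re.compile(r"^([ \t]*)snp_count_threshold([ \t]*:)", flags=re.MULTILINE)


def migrate_config_text(text):
    """Rename the old threshold key in YAML text (hypothetical helper)."""
    return _OLD_KEY.sub(r"\1mutation_count_threshold\2", text)


old = "chunk_size: 100000\nsnp_count_threshold: 3\n"
new = migrate_config_text(old)
```

A text-level rewrite is used here instead of a YAML parser so the example stays within the standard library; it does not update the accompanying comment line.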
2 changes: 1 addition & 1 deletion docker-compose.yml
@@ -9,7 +9,7 @@ services:
- LOGINS=user1:pass1,user2:pass2
- FLASK_APP=cg_server/main.py
- FLASK_ENV=development
- - CONFIGFILE=/opt/config/config_genbank.yaml
+ - CONFIGFILE=/opt/config/config_genbank_example.yaml
- CONSTANTSFILE=/opt/constants/defs.json
- DATA_PATH=/data
- STATIC_DATA_PATH=/opt/static_data