-
Notifications
You must be signed in to change notification settings - Fork 5
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* Fix various metadata bugs * Fix issues with fasta file parsing * Remove concurrent index building - can't do it in the middle of a transaction * More verbose sequence processing to identify filtering bottlenecks * year partition breaks for all flu sites * Add build scripts for flu genbank site * Add dev environment for flu genbank * formatting * Fix gzip bug
- Loading branch information
Showing
23 changed files
with
269 additions
and
73 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,181 @@ | ||
# ------------------ | ||
# GLOBAL | ||
# ------------------ | ||
|
||
# Virus this config is written for | ||
virus: "flu" | ||
|
||
# Path to folder with downloaded and processed data | ||
# This path is relative to the project root | ||
data_folder: "data_flu_small" | ||
|
||
# Path to folder with genome information (reference.fasta, genes.json, proteins.json) | ||
# This path is relative to the project root | ||
static_data_folder: "static_data/flu" | ||
|
||
# Database for this virus | ||
postgres_db: "flu_genbank_dev" | ||
|
||
# ------------------ | ||
# INGEST | ||
# ------------------ | ||
|
||
# Download sequences from GenBank (NCBI Virus vvsearch) in chunks of this size | ||
# D = day, W = week, M = month, Y = year | ||
dl_chunk_period: "Y" | ||
|
||
# Number of genomes to load into memory before flushing to disk | ||
chunk_size: 10000 | ||
|
||
# -------------------- | ||
# ANALYSIS | ||
# -------------------- | ||
|
||
# Don't process sequences prior to this date | ||
# Leave empty to ignore | ||
start_date_cutoff: | ||
# Don't process sequences after this date | ||
# Leave empty to ignore | ||
end_date_cutoff: | ||
|
||
# Don't process sequences after X days ago | ||
# Leave empty to ignore | ||
start_date_cutoff_days_ago: | ||
# Don't process sequences prior to X days ago | ||
# Leave empty to ignore | ||
end_date_cutoff_days_ago: | ||
|
||
segments: ["1", "2", "3", "4", "5", "6", "7", "8"] | ||
|
||
# Insertions or deletions with more than this difference in bases between the | ||
# ref and the alt will be discarded (NT level only) | ||
max_indel_length: 100 | ||
|
||
# Mutations with less than this number of global occurrences will be ignored | ||
mutation_count_threshold: 3 | ||
|
||
# Threshold of prevalence to report a mutation as being a consensus | ||
# mutation for a group (e.g., clade, lineage) | ||
consensus_fraction: 0.9 | ||
|
||
# Threshold of prevalence to report a mutation as being associated | ||
# with a group (e.g., clade, lineage) | ||
min_reporting_fraction: 0.05 | ||
|
||
metadata_cols: | ||
database: | ||
title: "Database" | ||
host: | ||
title: "Host" | ||
isolation_source: | ||
title: "Isolation source" | ||
authors: | ||
title: "Authors" | ||
publications: | ||
title: "Publications" | ||
|
||
group_cols: | ||
serotype: | ||
name: "serotype" | ||
title: "Serotype" | ||
description: "" | ||
show_collapse_options: false | ||
|
||
# AZ report options | ||
report_gene: HA | ||
report_group_col: serotype | ||
report_group_references: | ||
B-vic: B-Austria-1359417-2021 | ||
B-yam: B-Phuket-3073-2013 | ||
H1N1: A-Wisconsin-67-2022 | ||
H3N2: A-Darwin-6-2021 | ||
H5NX: A-Goose-Guangdong-1-96 | ||
H7NX: A-Shanghai-02-2013 | ||
H9NX: A-Hong-Kong-1073-99 | ||
H10NX: A-Jiangsu-428-2021 | ||
|
||
# Surveillance plot options | ||
# see: workflow_main/scripts/surveillance.py | ||
surv_group_col: "serotype" | ||
surv_start_date: "1956-01-01" | ||
surv_period: "Y" | ||
surv_min_combo_count: 50 | ||
surv_min_single_count: 50 | ||
surv_start_date_days_ago: 90 | ||
surv_end_date_days_ago: 30 | ||
surv_group_references: | ||
B-vic: B-Austria-1359417-2021 | ||
B-yam: B-Phuket-3073-2013 | ||
H1N1: A-Wisconsin-67-2022 | ||
H3N2: A-Darwin-6-2021 | ||
H5NX: A-Goose-Guangdong-1-96 | ||
H7NX: A-Shanghai-02-2013 | ||
H9NX: A-Hong-Kong-1073-99 | ||
H10NX: A-Jiangsu-428-2021 | ||
|
||
# --------------- | ||
# DATABASE | ||
# --------------- | ||
|
||
# Split mutation table partitions into periods of this length | ||
# See: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#offset-aliases | ||
# 'M' month end frequency | ||
# 'Y' year end frequency | ||
mutation_partition_break: "Y" | ||
|
||
# --------------- | ||
# SERVER | ||
# --------------- | ||
|
||
# Require a login for accessing the website | ||
# Users are provided to the app via. the "LOGINS" environment variable, | ||
# which is structured as "user1:pass1,user2:pass2,..." | ||
login_required: false | ||
|
||
dev_hostname: "http://localhost:5003" | ||
prod_hostname: | ||
- "https://flu.genbank.pathmut.org" | ||
|
||
# ---------------------- | ||
# VISUALIZATION | ||
# ---------------------- | ||
|
||
site_title: "Flu PathMut" | ||
data_provider: "NCBI GenBank" | ||
motd_url: "https://storage.googleapis.com/ve-public/MOTD.html" | ||
|
||
# Default references for each subtype | ||
default_references: | ||
B-vic: B-Austria-1359417-2021 | ||
B-yam: B-Phuket-3073-2013 | ||
H1N1: A-Wisconsin-67-2022 | ||
H3N2: A-Darwin-6-2021 | ||
H5NX: A-Goose-Guangdong-1-96 | ||
H7NX: A-Shanghai-02-2013 | ||
H9NX: A-Hong-Kong-1073-99 | ||
H10NX: A-Jiangsu-428-2021 | ||
|
||
# Home page | ||
show_home_banner: false | ||
show_walkthroughs: false | ||
show_surveillance: true | ||
show_global_seq_plot: false | ||
|
||
show_reports_tab: false | ||
show_global_sequencing_tab: false | ||
show_methods_tab: false | ||
show_related_projects_tab: false | ||
|
||
default_gene: HA | ||
default_protein: HA | ||
|
||
min_date: "1956-01-01" | ||
|
||
show_logos: | ||
GISAID: false | ||
GenBank: true | ||
|
||
# Allow downloads of sequence metadata (before aggregation) | ||
allow_metadata_download: true | ||
# Allow downloads of raw genomes | ||
allow_genome_download: true |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.