Improved metadata downloads, raw genome downloads (#278)
* insert sequences during database push

* add example fasta files

* Automatically seed development database on first run

* fix typo

* Add config file for alpha site

* Add SNVs to metadata download

* Download metadata modal + options

* Move constants, defs, and config to mounts instead of copying at container build time

* move download SNV code - not sure how we'll use this now that the metadata download is better

* add annotations to server files

* Mount config file and static data onto frontend container as well, instead of from copies

* Add genome download

* Fix for empty LOGINS field

* remove X_SENDFILE arg

* fix typo

* update fake prod environment

* selectively show GISAID logos, depending on the build

* Add GenBank header for GenBank builds

* fix bad import
atc3 authored Mar 5, 2021
1 parent 2d2afac commit 7b80c15
Showing 369 changed files with 2,084 additions and 696 deletions.
40 changes: 35 additions & 5 deletions README.md
@@ -6,6 +6,8 @@ Table of Contents

- [COVID-19 CG (CoV Genetics)](#covid-19-cg-cov-genetics)
- [Installation](#installation)
- [Dependency changes](#dependency-changes)
- [Database refresh](#database-refresh)
- [Per-service installation](#per-service-installation)
- [Javascript](#javascript)
- [PostgreSQL](#postgresql)
@@ -35,12 +37,42 @@ The analysis pipeline for processing raw SARS-CoV-2 genomes is a separate instal
# (Re-builds only necessary if packages or
# dependencies have changed)
> docker-compose up -d # Run all services
-> curl localhost:5000/seed # Seed PostgreSQL database with example data
-# (this only needs to be run once, the
-# Docker volume will persist the database)
> docker-compose down # Shut down all services when finished
```

**NOTE**: When starting from a fresh database, the server will automatically seed the database with data from the `example_data_genbank` folder. This process may take a few minutes as ~50K genomes are loaded into the database.

### Dependency changes

If the dependencies for the JS change (i.e., a change in `package.json`), then you can rebuild the frontend container with:

```bash
> docker-compose down
> docker-compose build --no-cache frontend
> docker-compose up
```

A rebuild is also needed if the toolchain changes (`webpack*.js` or anything in `tools/`).

For changes to files outside of `src` (i.e., in `config/` or `static_data/`), the container only needs to be restarted, not rebuilt.

If the server dependencies change (i.e., a change in `requirements.txt`), rebuild the server container with:

```bash
> docker-compose down
> docker-compose build --no-cache server
> docker-compose up
```

### Database refresh

To erase the local development database, delete the postgres docker volume with:

```bash
> docker-compose down -v # -v will delete the volume
> docker-compose up
```

## Per-service installation

We recommend developing with Docker and `docker-compose`. More details on the installation for each service can be found in their respective `Dockerfile`s in the `services/` folder, and in the `docker-compose.yml` file. Running each service separately is not recommended and not tested on our end. Since we are not actively testing per-service installations, please submit a GitHub issue if you run into any problems during installation or running.
@@ -100,8 +132,6 @@ Please provide DB connection information to the Flask server with the following
- POSTGRES_HOST
- POSTGRES_PORT

-Seed the database with the example data by running: `curl localhost:5000/seed`, once the Flask server is up and running.

### Flask Server

Requirements:
95 changes: 95 additions & 0 deletions config/config_alpha.yaml
@@ -0,0 +1,95 @@
# ------------------
# GLOBAL
# ------------------

# Path to folder with downloaded and processed data
# This path is relative to the workflow folders
data_folder: "../data"

# Path to folder with genome information (reference.fasta, genes.json, proteins.json)
# This path is relative to the workflow folders
static_data_folder: "../static_data"

# ------------------
# INGEST
# ------------------

# Number of genomes to load into memory before flushing to disk
chunk_size: 100000

# --------------------
# ANALYSIS
# --------------------

# SNPs with less than this number of global occurrences will be ignored
snp_count_threshold: 3
# Percentage of sequences within a group (e.g., clade, lineage) that need
# to have a SNV, in order for the SNV to be called as a "consensus" SNV
consensus_fraction: 0.9

metadata_cols:
  host:
    title: "Host"
  gender:
    title: "Gender"
  patient_status:
    title: "Patient Status"
  passage:
    title: "Passage"
    disabled: true
  specimen:
    title: "Specimen"
  sequencing_tech:
    title: "Sequencing"
  assembly_method:
    title: "Assembly"
  comment_type:
    title: "Flag"
  authors:
    title: "Authors"
  originating_lab:
    title: "Originating lab"
  submitting_lab:
    title: "Submitting lab"

group_cols:
  lineage:
    name: "lineage"
    title: "Lineage"
    description: ""
    link:
      title: "(Lineage Descriptions)"
      href: "https://cov-lineages.org/descriptions.html"
  clade:
    name: "clade"
    title: "Clade"
    description: "For more information about clade and lineage nomenclature, visit this:"
    link:
      title: "[GISAID note]"
      href: "https://www.gisaid.org/references/statements-clarifications/clade-and-lineage-nomenclature-aids-in-genomic-epidemiology-of-active-hcov-19-viruses/"

# ---------------
# SERVER
# ---------------

# Require a login for accessing the website
# Users are provided to the app via. the "LOGINS" environment variable,
# which is structured as "user1:pass1,user2:pass2,..."
login_required: true

dev_hostname: "http://localhost:5000"
prod_hostname: "https://alpha.covidcg.org"
# prod_hostname: "http://localhost:8080"

# ----------------------
# VISUALIZATION
# ----------------------

show_logos:
  GISAID: true
  GenBank: false

# Allow downloads of sequence metadata (before aggregation)
allow_metadata_download: true
# Allow downloads of raw genomes
allow_genome_download: true
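
The `snp_count_threshold` and `consensus_fraction` settings in the ANALYSIS section of this config can be illustrated with a short sketch. This is not the pipeline's actual code, which is not part of this diff. The function names `drop_rare_snvs` and `consensus_snvs`, the input shapes, and the toy SNV names are all assumptions made for illustration. Note that `consensus_fraction` is written as a fraction (0.9), not a percentage.

```python
from collections import Counter, defaultdict


def drop_rare_snvs(seq_snvs, snp_count_threshold=3):
    """Remove SNVs seen in fewer than `snp_count_threshold` sequences overall.

    seq_snvs: {sequence_id: set of SNV names} (hypothetical input shape)
    """
    global_counts = Counter(snv for snvs in seq_snvs.values() for snv in snvs)
    return {
        seq_id: {snv for snv in snvs if global_counts[snv] >= snp_count_threshold}
        for seq_id, snvs in seq_snvs.items()
    }


def consensus_snvs(seq_snvs, seq_groups, consensus_fraction=0.9):
    """Return {group: SNVs present in at least `consensus_fraction` of its sequences}.

    seq_groups: {sequence_id: group name, e.g. a lineage or clade}
    """
    group_sizes = Counter(seq_groups.values())
    snv_counts = defaultdict(Counter)
    for seq_id, group in seq_groups.items():
        snv_counts[group].update(seq_snvs.get(seq_id, set()))
    return {
        group: {
            snv
            for snv, n in snv_counts[group].items()
            if n / size >= consensus_fraction
        }
        for group, size in group_sizes.items()
    }


# Toy data (made up): 10 sequences in one group.
# "snv_a" is in all 10, "snv_b" in 5, "snv_c" in only 1.
seq_groups = {f"seq{i}": "group_1" for i in range(10)}
seq_snvs = {f"seq{i}": {"snv_a"} for i in range(10)}
for i in range(5):
    seq_snvs[f"seq{i}"].add("snv_b")
seq_snvs["seq0"].add("snv_c")

filtered = drop_rare_snvs(seq_snvs, snp_count_threshold=3)
result = consensus_snvs(filtered, seq_groups, consensus_fraction=0.9)
print(result)  # {'group_1': {'snv_a'}}
```

In this toy case `snv_c` is dropped by the global count filter, and `snv_b` (present in 50% of the group) falls below the 0.9 cutoff, so only `snv_a` is reported as a consensus SNV.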
6 changes: 6 additions & 0 deletions config/config_custom.yaml
@@ -58,5 +58,11 @@ prod_hostname: "http://localhost:8080"
 # VISUALIZATION
 # ----------------------
 
+show_logos:
+  GISAID: false
+  GenBank: false
+
 # Allow downloads of sequence metadata (before aggregation)
 allow_metadata_download: true
+# Allow downloads of raw genomes
+allow_genome_download: true
6 changes: 6 additions & 0 deletions config/config_genbank.yaml
@@ -65,5 +65,11 @@ prod_hostname: "https://genbank.covidcg.org"
 # VISUALIZATION
 # ----------------------
 
+show_logos:
+  GISAID: false
+  GenBank: true
+
 # Allow downloads of sequence metadata (before aggregation)
 allow_metadata_download: true
+# Allow downloads of raw genomes
+allow_genome_download: true
8 changes: 7 additions & 1 deletion config/config_gisaid.yaml
@@ -85,5 +85,11 @@ prod_hostname: "https://covidcg.org"
 # VISUALIZATION
 # ----------------------
 
+show_logos:
+  GISAID: true
+  GenBank: false
+
 # Allow downloads of sequence metadata (before aggregation)
-allow_metadata_download: true
+allow_metadata_download: false
+# Allow downloads of raw genomes
+allow_genome_download: false
6 changes: 6 additions & 0 deletions config/config_gisaid_private.yaml
@@ -85,5 +85,11 @@ prod_hostname: "https://az.covidcg.org"
 # VISUALIZATION
 # ----------------------
 
+show_logos:
+  GISAID: true
+  GenBank: false
+
 # Allow downloads of sequence metadata (before aggregation)
 allow_metadata_download: true
+# Allow downloads of raw genomes
+allow_genome_download: true
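
The per-build flags in the config files above (`show_logos`, `allow_metadata_download`, `allow_genome_download`) could be consumed along the following lines. This is a minimal sketch, assuming the YAML has already been parsed into a plain dict. The helpers `download_allowed` and `visible_logos` are hypothetical, and treating a missing key as "not allowed" is an assumption rather than the server's documented behaviour.

```python
def download_allowed(config, kind):
    """Return True if downloads of `kind` ("metadata" or "genome") are enabled."""
    if kind not in ("metadata", "genome"):
        raise ValueError(f"unknown download kind: {kind!r}")
    # Assumption: a missing flag means the download is disabled.
    return bool(config.get(f"allow_{kind}_download", False))


def visible_logos(config):
    """Return the names of the logos that this build should display."""
    return sorted(
        name for name, shown in config.get("show_logos", {}).items() if shown
    )


# Values copied from config_gisaid.yaml and config_genbank.yaml in this commit.
gisaid_config = {
    "show_logos": {"GISAID": True, "GenBank": False},
    "allow_metadata_download": False,
    "allow_genome_download": False,
}
genbank_config = {
    "show_logos": {"GISAID": False, "GenBank": True},
    "allow_metadata_download": True,
    "allow_genome_download": True,
}

print(download_allowed(gisaid_config, "metadata"))  # False
print(download_allowed(genbank_config, "genome"))   # True
print(visible_logos(genbank_config))                # ['GenBank']
```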
4 changes: 3 additions & 1 deletion docker-compose.cloudsql.prod.yml
@@ -11,13 +11,15 @@ services:
CONFIGFILE: ${CONFIGFILE}
environment:
- LOGINS=user1:pass1,user2:pass2
- STATIC_DATA_PATH=/static_data
# - FLASK_APP=cg_server/main.py
- FLASK_ENV=production
- CLOUDSQL_CONNECTION_NAME
- POSTGRES_USER
- POSTGRES_PASSWORD
- POSTGRES_DB
- POSTGRES_HOST
command: "gunicorn --bind :8080 --workers 1 --threads 8 --timeout 0 cg_server.main:app"
# command: "flask run --host 0.0.0.0 --port=8080"
ports:
- 8080:8080
working_dir: /app
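
The `LOGINS` variable set above is described in the config files as `user1:pass1,user2:pass2,...`. A minimal sketch of parsing it in a way that tolerates an empty or unset value follows. The commit message mentions a fix for an empty LOGINS field, but the server's actual parsing code is not shown in this diff, so the `parse_logins` helper and its error handling are assumptions.

```python
import os


def parse_logins(raw):
    """Parse "user1:pass1,user2:pass2" into {"user1": "pass1", "user2": "pass2"}.

    An empty string or None yields an empty dict instead of raising.
    """
    logins = {}
    for entry in (raw or "").split(","):
        entry = entry.strip()
        if not entry:
            continue  # Skip empty entries, e.g. from "" or a trailing comma
        # Split on the first colon only, so passwords may contain colons.
        user, sep, password = entry.partition(":")
        if not sep or not user:
            raise ValueError("malformed LOGINS entry (expected user:pass)")
        logins[user] = password
    return logins


# Read from the environment, falling back to no logins if the variable is unset.
logins = parse_logins(os.environ.get("LOGINS", ""))
```

For example, `parse_logins("user1:pass1,user2:pass2")` returns a two-entry dict, while `parse_logins("")` returns `{}`.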
14 changes: 9 additions & 5 deletions docker-compose.cloudsql.yml
@@ -7,14 +7,13 @@ services:
     build:
       context: ./
       dockerfile: ./services/server/cloudsql.Dockerfile
-      args:
-        CONFIGFILE: ${CONFIGFILE}
     environment:
       - LOGINS=user1:pass1,user2:pass2
       - FLASK_APP=cg_server/main.py
       - FLASK_ENV=development
+      - CONFIGFILE=/opt/${CONFIGFILE}
       - DATA_PATH=/data
-      - STATIC_DATA_PATH=/static_data
+      - STATIC_DATA_PATH=/opt/static_data
       - CLOUDSQL_CONNECTION_NAME
       - POSTGRES_USER
       - POSTGRES_PASSWORD
@@ -30,6 +29,9 @@ services:
       - cloudsql:/cloudsql
       - ./services/server:/app:cached # Mount the server python code at run-time, so that the flask development server can refresh on changes
       - ./example_data_genbank:/data:cached # Mount the data at run-time (for database seeding only). Should prevent sending all the data over unnecessarily at build-time
+      - ./src/constants:/opt/constants:cached
+      - ./config:/opt/config:cached
+      - ./static_data:/opt/static_data:cached
     depends_on:
       - sql_proxy

@@ -50,11 +52,13 @@ services:
     build:
       context: ./
       dockerfile: ./services/frontend/Dockerfile
-      args:
-        CONFIGFILE: ${CONFIGFILE}
+    environment:
+      CONFIGFILE: /app/${CONFIGFILE}
     working_dir: /app
     volumes:
       - ./src:/app/src:cached # Mount the JS code at run-time, so the babel server can recompile the app on file changes
+      - ./config:/app/config:cached
+      - ./static_data:/app/static_data:cached
     command: "npm start -s"
     ports:
       - 3000:3000
14 changes: 9 additions & 5 deletions docker-compose.yml
@@ -5,14 +5,13 @@ services:
     build:
       context: ./
       dockerfile: ./services/server/dev.Dockerfile
-      args:
-        CONFIGFILE: ./config/config_genbank.yaml # TODO: allow this to be configured via. an environmental variable prior to running the docker-compose command?
     environment:
       - LOGINS=user1:pass1,user2:pass2
       - FLASK_APP=cg_server/main.py
       - FLASK_ENV=development
+      - CONFIGFILE=/opt/config/config_genbank.yaml
       - DATA_PATH=/data
-      - STATIC_DATA_PATH=/static_data
+      - STATIC_DATA_PATH=/opt/static_data
       - POSTGRES_USER=postgres
       - POSTGRES_PASSWORD=cg
       - POSTGRES_DB=cg_dev
@@ -26,6 +25,9 @@ services:
     volumes:
       - ./services/server:/app:cached # Mount the server python code at run-time, so that the flask development server can refresh on changes
       - ./example_data_genbank:/data:cached # Mount the data at run-time (for database seeding only). Should prevent sending all the data over unnecessarily at build-time
+      - ./src/constants:/opt/constants:cached
+      - ./config:/opt/config:cached
+      - ./static_data:/opt/static_data:cached
     depends_on:
       - db
   db:
@@ -44,11 +46,13 @@ services:
     build:
       context: ./
       dockerfile: ./services/frontend/Dockerfile
-      args:
-        CONFIGFILE: ./config/config_genbank.yaml
+    environment:
+      CONFIGFILE: /app/config/config_genbank.yaml
     working_dir: /app
     volumes:
       - ./src:/app/src:cached # Mount the JS code at run-time, so the babel server can recompile the app on file changes
+      - ./config:/app/config:cached
+      - ./static_data:/app/static_data:cached
     command: "npm start -s"
     ports:
       - 3000:3000
Binary file not shown.