Per-service deployment fixes (#384)
* Disable metadata downloads for alpha site

* Add export commands to serve.sh script (#382)

* Add more instructions about defining the CONFIGFILE env variable before running the Flask server (#382)

* Add warning when CONFIGFILE env variable is not defined (#382)

* Fix data paths to be relative to project root

* Add env var for constants file, and better default that works on per-service deployments (#381)

* Update instructions, add more env vars to serve script
atc3 authored Aug 17, 2021
1 parent 2990512 commit d7b8fd8
Showing 20 changed files with 124 additions and 86 deletions.
4 changes: 3 additions & 1 deletion README.md
@@ -13,6 +13,7 @@ Table of Contents
 - [PostgreSQL](#postgresql)
 - [Flask Server](#flask-server)
 - [Analysis Pipeline](#analysis-pipeline)
+- [Pipeline Installation](#pipeline-installation)
 - [Ingestion](#ingestion)
 - [Main Analysis](#main-analysis)
 - [About the project](#about-the-project)
@@ -148,7 +149,8 @@ Run server:
 
 ```bash
 $ cd services/server
-$ ./serve.sh # Run Flask server in development mode
+$ CONFIGFILE=../../config/config_genbank.yaml ./serve.sh # Run Flask server in development mode, with GenBank settings
+# Optionally, edit the serve.sh script to set the config file
 ```
 
 ---
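The `CONFIGFILE=... ./serve.sh` form sets the variable for that one command only. If you launch the dev server from Python instead (for example from a test harness), a minimal sketch could look like this. `build_serve_env` and `run_dev_server` are hypothetical helpers, not part of the project; the sketch passes an absolute config path so the value does not depend on the working directory that `serve.sh` runs in.

```python
import os
import subprocess
from pathlib import Path


def build_serve_env(config_file, base_env=None):
    """Copy the environment and set CONFIGFILE, like `CONFIGFILE=... ./serve.sh`."""
    env = dict(os.environ if base_env is None else base_env)
    env["CONFIGFILE"] = str(config_file)
    return env


def run_dev_server(repo_root, config_name="config_genbank.yaml"):
    """Start services/server/serve.sh in the foreground with an absolute CONFIGFILE."""
    repo_root = Path(repo_root).resolve()
    config_file = repo_root / "config" / config_name
    subprocess.run(
        ["./serve.sh"],
        cwd=repo_root / "services" / "server",
        env=build_serve_env(config_file),
        check=True,
    )
```

From the repository root, `run_dev_server(".")` would be the rough equivalent of the README command. Copying the environment rather than mutating `os.environ` keeps the calling process unaffected.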
8 changes: 4 additions & 4 deletions config/config_alpha.yaml
@@ -3,12 +3,12 @@
 # ------------------
 
 # Path to folder with downloaded and processed data
-# This path is relative to the workflow folders
-data_folder: "../data"
+# This path is relative to the project root
+data_folder: "data"
 
 # Path to folder with genome information (reference.fasta, genes.json, proteins.json)
-# This path is relative to the workflow folders
-static_data_folder: "../static_data"
+# This path is relative to the project root
+static_data_folder: "static_data"
 
 # ------------------
 # INGEST
10 changes: 4 additions & 6 deletions config/config_custom.yaml
@@ -3,12 +3,12 @@
 # ------------------
 
 # Path to folder with downloaded and processed data
-# This path is relative to the workflow folders
-data_folder: "../data_custom"
+# This path is relative to the project root
+data_folder: "data_custom"
 
 # Path to folder with genome information (reference.fasta, genes.json, proteins.json)
-# This path is relative to the workflow folders
-static_data_folder: "../static_data"
+# This path is relative to the project root
+static_data_folder: "static_data"
 
 # ------------------
 # INGEST
@@ -47,15 +47,13 @@ group_cols:
 title: "(Lineage Descriptions)"
 href: "https://cov-lineages.org/descriptions.html"
 
-
 # Surveillance plot options
 # see: workflow_main/scripts/surveillance.py
 surv_min_combo_count: 50
 surv_min_single_count: 50
 surv_start_date_days_ago: 90
 surv_end_date_days_ago: 30
 
-
 # ---------------
 # SERVER
 # ---------------
10 changes: 4 additions & 6 deletions config/config_genbank.yaml
@@ -3,12 +3,12 @@
 # ------------------
 
 # Path to folder with downloaded and processed data
-# This path is relative to the workflow folders
-data_folder: "../data_genbank"
+# This path is relative to the project root
+data_folder: "data_genbank"
 
 # Path to folder with genome information (reference.fasta, genes.json, proteins.json)
-# This path is relative to the workflow folders
-static_data_folder: "../static_data"
+# This path is relative to the project root
+static_data_folder: "static_data"
 
 # ------------------
 # INGEST
@@ -53,15 +53,13 @@ group_cols:
 title: "(Lineage Descriptions)"
 href: "https://cov-lineages.org/descriptions.html"
 
-
 # Surveillance plot options
 # see: workflow_main/scripts/surveillance.py
 surv_min_combo_count: 50
 surv_min_single_count: 50
 surv_start_date_days_ago: 90
 surv_end_date_days_ago: 30
 
-
 # ---------------
 # SERVER
 # ---------------
10 changes: 4 additions & 6 deletions config/config_genbank_example.yaml
@@ -3,12 +3,12 @@
 # ------------------
 
 # Path to folder with downloaded and processed data
-# This path is relative to the workflow folders
-data_folder: "../example_data_genbank"
+# This path is relative to the project root
+data_folder: "example_data_genbank"
 
 # Path to folder with genome information (reference.fasta, genes.json, proteins.json)
-# This path is relative to the workflow folders
-static_data_folder: "../static_data"
+# This path is relative to the project root
+static_data_folder: "static_data"
 
 # ------------------
 # INGEST
@@ -53,15 +53,13 @@ group_cols:
 title: "(Lineage Descriptions)"
 href: "https://cov-lineages.org/descriptions.html"
 
-
 # Surveillance plot options
 # see: workflow_main/scripts/surveillance.py
 surv_min_combo_count: 50
 surv_min_single_count: 50
 surv_start_date_days_ago: 240
 surv_end_date_days_ago: 30
 
-
 # ---------------
 # SERVER
 # ---------------
9 changes: 4 additions & 5 deletions config/config_gisaid.yaml
@@ -3,12 +3,12 @@
 # ------------------
 
 # Path to folder with downloaded and processed data
-# This path is relative to the workflow folders
-data_folder: "../data"
+# This path is relative to the project root
+data_folder: "data"
 
 # Path to folder with genome information (reference.fasta, genes.json, proteins.json)
-# This path is relative to the workflow folders
-static_data_folder: "../static_data"
+# This path is relative to the project root
+static_data_folder: "static_data"
 
 # ------------------
 # INGEST
@@ -73,7 +73,6 @@ group_cols:
 title: "[GISAID note]"
 href: "https://www.gisaid.org/references/statements-clarifications/clade-and-lineage-nomenclature-aids-in-genomic-epidemiology-of-active-hcov-19-viruses/"
 
-
 # Surveillance plot options
 # see: workflow_main/scripts/surveillance.py
 surv_min_combo_count: 50
10 changes: 4 additions & 6 deletions config/config_gisaid_private.yaml
@@ -3,12 +3,12 @@
 # ------------------
 
 # Path to folder with downloaded and processed data
-# This path is relative to the workflow folders
-data_folder: "../data_private"
+# This path is relative to the project root
+data_folder: "data_private"
 
 # Path to folder with genome information (reference.fasta, genes.json, proteins.json)
-# This path is relative to the workflow folders
-static_data_folder: "../static_data"
+# This path is relative to the project root
+static_data_folder: "static_data"
 
 # ------------------
 # INGEST
@@ -73,15 +73,13 @@ group_cols:
 title: "[GISAID note]"
 href: "https://www.gisaid.org/references/statements-clarifications/clade-and-lineage-nomenclature-aids-in-genomic-epidemiology-of-active-hcov-19-viruses/"
 
-
 # Surveillance plot options
 # see: workflow_main/scripts/surveillance.py
 surv_min_combo_count: 50
 surv_min_single_count: 50
 surv_start_date_days_ago: 90
 surv_end_date_days_ago: 30
 
-
 # ---------------
 # SERVER
 # ---------------
1 change: 1 addition & 0 deletions docker-compose.cloudsql.yml
@@ -13,6 +13,7 @@ services:
 - FLASK_APP=cg_server/main.py
 - FLASK_ENV=development
 - CONFIGFILE=/opt/${CONFIGFILE}
+- CONSTANTSFILE=/opt/constants/defs.json
 - DATA_PATH=/data
 - STATIC_DATA_PATH=/opt/static_data
 - CLOUDSQL_CONNECTION_NAME
1 change: 1 addition & 0 deletions docker-compose.yml
@@ -10,6 +10,7 @@ services:
 - FLASK_APP=cg_server/main.py
 - FLASK_ENV=development
 - CONFIGFILE=/opt/config/config_genbank.yaml
+- CONSTANTSFILE=/opt/constants/defs.json
 - DATA_PATH=/data
 - STATIC_DATA_PATH=/opt/static_data
 - POSTGRES_USER=postgres
8 changes: 8 additions & 0 deletions services/server/cg_server/config.py
@@ -18,6 +18,14 @@
 config = {}
 
 # Load app configuration
+
+# Print warning if configfile is not defined
+# Only an issue in non-docker deployments
+if "CONFIGFILE" not in os.environ:
+    print(
+        'CONFIGFILE environment variable not defined. Defaulting config file path to "/opt/config.yaml"'
+    )
+
 config_file_path = os.getenv("CONFIGFILE", "/opt/config.yaml")
 with open(config_file_path, "r") as fp:
     config = load(fp.read(), Loader=Loader)
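The new check only prints a message and then falls back to `/opt/config.yaml`, a path that exists in the Docker images but usually not on a developer machine. A minimal sketch of the same logic refactored into a pure function so both branches can be unit-tested without touching the real environment; `resolve_config_path` is a hypothetical name, and sending the warning to stderr (rather than stdout, as the commit does) is a design choice of the sketch.

```python
import os
import sys

DEFAULT_CONFIG_PATH = "/opt/config.yaml"  # fallback used by cg_server/config.py


def resolve_config_path(environ=None):
    """Return (path, warning); warning is None when CONFIGFILE is set."""
    environ = os.environ if environ is None else environ
    if "CONFIGFILE" in environ:
        return environ["CONFIGFILE"], None
    warning = (
        "CONFIGFILE environment variable not defined. "
        f'Defaulting config file path to "{DEFAULT_CONFIG_PATH}"'
    )
    return DEFAULT_CONFIG_PATH, warning


if __name__ == "__main__":
    path, warning = resolve_config_path()
    if warning:
        # stderr keeps the message out of anything parsing stdout
        print(warning, file=sys.stderr)
    print(path)
```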
11 changes: 10 additions & 1 deletion services/server/cg_server/constants.py
@@ -6,10 +6,19 @@
 """
 
 import json
+import os
 
+from pathlib import Path
+
 constants = {}
 
+constants_path = os.getenv(
+    "CONSTANTSFILE",
+    # root/services/server/cg_server/constants.py --> need to go back 3 levels
+    Path(__file__).parent.parent.parent.parent / "src" / "constants" / "defs.json",
+)
+
 # Load constant defs
-with open("/opt/constants/defs.json", "r") as fp:
+with open(constants_path, "r") as fp:
     constants = json.loads(fp.read())
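The inline comment says "go back 3 levels" while the code calls `.parent` four times. Both agree once you notice that the first `.parent` only strips the file name (`constants.py` to `cg_server/`), and the remaining three climb `cg_server` to `server` to `services` to the project root. `PurePath.parents` indexes ancestors from the nearest directory outward, so four hops is `parents[3]`. A minimal sketch, assuming a hypothetical checkout at `/repo`; `project_root_from` and `resolve_constants_path` are illustrative helpers, not part of the codebase.

```python
from pathlib import PurePosixPath


def project_root_from(module_path, levels):
    """Return the directory `levels` steps above a module file.

    `levels` is the number of `.parent` calls in the commit: 4 for
    cg_server/constants.py and cg_server/main.py, 5 for cg_server/db_seed/seed.py.
    """
    return PurePosixPath(module_path).parents[levels - 1]


def resolve_constants_path(environ, module_path):
    """CONSTANTSFILE wins (the Docker files set it); otherwise use the repo copy."""
    default = project_root_from(module_path, 4) / "src" / "constants" / "defs.json"
    return environ.get("CONSTANTSFILE", str(default))


# "/repo" is a hypothetical checkout location; only the layout below it matters.
print(project_root_from("/repo/services/server/cg_server/constants.py", 4))  # /repo
print(project_root_from("/repo/services/server/cg_server/db_seed/seed.py", 5))  # /repo
print(resolve_constants_path({}, "/repo/services/server/cg_server/constants.py"))
# /repo/src/constants/defs.json
```

Pure paths are used so the example never touches the filesystem; the real modules use concrete `Path(__file__)`, which behaves the same way for this walk.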

8 changes: 6 additions & 2 deletions services/server/cg_server/db_seed/seed.py
@@ -19,8 +19,12 @@
 from cg_server.config import config
 from .load_snvs import process_dna_snvs, process_aa_snvs
 
-data_path = Path(os.getenv("DATA_PATH", config["data_folder"]))
-static_data_path = Path(os.getenv("STATIC_DATA_PATH", config["static_data_folder"]))
+# root/services/server/cg_server/db_seed/seed.py
+project_root = Path(__file__).parent.parent.parent.parent.parent
+data_path = Path(os.getenv("DATA_PATH", project_root / config["data_folder"]))
+static_data_path = Path(
+    os.getenv("STATIC_DATA_PATH", project_root / config["static_data_folder"])
+)
 
 genes = pd.read_json(str(static_data_path / "genes_processed.json"))
 genes = genes.set_index("name")
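`seed.py` now resolves its inputs in two steps: an explicit `DATA_PATH` / `STATIC_DATA_PATH` environment variable (set to `/data` and `/opt/static_data` in both docker-compose files) takes precedence; otherwise the config value is joined onto the project root. A minimal sketch of that precedence as a pure function; `resolve_data_paths` is a hypothetical name, and the environment is passed in explicitly so both branches can be exercised.

```python
import os
from pathlib import Path


def resolve_data_paths(config, project_root, environ=None):
    """Env vars win; otherwise config folders are joined onto the project root."""
    environ = os.environ if environ is None else environ
    root = Path(project_root)
    data_path = Path(environ.get("DATA_PATH", root / config["data_folder"]))
    static_data_path = Path(
        environ.get("STATIC_DATA_PATH", root / config["static_data_folder"])
    )
    return data_path, static_data_path


config = {"data_folder": "data_genbank", "static_data_folder": "static_data"}

# Local run: nothing exported, so both paths come from the config file.
print(resolve_data_paths(config, "/repo", environ={}))

# Docker run: the compose files export absolute paths that override the config.
print(
    resolve_data_paths(
        config,
        "/repo",
        environ={"DATA_PATH": "/data", "STATIC_DATA_PATH": "/opt/static_data"},
    )
)
```

Wrapping both branches in `Path(...)`, as `seed.py` does, means callers always receive a `Path`, whether the value came from the environment (a `str`) or from the default (already a `Path`).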
6 changes: 5 additions & 1 deletion services/server/cg_server/main.py
@@ -30,9 +30,13 @@
     generate_report,
 )
 
+from pathlib import Path
 from psycopg2 import pool
 from cg_server.query.connection_pooling import get_conn_from_pool
 
+# root/services/server/cg_server/main.py
+project_root = Path(__file__).parent.parent.parent.parent
+
 app = Flask(__name__, static_url_path="", static_folder="dist")
 Gzip(app)
 CORS(app)
@@ -98,7 +102,7 @@ def verify_password(username, password):
 seed_database(conn)
 insert_sequences(
     conn,
-    os.getenv("DATA_PATH", config["data_folder"]),
+    os.getenv("DATA_PATH", project_root / config["data_folder"]),
     filenames_as_dates=True,
 )
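Unlike `seed.py`, this call site does not wrap the result in `Path(...)`, so `insert_sequences` receives a `str` when `DATA_PATH` is set and a `Path` when it is not. That is harmless if the function only passes the value to `os`/`pathlib` calls, but string concatenation on a `Path` would raise a `TypeError`. The sketch below demonstrates the two return types and one way to normalise them; `data_path_for_seeding` is a hypothetical wrapper, `/repo` a hypothetical checkout, and the demo temporarily edits `os.environ` and restores it afterwards.

```python
import os
from pathlib import Path

project_root = Path("/repo")  # hypothetical checkout location
default = project_root / "data_genbank"


def data_path_for_seeding(default_path):
    """Always return a Path, whichever branch os.getenv takes."""
    return Path(os.getenv("DATA_PATH", default_path))


previous = os.environ.pop("DATA_PATH", None)  # demo only: start from a clean slate

# Unset: os.getenv hands back the default object unchanged (a Path subclass).
print(type(os.getenv("DATA_PATH", default)).__name__)

# Set: os.getenv returns the raw string from the environment.
os.environ["DATA_PATH"] = "/data"
print(type(os.getenv("DATA_PATH", default)).__name__)  # str
print(data_path_for_seeding(default))

# Restore whatever was there before the demo.
if previous is None:
    del os.environ["DATA_PATH"]
else:
    os.environ["DATA_PATH"] = previous
```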

1 change: 1 addition & 0 deletions services/server/prod.Dockerfile
@@ -35,6 +35,7 @@ ENV FLASK_ENV production
 ENV PORT 8080
 ENV CONFIGFILE /opt/$CONFIGFILE
 ENV STATIC_DATA_PATH /opt/static_data
+ENV CONSTANTSFILE /opt/constants/defs.json
 
 # Run the web service on container startup. Here we use the gunicorn
 # webserver, with one worker process and 8 threads.
18 changes: 15 additions & 3 deletions services/server/serve.sh
@@ -1,6 +1,18 @@
 #!/bin/bash
 
-FLASK_APP=cg_server/main.py
-FLASK_ENV=development
+export FLASK_APP=cg_server/main.py
+export FLASK_ENV=development
+# export CONFIGFILE=/path/to/configfile.yml # Set config file here
 
-flask run --host 0.0.0.0 --port=5000
+# export DATA_PATH=... # Optional - defaults to data path in config file
+# export STATIC_DATA_PATH=... # Optional - defaults to static data path in config file
+
+# POSTGRES CONFIG
+export POSTGRES_USER=postgres
+export POSTGRES_PASSWORD=cg
+export POSTGRES_DB=cg_dev
+export POSTGRES_HOST=127.0.0.1
+export POSTGRES_PORT=5432
+export POSTGRES_MAX_CONN=20
+
+flask run --host 0.0.0.0 --port=5001
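Because `CONFIGFILE` is commented out in `serve.sh`, a local run that forgets to set it silently falls back to the Docker-only `/opt/config.yaml`. A small pre-flight check can catch that before Flask starts. This is a minimal sketch: `missing_env` is a hypothetical helper, and the variable list is taken from what `serve.sh` exports (the `POSTGRES_*` values are presumably read by the server's connection code, which is not shown in this diff).

```python
import os

# Variables serve.sh exports, plus CONFIGFILE, which must be set by hand.
REQUIRED = (
    "FLASK_APP",
    "CONFIGFILE",
    "POSTGRES_USER",
    "POSTGRES_PASSWORD",
    "POSTGRES_DB",
    "POSTGRES_HOST",
    "POSTGRES_PORT",
    "POSTGRES_MAX_CONN",
)


def missing_env(environ=None, required=REQUIRED):
    """Return the required variables that are unset or empty, in order."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]
```

A launcher could call `missing_env()` first and exit with the returned names if the list is non-empty. `DATA_PATH`, `STATIC_DATA_PATH`, and `CONSTANTSFILE` are left out on purpose because they fall back to paths derived from the project root.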
22 changes: 12 additions & 10 deletions workflow_custom_ingest/Snakefile
@@ -10,25 +10,27 @@ from scripts.combine_lineages import combine_lineages
 
 configfile: "../config/config_custom.yaml"
 
+data_folder = os.path.join("..", config["data_folder"])
+# static_data_folder = os.path.join("..", config["static_data_folder"])
+
 CHUNKS, = glob_wildcards(os.path.join(
-    config["data_folder"], "fasta_raw", "{chunk}.fa.gz"
+    data_folder, "fasta_raw", "{chunk}.fa.gz"
 ))
 
 
 rule all:
     input:
         # Cleaned metadata, with lineage assignments
-        os.path.join(config["data_folder"], "metadata.csv")
+        os.path.join(data_folder, "metadata.csv")
 
 
 rule pangolin_lineages:
     """Assign a lineage to each sequence using pangolin
     """
     input:
-        fasta = os.path.join(config["data_folder"], "fasta_raw", "{chunk}.fa.gz")
+        fasta = os.path.join(data_folder, "fasta_raw", "{chunk}.fa.gz")
     output:
-        fasta = temp(os.path.join(config["data_folder"], "lineages", "{chunk}.fa")),
-        lineages = os.path.join(config["data_folder"], "lineages", "{chunk}.csv")
+        fasta = temp(os.path.join(data_folder, "lineages", "{chunk}.fa")),
+        lineages = os.path.join(data_folder, "lineages", "{chunk}.csv")
     conda: "envs/pangolin.yaml"
     shell:
         """
@@ -42,9 +44,9 @@ rule combine_lineages:
     """Combine all lineage result chunks
     """
     input:
-        lineages = expand(os.path.join(config["data_folder"], "lineages", "{chunk}.csv"), chunk=CHUNKS)
+        lineages = expand(os.path.join(data_folder, "lineages", "{chunk}.csv"), chunk=CHUNKS)
     output:
-        lineages = os.path.join(config["data_folder"], "lineages.csv")
+        lineages = os.path.join(data_folder, "lineages.csv")
     run:
         combine_lineages(input.lineages, output.lineages)
 
@@ -53,9 +55,9 @@ rule clean_metadata:
     """Clean metadata, incorporate lineage assignments into metadata
     """
     input:
-        metadata_dirty = os.path.join(config["data_folder"], "metadata_raw.csv"),
+        metadata_dirty = os.path.join(data_folder, "metadata_raw.csv"),
         lineages = rules.combine_lineages.output.lineages
     output:
-        metadata_clean = os.path.join(config["data_folder"], "metadata.csv")
+        metadata_clean = os.path.join(data_folder, "metadata.csv")
     run:
         clean_metadata(input.metadata_dirty, input.lineages, output.metadata_clean)
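After this commit the same config file serves two callers with different working directories. The Flask server joins `data_folder` onto a computed `project_root`, while the custom-ingest Snakefile, which (judging by its `../config/...` configfile path) is run from inside `workflow_custom_ingest/`, prefixes `..`. A minimal sketch of the workflow-side resolution, assuming a hypothetical checkout at `/repo` and POSIX path separators; `workflow_path` is an illustrative helper.

```python
import os


def workflow_path(config_value):
    """Prefix '..' as the Snakefile does: config paths are relative to the
    project root, but Snakemake runs from inside workflow_custom_ingest/."""
    return os.path.join("..", config_value)


workflow_dir = "/repo/workflow_custom_ingest"  # hypothetical checkout
relative = workflow_path("data_custom")
print(relative)  # ../data_custom
print(os.path.normpath(os.path.join(workflow_dir, relative)))  # /repo/data_custom
```

Both callers therefore land on the same `<root>/data_custom` directory, which is what makes the per-service deployment work without keeping separate config files per working directory.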