diff --git a/.gitmodules b/.gitmodules
index 712c3d8e..dc342682 100644
--- a/.gitmodules
+++ b/.gitmodules
@@ -6,5 +6,5 @@
     path = reactapp
     url = https://github.com/superphy/reactapp.git
 [submodule "app/modules/PanPredic"]
-    path = app/modules/PanPredic
-    url = https://github.com/superphy/PanPredic.git
+    path = app/modules/PanPredic
+    url = https://github.com/superphy/PanPredic.git
diff --git a/app/static/js/main.js b/app/static/js/main.js
index e51c8a76..eaa46329 100755
--- a/app/static/js/main.js
+++ b/app/static/js/main.js
@@ -96,6 +96,10 @@ app.controller('SpfyController', [
             fd.append('options.serotype', $scope.formData.options.serotype);
             fd.append('options.pi', $scope.pi);
             fd.append('g-recaptcha-response', $scope.response);
+            // fix to maintain legacy compatibility
+            fd.append('options.bulk', false);
+            fd.append('options.groupresults', false);
+            // end of fix
             $log.log($scope.response);
             $log.log($scope.formData);
             $scope.loading = true;
diff --git a/docs/source/algorithms.png b/docs/source/algorithms.png
new file mode 100644
index 00000000..dde3cc4a
Binary files /dev/null and b/docs/source/algorithms.png differ
diff --git a/docs/source/contributing.rst b/docs/source/contributing.rst
index 4df94e32..047be560 100644
--- a/docs/source/contributing.rst
+++ b/docs/source/contributing.rst
@@ -10,7 +10,11 @@ Getting Started
 
 Don't worry, genome files are just like Excel spreadsheets.
 
-.. image:: https://imgs.xkcd.com/comics/algorithms.png
+.. image:: algorithms.png
+    :align: center
+    :alt: excel is complicated
+
+(from the excellent https://xkcd.com/)
 
 We use Docker and Docker-Compose for managing the databases: Blazegraph and Redis, the webserver: Nginx/Flask/Conda, and Redis-Queue (RQ) workers: mostly in Conda. The official `Install Docker Compose guide`_ lists steps for installing both the base Docker Engine, and for installing Docker-Compose separately if you're on Linux. For Mac and Windows users, Docker-Compose comes bundled with Docker Engine.
 
diff --git a/docs/source/deploying.rst b/docs/source/deploying.rst
index 7c6dd976..6fddcb89 100644
--- a/docs/source/deploying.rst
+++ b/docs/source/deploying.rst
@@ -4,6 +4,164 @@ Deplyoment Guide
 
 .. contents:: Table of Contents
    :local:
+
+We recommend deploying Spfy with the Docker composition for everything; this approach is documented in `Deploying in General`_. Specifics of the NML's deployment are given in `Deploying to Corefacility`_.
+
+Deploying in General
+====================
+
+Let's take a look at the ``docker-compose.yml`` file.
+
+.. code-block:: yaml
+
+    version: '2'
+    services:
+      webserver:
+        build:
+          context: .
+          dockerfile: Dockerfile-spfy
+        image: backend
+        ports:
+          - "8000:80"
+        depends_on:
+          - redis
+          - blazegraph
+        volumes:
+          - /datastore
+
+      reactapp:
+        build:
+          context: .
+          dockerfile: Dockerfile-reactapp
+        image: reactapp
+        ports:
+          - "8090:5000"
+        depends_on:
+          - webserver
+
+      worker:
+        build:
+          context: .
+          dockerfile: Dockerfile-rq
+        image: backend-rq
+        ports:
+          - "9181:9181" # for debugging: drop into a shell and run rq-dashboard if you need to see jobs
+        volumes_from:
+          - webserver
+        depends_on:
+          - webserver
+
+      worker-blazegraph-ids:
+        build:
+          context: .
+          dockerfile: Dockerfile-rq-blazegraph
+        image: backend-rq-blazegraph
+        volumes_from:
+          - webserver
+        depends_on:
+          - webserver
+
+      worker-priority:
+        build:
+          context: .
+          dockerfile: Dockerfile-rq-priority
+        image: backend-rq-priority
+        volumes_from:
+          - webserver
+        depends_on:
+          - webserver
+
+      redis:
+        image: redis:3.2
+        command: redis-server --appendonly yes # for persistence
+        volumes:
+          - /data
+
+      blazegraph:
+        image: superphy/blazegraph:2.1.4-inferencing
+        ports:
+          - "8080:8080"
+        volumes:
+          - /var/lib/jetty/
+
+Host to Container Mapping
+-------------------------
+
+There are a few key points to note:
+
+.. code-block:: yaml
+
+    ports:
+      - "8000:80"
+
+The configuration maps ``host:container``, so port 8000 on the host (your computer) is linked to port 80 of the container. Fields like ``volumes`` typically have only one value, such as ``/var/lib/jetty/``; this instructs Docker to map the folder ``/var/lib/jetty`` within the container to a generic volume managed by Docker, so the data persists across start/stop cycles.
+
+You can also add a host path to a volume mapping, such as ``/dbbackup/:/var/lib/jetty/``, so that Docker uses an actual path on your host instead of a generic Docker-managed volume. As before, the first term, ``/dbbackup/``, resides on the host.
+
+.. warning::
+
+    If you do not specify a host folder in a volume mapping, running ``docker-compose down`` will **wipe** the generic volume. Either run ``docker-compose stop`` instead, or specify a host mapping to persist the data.
+
+Volume Mapping in Production
+----------------------------
+
+In production, we recommend that you at minimum map Blazegraph's volume to a backup directory. ``/datastore`` stores all the uploaded genome files and the related temporary files generated during analysis. ``/data`` stores both the parsed responses sent to the front-end and the task queue managing them. If you want queued analysis tasks, or existing results shown in the front-end, to persist after running ``docker-compose down``, you'll have to map both volumes to the host. After a server failure or ``docker-compose stop``, the data persists without a host mapping.
+
+Ports
+-----
+
+``reactapp`` is the front-end user interface for Spfy, whereas ``webserver`` serves the backend Flask APIs. Without modification, when you run ``docker-compose up``, port 8090 is used to access the app. The front-end then calls port 8000 to submit requests to the backend. This approach is fine for individual users on their own computer, but it should not be used in production because it would, at minimum, require opening one additional port.
+
+Instead, we recommend you change the port for ``reactapp`` to the standard port 80, and also map the ``webserver`` to a subdomain.
+
+Setting the host port mapping can be done by modifying the ``reactapp`` config with the below:
+
+.. code-block:: yaml
+
+    ports:
+      - "80:5000"
+
+For networking the backend APIs, you can keep the webserver running on port 8000 and use a reverse proxy such as NGINX to map the subdomain to port 8000 on your server. In other words, requests made by reactapp to the API are sent to ``api.mydomain.com``, for example, which maps to the IP address of your server (ideally via HTTPS). Your reverse proxy then forwards the request to port 8000 locally, while the reactapp interface is served on the main domain (``mydomain.com``, in this case).
+
+Setting a Subdomain
+-------------------
+
+This has to be done through the interface of your domain registrar. You'll have to add an Address Record (A Record), which is typically under the heading "Manage Advanced DNS Records" or similar.
+
+Setting up a Reverse Proxy
+--------------------------
+
+We recommend you use NGINX as the reverse proxy. You can find their Getting Started guide at https://www.nginx.com/resources/wiki/start/
+
+In addition, we recommend you use Certbot (the EFF's client for Let's Encrypt) to get the required certificates and set up HTTPS on your server. You can find their interactive guide at https://certbot.eff.org/ which allows you to specify the webserver (NGINX) and operating system you are using. Certbot comes with a script that automatically modifies your NGINX configuration as required.
+
+Point Reactapp to Your Subdomain
+--------------------------------
+
+To tell reactapp to point to your subdomain, you'll have to modify the ``api.js`` settings located at ``reactapp/src/middleware/api.js``.
+
+The current ``ROOT`` of the target domain is:
+
+.. code-block:: js
+
+    const ROOT = window.location.protocol + '//' + window.location.hostname + ':8000/'
+
+change this to:
+
+.. code-block:: js
+
+    const ROOT = 'https:' + '//' + 'api.mydomain.com' + '/'
+
+and then rebuild and redeploy reactapp.
+
+.. code-block:: sh
+
+    docker-compose build --no-cache reactapp
+    docker-compose up -d
+
+.. note::
+
+    The Flask webserver has Cross-Origin Resource Sharing (CORS) enabled, so you can deploy reactapp to another server (one that runs only reactapp, and not the webserver, databases, or workers). The domain can be ``mydomain.com`` or any domain name you own; you'll just have to set up the A records as appropriate.
 
 Deploying to Corefacility
 =========================
diff --git a/scripts/enterobase.py b/scripts/enterobase.py
index 0f5088e5..a84f6a47 100644
--- a/scripts/enterobase.py
+++ b/scripts/enterobase.py
@@ -1,6 +1,5 @@
 import os
 import requests
-import pandas as pd
 from time import sleep
 
 def get(identifier, barcode, dl_folder):
@@ -36,7 +35,7 @@ def enterobase():
     # {u'secondary_sample_accession': u'SRS1016187', u'comment': None, u'collection_year': 2009, u'serotype': None, u'antibiotic_resistance': None, u'strain': u'AZ-TG71511', u'postcode': None, u'owner': None, u'continent': u'North America', u'city': None, u'collection_date': None, u'collection_month': 10, u'not_editable': True, u'id': 7740, u'admin1': u'Illinois', u'source_details': u'packaged turkey', u'admin2': None, u'longitude': None, u'best_assembly': 21647, u'study_accession': u'PRJNA230968', u'serological_group': None, u'source_niche': u'Poultry', u'barcode': u'ESC_AA7740AA', u'latitude': None, u'simple_disease': None, u'secondary_study_accession': u'SRP038995', u'ecor': None, u'path_nonpath': None, u'disease': None, u'uberstrain': 7740, u'country': u'United States', u'release_date': u'2015-07-29', u'created': u'2015-08-27', u'Accession': [{u'seq_platform': u'ILLUMINA', u'seq_library': u'Paired', u'experiment_accession': u'SRX1123765', u'seq_insert': 500, u'accession': u'SRR2133399'}], u'assembly_status': u'Assembled', u'sample_accession': u'SAMN02463300', u'simple_pathogenesis': None, u'source_type': u'Avian', u'contact': u'FOOD AND DRUG ADMINISTRATION, CENTER FOR FOOD SAFETY AND APPLIED NUTRITION', u'species': u'Escherichia coli', u'collection_time': None}
     # >>> experiment[0]
     # {u'status': u'Assembled', u'barcode': u'ESC_CA1647AA_AS', u'n50': 113343, u'pipeline_version': 2.2, u'low_qualities': 10355, u'coverage': None, u'total_length': 4878259, u'id': 7740, u'contig_number': 157, u'top_species': u'Escherichia coli / Shigella;94.66%'}
-    df = pd.DataFrame.from_records(experiment)
+    #df = pd.DataFrame.from_records(strains)
     # >>> df.keys()
     # Index([u'barcode', u'can_view', u'contig_number', u'coverage',
     #        u'extra_row_info', u'id', u'low_qualities', u'n50', u'pipeline_version',
@@ -45,10 +44,10 @@ def enterobase():
     dl_folder = 'enterobase_db'
     if not os.path.exists(dl_folder):
         os.makedirs(dl_folder)
-    for row in df.itertuples():
-        identifier = row[6]
-        barcode = row[1]
-        assembled = row[10]
+    for row in strains:
+        identifier = row['best_assembly']
+        barcode = row['strain']
+        assembled = row['assembly_status']
         if assembled == 'Assembled':
             i = 1
             while i < 10: