COVID-19 CG (CoV Genetics)

Preprint now up on bioRxiv: https://www.biorxiv.org/content/10.1101/2020.09.23.310565v2

Installation

git clone https://github.com/vector-engineering/covidcg.git

Python

  1. Get a conda distribution of Python; we recommend miniconda3.

  2. Install dependencies

    conda env create -n covidcg -f environment.yml

Data Requirements

The python scripts require a data folder inside the root folder of the project in order to run. In accordance with the GISAID Database Access Agreement (DAA), we cannot share data outside of their distribution service.

We are currently rewriting our data pipeline to be more generalized and will update these instructions soon.
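
Because the scripts fail without the data folder, it can help to check for it before running anything. The following is a minimal sketch, not part of the repository: the function name `find_data_dir` is hypothetical, and it only assumes that the folder is named `data` and sits directly under the project root, as described above.

```python
from pathlib import Path


def find_data_dir(project_root="."):
    """Return the data/ folder under project_root, or raise if it is missing.

    Hypothetical helper: assumes only that the folder is called "data"
    and lives directly inside the project root.
    """
    data_dir = Path(project_root).resolve() / "data"
    if not data_dir.is_dir():
        raise FileNotFoundError(f"Expected a data folder at {data_dir}")
    return data_dir
```

Calling `find_data_dir()` from the root of the cloned repository returns the resolved path, or raises `FileNotFoundError` with the location that was checked.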

Data Package

As of version 1.2.0, the snakemake pipeline will bundle all necessary data into a file data_package.json.

If hosting your own COVID CG instance, you can either point the application to our data package, or host your own package and change the data package URL (located at src/stores/asyncDataStore.js) to point to it.
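
If you host your own package, a quick sanity check of the generated file before pointing the application at it may save some debugging. The sketch below is an illustration only: `summarize_data_package` is a hypothetical helper, and it assumes the file is a JSON object at the top level. The actual schema of data_package.json is not described here, so the function only reports the top-level keys and the type of each value.

```python
import json
from pathlib import Path


def summarize_data_package(path="data_package.json"):
    """Map each top-level key of the package to the type name of its value.

    Hypothetical helper: assumes the file holds a single JSON object.
    """
    with Path(path).open(encoding="utf-8") as handle:
        package = json.load(handle)
    if not isinstance(package, dict):
        raise ValueError("Expected a JSON object at the top level")
    return {key: type(value).__name__ for key, value in package.items()}
```

Printing the returned dictionary gives a compact overview of what the pipeline bundled, without loading the file into the web application.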

Javascript

This app was built from the react-slingshot example app.

  1. Install Node 8.0.0 or greater

    Need to run multiple versions of Node? Use nvm.

  2. Install Git.

  3. Disable safe write in your editor to ensure hot reloading works properly.

  4. Complete the steps below for your operating system:

    macOS

    • Install watchman via brew install watchman to avoid this issue, which occurs if your macOS has no appropriate file watching service installed.

    Linux

    • Run this to increase the limit on the number of files Linux will watch. Here's why.

      echo fs.inotify.max_user_watches=524288 | sudo tee -a /etc/sysctl.conf && sudo sysctl -p

  5. Install NPM packages

    npm install

  6. Run the example app

    npm start -s

    This will run the automated build process, start up a webserver, and open the application in your default browser. When doing development with this kit, this command will continue watching all your files. Every time you hit save, the code is rebuilt, linting runs, and tests run automatically. Note: The -s flag is optional. It enables silent mode, which suppresses unnecessary messages during the build.


Analysis Pipeline

Data analysis is run with Snakemake, Python scripts, and bioinformatics tools such as bowtie2. Please ensure that the conda environment is configured correctly (see Python) and that all data files are present and linked correctly to the data/ folder.

Preview (dry-run) the pipeline by running:

snakemake -n

and run the pipeline with:

snakemake

NOTE: bowtie2 usually uses 8–10 GB of RAM per CPU during the alignment step. If the pipeline includes the alignment step, use at most as many cores as your RAM in GB divided by 10. For example, if your machine has 128 GB of RAM, you can run at most 128 / 10 ≈ 12 cores.
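
The rule of thumb above can be written as a short calculation. This is a sketch under the stated assumption of roughly 10 GB per core; the function name `max_alignment_cores` and its parameters are illustrative and not part of the pipeline.

```python
import math


def max_alignment_cores(ram_gb, gb_per_core=10, available_cores=None):
    """Estimate how many cores the bowtie2 alignment step can safely use.

    Assumes each core needs about gb_per_core GB of RAM (10 GB is the
    upper end of the 8-10 GB range quoted above). Always returns at
    least 1, so a machine with less RAM than gb_per_core may still run
    out of memory.
    """
    if ram_gb <= 0 or gb_per_core <= 0:
        raise ValueError("ram_gb and gb_per_core must be positive")
    cores = max(1, math.floor(ram_gb / gb_per_core))
    if available_cores is not None:
        # Never request more cores than the machine actually has.
        cores = min(cores, available_cores)
    return cores


print(max_alignment_cores(128))  # prints 12
```

The result can then be passed to Snakemake, for example `snakemake --cores 12`.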


About the project

This project is developed by the Vector Engineering Lab:

  • Albert Chen (Broad Institute)
  • Kevin Altschuler
  • Shing Hei Zhan, PhD (University of British Columbia)
  • Alina Yujia Chan, PhD (Broad Institute)
  • Ben Deverman, PhD (Broad Institute)

The manuscript for this project is currently being prepared.

Contact the authors by email: [email protected]

Python/snakemake scripts were run and tested on macOS 10.15.4 (8 threads, 16 GB RAM), Google Cloud Debian 10 (buster) (64 threads, 412 GB RAM), and Windows 10/Ubuntu 20.04 via WSL2 (48 threads, 128 GB RAM).

Data enabling COVID CG

We are extremely grateful to the GISAID Initiative and all its data contributors, i.e. the Authors from the Originating laboratories responsible for obtaining the specimens and the Submitting laboratories where genetic sequence data were generated and shared via the GISAID Initiative, on which this research is based.

Elbe, S., and Buckland-Merrett, G. (2017) Data, disease and diplomacy: GISAID’s innovative contribution to global health. Global Challenges, 1:33-46. DOI:10.1002/gch2.1018 PMID: 31565258

Citing COVID CG

Users are encouraged to share, download, and further analyze data from this site. Plots can be downloaded as PNG or SVG files, and the data powering the plots and tables can be downloaded as well. Please attribute any data/images to covidcg.org.

Note: When using results from these analyses in your manuscript, ensure that you acknowledge the contributors of data, i.e. We gratefully acknowledge all the Authors from the Originating laboratories responsible for obtaining the specimens and the Submitting laboratories where genetic sequence data were generated and shared via the GISAID Initiative, on which this research is based.

and cite the following reference(s):

Shu, Y., McCauley, J. (2017) GISAID: Global initiative on sharing all influenza data – from vision to reality. EuroSurveillance, 22(13) DOI:10.2807/1560-7917.ES.2017.22.13.30494 PMCID: PMC5388101

License

COVID-19 CG is distributed under the MIT license.

Contributing

Please feel free to contribute to this project by opening an issue or pull request in the GitHub repository.