This tutorial and exercise, developed in 2020, is designed for trainees interested in joining our lab. It reflects the computing skills we expect in R, Python, Linux shell commands and bioinformatics workflow languages. By completing these exercises you will also set up the computational environment on your computer needed for small-scale data analysis that does not require access to a high-performance computing cluster. Even if you lack skills in one or more of these languages when you start this tutorial, we believe the learning curve for teaching yourself enough to complete the exercises is reasonable. Still, please do not hesitate to contact us ([email protected]) if you hit a blocker as you go through the material.
We assume you are comfortable with a command-line interface (on Linux or Mac). In this task you will work with git from the command shell, and install the basic software and packages needed for the data analysis in Tasks 2 and 3.
Most of our work will be saved and shared on GitHub in public or private repositories. If you have not used git in the past, please follow the instructions here for a 5-minute git tutorial.
As the next step, please fork this repository, add your name to the Markdown file named hello.md, commit and push the change to GitHub with a customized commit message, e.g., "Add my name and github handle", and create a pull request so we can see your update and incorporate it into the repository.
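If you would like to rehearse the edit, commit and push steps before touching your real fork, the sketch below runs them in a scratch directory. It is only an illustration: the bare repository stands in for your GitHub fork, and the name, email address and handle are placeholders to replace with your own.

```shell
# Practice run: a local bare repository plays the role of your GitHub fork.
work=$(mktemp -d)
git init --quiet --bare "$work/fork.git"
git clone --quiet "$work/fork.git" "$work/clone" 2>/dev/null
cd "$work/clone"

# Placeholder identity for this scratch repository only.
git config user.name "Your Name"
git config user.email "you@example.com"

# Add your name to hello.md, then commit with a descriptive message.
echo "- Your Name (@your-handle)" >> hello.md
git add hello.md
git commit --quiet -m "Add my name and github handle"

# Push the current branch to the stand-in fork.
git push --quiet origin HEAD
git log --oneline -1
```

With your real fork, you would clone the fork's URL instead of the bare repository, and after pushing you would open the pull request from the GitHub web page of your fork.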
This tutorial (and our research in general) requires R, Python, the Script of Scripts (SoS) bioinformatics workflow system, and Docker.
Please follow these setup instructions to complete the installations.
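Once the installations are done, a quick check that each tool is reachable from your shell can save time later. The helper below is a minimal sketch; the function name check_tools is made up for illustration, and the executable names (R, python3, sos, docker) are assumptions about how the tools appear on your PATH, which may differ on your system.

```shell
# Report whether each required command-line tool can be found on PATH.
check_tools() {
  for tool in R python3 sos docker; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found at $(command -v "$tool")"
    else
      echo "$tool: not found"
    fi
  done
}

check_tools
```

Any tool reported as "not found" is worth revisiting in the setup instructions before moving on to the next task.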
You can use any editor of your choice, but if you have not thought about this before, here is a personal suggestion: I used gvim for many years before switching to the VS Code text editor.
To open a particular folder on your computer in VS Code from the command terminal:
cd <path to the folder>
code ./
This task is an example of a bioinformatics workflow developed in our group. It uses an IPython notebook (we recommend the JupyterLab IDE) and runs an SoS kernel for bioinformatics workflows.
Please find the example notebook file notebook/orientation.ipynb, follow the instructions, and complete the quiz at the end of the notebook.
Please follow the instructions and complete the R exercise analysis/orientation.Rmd. Rmd stands for R Markdown. Rmd files are text files containing R code and narrative that you can open and run using software such as RStudio.
Please follow the instructions and complete the genetic fine-mapping exercise notebook/finemapping.ipynb. This is somewhat advanced material for those who have a background in statistical genetics.
Organizing and communicating your work with others is essential to your success in conducting reproducible computational research. For every project we require the analyses written in Rmd and IPython notebooks to be converted into a research website that is either hosted on GitHub (a service GitHub provides) or viewed locally in your web browser.
We will use workflowr to organize the R analysis. Please follow the workflowr instructions to convert the Rmd files under the analysis/ folder into an HTML-based website under a directory called docs.
Once you have successfully built the website, you should find a file docs/index.html that you can view in your web browser.
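For those who prefer to stay in the terminal, the build and the check can be sketched as two small shell helpers. This assumes R and the workflowr package are installed and that you run it from the root of a workflowr project; the helper names build_site and site_built are made up for illustration, and the workflowr instructions remain the authoritative reference.

```shell
# Build the R Markdown files under analysis/ with workflowr.
# wflow_build() with no arguments rebuilds files that are newer than their HTML.
build_site() {
  Rscript -e 'workflowr::wflow_build()'
}

# Succeed only if the site's entry page exists in the given folder (default: docs).
site_built() {
  [ -f "${1:-docs}/index.html" ]
}

# Usage, from the repository root:
#   build_site && site_built docs && echo "Open docs/index.html in your browser"
```

Keeping the check separate from the build lets you re-run it at any time without rebuilding the site.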
We will use the jnbinder program to organize the Python analysis. Please follow the section "Run from a docker image" up to step 3, then return here and type:
jnbinder --root docs/ipynb
The first time you run the command above, a website template will be configured under docs/ipynb, and the program will quit with an error and a prompt instructing you to edit a file called config.yml. Please open this file with a text editor and uncomment the relevant lines to configure name, repo and footer as you see fit, and set add_commit_info to False for now.
Finally, for include_dir please specify ["notebook", "workflow"] so that notebooks under these folders are included in the generated website.
Now please run the command above again to generate IPython-based HTML files in the docs/ipynb folder. You can view docs/ipynb/index.html in your web browser; in particular, please click and view the tabs "Notebook" and "Workflow".
After you have completed all tasks, please make a tarball of your docs folder and email it to [email protected] for us to review.
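If you have not made a tarball before, the sketch below shows one way to do it with tar. The helper name pack_docs and the archive name docs.tar.gz are arbitrary choices for illustration; run it from the repository root so that the docs folder is found.

```shell
# Create a gzip-compressed tarball of a folder.
# $1: folder to archive (default: docs); $2: archive name (default: docs.tar.gz)
pack_docs() {
  tar -czf "${2:-docs.tar.gz}" "${1:-docs}"
}

# Usage, from the repository root:
#   pack_docs                      # writes docs.tar.gz
#   tar -tzf docs.tar.gz | head    # list the first entries to confirm the contents
```

Listing the archive before emailing it is a cheap way to confirm that docs/index.html and docs/ipynb made it in.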