A template for organizing and performing analysis (adapted from https://github.com/PhanstielLab/project-template).
- Install Docker Desktop for Windows and Mac.
-
Create a repository from this template according to these instructions: https://docs.github.com/en/repositories/creating-and-managing-repositories/creating-a-repository-from-a-template
-
Clone the repository to your home computer.
git clone https://github.com/bailey-lab/Repo_Template.git
- Rename the directory to your project name
mv Repo_Template My_Project
cd My_Project
- Build docker image (rename with the title of your project) in a terminal window of your project directory.
docker build -t my_project .
- Run the docker container with the username and password below in a terminal window of your project directory.
docker run --rm -p 8787:8787 -e USER=rstudio -e PASSWORD=yourpassword --volume ${PWD}:/home/rstudio my_project
-
Open a web browser and navigate to
localhost:8787
. Log in with usernamerstudio
and passwordyourpassword
. -
As I already have the
data/raw
,config
andsrc
directories (with theanalysis
,processing
,utils
subdirectories) made, you can start adding your data and scripts to these directories. -
Run make commands (in the terminal) to inialize the general dictionaries.
make dirs
- Fill the config file with the necessary information for your project. The formatting of the config file is as follows:
# your relative file path
config_path: "config/YYYY_MM_DD_config.yaml"
# the date of the analysis
date: "YYYY_MM_DD"
# the name of the project
project_name: "my_project"
# the description of the project
description: "Cool analysis of something"
# the path to the raw data
sample_summary_csv: "data/raw/UMI_counts.csv"
# the path to the output report file
output_file: "YYYY_MM_DD_final.html"
- Run the snakemake pipeline (in the terminal), that creates date specific subdirectories in the
data/processed
,plots
andreports
directories and then renders the quarto document with the information from your config file.
snakemake -s Snakefile -c 1 --configfile config/YYYY_MM_DD_config.yaml
- To clear all output and rerun this pipeline, run the following commands:
snakemake -s Snakefile --delete-all-output -c 1 --configfile config/YYYY_MM_DD_config.yaml
make clean
-
data
:data/raw
: This is where you put your raw data files. These files are not modified in any way. They are the starting point of your analysis.data/processed
: This is where you put your processed data files. These files are created by your scripts and are used as input for your analysis.
-
plots
: This is where you put your plots. These plots are created by your scripts and are used in your report.- The subdirectories in
plots
are named after the date of the analysis.
- The subdirectories in
-
reports
: This is where you put your reports. These reports are created by your scripts and are the final output of your analysis. They are named by date. -
src
: This is where you put your source code.- Analysis scripts (performing PCA, etc) are stored in the
analysis
subdirectory. - Processing scripts (reformatting, etc) are stored in the
processing
subdirectory. - Source scripts (common functions you use and make) are stored in the
utils
subdirectory.
- Analysis scripts (performing PCA, etc) are stored in the
-
config
: This is where you put your configuration files. These files contain the information needed for your scripts to run.