Merge pull request #15 from FredHutch/djg-vignette-review

added todo comments for clarification
FredHutch · Mar 11, 2024 · 1e7217e
2 parents 9067eab + 5781565
commit 1e7217e
Showing 1 changed file with 25 additions and 8 deletions.
diff --git a/vignettes/getting-started.Rmd b/vignettes/getting-started.Rmd
@@ -16,25 +16,37 @@ knitr::opts_chunk$set(
 
 # gimap tutorial
 
+gimap analyzes dual-targeting CRISPR screening data to help identify genetic interactions (e.g., cooperativity, synthetic lethality) in models of disease and other biological contexts. Specifically, it works with functional genomic data generated by the pgPEN (paired guide RNAs for genetic interaction mapping) approach, quantifying the growth effects of single and paired gene knockouts after introduction of a CRISPR library. Many types of CRISPR screens can be used for this analysis; helpful descriptions can be found in this review (https://www.nature.com/articles/s43586-021-00093-4). The use of pgPEN for GI mapping in a paired gRNA format is described here (https://pubmed.ncbi.nlm.nih.gov/34469736/).
+
 ```{r}
 library(magrittr)
 library(gimap)
 ```
 
-```{r}
-example_data <- example_data()
+## Data requirements 
 
-# Let's examine this example metadata
-example_data
+Let's examine this example pgPEN counts table. It's divided into columns containing: 
+
+- an ID corresponding to the names of paired guides
+- gRNA sequence 1, targeting "paralog A"
+- gRNA sequence 2, targeting "paralog B"
+- the sample, day, and replicate number for which the gRNAs were sequenced
+
+```{r}
+example_data <- gimap::example_data()
+example_data
 ```
+Your data may vary slightly from this, but make sure you have the essential variables and a record of how your data were collected.
 
 ```{r}
 colnames(example_data)
 ```
 
 ## Setting up data
 
-We're going to set up three datasets. The first is required, its the counts that the genetic interaction analysis will be used for. 
+We're going to set up three datasets. The first is required: it contains the counts on which the genetic interaction analysis is performed.
+
+The first data set contains the read counts from each sample type. Analysis requires a Day 0 (or plasmid) sample and at least one later timepoint sample. The Day 0 (T0) or plasmid sample represents the entire library before any selection has occurred during the screen; it is the baseline for guide RNA representation. How long cells should remain in culture during the screen depends heavily on the type of selection being applied; helpful advice can be found in this review (https://www.nature.com/articles/s43586-021-00093-4). If replicates are provided, the subsequent QC analysis will correlate them. Comparing early and late timepoints is possible in this function but is not required if no early timepoints were taken.
 
 ```{r}
 example_counts <- example_data %>%
@@ -47,12 +59,17 @@ The next two datasets are metadata that describe the dimensions of the count dat
 - The sizes of these metadata must correspond to the dimensions of the counts data. 
 - The first column of the pg_metadata must be a unique id
 
+`pg_metadata` describes the paired guide RNA targets: a table of the paired gRNA sequences and the corresponding paralog genes that the gRNA-Cas9 complex is directed to cut.
+
 ```{r}
-# pg metadata is the information that describes the paired guide RNA targets
+
 example_pg_metadata <- example_data %>%
   dplyr::select(c("id", "seq_1", "seq_2"))
+```
+
+`sample_metadata` describes the timepoint and replicate of each sample. In general, two replicates at each timepoint are carried through the analysis, where they are later collapsed.
 
-# sample metadata is the information that describes 
+```{r}
 example_sample_metadata <- data.frame(
   id = 1:5,
   day = as.factor(c("Day00", "Day05", "Day22", "Day22", "Day22")), 
@@ -63,7 +80,7 @@ example_sample_metadata <- data.frame(
 Now let's set up our data using `setup_data()`. Optionally, we can also pass the metadata to this function so that it is stored with the data.
 
 ```{r}
-gimap_dataset <- setup_data(counts = example_counts, 
+gimap_dataset <- gimap::setup_data(counts = example_counts,
                             pg_metadata = example_pg_metadata,
                             sample_metadata = example_sample_metadata)
 ```