Skip to content

Latest commit

 

History

History
229 lines (166 loc) · 8.86 KB

README.md

File metadata and controls

229 lines (166 loc) · 8.86 KB

CRISRPball Lift the Fog from Your CRISPR Analyses

CRISPRball is a Shiny application to explore, visualize, filter, and integrate CRISPR screens with public data and multiple datasets. In particular, it allows for publication-quality figure generation including full aesthetic customization and interactive labeling, filtering of results using DepMap Common Essential genes, simple comparisons between datasets/timepoints/treatments, etc.

It is designed for end users and may be particularly useful for bioinformatics/genome editing cores that perform basic analyses before returning results to users. Pointing users to the online version of the app (or a hosted one) will allow them to quickly wade through and interpret their data.

Currently, it supports the output from MAGeCK RRA and MLE analysis methods. This package supplements the MAGeCKFlute package, adding additional functionality, visualizations, and a Shiny interface. Support for the output of additional analysis tools and methods will be added upon request.

Installation

CRISPRball is available on Bioconductor and can be installed using the following commands:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("CRISPRball")

# To install the development version from Github:
if (!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")

devtools::install_github("j-andrews7/CRISPRball")

Usage

Starting the app is as simple as calling the CRISPRball function.

library("CRISPRball")
CRISPRball()

Users can then upload their data within the app, which will enable specific tabs in the application as the data is provided.

One can also pass their input data directly as input - all that are needed are file paths to the MAGeCK output files.

Passing data directly can be useful when hosting the app on a local Shiny server where having pre-loaded data for the user is wanted. This is particularly useful for core or shared resource facilities that perform basic analyses for end-users.

MAGeCK RRA Output

In this case, we'll use the example output from the third MAGeCK tutorial.

In this example, the two datasets are just reverse comparisons (ESC vs plasmid & plasmid vs ESC) where DDX27 has been manually altered in the ESC vs plasmid comparison to no longer be a significant hit.

# Create lists of results summaries for each dataset.
d1.genes <- read.delim(system.file("extdata", "esc1.gene_summary.txt",
    package = "CRISPRball"
), check.names = FALSE)
d2.genes <- read.delim(system.file("extdata", "plasmid.gene_summary.txt",
    package = "CRISPRball"
), check.names = FALSE)

d1.sgrnas <- read.delim(system.file("extdata", "esc1.sgrna_summary.txt",
    package = "CRISPRball"
), check.names = FALSE)
d2.sgrnas <- read.delim(system.file("extdata", "plasmid.sgrna_summary.txt",
    package = "CRISPRball"
), check.names = FALSE)

count.summ <- read.delim(system.file("extdata", "escneg.countsummary.txt",
    package = "CRISPRball"
), check.names = FALSE)
norm.counts <- read.delim(system.file("extdata", "escneg.count_normalized.txt",
    package = "CRISPRball"
), check.names = FALSE)

# Look at the first few rows of the gene summary for the ESC vs plasmid comparison.
head(d1.genes)

We can then provide this data to the CRISPRball function.

genes <- list(ESC = d1.genes, plasmid = d2.genes)
sgrnas <- list(ESC = d1.sgrnas, plasmid = d2.sgrnas)

CRISPRball(
    gene.data = genes, sgrna.data = sgrnas,
    count.summary = count.summ, norm.counts = norm.counts
)

MAGeCK MLE Output

CRISPRball also supports the MLE output from MAGeCK. In this case, we'll use the example data from the fourth MAGeCK tutorial.

# Create lists of results summaries for each dataset.
genes <- read_mle_gene_summary(system.file("extdata", "beta_leukemia.gene_summary.txt",
    package = "CRISPRball"
))

count.summ <- read.delim(system.file("extdata", "escneg.countsummary.txt",
    package = "CRISPRball"
), check.names = FALSE)
norm.counts <- read.delim(system.file("extdata", "escneg.count_normalized.txt",
    package = "CRISPRball"
), check.names = FALSE)

CRISPRball(
    gene.data = genes, 
    count.summary = count.summ, norm.counts = norm.counts
)

The QC Tab

On load, the application will display the QC tab, which provides multiple interactive plots to assess the quality control of all samples in the dataset. Plots include those that assess the Gini Index (a measure of read distribution inequality), counts of fully depleted sgRNAs, percentage of reads mapped, read distributions across sgRNAs, correlation matrix between samples, and a PCA plot containing all samples.

Controls to adjust the plots are provided in the sidebar in the left side of the application. Plots are re-sizable and easily download as SVGs using the plotly controls. In addition, interactive HTML versions of the plots can be downloaded with the Download buttons above or below each plot.

The Gene (Overview) Tab

This tab provides an overview of the gene-level results for up to two comparisons. It includes a volcano plot, rank plot, and randomly ordered "lawn" plot for each comparison of interest. All plots are interactive and fully customizable via the controls in the sidebar.

Common hits between two comparisons can easily be highlighted by selecting the Highlight Common Hits checkbox in the sidebar.

Highlighting Gene(sets)

Often, it can be useful to label a set of genes on the plots. This can be done by passing a named list of gene identifiers to the genesets argument.

library("msigdbr")

# Retrieve MSigDB Hallmark gene sets and convert to a named list.
gene.sets <- msigdbr(species = "Homo sapiens", category = "H")
gene.sets <- gene.sets %>% split(x = .$gene_symbol, f = .$gs_name)

# Can also add genesets manually.
gene.sets[["my_fav_genes"]] <- c("TOP2A", "FECH", "SOX2", "DUT", "RELA")

CRISPRball(
    gene.data = genes, sgrna.data = sgrnas, count.summary = count.summ,
    norm.counts = norm.counts, genesets = gene.sets
)

Such genesets can then be highlighted very easily on the plots in the Gene (Overview) tab using the "Highlight Gene(sets)" inputs in the sidebar.

Individual genes of interest can also be added by the user manually from within the app.

The sgRNA Tab

This tab is mostly only useful with MAGeCK RRA results. It provides two plots for up to two comparisons. The first allows users to view normalized sgRNA counts across two conditions - which is helpful for identification of poor sgRNAs and validation of hits.

The second provides the sgRNA rank among all sgRNAs for a chosen effect size variable. This provides a useful view of how a given sgRNA compares to all sgRNAs.

The DepMap Tab

DepMap contains a multitude of data for hundreds of cell lines, including CRISPR/RNAi screen results, gene expression data, copy number variation, and more. This data can be extremely useful to remove common dependencies from your own results, look at expression or copy number for a given gene across various lineages or diseases, and to identify hits that are selective to a given (sub)lineage.

For fast access to this data, CRISPRball includes a function (build_depmap_db()) to build a SQLite database using the depmap R package. This database will be large, >4 GB.

This SQLite database can then be passed to the app and the data contained therein easily explored in the DepMap tab.

library("depmap")
library("pool")
library("RSQLite")

# This will take a few minutes to run.
# The database will be named "depmap_db.sqlite" and placed in the working directory.
build_depmap_db()

CRISPRball(
    gene.data = genes, sgrna.data = sgrnas, count.summary = count.summ,
    norm.counts = norm.counts, genesets = gene.sets, depmap.db = "depmap_db.sqlite"
)

Filtering DepMap Common Essential Hits

When provided, one can also easily filter the DepMap common essential hits from the results in the Gene (Overview) tab via the checkboxes in the sidebar.

Additional Help

Almost every input in the app will display a helpful tooltip explaining its function on hover. Plots also have an information icon that will explain the plot contents when hovered or clicked.

Bug reports, feature requests, and questions can be submitted to the issue tracker.