Wrapper and helper functions to use bulk RNA-seq differential expression methods with single-cell data
Developed by Christoph Hafemeister in the Developmental Cancer Genomics group at St. Anna Children's Cancer Research Institute (CCRI)
Motivated by our observation that single-cell RNA-seq differential expression tests within a sample should use pseudo-bulk data of pseudo-replicates.
Install from GitHub
remotes::install_github('cancerbits/DElegate')
Given a Seurat object s, run differential expression tests between each cluster and the rest of the cells.
de_results <- DElegate::findDE(object = s)
Or find all cluster markers and show the top 5 for each cluster
marker_results <- DElegate::FindAllMarkers2(object = s)
dplyr::filter(marker_results, feature_rank < 6)
An overview of the functionality, including examples, can be found here
DElegate is an R package that allows bulk RNA-seq differential expression methods to be used with single-cell data. It is a light wrapper around DESeq2, edgeR, and limma, similar to the Libra package. In contrast to Libra, DElegate focuses on a few DE methods and will assign cells to pseudo-replicates if no true replicates are available.
All DElegate functionality is contained in one function: findDE(). It has one mandatory input argument, object, which may be of class
- Seurat: the count matrix will be extracted from the 'RNA' assay
- SingleCellExperiment: the count matrix will be extracted via counts()
- dgCMatrix: sparse matrix of the Matrix package
- matrix
To indicate the cell group memberships, you have several options, depending on input type:
- Seurat: in the object via Idents(object), or use the group_column argument
- SingleCellExperiment: in the object via colLabels(object), or use the group_column argument
- dgCMatrix or matrix: use the meta_data and group_column arguments
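As a minimal sketch of the plain-matrix input path described above, the following builds a toy count matrix and a matching metadata table, then passes both to findDE(). The toy data, the column name cluster, and the assumption that meta_data is a data.frame with one row per cell (row names matching the matrix column names) are illustrative choices, not taken from the package documentation; check ?DElegate::findDE for the exact requirements. The call is guarded so the snippet also runs where DElegate is not installed.

```r
# Toy input: 50 genes x 90 cells, three hypothetical cell groups
set.seed(1)
counts <- matrix(
  rpois(50 * 90, lambda = 3),
  nrow = 50,
  dimnames = list(paste0("gene", 1:50), paste0("cell", 1:90))
)

# One row of metadata per cell; 'cluster' is a made-up column name
# (assumption: rows are matched to the columns of the count matrix)
meta_data <- data.frame(
  cluster = rep(c("T CELLS", "B CELLS", "MONOCYTES"), each = 30),
  row.names = colnames(counts)
)

# Only attempt the DE test when DElegate is available
if (requireNamespace("DElegate", quietly = TRUE)) {
  de_results <- DElegate::findDE(
    object = counts,
    meta_data = meta_data,
    group_column = "cluster"
  )
  print(head(de_results))
}
```

With a Seurat or SingleCellExperiment object, the meta_data argument would not be needed, since the group labels live in the object itself.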
DElegate uses bulk RNA-seq DE methods and relies on replicates. If no true replicates are available, it assigns cells to pseudo-replicates. However, if replicates are available in the input, the replicate_column argument can be used to indicate where to find the labels.
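To make the idea of pseudo-replicates concrete, here is a small base R sketch that randomly splits the cells of each group into three pseudo-replicates and sums the counts per group and replicate to obtain pseudo-bulk profiles. This is an illustration of the general concept only; the number of replicates, the random assignment scheme, and all variable names are assumptions and do not describe how DElegate implements this step internally.

```r
set.seed(42)

# Toy data: 20 genes x 60 cells in two hypothetical groups
counts <- matrix(
  rpois(20 * 60, lambda = 5),
  nrow = 20,
  dimnames = list(paste0("gene", 1:20), paste0("cell", 1:60))
)
group <- rep(c("A", "B"), each = 30)
n_rep <- 3  # assumed number of pseudo-replicates per group

# Within each group, shuffle a balanced vector of replicate labels 1..n_rep
pseudo_rep <- ave(
  seq_along(group), group,
  FUN = function(i) sample(rep_len(seq_len(n_rep), length(i)))
)

# One pseudo-bulk sample per group/replicate combination
sample_id <- paste(group, pseudo_rep, sep = "_")

# Sum counts over all cells belonging to the same pseudo-bulk sample
pseudobulk <- t(rowsum(t(counts), group = sample_id))
print(dim(pseudobulk))  # genes x pseudo-bulk samples
```

The resulting genes-by-samples matrix has the shape that bulk methods such as edgeR, DESeq2, or limma expect as input.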
To tell findDE() which cell groups to compare, use the compare argument. We provide several ways to set up the comparisons that will be tested:
- 'each_vs_rest', the default, does multiple comparisons, one per group vs all remaining cells
- 'all_vs_all', also does multiple comparisons, covering all group pairs
- a length one character vector, e.g. 'MONOCYTES', does one comparison between that group and the remaining cells
- a length two character vector, e.g. c('T CELLS', 'B CELLS'), does one comparison between those two groups
- a list of length two, e.g. list(c('T CELLS', 'B CELLS'), c('MONOCYTES')), does one comparison after combining groups
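The comparison modes listed above can be sketched as follows. The first part enumerates, in base R, which comparisons the two multi-comparison modes would cover for three example groups; it assumes that 'all_vs_all' means each unordered pair once, which is a reading of the description rather than something confirmed here. The second part passes the three explicit forms of compare to findDE(), and only runs when DElegate is installed and a Seurat object named s exists in the session.

```r
groups <- c("T CELLS", "B CELLS", "MONOCYTES")

# 'each_vs_rest': one comparison per group against all other groups
each_vs_rest <- lapply(groups, function(g) list(g, setdiff(groups, g)))

# 'all_vs_all': one comparison per pair of groups (assumed unordered)
all_vs_all <- combn(groups, 2, simplify = FALSE)

# The three explicit ways to request a single comparison
compare_values <- list(
  one_vs_rest = "MONOCYTES",
  pair        = c("T CELLS", "B CELLS"),
  combined    = list(c("T CELLS", "B CELLS"), c("MONOCYTES"))
)

# Run the single comparisons only if DElegate and a Seurat object `s` exist
if (requireNamespace("DElegate", quietly = TRUE) && exists("s")) {
  results <- lapply(
    compare_values,
    function(cmp) DElegate::findDE(object = s, compare = cmp)
  )
}
```

For n groups, 'each_vs_rest' yields n comparisons, while all pairs yield choose(n, 2), so the pairwise mode grows much faster as the number of groups increases.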
Finally, there are currently three DE methods supported:
- 'edger' uses edgeR::glmQLFit
- 'deseq' uses DESeq2::DESeq(test = 'Wald')
- 'limma' uses limma::eBayes(trend = TRUE, robust = TRUE)
For complete details, consult the package documentation: ?DElegate::findDE
DElegate supports parallelization via the future package.
For example, to use the multicore strategy with 12 workers you may call
future::plan(strategy = 'future::multicore', workers = 12)
before DE testing.
See more details at the future website.
Note that every comparison is run single-threaded, but multiple comparisons will be done in parallel.
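A possible end-to-end pattern is to set a plan, run the DE tests, and then restore the previous plan. The sketch below uses the multisession strategy instead of multicore, because multicore relies on forking and is not available on Windows; the worker count of 2 and the commented findDE() call on a Seurat object s are placeholders.

```r
if (requireNamespace("future", quietly = TRUE)) {
  # Use at most 2 workers, or fewer if the machine has fewer cores
  n_workers <- min(2, future::availableCores())

  # plan() returns the previous plan, which we keep for later restoration
  old_plan <- future::plan(future::multisession, workers = n_workers)
  message("Workers in use: ", future::nbrOfWorkers())

  # Multiple comparisons would now be distributed across the workers, e.g.
  # de_results <- DElegate::findDE(object = s, compare = 'all_vs_all')

  # Restore whatever plan was active before
  future::plan(old_plan)
}
```

Because each comparison runs single-threaded, adding workers beyond the number of comparisons is not expected to help.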
Troubleshooting: You may get an error regarding future.globals.maxSize, the maximum allowed total size of global variables. The default value is 500 MiB and may be too small. You may increase it, for example to 8 GB, using options(future.globals.maxSize = 8 * 10^9).
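The option is given in bytes, so the arithmetic is worth spelling out. The short sketch below shows the default limit in bytes, sets the limit to 8 GB (decimal) as in the text, and converts that value to GiB for comparison; the variable names are illustrative only.

```r
# Default limit: 500 MiB expressed in bytes
default_bytes <- 500 * 1024^2
print(default_bytes)

# Raise the limit to 8 GB (decimal gigabytes), as suggested above
options(future.globals.maxSize = 8 * 10^9)

# The same limit expressed in GiB (binary units), slightly below 8
limit_gib <- getOption("future.globals.maxSize") / 1024^3
print(round(limit_gib, 2))
```

If a limit of exactly 8 GiB is preferred, 8 * 1024^3 can be used instead of 8 * 10^9.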
For reporting progress updates, DElegate relies on the progressr package. By default, no progress updates are rendered, but they may be turned on in an R session with progressr::handlers(global = TRUE), and the default presentation can be modified (e.g. progressr::handlers(progressr::handler_progress)).
See more details at the progressr website.