The domino2 analysis pipeline requires the use of a reference receptor-ligand database to identify potential receptor-ligand interactions. We recommend the use of CellPhoneDB. Here is a very brief tutorial on how to download the requisite files from CellPhoneDB v4.0.0.
+
+
File Downloads for CellPhoneDB:
+
+
Database files from CellPhoneDB v4.0.0 for human scRNAseq data can be installed from a public Github repository from the Tiechmann Group that developed CellPhoneDB.
+
+# URL for desired version of cellphoneDB
+cellphone_url<-"https://github.com/ventolab/cellphonedb-data/archive/refs/tags/v4.0.0.tar.gz"
+
+# download compressed database
+cellphone_tar<-paste0(temp_dir, "/cellphoneDB_v4.tar.gz")
+download.file(url =cellphone_url, destfile =cellphone_tar)
+
+# move contents of the compressed file to a new directory
+untar(tarfile =cellphone_tar, exdir =cellphone_dir)
+cellphone_data<-paste0(cellphone_dir, "/cellphonedb-data-4.0.0/data")
+
To facilitate the use of these files in the format used in domino2, we include a helper function, create_rl_map_cellphonedb(), that automatically parses files from the cellPhoneDB database to arrive at the rl_map format. For more information on how to use these files in the domino2 pipeline, please see our Getting Started page. To learn how to use SCENIC for TF activation scoring, please see our Using SCENIC for TF Activation tutorial.
domino2 is a tool for analysis of intra- and intercellular signaling in single cell RNA sequencing (scRNAseq) data based on transcription factor (TF) activation. Here, we show a basic example pipeline for generating communication networks as a domino object. This vignette will demonstrate how to use domino2 on the 10X Genomics Peripheral Blood Mononuclear Cells (PBMC) data set of 2,700 cells. The data can be downloaded here.
Analysis of cell-cell communication with domino2 often follows initial processing and annotation of scRNAseq data. We used the Satija Lab’s Guided Clustering Tutorial to prepare the data set for analysis with domino2. The complete processing script is available in the data-raw directory of the domino2 package. The processed data can be downloaded from Zenodo.
domino2 was designed to be compatible with Seurat objects in addition to accepting matrices and vectors of the data set as function inputs. However, pySCENIC is used for TF activation inference, and it requires as input a cell by gene matrix. This is the opposite orientation of Seurat objects but is the default for Python based tools such as scanpy. The RNA counts matrix can be saved as a tab-seperated value (.tsv) or comma-sperated value (.csv) file after transposing the matrix to cell by gene orientation.
tsv and csv files are very inefficient for storing large matrices. As an alternative, the counts matrix can be saved as a loom file and directly passed to pySCENIC in this format. Generating a loom file in R requires use of the loomR package. The package automatically handles the transposition of the counts matrix to cell by gene orientation as well. We recommend this approach to passing a counts matrix to pySCENIC. However, users should be aware that loomR is not hosted on CRAN or BioConductor at the time of this vignette’s creation.
-
+
library(loomR)# save loom counts matrix
-pbmc_counts<-GetAssayData(object =pbmc, slot ="counts")
+pbmc_counts<-GetAssayData(object =pbmc, slot ="counts")pbmc_loom<-loomR::create(filename =paste0(input_dir, "/pbmc3k_counts.loom"), data =pbmc_counts)# connection to the loom file must be closed to complete file creationpbmc_loom$close_all()
For this tutorial, pySCENIC is used as the method for TF activity inference, and the assessment of ligand-receptor interactions is based on curated interactions from CellPhoneDB v4.0.0. Each of these requires downloading some files prior to use in our analysis pipeline.
-
-
Downloads for SCENIC:
-
-
A singularity image of SCENIC v0.12.1 can be installed from the DockerHub image as a source (example scripts for use with Docker are provided in the scenic_bash directory as well). SCENIC requires a list of TFs, motif annotations, and cisTarget motifs which are all available from the authors of SCENIC for human (HGNC), mouse (MGI), and fly. The following will download everything necessary for an analysis of a data set with HGNC gene labels for the hg38 genome.
-
SCENIC_DIR="'temp_dir'/scenic"
-mkdir${SCENIC_DIR}
-
-# Build singularity image
-singularity build "${SCENIC_DIR}/aertslab-pyscenic-0.12.1.sif" docker://aertslab/pyscenic:0.12.1
-
-# Matrix containing motifs as rows and genes as columns and ranking position for each gene and motif (based on CRM scores) as values. URLs provided link to v2 feather files required for the 0.12.1 version of pySENIC.
-
-curl"https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc_v10_clust/gene_based/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather" \
- -o "${SCENIC_DIR}/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather"
-
-curl"https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc_v10_clust/gene_based/hg38_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather" \
- -o "${SCENIC_DIR}/hg38_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather"
-
-# List of genes encoding TFs in the HG38 reference genome
-curl"https://resources.aertslab.org/cistarget/tf_lists/allTFs_hg38.txt" \
- -o "${SCENIC_DIR}/allTFs_hg38.txt"
-
-# Motif annotations based on the 2017 cisTarget motif collection. Use these files if you are using the mc9nr databases.
-curl"https://resources.aertslab.org/cistarget/motif2tf/motifs-v9-nr.hgnc-m0.001-o0.0.tbl" \
- -o "${SCENIC_DIR}/motifs-v9-nr.hgnc-m0.001-o0.0.tbl"
-
-
-
Downloads for CellPhoneDB:
-
-
Database files from CellPhoneDB v4.0.0 for human scRNAseq data can be installed from a public Github repository from the Tiechmann Group that developed CellPhoneDB.
-
-#
-cellphone_url<-"https://github.com/ventolab/cellphonedb-data/archive/refs/tags/v4.0.0.tar.gz"
-
-# download compressed database
-cellphone_tar<-paste0(temp_dir, "/cellphoneDB_v4.tar.gz")
-download.file(url =cellphone_url, destfile =cellphone_tar)
-
-# move contents of the compressed file to a new directory
-untar(tarfile =cellphone_tar, exdir =cellphone_dir)
-cellphone_data<-paste0(cellphone_dir, "/cellphonedb-data-4.0.0/data")
-
-
-
-
-
SCENIC Analysis
-
-
SCENIC is our lab’s preferred software for TF activity scoring. We recommend using the Python implementation (pySCENIC) as it is faster than the original R implementation. Be aware, the use of SCENIC is often the slowest and most memory intensive step of this analysis pipeline. SCENIC should be run on computing resources with access to multi-core processing and large amounts of memory.
-
As described in the section on saving preprocessed data, SCENIC requires RNA counts to be provided in a cell by gene matrix format, so we will be using the pbmc3k_counts.loom file in this stage of analysis.
-
pySCENIC is initiated using bash scripting in the terminal. The analysis consists of 3 steps to score genes for TF motif enrichment, construct TF regulons consisting of genes targeted by the TFs, and arrive at AUC scores for enrichment of regulon gene transcription within each cell.
-
-
grn: construct TF-modules
-
-
Co-expression modules are used to quantify gene-TF adjacencies.
-
singularity exec "${SCENIC_DIR}/aertslab-pyscenic-0.12.1.sif" pyscenic grn \
-"pbmc3k_counts.loom" \
-"${SCENIC_DIR}/allTFs_hg38.txt" \
- -o "${SCENIC_DIR}/pbmc_adj_3k.tsv" \
- --num_workers 6 \
- --seed 123
-
-# Arguments:
-# path to the loom file
-# list of TFs
-# output directory for the adjacency matrix
-# number of CPUs to use if multi-core processing is available
-# specify a random seed for reproducibility
-
-
-
ctx: construct TF regulons with pruning based on TF motif enrichment
-
-
The rankings of genes based on enrichment of TF motifs to the transcription start site (TSS) are considered in the construction of regulons, where target genes in the TF-modules are removed if they lack motifs proximal to the TSS where the TF may bind.
-
singularity exec "${SCENIC_DIR}/aertslab-pyscenic-0.12.1.sif" pyscenic ctx \
-"${SCENIC_DIR}/pbmc_adj.tsv" \
-"${SCENIC_DIR}/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather" \
-"${SCENIC_DIR}/hg38_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather" \
- --annotations_fname "${SCENIC_DIR}/motifs-v9-nr.hgnc-m0.001-o0.0.tbl" \
- --expression_mtx_fname "pbmc3k_counts.loom" \
- --mode "dask_multiprocessing" \
- --output "${SCENIC_DIR}/regulons_pbmc_3k.csv" \
- --num_workers 1
-
-# Arguments:
-# adjacency matrix output from grn
-# target rankings of motif enrichment within 10 kb of TSS
-# target rankings of motif enrichment within 500 bp upstream and 100 bp downstream of TSS
-# TF motif features
-# counts matrix loom file
-# enable multi-core processing
-# output file of learned TF regulons
-# number of CPU cores
-
-
-
aucell: calculate TF activity scores
-
-
Enrichment of a regulon is measured as the Area Under the recovery Curve (AUC) of the genes that define this regulon.
Building a Signaling Network with domino2
@@ -249,18 +202,18 @@
Building a Signaling Network
Loading SCENIC Results
The TF activities are a required input for domino2, and the regulons learned by SCENIC are an important but optional input needed to annotate TF-target interactions and to prune TF-receptor linkages where the receptor is a target of the TF. This prevents the distinction of receptor expression driving TF activity or the TF inducing the receptor’s expression.
The initial regulons data frame read into R from the ctx function has 2 rows of column names that need to be replaced with one succinct description. domino2 has changed the input format for TF regulons to be a list storing vectors of target genes in each regulon, where the names of the list are the TF genes. This facilitates the use of alternative methods for TF activity quantification. We provide a helper function, create_regulon_list_scenic() for easy retrieval of TF regulons from the output of the pySCENIC ctx function.
Users should be aware that the AUC matrix from SCENIC is loaded in a cell x TF orientation and should be transposed to TF x cell orientation. pySCENIC also appends “(+)” to TF names that are converted to “…” upon loading into R. These characters can be included without affecting the results of domino2 analysis, but can be confusing when querying TF features in the data. We recommend comprehensive removal of the “…” characters using the gsub() function.
-
+
auc_in<-as.data.frame(t(auc))# Remove pattern '...' from the end of all rownames:rownames(auc_in)<-gsub("\\.\\.\\.$", "", rownames(auc_in))
The change to use rl_map formatting also enables users to manually append interactions of interest that are not included in the interaction database if need be. This can be attained by formatting the desired interactions as a data.frame with the same column headers as the rl_map and using the rbind() function.
-
+
# Integrin complexes are not annotated as receptors in CellPhoneDB_v4.0# collagen-integrin interactions between cells may be missed unless tables from# the CellPhoneDB reference are edited or the interactions are manually added
@@ -584,9 +537,9 @@
domino2 infers active receipt of signals via receptors based on the correlation of the receptor’s expression with TF activity across the data set and differential activity of the TF within a cell cluster. Correlations are conducted using scaled expression values rather than raw counts or normalized counts. For assessment of receptor activity on a per cell type basis, a named vector of cell cluster assignments, where the names are cell barcodes matching the expression matrix, are provided. Assessing signaling based on other categorical groupings of cells can be achieved by passing these groupings as “clusters” to build_domino() in place of cell types. domino2 accepts either a Seurat object with counts, scaled counts, and clusters included, or it requires a matrix of counts, a matrix of scaled counts, and a named vector of cell cluster labels as factors. Shown below is how to extract these elements from a Seurat object.
Note: Ligand and receptor expression can only be assessed for genes included in the z_scores matrix. Many scRNAseq analysis pipelines recommend storing only genes with high variance in scaled expression slots for these data objects, thereby missing many genes encoding ligands and receptors. Ensure that all your genes of interest are included in the rows of your z_scores matrix. Scaled expression was calculated for all genes in this PBMC data set after removal of genes expressed in less than 3 cells.
function can be used to make the object. Parameters of note include “use_clusters” which is required to assess signaling between cell types rather than linkage of TFs and receptors broadly across the data set. “use_complexes” decides if receptors that function in heteromeric complexes will be considered in testing linkages between TFs and receptors. If TRUE, a receptor complex is only linked to a TF if a majority of the component genes meet the Spearman correlation threshold. “remove_rec_dropout” decides whether receptor values of 0 will be considered in correlation calculations; this measure is intended to reduce the effect of dropout on receptors with low expression.
The above results will be equivalent to using the Seurat object directly. Note that inclusion of both the matrix inputs and the Seurat object will prioritize the Seurat object.
-
+
pbmc_dom<-create_domino(rl_map =rl_map, features =auc_in, ser =pbmc, tf_targets =regulon_list, use_clusters =TRUE, use_complexes =TRUE, remove_rec_dropout =FALSE)
Build Dominobuild_domino() finalizes the construction of the domino object by setting parameters for identifying TFs with differential activation between clusters, receptor linkage with TFs based on magnitude of positive correlation, and the minimum percentage of cells within a cluster that have expression of a receptor for the receptor to be called as active.
There are also options for thresholds of the number of TFs that may be called active in a cluster and the number of receptors that may be linked to any one TF. For thresholds of n TFs and m receptors, the bottom n TFs by lowest p-values from the wilcoxon rank sum test and the top m receptors by Spearman correlation coefficient are chosen.
-
+
pbmc_dom<-build_domino(dom =pbmc_dom, min_tf_pval =0.001, max_tf_per_clust =25, max_rec_per_tf =25, rec_tf_cor_threshold =0.25, min_rec_percentage =0.1)# min_tf_pval: Threshold for p-value of DE for TFs rec_tf_cor_threshold:# Minimum correlation between receptor and TF min_rec_percentage: Minimum# percent of cells that must express receptor
Both of the thresholds for the number of receptors and TFs can be sent to infinity (Inf) to collect all receptors and TFs that meet statistical significance thresholds.
The cumulative degree of signaling between clusters is assessed as the sum of the scaled expression of ligands targeting active receptors on another cluster. This can be visualized in graph format using the signaling_network() function. Nodes represent each cell cluster and edges scale with the magnitude of signaling between the clusters. The color of the edge corresponds to the sender cluster for that signal.
Signaling networks can also be drawn with the edges only rendering the signals directed towards a given cell type or signals from one cell type directed to others. To see those options in use, as well as other plotting functions and their options, please see our plotting vignette.
Specific Signaling Interactions between Clusters
Beyond the aggregated degree of signaling between cell types, the degrees of signaling through specific ligand-receptor interactions can be assessed. gene_network() provides a graph of linkages between active TFs in a cluster, the linked receptors in that cluster, and the possible ligands of these active receptors.
New to domino2, gene_network() can be used between two clusters to determine if any of the possible ligands for a given receptor are expressed by a putative outgoing signaling cluster.
Another form of comprehensive ligand expression assessment is available for individual active receptors in the form of circos plots new to domino2. The outer arcs correspond to clusters in the domino object with inner arcs representing each possible ligand of the plotted receptor. Arcs are drawn between ligands on a cell type and the receptor if the ligand is expressed above the specified threshold. Arc widths correspond to the mean express of the ligand by the cluster with the widest arc width scaling to the maximum expression of the ligand within the data.
If, for some reason, you find yourself in need of the entire linkage structure (not recommended), it can be accessed through its slot name; domino objects are S4 objects.
-all_linkages<-slot(dom, "linkages")
+all_linkages<-slot(dom, "linkages")# Names of all sub-structures:names(all_linkages)#> [1] "complexes" "rec_lig" "tf_targets"
diff --git a/docs/articles/index.html b/docs/articles/index.html
index b528a44..5183e6f 100644
--- a/docs/articles/index.html
+++ b/docs/articles/index.html
@@ -24,6 +24,8 @@
For this tutorial, pySCENIC is used as the method for TF activity inference. This requires downloading some files prior to use. Here, we show how to download these files and the way we run pySCENIC to generate the necessary files for use in domino2 analyis.
+
A singularity image of SCENIC v0.12.1 can be installed from the DockerHub image as a source. SCENIC requires a list of TFs, motif annotations, and cisTarget motifs which are all available from the authors of SCENIC for human (HGNC), mouse (MGI), and fly. The following will download everything necessary for an analysis of a data set with HGNC gene labels for the hg38 genome.
+
SCENIC_DIR="'temp_dir'/scenic"
+mkdir${SCENIC_DIR}
+
+# Build singularity image
+singularity build "${SCENIC_DIR}/aertslab-pyscenic-0.12.1.sif" docker://aertslab/pyscenic:0.12.1
+
+# Matrix containing motifs as rows and genes as columns and ranking position for each gene and motif (based on CRM scores) as values. URLs provided link to v2 feather files required for the 0.12.1 version of pySENIC.
+
+curl"https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc_v10_clust/gene_based/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather" \
+ -o "${SCENIC_DIR}/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather"
+
+curl"https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/refseq_r80/mc_v10_clust/gene_based/hg38_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather" \
+ -o "${SCENIC_DIR}/hg38_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather"
+
+# List of genes encoding TFs in the HG38 reference genome
+curl"https://resources.aertslab.org/cistarget/tf_lists/allTFs_hg38.txt" \
+ -o "${SCENIC_DIR}/allTFs_hg38.txt"
+
+# Motif annotations based on the 2017 cisTarget motif collection. Use these files if you are using the mc9nr databases.
+curl"https://resources.aertslab.org/cistarget/motif2tf/motifs-v9-nr.hgnc-m0.001-o0.0.tbl" \
+ -o "${SCENIC_DIR}/motifs-v9-nr.hgnc-m0.001-o0.0.tbl"
+
+
+
SCENIC Analysis
+
+
SCENIC is our lab’s preferred software for TF activity scoring. We recommend using the Python implementation (pySCENIC) as it is faster than the original R implementation. Be aware, the use of SCENIC is often the slowest and most memory intensive step of this analysis pipeline. SCENIC should be run on computing resources with access to multi-core processing and large amounts of memory.
+
As described in the section on saving preprocessed data, SCENIC requires RNA counts to be provided in a cell by gene matrix format. While this can be down with tsv and csv files, they are very inefficient for storing large matrices. As an alternative, the counts matrix can be saved as a loom file and directly passed to pySCENIC in this format. Generating a loom file in R requires use of the loomR package. The package automatically handles the transposition of the counts matrix to cell by gene orientation as well. We recommend this approach to passing a counts matrix to pySCENIC. However, users should be aware that loomR is not hosted on CRAN or BioConductor at the time of this vignette’s creation.
+
+library(loomR)
+
+# save loom counts matrix
+pbmc_counts<-GetAssayData(object =pbmc, slot ="counts")
+pbmc_loom<-loomR::create(filename =paste0(input_dir, "/pbmc3k_counts.loom"), data =pbmc_counts)
+# connection to the loom file must be closed to complete file creation
+pbmc_loom$close_all()
+
We will demonstrate the use of the pbmc3k_counts.loom file in this stage of analysis.
+
pySCENIC is initiated using bash scripting in the terminal. The analysis consists of 3 steps to score genes for TF motif enrichment, construct TF regulons consisting of genes targeted by the TFs, and arrive at AUC scores for enrichment of regulon gene transcription within each cell.
+
+
grn: construct TF-modules
+
+
Co-expression modules are used to quantify gene-TF adjacencies.
+
singularity exec "${SCENIC_DIR}/aertslab-pyscenic-0.12.1.sif" pyscenic grn \
+"pbmc3k_counts.loom" \
+"${SCENIC_DIR}/allTFs_hg38.txt" \
+ -o "${SCENIC_DIR}/pbmc_adj_3k.tsv" \
+ --num_workers 6 \
+ --seed 123
+
+# Arguments:
+# path to the loom file
+# list of TFs
+# output directory for the adjacency matrix
+# number of CPUs to use if multi-core processing is available
+# specify a random seed for reproducibility
+
+
+
ctx: construct TF regulons with pruning based on TF motif enrichment
+
+
The rankings of genes based on enrichment of TF motifs to the transcription start site (TSS) are considered in the construction of regulons, where target genes in the TF-modules are removed if they lack motifs proximal to the TSS where the TF may bind.
+
singularity exec "${SCENIC_DIR}/aertslab-pyscenic-0.12.1.sif" pyscenic ctx \
+"${SCENIC_DIR}/pbmc_adj.tsv" \
+"${SCENIC_DIR}/hg38_10kbp_up_10kbp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather" \
+"${SCENIC_DIR}/hg38_500bp_up_100bp_down_full_tx_v10_clust.genes_vs_motifs.rankings.feather" \
+ --annotations_fname "${SCENIC_DIR}/motifs-v9-nr.hgnc-m0.001-o0.0.tbl" \
+ --expression_mtx_fname "pbmc3k_counts.loom" \
+ --mode "dask_multiprocessing" \
+ --output "${SCENIC_DIR}/regulons_pbmc_3k.csv" \
+ --num_workers 1
+
+# Arguments:
+# adjacency matrix output from grn
+# target rankings of motif enrichment within 10 kb of TSS
+# target rankings of motif enrichment within 500 bp upstream and 100 bp downstream of TSS
+# TF motif features
+# counts matrix loom file
+# enable multi-core processing
+# output file of learned TF regulons
+# number of CPU cores
+
+
+
aucell: calculate TF activity scores
+
+
Enrichment of a regulon is measured as the Area Under the recovery Curve (AUC) of the genes that define this regulon.
The csv files created in this analysis can be used as the TF activation inputs in a domino2 analysis. Please see our Getting Started page for more information on how to perform an analysis with domino2. For a brief tutorial on the download of the necessary files for using cellphonedb, please see the Using the cellphoneDB database vignette.
+
Vignette Build Information Date last built and session information:
A ligand-receptor database is used to map linkages between ligands and receptors (we recommend using cellphoneDB, but other methods can be used as well).
+
Transcription factor activation scores are calculated (we recommend using pySCENIC, but other methods can be used as well). For more information on how to use SCENIC, please see our Using SCENIC for TF Activation page.
+
A ligand-receptor database is used to map linkages between ligands and receptors (we recommend using cellphoneDB, but other methods can be used as well). For information on downloading the necessary files for cellphoneDB, please see our Using the cellphoneDB Database page.
A domino object is created using counts, z-scored counts, clustering information, and the data from steps 1 and 2.
Parameters such as the maximum number of transcription factors and receptors or the minimum correlation threshold (among others) are used to make a cell communication network
Communication networks can be extracted from within the domino object or visualized using a variety of plotting functions
-
Please see the Getting Started page for an example analysis that includes all of these steps in detail, from downloading and running pySCENIC to building a domino object and visualizing domino results. Other articles include further details on plotting functions and the structure of the domino object.
+
Please see the Getting Started page for an example analysis that includes all of these steps in a domino2 analysis in detail, from creating a domino object to parameters for building the network to visualizing domino results. Other articles include further details on plotting functions and the structure of the domino object.