diff --git a/docs/404.html b/docs/404.html index a5dbb4d7c0..66b0034ba5 100644 --- a/docs/404.html +++ b/docs/404.html @@ -32,7 +32,7 @@
diff --git a/docs/LICENSE.html b/docs/LICENSE.html index 8fa22f8df5..6e4050b8ad 100644 --- a/docs/LICENSE.html +++ b/docs/LICENSE.html @@ -17,7 +17,7 @@ diff --git a/docs/articles/Cell_Bender_Functions.html b/docs/articles/Cell_Bender_Functions.html index 2fcc70c085..6eeabce5d4 100644 --- a/docs/articles/Cell_Bender_Functions.html +++ b/docs/articles/Cell_Bender_Functions.html @@ -33,7 +33,7 @@ @@ -130,7 +130,7 @@vignettes/Cell_Bender_Functions.Rmd
Cell_Bender_Functions.Rmd
Cell Bender is software for elimination of technical artifacts in scRNA-seq/snRNA-seq that uses deep generative model for unsupervised removal of ambient RNA and chimeric PCR artifacts. You can find out more info about Cell Bender from the bioRxiv Preprint, GitHub Repo, and Documentation.
-Following completion of Cell Bender scCustomize contains a couple of functions that may be helpful when creating and visualizing data in Seurat.
+CellBender is software for eliminating technical artifacts in scRNA-seq/snRNA-seq data; it uses a deep generative model for unsupervised removal of ambient RNA and chimeric PCR artifacts. You can find out more about CellBender from the bioRxiv Preprint, GitHub Repo, and Documentation.
+Following completion of CellBender, scCustomize contains a couple of functions that may be helpful when creating and visualizing data in Seurat.
library(tidyverse)
library(patchwork)
@@ -160,11 +160,21 @@ Cell Bender Functionality
Importing Cell Bender H5 Outputs
-The output from Cell Bender is an H5 file that is styled and can be read like 10X Genomics H5 file. However, Seurat::Read10X_h5()
assumes that the file contains no name prefix.
-However,Read10X_h5_Multi_Directory()
can be used when reading in Cell Bender files. and contains additional parameters related to Cell Bender imports.
+The output from CellBender is an H5 file that is styled and can be read like a 10X Genomics H5 file. In CellBender pre-v3 this file could be read by Seurat::Read10X_h5()
(although that function assumes that the file contains no name prefix). However, v3+ output contains extra information that causes Read10X_h5
to fail.
+scCustomize contains the function Read_CellBender_h5_Mat()
, which can be used to read CellBender h5 files regardless of the version of CellBender used.
+
+cell_bender_mat <- Read_CellBender_h5_Mat(file_name = "PATH/SampleA_out_filtered.h5")
+
scCustomize also contains two wrapper functions to easily read multiple CellBender files stored either in a single directory or in multiple sub-directories: Read_CellBender_h5_Multi_File
and Read_CellBender_h5_Multi_Directory
.
Read_CellBender_h5_Multi_Directory()
.
The following is a typical output directory structure for CellBender files, with each sub-directory labeled with the sample name and each file also prefixed with the sample name.
Parent_Directory
├── sample_01
@@ -180,31 +190,45 @@ Cell Bender Output Structure
-Read in H5 outputs
-
+
+Read Files
+
All we have to do is adjust the parameters to account for cell bender file names and directory structure.
-
-
secondary_path = ""
as the files are directly within the immediate sub-directory
+secondary_path
In this case we can leave as NULL
because samples are located in immediate subdirectory (secondary path must be the same for all samples).
-
-
default_10X_path = FALSE
because these are not 10X outputs.
--
-
h5_filename = "_out.h5"
specifies the shared aspect of file name that is not part of sample name. (Can also specify “_out_filtered.h5” depending on which file is desired).
+custom_name = "_out.h5"
This specifies the file suffix shared by all files.
-
-cell_bender_merged <- Read10X_h5_Multi_Directory(base_path = "assets/cell_bender/", secondary_path = "",
- default_10X_path = FALSE, h5_filename = "_out.h5", cell_bender = TRUE, merge = TRUE, sample_names = c("WT1",
- "WT2"), parallel = TRUE, num_cores = 2)
+Optional Parameters
+* parallel and num_cores: use multiple-core processing.
+* sample_list: By default Read_CellBender_h5_Multi_Directory will read in all sub-directories present in the parent directory. However, a subset can be specified by passing a vector of sample directory names.
+* sample_names: As with other functions, by default Read_CellBender_h5_Multi_Directory will use the sub-directory names within the parent directory to name the output list entries. Alternate names for the list entries can be provided here if desired. These names will also be used to add cell prefixes if merge = TRUE (see below).
+* merge: logical (default FALSE). Whether to combine all samples into a single sparse matrix, using sample_names to provide sample prefixes.
+
+cell_bender_merged <- Read_CellBender_h5_Multi_Directory(base_path = "assets/Cell_Bender_Example/",
+ custom_name = "_out.h5", sample_names = c("WT1", "WT2"), merge = TRUE)
+
Read_CellBender_h5_Multi_File()
.
-cell_bender_seurat <- CreateSeuratObject(counts = cell_bender_merged, names.field = 1, names.delim = "_")
If all output files are in a single directory you can use Read_CellBender_h5_Multi_File
to read in all of the files with a single function.
Parent_Directory
+├── CellBender_Outputs
+│ └── sample_01_out.h5
+│ └── sample_02_out.h5
+
+cell_bender_merged <- Read_CellBender_h5_Multi_File(data_dir = "assets/Cell_Bender_Example/", custom_name = "_out.h5",
+ sample_names = c("WT1", "WT2"))
Creating a Seurat object from merged CellBender matrices then is identical to creating any other Seurat object.
+
+cell_bender_seurat <- CreateSeuratObject(counts = cell_bender_merged, names.field = 1, names.delim = "_")
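The names.field and names.delim arguments above rely on each merged cell barcode carrying its sample name as a prefix. The following is a minimal base R sketch of that idea, not Seurat's internal code; the barcodes are hypothetical and assume the "sample_barcode" layout.

```r
# Hypothetical merged barcodes: sample prefix + "_" + original cell barcode.
barcodes <- c("WT1_AAACGCTCATGTTACG-1",
              "WT1_AAAGGTACACGCTTAA-1",
              "WT2_AAACGCTCATGTTACG-1")

# Split each barcode on the delimiter and keep the first field,
# mirroring names.delim = "_" and names.field = 1.
sample_id <- vapply(strsplit(barcodes, "_", fixed = TRUE),
                    function(parts) parts[1],
                    character(1))

# Count cells per inferred sample.
table(sample_id)
```

If sample names themselves contain the delimiter, the first field would no longer match the full sample name, so a different delimiter or field may be needed.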
Sometimes it can be helpful to create an object that contains both the Cell Ranger values and CellBender values (we’ll come to why below). scCustomize contains a helper function Create_CellBender_Merged_Seurat()
to handle object creation in one quick step.
-cell_bender_merged <- Read10X_h5_Multi_Directory(base_path = "assets/cell_bender/", secondary_path = "",
- default_10X_path = FALSE, h5_filename = "_out.h5", cell_bender = TRUE, merge = TRUE, sample_names = c("WT1",
- "WT2"), parallel = TRUE, num_cores = 2)
+
+cell_bender_merged <- Read_CellBender_h5_Multi_Directory(base_path = "assets/Cell_Bender_Cell_Ranger_Data/Cell_Bender_Example/",
+ custom_name = "_out.h5", sample_names = c("WT1", "WT2"), merge = TRUE)
-cell_ranger_merged <- Read10X_h5_Multi_Directory(base_path = "assets/cell_ranger/", default_10X_path = TRUE,
- h5_filename = "filtered_feature_bc_matrix.h5", cell_bender = F, merge = TRUE, sample_names = c("WT1",
+cell_ranger_merged <- Read10X_h5_Multi_Directory(base_path = "assets/Cell_Bender_Cell_Ranger_Data/Cell_Ranger_Example/",
+ default_10X_path = FALSE, h5_filename = "filtered_feature_bc_matrix.h5", merge = TRUE, sample_names = c("WT1",
"WT2"), parallel = TRUE, num_cores = 2)
To run the function the user simply needs to provide the names of the two matrices and a name for assay containing the Cell Ranger counts (by default this is named “RAW”).
-+dual_seurat <- Create_CellBender_Merged_Seurat(raw_cell_bender_matrix = cell_bender_merged, raw_counts_matrix = cell_ranger_merged, raw_assay_name = "RAW")
Users can specify any additional parameters normally passed to Seurat::CreateSeuratObject()
when using this function.
+dual_seurat <- Create_CellBender_Merged_Seurat(raw_cell_bender_matrix = cell_bender_merged, raw_counts_matrix = cell_ranger_merged, raw_assay_name = "RAW", min_cells = 5, min_features = 200)
scCustomize includes function Add_Cell_Bender_Diff()
to help with this process. This function will take the nCount and nFeature statistics from both assays in the object and calculate the difference and return 2 new columns (“nCount_Diff” and “nFeature_Diff”) to the object meta.data.
-astrocytes_cortex <- Add_Cell_Bender_Diff(seurat_object = astrocytes_cortex, raw_assay_name = "RAW",
- cell_bender_assay_name = "RNA")
+scCustomize includes function Add_CellBender_Diff()
to help with this process. This function will take the nCount and nFeature statistics from both assays in the object and calculate the difference and return 2 new columns (“nCount_Diff” and “nFeature_Diff”) to the object meta.data.
+
+dual_seurat <- Add_CellBender_Diff(seurat_object = dual_seurat, raw_assay_name = "RAW", cell_bender_assay_name = "RNA")
-head(astrocytes_cortex@meta.data, 5)
+head(dual_seurat@meta.data, 5)
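To make the two new meta.data columns concrete, here is a rough base R sketch of the kind of calculation described above. It is not the scCustomize implementation: the toy matrices are made up, and the direction of the difference (raw minus CellBender) is an assumption.

```r
# Toy count matrices (genes x cells); all values are invented for illustration.
cell_names <- c("WT1_cell1", "WT1_cell2", "WT2_cell1")
gene_names <- c("GeneA", "GeneB", "GeneC")
raw_counts <- matrix(c(5, 0, 3, 2, 1, 0, 4, 4, 0), nrow = 3,
                     dimnames = list(gene_names, cell_names))
cb_counts <- matrix(c(4, 0, 3, 1, 0, 0, 4, 2, 0), nrow = 3,
                    dimnames = list(gene_names, cell_names))

# nCount = total UMIs per cell; nFeature = number of detected genes per cell.
meta <- data.frame(
  nCount_RAW = colSums(raw_counts),
  nFeature_RAW = colSums(raw_counts > 0),
  nCount_RNA = colSums(cb_counts),
  nFeature_RNA = colSums(cb_counts > 0)
)

# Assumed definition: difference = raw value minus CellBender value.
meta$nCount_Diff <- meta$nCount_RAW - meta$nCount_RNA
meta$nFeature_Diff <- meta$nFeature_RAW - meta$nFeature_RNA

meta
```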
We can then use Median_Stats()
to calculate per sample averages across all cells by supplying the new variables to the median_var
parameter.
+median_stats <- Median_Stats(seurat_object = dual_seurat, group_by_var = "orig.ident", median_var = c("nCount_Diff",
+ "nFeature_Diff"))
+orig.ident | Median_nCount_RNA | Median_nFeature_RNA | Median_nCount_Diff | Median_nFeature_Diff
+---|---|---|---|---
+WT1 | 7724.5 | 2936 | 216 | 73
+WT2 | 7404.5 | 2803 | 212 | 61
+Totals (All Cells) | 7565.0 | 2869 | 214 | 67
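As a rough illustration of what a per-sample median summary involves, the base R sketch below groups toy meta.data by orig.ident and takes the median of each difference column. It is only a sketch under assumed inputs, not the Median_Stats() implementation, and the numbers are invented.

```r
# Toy meta.data; values are made up for illustration only.
meta <- data.frame(
  orig.ident = c("WT1", "WT1", "WT1", "WT2", "WT2"),
  nCount_Diff = c(200, 220, 240, 210, 214),
  nFeature_Diff = c(70, 73, 80, 60, 62)
)

# Median of each difference column within each sample.
medians <- aggregate(cbind(nCount_Diff, nFeature_Diff) ~ orig.ident,
                     data = meta, FUN = median)

# Median across all cells, analogous to a "Totals" row.
totals <- c(nCount_Diff = median(meta$nCount_Diff),
            nFeature_Diff = median(meta$nFeature_Diff))

medians
totals
```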
We can then use Median_Stats()
to calculate per sample averages across all cells by supplying the new variables to the median_var
parameter.
-median_stats <- Median_Stats(seurat_object = astrocytes_cortex, group_by_var = "orig.ident", median_var = c("nCount_Diff",
- "nFeature_Diff"))
It can also be helpful to understand what features may have changed the most. scCustomize provides the function CellBender_Feature_Diff
to determine changes in features. This will return a data.frame with rowSums for each feature, the difference in each feature, and the percent change of each feature.
+feature_diff <- CellBender_Feature_Diff(seurat_object = dual_seurat, raw_assay = "RAW", cell_bender_assay = "RNA")
+Feature | Raw_Counts | CellBender_Counts | Count_Diff | Pct_Diff
+---|---|---|---|---
+Gm42418 | 256834 | 3198 | 253636 | 98.75484
+Uqcr11 | 1315 | 20 | 1295 | 98.47909
+Gm48099 | 277 | 5 | 272 | 98.19495
+Tmsb4x | 6010 | 116 | 5894 | 98.06988
+Rps21 | 2597 | 61 | 2536 | 97.65114
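The columns in this output appear to be related by simple arithmetic. The sketch below reproduces Count_Diff and Pct_Diff for three of the features shown, assuming Count_Diff is raw minus CellBender counts and Pct_Diff is that difference as a percentage of the raw counts; these formulas are inferred from the values shown rather than taken from the function's source.

```r
# Summed counts for three features, copied from the output above.
feature_diff <- data.frame(
  Raw_Counts = c(256834, 1315, 277),
  CellBender_Counts = c(3198, 20, 5),
  row.names = c("Gm42418", "Uqcr11", "Gm48099")
)

# Assumed definitions, consistent with the values shown above.
feature_diff$Count_Diff <- feature_diff$Raw_Counts - feature_diff$CellBender_Counts
feature_diff$Pct_Diff <- 100 * feature_diff$Count_Diff / feature_diff$Raw_Counts

feature_diff
```

For a full dataset, Raw_Counts and CellBender_Counts would come from row sums of each assay's count matrix, as described above.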
In addition to returning the data.frame it can be useful to visually examine the changes/trends after running CellBender. The function CellBender_Diff_Plot
takes the data.frame from CellBender_Feature_Diff
as input and plots the results.
Optional Parameters
+* pct_diff_threshold: plot genes that exhibit a change equal to or greater than this threshold.
+* num_features: instead of plotting genes above a threshold, simply plot the top X changed genes.
+* num_labels: change how many genes are labeled.
+* label: logical, whether or not to label features.
+* custom_labels: specify a vector of specific features to label.
+p1 <- CellBender_Diff_Plot(feature_diff_df = feature_diff)
+p2 <- CellBender_Diff_Plot(feature_diff_df = feature_diff, pct_diff_threshold = 50)
+p3 <- CellBender_Diff_Plot(feature_diff_df = feature_diff, num_features = 500, pct_diff_threshold = NULL)
+p4 <- CellBender_Diff_Plot(feature_diff_df = feature_diff, num_labels = 10)
+p5 <- CellBender_Diff_Plot(feature_diff_df = feature_diff, label = FALSE)
+p6 <- CellBender_Diff_Plot(feature_diff_df = feature_diff, custom_labels = "Gm48099")
+
+wrap_plots(p1, p2, p3, p4, p5, p6, ncol = 2)
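The difference between selecting features by pct_diff_threshold and by num_features can be sketched in base R as follows. This is only an illustration of the two selection styles on a toy data.frame, not the plotting function's internals; GeneX and GeneY are hypothetical features.

```r
# Toy feature summary; GeneX and GeneY are made-up features.
feature_diff <- data.frame(
  Raw_Counts = c(256834, 1315, 500, 900),
  CellBender_Counts = c(3198, 20, 400, 890),
  row.names = c("Gm42418", "Uqcr11", "GeneX", "GeneY")
)
feature_diff$Count_Diff <- feature_diff$Raw_Counts - feature_diff$CellBender_Counts
feature_diff$Pct_Diff <- 100 * feature_diff$Count_Diff / feature_diff$Raw_Counts

# Threshold-style selection: keep features changed by at least 50 percent.
above_threshold <- feature_diff[feature_diff$Pct_Diff >= 50, ]

# Top-N selection: keep the 3 most changed features regardless of threshold.
ordered_diff <- feature_diff[order(feature_diff$Pct_Diff, decreasing = TRUE), ]
top_features <- head(ordered_diff, n = 3)

rownames(above_threshold)
rownames(top_features)
```

A threshold returns however many features pass it, while a top-N selection always returns a fixed number, which can include features with small changes.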
First let’s plot gene that represents ambient RNA as it’s restricted in expression to neurons (synaptic gene). If Cell Bender has worked well we expect that expression of this gene will be very different between the two assays.
+First let’s plot a gene that represents ambient RNA, as its expression is restricted to neurons (a synaptic gene). If CellBender has worked well, we expect that expression of this gene will be very different between the two assays.
Now let’s plot normally astrocyte restricted gene. If Cell Bender has worked well we expect that expression of this gene shouldn’t be very different between the two assays.
+Now let’s plot a gene that is normally restricted to astrocytes. If CellBender has worked well, we expect that expression of this gene shouldn’t be very different between the two assays.
vignettes/Color_Palettes.Rmd
Color_Palettes.Rmd
vignettes/FAQ.Rmd
FAQ.Rmd
vignettes/Gene_Expression_Plotting.Rmd
Gene_Expression_Plotting.Rmd
vignettes/Helpers_and_Utilities.Rmd
Helpers_and_Utilities.Rmd
vignettes/Installation.Rmd
Installation.Rmd
vignettes/Iterative_Plotting.Rmd
Iterative_Plotting.Rmd
vignettes/LIGER_Functions.Rmd
LIGER_Functions.Rmd
vignettes/Markers_and_Cluster_Annotation.Rmd
Markers_and_Cluster_Annotation.Rmd
vignettes/Misc_Functions.Rmd
Misc_Functions.Rmd
vignettes/QC_Plots.Rmd
QC_Plots.Rmd
vignettes/Read_and_Write_Functions.Rmd
Read_and_Write_Functions.Rmd
vignettes/Sequencing_QC_Plots.Rmd
Sequencing_QC_Plots.Rmd
vignettes/Statistics.Rmd
Statistics.Rmd
NEWS.md
+ CellBender_Feature_Diff to return data.frame with count sums and differences between raw and CellBender assays.
+ CellBender_Diff_Plot to plot differences between raw and CellBender assays using data from CellBender_Feature_Diff.
R/Object_Utilities.R
+ Add_CellBender_Diff.Rd
Calculate the difference in features and UMIs per cell when both cell bender and raw assays are present.
+Add_CellBender_Diff(seurat_object, raw_assay_name, cell_bender_assay_name)
seurat_object: object name.
raw_assay_name: name of the assay containing the raw data.
cell_bender_assay_name: name of the assay containing the CellBender-corrected data.
Value: Seurat object with 2 new columns in the meta.data slot.
+if (FALSE) {
+object <- Add_CellBender_Diff(seurat_object = obj, raw_assay_name = "RAW", cell_bender_assay_name = "RNA")
+}
+
+
R/Statistics_Plotting.R
+ CellBender_Diff_Plot.Rd
Plot the differences between raw and CellBender counts per feature, using the data.frame returned by CellBender_Feature_Diff.
+CellBender_Diff_Plot(
+ feature_diff_df,
+ pct_diff_threshold = 25,
+ num_features = NULL,
+ label = TRUE,
+ num_labels = 20,
+ repel = TRUE,
+ custom_labels = NULL,
+ plot_line = TRUE,
+ plot_title = "Raw Counts vs. Cell Bender Counts",
+ x_axis_label = "Raw Data Counts",
+ y_axis_label = "Cell Bender Counts",
+ xnudge = 0,
+ ynudge = 0,
+ max.overlaps = 100,
+ label_color = "dodgerblue",
+ fontface = "bold",
+ label_size = 3.88,
+ bg.color = "white",
+ bg.r = 0.15,
+ ...
+)
feature_diff_df: name of data.frame created using CellBender_Feature_Diff.
label: logical, whether or not to label the features that have the largest percent difference between raw and CellBender counts (Default is TRUE).
num_labels: number of features to label if label = TRUE (Default is 20).
repel: logical, whether to use geom_text_repel to create nicely-repelled labels; this is slow when a lot of points are being plotted. If using repel, set xnudge and ynudge to 0 (Default is TRUE).
custom_labels: a custom set of features to label instead of the features most different between raw and CellBender counts.
plot_line: logical, whether to plot a diagonal line with slope = 1 (Default is TRUE).
plot_title: plot title.
x_axis_label: label for x axis.
y_axis_label: label for y axis.
xnudge: amount to nudge X and Y coordinates of labels by.
ynudge: amount to nudge X and Y coordinates of labels by.
max.overlaps: passed to geom_text_repel; exclude text labels that overlap too many things. Defaults to 100.
label_color: color to use for text labels.
fontface: font face to use for text labels (“plain”, “bold”, “italic”, “bold.italic”) (Default is "bold").
label_size: text size for feature labels (passed to geom_text_repel).
bg.color: color to use for shadow/outline of text labels (passed to geom_text_repel) (Default is white).
bg.r: radius to use for shadow/outline of text labels (passed to geom_text_repel) (Default is 0.15).
...: extra parameters passed to geom_text_repel through LabelPoints.
Value: a ggplot object.
+if (FALSE) {
+# get cell bender differences data.frame
+cb_stats <- CellBender_Feature_Diff(seurat_object = obj, raw_assay = "RAW",
+cell_bender_assay = "RNA")
+
+# plot
+CellBender_Diff_Plot(feature_diff_df = cb_stats, pct_diff_threshold = 25)
+}
+
+
Get quick values for raw counts, CellBender counts, count differences, and percent count differences per feature.
+CellBender_Feature_Diff(seurat_object, raw_assay, cell_bender_assay)
seurat_object: Seurat object name.
raw_assay: name of the assay containing the raw count data.
cell_bender_assay: name of the assay containing the CellBender count data.
Value: a data frame containing summed raw counts, CellBender counts, count difference, and percent difference in counts.
+if (FALSE) {
+cb_stats <- CellBender_Feature_Diff(seurat_object = obj, raw_assay = "RAW",
+cell_bender_assay = "RNA")
+}
+
+
Calculate and add differences post-cell bender analysis
Functions for quick return of different object and data metrics.
CellBender Feature Differences
Calculate Cluster Stats
Calculate percent of expressing cells
Plot Number of Cells/Nuclei per Sample