ClonoCluster_paper

Intro

Raw data and analysis scripts for the ClonoCluster paper. Includes additional figures and analyses that were not present in the publication.

For the actual distributed open source software, including worked examples, check out this repository.

Dependencies

In addition to installing ClonoCluster, this workflow requires the following R packages from CRAN:

magrittr
data.table
ggplot2
entropy
WebGestaltR
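The packages above can be installed in one step from the command line. The following is a minimal sketch, assuming Rscript is on your PATH and that all five packages are still available from CRAN; the helper name r_install_expr and the choice of the cloud.r-project.org mirror are illustrative assumptions, not part of this repository.

```shell
# Packages named in the Dependencies list above.
PKGS="magrittr data.table ggplot2 entropy WebGestaltR"

# Hypothetical helper: build an R expression that installs only the
# packages that are not already present in the local library.
r_install_expr() {
  quoted=""
  for p in "$@"; do
    # Append each name as a quoted, comma-separated R string.
    quoted="${quoted:+$quoted, }\"$p\""
  done
  printf 'pk <- c(%s); todo <- setdiff(pk, rownames(installed.packages())); if (length(todo)) install.packages(todo, repos = "https://cloud.r-project.org")\n' "$quoted"
}

# Usage (needs Rscript on PATH); $PKGS is left unquoted on purpose
# so that it splits into one argument per package:
#   Rscript -e "$(r_install_expr $PKGS)"
```

Installing only the missing packages keeps the command safe to rerun. ClonoCluster itself is not covered here and should be installed as described in its own repository.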

Get the raw data

Due to size limitations, the raw data for this study cannot be directly hosted on GitHub. I have compressed it and uploaded it with the release of this package. You will need to download the archive (click here) and install pixz to decompress it.

pixz -d ClonoCluster_raw_data.tpxz

Then untar the package.

tar -xvf ClonoCluster_raw_data.tar

Now copy the data folder into the repository.

cp -r Data_genes ./ClonoCluster_paper/Data/
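The three steps above can be combined into a small script. This is a sketch under stated assumptions: the downloaded archive sits in the current directory, the repository is cloned at ./ClonoCluster_paper, and the archive extracts a Data_genes/ folder as described above. The function names unpack_raw_data and install_data_genes are hypothetical.

```shell
set -eu

# Decompress and untar the release archive (requires pixz on PATH).
unpack_raw_data() {
  archive="$1"                          # e.g. ClonoCluster_raw_data.tpxz
  command -v pixz >/dev/null || { echo "pixz not found" >&2; return 1; }
  pixz -d "$archive"                    # writes ClonoCluster_raw_data.tar
  tar -xvf "${archive%.tpxz}.tar"       # extracts Data_genes/
}

# Copy the extracted Data_genes/ folder into <repo>/Data/.
install_data_genes() {
  src="${1%/}"                          # drop a trailing slash so the folder itself is copied
  repo="$2"
  [ -d "$src" ] || { echo "missing directory: $src" >&2; return 1; }
  mkdir -p "$repo/Data"
  cp -r "$src" "$repo/Data/"
}

# Usage:
#   unpack_raw_data ClonoCluster_raw_data.tpxz
#   install_data_genes Data_genes ./ClonoCluster_paper
```

Stripping the trailing slash matters because some cp implementations copy only the contents of a directory named with a trailing slash, which would scatter the matrices directly into Data/.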

Datasets

The raw data contain 9 replicates from six sources. Their short names are as follows:

YG1-3: WM989 melanoma cells treated with low and high doses of vemurafenib (one low dose and two high dose), link.
Kleind2ws: murine hematopoietic stem cells differentiating in vitro harvested at day 2, link.
CJ: human induced pluripotent stem cell line directed toward cardiomyocyte fate, link.
WM9891-2: WM983b melanoma cells treated with vemurafenib (2 replicates), link.
MDA1-2: MDA breast cancer cells treated with paclitaxel, link.

Manifest

Data/ - raw data folder
- Data_barcodes/ - barcode assignments for each dataset, raw data
- Data_genes/ - normalized/scaled count matrices for each dataset in gene by cell format, raw data, included as a compressed file with the GitHub release of this package
Paper/ - Scripts to process raw data and generate figures
- Run_full_analysis.R - script that runs the complete analysis, generating all processed data and figures by running the scripts in the extractionScripts/ and plotScripts/ directories. It is fully annotated and may be run line by line or sourced from the package root directory.
```
setwd("ClonoCluster_paper/")

source("Run_full_analysis.R")
```
- extractionScripts/ - scripts to generate processed data
  1. Constants.R - Script with paths and variables for all analyses
  2. Long_alluvia.R - Generate cluster assignments for a range of alphas, and plot long Sankeys.
  3. Marker_analysis.R - Identify cluster markers for hybrid clusters, reorganization markers, and perform gene set overrepresentation analysis
- extractedData/ - processed data, generated from the raw data by the scripts in extractionScripts/
  1. *_cluster_assignments.txt - hybrid clustering assignments, generated by Long_alluvia.R
  2. *_auc_linclust.txt - ROC-derived AUC values for all possible cluster markers for each clustering level, from Marker_analysis.R.
  3. *_auc_barcodes.txt - AUC to identify reorganization markers between a paired transcriptome cluster and a hybrid cluster, from Marker_analysis.R.
  4. *_rearrangment_ORA.txt - Output from WebGestaltR overrepresentation analysis of reorganization markers, from Marker_analysis.R.
- plotScripts/ - scripts to generate figures from the processed data; output goes to subfolders of plots/.
  1. cluster_size_plots.R - generate line graphs of alpha vs number of clusters for all samples, as in Figure 1D and Figure S3.
  2. Confusion_stats.R - generate boxplots of alpha vs Cohen's kappa, as in Figure S2B.
  3. Fig4_clusters.R - generate warped UMAPs found in Figure 4B-C.
  4. Fig4.R - generate warp factor UMAPs and PC distributions from simulated data as found in Figure 4A and S6.
  5. geneset_hm.R - generate heatmap from overrepresentation analysis results, as in Figure 3C.
  6. grid_graph.R - generate simulated representation of network graphs, Figure S1B.
  7. Heme_comp.R - Plots of entropy of hematopoietic cell types from Weinreb et al., Figure S7.
  8. Labeled_umaps.R - generate UMAPs and Sankey plots showing combined warp factor and hybrid clustering in Figure 5.
  9. Marker_box_and_venn.R - generate boxplots of top cluster marker strength for all samples as in Figure 2A and Figure S5B, as well as Venn diagrams for top cluster marker overlap as in Figure S4.
  10. Marker_sankey.R - Sankeys of marker positivity, as in Figure 2B-D.
  11. Marker_turnover_hm.R - Heatmaps of top cluster marker AUC at each alpha level, as in Figure S5A.
  12. Model_edge_weights.R - generate plots of network graph edge weight vs alpha, Figure S1A.
  13. Reorg_sankey.R - Sankey plots showing schematic of how reorganization marker AUCs are determined, as in Figure 3B.
  14. Short_alluvia.R - Sankey plots of just 3 clustering levels, transcriptome, high and low alpha, as in Figure 1E.
- plots/ - Output from plot scripts, divided into subfolders.
  1. AUC_sankey/ - Marker_sankey.R output, Sankeys of marker positivity across alpha levels
  2. Reorg_sankey/ - Reorg_sankey.R output, representative sankeys of reorganization analysis
  3. cluster_size_plots/ - cluster_size_plots.R output, plots of cluster number vs alpha level for all samples
  4. Confusion/ - Confusion_stats.R output, Cohen's kappa vs alpha level plots for all samples
  5. entropy/ - Heme_comp.R output, single plot of entropy for each cell type with different clustering methods
  6. Gene_umaps/ - Labeled_umaps.R output, UMAPs showing high alpha cluster of interest and contributing transcriptome clusters
  7. Long_alluvia/ - Long_alluvia.R output, Sankey for all samples for all alpha iterations
  8. Short_alluvia/ - Short_alluvia.R output, Sankey for all samples for Transcriptome, high alpha, and low alpha level
  9. top_markers_box/ - Marker_box_and_venn.R output, boxplots of overall cluster marker AUC
  10. Turnover_hm/ - Marker_turnover_hm.R output, heatmaps of the AUC of the union of top cluster markers at all alpha levels
  11. Venn/ - Marker_box_and_venn.R output, Venn diagrams of top cluster markers for all samples.
  12. warped/ - Fig4_clusters.R output, demonstrations of UMAP warp factor on the datasets.
  13. reorg_hm/ - geneset_hm.R output, heatmaps of gene set overrepresentation analysis results
  14. simulations/ - grid_graph.R and Fig4.R output, simulated data results
- finalFigures/ - pdfs of figures for the final manuscript
  1. F1.pdf - C from Long_alluvia.R, D from cluster_size_plots.R, and E from Short_alluvia.R output
  2. F2.pdf - A from Marker_box_and_venn.R, B-D from Marker_sankey.R output
  3. F3.pdf - B from Reorg_sankey.R, C from geneset_hm.R output
  4. F4.pdf - A from Fig4.R, B-C from Fig4_clusters.R output
  5. F5.pdf - from Labeled_umaps.R output
  6. Figure_S1.pdf - A from Model_edge_weights.R and B from grid_graph.R output
  7. Figure_S2.pdf - A from Long_alluvia.R and B from Confusion_stats.R output
  8. Figure_S3.pdf - from cluster_size_plots.R output
  9. Figure_S4.pdf - from Marker_box_and_venn.R output
  10. Figure_S5.pdf - A from Marker_turnover_hm.R, B from Marker_box_and_venn.R output
  11. Figure_S6.pdf - from Fig4.R output
  12. Figure_S7.pdf - from Heme_comp.R output
  13. Dataset_table.pdf - Pdf of Table 1, datasets used in the study

Walkthrough

The entire analysis may be rerun from the package directory with source("Run_full_analysis.R"). You may also run portions of the analysis following the walkthrough below:

Set working directory.

setwd("ClonoCluster_paper")

Establish needed variables with sample names and alpha analysis values.

source("Paper/extractionScripts/Constants.R")

Get clusters across a range of alphas from 0 to 1 and save the intermediate output in the extracted data folder. Plot long Sankeys as in Figure 1C and Figure S2A.

source("Paper/extractionScripts/Long_alluvia.R")

Perform marker identification for cluster and reorganization markers, as well as gene set overrepresentation analysis, and save the results to the extracted data folder.

source("Paper/extractionScripts/Marker_analysis.R")
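Before moving on to the plot scripts, it can help to confirm that the extraction scripts wrote their outputs. The check below is a sketch that assumes the processed files land in Paper/extractedData/ (inferred from the manifest layout, not confirmed) and uses the four file name patterns listed in the manifest, with their spelling kept as given there. The function name check_extracted is hypothetical.

```shell
# Report, for each expected output pattern, whether at least one file matches.
# Returns non-zero if any pattern has no matching file.
check_extracted() {
  dir="$1"
  rc=0
  for suffix in cluster_assignments auc_linclust auc_barcodes rearrangment_ORA; do
    # ls exits non-zero when the glob matches nothing.
    if ls "$dir"/*_"$suffix".txt >/dev/null 2>&1; then
      echo "ok: *_$suffix.txt"
    else
      echo "missing: *_$suffix.txt"
      rc=1
    fi
  done
  return "$rc"
}

# Usage, from the ClonoCluster_paper directory:
#   check_extracted Paper/extractedData
```

The check only tests that files exist for each pattern, so it will not catch a run that produced output for some samples but not others.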

Generate plots of cluster sizes as in Figure 1D and Figure S3.

source("Paper/plotScripts/cluster_size_plots.R")

Generate Cohen's kappa plots as in Figure S2B.

source("Paper/plotScripts/Confusion_stats.R")

Generate short Sankeys as in Figure 1E.

source("Paper/plotScripts/Short_alluvia.R")

Generate top cluster marker fidelity boxplots (Figure 2A and Figure S5B) and Venn diagrams of all marker overlap (Figure S4).

source("Paper/plotScripts/Marker_box_and_venn.R")

Generate heatmaps of marker fidelity for top cluster markers as in Figure S5A.

source("Paper/plotScripts/Marker_turnover_hm.R")

Generate alluvia of interesting markers as in Figure 2B-D.

source("Paper/plotScripts/Marker_sankey.R")

Generate sample Sankeys for the reorganization analysis (Figure 3B).

source("Paper/plotScripts/Reorg_sankey.R")

Generate reorganization marker overrepresentation analysis heat maps (Figure 3C).

source("Paper/plotScripts/geneset_hm.R")

Generate warped UMAPs from simulated data and real data (Figure 4).

source("Paper/plotScripts/Fig4.R")
source("Paper/plotScripts/Fig4_clusters.R")

Generate UMAP and Sankeys as in Figure 5, showing the combined hybrid clustering and warp factor.

source("Paper/plotScripts/Labeled_umaps.R")

Generate supplemental analyses showing how ClonoCluster works (Figure S1).

source("Paper/plotScripts/Model_edge_weights.R") # show curves for how model influences edge weight at beta = 0.1
source("Paper/plotScripts/grid_graph.R") # simulation of network graphs with alpha

Generate the entropy analysis for the hematopoietic sample (Figure S7).

source("Paper/plotScripts/Heme_comp.R")

Citation

bioRxiv link

Contact

Open an issue on this GitHub repository, contact @leeprichman or @arjunrajlaboratory, or email me or Dr. Arjun Raj.
