This repository is the official starting point for exploring the Dual Simplex NMF/deconvolution method. It contains code to reproduce the figures from the paper and also provides examples of how to use the DualSimplex package.
Non-negative matrix factorization and deconvolution as dual simplex problem
Denis Kleverov, Ekaterina Aladyeva, Alexey Serdyukov, Maxim Artyomov
bioRxiv 2024.04.09.588652; doi: https://doi.org/10.1101/2024.04.09.588652
- data — all the external data, used in figures
- figures — notebooks for figures reproduction
- out — generated svgs and dualsimplex checkpoints will be placed here
- R — supporting code, imported in figures
- Select a figure to reproduce.
- The script setup.R (executed at the beginning of each script) will install the DualSimplex package from GitHub.
- If you chose Figure 6 or Figure 7, download and unpack the contents of large.tar.gz into data/large.
- Go to the figures directory and open the corresponding notebook.
- Run cells in the notebook one by one. Optionally, tweak some parameters to see alternative outcomes.
- See resulting figures in the out directory.
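The unpacking step for Figure 6 and Figure 7 could be sketched in a shell as follows. This is a minimal sketch, not part of the repository: the helper name unpack_large is hypothetical, and it assumes large.tar.gz has already been downloaded to the repository root and is a gzip-compressed tar archive. If the archive already contains a top-level directory, the destination may need adjusting.

```shell
# Hypothetical helper (not part of the repository): unpack an archive
# into a target directory, creating the directory if needed.
# Defaults assume large.tar.gz sits in the repository root.
unpack_large() {
  archive="${1:-large.tar.gz}"
  dest="${2:-data/large}"
  if [ ! -f "$archive" ]; then
    echo "archive not found: $archive" >&2
    return 1
  fi
  # -x extract, -z gunzip, -f archive file, -C change to destination first
  mkdir -p "$dest" && tar -xzf "$archive" -C "$dest"
}

# Usage, from the repository root:
#   unpack_large large.tar.gz data/large
```

Checking for the archive first makes a missing or misplaced download fail with a clear message instead of a tar error.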
Simple visualization of the Sinkhorn procedure applied to a factorizable matrix (2_sinkhorn_visualization.Rmd)
Deconvolution of a simulated bulk RNA-seq gene expression dataset with the main approach (3c_simulated_gene_expression_main_algorithm.Rmd)
Deconvolution of a simulated bulk RNA-seq gene expression dataset with the alternative approach (4d_simulated_gene_expression_alternative_approach.Rmd)
- Simple synthetic NMF case just for testing (5_1_nmf_setup_ensure_working_with_simulated_data.Rmd)
- Picture unmixing
- Generate the data and solve the problem with different methods (5_2_nmf_noizy_pictures.Rmd)
- Visualize results (5_3_make_result_nmf_figure_plots.Rmd)
- Method applied to single-cell clustering (6_single_cell_hnsc.Rmd)
- Brain/Liver/Lung mixtures GSE19830 (7a_gse19830.Rmd)
- 4 immune cell types GSE11058 (7b_gse11058_7b.Rmd)
- TCGA HNSC dataset (7c_bulk_hnsc_preprocessing.Rmd), (7de_bulk_hnsc_deconvolution.Rmd), (7f_bulk_hnsc_single_cell_validation.Rmd), (7g_bulk_hnsc_clinical_correlations.Rmd)
- Factorize a simulated dataset with multiple methods (s3c_1_nmf_simulated_data.Rmd)
- Visualize results (s3c_1_nmf_simulated_data.Rmd)
Comparison of the clustering solutions for different methods (s4_single_cell_hnsc_different_k.Rmd)
Pathway analysis, signature gene expression heatmap, multiple initializations (s5_hnsc_further.Rmd)
- Generate data (s6_1_signature_based_deconvolution_make_mixed_data.Rmd)
- Brain/Liver/Lung mixtures GSE19830 (s6_2_signature_based_deconvolution_GSE11058.Rmd)
- Simulated brain dataset (s6_3_signature_based_deconvolution_brain.Rmd)
Contributors' names and contact info
- Denis Kleverov (@denis_kleverov) (LinkedIn)
- Ekaterina Aladyeva (AladyevaE)
- Alexey Serdyukov (email)
- prof. Maxim Artyomov (@maxim_artyomov) (email)
Dependency: package 'xxx' is not available (for R version x.y.z)
Install the package directly from its source link on CRAN. For example:
install.packages("https://cran.r-project.org/src/contrib/RcppML_0.3.7.tar.gz", repos = NULL)
Can't plot UMAP with plot_projected on Mac
Unfortunately, the umap library has a bug (on macOS only) that prevents adding
new points to a UMAP embedding after it has been computed, which is crucial for DualSimplex.
If this affects you, call plot_projected(use_dims = 2:3), or use other dimensions,
to see the simplexes without dimensionality reduction.