
All necessary scripts to reproduce the Dual Simplex paper


artyomovlab/dualsimplex_paper


Non-negative matrix factorization and deconvolution as dual simplex problem

This repository is the official starting point for exploring the Dual Simplex NMF/deconvolution method. It contains code to reproduce the figures from the paper and also provides examples of how to use the DualSimplex package.

Non-negative matrix factorization and deconvolution as dual simplex problem
Denis Kleverov, Ekaterina Aladyeva, Alexey Serdyukov, Maxim Artyomov
bioRxiv 2024.04.09.588652; doi: https://doi.org/10.1101/2024.04.09.588652

Project structure

- data — all the external data, used in figures
- figures — notebooks for figures reproduction
- out — generated svgs and dualsimplex checkpoints will be placed here
- R — supporting code, imported in figures

Running

  1. Select a figure to reproduce.
  2. The setup.R script (executed at the beginning of each script) will install the DualSimplex package from GitHub.
  3. If you chose Figure 6 or Figure 7, download and unpack the contents of large.tar.gz into data/large.
  4. Go to the figures directory and open the corresponding notebook.
  5. Run cells in the notebook one by one. Optionally, tweak some parameters to see alternative outcomes.
  6. See resulting figures in the out directory.
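The unpacking in step 3 can be sketched in base R as follows. This is a minimal sketch, assuming that large.tar.gz has already been downloaded into the current working directory; the helper name unpack_large_data is hypothetical and is not part of this repository. The internal layout of the archive is not known here, so check the returned paths to confirm that the files ended up directly under data/large.

```r
# Hypothetical helper (not part of the repository): unpack an already
# downloaded archive into data/large using base R only.
unpack_large_data <- function(archive = "large.tar.gz",
                              dest = file.path("data", "large")) {
  if (!file.exists(archive)) {
    stop("Archive not found: ", archive)
  }
  # Create the target directory if it does not exist yet.
  dir.create(dest, recursive = TRUE, showWarnings = FALSE)
  utils::untar(archive, exdir = dest)
  # Return the extracted relative paths so the layout can be inspected.
  list.files(dest, recursive = TRUE)
}

# Usage, run from the repository root once the archive is in place:
# unpack_large_data()
```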

Figures in this repository

2. Sinkhorn procedure

Simple visualization of the Sinkhorn procedure applied to a factorizable matrix (2_sinkhorn_visualization.Rmd)
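For readers unfamiliar with the idea, a generic Sinkhorn-style scaling alternates row and column normalization until both sets of sums stabilize. The sketch below is an illustration in base R under stated assumptions, not the implementation used by the DualSimplex package or the notebook: the function name sinkhorn_scale, the iteration count, and the toy matrix sizes are all made up for this example, and the input is assumed to be strictly positive.

```r
# Illustrative sketch only: alternate row and column normalization of a
# strictly positive matrix. Not the DualSimplex implementation.
sinkhorn_scale <- function(V, n_iter = 100) {
  stopifnot(is.matrix(V), all(V > 0))
  # If every row sums to 1, the total mass is nrow(V), so every column
  # must sum to nrow(V) / ncol(V) for both constraints to hold at once.
  col_target <- nrow(V) / ncol(V)
  for (i in seq_len(n_iter)) {
    V <- V / rowSums(V)                             # rows sum to 1
    V <- sweep(V, 2, colSums(V) / col_target, "/")  # columns sum to col_target
  }
  V
}

# Toy factorizable matrix V = W %*% H with non-negative factors.
set.seed(1)
W <- matrix(runif(6 * 3), nrow = 6, ncol = 3)
H <- matrix(runif(3 * 4), nrow = 3, ncol = 4)
V <- W %*% H

S <- sinkhorn_scale(V)
range(rowSums(S))  # both values should be close to 1
range(colSums(S))  # both values should be close to nrow(S) / ncol(S)
```

Each step only multiplies V by a diagonal matrix on the left or right, so the scaled matrix keeps the rank of V and remains factorizable.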

3. Main algorithm

Deconvolution of a simulated bulk RNA-seq gene expression dataset with the main approach (3c_simulated_gene_expression_main_algorithm.Rmd)

4. Minimal formulation

Deconvolution of a simulated bulk RNA-seq gene expression dataset with the alternative approach (4d_simulated_gene_expression_alternative_approach.Rmd)

5. Picture unmixing with NMF

6. Single cell data

7. Complete deconvolution of bulk RNA-seq data

S3. NMF with simulated data matrices

S4. Different number of clusters for single cell data

Comparison of the clustering solutions for different methods (s4_single_cell_hnsc_different_k.Rmd)

S5. Further analysis for TCGA HNSC bulk RNA-seq dataset

Pathway analysis, signature gene expression heatmap, multiple initializations (s5_hnsc_further.Rmd)

S6. Signature-based deconvolution with the DualSimplex approach

Authors

Contributors' names and contact info

Troubleshooting

Dependency: package 'xxx' is not available (for R version x.y.z)

Install the package directly from its source archive on CRAN. For example:

install.packages("https://cran.r-project.org/src/contrib/RcppML_0.3.7.tar.gz", repos = NULL)

Can't plot UMAP with plot_projected on Mac

Unfortunately, the umap library has a bug (on macOS only) that prevents adding new points to a UMAP embedding after it has been computed, which is crucial for DualSimplex. If this affects you, call plot_projected(use_dims = 2:3), or pass other dimensions, to see the simplexes without dimensionality reduction.
