The multiplier/ folder contains the same data generated by the MultiPLIER project, but separated by model matrices so that they can be downloaded more easily.
If you use any of these MultiPLIER files, please cite the corresponding paper.
All the files were downloaded from this figshare archive, and later processed by the notebooks in this folder.
Similarly, we used PhenomeXcan results (input/phenomexcan folder).
If you use any of those PhenomeXcan files, please cite the corresponding paper.
- input/: contains data downloaded from other projects (PhenomeXcan and MultiPLIER), usually with minor changes (such as filtering out genes that contain NaN) or no changes at all; the code that made those changes can be found in this folder. Data is provided in both binary and text formats. We always used the binary formats for the analyses described here. Text formats are provided for convenience, but they always imply some loss of information because the number of decimal places is reduced. The files are:
  - phenomexcan/smultixcan-mashr-zscores.*: one version of matrix M shown in Figure 1 of the manuscript (genes x traits). These are the original PhenomeXcan S-MultiXcan results, converted from p-values to z-scores. See Methods in the manuscript for more details.
  - phenomexcan/smultixcan-efo_partial-mashr-zscores.*: a modified version of matrix M (above) where traits were mapped to EFO (Experimental Factor Ontology), combined and standardized (combining very similar traits and adjusting for highly polygenic ones). It was used for the clustering of traits. See Methods in the manuscript for more details, and the associated Jupyter notebooks in nbs/01_preprocessing/.
  - multiplier/multiplier_model_z.*: matrix Z from the MultiPLIER model files (genes x latent variables). A more end-user-friendly version of this file can be found in input/multiplier/lv-gene_weights.xlsx, which contains one sheet per LV (987 in total) with the list of genes that belong to each one, sorted by their weights.
  - multiplier/multiplier_model_b.*: matrix B from the MultiPLIER model files (latent variables x samples/conditions).
  - multiplier/multiplier_model_summary.*: contains information about the pathways associated with each LV. A more end-user-friendly version of this file can be found in input/multiplier/lv-pathways.xlsx, where data is sorted by LV identifier and then by FDR.
  - multiplier/*: for the rest of the MultiPLIER files, you can check out the MultiPLIER paper to get an idea of the information they provide.
  - lincs/lincs-data.*: the drug-related version of matrix M shown in Figure 1 of the manuscript. It is the LINCS L1000 data downloaded from here (specifically, file consensi-drugbank-tsv.bz2), where gene Entrez IDs were mapped to Ensembl IDs.
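The p-value to z-score conversion mentioned for the S-MultiXcan files can be sketched as follows. This is a minimal illustration that assumes a standard two-sided conversion producing unsigned z-scores; the function name `pvalue_to_zscore` is hypothetical, and the Methods section of the manuscript remains the reference for the exact procedure used to build the files.

```python
from statistics import NormalDist


def pvalue_to_zscore(pvalue):
    """Convert a two-sided p-value into an unsigned z-score.

    Assumption: a standard two-sided conversion; the exact procedure
    used to build the files is described in the manuscript's Methods.
    """
    if not 0.0 < pvalue <= 1.0:
        raise ValueError("p-value must be in the interval (0, 1]")
    # Work on the lower tail (p / 2) instead of 1 - p / 2 so that very
    # small p-values do not round to 1.0 and lose precision.
    return abs(NormalDist().inv_cdf(pvalue / 2.0))


# Hypothetical p-values, for illustration only
for p in (1.0, 0.05, 1e-10):
    print(p, round(pvalue_to_zscore(p), 3))
```

Smaller p-values map to larger z-scores, and a p-value of 1 maps to a z-score of 0.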
- projections/: contains data projected into the MultiPLIER latent space, such as gene-trait associations (S-MultiXcan or S-PrediXcan) and drug-induced transcriptional profiles (LINCS L1000). These projections are different versions of matrix M hat in the manuscript.
  - projection-smultixcan-mashr-zscores.*: projection of file input/phenomexcan/smultixcan-mashr-zscores.*.
  - projection-smultixcan-efo_partial-mashr-zscores.*: projection of file input/phenomexcan/smultixcan-efo_partial-mashr-zscores.*.
  - projection-lincs.*: projection of file input/lincs-data.pkl.
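To give an idea of what projecting a matrix M (genes x traits) into a latent space defined by gene loadings Z (genes x latent variables) involves, here is a minimal sketch. It assumes a ridge-style projection, M hat = (ZᵀZ + λI)⁻¹ZᵀM, which is how PLIER-type models are commonly applied to new data. The function name, the value of `ridge_lambda`, and the random toy matrices are all hypothetical, and the manuscript is the reference for how the files in this folder were actually generated.

```python
import numpy as np


def project_into_latent_space(z, m, ridge_lambda):
    """Project m (genes x traits) using loadings z (genes x LVs).

    Returns a matrix of shape (LVs x traits). Assumption: ridge-style
    projection; ridge_lambda is an illustrative parameter name.
    """
    n_lvs = z.shape[1]
    # Solve (Z^T Z + lambda * I) X = Z^T M instead of forming an explicit inverse.
    gram = z.T @ z + ridge_lambda * np.eye(n_lvs)
    return np.linalg.solve(gram, z.T @ m)


# Toy data, for illustration only: 50 genes, 5 LVs, 8 traits
rng = np.random.default_rng(0)
z = rng.random((50, 5))
m = rng.standard_normal((50, 8))

m_hat = project_into_latent_space(z, m, ridge_lambda=1.0)
print(m_hat.shape)  # (5, 8)
```

Both matrices must share the same genes, in the same order, along their first axis before the projection is computed.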
- data_transformations/: contains different transformations of the S-MultiXcan results projected into the MultiPLIER latent space (in folder projections/). These include the standardized, PCA and UMAP versions of the data that were used for clustering.
- gls/: LV-trait association results (traits from PhenomeXcan) obtained with a generalized least squares approach. See the manuscript for more details.
- drug_disease_prediction/: contains files for the drug-disease prediction.
  - gold_standard_set/: prediction files using the gold standard set (PharmacotherapyDB, indications.tsv). Only the trait-drug pairs in this set are included in the predictions under this folder.
  - phenomexcan/: prediction files for all traits in PhenomeXcan and all drugs in LINCS L1000, separated by tissue. In the original manuscript, for each drug-disease pair, we took the maximum score across tissues, but other strategies (like computing the mean or median) could be used.
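The tissue-level aggregation described for drug_disease_prediction/phenomexcan/ can be sketched as below. The tissue names, scores, and the `aggregate_scores` helper are hypothetical and only illustrate how the maximum (the strategy used in the manuscript), the mean, or the median could be computed for one drug-disease pair.

```python
from statistics import mean, median

# Hypothetical per-tissue prediction scores for a single drug-disease pair
scores_by_tissue = {
    "Liver": 0.42,
    "Whole_Blood": 0.87,
    "Adipose_Subcutaneous": 0.15,
}

# Supported ways of collapsing the per-tissue scores into one value
AGGREGATORS = {"max": max, "mean": mean, "median": median}


def aggregate_scores(scores, strategy="max"):
    """Collapse per-tissue scores into a single score for the pair."""
    if strategy not in AGGREGATORS:
        raise ValueError(f"unknown strategy: {strategy}")
    return AGGREGATORS[strategy](scores.values())


for strategy in AGGREGATORS:
    print(strategy, aggregate_scores(scores_by_tissue, strategy))
```

Taking the maximum keeps the strongest tissue-specific signal, while the mean or median favors pairs that score consistently across tissues.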
- clustering/: contains the clustering runs (both with base algorithms and consensus clustering approaches), and the interpretation analyses.
  - interpretation/cluster_lvs/: contains the set of LVs that distinguish traits in each cluster (separated by partitions) from the rest. This was obtained by the decision tree model.
  - runs/: runs with base/traditional clustering algorithms on different data transformations (standardized, PCA and UMAP).
  - consensus_clustering: runs with consensus clustering algorithms.
- crispr/analyses/: contains the fgsea analysis on our lipids-altering gene-sets found by a CRISPR screen.