Ensemble enrichment
Ensemble enrichment computes the enrichment of a given phenotype relative to an ensemble of randomized phenotypes. The approach is described in this bioRxiv preprint.
It proceeds through three steps:
1. Compute the null phenotype ensemble.
2. Compute the null distribution corresponding to this ensemble using ComputeAllCategoryNulls.
3. Perform enrichment for a phenotype of interest relative to this pre-computed null ensemble using EnsembleEnrichment.
Step 2 is the most computationally expensive step.
There are a couple of common choices for null phenotype ensembles:
1. Independent random maps.
2. Spatially autocorrelated random maps, e.g., fitted to a phenotype of interest and then generated using [brainSMASH](https://github.com/murraylab/brainsmash).
In case (1), you can straightforwardly set enrichmentParams.whatEnsemble = 'randomMap' in GiveMeDefaultEnrichmentParams and you're good to go.
For any custom ensemble (such as an ensemble of spatially autocorrelated maps), you will need to make different modifications to GiveMeDefaultEnrichmentParams.
First, set enrichmentParams.whatEnsemble = 'customEnsemble'.
You will then need to specify the .mat file containing these null phenotypes as, e.g., enrichmentParams.dataFileSurrogate = 'myNullPhenotypes.mat'.
This file should contain the matrix, nullMaps, in the form region x map.
E.g., nullMaps would be a 100 x 1000 matrix for 1000 null phenotypes defined across 100 brain regions (matching the 100 rows of the gene-expression data).
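As a rough illustration of the expected file layout, the sketch below writes a region x map matrix to a .mat file under the variable name nullMaps. The sizes follow the example above; the values are placeholder Gaussian noise and are not spatially autocorrelated, so in practice you would substitute the surrogate maps you have actually generated.

```matlab
% Sizes taken from the example in the text (100 regions, 1000 null maps)
numRegions = 100;
numMaps = 1000;

% Placeholder values only: independent Gaussian noise stands in for your own
% surrogate maps (e.g., maps generated with brainSMASH and imported here).
rng(0); % fix the seed so the placeholder is reproducible
nullMaps = randn(numRegions,numMaps);

% Confirm the region x map orientation before saving
assert(isequal(size(nullMaps),[numRegions,numMaps]), ...
    'nullMaps must be region x map');

% Save under the variable name nullMaps, as described in the text
save('myNullPhenotypes.mat','nullMaps');
```

The file name matches the example value of enrichmentParams.dataFileSurrogate given above.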
Before proceeding to Step 2, check that enrichmentParams looks sensible.
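One possible way to run such a check for a custom ensemble is sketched below. It uses a stand-in structure containing only the two fields named on this page; the real structure produced by GiveMeDefaultEnrichmentParams has further fields that are not checked here. The value of numRegions is a hypothetical placeholder for the number of rows in your gene-expression matrix.

```matlab
% Stand-in structure for illustration only; in practice, check the structure
% that GiveMeDefaultEnrichmentParams produces.
enrichmentParams = struct();
enrichmentParams.whatEnsemble = 'customEnsemble';
enrichmentParams.dataFileSurrogate = 'myNullPhenotypes.mat';

numRegions = 100; % hypothetical: rows of your gene-expression matrix

if strcmp(enrichmentParams.whatEnsemble,'customEnsemble')
    surrogateFile = enrichmentParams.dataFileSurrogate;
    if ~exist(surrogateFile,'file')
        warning('Surrogate file %s was not found.',surrogateFile);
    else
        % whos('-file',...) lists stored variables without loading their data
        fileInfo = whos('-file',surrogateFile);
        isNullMaps = strcmp({fileInfo.name},'nullMaps');
        assert(any(isNullMaps),'The file must contain a variable named nullMaps.');
        mapSize = fileInfo(isNullMaps).size;
        assert(mapSize(1)==numRegions,'nullMaps must have one row per region.');
        fprintf('Found %u null maps across %u regions.\n',mapSize(2),mapSize(1));
    end
end
```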
Using the parameters set in Step 1, you are then ready to compute null distributions for your category scores relative to the specified phenotype ensemble.
The results of these null distributions are saved to the .mat file: enrichmentParams.fileNameOut (check this looks ok before running).
You also need to set up geneDataStruct so that this function can match genes to their expression data. It should be a MATLAB structure containing two fields: expressionMatrix (a region x gene expression matrix) and entrezIDs (a vector of Entrez IDs labeling the columns of the expression matrix, used to match genes to their category annotations).
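A minimal sketch of such a structure is shown below, using placeholder data. The sizes, the random expression values, and the sequential IDs are all made up for illustration (they are not real Entrez IDs), and storing entrezIDs as a column vector is an assumption, since this page does not specify its orientation.

```matlab
% Hypothetical sizes for illustration
numRegions = 100;
numGenes = 500;

rng(0); % reproducible placeholder values
geneDataStruct = struct();
% Region x gene matrix; replace with your real expression data
geneDataStruct.expressionMatrix = randn(numRegions,numGenes);
% One ID per column of expressionMatrix; sequential placeholders, not real Entrez IDs
geneDataStruct.entrezIDs = (1:numGenes)';

% Each column of the expression matrix needs exactly one ID
assert(size(geneDataStruct.expressionMatrix,2)==numel(geneDataStruct.entrezIDs), ...
    'entrezIDs must label every column of expressionMatrix');
```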
ComputeAllCategoryNulls(geneDataStruct,enrichmentParams,[],true,true);
Now that we have a null distribution for every gene category, we can assess the significance of the scores obtained for a given phenotype relative to these (precomputed and saved) nulls.
The results from Step 2 are saved in enrichmentParams.fileNameOut, so you can pass this file name, along with your phenotype of interest, to compute the enrichment results as a table:
GOTablePhenotype = EnsembleEnrichment(enrichmentParams.fileNameOut,phenotypeVector);