Use a different MAE data for more meaningful MOFA results #321

Open
artur-sannikov opened this issue Aug 4, 2023 · 1 comment

Comments

@artur-sannikov
Contributor

Is your feature request related to a problem? Please describe.
At the moment, the Hintikka data is used for nearly all analyses in the OMA book. However, I see a problem in the MOFA section: the model finds only one factor, and that factor explains variability only in the metabolomic data (see the "Variance Explained per factor and assay" figures). This makes the results difficult to interpret and discuss because, in my opinion, they do not show much.

In contrast, in the original MOFA+ paper the factors capture different kinds of information, for example differences in methylation, classes of neurons, etc. These factors also allowed the authors to apply t-SNE to discover sub-populations of cell types. In our case, there is little downstream analysis we can do.

Describe the solution you'd like
I see two solutions here:

  1. Use some other multi-omic data from a different resource, build a MAE object (or find existing data already in MAE format), and show how we can perform downstream analysis on MOFA factors;
  2. Add a MAE object directly to mia (and use that for MOFA and downstream analysis), which might be more complicated but at the same time should make future work easier.

These two solutions can be implemented simultaneously, and I do not have a preference for either as long as the data gives us meaningful and interpretable results.

Additional context

  1. The MultiAssayExperiment (MAE) package has a built-in multi-assay experiment, miniACC, as an example;
  2. If we use some other data, it will break the flow of the analysis, which at the moment uses CCA to uncover some interesting relationships and then MOFA to confirm and expand on the previous findings;
  3. The dataset should, of course, be related to the microbiome, although most of the available multi-omic datasets come from cancer research (e.g., RNA-seq, methylation, mutations, etc.)
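
For reference, a minimal sketch of loading the built-in miniACC example mentioned in item 1, assuming the MultiAssayExperiment package is installed. It only inspects the object and is not a proposal for the book's workflow:

```r
library(MultiAssayExperiment)

# Load the example MultiAssayExperiment shipped with the package
data(miniACC)

# List the assays (experiments) it contains and their dimensions
experiments(miniACC)

# Sample-level metadata shared across assays
head(colData(miniACC))
```
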
@antagomir
Member

antagomir commented Aug 4, 2023

We can certainly add another MAE demo data set to mia, for instance. It should be about microbiome research (which is indeed so far less covered in terms of multi-omics methodology than cancer studies).

Or we can use an existing data set. Possible sources:

  1. borenstein-lab/microbiome-metabolome-curated-data/; does not support TreeSE/MAE as such, so it would require additional work/code.
  2. curatedMetagenomicData; provides a list of TreeSEs for cases where multi-omics data is available, but MAE support is still under consideration, so our own code would have to convert the experiment list into a MAE.
  3. EBI MGnify API through the MGnifyR package; I think this already provides outputs in MAE format. This is a central data resource for European microbiome research and open data sharing, so I think it would be quite a good source if a suitable data set can be identified.
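
As a rough illustration of the conversion mentioned in item 2, the sketch below wraps a named list of experiments into a MAE. It is only a sketch under assumptions: the toy data, the assay names (`microbiota`, `metabolites`), and the `group` column are all made up, and plain SummarizedExperiment objects stand in for the TreeSEs that a real resource would provide.

```r
library(MultiAssayExperiment)
library(SummarizedExperiment)

set.seed(1)
samples <- paste0("S", 1:4)

# Toy stand-ins for two omics layers measured on the same samples (hypothetical data)
taxa <- SummarizedExperiment(assays = list(counts = matrix(
  rpois(12, 10), nrow = 3,
  dimnames = list(paste0("taxon", 1:3), samples))))
metab <- SummarizedExperiment(assays = list(abundance = matrix(
  rnorm(8), nrow = 2,
  dimnames = list(paste0("met", 1:2), samples))))

# A named list of experiments, similar in spirit to a list of TreeSEs
exp_list <- list(microbiota = taxa, metabolites = metab)

# Sample metadata; row names must match the column names of the experiments
col_data <- S4Vectors::DataFrame(group = c("a", "a", "b", "b"),
                                 row.names = samples)

# The sample map is generated automatically from the matching sample names
mae <- MultiAssayExperiment(experiments = ExperimentList(exp_list),
                            colData = col_data)
mae
```

With real data, the sample names would rarely line up this neatly across assays, so an explicit `sampleMap` would likely be needed.
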

I expect that more informative factors can be identified from data sets with larger sample sizes.

@artur-sannikov artur-sannikov changed the title Use a different MAE data for more meaninful MOFA results Use a different MAE data for more meaningful MOFA results Aug 4, 2023