plotAbundance improvements #132

Closed
TuomasBorman opened this issue Jun 20, 2024 · 16 comments · Fixed by #156

@TuomasBorman
Contributor

When sample names are plotted, they overlap each other and cannot be read.

library(miaViz)
data("GlobalPatterns")

tse <- GlobalPatterns
plotAbundance(tse, rank = "Phylum", add_x_text = TRUE)

image
Some other functions seem to have an angle_x_text parameter, but plotAbundance has no option to rotate the text.

We could also consider whether sample names could be taken from colData(tse). Currently, paired samples must have unique names, but a better option would be to allow shared names so that one can easily see which samples come from the same patient.

If a user wants to compare abundances between groups, or if the samples are paired, for instance, our current solution might be suboptimal.

library(patchwork)
library(miaViz)
data("GlobalPatterns")

tse <- GlobalPatterns
p <- plotAbundance(tse, rank = "Phylum", features = "SampleType")
wrap_plots(p, ncol = 1,  heights = c(0.95,0.05))

image
The plot can be hard to read when there are multiple groups (space between the groups might help).

Another option would be to plot abundances as shown here in figure 1b

@TuomasBorman
Contributor Author

Also consider plotting more than 20 (maybe 25) taxa with discrete colors. As seen in the plots above, the colors are on a continuous scale, which makes the plot hard to read. If there are 20 or fewer taxa, the color scale is discrete.
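One way to keep the legend discrete past 20 taxa is to generate as many qualitative colors as there are taxa and pass them to a manual scale. The sketch below uses plain ggplot2 on made-up data rather than plotAbundance itself; the taxon names, the count of 25, and the "Dark 3" palette are assumptions chosen for illustration.

```r
library(ggplot2)

# Made-up long-format data: 25 taxa in 4 samples (illustration only).
set.seed(1)
taxa <- paste0("Taxon", sprintf("%02d", 1:25))
df <- expand.grid(taxon = taxa, sample = paste0("S", 1:4),
                  stringsAsFactors = FALSE)
df$abundance <- runif(nrow(df))

# One distinct color per taxon from a qualitative HCL palette (base R >= 3.6).
pal <- setNames(grDevices::hcl.colors(length(taxa), palette = "Dark 3"), taxa)

# Relative abundances as stacked bars with a discrete, named color scale.
p <- ggplot(df, aes(x = sample, y = abundance, fill = taxon)) +
    geom_col(position = "fill") +
    scale_fill_manual(values = pal)
```

With this many categories, neighboring hues are still close to each other, so a fixed cutoff (20, 25, or something else) remains a judgment call.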

@Daenarys8
Contributor

Also related: microbiome/OMA#197

@Daenarys8
Contributor

There are three options to display sample names without cluttering.

  • The user could pass flipped = TRUE, which swaps the axes and rotates the graph counterclockwise.
  • The user could add theme(axis.text.x = element_text(angle = 45, hjust = 1)) to rotate the x-axis labels, for example (this gives the user more control over the display).
  • We could update plotAbundance with more parameters or hardcode the axis text orientation.
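The first two options can be sketched with plain ggplot2. The data frame below is made up and only stands in for what plotAbundance draws, and coord_flip() is used here as a stand-in for the effect of flipped = TRUE, not as a claim about how miaViz implements it.

```r
library(ggplot2)

# Made-up data with long sample names (illustration only).
df <- data.frame(
    sample = rep(paste0("long_sample_name_", 1:6), each = 2),
    taxon = rep(c("Firmicutes", "Bacteroidetes"), times = 6),
    abundance = c(0.6, 0.4, 0.3, 0.7, 0.5, 0.5, 0.2, 0.8, 0.9, 0.1, 0.4, 0.6)
)

base <- ggplot(df, aes(x = sample, y = abundance, fill = taxon)) +
    geom_col()

# Option: rotate the x-axis labels so they no longer overlap.
p_rotated <- base + theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Option: put the samples on the vertical axis instead.
p_flipped <- base + coord_flip()
```

Because plotAbundance returns a ggplot object when no column variable is plotted, the theme() call can be appended to its result in the same way.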

@TuomasBorman
Contributor Author

Thanks, theme(axis.text.x = element_text(angle = 45, hjust = 1)) seems to solve the problem with sample names.

A couple more things came to mind while I was generating plots in one project:

# Prepare data
library(miaViz)
data("Tengeler2020")
tse <- Tengeler2020
tse <- tse[, 1:20]

colData(tse)[["patient"]] <- rep(paste0("patient", seq_len(4)), each = ncol(tse) / 4)
colData(tse)[["sampletype"]] <- factor(rep(paste0("sampletype", seq_len(2)), ncol(tse) / 10))
tse <- tse[, 1:19]
  1. Order of taxa

Sometimes the user wants to define the order of the taxa. For instance, there might be specific taxa that the user wants listed first. For example, in figure 3 here, "Other" is plotted first: https://www.researchgate.net/publication/347867791_The_Urinary_Microbiome_in_Postmenopausal_Women_with_Recurrent_Urinary_Tract_Infections/figures

In the example below, Firmicutes is plotted first. I am not sure what the best way to achieve the desired behavior is. (Maybe we could check whether the values are factors and take the order from the levels?)

asd <- c("Firmicutes" = "1_Firmicutes")
rowData(tse)[["Phylum"]][ rowData(tse)[["Phylum"]] == names(asd) ] <- asd
plotAbundance(tse, rank = "Phylum", as.relative = TRUE)

image

  2. Displaying column variable

When we want to display the sample type, for instance, the type is plotted as colors. However, it might be better to have it as its own facet.

Below is our current solution

p <- plotAbundance(tse, rank = "Phylum", as.relative = TRUE, col.var = "sampletype", order.col.by = "sampletype")
library(patchwork)
wrap_plots(p, ncol = 1, heights = c(0.95,0.05))

image

Figure 2 behind this link shows how the same thing is achieved with facets: https://www.researchgate.net/publication/347867791_The_Urinary_Microbiome_in_Postmenopausal_Women_with_Recurrent_Urinary_Tract_Infections/figures

  3. Paired samples

Sometimes we have samples that are drawn from the same patient (for instance, at different time points). Currently, we do not have a method for plotting that kind of data. The best that can be done at the moment is this:


tse_list <- splitOn(tse, "sampletype")

plot_list <- lapply(tse_list, function(x){
    colnames(x) <- x$mappac_id
    p <- plotAbundance(x, as.relative = TRUE, rank = "Phylum", add_x_text = TRUE) +
        labs(title = unique(colData(x)[["sampletype"]]))
    return(p)
})
wrap_plots(plot_list, ncol = 1)

image

But as you can see, the samples do not line up. (Maybe we could add the missing samples, for instance to sampletype2 in the figure above?)

@Daenarys8 Can you check whether you can find solutions for these? We can then discuss in more detail how to implement them.

@Daenarys8
Contributor

I checked some of these, and it is interesting because we already have

  1. order.col.by, which can order the taxa, but with the downside of ordering the counts as well. Perhaps we could modify it a little.

plotAbundance(tse, rank = "Phylum", order.col.by = "Firmicutes")
Rplot

  2. With some modification to .features_plotter or .abund_plotter, we can display column values with facet_wrap. On second thought, if the whole idea of .features_plotter was to plot column variables, we could remove it entirely and modify .abund_plotter to use col.var as the condition for such a plot.

plotAbundance(tse, rank = "Phylum", order.col.by = "Firmicutes", col.var = "sampletype")

Rplot01
The above plot could be much better, though.

  3. Hmm, I am a bit confused by this third point. We earlier cut the data down to 19 samples, each belonging to only one sampletype: 10 to one and 9 to the other. If I understand correctly, the sample is not missing from sampletype2; it simply belongs to the other sampletype. However, perhaps I misunderstood and thought of it differently.
plot_list <- lapply(tse_list, function(x){
    p <- plotAbundance(x, as.relative = TRUE, rank = "Phylum", add_x_text = TRUE, order.col.by = "Firmicutes")
    return(p)
})
wrap_plots(plot_list, ncol = 1)

Rplot02

@TuomasBorman
Contributor Author

Looks very nice.

Perhaps 1 is enough; I still have to test it. 2 looks good.

As you can see from my plot, sample 10 is missing from sampletype2. You are correct that it was never there in the first place (we do not have a sample for "sample10" - "sampletype2"). However, because a sample is missing, the samples are misaligned between the plots. The plot would be tidier if sampletype1 and sampletype2 aligned with each other. (It would be easier to read and, in practice, we would no longer need the sample labels.)

However, I am wondering what the best way to showcase paired samples is. One option is to add an "empty sample" in place of each missing sample (here "sample10" - "sampletype2").
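A minimal sketch of the "empty sample" idea, using base R and ggplot2 on made-up data rather than a TreeSummarizedExperiment. The patient and sampletype labels are hypothetical; the point is only that completing the patient-by-sampletype grid keeps the same slot for each patient in every panel.

```r
library(ggplot2)

# Made-up paired design: 4 patients x 2 sample types x 2 taxa,
# where patient4 / sampletype2 was never collected (illustration only).
df <- expand.grid(taxon = c("Firmicutes", "Bacteroidetes"),
                  patient = paste0("patient", 1:4),
                  sampletype = c("sampletype1", "sampletype2"),
                  stringsAsFactors = FALSE)
df$abundance <- rep(c(0.6, 0.4), times = nrow(df) / 2)
df <- df[!(df$patient == "patient4" & df$sampletype == "sampletype2"), ]

# Make the gap explicit: every taxon/patient/sampletype combination gets a
# row, and the uncollected ones carry NA as abundance.
full <- expand.grid(taxon = unique(df$taxon),
                    patient = unique(df$patient),
                    sampletype = unique(df$sampletype),
                    stringsAsFactors = FALSE)
completed <- merge(full, df, all.x = TRUE)

# With the patient (not the unique sample name) on a shared x axis, each
# facet reserves the same position per patient; the missing one stays empty.
p <- ggplot(completed, aes(x = patient, y = abundance, fill = taxon)) +
    geom_col(na.rm = TRUE) +
    facet_wrap(~ sampletype, ncol = 1)
```

Whether the empty slot should be left blank or marked somehow (for example with a grey placeholder bar) is a design question this sketch does not settle.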

Can you check whether this has already been solved in some papers? We could then take the idea from them.

@TuomasBorman
Contributor Author

That also orders the data based on a certain feature. However, my collaborator wants the "unidentified" taxa to be at the bottom of the graph.

We could add an additional parameter to .order_abund_feature_data(?) that controls which feature is at the bottom of the graph. It could work a little like order.col.by, but without ordering the samples (just the order of the color bars).

The idea of .features_plotter is to visualize a colData variable. It can also visualize continuous variables, which facets cannot. To me, facets look better for categorical variables, but some people might prefer the current option.

That is why I think we should have an option for this. Maybe a facet.cols argument (default FALSE) that creates facets from col.var.
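For a categorical column variable, the faceted layout could look roughly like the ggplot2 sketch below. It runs on made-up data and does not touch the miaViz internals; the sample and sampletype labels are assumptions for illustration.

```r
library(ggplot2)

# Made-up data: 5 samples, 2 taxa, and one categorical column variable.
df <- expand.grid(taxon = c("Firmicutes", "Bacteroidetes"),
                  sample = paste0("S", 1:5),
                  stringsAsFactors = FALSE)
df$abundance <- rep(c(0.7, 0.3), times = 5)
df$sampletype <- ifelse(df$sample %in% c("S1", "S2", "S3"),
                        "sampletype1", "sampletype2")

# One panel per group; free x scales drop samples that are not in the group,
# and free space keeps the bar width equal across panels.
p <- ggplot(df, aes(x = sample, y = abundance, fill = taxon)) +
    geom_col() +
    facet_grid(cols = vars(sampletype), scales = "free_x", space = "free_x")
```

The gap between panels also gives the visual separation between groups that was asked for earlier in the thread. A continuous variable has no natural panels, which is why the separate annotation plot is still needed for that case.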

As already mentioned, we should handle missing samples if the user wants to visualize paired samples. There could be a paired = TRUE option that makes sure the order of the samples stays the same in all facets (so that they are comparable).

Can you create a draft that takes these into account? Let's then discuss what the best approach is, as this might be a somewhat complex issue that requires restructuring the function.

@antagomir
Member

  1. Clarify the relation with the order.row.by argument; should this be "bottom.row", or should we just provide examples of how the user can specify arbitrary sorting?

  2. Not sure if I understood, but it sounds worth testing

  3. good

@TuomasBorman
Contributor Author

  1. One option could be that the user specifies the order with factor levels. That might be the easiest. So instead of characters, the rowData variable could be a factor.

The point was that sample information is now plotted as a separate plot. However, these groups could also be plotted as facets. Facets work only for categorical variables, not for numeric ones, which is why we should also keep the current functionality.

One problem is that having many different options makes the function more complex for the user.
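The factor-level idea can be sketched in plain ggplot2 as follows. The data is made up, and the sketch relies on ggplot2's default stacking, in which the first level of the fill variable is drawn on top and the last at the bottom; it says nothing about how plotAbundance orders its features internally.

```r
library(ggplot2)

# Made-up data: 4 taxa (one of them "Other") in 3 samples.
df <- expand.grid(taxon = c("Bacteroidetes", "Firmicutes", "Other",
                            "Proteobacteria"),
                  sample = paste0("S", 1:3),
                  stringsAsFactors = FALSE)
df$abundance <- rep(c(0.4, 0.3, 0.1, 0.2), times = 3)

# Put "Other" last among the levels; the remaining taxa stay alphabetical.
lv <- c(setdiff(sort(unique(df$taxon)), "Other"), "Other")
df$taxon <- factor(df$taxon, levels = lv)

# With default stacking, the last level ("Other") ends up at the bottom.
p <- ggplot(df, aes(x = sample, y = abundance, fill = taxon)) +
    geom_col()
```

The same levels also fix the legend order, so the legend and the bars stay consistent without an extra argument.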

@antagomir
Member

  1. Could the user provide the ordering of the levels in order.row.by?
  2. Ok. Either support both options, or provide separate solutions and explain all of them and their differences in a single place (function examples on the man page and/or in OMA?)

@TuomasBorman
Contributor Author

That is not possible. The user can only specify "name" (alphabetical order), "abund" (abundance), or "revabund" (reverse abundance).

The idea is to get this kind of plot. Here the "Other" group is not interesting, so it is at the bottom. I found that some papers have this kind of plot.
image

@antagomir
Member

  1. But it could be: if the user provides a single string, it works as you describe; if the user provides a factor with many levels (as many as there are features), it could be used to determine the order?

@TuomasBorman
Contributor Author

TuomasBorman commented Sep 24, 2024

That might be the easiest and most transparent solution. However, we should check that the elements of the vector match the features.

If the user wants to agglomerate the data, it might not be clear what those names are. We could disable the vector option when the user agglomerates.

(The same solution could also work for columns.)

@antagomir
Member

Sounds good. There could be an informative warning if the user tries to do both.
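The behavior discussed in the last few comments could be sketched as below. resolve_row_order() and its arguments are hypothetical and not part of miaViz; the direction of "abund" (most abundant first) and the fallback to alphabetical order after the warning are assumptions made only so that the example is complete.

```r
# Hypothetical helper, not part of miaViz: resolve an order.row.by-style
# argument that is either a single keyword or an explicit feature order.
resolve_row_order <- function(order_by, features, abundance,
                              agglomerated = FALSE) {
    # A factor carries the requested order in its levels.
    order_by <- if (is.factor(order_by)) levels(order_by) else as.character(order_by)
    keywords <- c("name", "abund", "revabund")
    if (length(order_by) == 1L && order_by %in% keywords) {
        # Assumption: "abund" lists the most abundant feature first.
        return(switch(order_by,
            name = sort(features),
            abund = features[order(abundance, decreasing = TRUE)],
            revabund = features[order(abundance)]))
    }
    if (agglomerated) {
        # Assumption: fall back to alphabetical order instead of failing.
        warning("An explicit order cannot be combined with agglomeration; ",
                "using alphabetical order.", call. = FALSE)
        return(sort(features))
    }
    # The explicit order must name every feature exactly once.
    if (anyDuplicated(order_by) > 0 || !setequal(order_by, features)) {
        stop("'order_by' must list every feature exactly once.", call. = FALSE)
    }
    order_by
}

# Made-up features and total abundances for a quick check.
feats <- c("Firmicutes", "Other", "Bacteroidetes")
abund <- c(10, 3, 7)
by_abund <- resolve_row_order("abund", feats, abund)
by_hand <- resolve_row_order(c("Firmicutes", "Bacteroidetes", "Other"),
                             feats, abund)
```

Keeping the keyword branch first means existing calls such as order.row.by = "name" would behave as before, and only longer vectors or factors go through the validation.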

@TuomasBorman
Contributor Author

@Daenarys8 Would you be able to create a draft for these?

@TuomasBorman TuomasBorman self-assigned this Oct 8, 2024
@TuomasBorman
Contributor Author

I am currently working on this and will hopefully get something out tomorrow.

@TuomasBorman TuomasBorman linked a pull request Oct 9, 2024 that will close this issue