plotAbundance improvements #132

TuomasBorman · 2024-06-20T11:58:45Z

When sample names are plotted, they overlap each other and cannot be read.

library(miaViz)
data("GlobalPatterns")

tse <- GlobalPatterns
plotAbundance(tse, rank = "Phylum", add_x_text = TRUE)

Some other functions seem to have an angle_x_text parameter, but plotAbundance does not have an option to rotate the text.

Also, we could consider letting the user specify sample names from colData(tse). For example, paired samples currently must have unique names, but it would be better to allow shared names so that one can easily see which samples come from the same patient.

If the user wants to compare abundances between groups, or if the samples are paired, for instance, our current solution might be suboptimal.

library(patchwork)
library(miaViz)
data("GlobalPatterns")

tse <- GlobalPatterns
p <- plotAbundance(tse, rank = "Phylum", features = "SampleType")
wrap_plots(p, ncol = 1,  heights = c(0.95,0.05))

It might be hard to read the plot when there are multiple groups (space between groups might help).

Another option would be to plot abundances as shown here in figure 1b


TuomasBorman · 2024-06-20T12:07:22Z

Also consider plotting more than 20 (maybe 25) taxa with discrete colors. As seen in the plots above, the colors are on a continuous scale, which makes the plot hard to read. The color scale is discrete only when there are 20 or fewer taxa.

Daenarys8 · 2024-08-12T12:17:56Z

Also related: microbiome/OMA#197

Daenarys8 · 2024-08-12T13:36:20Z

There are three options for displaying sample names without clutter:

1. The user could pass flipped = TRUE, which swaps the axes and rotates the plot counterclockwise.
2. The user could add theme(axis.text.x = element_text(angle = 45, hjust = 1)) to rotate the x-axis text, for example (this gives the user more flexibility to control the display).
3. We could update plotAbundance with more parameters or hardcode the axis text orientation.
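To illustrate the second option, here is a minimal sketch that rotates the x-axis labels of a stacked bar plot. It uses a small made-up data frame and plain ggplot2 rather than plotAbundance itself, so the data and the object names are assumptions. The same theme() call can be added to any ggplot object.

```r
library(ggplot2)

# Toy long-format abundance table (made-up values, for illustration only)
df <- data.frame(
    sample = rep(paste0("a_rather_long_sample_name_", 1:6), each = 2),
    phylum = rep(c("Firmicutes", "Bacteroidetes"), times = 6),
    abundance = c(0.6, 0.4, 0.3, 0.7, 0.5, 0.5, 0.8, 0.2, 0.1, 0.9, 0.45, 0.55)
)

p <- ggplot(df, aes(x = sample, y = abundance, fill = phylum)) +
    geom_col() +
    # Rotate the sample labels by 45 degrees and right-align them
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
p
```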

TuomasBorman · 2024-08-28T07:22:21Z

Thanks, theme(axis.text.x = element_text(angle = 45, hjust = 1)) seems to solve the problem with the sample names.

A couple more things came to mind while I was generating plots in one project:

# Prepare data
library(miaViz)
data("Tengeler2020")
tse <- Tengeler2020
tse <- tse[, 1:20]

colData(tse)[["patient"]] <- rep(paste0("patient", seq_len(4)), each = ncol(tse) / 4)
colData(tse)[["sampletype"]] <- factor(rep(paste0("sampletype", seq_len(2)), ncol(tse) / 10))
tse <- tse[, 1:19]

Order of taxa

Sometimes the user wants to define the order of taxa. For instance, there might be specific taxa that the user wants listed first. For example, in figure 3 here they have plotted "Other" first: https://www.researchgate.net/publication/347867791_The_Urinary_Microbiome_in_Postmenopausal_Women_with_Recurrent_Urinary_Tract_Infections/figures

For instance, below Firmicutes is plotted first. I am not sure what the best way to achieve the desired behavior is. (Maybe we could check whether the values are factors and take the order from the levels?)

asd <- c("Firmicutes" = "1_Firmicutes")
rowData(tse)[["Phylum"]][ rowData(tse)[["Phylum"]] == names(asd) ] <- asd
plotAbundance(tse, rank = "Phylum", as.relative = TRUE)

Displaying column variable

When we want to display the sample type, for instance, the type is plotted as colors. However, it might be better to show it as its own facet?

Below is our current solution

p <- plotAbundance(tse, rank = "Phylum", as.relative = TRUE, col.var = "sampletype", order.col.by = "sampletype")
library(patchwork)
wrap_plots(p, ncol = 1, heights = c(0.95,0.05))

Behind the link, in figure 2, you can see how the same thing is achieved with facets: https://www.researchgate.net/publication/347867791_The_Urinary_Microbiome_in_Postmenopausal_Women_with_Recurrent_Urinary_Tract_Infections/figures

Paired samples

Sometimes we have samples that are drawn from the same patient (for instance, at different time points). Currently, we do not have a method for producing that kind of plot. The best that can be done currently is this:


tse_list <- splitOn(tse, "sampletype")

plot_list <- lapply(tse_list, function(x){
    colnames(x) <- x$mappac_id
    p <- plotAbundance(x, as.relative = TRUE, rank = "Phylum", add_x_text = TRUE) +
        labs(title = unique(colData(x)[["sampletype"]]))
    return(p)
})
wrap_plots(plot_list, ncol = 1)

but as you can see, the samples do not match. (Maybe we could add missing samples, for instance in the figure above, to sampletype2?)

@Daenarys8 Can you check if you can find solutions for these? We can then discuss more how to implement them.

Daenarys8 · 2024-08-29T13:47:41Z

I checked some of these, and it is interesting because we do have order.col.by, which can order the taxa but with the downside of ordering the counts as well. Perhaps we could modify it a little.

plotAbundance(tse, rank = "Phylum", order.col.by = "Firmicutes")

With some modification to .features_plotter or .abund_plotter, we can display column values with facet_wrap. On second thought, if the whole idea of .features_plotter was to draw the column variable plots, we could remove it entirely and modify .abund_plotter to use col.var as the condition for such a plot.

plotAbundance(tse, rank = "Phylum", order.col.by = "Firmicutes", col.var = "sampletype")

The above plot could be much better though.

Hmm, I am a bit confused about this third aspect. We earlier cut the data down to 19 samples, each belonging to only one sample type: 10 to sampletype1 and 9 to sampletype2. If I understand correctly, the sample is not missing from sampletype2; it simply is not of that sample type. However, perhaps I misunderstood and thought of it differently.

plot_list <- lapply(tse_list, function(x){
    p <- plotAbundance(x, as.relative = TRUE, rank = "Phylum", add_x_text = TRUE, order.col.by = "Firmicutes")
    return(p)
})
wrap_plots(plot_list, ncol = 1)

TuomasBorman · 2024-08-30T05:42:00Z

Looks very nice.

Perhaps 1 is enough. I still have to test it. 2. Looks good.

As you can see from my plot, sample 10 is missing from sampletype2. You are correct that it was never there in the first place (we do not have a sample for "sample10" - "sampletype2"). However, because of the missing sample, the samples are misaligned between the plots. The plot would be tidier if sampletype2 and sampletype1 aligned with each other. (It would be easier to read, and in practice we would not need the sample labels anymore.)

However, I am wondering what the best way to showcase paired samples is. One option is to add an "empty sample" in place of each missing sample (here "sample10" - "sampletype2").
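The "empty sample" idea can be sketched in base R as follows. The data frame, the patient labels, and the column names are made up for illustration; this is only one possible way to pad a paired design, not how plotAbundance handles it.

```r
# Toy paired design: patient P3 has no "sampletype2" sample (made-up data)
df <- data.frame(
    patient = c("P1", "P2", "P3", "P1", "P2"),
    sampletype = c("sampletype1", "sampletype1", "sampletype1",
                   "sampletype2", "sampletype2"),
    abundance = c(0.6, 0.3, 0.5, 0.7, 0.2)
)

# All patient x sampletype combinations of a complete paired design
full <- expand.grid(
    patient = unique(df$patient),
    sampletype = unique(df$sampletype),
    stringsAsFactors = FALSE
)

# Left join: combinations without a sample get NA abundance ("empty sample")
padded <- merge(full, df, by = c("patient", "sampletype"), all.x = TRUE)
padded <- padded[order(padded$sampletype, padded$patient), ]
padded
```

After padding, every sample type contains the same patients in the same order, so separately drawn panels would line up.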

Can you check whether this has already been solved in some papers? We could then take the idea from them.

TuomasBorman · 2024-09-23T13:29:21Z

That also orders the data based on a certain feature. However, my collaborator wants the "unidentified" taxa to be at the bottom of the graph.

We could add a parameter to .order_abund_feature_data(?) that controls which feature is at the bottom of the graph. It could work somewhat like order.col.by but without ordering the samples (only the order of the color bars would change).

The idea of .features_plotter is to visualize colData variable. However, it can also visualize continuous variables which facets cannot. For me, facets look better for categorical variables. However, for some people the current option might look better.

That is why I think we should have an option for this. Maybe a facet.cols option (default FALSE) that creates facets from col.var.
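As a rough sketch of what faceting by a categorical column variable could look like, the example below uses plain ggplot2 on made-up data. The facet.cols option discussed here is only a proposal at this point in the thread, so nothing below reflects an actual plotAbundance argument.

```r
library(ggplot2)

# Toy long-format abundance table with a grouping variable (made-up values)
df <- data.frame(
    sample = rep(paste0("s", 1:4), each = 2),
    sampletype = rep(c("sampletype1", "sampletype2"), each = 4),
    phylum = rep(c("Firmicutes", "Bacteroidetes"), times = 4),
    abundance = c(0.6, 0.4, 0.3, 0.7, 0.5, 0.5, 0.8, 0.2)
)

p <- ggplot(df, aes(x = sample, y = abundance, fill = phylum)) +
    geom_col() +
    # One panel per group; free x scales drop samples that are not in the group
    facet_grid(cols = vars(sampletype), scales = "free_x", space = "free_x")
p
```

The panels also put visible space between the groups, which addresses the readability concern raised earlier in the thread.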

As already mentioned, we should handle missing samples if the user wants to visualize paired samples. There could be a paired = TRUE option that ensures the order of samples stays the same in all facets (so that they are comparable).

Can you create a draft that takes these into account? Let's then discuss the best approach, as this is a somewhat complex issue and may require restructuring the function.

antagomir · 2024-09-23T16:52:22Z

Clarify the relation with the order.row.by argument; should this be "bottom.row", or should we just provide examples of how the user can provide arbitrary sorting?
not sure if I understood but sounds worth testing
good

TuomasBorman · 2024-09-23T18:08:59Z

One option could be that the user specifies the order with factor levels. That might be the easiest. So instead of a character vector, the rowData variable could be a factor.

The point was that sample information is now plotted as a separate plot. However, these groups could also be plotted as facets. Facets work only for categorical variables, not for numeric ones, which is why we should also keep the current functionality.

One problem is that having many different options makes the function more complex for the user.

antagomir · 2024-09-23T19:47:49Z

User could provide ordering of the levels in the order.row.by?
Ok. Either support both options, or provide separate solutions and explain all of them and their differences in a single place (function example manpage, and/or in OMA?)

TuomasBorman · 2024-09-24T05:31:06Z

That is not possible. The user can only specify "name" (alphabetical order), "abund" (abundance), or "revabund" (reverse abundance).

The idea is to get this kind of plot. Here the "Other" group is not interesting, so it is at the bottom. I found that some papers have this kind of plot.
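For illustration, this is a minimal sketch of how factor levels decide the stacking order in plain ggplot2. The data are made up, and the snippet does not show current plotAbundance behavior; it only demonstrates the general mechanism of placing "Other" at the bottom.

```r
library(ggplot2)

# Toy long-format abundance table (made-up values)
df <- data.frame(
    sample = rep(c("s1", "s2", "s3"), each = 3),
    taxon = rep(c("Other", "Firmicutes", "Bacteroidetes"), times = 3),
    abundance = c(0.2, 0.5, 0.3, 0.1, 0.3, 0.6, 0.4, 0.4, 0.2)
)

# Put "Other" last in the factor levels; the other taxa stay alphabetical
taxa <- sort(unique(df$taxon))
df$taxon <- factor(df$taxon, levels = c(setdiff(taxa, "Other"), "Other"))

# With the default stacking, ggplot2 draws the last level at the bottom
p <- ggplot(df, aes(x = sample, y = abundance, fill = taxon)) +
    geom_col()
p
```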

antagomir · 2024-09-24T07:29:49Z

But it could be: if the user provides a single string, it works as you describe; if the user provides a factor with many levels (as many as there are features), it could be used to determine the order?

TuomasBorman · 2024-09-24T07:39:21Z

That might be the easiest and most transparent solution. However, we should check that the elements of the vector match the features.

If the user wants to agglomerate the data, it might not be clear what those names are. We could disable the vector option if the user wants to agglomerate.

(The same solution could work for columns also)
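The check mentioned above could look roughly like the following base R sketch. The helper check_feature_order() is hypothetical and not part of miaViz; it only illustrates validating a user-supplied ordering against the feature names before using it as factor levels.

```r
# Hypothetical helper (not part of miaViz): check that a user-supplied
# ordering lists every feature exactly once
check_feature_order <- function(order, features) {
    # A factor defines the order through its levels
    order <- if (is.factor(order)) levels(order) else as.character(order)
    features <- unique(as.character(features))
    if (anyDuplicated(order) > 0) {
        stop("'order' contains duplicated values.", call. = FALSE)
    }
    missing <- setdiff(features, order)
    unknown <- setdiff(order, features)
    if (length(missing) > 0 || length(unknown) > 0) {
        stop(
            "'order' must match the features. Missing: ",
            paste(missing, collapse = ", "), "; unknown: ",
            paste(unknown, collapse = ", "), call. = FALSE
        )
    }
    # Return the features as a factor whose levels follow the requested order
    factor(features, levels = order)
}

phyla <- c("Firmicutes", "Bacteroidetes", "Other")
ordered <- check_feature_order(c("Other", "Firmicutes", "Bacteroidetes"), phyla)
levels(ordered)
```

Failing with an informative message when names are missing or unknown keeps the behavior transparent, which is the point raised in the discussion.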

antagomir · 2024-09-24T11:24:35Z

Sounds good. There could be informative warning if user tries to do both.

TuomasBorman · 2024-10-01T14:54:31Z

@Daenarys8 Would you be able to create a draft for these?

TuomasBorman · 2024-10-08T19:03:46Z

I am currently working on this and will hopefully get something out tomorrow.

antagomir mentioned this issue Jul 23, 2024

plotAbundance problem #100

Closed

TuomasBorman self-assigned this Oct 8, 2024

TuomasBorman mentioned this issue Oct 9, 2024

Improve plotAbundance #156

Merged

TuomasBorman linked a pull request Oct 9, 2024 that will close this issue

Improve plotAbundance #156

Merged

TuomasBorman closed this as completed in #156 Nov 1, 2024
