TreeSummarizedExperiment support #158

Open
TuomasBorman opened this issue Oct 4, 2024 · 12 comments

Comments

@TuomasBorman

Hello!

I want to open a discussion about adding support for the TreeSummarizedExperiment (TreeSE) object.

TreeSE is an extension of the SingleCellExperiment (SCE) class that adds slots for row and column trees. These trees are especially relevant in the microbiome field, where relationships between species are represented as phylogenetic trees (the rowTree slot in TreeSE). You can find more information on microbiome data science and the TreeSE class here: https://microbiome.github.io/OMA/docs/devel/

In the microbiome field, large population cohorts are rather common. For instance, Ruuskanen et al. studied a large Finnish cohort to see how the microbiome relates to fatty liver disease. They also examined geographical regions.

There might not be as many applications for images as in spatial transcriptomics, and coordinates (or location groups) can be stored in colData. However, I think also supporting TreeSE might benefit both fields by giving microbiome researchers access to tools used in spatial transcriptomics and vice versa. This could create additional synergy, as it further extends the SummarizedExperiment ecosystem, ultimately reducing redundant effort and enhancing collaboration.

Because TreeSE is an SCE, we can already coerce a TreeSE to a SpatialExperiment; however, we lose the TreeSE-specific slots.

library(TreeSummarizedExperiment)
library(ape)
library(SpatialExperiment)

# Toy count matrix: 6 entities (rows) x 4 samples (columns)
assay_data <- rbind(rep(0, 4), matrix(1:20, nrow = 5))
colnames(assay_data) <- paste0("sample", 1:4)
rownames(assay_data) <- paste0("entity", seq_len(6))
# Taxonomic annotation for each row
row_data <- data.frame(Kingdom = "A",
                       Phylum = rep(c("B1", "B2"), c(2, 4)),
                       Class = rep(c("C1", "C2", "C3"), each = 2),
                       OTU = paste0("D", 1:6),
                       row.names = rownames(assay_data),
                       stringsAsFactors = FALSE)
# Random 5-tip tree; the first two rows are linked to the same tip
set.seed(12)
row_tree <- rtree(5)
tip_lab <- row_tree$tip.label
row_lab <- tip_lab[c(1, 1:5)]
tse <- TreeSummarizedExperiment(assays = list(Count = assay_data),
                                rowData = row_data,
                                rowTree = row_tree,
                                rowNodeLab = row_lab)
tse
# Coercion works, but the tree slots are dropped
as(tse, "SpatialExperiment")

-Tuomas

@HelenaLC
Collaborator

HelenaLC commented Oct 4, 2024

Hey, thanks for bringing up TSE. This has actually come up in some discussions; however, (I think) it is not straightforward to implement. Specifically, both TSE and SPE inherit from SCE, so we cannot simply pick slots from one, the other, or both as we like. Instead, SPE would have to inherit from TSE, which inherits from SCE. That said, it's certainly possible, but it would add another layer of dependency (& potential instability). Plus extra "cluttering" for those who are fine without the tree extras... So, I have no strong opinion here, just wanted to clarify the development side of things...

@TuomasBorman
Author

I see, this seems to be a more complicated matter. I don't have direct experience with analyzing spatial microbiome data, so I'm unsure about the necessity of combining SpatialExperiment with TreeSummarizedExperiment. I know this is an area people are working on, and it’s always preferable to enhance existing methods rather than create overlapping ones.

@antagomir might have more insights on this.

@antagomir

This sounds like a potentially very interesting area for development, but it also seems like a major undertaking if those updates have to be implemented across the package ecosystem.

TSE adds row and col trees to SCE (plus a sequence slot which might be less essential here). In principle, one could just add the same (or similar) tree capacity as an extra feature to SPE directly without the need to inherit TSE. This would not be optimal in terms of SPE vs. TSE interoperability but it would allow development and testing of methods that use feature or sample trees in the spatial context.

@HelenaLC
Collaborator

HelenaLC commented Oct 6, 2024

Just throwing this out there... Have you checked out SpatialFeatureExperiment? It extends the SPE by row/colGraphs. That is, graphs not trees; however, graphs are more appropriate in the context of ST data, I'd say. E.g., one can imagine spatial regions that contain subsets of cells, however, they needn't be hierarchically organized, but can have arbitrary relationships (e.g., nested/fully containing another, intersecting, disconnected etc.).

@drighelli
Owner

drighelli commented Oct 7, 2024

Hi Everyone,

We already had a similar conversation on Slack in 2020, and we already have a similar solution.

SingleCellExperiment objects have colPairs and rowPairs for graph-like representations. I still don't know how they differ from the row/colGraphs in SpatialFeatureExperiment, but colPairs/rowPairs are already implemented in SCE objects and therefore in SPE objects.

I hope this could be helpful.

Ciao,
Dario

edit: as you already know, several programming languages allow a class to inherit from multiple classes.
The same can be done in R with S4 classes, but I think things could become very complicated when the parent classes themselves inherit from the same original class (SummarizedExperiment in this case), especially in R, which is not a real object-oriented language.
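
As a minimal sketch of the multiple-inheritance point, here is a base-R example that uses only the methods package. The class names (Base, Spatial, Tree, TreeSpatial) and their slots are hypothetical toy stand-ins, not the real Bioconductor classes; the sketch only illustrates how S4 handles two parents that share an ancestor.

```r
library(methods)

# Toy stand-ins (hypothetical names, not the real Bioconductor classes):
# Base plays the role of the shared ancestor, Spatial and Tree the two extensions.
setClass("Base", slots = c(assays = "list"))
setClass("Spatial", contains = "Base", slots = c(coords = "matrix"))
setClass("Tree", contains = "Base", slots = c(tree = "list"))

# Diamond: TreeSpatial inherits from both siblings.
setClass("TreeSpatial", contains = c("Spatial", "Tree"))

x <- new("TreeSpatial", assays = list(counts = matrix(1, 2, 3)))

# Count how often the shared 'assays' slot appears in the combined class.
slot_names <- slotNames("TreeSpatial")
n_assays <- sum(slot_names == "assays")

# Check that the object is an instance of every class in the hierarchy.
is_all <- vapply(c("Base", "Spatial", "Tree"),
                 function(cl) is(x, cl), logical(1))
```

In this toy case the shared slot is stored only once. Whether validity checks and method dispatch stay equally simple for the real classes is a separate question that this sketch does not answer.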

@TuomasBorman
Author

Thanks! I still feel these solutions are somewhat suboptimal and don't fully address the need. Ideally, the object should inherit from both TreeSE, since the entire microbiome ecosystem is built around it, and SpatialExperiment, to allow the use of spatial analysis tools. I'm not sure if there's an optimal solution, but if the inheritance issues could be resolved, there might be potential for a "TreeSpatialExperiment." That said, I'm not an expert in this area, so I'm unsure how necessary this feature is or how much effort should be invested in it.

@HelenaLC
Collaborator

HelenaLC commented Oct 7, 2024

Just tried & this works...

> setClass("TSPE", contains=c(
+     "SpatialExperiment", 
+     "TreeSummarizedExperiment"))
> spe <- SpatialExperiment()
> tspe <- as(spe, "TSPE")
> # SPE & TSE accessors work...
> spatialCoords(tspe) 
<0 x 0 matrix>
> rowTree(tspe)
NULL

...i.e., one could define a class that inherits from both, e.g., defined in an independent package.

We could, in principle, also define such a class in SPE, provided it doesn't add any extra dependencies. A specialized show method is really all it'd take. (The other way around is probably suboptimal, since we have magick on our end... then again, users would also be required to install it whether or not they need it...) Open to discussion.

@TuomasBorman
Author

Cool! I quickly checked, and it seems to work with a TreeSE demo dataset.

If that's all it takes, I believe creating a new class would be beneficial. A "real" class would bring microbiome and spatial tools closer together. It could be easy to just add it to your existing package, but I'm not sure it is a good idea to add TreeSummarizedExperiment as a dependency, as most users would rarely use it (at least currently).

@HelenaLC
Collaborator

HelenaLC commented Oct 7, 2024

I'll give it a try/some more thought... haven't seen it in action, but wondering if there's a way to cheat our way into defining a class using Suggests: only, e.g., not depending on TSE per se... let's see, not sure that's even possible.

@LiNk-NY
Contributor

LiNk-NY commented Oct 7, 2024

I think it would be better to start with a real-world analysis use case before embarking on potentially creating a new class / merging classes. We could also consider discussing this in the Classes working group.
IIUC, using the contains argument would mean that you'd have to duplicate the assay data in both the SpatialExperiment and the TreeSummarizedExperiment.

@HelenaLC
Collaborator

HelenaLC commented Oct 7, 2024

Agreed! It would be great to have this discussed, as it did come up before. Could you perhaps clarify that last point? I am not (directly) spotting any duplication:

> spe <- SpatialExperiment(
+     list(foo=matrix(1,2,3)))
> tspe <- as(spe, "TSPE")
> names(attributes(tspe))
 [1] "int_elementMetadata" "int_colData"        
 [3] "int_metadata"        "rowRanges"          
 [5] "colData"             "assays"             
 [7] "NAMES"               "elementMetadata"    
 [9] "metadata"            "rowTree"            
[11] "colTree"             "rowLinks"           
[13] "colLinks"            "referenceSeq"       
[15] "class"              
> tspe@assays
An object of class "SimpleAssays"
Slot "data":
List of length 1
names(1): foo

@LiNk-NY
Contributor

LiNk-NY commented Oct 8, 2024

Hi Helena, @HelenaLC

Presumably if you have a composed class, you'd need one of each class to create a complete instance of TSPE. In the example above, the coercion can happen but you won't be able to fully use the interface for the TreeSummarizedExperiment unless the data for that object is populated (I am guessing that one would use the same assay for each class).

Thanks Dario for creating the issue for the working group.
