Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Plotting functions with sparse matrices #45

Closed
lazappi opened this issue Jan 17, 2018 · 14 comments
Closed

Plotting functions with sparse matrices #45

lazappi opened this issue Jan 17, 2018 · 14 comments

Comments

@lazappi
Copy link

lazappi commented Jan 17, 2018

I've noticed some of the plotting functions fail if you have a SingleCellExperiment with sparse matrices like those returned by read10xResults.

Here are a couple of examples:

library(scater, quietly = TRUE)
sce10x <- read10xResults(system.file("extdata", package = "scater"))
colData(sce10x)$test <- rnorm(300)
sce10x <- normalise(sce10x)
#> Warning in .local(object, ...): using library sizes as size factors

# PlotQC
plotQC(sce10x, type = "explanatory", variables = "test")
#> Error in storage.mode(y) <- "double": no method for coercing this S4 class to a vector

# PlotRLE
plotRLE(sce10x)
#> Error in matrixStats::rowMedians(exprs_mat): Argument 'x' must be a matrix or a vector.
@LTLA
Copy link
Collaborator

LTLA commented Jan 17, 2018

Anything that relies on matrixStats will not be robust to alternative matrix representations. I have fixed everything I use frequently (mostly those computing row/column means or variances), but there are a few pockets of resistance left throughout the package that I haven't gotten around to crushing.

I believe that many of these should be easily fixed with @PeteHaitch's DelayedMatrixStats package, whenever that comes out. In the meantime, feel free to put in a PR for these specific functions; there are other things with scater that are taking higher priority right now.

@PeteHaitch
Copy link
Contributor

PeteHaitch commented Jan 17, 2018 via email

@LTLA
Copy link
Collaborator

LTLA commented Jan 17, 2018

@PeteHaitch I wonder if you could add an ANY method that would just wrap the input object in a DelayedArray and pass it to a central method for DelayedArray objects. This would allow me to directly call rowMedians etc. on a sparse matrix without having to manually wrap it in a delayed array.

@lazappi
Copy link
Author

lazappi commented Jan 17, 2018

@LTLA I figured that would probably be the case, just thought I would bring it up as it took me a while to work out that was the problem.

Would it be worth changing the default data for tests to the sparse version to find any other places there are problems? I figure anything that works on a sparse matrix will probably work on a regular one (but maybe not?).

@LTLA
Copy link
Collaborator

LTLA commented Jan 17, 2018

That's not a bad idea, and I'm sort of doing it anyway - I have been adding more tests to the package ever since I got full write access on Tuesday. However, most of the remaining failure points are located in the various plotting functions that I don't personally use, so they've been less of a priority (as I would have had to write C++ thingies to replace them). Should be a quick fix with DelayedMatrixStats.

@PeteHaitch
Copy link
Contributor

PeteHaitch commented Jan 17, 2018

@LTLA My hesitance is that an ANY method means people can pass in any old junk into it. Of course, it'll eventually error in that case, but it can be a bit cryptic to unpack where the error arose. From memory, I also hit some S4 inheritance complications when defining a ANY method for matrixStats functionality but I don't recall the details off hand.

Is it mostly an (understandable) annoyance having to wrap the input in a DelayedArray()? My understanding is calling DelayedArray() should be very cheap but there may be a performance hit I'm unaware of. If you wrap an ordinary matrix in a DelayedArray() it will still defer to the fast matrixStats implementation of e.g, rowMedians(), when DelayedMatrixStats is loaded so the effect should be negligible.

@LTLA
Copy link
Collaborator

LTLA commented Jan 17, 2018

Oh, speed isn't an issue. It's just that I'm sure I'll forget to wrap it in a DelayedArray() 50% of the time.

One option is to do something like this in your rowMedians.R source file (for example):

for (classname in c("dgCMatrix", "other acceptable classes")) {
    setMethod("rowMedians", classname, function(x, ...) {
         rowMedians(DelayedArray(x), ...)
    })
}

... which gives us what we need while still raising an error for weird crap. Mind you, this will only work for classes that are pretty standard (i.e., the Matrix classes). People who define their own classes are probably on their own; though one could define a helper function in DelayedMatrixStats that can be called in other packages to automatically define the above methods for custom classes.

@PeteHaitch
Copy link
Contributor

I'll revisit this when I return to DelayedMatrixStats; now that I've written the initial package, I think there are many places where inheritance can be simplified and repeated code can be eliminated.

Keeping up to date with "other acceptable classes" is a bit annoying. And DelayedMatrixStats is really designed for DelayedMatrix objects, rather than their seed classes. Ideally, seed-specific methods would go in their parent packages (e.g., rowMedians,dgCMatrix-method in Matrix), all of matrixStats functionality in base R, but this is complicated. And I'm pretty sure I'm hypocritically already doing such shenanigans of defining seed-specific methods in DelayedMatrixStats (but I'd like to reduce this).

@LTLA
Copy link
Collaborator

LTLA commented Jan 18, 2018

If you think keeping up with acceptable classes is annoying at the R level, spare a thought for how I'm going to have to deal with requests for extending beachmat!

Anyway - fine - I'll just wrap the damn thing in a DelayedArray. grumble. I doubt the Matrix maintainers will ever implement these methods though - I sent them a bug report about a segmentation fault in their QR decomposition and they still haven't dealt with it. Don't know if anyone cares about it over there.

@PeteHaitch
Copy link
Contributor

I've made an issue to remind myself to revisit this (PeteHaitch/DelayedMatrixStats#37). It's annoying, I know.

I'm also doubtful of seeing changes in base R or *Matrix, but I'd certainly like to implement a proper colMedians()/rowMedians() (and more) for sparse matrix. If we can't get them into Matrix then perhaps DelayedMatrixStats is the best home we can hope for?

@LTLA
Copy link
Collaborator

LTLA commented Jan 18, 2018

Maybe they just delete emails from whiny postdocs, and perhaps we'd have a better chance if Martin M. or Michael L. or someone closer to the R core team contacted them about this instead.

@LTLA
Copy link
Collaborator

LTLA commented Jan 18, 2018

All matrixStats instances have been replaced with their DelayedMatrixStats equivalents in b615be7.

plotQC now works as expected with a sparse matrix. plotRLE should also do so, provided you specify a colour_by argument. Admittedly this is not entirely apparent from the documentation.

There is a strong case for completely overhauling the plotting functions, which are currently very fragile to the different input types. I blame the use of ggplot2 and reshape2 and all that crap.

@Fougere87
Copy link

Hello,
I still have this issue trying to use plotExplanatoryVariables, Are there any turnarounds ???
Thamks !

@LTLA
Copy link
Collaborator

LTLA commented Jun 5, 2018

Make sure you're using the latest version of scater. For example, the following works for me:

library(scater)
data("sc_example_counts")
data("sc_example_cell_info")

library(Matrix)
spmat <- as(sc_example_counts, "dgCMatrix")
example_sce <- SingleCellExperiment(
     assays = list(counts = spmat), 
     colData = sc_example_cell_info)

example_sce <- normalize(example_sce)
example_sce <- calculateQCMetrics(example_sce)
vars <- names(colData(example_sce))[c(2:3, 5:14)]
plotExplanatoryVariables(example_sce, variables=vars)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants