[WIP] Mp/dispersion smoothing #145
base: development
Conversation
Not sure what the unfinished tasks' status is, but the PR as it stands looks good. I would say unit-tests are a must but that's up to David at the end of the day.
```python
if isinstance(genes_log_gmean, dask.array.core.Array):
    genes_log_gmean = genes_log_gmean.compute()
```
why?
Because I had trouble with this downstream. It was easier to convert to a dense numpy array beforehand. I will check again to see whether I can delay this further.
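A minimal sketch of the pattern under discussion: materializing a possibly-lazy dask array into a dense numpy array before numpy-only downstream code. The helper name `to_numpy` and the example array are hypothetical, not part of the PR.

```python
import numpy as np
import dask.array as da

def to_numpy(arr):
    # Hypothetical helper: materialize a dask array to a dense
    # numpy array; pass numpy input through unchanged.
    if isinstance(arr, da.Array):
        return arr.compute()
    return np.asarray(arr)

# Example: a lazy log-geometric-mean-style array (illustrative values).
genes_log_gmean = da.log(da.from_array(np.array([1.0, 2.0, 4.0]), chunks=2))
dense = to_numpy(genes_log_gmean)  # dense numpy array, safe for numpy-only code
```

Delaying the `.compute()` further (as suggested above) would keep the graph lazy longer, at the cost of making sure every downstream consumer accepts dask arrays.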
```python
genes_log_gmean = genes_log_gmean.compute()

# specify which kind of regularization is performed
scale_param = np.exp(self.model_container.theta_scale[0])  # TODO check if this is always correct
```
why an exponentiation here? is this meant to be link-function specific?
I'm not sure how to resolve this (thus the TODO). Indeed, I think this depends on the noise model. I will have to look at that again.
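For context on why this would be link-function specific: if the noise model fits the scale (dispersion) parameter through a log link, the fitted coefficient lives on log scale and `exp()` inverts the link; under an identity link no transform would be needed. A hedged sketch (the variable names here are illustrative, not the PR's actual model container):

```python
import numpy as np

# Hypothetical fitted scale coefficients on the linker (model) scale.
theta_scale = np.array([0.5, -1.0])

# Log link: theta_scale = log(dispersion), so exp() recovers
# the natural-scale parameter -- this is the exp() in question.
scale_param_log_link = np.exp(theta_scale)

# Identity link: the coefficient already is the natural-scale
# parameter, so exponentiating would be wrong.
scale_param_identity = theta_scale
```

This is why resolving the TODO likely means dispatching on the noise model's (or link's) inverse link function rather than hard-coding `np.exp`.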
Hi, "implement edgeR approach" is actually a very important use case for batchGLM. We are planning to use batchGLM/diffxpy for a pure Python implementation of MILO, and we promised to be able to replace it 1:1 eventually. I would kindly ask you to strongly consider implementing this as well. Having the edgeR and DESeq2 approaches implemented here will also greatly boost the impact; I have no doubt about this.
Most definitely. I had a look at the edgeR source code already. It shouldn't be too complicated to transfer this to batchGLM. I will start implementing this tomorrow, but cannot give a time estimate at this point. I think the main part would be to take over
Amazing @picciama
Would it be too crazy to just compare, for example, the DE results of the original edgeR and the new Python reimplementation on a small dataset/simulation? I know that the edgeR model does a lot, but this might be the eventual goal? Thank you!
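One way such a comparison could be sketched: export p-values from both implementations on the same small simulation and check that they agree up to numerical tolerance. The helper below is purely hypothetical (not part of batchGLM or edgeR), just to illustrate the shape of the check.

```python
import numpy as np

def compare_de_results(pvals_edger, pvals_python):
    # Hypothetical comparison helper: measure agreement between two
    # DE p-value vectors from the same genes, in the same order.
    lp_a = -np.log10(np.asarray(pvals_edger))
    lp_b = -np.log10(np.asarray(pvals_python))
    # Correlation of -log10 p-values tolerates small numeric drift
    # while still catching systematic disagreement.
    corr = np.corrcoef(lp_a, lp_b)[0, 1]
    max_abs_diff = np.max(np.abs(lp_a - lp_b))
    return corr, max_abs_diff

# Illustrative values only, not real results from either implementation.
corr, max_abs_diff = compare_de_results([1e-5, 0.2, 0.9], [1.2e-5, 0.21, 0.88])
```

An exact 1:1 match is unlikely across languages (different optimizers, float ordering), so a tolerance-based check like this is probably the realistic target.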
This branch contains the dispersion-smoothing functionality:
Optional TODOs: