Reducing dataset size to improve run time #223
Hi @fluentin44, We generally recommend supplying the entire count matrix (if possible given memory requirements) and then specifying the genes you would like to fit using the `genes` argument of `fitGAM`. Hope this helps.
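For illustration, a minimal sketch of that recommendation (not taken from the thread; the object names `counts`, `crv`, and `top2k` are hypothetical, and a slingshot trajectory is assumed):

```r
library(tradeSeq)

# counts: full genes-by-cells count matrix (all genes kept for normalization)
# crv:    a fitted slingshot trajectory object
# top2k:  character vector of the 2,000 genes you actually want to fit
sce <- fitGAM(counts = counts,
              sds    = crv,
              genes  = top2k,  # restrict model fitting to these genes only
              nknots = 6)
```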
Ok, much appreciated! Thanks, Matt
Hi,
- Focusing on the 2K genes: it seems that subsetting the counts prior to `fitGAM` and providing the full counts with `genes = 2kgenes` give different results. Is that possible? Is there anything else happening, aside from normalization, that includes information from other genes during the fitting?
- In the case that genes are scored as highly variable as a consequence of capturing differences among lineages, wouldn't that be a source of bias during normalization?

Thanks a lot
Yes, that is possible. If you first subset the counts to the 2K genes and then run `fitGAM`, the normalization offsets are computed from only those 2K genes, which can lead to different results than supplying the full count matrix together with the `genes` argument.
If there are large systematic differences between the groups you are comparing, this can indeed be an issue in normalization. In tradeSeq, we are relying on TMM normalization as described here. One of the main assumptions is that the majority of genes are not differentially expressed. I would advise against only providing the subsetted count matrix to `fitGAM`.
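As an illustration of why the subsetted matrix can shift results (a sketch only; `counts` and `top2k` are hypothetical objects, and edgeR's `calcNormFactors` is used here just to compare TMM scaling factors on the two inputs):

```r
library(edgeR)

# TMM scaling factors from the full matrix vs. from only the 2K-gene subset
nf_full   <- calcNormFactors(DGEList(counts = counts))$samples$norm.factors
nf_subset <- calcNormFactors(DGEList(counts = counts[top2k, ]))$samples$norm.factors

# Non-zero differences here translate into different offsets in the fitted GAMs
summary(nf_full - nf_subset)
```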
Hi,
I have a dataset of ~25k cells and 130 samples, so computation time and memory to run `fitGAM` are going to be an issue for me. With respect to that, I have seen recommendations to reduce the number of genes put into the function to just the top 2k variable features. However, can I clarify: does that mean reducing the whole counts matrix down to 2k features, or keeping the whole counts matrix and passing the names of the top 2k variable features to the `genes` argument?
Thanks,
Matt
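For context, one hedged way to set this up (not prescribed in this thread): select the top 2k highly variable genes with scran, keep the full count matrix, and pass only the gene names to `genes`, optionally parallelizing the fit. The object names (`sce`, `crv`) and the choice of scran for feature selection are assumptions:

```r
library(SingleCellExperiment)
library(scran)
library(tradeSeq)
library(BiocParallel)

# sce: SingleCellExperiment with counts and logcounts; crv: slingshot fit
dec   <- modelGeneVar(sce)
top2k <- getTopHVGs(dec, n = 2000)

gamFit <- fitGAM(counts   = counts(sce),  # full matrix, so normalization sees all genes
                 sds      = crv,
                 genes    = top2k,         # but only these genes are fitted
                 parallel = TRUE,
                 BPPARAM  = MulticoreParam(workers = 4))
```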