Hi Weixu,
I am using ENIGMA to estimate cell fractions, which I then use to estimate cell type-specific gene expression. I created an ENIGMA object of type 'single_cell': the bulk matrix has 971 samples and 28,285 genes, and the reference comes from 10 other samples, with 12,506 cells and 28,285 genes. However, when I run cell-fraction estimation on this object with the RLR method, the program indicates that it will take about three days to finish (ncores=5). Is this normal?
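For intuition about where the runtime goes, here is a minimal NumPy sketch of what a robust-regression (RLR) deconvolution step can look like. It assumes one Huber-weighted iteratively reweighted least squares (IRLS) fit per bulk sample, with negative coefficients clipped to zero and the remaining ones rescaled to sum to one. The function names are hypothetical and this is not ENIGMA's code; its R implementation calls `rlm`, as the error message in this issue shows.

```python
import numpy as np

def huber_irls(X, y, k=1.345, n_iter=50, tol=1e-8):
    """Huber M-estimate of y ~ X via iteratively reweighted least squares.

    X: genes x cell types reference profiles; y: one bulk sample (length = genes).
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares start
    for _ in range(n_iter):
        r = y - X @ beta
        s = np.median(np.abs(r)) / 0.6745  # MAD-style robust scale of the residuals
        if s == 0:
            break  # perfect fit, nothing to reweight
        u = np.abs(r) / s
        w = k / np.maximum(u, k)  # Huber weights: 1 when u <= k, k/u otherwise
        sw = np.sqrt(w)
        beta_new, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


def estimate_fractions(reference, bulk):
    """reference: genes x cell types; bulk: genes x samples -> cell types x samples."""
    fractions = np.zeros((reference.shape[1], bulk.shape[1]))
    for j in range(bulk.shape[1]):  # one independent robust fit per bulk sample
        beta = np.clip(huber_irls(reference, bulk[:, j]), 0.0, None)  # no negative fractions
        total = beta.sum()
        if total > 0:
            fractions[:, j] = beta / total  # rescale so each sample sums to one
    return fractions


# Usage with made-up data: 200 genes, 4 cell types, 5 bulk samples
rng = np.random.default_rng(0)
ref = rng.gamma(2.0, 1.0, size=(200, 4))
true = rng.dirichlet(np.ones(4), size=5).T  # 4 cell types x 5 samples
bulk = ref @ true + rng.normal(0.0, 0.01, size=(200, 5))
est = estimate_fractions(ref, bulk)
```

Because every sample is an independent fit, the cost in this sketch grows linearly with the number of bulk samples (971 here) and with the number of genes in each fit. Restricting the reference to informative genes is therefore one natural lever for speed.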
I also tried restricting the reference to subsets of 20 to 1,000 genes to get results faster, but this fails with the following error:
Error in rlm.default(x, y, weights, method = method, wt.method = wt.method, : 'x' is singular: singular fits are not implemented in 'rlm'.
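`rlm` raises this error when its design matrix is rank-deficient, meaning that on the selected genes some cell-type columns of the reference are linear combinations of others. A small gene subset can get there in several ways: fewer genes than cell types, a cell type with zero expression on every selected gene, or two cell types with proportional profiles. Assuming the regression uses a genes × cell types reference profile matrix, the hypothetical helper below (not part of ENIGMA) checks its rank before fitting:

```python
import numpy as np

def reference_is_full_rank(reference, cell_types):
    """Check a genes x cell-types reference matrix before a regression fit.

    Returns (ok, rank, notes); ok is False when the cell-type columns are
    linearly dependent, the situation rlm reports as "'x' is singular".
    """
    n_genes, n_types = reference.shape
    rank = int(np.linalg.matrix_rank(reference))
    notes = []
    if n_genes < n_types:
        notes.append(f"only {n_genes} genes for {n_types} cell types")
    for i, name in enumerate(cell_types):
        if not np.any(reference[:, i]):
            notes.append(f"cell type {name!r} has an all-zero profile on these genes")
    return rank == n_types, rank, notes


# Made-up example: T and NK have identical profiles on these four genes
ref = np.array([[5.0, 5.0, 0.0],
                [1.0, 1.0, 3.0],
                [0.0, 0.0, 2.0],
                [2.0, 2.0, 1.0]])
ok, rank, notes = reference_is_full_rank(ref, ["T", "NK", "B"])
print(ok, rank)  # False 2
```

If the check fails, one option is to choose genes that separate every pair of cell types (for example, marker genes for each cell type) rather than an arbitrary subset.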
Kind regards,
Zhao
QiongZhao changed the title "Question about time of cell fraction rstimation (RLR)" to "Question about time of cell fraction estimation (RLR)" (Nov 26, 2024).
Hi Weixu,
I have already found the answer to my previous question. I am working with single-cell format data and have not performed batch effect correction.
I have a question about the normalization method used in the ENIGMA_L2_max_norm function for deconvolution.
Both my single-cell (reference) and bulk data are already TPM-normalized.
I noticed that the function has a do_cpm parameter, which defaults to TRUE.
In my experiments with TPM input, setting do_cpm=T and preprocess='log' gave the best results.
However, I am unsure whether do_cpm re-normalizes the already TPM-normalized reference data to CPM, and whether it also applies a log transformation to the TPM bulk data. I also could not find where the do_cpm parameter is implemented in the source code.
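The arithmetic of CPM scaling helps here, independently of how ENIGMA wires up `do_cpm`. CPM divides each sample (column) by its total and multiplies by 10^6, and a TPM column already sums to 10^6. So CPM-scaling a TPM matrix over its full gene set leaves it unchanged, and over a gene subset it only multiplies each column by one constant. A log step is a separate, non-linear transform; `log2(x + 1)` below is an assumed form, not necessarily what `preprocess='log'` does. A sketch with made-up data:

```python
import numpy as np

def cpm(mat):
    """Scale each column (sample) of a genes x samples matrix to sum to one million."""
    return mat / mat.sum(axis=0, keepdims=True) * 1e6

def log_transform(mat):
    """Assumed log preprocessing: log2(x + 1)."""
    return np.log2(mat + 1.0)

rng = np.random.default_rng(1)
counts = rng.poisson(20, size=(1000, 3)) + 1.0   # made-up counts, genes x samples
lengths = rng.uniform(0.5, 5.0, size=(1000, 1))  # made-up gene lengths (kb)
rate = counts / lengths
tpm = rate / rate.sum(axis=0, keepdims=True) * 1e6  # standard TPM definition

print(np.allclose(cpm(tpm), tpm))    # True: CPM of a full TPM matrix is a no-op
subset = tpm[:200]                   # e.g. keep only some genes
ratio = cpm(subset) / subset
print(np.allclose(ratio, ratio[0]))  # True: a single scale factor per column
```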
Could you kindly advise on the following?
1. What are the most reasonable parameter settings when my reference and bulk data are already TPM-normalized?
2. Should I instead start from raw count data and then set do_cpm=T and preprocess='log'?
3. Do the reference and bulk data need to be normalized with exactly the same method?
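On the last question, consider how a regression-based deconvolution works: it models each gene's bulk value as a combination of the reference profiles. A per-gene factor applied to only one side therefore changes the model, not just its scale. TPM divides by gene length before rescaling and CPM does not, so TPM and CPM computed from the same counts differ by a gene-specific factor proportional to 1/length. The sketch below uses made-up counts and gene lengths and only illustrates this arithmetic, not how ENIGMA handles mixed inputs:

```python
import numpy as np

rng = np.random.default_rng(2)
counts = rng.poisson(50, size=(500, 1)) + 1.0   # made-up counts, one sample
lengths = rng.uniform(0.5, 5.0, size=(500, 1))  # made-up gene lengths (kb)

cpm = counts / counts.sum(axis=0, keepdims=True) * 1e6
rate = counts / lengths
tpm = rate / rate.sum(axis=0, keepdims=True) * 1e6

ratio = (tpm / cpm).ravel()        # per-gene TPM / CPM for the same counts
scaled = ratio * lengths.ravel()   # constant if ratio is proportional to 1/length
print(ratio.max() > ratio.min())   # True: not one global constant
print(np.allclose(scaled, scaled[0]))  # True: ratio is proportional to 1/length
```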
I would greatly appreciate your insights and guidance.