
Question about time of cell fraction estimation (RLR) #14

Open
QiongZhao opened this issue Nov 26, 2024 · 2 comments

Comments

@QiongZhao

Hi Weixu,

I am using ENIGMA to estimate cell fractions and then cell type-specific gene expression. I created an ENIGMA object using the 'single_cell' type, where the bulk is a matrix with 971 samples and 28,285 genes, and the reference comes from 10 other samples, containing 12,506 cells and 28,285 genes. However, when I use this ENIGMA object with the RLR method to estimate cell fractions, the program reports that it needs three days to run (ncores=5). I would like to ask whether this is normal.

At the same time, I tried selecting 20-1000 genes from the reference to get results faster, but an error occurs: Error in rlm.default(x, y, weights, method = method, wt.method = wt.method, : 'x' is singular: singular fits are not implemented in 'rlm'.
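For what it's worth, I suspect this error means the reduced reference (signature) matrix handed to rlm becomes rank-deficient after subsetting so few genes, e.g. two cell-type profiles end up collinear, or fewer informative genes remain than cell types. A small sketch of that check (NumPy stand-in, since I can test it outside R; the signature matrix here is entirely hypothetical):

```python
import numpy as np

# Hypothetical signature matrix: 20 genes x 5 cell types.
rng = np.random.default_rng(0)
signature = rng.poisson(lam=5.0, size=(20, 5)).astype(float)

# Duplicate one cell-type column to mimic collinear profiles,
# which is the situation that makes rlm's design matrix singular.
signature[:, 4] = signature[:, 3]

rank = np.linalg.matrix_rank(signature)
# rank is now strictly less than the number of cell-type columns,
# so a robust regression on this design matrix cannot be fit.
print(rank, signature.shape[1])
```

If the rank is below the number of cell types, adding more marker genes (or removing near-duplicate cell-type profiles) should make the fit possible again.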

Kind regards,

Zhao

@WWXkenmo
Owner

WWXkenmo commented Dec 3, 2024

Hey,

Sorry for the late response; I have been busy with my submission these weeks.

This is definitely not normal. How exactly are you implementing the deconvolution? Could you share your code, please?

All the best,
Weixu

@QiongZhao
Author

Hi Weixu,
I have already found the answer to my previous question: I was working with single-cell format data and had not performed batch effect correction.

I have a question about the normalization method in the ENIGMA_L2_max_norm function for deconvolution.

  1. Both my single-cell (reference) and bulk data are already TPM-normalized.
  2. I noticed the function has a do_cpm parameter, which defaults to TRUE.
  3. Through my experiments, I found that when the data is TPM input, setting do_cpm=T and preprocess='log' yields the best results.
  4. I am confused about whether the do_cpm parameter would re-normalize the TPM-transformed reference data using CPM, and whether it would apply a log transformation to the TPM bulk input. Moreover, I couldn't find the implementation of the do_cpm parameter in the source code.
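For reference, here is my understanding of what that parameter combination would do arithmetically, under the assumption (which I could not confirm in the source) that do_cpm rescales each sample to counts per million and preprocess='log' applies a log transform with a pseudocount. A sketch (NumPy stand-in for the R code; the expression matrix is made up):

```python
import numpy as np

# Hypothetical expression matrix: 4 genes x 2 samples (raw counts).
counts = np.array([[10.0, 0.0],
                   [90.0, 50.0],
                   [300.0, 150.0],
                   [600.0, 800.0]])

# CPM: rescale each sample (column) so it sums to one million.
cpm = counts / counts.sum(axis=0, keepdims=True) * 1e6

# 'log' preprocessing: log-transform with a pseudocount of 1.
log_cpm = np.log2(cpm + 1)

# On data that is already TPM-normalized, the CPM step would be an
# identity up to rounding, because each column already sums to 1e6.
tpm = counts / counts.sum(axis=0, keepdims=True) * 1e6
assert np.allclose(tpm / tpm.sum(axis=0, keepdims=True) * 1e6, tpm)
```

If this reading is right, running do_cpm=T on TPM input would not distort the data, only the log step would change it, which might explain why that setting still worked well in my experiments.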

Could you kindly provide some advice on the following:

  1. What are the most reasonable parameter settings when my reference and bulk data are already TPM-normalized?

  2. Should I instead use raw count data and then set do_cpm=T and preprocess='log'?

  3. Do the reference and bulk data need to be normalized using exactly the same method?

I would greatly appreciate your insights and guidance.

Thank you very much for your help!

Best regards,
Qiong Zhao
