
Question about time of cell fraction estimation (RLR) #14

Open
QiongZhao opened this issue Nov 26, 2024 · 2 comments

Comments

@QiongZhao

Hi Weixu,

I am using ENIGMA to estimate cell fractions and then cell type-specific gene expression. I created an ENIGMA object using the 'single_cell' type, where the bulk is a matrix with 971 samples and 28,285 genes, and the reference comes from 10 other samples, containing 12,506 cells and 28,285 genes. However, when I use this ENIGMA object with the RLR method to estimate cell fractions, the program reports that it needs three days to run (ncores=5). I would like to ask whether this is normal.

At the same time, I tried selecting 20-1000 genes from the reference to get results faster, but an error occurs: Error in rlm.default(x, y, weights, method = method, wt.method = wt.method, : 'x' is singular: singular fits are not implemented in 'rlm'.
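For what it's worth, I suspect this error means the reduced reference (signature) matrix handed to rlm becomes rank-deficient after subsetting so few genes, e.g. two cell-type profiles end up collinear, or fewer informative genes remain than cell types. A small sketch of that check (NumPy stand-in, since I can test it outside R; the signature matrix here is entirely hypothetical):

```python
import numpy as np

# Hypothetical signature matrix: 20 genes x 5 cell types.
rng = np.random.default_rng(0)
signature = rng.poisson(lam=5.0, size=(20, 5)).astype(float)

# Duplicate one cell-type column to mimic collinear profiles,
# which is the situation that makes rlm's design matrix singular.
signature[:, 4] = signature[:, 3]

rank = np.linalg.matrix_rank(signature)
# rank is now strictly less than the number of cell-type columns,
# so a robust regression on this design matrix cannot be fit.
print(rank, signature.shape[1])
```

If the rank is below the number of cell types, adding more marker genes (or removing near-duplicate cell-type profiles) should make the fit possible again.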

Kind regards,

Zhao

@WWXkenmo
Owner

WWXkenmo commented Dec 3, 2024

Hey,

Sorry for the late response; I have been busy with my submission these weeks.

This is definitely not normal. How exactly are you implementing the deconvolution? Could you share your code, please?

All the best,
Weixu

@QiongZhao
Author

Hi Weixu,
I have already found the answer to my previous question: I was working with single-cell format data and had not performed batch effect correction.

I have a question about the normalization method in the ENIGMA_L2_max_norm function for deconvolution.

  1. Both my single-cell (reference) and bulk data are already TPM-normalized.
  2. I noticed the function has a do_cpm parameter, which defaults to TRUE.
  3. Through my experiments, I found that when the data is TPM input, setting do_cpm=T and preprocess='log' yields the best results.
  4. I am confused about whether the do_cpm parameter would re-normalize the TPM-transformed reference data using CPM, and whether it would apply a log transformation to the TPM bulk input. Moreover, I couldn't find the implementation of the do_cpm parameter in the source code.
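For reference, here is my understanding of what that parameter combination would do arithmetically, under the assumption (which I could not confirm in the source) that do_cpm rescales each sample to counts per million and preprocess='log' applies a log transform with a pseudocount. A sketch (NumPy stand-in for the R code; the expression matrix is made up):

```python
import numpy as np

# Hypothetical expression matrix: 4 genes x 2 samples (raw counts).
counts = np.array([[10.0, 0.0],
                   [90.0, 50.0],
                   [300.0, 150.0],
                   [600.0, 800.0]])

# CPM: rescale each sample (column) so it sums to one million.
cpm = counts / counts.sum(axis=0, keepdims=True) * 1e6

# 'log' preprocessing: log-transform with a pseudocount of 1.
log_cpm = np.log2(cpm + 1)

# On data that is already TPM-normalized, the CPM step would be an
# identity up to rounding, because each column already sums to 1e6.
tpm = counts / counts.sum(axis=0, keepdims=True) * 1e6
assert np.allclose(tpm / tpm.sum(axis=0, keepdims=True) * 1e6, tpm)
```

If this reading is right, running do_cpm=T on TPM input would not distort the data, only the log step would change it, which might explain why that setting still worked well in my experiments.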

Could you kindly provide some advice on the following:

  1. What are the most reasonable parameter settings when my reference and bulk data are already TPM-normalized?

  2. Should I instead use raw count data and then set do_cpm=T and preprocess='log'?

  3. Do the reference and bulk data need to be normalized using exactly the same method?

I would greatly appreciate your insights and guidance.

Thank you very much for your help!

Best regards,
Qiong Zhao
