Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CTS-DE Analysis #11

Open
GUnit-Lab opened this issue Feb 24, 2023 · 4 comments
Open

CTS-DE Analysis #11

GUnit-Lab opened this issue Feb 24, 2023 · 4 comments

Comments

@GUnit-Lab
Copy link

Thank you so much for designing this wonderful program. For the Cell Type Specific Differential Expression (CTS-DE) Analysis could you please offer more context for setting this up after normalizing the ENIGMA? In vignette the analysis is performed by:
p <- 100
y <- as.numeric(gl(2, p/2)) - 1
DEG <- FindCSE_DEG(egm, y)

How are the values and formula for 'p' and 'y' determined? Doing 'Help?' for FindCSE_DEG does not define 'y'

Currently I cannot get this step to work and get error:

"CSE estimates and y have different length"
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'object' in selecting a method for function 'coef': variable lengths differ (found for 'Epithelium')

@WWXkenmo
Copy link
Owner

WWXkenmo commented Mar 9, 2023

Hey,

Thanks for using our tool, and sorry for my late response as I went to Italy for vacation in past two weeks.

The provided exampleDataset is used for only run for demo, it's a simulation dataset so we know that y is [1,1,1...,1,0,0,0,...,0] vector.

Here y is phenotype vector, in real case, suppose we have three samples with sample-1,2,3, and the phenotype you interest with is ['control', 'case', 'control']. Then we encoded the phenotype vector as [0,1,0]. In this case, the positive ExpressionDifference means gene is up-regulated in 'case' sample.

Following above way to provide your y vector, p is not important and only used for demo.

Note: I updated the package, when run ENIGMA_trace_norm, please use following function

egm = ENIGMA_trace_norm(egm, do_cpm = FALSE, solver = "proximalpoint",pos=FALSE,preprocess = "none",alpha = 0.1,verbose = TRUE)

the newest added parameter do_cpm means whether we need to perform cpm normalisation perform deconvolution. in our simulation dataset, because gene expression is sampled from normal distribution, so you could regard the input gene expression matrix as Z-score transformed matrix, then perform deconvolution. Therefore any preprocessing is unnecessary, so do_cpm = FALSE, pos = FALSE (convert the resulted gene expression matrix is positive in all entries) and preprocess = "none" (don't need log or sqrt transformation)

Best,
Ken

@GUnit-Lab
Copy link
Author

Hi Ken,
Thank you for the added explanation. So in your new example above I would need to do:
y <- as.numeric(gl(0,1,0))
DEG <- FindCSE_DEG(egm, y)

@WWXkenmo
Copy link
Owner

No, gl is a function used to generate factor level, what you need is just

y = c(0,1,0)

in my provided case

Best,
Ken

@GUnit-Lab
Copy link
Author

GUnit-Lab commented Apr 6, 2023

Yay thank you so much, this corrected the initial issue for me and now my 3 control and 3 case samples are entered correctly. But with:

y <- c(0,0,0,1,1,1)
DEG <- FindCSE_DEG(egm_tn, y)

I now have new issue I cannot figure out how to solve:
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'object' in selecting a method for function 'coef': Argument mu must be a nonempty numeric vector

I looked at the traceback:
15.
h(simpleError(msg, call))
14.
.handleSimpleError(function (cond)
.Internal(C_tryCatchHelper(addr, 1L, cond)), "Argument mu must be a nonempty numeric vector",
base::quote(family$linkfun(mustart)))
13.
family$linkfun(mustart)
12.
etastart %||% {
if (!is.null(start))
if (length(start) != nvars)
stop(gettextf("length of 'start' should equal %d and correspond to initial coefs for %s", ...
11.
glm.fit(x = structure(numeric(0), dim = c(0L, 14L), dimnames = list(
NULL, c("(Intercept)", "Epithelium", "Glans.Mesenchyme",
"Preputial.Mesenchyme", "Proliferating.Mesenchyme", "Glans.Distal.Dorsal",
"Glans.Distal.Ventral.Late", "Glans.Proximal", "Glans.Proximal.Late", ...
10.
eval(call(if (is.function(method)) "method" else method, x = X,
y = Y, weights = weights, start = start, etastart = etastart,
mustart = mustart, offset = offset, family = family, control = control,
intercept = attr(mt, "intercept") > 0L, singular.ok = singular.ok))
9.
eval(call(if (is.function(method)) "method" else method, x = X,
y = Y, weights = weights, start = start, etastart = etastart,
mustart = mustart, offset = offset, family = family, control = control,
intercept = attr(mt, "intercept") > 0L, singular.ok = singular.ok))
8.
glm(y ~ ., data = data.frame(t(x)), family = "binomial")
7.
summary(glm(y ~ ., data = data.frame(t(x)), family = "binomial"))
6.
coef(summary(glm(y ~ ., data = data.frame(t(x)), family = "binomial")))
5.
FUN(array(newX[, i], d.call, dn.call), ...)
4.
apply(A, 1, function(x) {
pval = coef(summary(glm(y ~ ., data = data.frame(t(x)), family = "binomial")))[,
4]
return(get_pval(pval, cell_type, K)) ...
3.
test(O, y, covariate)
2.
DEG_test(Exp, y, covariate)
1.
FindCSE_DEG(egm_tn, y)

I cannot figure out how to correct this issue with mu.
Do you have ideas what I have done wrong and suggestions how I can fix?
Thank you kindly
Victor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants