tascCODA to analyse compositional changes in scRNAseq between case and control (taking in account covariates as age and sex) #3

Open
mohebg opened this issue Nov 28, 2022 · 6 comments

Comments

@mohebg

mohebg commented Nov 28, 2022

Hi, Good day, thank you for the nice package.

I have some questions about how to use tascCODA to regress out covariates such as age and sex when assessing compositional changes between case and control in scRNA-seq data.

In your paper you state:
"More generally, however, tascCODA enables to determine how host phenotype, such as disease status, host covariates such as age, gender, or an individual’s demographics, or environmental factors jointly influence the compositional counts"

Should the formula be written like this?

tree_mod = ana.CompositionalAnalysisTree(
    datax.copy(),
    reference_cell_type="automatic",
    formula="PATH+age+sex",
    reg="scaled_3",
    pen_args={"phi": 0, "lambda_1": 1.7}
)

  • "PATH" is the metadata with "Case" vs "Control" labels.
  • Would making the formula "PATH+age+sex" regress out age/sex in the case vs control comparison?
  • I also wanted to ask what the following arguments mean:
    reg="scaled_3"
    pen_args={"phi": 0, "lambda_1": 1.7}

Thank you very much in advance.

Best
Moheb

@johannesostner
Member

Hi @mohebg,
thanks for your interest in tascCODA!

The "formula" parameter determines, like in R's lm function, which covariates are considered for modeling. Currently, tascCODA performs model selection for all covariates in the formula, meaning that we check whether effects are significant for every covariate/tree node pair. At the moment it is not possible to adjust for a covariate without also running model selection on it, although a future update might allow this.
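
To make the role of the formula concrete, here is a hand-rolled sketch of the columns that a formula such as "PATH+age+sex" would typically expand to: a 0/1 indicator for each two-level categorical covariate and the numeric covariate as given. This is not tascCODA's own code. The sample values, the helper name design_row, and the choice of "Control" and "F" as baseline levels are all assumptions made for illustration.

```python
# Hypothetical sample-level metadata; the values are invented for illustration.
samples = [
    {"PATH": "Case", "age": 61, "sex": "M"},
    {"PATH": "Control", "age": 45, "sex": "F"},
    {"PATH": "Case", "age": 70, "sex": "F"},
]


def design_row(sample, path_baseline="Control", sex_baseline="F"):
    """Return [PATH indicator, age, sex indicator] for one sample.

    The baseline levels are assumptions; a real formula parser picks its own.
    """
    return [
        0 if sample["PATH"] == path_baseline else 1,  # categorical -> 0/1
        sample["age"],                                # numeric, left as-is
        0 if sample["sex"] == sex_baseline else 1,    # categorical -> 0/1
    ]


# One row per sample, one column per covariate in the formula.
design = [design_row(s) for s in samples]
for row in design:
    print(row)
```

Each column is one covariate, and model selection is run on every column for every tree node, so adding age and sex to the formula adds two more sets of candidate effects rather than two fixed adjustment terms.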

Regarding the other arguments, you can ignore the reg parameter. This is only needed for switching between earlier versions of the tree-aggregated penalization scheme. The one described in the paper is "reg_3", which is also the default.
With the pen_args parameter, you can set the phi (aggregation bias) and lambda_1 (regularization strength) values, like they are described in the paper.

I hope that this answers your questions!

@mohebg
Author

mohebg commented Nov 28, 2022

Hi @johannesostner ,

Thanks a lot for your prompt reply.
According to my understanding, the best practice for adjusting for a covariate (or the statistical elimination of a covariate) is to simply add the covariate to the linear model.
As you stated, the formula is R-style, so in order to regress out age and sex, should the formula be written like this: formula="PATH+age+sex"?

  • "PATH" is the metadata with "Case" vs "Control" labels.

So, would making the formula "PATH+age+sex" regress out age/sex in the case vs control comparison?

Thanks a lot

@johannesostner
Member

Yes, just add the covariate to the model. That's what I would do as well. As I said earlier, this does not "regress out" age/sex, but tascCODA will try to find significant impacts of age/sex and adjust for them accordingly. If age/sex don't have a significant impact on the composition, they also won't be adjusted for.
In that regard, it's not a standard adjustment for the covariates.

@johannesostner
Member

Also, please make sure that all covariates are scaled to the same range (i.e. [0-1]), as the selection of significant associations will otherwise be biased.

@mohebg
Author

mohebg commented Nov 29, 2022

@johannesostner, thanks a lot for your reply, I appreciate it.
I am not sure if I fully understand the sentence "covariates are scaled to the same range (i.e. [0-1])".

I have three covariates:

  • pathology vs control - which is a categorical covariate
  • male vs female - which is a categorical covariate
  • age is a numeric continuous covariate

@johannesostner
Member

Just make sure that age is also scaled to a range between 0 and 1 (e.g. via min-max scaling, as we did in the microbiome application in our paper). Otherwise, because the range of age is so much larger than that of the categorical covariates (which will be encoded as 0/1), the effects for age will be numerically very small and thus never selected as significantly different from 0.
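
A minimal sketch of min-max scaling in plain Python, together with the arithmetic behind the warning about small effects. The ages, the helper name min_max_scale, and the total shift of 0.5 are invented for illustration; in practice the covariate would be a column in the sample metadata of your data object.

```python
def min_max_scale(values):
    """Linearly rescale numbers so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant covariate carries no information; avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical ages (in years) for six samples.
ages = [34, 51, 47, 68, 72, 59]
age_scaled = min_max_scale(ages)

# After scaling, age spans the same range as a 0/1-encoded categorical covariate.
print(min(age_scaled), max(age_scaled))

# Why this matters: a coefficient is an effect per unit of the covariate.
# The same total shift across the observed age range needs a coefficient
# (max - min) times smaller on raw years than on the scaled covariate.
total_shift = 0.5  # assumed effect between youngest and oldest sample
coef_scaled = total_shift / (max(age_scaled) - min(age_scaled))
coef_raw = total_shift / (max(ages) - min(ages))
print(coef_scaled, coef_raw)
```

If the covariates are held in a pandas DataFrame, the same transformation for a column s is (s - s.min()) / (s.max() - s.min()). The scaled column, not the raw one, is then what the formula should refer to.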
