Enquiry on downsampling to ensure balance of conditions along pseudotime for conditionTest #219

Open
JesseRop opened this issue Oct 24, 2022 · 2 comments

Comments

@JesseRop

Dear developers,

Thank you very much for this great tool.
I am running differential expression (DE) between two conditions that are unbalanced along pseudotime, and I am trying to understand whether downsampling is necessary, since it discards cells and therefore reduces power.

This is my original dataset

image

I have then downsampled to this

image

But I lose a lot of cells this way, and I have other similar datasets where downsampling also loses cells.

My question is whether I can run tradeSeq on the original dataset without downsampling, or whether the downsampling I applied to balance the conditions along pseudotime is the correct approach.

thanks,
Jesse

@HectorRDB
Collaborator

Hello,
There is no need to downsample. The covariance matrix of each condition's coefficients incorporates the number of samples into its uncertainty, and that uncertainty is also reflected in conditionTest.
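To illustrate the general point that coefficient uncertainty already reflects sample size, here is a minimal sketch in base R. It uses an ordinary linear model (`lm`) on simulated toy data as a stand-in, not tradeSeq's negative binomial GAM, and the helper `slope_se` is a hypothetical name introduced only for this example.

```r
set.seed(1)

# Hypothetical helper: simulate y = 1 + 2 * x + noise for n observations,
# fit a linear model, and return the standard error of the slope taken
# from the coefficient covariance matrix.
slope_se <- function(n) {
  x <- runif(n)
  y <- 1 + 2 * x + rnorm(n)
  fit <- lm(y ~ x)
  sqrt(diag(vcov(fit)))[["x"]]
}

se_small <- slope_se(50)    # few observations: wide uncertainty
se_large <- slope_se(5000)  # many observations: narrow uncertainty

print(c(small = se_small, large = se_large))
```

The group with fewer observations gets a larger standard error, so a test built on these covariance matrices weighs each group by how well it is estimated, without discarding data.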

In your case though, the red condition does not seem to follow a trajectory but rather to form very clear clusters. I would therefore be careful with the number of knots and be sure to test with a log-fold-change cutoff using the l2fc argument.

@JesseRop
Author

JesseRop commented Nov 12, 2022

Dear @HectorRDB ,

Many thanks for your response. It is very helpful!
Biologically we expect that the cluster in the red condition at the very terminal end (bottom right) is a much more developmentally advanced population than all the other cells.

I have been able to run fitGAM (with nknots = 6) and then conditionTest with a threshold of log2(1.25) in the l2fc argument.
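For readers following along, the two calls described here could look roughly like the sketch below. The wrapper `run_condition_test` and its inputs (`counts`, `pseudotime`, `cell_weights`, `condition`) are hypothetical names, not objects from this thread; the sketch assumes a genes-by-cells count matrix plus pseudotime and cell-weight matrices from a trajectory inference step, and that the tradeSeq package is installed.

```r
# Sketch only: all argument names on the wrapper are placeholders for
# the user's own data. tradeSeq is called via `::`, so the package is
# only needed when the function is actually run.
run_condition_test <- function(counts, pseudotime, cell_weights, condition,
                               nknots = 6, l2fc = log2(1.25)) {
  # Fit one smoother per condition (and lineage) for every gene.
  sce <- tradeSeq::fitGAM(
    counts      = counts,
    pseudotime  = pseudotime,
    cellWeights = cell_weights,
    conditions  = factor(condition),
    nknots      = nknots
  )
  # Test for differences between conditions, using a log2 fold-change
  # threshold rather than testing against zero difference.
  tradeSeq::conditionTest(sce, l2fc = l2fc)
}
```

The defaults mirror the values mentioned above (nknots = 6, l2fc = log2(1.25)); both are arguments so they can be varied when checking how sensitive the results are to the number of knots.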

Using plotSmoothers to visualize expression per gene, the smoother lines for each condition always seem to sit below the average expression of the first population in the left half of the plots. I have given two examples below. I have tried a range of knots (3-7), but it doesn't change much. Kindly advise on whether this is expected. Could it be due to the gap between the red condition populations?
Thanks!

image

image

Below are plots for the same genes generated in ggplot (+ geom_point() + geom_smooth(method = 'gam', formula = y ~ s(x, bs = "cs")))

image

image
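A self-contained sketch of the ggplot call quoted above, run on simulated data. The data frame `toy` and its columns are hypothetical, and the sketch assumes the ggplot2 and mgcv packages are installed. Note that `geom_smooth(method = "gam")` fits `mgcv::gam` with its default Gaussian family on the plotted values, whereas tradeSeq fits a negative binomial GAM, so the two sets of curves need not agree.

```r
library(ggplot2)  # geom_smooth(method = "gam") relies on mgcv being installed

set.seed(1)

# Hypothetical toy data: 400 cells, two conditions, counts rising
# along pseudotime.
toy <- data.frame(
  pseudotime = runif(400, min = 0, max = 10),
  condition  = rep(c("A", "B"), each = 200)
)
toy$expression <- rpois(400, lambda = exp(0.2 * toy$pseudotime))

# One smoother per condition, using the same formula as in the comment.
p <- ggplot(toy, aes(x = pseudotime, y = expression, colour = condition)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "gam", formula = y ~ s(x, bs = "cs"))

# print(p) draws the plot.
```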
