Looping over cell types to use as reference: discrepancy? #53
Comments
Hi Koen,
Since inference in I hope that this explains your observation! |
One additional remark: |
Thanks @johannesostner. Am I right to think that, as I increase the number of posterior samples, the between-run variability should decrease since the posterior is estimated with higher precision? |
As another sanity check, when looping over all cell types to use as a reference, I've set |
Thanks @koenvandenberge for pointing these things out! Yes, a larger number of posterior samples can decrease the between-run variability, but only up to a certain point. We observed some fluctuation in the significance of some cell types even after running longer chains before, but not to the extent that you describe here. As for your second comment, this is a result that surprises me. An FDR of 1 should definitely lead to all cell types (except the reference) being significant. I'm really curious what went wrong there. Could you please give me some info about your data (number of samples and cell types, the chain lengths you run, version of Thanks in advance! |
Thanks @johannesostner, this is incredibly useful. Unfortunately, this is sensitive data and I cannot share it as such. But I will try to make a reproducible example for this. I am using Acceptance rates of the runs mentioned above are quite similar each time, ranging from 52-60%. Hope to get back to you soon with a reproducible example. |
Apologies for the delay in getting back to you. Further, when re-evaluating the code when using Let me know if anything is unclear or something else would be required. Thanks in advance for taking a look.
Import in R
Analyze in Python
Set-up
Automatic selection of reference population
Here, we find no significant cell types, except for high FDR values.
Manual selection of reference cell type:
Hi @koenvandenberge! Thanks for taking the time to prepare a mock dataset. I've looked at the data and your analysis - here's what's going on:
Setting FDR to 0.4
As I mentioned earlier in the discussion, scCODA uses the posterior inclusion probabilities (PIPs) of the cell types to calculate a decision boundary. This boundary is placed such that the average rejection probability (1-PIP) of all selected cell types is less than the desired FDR. Taking 10 runs with
Looping over different references
Here, I feel like you misinterpreted the results. I ran your code (using each cell type as reference with FDR=1) and get this result, telling me that every cell type was selected in all runs but one:
I hope that this answers your questions. For reference, you can find my code in the attached notebook as well. If you have more questions, don't hesitate to continue this discussion!
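For readers following along, the decision rule described above can be sketched in plain Python. This is a minimal illustration and not scCODA's actual implementation: the function name `select_by_fdr` and the PIP values are made up, and the sketch assumes the boundary admits cell types in order of decreasing PIP for as long as the mean rejection probability (1-PIP) of the selected set does not exceed the target FDR.

```python
def select_by_fdr(pips, fdr):
    """Select cell types so that the mean (1 - PIP) of the selection stays <= fdr.

    `pips` maps cell type name -> posterior inclusion probability.
    Hypothetical helper for illustration; not part of scCODA.
    """
    # Consider the most confidently included cell types first.
    ranked = sorted(pips.items(), key=lambda kv: kv[1], reverse=True)
    selected = []
    rejection_sum = 0.0
    for name, pip in ranked:
        # Mean rejection probability if this cell type were added.
        candidate_mean = (rejection_sum + (1.0 - pip)) / (len(selected) + 1)
        if candidate_mean > fdr:
            # Later candidates have lower PIPs, so the mean can only grow.
            break
        rejection_sum += 1.0 - pip
        selected.append(name)
    return selected


# Made-up PIPs, purely illustrative.
example_pips = {"T cells": 0.9, "B cells": 0.55, "Dendritic cells": 0.3, "NK cells": 0.1}

strict = select_by_fdr(example_pips, 0.05)      # nothing passes
permissive = select_by_fdr(example_pips, 0.4)   # the two highest PIPs pass
everything = select_by_fdr(example_pips, 1.0)   # FDR = 1 admits every cell type
print(strict, permissive, everything)
```

Under this rule a single high-PIP cell type can "carry" a lower-PIP one past the boundary, and an FDR of 1 can never exclude anything, which matches the expectation stated earlier in the thread.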
Thanks so much. I agree with all your points, and the figure on inclusion probabilities is incredibly insightful! It does look like we have a discrepancy when looping over the different cell types. I agree that if I had obtained the result you have, my interpretation would have been wrong. However, these are the results I am getting, and this is what actually sparked my concerns initially... So, in the end, could all of this be due to a legacy version issue? Or is something wrong with my code for looping over cell types?
I tried running the code you posted again and still get the same result as before. In the screenshot you just posted, it looks like only the result of the very last run (reference I have a few ideas why this might have happened, none of them directly related to
Could you please check at least points 1 and 2? |
Just letting you know that I have now run your shared code in a Jupyter Notebook and I do indeed get similar results. I am working in a container with fixed software versions, so it is definitely not a version issue. Will investigate further.
Hi @johannesostner, |
Hi All,
I am using scCODA for differential abundance analysis of a scRNA-seq dataset.
If I perform a manual reference cell type selection using, e.g., B-cells as reference, and another time using dendritic cells as reference, I find that in both cases T-cells, for example, are differentially abundant between conditions, at a permissive FDR of 0.4.
However, when looping over the different cell types to use each as a reference sequentially to check the stability of the results, I notice that, e.g., T-cells are only significant once. How can this discrepancy be explained?
Here is the code for automatic/manual selection of these cell types (automatic selection selects another cell type than B-cells).
To loop over the cell types, I am using the code from the vignette (slightly adapted to allow for permissive FDR).
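The bookkeeping of such a loop can be sketched independently of scCODA. In the sketch below, `run_with_reference` is a hypothetical callable standing in for "fit the model with this reference and return the set of credible cell types"; it is not a scCODA function, and `fake_run` is a stub that simply mimics the FDR = 1 case. The point it illustrates is that the counts have to be accumulated across runs rather than overwritten on each iteration.

```python
def tally_credible(cell_types, run_with_reference):
    """Count in how many reference runs each cell type came out as credible.

    `run_with_reference` is a hypothetical callable: reference name -> set of
    credible cell type names. It stands in for the real model fit.
    """
    times_credible = {ct: 0 for ct in cell_types}
    for reference in cell_types:
        credible = run_with_reference(reference)
        for ct in credible:
            # The reference itself is never tested, so skip it defensively.
            if ct != reference:
                # Accumulate: plain assignment here would keep only the last run.
                times_credible[ct] += 1
    return times_credible


CELL_TYPES = ["B cells", "T cells", "Dendritic cells", "NK cells"]


def fake_run(reference):
    # Stub: pretend every non-reference cell type is credible (as at FDR = 1).
    return {ct for ct in CELL_TYPES if ct != reference}


counts = tally_credible(CELL_TYPES, fake_run)
print(counts)
```

With four cell types and the stub above, every cell type is counted in all runs except the one in which it serves as the reference.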