A question about merging samples first or filtering cells/genes first #6171
Replies: 5 comments
-
Not a bioinformatics person, but my understanding is that any filtering should be uniform across your entire dataset. Treating each replicate or condition differently risks creating false positives or false negatives.
-
I believe that cell QC can be done at either the sample level or the merged-data level. It might not make a difference, since the threshold parameters are not relative to other cells' values, as long as you apply the same cutoffs to all groups, of course. But I don't see any advantage in doing this sample by sample. At most, you might get a more complete view of per-sample QC.
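To make the point about fixed cutoffs concrete, here is a toy sketch in plain Python (not Seurat). The sample names, cell values, and the cutoffs `MIN_GENES` and `MAX_MITO` are all made up for illustration. It only shows that a per-cell filter with absolute thresholds gives the same cells whether it is applied before or after merging.

```python
# Toy cells as (sample, n_genes, pct_mito). All values are invented.
samples = {
    "s1": [("s1", 150, 2.0), ("s1", 1200, 4.5), ("s1", 3000, 25.0)],
    "s2": [("s2", 900, 1.0), ("s2", 180, 3.0), ("s2", 2500, 8.0)],
}

MIN_GENES = 200   # hypothetical cutoff, same for every sample
MAX_MITO = 10.0   # hypothetical cutoff, same for every sample


def passes_qc(cell):
    """Per-cell test that depends only on the cell's own values."""
    _, n_genes, pct_mito = cell
    return n_genes >= MIN_GENES and pct_mito <= MAX_MITO


# Option A: filter each sample separately, then merge.
filter_then_merge = [
    cell for cells in samples.values() for cell in cells if passes_qc(cell)
]

# Option B: merge everything first, then filter once.
merged = [cell for cells in samples.values() for cell in cells]
merge_then_filter = [cell for cell in merged if passes_qc(cell)]

# With absolute cutoffs, both orders keep exactly the same cells.
same = filter_then_merge == merge_then_filter
print(same)
```

This would stop holding if the cutoff were derived from the data in each group (for example a per-sample quantile), because then the threshold itself changes depending on which cells are pooled together.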
-
Doublets can only occur in samples that were physically processed and prepped together. Therefore, doublet detection methods like Scrublet are run on a per-sample basis. If a tool allows usage after merging, note that it is pretty likely that, under the hood, the computation is still done on each sample individually. Scrublet docs:
As others mention though, I think other QC steps like nGene, nUMI, mtGene fraction, etc. can be done after merging, as those thresholds are usually applied in a sample-agnostic fashion anyway.
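The ordering described here (doublet step per sample, fixed-threshold QC on the merged data) could be sketched roughly as below. This is plain Python on invented data, not Seurat or Scrublet. `flag_doublets_placeholder` is a hypothetical stand-in that just marks the highest-UMI cells in one sample so the example runs; it is not a real doublet detection method, and `MIN_GENES` is an assumed cutoff.

```python
from collections import defaultdict

# Invented toy cells; barcodes may repeat across samples, as in real data.
cells = [
    {"sample": "s1", "barcode": "A", "n_umi": 500, "n_genes": 300},
    {"sample": "s1", "barcode": "B", "n_umi": 9000, "n_genes": 4000},
    {"sample": "s1", "barcode": "C", "n_umi": 800, "n_genes": 150},
    {"sample": "s1", "barcode": "D", "n_umi": 1200, "n_genes": 700},
    {"sample": "s2", "barcode": "A", "n_umi": 600, "n_genes": 350},
    {"sample": "s2", "barcode": "B", "n_umi": 700, "n_genes": 400},
    {"sample": "s2", "barcode": "C", "n_umi": 20000, "n_genes": 5000},
    {"sample": "s2", "barcode": "D", "n_umi": 300, "n_genes": 100},
]


def flag_doublets_placeholder(sample_cells, top_fraction=0.25):
    """Hypothetical stand-in: flag the highest-UMI cells within ONE sample.

    A real workflow would call an actual doublet detection tool here.
    """
    n_flag = int(len(sample_cells) * top_fraction)
    ranked = sorted(sample_cells, key=lambda c: c["n_umi"], reverse=True)
    return {c["barcode"] for c in ranked[:n_flag]}


# 1) Group cells by sample so the doublet step never mixes samples.
by_sample = defaultdict(list)
for cell in cells:
    by_sample[cell["sample"]].append(cell)

# 2) Run the (placeholder) doublet step on each sample, then pool the rest.
singlets = []
for group in by_sample.values():
    flagged = flag_doublets_placeholder(group)
    singlets.extend(c for c in group if c["barcode"] not in flagged)

# 3) Apply one sample-agnostic QC cutoff to the merged cells.
MIN_GENES = 200  # assumed cutoff, identical for all samples
kept = [c for c in singlets if c["n_genes"] >= MIN_GENES]
kept_ids = [(c["sample"], c["barcode"]) for c in kept]
print(kept_ids)
```

The only step that has to respect sample boundaries in this sketch is step 2; step 3 uses per-cell values only, so it could equally be run before the merge.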
-
Hi, I'm still confused. Does this mean that as long as the QC cutoffs are the same across multiple datasets (Seurat objects), I can merge/integrate the filtered datasets (the objects after QC, normalization, and doublet removal) into one? I'm asking because I am going to combine multiple large datasets, and the filtered datasets would be smaller and easier to handle. Thanks!
-
So if we have 4 datasets, we first need to remove doublets from each one individually, normalize them, merge them, and then filter based on QC metrics?
-
Dear Developer,
I have looked at several scRNA-seq workshops,
such as "https://github.com/hbctraining/scRNA-seq/tree/master/lessons" and "https://nbisweden.github.io/workshop-scRNAseq/labs/compiled/seurat/seurat_01_qc.html"
They did not remove doublets; they created the merged object first, and then performed QC and filtered cells and genes.
My question is: does that make sense? I thought the pipeline should be: 1. QC for each sample, 2. filter cells and genes for each sample separately, 3. remove doublets, 4. create a merged or integrated object.
Is my understanding correct?
Thanks.