A question about merging samples first or filtering cells/genes first #6171

xiaowenchenjax · 2022-07-08T22:35:49Z

Dear Developer,
I have looked at several scRNA-seq workshops, such as https://github.com/hbctraining/scRNA-seq/tree/master/lessons and https://nbisweden.github.io/workshop-scRNAseq/labs/compiled/seurat/seurat_01_qc.html.
They did not remove doublets, and they created the merged object first and only then performed QC and filtered cells and genes.
My question is: does that make sense? I would expect the pipeline to be:

1. QC for each sample
2. Filter cells and genes for each sample separately
3. Remove doublets
4. Create a merged or integrated object

Is my understanding correct?
Thanks.
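The order proposed above can be written down as a small pipeline skeleton. This is a minimal sketch, not a Seurat or Scanpy workflow: the cell representation (a dict of QC values), the `per_sample_then_merge` helper, and the `qc_filter` / `remove_doublets` callbacks are hypothetical names used only for illustration.

```python
from typing import Any, Callable, Dict, List

# Hypothetical representation for illustration: a cell is a dict of QC values,
# and the input maps each sample name to its list of cells.
Cell = Dict[str, Any]


def per_sample_then_merge(
    samples: Dict[str, List[Cell]],
    qc_filter: Callable[[Cell], bool],
    remove_doublets: Callable[[List[Cell]], List[Cell]],
) -> List[Cell]:
    """Filter and de-doublet each sample on its own, then merge."""
    merged: List[Cell] = []
    for name, cells in samples.items():
        kept = [c for c in cells if qc_filter(c)]  # steps 1-2: per-sample QC and filtering
        kept = remove_doublets(kept)  # step 3: doublet removal within this sample only
        merged.extend({**c, "sample": name} for c in kept)  # step 4: merge, keeping a sample label
    return merged


# Usage with made-up values; the doublet step is a placeholder that keeps every cell.
samples = {
    "A": [{"n_genes": 150}, {"n_genes": 2500}],
    "B": [{"n_genes": 3000}, {"n_genes": 90}],
}
merged = per_sample_then_merge(
    samples,
    qc_filter=lambda c: c["n_genes"] >= 200,
    remove_doublets=lambda cells: cells,
)
print(merged)  # [{'n_genes': 2500, 'sample': 'A'}, {'n_genes': 3000, 'sample': 'B'}]
```

Keeping a `sample` label on every merged cell matters for any later step that still has to be done per sample.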

klpara · 2022-07-25T19:44:16Z

I'm not a bioinformatics person, but my understanding is that any filtering should be uniform across your entire dataset. Treating each replicate or condition differently probably risks creating false positives or negatives.

carolgirardi · 2022-08-05T14:03:17Z

I believe that cell QC can be done either at the sample level or on the merged data. It probably makes no difference, since the thresholds are absolute rather than relative to other cells' values, as long as you apply the same cutoffs to all groups, of course. But I don't see any advantage in doing it sample by sample; at most, you might get a more complete view of each sample's QC.
However, I would say that gene QC and all the following preprocessing steps must be done on the whole dataset, for the reason klpara mentioned above. I am not sure about the pipeline for doublet removal, but I would probably run it on the merged dataset for the same reason.
Best,
Carol
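A toy example illustrates the distinction drawn here: an absolute per-cell cutoff selects the same cells whether it runs per sample or on the merged data, while a "detected in at least N cells" gene filter does not, because N is counted over whichever cells are pooled. The genes, cells, thresholds, and helper names below are all made up for illustration.

```python
from typing import Dict, List, Set

# Made-up toy data: each cell is the set of genes detected in it, grouped by sample.
samples: Dict[str, List[Set[str]]] = {
    "A": [{"GAPDH", "ACTB"}, {"GAPDH", "ACTB"}, {"GAPDH"}],
    "B": [{"GAPDH", "ACTB"}, {"ACTB", "MS4A1"}, {"MS4A1"}],
}

MIN_GENES_PER_CELL = 2  # cell QC: absolute cutoff, judged on each cell alone
MIN_CELLS_PER_GENE = 3  # gene QC: keep genes detected in at least this many cells


def filter_cells(cells: List[Set[str]]) -> List[Set[str]]:
    return [c for c in cells if len(c) >= MIN_GENES_PER_CELL]


def kept_genes(cells: List[Set[str]]) -> Set[str]:
    # Count, for each gene, how many of the given cells detect it.
    counts: Dict[str, int] = {}
    for cell in cells:
        for gene in cell:
            counts[gene] = counts.get(gene, 0) + 1
    return {g for g, n in counts.items() if n >= MIN_CELLS_PER_GENE}


merged = [cell for cells in samples.values() for cell in cells]

# Cell QC: identical result per sample or after merging.
per_sample_cells = [c for cells in samples.values() for c in filter_cells(cells)]
assert per_sample_cells == filter_cells(merged)

# Gene QC (computed on all cells here for simplicity): the result depends on the pool.
genes_merged = kept_genes(merged)
genes_per_sample = set().union(*(kept_genes(cells) for cells in samples.values()))
print(sorted(genes_merged))  # ['ACTB', 'GAPDH']
print(sorted(genes_per_sample))  # ['GAPDH']
```

ACTB is detected in two cells in each sample, so it fails the gene filter in every sample taken alone, yet passes on the merged data.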

wvictor14 · 2022-11-15T17:26:24Z

A doublet can only be formed by cells that were physically processed and prepped together. Therefore, doublet detection methods like Scrublet are run on a sample-by-sample basis. If a tool can be used after merging, note that it is quite likely still doing the computation per sample under the hood.

From the Scrublet docs:

> Best practices:
>
> When working with data from multiple samples, run Scrublet on each sample separately. Because Scrublet is designed to detect technical doublets formed by the random co-encapsulation of two cells, it may perform poorly on merged datasets where the cell type proportions are not representative of any single sample.
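To see why the pool of input cells matters for doublet detection, the sketch below uses a deliberately simple, hypothetical detector, `toy_doublet_flags`, which flags cells above 2x the median UMI count of whatever cells it is given. This is not Scrublet's algorithm (Scrublet simulates doublets from the observed cells), but it shares the relevant property: its output depends on which cells are in the input. The UMI counts are invented.

```python
from statistics import median
from typing import Dict, List


def toy_doublet_flags(n_umi: List[float], factor: float = 2.0) -> List[bool]:
    """Hypothetical stand-in for a real detector: flag cells whose UMI count
    exceeds `factor` x the median of the cells passed in."""
    cutoff = factor * median(n_umi)
    return [x > cutoff for x in n_umi]


def flags_per_sample(samples: Dict[str, List[float]]) -> Dict[str, List[bool]]:
    # Run the detector once per sample, only on cells that were captured together.
    return {name: toy_doublet_flags(umis) for name, umis in samples.items()}


# Made-up UMI counts: sample "deep" was sequenced much deeper than "shallow".
samples = {
    "shallow": [1000, 1100, 900, 1050, 2600],
    "deep": [4000, 4200, 3900, 4100, 4300],
}

per_sample = flags_per_sample(samples)
merged = toy_doublet_flags(samples["shallow"] + samples["deep"])

print(per_sample["shallow"])  # [False, False, False, False, True]
print(merged)  # all False: the 2600-UMI cell is no longer flagged
```

The 2600-UMI cell stands out within its own sample but not once the deeper sample shifts the median. With real data, Scrublet can be run on each sample's count matrix in a loop; recent Scanpy versions also accept a `batch_key` argument in `sc.pp.scrublet` so that detection is done per batch.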

As others have mentioned, though, I think the other QC steps (nGene, nUMI, mitochondrial gene fraction, etc.) can be done after merging, since those thresholds are usually applied in a sample-agnostic way anyway.

qdong2023 · 2023-07-16T22:58:55Z

Hi, I'm still confused. Does this mean that, as long as the QC cutoffs are the same across multiple datasets (Seurat objects), I can merge/integrate the filtered datasets (the objects after QC, normalization, and doublet removal) into one? I'm asking because I am going to combine multiple large datasets, and the filtered datasets would be smaller and easier to handle. Thanks!

Raminyazdani · 2024-11-10T20:49:36Z

So if we have 4 datasets, do we first need to remove doublets from each one individually, normalize them, merge them, and then filter based on QC metrics?