Analysis questions #1

Nusob888 · 2024-01-08T10:38:40Z

Many thanks for this fantastic tool. My questions are not really an issue but more theoretical.

I presume that if one does not have labels from the original datasets to be integrated, one could choose clustering resolutions for each dataset independently and then use CellHint to harmonise the clustering across datasets?
Would it be possible to run CellHint within a single dataset, e.g. to harmonise different clustering resolutions and find optimised consensus clusters? This would be an incredibly useful way to automate cluster resolution optimisation. If that is the case, would there be any appetite to develop this as an additional function?

ChuanXu1 · 2024-01-09T17:53:43Z

Yes to the first question.
Thank you for the nice suggestion in the second. It is currently not possible (at least not directly feasible) with CellHint. I will add such a function later this month.

ChuanXu1 · 2024-01-20T19:49:34Z

@Nusob888, please try cellhint.selfmatch to harmonize different annotations for the same set of cells. This function has been added in version 1.0.0. One thing to note is that harmonization was initially designed to unify cell type annotations from different datasets, so this self-match function is a modified version for dealing with cells from only a single dataset.
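To make concrete what "different annotations for the same set of cells" means, here is a minimal, library-free sketch. It is not CellHint's algorithm and does not call cellhint.selfmatch; it only counts shared cells between two hypothetical clusterings (a coarse and a fine resolution) to show the kind of one-to-one and one-to-many relationships one would hope a self-match to recover. The function names and toy labels are made up for illustration.

```python
from collections import Counter


def overlap_table(labels_a, labels_b):
    """Count cells shared by each (cluster in a, cluster in b) pair."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both annotations must cover the same cells")
    return Counter(zip(labels_a, labels_b))


def best_matches(labels_a, labels_b):
    """For each cluster in labels_a, return the labels_b cluster holding
    the largest fraction of its cells, together with that fraction."""
    table = overlap_table(labels_a, labels_b)
    sizes = Counter(labels_a)
    best = {}
    for (a, b), n in table.items():
        frac = n / sizes[a]
        if a not in best or frac > best[a][1]:
            best[a] = (b, frac)
    return best


# Toy example: eight cells clustered at two resolutions (hypothetical labels).
coarse = ["0", "0", "0", "0", "1", "1", "1", "1"]
fine = ["0", "0", "1", "1", "2", "2", "2", "2"]

# Fine clusters "0" and "1" both sit inside coarse cluster "0" (one-to-many),
# while fine cluster "2" corresponds to coarse cluster "1" (one-to-one).
matches = best_matches(fine, coarse)
print(matches)
```

A naive overlap count like this ignores the transcriptomic distances that CellHint works with, so it should be read only as an illustration of the input shape: one label per cell, per annotation.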

Nusob888 · 2024-01-29T10:38:55Z

Hi, I am going to try this function this week.
Can I check a few things for the use_rep option?

Would you recommend:

Calculating an embedding per dataset, or a latent embedding on the whole dataset?
Using the raw expression matrix rather than use_rep? Presumably this will be better suited to datasets with strong batch effects?

Additionally:

Would you advise against using CellHint on pre-integrated data embeddings such as scVI, as that might defeat the point of correction-agnostic harmonisation?

Thanks again for all the input

ChuanXu1 · 2024-02-12T13:26:47Z

@Nusob888, use_rep is usually suggested rather than the raw expression matrix, as the latter is time-consuming. A latent space computed on the whole dataset is preferred. The choice of latent representation is flexible. Using PCA is correction-agnostic, but may be noisy in terms of batch effect (CellHint has an internal procedure to mitigate this but cannot exclude its influence). Pre-integrated embeddings such as scVI are also good alternatives, but note that the result will be tuned towards the structure defined by scVI.
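As a rough sketch of the advice above, the helper below passes a named embedding from `adata.obsm` to `cellhint.harmonize` through `use_rep`. The helper name, the obs column names, and the obsm keys (`X_pca`, `X_scVI`) are assumptions for illustration; the embedding is assumed to have been computed beforehand on the whole (concatenated) dataset. Check the CellHint documentation for the exact signature in your installed version.

```python
def harmonize_with_rep(adata, dataset_key, cell_type_key, use_rep="X_pca"):
    """Harmonize annotations using a precomputed embedding in adata.obsm.

    Hypothetical helper: dataset_key and cell_type_key are assumed to be
    column names in adata.obs; use_rep is assumed to be a key in adata.obsm.
    """
    # Fail early if the embedding has not been computed on the whole dataset.
    if use_rep not in adata.obsm:
        raise KeyError(f"{use_rep!r} not found in adata.obsm; compute it first")

    # Imported lazily so the sketch can be read without CellHint installed.
    import cellhint

    return cellhint.harmonize(adata, dataset_key, cell_type_key, use_rep=use_rep)
```

Called with `use_rep="X_pca"`, this would follow the correction-agnostic route; with `use_rep="X_scVI"` (assuming such an embedding was stored under that key), the result would follow the structure defined by scVI, as noted in the reply.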

ChuanXu1 added a commit that referenced this issue Jan 20, 2024

Add selfmatch #1

2ff2652
