Analysis questions #1

Open
Nusob888 opened this issue Jan 8, 2024 · 4 comments

@Nusob888

Nusob888 commented Jan 8, 2024

Many thanks for this fantastic tool. My questions are not really an issue but more theoretical.

  • I presume that if one does not have labels for the original datasets to be integrated, one could choose a clustering resolution for each dataset independently and then use CellHint to harmonise the clusters across datasets?
  • Would it be possible to run CellHint within a single dataset, e.g. to harmonise different clustering resolutions in order to find and optimise consensus clusters? This would be an incredibly useful way to automate cluster resolution optimisation. If so, would there be any appetite to develop this as an additional function?
@ChuanXu1
Collaborator

ChuanXu1 commented Jan 9, 2024

@Nusob888

  1. Yes
  2. Thank you for this nice suggestion. Currently this is not possible (at least not directly feasible) with CellHint. I will add such a function later this month.
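
For the first point, a minimal sketch of the workflow could look like the following. It assumes a combined AnnData object with a latent representation already stored in `adata.obsm["X_pca"]`, and that `cellhint.harmonize` accepts the dataset column, the label column and `use_rep` in this order. The helper names, the `"cluster"` column and the resolution values are hypothetical and not part of CellHint.

```python
def prefix_labels(datasets, clusters):
    """Make per-dataset cluster IDs unambiguous, e.g. ('A', '0') -> 'A_c0'."""
    return [f"{d}_c{c}" for d, c in zip(datasets, clusters)]


def cluster_then_harmonize(adata, dataset_key, resolutions, use_rep="X_pca"):
    """Cluster each dataset on its own, then harmonise the clusters across datasets.

    `resolutions` maps dataset name -> Leiden resolution (hypothetical choices).
    Assumes `adata.obsm[use_rep]` already holds a latent representation.
    """
    # Imported lazily so that prefix_labels has no third-party dependencies.
    import scanpy as sc
    import cellhint

    adata.obs["cluster"] = ""
    for name, resolution in resolutions.items():
        mask = (adata.obs[dataset_key] == name).to_numpy()
        sub = adata[mask].copy()
        sc.pp.neighbors(sub, use_rep=use_rep)
        sc.tl.leiden(sub, resolution=resolution, key_added="leiden")
        adata.obs.loc[mask, "cluster"] = prefix_labels([name] * sub.n_obs, sub.obs["leiden"])

    # Treat the unsupervised clusters as if they were each dataset's own annotation.
    # Assumed call form: harmonize(adata, dataset, cell_type, use_rep=...).
    return cellhint.harmonize(adata, dataset_key, "cluster", use_rep=use_rep)
```

A call might look like `cluster_then_harmonize(adata, "dataset", {"A": 0.5, "B": 1.0})`, where the dataset names and resolutions are placeholders.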

ChuanXu1 added a commit that referenced this issue Jan 20, 2024
@ChuanXu1
Collaborator

@Nusob888, please try cellhint.selfmatch to harmonize different annotations for the same set of cells. This function has been added in version 1.0.0. One thing to note is that harmonization was initially designed to unify cell type annotations from different datasets; this self-match function is thus a modified version for dealing with cells from only a single dataset.
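
As a rough sketch of how this might be called for several clustering resolutions of one dataset: the thread confirms `cellhint.selfmatch` and a `use_rep` option, but the exact argument order used here (the AnnData object followed by a list of `adata.obs` column names) is an assumption, so please check `help(cellhint.selfmatch)` first. The wrapper name and the column names are hypothetical.

```python
def harmonize_resolutions(adata, annotation_keys, use_rep="X_pca"):
    """Harmonise several annotations (e.g. Leiden resolutions) of the same cells.

    `annotation_keys` are columns in `adata.obs`, one per clustering resolution.
    """
    missing = [key for key in annotation_keys if key not in adata.obs]
    if missing:
        raise KeyError(f"annotation columns not found in adata.obs: {missing}")

    # Imported lazily; requires cellhint >= 1.0.0 according to the thread.
    import cellhint

    # Assumed call form: selfmatch(adata, list_of_obs_columns, use_rep=...).
    return cellhint.selfmatch(adata, annotation_keys, use_rep=use_rep)
```

For example, `harmonize_resolutions(adata, ["leiden_0.5", "leiden_1.0", "leiden_2.0"])` would compare three hypothetical Leiden runs stored in `adata.obs`.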

@Nusob888
Author

Hi, I am going to try this function this week.
Can I check a few things for the use_rep option?

Would you recommend:

  • Calculating an embedding per dataset, or a latent embedding on the whole dataset?
  • Using the raw expression matrix rather than use_rep? Presumably this would be better suited to datasets with strong batch effects?

Additionally:

  • Would you advise against using CellHint on pre-integrated embeddings such as scVI, as that might defeat the point of correction-agnostic harmonisation?

Thanks again for all the input

@ChuanXu1
Collaborator

@Nusob888, use_rep is usually recommended over the raw expression matrix, as the latter is time-consuming. A latent space computed on the whole dataset is preferred. The choice of latent representation is flexible. Using PCA is correction-agnostic, but may be noisy in terms of batch effect (CellHint has an internal procedure to mitigate this but cannot exclude its influence). Pre-integrated embeddings such as scVI are also good alternatives, but note that the result will be tuned towards the structure defined by scVI.
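
One way to prepare the representation before passing it as `use_rep` is sketched below. It assumes `adata.X` is already log-normalised when PCA has to be computed, and that a pre-integrated embedding, if used, was stored by the user under a key such as `"X_scVI"` (a common convention, not something CellHint requires). The helper name `ensure_rep` is hypothetical.

```python
def ensure_rep(adata, use_rep="X_pca"):
    """Return `use_rep` once it is known to exist in `adata.obsm`.

    PCA is computed on the whole dataset if it is missing. Any other embedding
    (e.g. one from scVI) must have been stored in `adata.obsm` beforehand.
    """
    if use_rep in adata.obsm:
        return use_rep
    if use_rep != "X_pca":
        raise KeyError(f"{use_rep!r} not found in adata.obsm; compute and store it first")

    # Imported lazily so that precomputed embeddings need no scanpy call.
    import scanpy as sc

    # Correction-agnostic latent space over all cells; writes adata.obsm["X_pca"].
    sc.pp.pca(adata)
    return use_rep
```

The returned key can then be passed on, for example `use_rep=ensure_rep(adata, "X_scVI")` for a pre-integrated embedding, keeping in mind that the result will then follow the structure defined by that embedding.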
