Gene identifier mapping #10

Open
Souhatifour opened this issue Dec 20, 2023 · 0 comments


Souhatifour commented Dec 20, 2023

Issue
Part 2 of the tutorial does not explicitly mention gene identifier mapping, yet several steps require it. In particular, the tutorial works with gene expression data and repeatedly selects, intersects, and manipulates gene sets based on their relevance to the analysis, which only works when all datasets use the same type of gene identifier.
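To illustrate why a consistent identifier type matters for these gene-set operations, here is a minimal pandas sketch using hypothetical toy gene lists (the names are placeholders, not real genes): an intersection only finds shared genes when both sides use the same identifier type.

```python
import pandas as pd

# Hypothetical toy gene lists (placeholders, not real genes).
bulk_names = pd.Index(["GENE_A", "GENE_B", "GENE_C", "GENE_D"])
sc_names = pd.Index(["GENE_B", "GENE_C", "GENE_E"])

# Keep only the genes measured in both datasets.
shared = bulk_names.intersection(sc_names)
print(list(shared))  # ['GENE_B', 'GENE_C']

# If the bulk data used Ensembl-style IDs instead, the same intersection
# would be empty, even though the underlying genes may overlap.
bulk_ids = pd.Index(["ENS_1", "ENS_2", "ENS_3", "ENS_4"])
print(len(bulk_ids.intersection(sc_names)))  # 0
```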

Example Scenario:
Consider Task 2 of the BuDDI analysis tutorial, where genes are identified by Ensembl IDs such as 'ENSG00000000003', 'ENSG00000000005', 'ENSG00000000419', and 'ENSG00000000457', and the goal is to convert these IDs into gene names such as 'MIR1302-2HG', 'FAM138A', 'OR4F5', 'AL627309.1', and 'AL627309.3'. (The two lists illustrate the two formats; they are not matched pairs.)
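A minimal sketch of the conversion itself, assuming a lookup table from Ensembl IDs to gene names is already available as a pandas Series. The identifiers below are hypothetical placeholders, not real Ensembl entries. `Index.map` performs the lookup and leaves NaN for any ID the table does not cover.

```python
import pandas as pd

# Hypothetical lookup table (toy identifiers, not real Ensembl entries):
# index = Ensembl-style ID, values = gene name.
id_to_name = pd.Series({"ENS_1": "GENE_A", "ENS_2": "GENE_B", "ENS_3": "GENE_C"})

# Ensembl-style IDs as they might appear in the expression matrix.
ens_ids = pd.Index(["ENS_1", "ENS_3", "ENS_9"])

# Look up each ID in the table; IDs absent from the table become NaN.
names = ens_ids.map(id_to_name)
print(list(names))  # ['GENE_A', 'GENE_C', nan]
```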

In this situation, the matched single-cell tissue data already contains the gene mapping.

Suggested Approach:
In some cases, a mapping from Ensembl IDs to gene names is not available for every gene. To address this, we recommend using the gene mapping from the matched single-cell tissue data, as it likely contains a more comprehensive set of mappings.
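The sketch below applies a mapping in the "Name"/"Ens" layout used in this issue to a list of bulk Ensembl IDs, reports how many IDs it covers, and keeps the original ID where no name is found. The toy values and the keep-the-ID fallback are assumptions for illustration; dropping unmapped genes may be equally reasonable depending on the analysis.

```python
import pandas as pd

# Hypothetical mapping in the same "Name"/"Ens" layout (toy values).
gene_maps = pd.DataFrame({"Name": ["GENE_A", "GENE_B"], "Ens": ["ENS_1", "ENS_2"]})

# Hypothetical Ensembl-style IDs from the bulk data.
bulk_ids = pd.Index(["ENS_1", "ENS_2", "ENS_7"])

# Build an Ens -> Name lookup; drop duplicate IDs so each maps to one name.
lookup = gene_maps.drop_duplicates("Ens").set_index("Ens")["Name"]
mapped = bulk_ids.map(lookup)

# Fraction of bulk IDs covered by the single-cell mapping.
coverage = mapped.notna().mean()
print(f"{coverage:.0%} of IDs mapped")  # 67% of IDs mapped

# Fallback (an assumption): keep the original ID when no name is available.
final_names = pd.Index(
    [name if pd.notna(name) else ens for name, ens in zip(mapped, bulk_ids)]
)
print(list(final_names))  # ['GENE_A', 'GENE_B', 'ENS_7']
```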

The example below builds the gene mapping as a pandas DataFrame with two columns: gene names ("Name") and Ensembl identifiers ("Ens"). The mapping can be created as follows:

```python
import pandas as pd

# Assumes `adata` (the single-cell AnnData object) and `data_path` are
# defined earlier in the tutorial.

# Create an empty DataFrame with columns for gene names and Ensembl IDs
gene_maps = pd.DataFrame(columns=["Name", "Ens"])

# Populate the "Name" column with gene names from the single-cell AnnData object
gene_maps["Name"] = adata.var.index

# Populate the "Ens" column with Ensembl IDs from the AnnData object
gene_maps["Ens"] = adata.var["gene_ids"].values

# Save the gene mapping DataFrame to a CSV file
gene_maps.to_csv(f'{data_path}/gene_maps.csv')

# Extract the gene names for later use
# (despite its name, `gene_ids` holds gene names, not Ensembl IDs)
gene_ids = gene_maps["Name"]
```
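Once gene_maps.csv exists, a later step could load it and rename the Ensembl-ID columns of a bulk expression matrix. In this sketch the temporary directory (standing in for `data_path`), the toy mapping, and the `bulk` matrix are all hypothetical. Reading with `index_col=0` accounts for the row index that `to_csv` writes by default.

```python
import os
import tempfile

import pandas as pd

# Stand-in for the gene_maps.csv written above (toy values); a temporary
# directory replaces the tutorial's data_path here.
data_path = tempfile.mkdtemp()
pd.DataFrame({"Name": ["GENE_A", "GENE_B"], "Ens": ["ENS_1", "ENS_2"]}).to_csv(
    os.path.join(data_path, "gene_maps.csv")
)

# to_csv wrote the row index as the first column, so read it back as the index.
gene_maps = pd.read_csv(os.path.join(data_path, "gene_maps.csv"), index_col=0)

# Hypothetical bulk expression matrix: samples x genes, columns are Ensembl IDs.
bulk = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["ENS_1", "ENS_2", "ENS_3"])

# Rename columns that have a mapping; unmapped columns keep their Ensembl ID.
ens_to_name = dict(zip(gene_maps["Ens"], gene_maps["Name"]))
bulk_named = bulk.rename(columns=ens_to_name)
print(list(bulk_named.columns))  # ['GENE_A', 'GENE_B', 'ENS_3']
```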