Gene identifier mapping #10

Souhatifour · 2023-12-20T20:15:19Z

Issue
Part 2 of the tutorial does not explicitly cover gene identifier mapping, yet several steps require it. In particular, the tutorial works with gene expression data and repeatedly selects, intersects, and manipulates gene sets based on their relevance to the analysis.

Example Scenario:
Consider Task 2 of the BuDDI analysis tutorial, where genes are given as Ensembl IDs such as 'ENSG00000000003', 'ENSG00000000005', 'ENSG00000000419', 'ENSG00000000457', and the goal is to convert them to gene names in a format like 'MIR1302-2HG', 'FAM138A', 'OR4F5', 'AL627309.1', 'AL627309.3'.
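As a rough illustration of the kind of conversion meant here, the sketch below renames the columns of a toy expression table from Ensembl IDs to gene names. The lookup table, the expression values, and the variable names (`id_to_symbol`, `expr`) are all assumptions made for this example and are not part of the tutorial; the symbols shown are the ones commonly associated with these four IDs, but they should be checked against your own annotation.

```python
import pandas as pd

# Illustrative lookup table (assumption): Ensembl ID -> gene name.
# In practice this would come from an annotation source, not be typed by hand.
id_to_symbol = pd.Series({
    "ENSG00000000003": "TSPAN6",
    "ENSG00000000005": "TNMD",
    "ENSG00000000419": "DPM1",
    "ENSG00000000457": "SCYL3",
})

# Toy expression table with Ensembl IDs as column labels (values are made up).
expr = pd.DataFrame(
    [[5, 0, 12, 3], [7, 1, 9, 4]],
    index=["sample_1", "sample_2"],
    columns=list(id_to_symbol.index),
)

# Rename the columns from Ensembl IDs to gene names.
expr_by_symbol = expr.rename(columns=id_to_symbol.to_dict())
print(list(expr_by_symbol.columns))
```

Only the column labels change; the expression values stay attached to the same genes.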

In this situation, the matched single-cell tissue data already contains the gene mapping.

Suggested Approach:
When converting Ensembl IDs to gene names, a mapping may not be available for every gene. To address this, use the gene mapping from the matched single-cell tissue data, as it likely contains a more comprehensive set of mappings.
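One way to cope with genes that have no mapping is to keep the original Ensembl ID for those genes and report which ones were left unmapped. The sketch below is only a suggestion: the helper `map_ids` is hypothetical, the IDs and names are placeholders, and the `gene_maps` table is assumed to have the "Name" and "Ens" columns described in this issue.

```python
import pandas as pd

def map_ids(ens_ids, gene_maps):
    """Map Ensembl IDs to names; keep the original ID where no name is known.

    Hypothetical helper. `gene_maps` is assumed to have "Name" and "Ens" columns.
    Returns (mapped_labels, unmapped_ids).
    """
    # Build an Ens -> Name lookup; duplicates are dropped so the index is unique.
    lookup = gene_maps.drop_duplicates("Ens").set_index("Ens")["Name"]
    ens = pd.Series(list(ens_ids))
    mapped = ens.map(lookup)              # NaN where an ID has no entry
    unmapped = ens[mapped.isna()].tolist()
    return mapped.fillna(ens).tolist(), unmapped

# Placeholder identifiers, not real genes.
gene_maps = pd.DataFrame({
    "Name": ["GENE_A", "GENE_B"],
    "Ens": ["ENSG_A", "ENSG_B"],
})

names, missing = map_ids(["ENSG_A", "ENSG_C", "ENSG_B"], gene_maps)
print(names)
print(missing)
```

Keeping the Ensembl ID as a fallback preserves the number and order of genes, which matters when the labels are later used to intersect gene sets.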

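The example below builds a gene mapping as a Pandas DataFrame with columns for gene names ("Name") and Ensembl identifiers ("Ens"). It assumes that pandas is imported as pd, that adata is the single-cell AnnData object, and that data_path points to an existing output directory. The mapping can be built as follows: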

Create an empty DataFrame with columns for gene names and Ensembl IDs

gene_maps = pd.DataFrame(columns=["Name", "Ens"])

Populate the "Name" column with gene names from the single-cell AnnData object

gene_maps["Name"] = adata.var.index

Populate the "Ens" column with Ensembl IDs from the AnnData object

gene_maps["Ens"] = adata.var["gene_ids"].values

Save the gene mapping DataFrame to a CSV file

gene_maps.to_csv(f'{data_path}/gene_maps.csv')

Extract the gene names for later use

gene_ids = gene_maps["Name"]
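For reference, the steps above can be collected into one self-contained sketch. To keep it runnable without a real dataset, a small pandas DataFrame named `var` stands in for `adata.var` (gene names in the index, Ensembl IDs in a "gene_ids" column), the identifiers are placeholders, and a temporary directory is used for `data_path`; all of these are assumptions for illustration only.

```python
import os
import tempfile

import pandas as pd

# Stand-in for adata.var (assumption): index = gene names,
# "gene_ids" column = Ensembl IDs. Identifiers are placeholders.
var = pd.DataFrame(
    {"gene_ids": ["ENSG_A", "ENSG_B", "ENSG_C"]},
    index=["GENE_A", "GENE_B", "GENE_C"],
)

# Build the mapping in one step instead of filling an empty DataFrame.
gene_maps = pd.DataFrame({
    "Name": var.index.to_numpy(),
    "Ens": var["gene_ids"].to_numpy(),
})

# Temporary directory used here in place of the tutorial's data_path.
data_path = tempfile.mkdtemp()
csv_path = os.path.join(data_path, "gene_maps.csv")

# index=False avoids writing the row index as an extra unnamed column.
gene_maps.to_csv(csv_path, index=False)

# Reload the mapping and extract the gene names for later use.
reloaded = pd.read_csv(csv_path)
gene_ids = reloaded["Name"]
print(gene_ids.tolist())
```

With a real AnnData object, `var` would simply be replaced by `adata.var`, provided it carries a "gene_ids" column as in the snippet above.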
