Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mitochondria (m.) assembly mapping #663

Open
davmlaw opened this issue Jul 5, 2023 · 7 comments
Open

mitochondria (m.) assembly mapping #663

davmlaw opened this issue Jul 5, 2023 · 7 comments
Labels
keep alive exempt issue from staleness checks

Comments

@davmlaw
Copy link
Contributor

davmlaw commented Jul 5, 2023

RefSeq does not have MT transcripts, but Ensembl does, eg ENST00000361381.2

There is currently no "m" (ie m_to_c) etc methods in AssemblyMapper, so you can't convert these easily.

Q. Should we add m_to_x and x_to_m methods in AssemblyMapper?

var_m = parser.parse("NC_012920.1:m.11038delA")
am.g_to_c(var_m, "ENST00000361381.2")
File /usr/local/lib/python3.10/dist-packages/hgvs/variantmapper.py:235, in VariantMapper.g_to_c(self, var_g, tx_ac, alt_aln_method)
    222 """Given a parsed g. variant, return a c. variant on the specified
    223 transcript using the specified alignment method (default is
    224 "splign" from NCBI).
   (...)
    231 
    232 """
    234 if not (var_g.type == "g"):
--> 235     raise HGVSInvalidVariantError("Expected a g. variant; got " + str(var_g))
    236 if self._validator:
    237     self._validator.validate(var_g)

HGVSInvalidVariantError: Expected a g. variant; got NC_012920.1:m.11038del

It's pretty trivial to get it to work, by switching the 'm' to a 'g' (this is probably how you'd implement it)

var_g = deepcopy(var_m)
var_g.type = 'g'
In [19]: print(am.g_to_c(var_g, "ENST00000361381.2"))
ENST00000361381.2:c.279del

I don't think you can reproduce this using UTA as it doesn't have Ensembl transcripts, but here's the cdot code:

import hgvs.assemblymapper

from copy import deepcopy
from hgvs.parser import Parser
from cdot.hgvs.dataproviders import JSONDataProvider, RESTDataProvider, FastaSeqFetcher

seqfetcher = FastaSeqFetcher("/data/annotation/fasta/GCF_000001405.39_GRCh38.p13_genomic.fna.gz")
hdp = JSONDataProvider(["/data/annotation/cdot/ensembl/GRCh38/cdot-0.2.17.ensembl.grch38.json.gz"], seqfetcher=seqfetcher)
am = hgvs.assemblymapper.AssemblyMapper(hdp, assembly_name='GRCh38', alt_aln_method='splign', replace_reference=True)

parser = Parser()

var_m = parser.parse("NC_012920.1:m.11038delA")
am.g_to_c(var_m, "ENST00000361381.2")
@AngieHinrichs
Copy link

There are a few codons that are translated differently by mitochondria: https://en.wikipedia.org/wiki/Human_mitochondrial_genetics#Genetic_code_variants so the wrapper would need a bit more code to deal with those.

@andreasprlic
Copy link
Member

HGVS uses the translate_cds method from bioutils. That one already supports alternative translation tables (eg. for selenoproteins). We need another translation table there for mitochondria. That would be similar to this. Then this needs to get enabled with the AltTranscriptData / AltSeqBuilder somehow.

About "Refseq does not have MT transcripts": Would this data be sufficient for us to offer "m_to_p"?

@andreasprlic
Copy link
Member

andreasprlic commented Sep 13, 2023

A comment on the initial request: the transcript model for mitochondria is prob a slightly different for mitochondria. I am not sure if we would express mitochondrial variants using c. nomenclature. I think people would refer to them as m. or p. variants. See some test variants that have been provided as part of biocommons/hackathon-2023#4 . We won't get to work on the mito ticket as part of the hackathon, but will follow up on this afterwards.

Copy link

github-actions bot commented Dec 2, 2023

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale Issue is stale and subject to automatic closing label Dec 2, 2023
Copy link

github-actions bot commented Dec 9, 2023

This issue was closed because it has been stalled for 7 days with no activity.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Dec 9, 2023
@reece reece reopened this Feb 19, 2024
@reece reece added resurrected and removed stale Issue is stale and subject to automatic closing labels Feb 19, 2024
@reece
Copy link
Member

reece commented Feb 19, 2024

This issue was closed by stalebot. It has been reopened to give more time for community review. See biocommons coding guidelines for stale issue and pull request policies. This resurrection is expected to be a one-time event.

Copy link

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions github-actions bot added the stale Issue is stale and subject to automatic closing label May 20, 2024
@jsstevenson jsstevenson added keep alive exempt issue from staleness checks and removed stale Issue is stale and subject to automatic closing resurrected labels May 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
keep alive exempt issue from staleness checks
Projects
None yet
Development

No branches or pull requests

5 participants