
Properly tag selenoproteins #66

Closed
holtgrewe opened this issue Nov 10, 2023 · 2 comments

Comments

@holtgrewe
Contributor

NCBI marks the recoding of UGA to selenocysteine with a Note attribute such as Note=UGA stop codon recoded as selenocysteine.

This could be interpreted and turned into a new tag. Overall, we should look for matches of either of the following in the Note attribute:

UGA stop codon recoded as selenocysteine
UGA stop codons recoded as selenocysteine
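A minimal sketch of that check, assuming the input is column 9 (the attribute column) of a RefSeq GFF3 line. The names `SELENO_NOTE_RE`, `parse_attributes` and `is_selenoprotein_note` are hypothetical, for illustration only. The column is split on ";" before percent-decoding, because GFF3 escapes literal semicolons inside values as %3B (as in the example lines below).

```python
import re
from urllib.parse import unquote

# Matches both wordings: "codon" (one Sec) and "codons" (several Sec).
SELENO_NOTE_RE = re.compile(r"UGA stop codons? recoded as selenocysteine")


def parse_attributes(column9: str) -> dict:
    """Split a GFF3 attribute column into a dict, percent-decoding each value."""
    attrs = {}
    for field in column9.strip().split(";"):
        if not field:
            continue
        key, _, value = field.partition("=")
        attrs[key] = unquote(value)  # e.g. "%3B" -> ";"
    return attrs


def is_selenoprotein_note(column9: str) -> bool:
    """Return True if the Note attribute mentions UGA recoding to selenocysteine."""
    note = parse_attributes(column9).get("Note", "")
    return SELENO_NOTE_RE.search(note) is not None
```

The regex uses `search` rather than `match` because the recoding phrase is only the first part of the note; the isoform description follows after the decoded "; ".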

Example lines

# "codon"
NC_000001.10    BestRefSeq      CDS     54375604        54375672        .       +       0       ID=cds-NP_998758.1;Parent=rna-NM_213593.5;Dbxref=CCDS:CCDS53320.1,GeneID:1733,Genbank:NP_998758.1,HGNC:HGNC:2883,MIM:147892;Name=NP_998758.1;Note=UGA stop codon recoded as selenocysteine%3B isoform b is encoded by transcript variant 2;gbkey=CDS;gene=DIO1;product=type I iodothyronine deiodinase isoform b;protein_id=NP_998758.1;transl_except=(pos:54370377..54370379%2Caa:Sec)

# "codons"
NC_000005.9     BestRefSeq      CDS     42804758        42804875        .       -       1       ID=cds-NP_005401.3;Parent=rna-NM_005410.4;Dbxref=CCDS:CCDS43311.1,GeneID:6414,Genbank:NP_005401.3,HGNC:HGNC:10751,MIM:601484;Name=NP_005401.3;Note=UGA stop codons recoded as selenocysteine%3B isoform 1 precursor is encoded by transcript variant 1;gbkey=CDS;gene=SELENOP;product=selenoprotein P isoform 1 precursor;protein_id=NP_005401.3;tag=RefSeq Select;transl_except=(pos:complement(42808279..42808281)%2Caa:Sec),(pos:complement(42801068..42801070)%2Caa:Sec),(pos:complement(42801014..42801016)%2Caa:Sec),(pos:complement(42800978..42800980)%2Caa:Sec),(pos:complement(42800933..42800935)%2Caa:Sec),(pos:complement(42800912..42800914)%2Caa:Sec),(pos:complement(42800867..42800869)%2Caa:Sec),(pos:complement(42800861..42800863)%2Caa:Sec),(pos:complement(42800840..42800842)%2Caa:Sec),(pos:complement(42800834..42800836)%2Caa:Sec)
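The transl_except attribute in these lines carries the actual selenocysteine codon positions. A minimal sketch for extracting them, assuming every entry has the shape (pos:START..END,aa:XXX) or (pos:complement(START..END),aa:XXX) once %2C is decoded, as in the two lines above; other location forms are not handled. The name `sec_codons` is hypothetical.

```python
import re
from urllib.parse import unquote

# Assumed entry shapes, after percent-decoding:
#   (pos:54370377..54370379,aa:Sec)
#   (pos:complement(42808279..42808281),aa:Sec)
TRANSL_EXCEPT_RE = re.compile(r"pos:(complement\()?(\d+)\.\.(\d+)\)?,aa:(\w+)")


def sec_codons(transl_except: str) -> list:
    """Return (start, end, on_minus_strand) for each aa:Sec entry."""
    decoded = unquote(transl_except)  # "%2C" -> ","
    result = []
    for m in TRANSL_EXCEPT_RE.finditer(decoded):
        complement, start, end, aa = m.groups()
        if aa == "Sec":
            result.append((int(start), int(end), complement is not None))
    return result
```

Counting the returned entries gives one Sec codon for the DIO1 line and several for the SELENOP line, which lines up with the singular and plural wording of the Note.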

Also see here:

@davmlaw
Contributor

davmlaw commented Nov 13, 2023

I've started collecting the "Note" fields as "note".

The HGVS change will probably take a few years: they need to add a column to the UTA SQL schema, change their scripts to capture this from RefSeq, change the data provider API to include it, and then have the HGVS client handle it.

But at least our data will have it for when HGVS can use it.

@davmlaw davmlaw closed this as completed Nov 13, 2023
@holtgrewe
Contributor Author

Excellent, thanks!
