Negative 3'UTR lengths #93

Open
Apistogramma-2 opened this issue Aug 13, 2024 · 2 comments
Labels
question Further information is requested

Comments

@Apistogramma-2

Hello,

I was applying your pipeline to some single-cell data from the Allen Brain Atlas and got a handful of negative values for the 3'UTR length in my sce object. I was using utrome_mm10_v2. One example is the 3'UTR isoform ENSMUST00000135807.1-UTR-4987, with a length of -2595 nt. All the others look great, but I do not know how to interpret these. I hope you can help. Thanks in advance!

Best,
Janus

mfansler added the question label Aug 13, 2024
@mfansler
Collaborator

mfansler commented Aug 13, 2024

Hi Janus,

Thanks for the interest! Indeed, this is not intuitive. 🤔

These should correspond to novel cleavage sites that were identified upstream of any GENCODE-annotated 3' ends. Since we did not have full transcript sequencing, we were reluctant to make any assertion about what the full splice isoform was, only that cleavage is happening there. As such, we didn't have a proper STOP codon identified for these, so the value isn't really a "3'UTR length". Rather, the negative value should reflect the (genomic) distance from the cleavage site to the annotated STOP codon of the reference transcript named in the isoform ID; the negative sign indicates that the site lies upstream of that STOP codon.

So, concretely, "ENSMUST00000135807.1-UTR-4987" is a cleavage site found 4,987 nt upstream of the cleavage site of the Ensembl mouse transcript ENSMUST00000135807.1. This cleavage site is 2,595 nt upstream of the annotated STOP codon of ENSMUST00000135807.1, hence the reported length of -2595.
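To make that reading concrete, here is a rough Python sketch (not code from the pipeline) that splits such an ID into its reference transcript and upstream offset, and spells out how a negative length would be read. The `-UTR-<n>` pattern is assumed from the single ID quoted in this thread, and the names `parse_isoform_id` and `describe_length` are hypothetical helpers made up for illustration.

```python
import re
from typing import NamedTuple, Optional

# Assumption: IDs look like "<reference transcript>-UTR-<offset>", as in the
# one example from this thread. Other ID patterns are not handled here.
_ID_PATTERN = re.compile(r"^(?P<transcript>.+)-UTR-(?P<offset>\d+)$")


class CleavageSite(NamedTuple):
    reference_transcript: str
    nt_upstream_of_reference_end: int


def parse_isoform_id(isoform_id: str) -> Optional[CleavageSite]:
    """Split an ID into reference transcript and upstream offset (hypothetical helper)."""
    match = _ID_PATTERN.match(isoform_id)
    if match is None:
        return None  # ID does not follow the assumed pattern
    return CleavageSite(match["transcript"], int(match["offset"]))


def describe_length(utr_length: int) -> str:
    """Describe a reported length, treating negatives as distances to the STOP codon."""
    if utr_length < 0:
        return (
            f"cleavage site {-utr_length} nt upstream of the annotated STOP codon "
            "(not a true 3'UTR length)"
        )
    return f"3'UTR length of {utr_length} nt"


site = parse_isoform_id("ENSMUST00000135807.1-UTR-4987")
print(site)
print(describe_length(-2595))
```

IDs that do not match the assumed pattern return `None` rather than a guess, since this thread only shows the one form.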

This is admittedly not so informative, so we provided a column in the published annotations to flag such cases, namely is_improper_utr_length (e.g., Mouse annotation, Supplemental Data 4).
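As a minimal sketch of how that flag could be used to set such isoforms aside before any length-based analysis, the Python below reads a toy table and splits IDs by the flag. Only the column name `is_improper_utr_length` comes from the annotations; the other column names, the CSV layout, the TRUE/FALSE encoding, and the second row are assumptions made up for illustration, so check them against the actual file.

```python
import csv
import io

# Toy stand-in for the annotation table. Only `is_improper_utr_length` is a
# real column name from the thread; `transcript_id`, `utr_length`, and the
# TRUE/FALSE encoding are assumptions. The second row is a made-up placeholder.
TOY_ANNOTATION = """transcript_id,utr_length,is_improper_utr_length
ENSMUST00000135807.1-UTR-4987,-2595,TRUE
TOY_TRANSCRIPT.1,1200,FALSE
"""


def split_by_flag(handle):
    """Return (proper_ids, improper_ids) based on the flag column."""
    proper, improper = [], []
    for row in csv.DictReader(handle):
        flagged = row["is_improper_utr_length"].strip().upper() == "TRUE"
        (improper if flagged else proper).append(row["transcript_id"])
    return proper, improper


proper_ids, improper_ids = split_by_flag(io.StringIO(TOY_ANNOTATION))
print("usable 3'UTR lengths:", proper_ids)
print("flagged as improper:", improper_ids)
```

Filtering on the flag rather than on the sign of the length keeps the logic tied to the annotation's own definition of an improper length.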

I think it would be a nice enhancement to our annotation if someone identified where the STOP codon lies for such transcripts, so that we could associate a proper 3'UTR length with them.

Hope that clarifies things!

@Apistogramma-2
Author

Thanks for the fast reply and the clarification. That explains it to me :)
