Question on CIGAR strings in UTA #266

Open
budsonjelmont opened this issue Oct 21, 2024 · 0 comments

@budsonjelmont

Lately I've been trying to understand how to interpret CIGAR strings in UTA and have been running into some confusion. This might just be due to some incorrect assumptions about CIGAR on my part, but any advice is appreciated.

Here is a query against UTA for an alignment that contains a 3 bp deletion:

uta=> select cigar, tx_ac, alt_ac, ord, (tx_end_i - tx_start_i) as tx_ex_len, (alt_end_i - alt_start_i) as alt_ex_len  
from tx_exon_aln_v where tx_ac = 'NM_001256326.1' and cigar !~ '^[0-9]+=$' and alt_ac = 'NC_000017.10' order by ord;
   cigar   |     tx_ac      |    alt_ac    | ord | tx_ex_len | alt_ex_len 
-----------+----------------+--------------+-----+-----------+------------
 1453=3D2= | NM_001256326.1 | NC_000017.10 |  35 |      1458 |       1455

I've been assuming that this alignment means there is a deletion of 3 bases in the transcript relative to the genome (i.e. the transcript is the "query" and the genome is the "reference"). However, based on the tx_ex_len and alt_ex_len columns computed in that query, it seems I have this backwards: the aligned region spans 1455 bases in the genome but 1458 (1455 + 3) bases in the transcript.
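
Under the standard SAM CIGAR conventions, `D` advances along the reference but not the query, and `I` does the opposite. A minimal Python sketch that tallies how many bases each side of an alignment spans makes the length comparison explicit. The helper names `parse_cigar` and `consumed_lengths` are hypothetical (not part of UTA), and "reference"/"query" here are only the SAM roles; which UTA sequence fills which role is the open question.

```python
import re

# SAM-spec CIGAR operations that advance along each sequence.
CONSUMES_REFERENCE = set("MDN=X")
CONSUMES_QUERY = set("MIS=X")

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string into (length, op) pairs, rejecting malformed input."""
    if not re.fullmatch(r"(\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def consumed_lengths(cigar: str) -> tuple[int, int]:
    """Return (reference_length, query_length) spanned by a CIGAR alignment."""
    ref_len = query_len = 0
    for n, op in parse_cigar(cigar):
        if op in CONSUMES_REFERENCE:
            ref_len += n
        if op in CONSUMES_QUERY:
            query_len += n
    return ref_len, query_len


print(consumed_lengths("1453=3D2="))  # (1458, 1455)
```

For `1453=3D2=` the SAM reference side spans 1458 bases and the query side 1455, which lines up with tx_ex_len = 1458 and alt_ex_len = 1455 in the result above, assuming UTA follows SAM semantics for `D`.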

So in UTA's transcript-genome alignments, is the genome considered the "query" sequence and the transcript the "reference"? In other words, when reading CIGAR strings that describe indels, should I assume that a deletion (D) means bases that are missing from the genome but ARE present in the transcript (and vice versa for insertions)?
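
To make that reading concrete, here is a hedged Python sketch that walks a CIGAR string and reports each indel with its offset on both sequences. It ASSUMES the transcript plays the SAM "reference" role and the genome the "query" role, which is exactly the convention being asked about, not a confirmed UTA rule. The names `Indel` and `find_indels` are hypothetical, and offsets are counted in CIGAR order from the start of the aligned region; how that maps onto genomic coordinates for minus-strand transcripts is not addressed.

```python
import re
from typing import NamedTuple

CIGAR_OP = re.compile(r"(\d+)([=XMID])")


class Indel(NamedTuple):
    kind: str        # "D" or "I", as written in the CIGAR
    length: int
    tx_offset: int   # 0-based offset into the transcript's aligned region
    alt_offset: int  # 0-based offset into the genome's aligned region


def find_indels(cigar: str) -> list[Indel]:
    """List indels, ASSUMING transcript = SAM reference, genome = SAM query.

    Under that assumption D marks bases present in the transcript but absent
    from the genome, and I marks the reverse. Only =, X, M, I, D are handled.
    """
    if not re.fullmatch(r"(\d+[=XMID])+", cigar):
        raise ValueError(f"unsupported CIGAR: {cigar!r}")
    tx = alt = 0
    indels = []
    for n_str, op in CIGAR_OP.findall(cigar):
        n = int(n_str)
        if op in "=XM":  # aligned block: advance both sequences
            tx += n
            alt += n
        elif op == "D":  # consumes the reference (transcript) only
            indels.append(Indel("D", n, tx, alt))
            tx += n
        else:  # "I": consumes the query (genome) only
            indels.append(Indel("I", n, tx, alt))
            alt += n
    return indels


print(find_indels("1453=3D2="))
# [Indel(kind='D', length=3, tx_offset=1453, alt_offset=1453)]
```

Under this assumption, the 3D in exon 35 would be three transcript bases (offsets 1453 to 1455 of the transcript's aligned region) with no counterpart in the genome.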
