Lately I've been trying to understand how to interpret CIGAR strings in UTA and running into some confusion. This might just be due to some incorrect assumptions about CIGAR, but any advice is appreciated.
Here I have a query to UTA for an alignment that contains a 3bp deletion:
uta=> select cigar, tx_ac, alt_ac, ord, (tx_end_i - tx_start_i) as tx_ex_len, (alt_end_i - alt_start_i) as alt_ex_len
from tx_exon_aln_v where tx_ac = 'NM_001256326.1' and cigar !~ '^[0-9]+=$' and alt_ac = 'NC_000017.10' order by ord;
cigar | tx_ac | alt_ac | ord | tx_ex_len | alt_ex_len
-----------+----------------+--------------+-----+-----------+------------
1453=3D2= | NM_001256326.1 | NC_000017.10 | 35 | 1458 | 1455
I've been assuming that this alignment means there is a deletion of 3 bases in the transcript relative to the genome (i.e. the transcript is the "query" and the genome is the "reference"). However, based on the tx_ex_len and alt_ex_len columns computed in that query, it seems I have this backwards: there are 1455 bases in the aligned region of the genome, and 1455+3 = 1458 bases in the transcript's aligned region.
So in UTA's transcript-genome alignments, is the genome considered the "query" sequence and the transcript the "reference"? Meaning that, when reading CIGAR strings that are describing indels, should I be assuming that a deletion event means a deletion of bases from the genome that ARE present in the transcript (and vice versa for insertions)?
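To sanity-check the arithmetic, here is a minimal Python sketch that computes how many bases a CIGAR string consumes on each side. It assumes the SAM-style convention (where `D` consumes only the reference and `I` consumes only the query); whether UTA intends that convention is exactly the open question. The names `cigar_lengths`, `REF_OPS`, and `QUERY_OPS` are made up for illustration and are not part of UTA or hgvs.

```python
import re

# Assumption: SAM-style semantics for CIGAR operators (not confirmed for UTA).
REF_OPS = set("M=XDN")    # operators that consume "reference" bases
QUERY_OPS = set("M=XIS")  # operators that consume "query" bases


def cigar_lengths(cigar):
    """Return (reference_length, query_length) implied by a CIGAR string."""
    ref_len = 0
    query_len = 0
    for count, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(count)
        if op in REF_OPS:
            ref_len += n
        if op in QUERY_OPS:
            query_len += n
    return ref_len, query_len


ref_len, query_len = cigar_lengths("1453=3D2=")
print(ref_len, query_len)  # 1458 1455
```

Under that assumption, the "reference" side spans 1458 bases and the "query" side spans 1455, which matches tx_ex_len = 1458 and alt_ex_len = 1455 only if the transcript is playing the role of the reference.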