Create a test interval tree object that can be used to develop downstream processes without waiting for the actual interval tree implementation
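One way to unblock downstream work is a throwaway stand-in that exposes the query interface we expect the real tree to have. This is a minimal sketch, assuming 1-based closed intervals bucketed by (chromosome, strand); `MockIntervalTree`, `add` and `overlap` are hypothetical placeholder names, not a settled API, and queries are a linear scan rather than a real tree.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class MockIntervalTree:
    """Hypothetical stand-in for the real interval tree (linear scan, O(n) per query)."""
    _store: Dict[Tuple[str, str], List[Tuple[int, int, Any]]] = field(default_factory=dict)

    def add(self, chrom: str, strand: str, start: int, end: int, data: Any) -> None:
        """Register a 1-based, closed interval [start, end] with an arbitrary payload."""
        if start > end:
            raise ValueError(f"start {start} > end {end}")
        self._store.setdefault((chrom, strand), []).append((start, end, data))

    def overlap(self, chrom: str, strand: str, start: int, end: int) -> List[Tuple[int, int, Any]]:
        """Return every stored interval sharing at least one base with [start, end], sorted by position."""
        hits = [
            iv for iv in self._store.get((chrom, strand), [])
            if iv[0] <= end and start <= iv[1]
        ]
        return sorted(hits, key=lambda iv: (iv[0], iv[1]))
```

Because callers only use `add` and `overlap`, the scan could later be swapped for a real structure (for example the `intervaltree` package on PyPI, which uses half-open intervals) without changing downstream code.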
Implement an interval tree constructor that takes the n GTF files and n FASTA files, as well as the reference genome that was used to create these transcriptomes
Maybe the reference genome should be optional; I don't know what the landscape looks like in terms of reference-guided vs. reference-free methods for long-read RNA-seq
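A rough sketch of the GTF half of such a constructor, assuming standard tab-separated GTF with `transcript_id "..."` in column 9. `Transcript`, `TranscriptIndex`, `parse_gtf_exons` and `from_gtfs` are hypothetical names; FASTA loading is omitted, the reference genome is only stored as an optional path, and transcripts are kept in sorted lists per (chromosome, strand) instead of a real interval tree.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional, Tuple

# Pull the transcript_id value out of a GTF attribute column.
_TX_ID = re.compile(r'transcript_id "([^"]+)"')


@dataclass
class Transcript:
    sample: str
    tx_id: str
    chrom: str
    strand: str
    exons: List[Tuple[int, int]] = field(default_factory=list)  # 1-based, closed


def parse_gtf_exons(lines: Iterable[str], sample: str) -> Dict[str, Transcript]:
    """Group the 'exon' records of one GTF by transcript_id; other features are ignored."""
    txs: Dict[str, Transcript] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        match = _TX_ID.search(cols[8])
        if match is None:
            continue  # exons without a transcript_id are skipped in this sketch
        tx_id = match.group(1)
        tx = txs.setdefault(tx_id, Transcript(sample, tx_id, cols[0], cols[6]))
        tx.exons.append((int(cols[3]), int(cols[4])))
    for tx in txs.values():
        tx.exons.sort()
    return txs


@dataclass
class TranscriptIndex:
    """Hypothetical container for n transcriptomes, bucketed by (chrom, strand)."""
    reference_fasta: Optional[str] = None  # optional: reference-free runs may not have one
    by_locus: Dict[Tuple[str, str], List[Transcript]] = field(
        default_factory=lambda: defaultdict(list))

    @classmethod
    def from_gtfs(cls, gtfs: Dict[str, Iterable[str]],
                  reference_fasta: Optional[str] = None) -> "TranscriptIndex":
        """Build the index from {sample name: GTF lines}."""
        index = cls(reference_fasta=reference_fasta)
        for sample, lines in gtfs.items():
            for tx in parse_gtf_exons(lines, sample).values():
                index.by_locus[(tx.chrom, tx.strand)].append(tx)
        for txs in index.by_locus.values():
            txs.sort(key=lambda t: t.exons[0][0])
        return index

    def overlapping(self, chrom: str, strand: str, start: int, end: int) -> List[Transcript]:
        """Transcripts whose span (first exon start to last exon end) touches [start, end]."""
        # Assumes exons within a transcript do not overlap, so the last exon has the max end.
        return [t for t in self.by_locus.get((chrom, strand), [])
                if t.exons[0][0] <= end and start <= t.exons[-1][1]]
```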
Create something like the current IsoformLibrary that takes the interval tree and the FASTA files and can extract "clusters" and their sequences (not sure whether this will be useful, but I think it would be)
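What counts as a "cluster" is still open. This sketch assumes one possible definition: transcripts on the same chromosome and strand whose genomic spans overlap, directly or transitively, found with a single sweep over spans sorted by start. The function name and tuple layout are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[str, str, str, int, int]  # (tx_id, chrom, strand, start, end), 1-based closed


def cluster_transcripts(spans: List[Span]) -> List[List[str]]:
    """Group transcripts into clusters of transitively overlapping spans on the same chrom/strand."""
    clusters: List[List[str]] = []
    current_key = None
    current_end = -1
    for tx_id, chrom, strand, start, end in sorted(spans, key=lambda s: (s[1], s[2], s[3], s[4])):
        if (chrom, strand) == current_key and start <= current_end:
            # Overlaps the running cluster: extend it.
            clusters[-1].append(tx_id)
            current_end = max(current_end, end)
        else:
            # New chrom/strand or a gap: start a new cluster.
            clusters.append([tx_id])
            current_key = (chrom, strand)
            current_end = end
    return clusters
```

Span-level overlap also groups transcripts that only share intronic space; requiring exon-level overlap would be a stricter alternative. Sequences for each cluster could then be pulled from the FASTA files by transcript ID.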
Write a method that classifies coordinate mismatches at the transcript level. This will take some thinking to come up with the classifications and their definitions. A single transcript might have multiple labels, too
There are a lot of places we can reference for this. The best I can think of is the gffcompare documentation, which defines these kinds of categories (its transcript class codes)
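A starting point for a multi-label classifier, assuming both transcripts are already on the same chromosome and strand with sorted 1-based exons. The labels are hypothetical placeholders loosely in the spirit of gffcompare's class codes, not a reproduction of them, and ends are reported in genomic (left/right) rather than 5'/3' terms to sidestep strand handling.

```python
from typing import List, Set, Tuple

Exons = List[Tuple[int, int]]  # sorted, 1-based closed, same chrom and strand


def introns(exons: Exons) -> List[Tuple[int, int]]:
    """Intron coordinates implied by consecutive exons."""
    return [(a_end + 1, b_start - 1) for (_, a_end), (b_start, _) in zip(exons, exons[1:])]


def exonic_overlap(a: Exons, b: Exons) -> int:
    """Total number of bases shared between the exons of a and b."""
    return sum(max(0, min(e1, e2) - max(s1, s2) + 1) for s1, e1 in a for s2, e2 in b)


def classify(query: Exons, ref: Exons) -> Set[str]:
    """Hypothetical multi-label classification of query relative to ref."""
    if exonic_overlap(query, ref) == 0:
        return {"no_exonic_overlap"}
    labels: Set[str] = set()
    q_introns, r_introns = introns(query), introns(ref)
    if q_introns and q_introns == r_introns:
        labels.add("identical_intron_chain")
    elif set(q_introns) & set(r_introns):
        labels.add("shares_some_junctions")
    else:
        labels.add("exonic_overlap_only")
    # The remaining labels can co-occur with any of the above.
    if query[0][0] != ref[0][0]:
        labels.add("left_end_differs")
    if query[-1][1] != ref[-1][1]:
        labels.add("right_end_differs")
    if len(query) != len(ref):
        labels.add("exon_count_differs")
    return labels
```

Returning a set makes "a single transcript might have multiple labels" explicit, e.g. an identical intron chain with a shifted transcription end.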
An "identical transcript" (suitable for pairwise alignment) should be defined roughly as follows: a transcript in which every exon overlaps its counterpart by at least a user-defined amount (e.g., 95%)
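A minimal sketch of that definition, under two assumptions not settled in the text: exons are paired by rank (so both transcripts must have the same exon count), and the overlap fraction is normalised by the longer exon of each pair so that both exons must be covered. The function names and the `min_fraction` parameter are hypothetical.

```python
from typing import List, Tuple

Exons = List[Tuple[int, int]]  # sorted, 1-based closed


def exon_overlap_fraction(a: Tuple[int, int], b: Tuple[int, int]) -> float:
    """Shared bases divided by the length of the longer exon (an assumed normalisation)."""
    shared = max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)
    longer = max(a[1] - a[0] + 1, b[1] - b[0] + 1)
    return shared / longer


def is_identical_transcript(q: Exons, r: Exons, min_fraction: float = 0.95) -> bool:
    """True if exon counts match and every rank-paired exon overlaps by >= min_fraction."""
    if len(q) != len(r):
        return False
    return all(exon_overlap_fraction(a, b) >= min_fraction for a, b in zip(q, r))
```

Normalising by the shorter exon instead would be more permissive (a short exon nested inside a long one would pass), so the choice probably deserves to be user-configurable too.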
Sequence comparison should happen only between these identical transcripts, and only over regions where two exons overlap. We should never, for instance, align across splice sites
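To keep alignment strictly within exons, the genomic overlap of each rank-paired exon can be converted into a slice of each transcript sequence. This sketch assumes plus-strand transcripts whose FASTA sequence is colinear with their exon coordinates (sequence length equals summed exon length); minus-strand sequences would need offsets counted from the other end. The helper names are hypothetical.

```python
from typing import List, Tuple

Exons = List[Tuple[int, int]]  # sorted, 1-based closed, plus strand assumed


def genomic_to_tx_offset(exons: Exons, pos: int) -> int:
    """0-based offset of genomic position pos within the spliced transcript sequence."""
    offset = 0
    for start, end in exons:
        if start <= pos <= end:
            return offset + (pos - start)
        offset += end - start + 1
    raise ValueError(f"position {pos} is not exonic")


def shared_exon_slices(q: Exons, r: Exons) -> List[Tuple[slice, slice]]:
    """For rank-paired exons, return (query slice, ref slice) covering only their genomic overlap."""
    pairs = []
    for (qs, qe), (rs, r_end) in zip(q, r):
        lo, hi = max(qs, rs), min(qe, r_end)
        if lo > hi:
            continue  # this exon pair does not overlap; nothing to align
        q_lo, r_lo = genomic_to_tx_offset(q, lo), genomic_to_tx_offset(r, lo)
        length = hi - lo + 1
        # Each slice lies inside a single exon, so no splice junction is ever spanned.
        pairs.append((slice(q_lo, q_lo + length), slice(r_lo, r_lo + length)))
    return pairs
```

Each `(q_slice, r_slice)` pair would then be handed to the pairwise aligner as `q_seq[q_slice]` vs `r_seq[r_slice]`, one exon at a time.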
Figure out how to report all of this information; there will likely be multiple outputs. This requires thinking about who the users are and what they want
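As one possible output among several, a per-pair tab-separated table keeps multi-label results on a single line by joining sorted labels with commas. The column names, row layout and function name below are hypothetical placeholders for discussion, not a decided format.

```python
import csv
import io
from typing import Iterable, Set, TextIO, Tuple

# Hypothetical row layout: (query sample, query tx, reference sample, reference tx, labels)
Row = Tuple[str, str, str, str, Set[str]]


def write_classification_tsv(rows: Iterable[Row], out: TextIO) -> None:
    """Write a header plus one tab-separated line per transcript pair."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["query_sample", "query_tx", "ref_sample", "ref_tx", "labels"])
    for q_sample, q_tx, r_sample, r_tx, labels in rows:
        # Sorting makes output deterministic; several labels still fit on one line.
        writer.writerow([q_sample, q_tx, r_sample, r_tx, ",".join(sorted(labels))])


if __name__ == "__main__":
    buf = io.StringIO()
    write_classification_tsv(
        [("A", "a1", "B", "b1", {"right_end_differs", "identical_intron_chain"})], buf)
    print(buf.getvalue(), end="")
```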