sqanti3_rescue - tags transcript_id, gene_id are absent from the attribute field [BUG] #317
Is there an existing issue for this?
Have you loaded the SQANTI3.env conda environment?
Problem description
I'm annotating a plant genome using IsoSeq reads. The prior annotation was based on short reads only, so this is a significant improvement.
I'm trying to run sqanti3_rescue, but it crashes before it finishes.
The error says that the transcript_id and gene_id tags are missing, but I've checked both input .gtf files, and the tags are present:
Isoseq-filtered.filtered.gtf
Bna.4DH.A01 PacBio transcript 20335 21285 . + . transcript_id "PB.2.1"; gene_id "BnaA01g000050.4DH";
Bna.4DH.A01 PacBio exon 20335 21285 . + . transcript_id "PB.2.1"; gene_id "BnaA01g000050.4DH";
Bna.4DH.A01 PacBio transcript 21075 22955 . - . transcript_id "PB.4.1"; gene_id "BnaA01g000070.4DH_BnaA01g000060.4DH";
Bna.4DH.A01 PacBio exon 21075 21381 . - . transcript_id "PB.4.1"; gene_id "BnaA01g000070.4DH_BnaA01g000060.4DH";
Bna.4DH.A01 PacBio exon 21450 21674 . - . transcript_id "PB.4.1"; gene_id "BnaA01g000070.4DH_BnaA01g000060.4DH";
Bna.4DH.A01 PacBio exon 21770 21907 . - . transcript_id "PB.4.1"; gene_id "BnaA01g000070.4DH_BnaA01g000060.4DH";
Bna.4DH.A01 PacBio exon 21995 22257 . - . transcript_id "PB.4.1"; gene_id "BnaA01g000070.4DH_BnaA01g000060.4DH";
Bna.4DH.A01 PacBio exon 22343 22955 . - . transcript_id "PB.4.1"; gene_id "BnaA01g000070.4DH_BnaA01g000060.4DH";
Bna.4DH.A01 PacBio CDS 21277 21381 . - . transcript_id "PB.4.1"; gene_id "BnaA01g000070.4DH_BnaA01g000060.4DH";
Bna.4DH.A01 PacBio CDS 21450 21674 . - . transcript_id "PB.4.1"; gene_id "BnaA01g000070.4DH_BnaA01g000060.4DH";
Bna.4DH.A01 PacBio CDS 21770 21907 . - . transcript_id "PB.4.1"; gene_id "BnaA01g000070.4DH_BnaA01g000060.4DH";
Bna.4DH.A01 PacBio CDS 21995 22257 . - . transcript_id "PB.4.1"; gene_id "BnaA01g000070.4DH_BnaA01g000060.4DH";
Bna.4DH.A01 PacBio CDS 22343 22910 . - . transcript_id "PB.4.1"; gene_id "BnaA01g000070.4DH_BnaA01g000060.4DH";
...
B_napus.gtf
Bna.4DH.A01 AAFC_GIFS gene 3830 6473 . - . gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS mRNA 3830 6473 . - . transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS exon 6456 6473 . - . transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS exon 6025 6348 . - . transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS exon 5572 5703 . - . transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS exon 5090 5387 . - . transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS exon 4845 4977 . - . transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS exon 4569 4768 . - . transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS exon 3830 4185 . - . transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS CDS 6456 6473 . - 0 transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS CDS 6025 6348 . - 0 transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS CDS 5572 5703 . - 0 transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS CDS 5090 5387 . - 0 transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS CDS 4845 4977 . - 2 transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS CDS 4569 4768 . - 1 transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS CDS 3830 4185 . - 2 transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS mRNA 3830 4969 . - . transcript_id "BnaA01g000010.4DH.2"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS exon 4845 4969 . - . transcript_id "BnaA01g000010.4DH.2"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS exon 4569 4768 . - . transcript_id "BnaA01g000010.4DH.2"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS exon 3830 4185 . - . transcript_id "BnaA01g000010.4DH.2"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS CDS 4845 4969 . - 0 transcript_id "BnaA01g000010.4DH.2"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS CDS 4569 4768 . - 1 transcript_id "BnaA01g000010.4DH.2"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS CDS 3830 4185 . - 2 transcript_id "BnaA01g000010.4DH.2"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS mRNA 5992 6473 . - . transcript_id "BnaA01g000010.4DH.3"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS exon 6456 6473 . - . transcript_id "BnaA01g000010.4DH.3"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS exon 5992 6348 . - . transcript_id "BnaA01g000010.4DH.3"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS CDS 6456 6473 . - 0 transcript_id "BnaA01g000010.4DH.3"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS CDS 5992 6348 . - 0 transcript_id "BnaA01g000010.4DH.3"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
...
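A check over every record, rather than only the first lines shown above, can be sketched as follows. This is a minimal sketch and not part of SQANTI3: `missing_tags` and `summarize` are hypothetical helper names, and it assumes the real files are tab-delimited with nine columns (the excerpts above are displayed with spaces).

```python
import re
from collections import Counter

# Matches GTF-style attributes such as: transcript_id "PB.2.1";
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def missing_tags(gtf_lines, required=("transcript_id", "gene_id")):
    """Yield (line_number, feature_type, missing_tags) for records lacking a required tag.

    Hypothetical helper: assumes tab-delimited, nine-column GTF records.
    """
    for lineno, line in enumerate(gtf_lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split("\t")
        if len(fields) < 9:
            # Not a complete GTF record, so no attribute column to inspect.
            yield lineno, None, tuple(required)
            continue
        tags = {key for key, _ in ATTR_RE.findall(fields[8])}
        missing = tuple(tag for tag in required if tag not in tags)
        if missing:
            yield lineno, fields[2], missing


def summarize(path):
    """Count problem records per (feature_type, missing_tags) pair for one file."""
    with open(path) as handle:
        return Counter(
            (feature, missing) for _, feature, missing in missing_tags(handle)
        )
```

Calling `summarize("B_napus.gtf")` and `summarize("Isoseq-filtered.filtered.gtf")` would return counts keyed by feature type and missing tags, which shows whether only certain feature types (or malformed lines) lack a tag.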
Could you help me figure out what I'm doing wrong?
Note that I used the conda environment from the SQANTI3 GitHub repository, as the release tarball was not the most recent version.
I've included all my SQANTI3 analysis commands and their console output in Sqanti_run_Aug_2.txt, in case the problem was caused by an earlier step.
Code sample
~/bin/SQANTI3/sqanti3_rescue.py ml --isoforms Isoseq_corrected.fasta --gtf Isoseq-filtered.filtered.gtf -g B_napus.gtf -f B_napus.fasta -k B_napus_classification.txt --mode full -e all -o Isoseq-rescued -r randomforest.RData -j 0.7 Isoseq-filtered_MLresult_classification.txt
Error
Error in check_tag_present(c(transcript_id, gene_id), tags, error = TRUE) :
Tags transcript_id, gene_id are absent from the attribute field.
Calls: -> tr2g_GRanges -> check_tag_present
Execution halted
Traceback (most recent call last):
  File "/home/AGR.GC.CA/coutuc/bin/SQANTI3/sqanti3_rescue.py", line 660, in <module>
    main()
  File "/home/AGR.GC.CA/coutuc/bin/SQANTI3/sqanti3_rescue.py", line 557, in main
    auto_result = run_automatic_rescue(args)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/AGR.GC.CA/coutuc/bin/SQANTI3/sqanti3_rescue.py", line 59, in run_automatic_rescue
    if subprocess.check_call(auto_cmd, shell = True) != 0:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/AGR.GC.CA/coutuc/.conda/envs/SQANTI3.env/lib/python3.11/subprocess.py", line 413, in check_call
    raise CalledProcessError(retcode, cmd)
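For reading the traceback: `subprocess.check_call` raises `CalledProcessError` whenever the child process exits with a non-zero status, so the Python part of the output only reports that the R step failed, and the R message above it is the underlying error. A minimal sketch of that behaviour, where `run_step` is a hypothetical wrapper and the command is a stand-in rather than the real rescue command:

```python
import subprocess
import sys


def run_step(cmd):
    """Run a command and return its exit status instead of raising (hypothetical wrapper)."""
    try:
        subprocess.check_call(cmd)
        return 0
    except subprocess.CalledProcessError as err:
        # check_call raises on any non-zero exit status of the child process.
        return err.returncode


# Stand-in for a failing child process: a Python one-liner that exits with status 1.
code = run_step([sys.executable, "-c", "import sys; sys.exit(1)"])
```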
Anything else?
Sqanti_run_Aug_2.txt