sqanti3_rescue - tags transcript_id, gene_id are absent from the attribute field [BUG] #317

Open
2 tasks done
cathycoutu opened this issue Aug 2, 2024 · 1 comment
Labels
rescue SQ3 rescue-related issues

Comments

@cathycoutu

Is there an existing issue for this?

  • I have searched the existing issues

Have you loaded the SQANTI3.env conda environment?

  • I have loaded the SQANTI3.env conda environment

Problem description

I'm annotating a plant genome using IsoSeq reads. The prior annotation was short-read only, so this is a significant improvement.
I'm trying to run sqanti3_rescue, but it crashes before it finishes.
The error says that the transcript_id and gene_id tags are missing, but I've checked both input .gtf files, and they are present:

Isoseq-filtered.filtered.gtf
Bna.4DH.A01 PacBio transcript 20335 21285 . + . transcript_id "PB.2.1"; gene_id "BnaA01g000050.4DH";
Bna.4DH.A01 PacBio exon 20335 21285 . + . transcript_id "PB.2.1"; gene_id "BnaA01g000050.4DH";
Bna.4DH.A01 PacBio transcript 21075 22955 . - . transcript_id "PB.4.1"; gene_id "BnaA01g000070.4DH_BnaA01g000060.4DH";
Bna.4DH.A01 PacBio exon 21075 21381 . - . transcript_id "PB.4.1"; gene_id "BnaA01g000070.4DH_BnaA01g000060.4DH";
Bna.4DH.A01 PacBio exon 21450 21674 . - . transcript_id "PB.4.1"; gene_id "BnaA01g000070.4DH_BnaA01g000060.4DH";
Bna.4DH.A01 PacBio exon 21770 21907 . - . transcript_id "PB.4.1"; gene_id "BnaA01g000070.4DH_BnaA01g000060.4DH";
Bna.4DH.A01 PacBio exon 21995 22257 . - . transcript_id "PB.4.1"; gene_id "BnaA01g000070.4DH_BnaA01g000060.4DH";
Bna.4DH.A01 PacBio exon 22343 22955 . - . transcript_id "PB.4.1"; gene_id "BnaA01g000070.4DH_BnaA01g000060.4DH";
Bna.4DH.A01 PacBio CDS 21277 21381 . - . transcript_id "PB.4.1"; gene_id "BnaA01g000070.4DH_BnaA01g000060.4DH";
Bna.4DH.A01 PacBio CDS 21450 21674 . - . transcript_id "PB.4.1"; gene_id "BnaA01g000070.4DH_BnaA01g000060.4DH";
Bna.4DH.A01 PacBio CDS 21770 21907 . - . transcript_id "PB.4.1"; gene_id "BnaA01g000070.4DH_BnaA01g000060.4DH";
Bna.4DH.A01 PacBio CDS 21995 22257 . - . transcript_id "PB.4.1"; gene_id "BnaA01g000070.4DH_BnaA01g000060.4DH";
Bna.4DH.A01 PacBio CDS 22343 22910 . - . transcript_id "PB.4.1"; gene_id "BnaA01g000070.4DH_BnaA01g000060.4DH";
...

B_napus.gtf
Bna.4DH.A01 AAFC_GIFS gene 3830 6473 . - . gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS mRNA 3830 6473 . - . transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS exon 6456 6473 . - . transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS exon 6025 6348 . - . transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS exon 5572 5703 . - . transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS exon 5090 5387 . - . transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS exon 4845 4977 . - . transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS exon 4569 4768 . - . transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS exon 3830 4185 . - . transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS CDS 6456 6473 . - 0 transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS CDS 6025 6348 . - 0 transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS CDS 5572 5703 . - 0 transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS CDS 5090 5387 . - 0 transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS CDS 4845 4977 . - 2 transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS CDS 4569 4768 . - 1 transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS CDS 3830 4185 . - 2 transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS mRNA 3830 4969 . - . transcript_id "BnaA01g000010.4DH.2"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS exon 4845 4969 . - . transcript_id "BnaA01g000010.4DH.2"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS exon 4569 4768 . - . transcript_id "BnaA01g000010.4DH.2"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS exon 3830 4185 . - . transcript_id "BnaA01g000010.4DH.2"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS CDS 4845 4969 . - 0 transcript_id "BnaA01g000010.4DH.2"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS CDS 4569 4768 . - 1 transcript_id "BnaA01g000010.4DH.2"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS CDS 3830 4185 . - 2 transcript_id "BnaA01g000010.4DH.2"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS mRNA 5992 6473 . - . transcript_id "BnaA01g000010.4DH.3"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS exon 6456 6473 . - . transcript_id "BnaA01g000010.4DH.3"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS exon 5992 6348 . - . transcript_id "BnaA01g000010.4DH.3"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS CDS 6456 6473 . - 0 transcript_id "BnaA01g000010.4DH.3"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
Bna.4DH.A01 AAFC_GIFS CDS 5992 6348 . - 0 transcript_id "BnaA01g000010.4DH.3"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"
...
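
A file-wide check can complement the visual inspection of the excerpts above. The following is a minimal sketch, not part of SQANTI3 or of the R code that raises the error: the function name `find_lines_missing_tags` is hypothetical, and it assumes a 9-column GTF whose attribute column holds `key "value"` pairs. It splits on any whitespace so that it works both on tab-separated files and on the space-separated lines as they appear in this page.

```python
import re

# Tags named in the error message.
REQUIRED_TAGS = ("transcript_id", "gene_id")

# Matches one `key "value"` pair in the GTF attribute column.
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def find_lines_missing_tags(lines, required=REQUIRED_TAGS):
    """Return (line_number, feature, missing_tags) for each offending line.

    Hypothetical helper for illustration only. Comment lines and blank
    lines are skipped; lines with fewer than 9 columns are reported too.
    """
    problems = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        # Split the first 8 columns off; the remainder is the attribute field.
        fields = line.split(None, 8)
        if len(fields) < 9:
            problems.append((lineno, "?", ["<fewer than 9 columns>"]))
            continue
        feature, attributes = fields[2], fields[8]
        keys = {key for key, _ in ATTR_RE.findall(attributes)}
        missing = [tag for tag in required if tag not in keys]
        if missing:
            problems.append((lineno, feature, missing))
    return problems


# First two lines of the B_napus.gtf excerpt, as shown above.
sample = [
    'Bna.4DH.A01 AAFC_GIFS gene 3830 6473 . - . gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"',
    'Bna.4DH.A01 AAFC_GIFS mRNA 3830 6473 . - . transcript_id "BnaA01g000010.4DH.1"; gene_id "BnaA01g000010.4DH"; gene_name "BnaA01g000010.4DH"',
]
print(find_lines_missing_tags(sample))
# → [(1, 'gene', ['transcript_id'])]
```

To scan a whole file, pass an open file handle instead of `sample`. In the excerpt, only the `gene` line is flagged, which is usual for gene-level features in a GTF; whether that is related to the error reported here is not established.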

Could you help me figure out what I'm doing wrong?
Note that I used the SQANTI3 conda environment from GitHub, as the tarball was not the most recent version.

I've included all my sqanti3 analysis commands and screen printouts in Sqanti_run_Aug_2.txt, in case this is a problem caused by an earlier step.

Code sample

~/bin/SQANTI3/sqanti3_rescue.py ml --isoforms Isoseq_corrected.fasta --gtf Isoseq-filtered.filtered.gtf -g B_napus.gtf -f B_napus.fasta -k B_napus_classification.txt --mode full -e all -o Isoseq-rescued -r randomforest.RData -j 0.7 Isoseq-filtered_MLresult_classification.txt

Error


RETRIEVING RESCUE TARGETS...

 Rescue targets: validated LR or reference isoforms that could replace an artifact from the same gene.


 Retrieving target genes...


 Finding target isoforms from long read transcriptome...


 Finding target isoforms from reference transcriptome...

Error in check_tag_present(c(transcript_id, gene_id), tags, error = TRUE) :
Tags transcript_id, gene_id are absent from the attribute field.
Calls: -> tr2g_GRanges -> check_tag_present
Execution halted
Traceback (most recent call last):
  File "/home/AGR.GC.CA/coutuc/bin/SQANTI3/sqanti3_rescue.py", line 660, in <module>
    main()
  File "/home/AGR.GC.CA/coutuc/bin/SQANTI3/sqanti3_rescue.py", line 557, in main
    auto_result = run_automatic_rescue(args)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/AGR.GC.CA/coutuc/bin/SQANTI3/sqanti3_rescue.py", line 59, in run_automatic_rescue
    if subprocess.check_call(auto_cmd, shell = True) != 0:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/AGR.GC.CA/coutuc/.conda/envs/SQANTI3.env/lib/python3.11/subprocess.py", line 413, in check_call
    raise CalledProcessError(retcode, cmd)

Anything else?

Sqanti_run_Aug_2.txt

@cathycoutu cathycoutu added the triage For developers to check label Aug 2, 2024
@cathycoutu
Author

sqanti3_rescue.py works with the same command in --mode automatic, just not in --mode full.
I've also now tested this further: the error occurs with both ml filtering and rules filtering.

@carolinamonzo carolinamonzo added rescue SQ3 rescue-related issues and removed triage For developers to check labels Oct 11, 2024