[BUG] Running SQANTI3_QC with chunks and no chunks produces differences in classification.txt #340

Fabian-RY · 2024-10-08T08:56:43Z

Is there an existing issue for this?

I have searched the existing issues

Have you loaded the SQANTI3.env conda environment?

I have loaded the SQANTI3.env conda environment

Problem description

Reported by @dudududu12138 in #201. When sqanti3_qc is run on the same input files with parallelization enabled (the -n option with n >= 2), the resulting classification.txt differs from the one produced without chunks. The differences detected so far are:

For transcripts whose associated gene is a novelGene, novelGenes are numbered according to the chunk and the number of novelGenes in that chunk. A future enhancement could unify the novelGene numbering.
For FSM classes, the same transcript gets different values between runs, although a transcript can only be assigned to one class. The classes (A, B and C) are explained in the wiki. The class of a transcript depends on the presence or absence of other transcripts from the same gene.
There are disagreements in RTS stage for some transcripts classified as NIC or NNC.

When the --chunk option is used, the transcriptome is divided into chunks, each chunk is processed independently, and the results are then concatenated. Transcripts of the same gene can therefore be split between chunks, so calculations that depend on the presence of all transcripts of a gene give different results.

When parallelization options are used, computations that rely on having all transcripts together should be performed after the chunks are finished and merged.
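The effect can be illustrated with a toy example. This is only a sketch, not SQANTI3 code: the records, the chunk split, and the helper names `isoforms_per_gene` and `annotate` are hypothetical, and the per-gene transcript count stands in for any gene-level value such as the FSM class.

```python
from collections import Counter

def isoforms_per_gene(records):
    """Count how many transcripts each gene has within `records`."""
    return Counter(gene for _, gene in records)

def annotate(records, gene_counts):
    """Attach the gene-level count to every (transcript, gene) record."""
    return [(tx, gene, gene_counts[gene]) for tx, gene in records]

# Hypothetical input: (transcript_id, associated_gene) pairs.
records = [("PB.1.1", "geneA"), ("PB.1.2", "geneA"), ("PB.2.1", "geneB")]

# Hypothetical split into two chunks; geneA ends up in both of them.
chunks = [records[:1], records[1:]]

# Per-chunk computation: each chunk only sees its own transcripts,
# so geneA appears to have a single transcript in each chunk.
per_chunk = [
    row
    for chunk in chunks
    for row in annotate(chunk, isoforms_per_gene(chunk))
]

# Post-merge computation: concatenate the chunks first, then compute
# the gene-level value once over the complete set of transcripts.
merged = [row for chunk in chunks for row in chunk]
post_merge = annotate(merged, isoforms_per_gene(merged))

print(per_chunk)
print(post_merge)
```

Only the post-merge result matches what a run without chunks would compute, which is why gene-level values are better derived after concatenation.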

Code sample

Bug replicated using the example:
python3 sqanti3_qc.py -d test_no_chunk -o test_no_chunk example/UHR_chr22.gtf example/gencode.v38.basic_chr22.gtf example/GRCh38.p13_chr22.fasta
python3 sqanti3_qc.py -d test_chunk -o test_chunk example/UHR_chr22.gtf example/gencode.v38.basic_chr22.gtf example/GRCh38.p13_chr22.fasta -n 2
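To see which columns disagree between the two runs, the classification tables can be compared per isoform. The following is a minimal sketch, assuming the classification file is tab-separated with an `isoform` ID column; the helper names `load_classification` and `diff_columns` are hypothetical, and the small in-memory tables are made-up stand-ins for the real files.

```python
import csv
import io
from collections import Counter

def load_classification(handle, key="isoform"):
    """Read a tab-separated table into {isoform_id: row_dict}.

    The `isoform` key column is an assumption about the file layout.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[key]: row for row in reader}

def diff_columns(table_a, table_b):
    """Count, per column, how many shared isoforms have differing values."""
    counts = Counter()
    for iso_id in table_a.keys() & table_b.keys():
        row_a, row_b = table_a[iso_id], table_b[iso_id]
        for column in row_a.keys() & row_b.keys():
            if row_a[column] != row_b[column]:
                counts[column] += 1
    return counts

# Made-up miniature tables standing in for the two classification files.
unchunked = load_classification(io.StringIO(
    "isoform\tFSM_class\tRTS_stage\n"
    "PB.1.1\tA\tFALSE\n"
    "PB.1.2\tC\tFALSE\n"
))
chunked = load_classification(io.StringIO(
    "isoform\tFSM_class\tRTS_stage\n"
    "PB.1.1\tA\tFALSE\n"
    "PB.1.2\tA\tTRUE\n"
))

differences = diff_columns(unchunked, chunked)
for column, n_rows in sorted(differences.items()):
    print(f"{column}: {n_rows} isoform(s) differ")
```

For the real comparison, replace the `io.StringIO` objects with open file handles for the classification.txt of each run. The exact output paths depend on the -d and -o values used above.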

Error

No response

Anything else?

No response

Fabian-RY added the bug (Something isn't working), enhancement (New feature or request), and QC (SQ3 Quality Control related issues) labels Oct 8, 2024

Fabian-RY self-assigned this Oct 8, 2024

Fabian-RY mentioned this issue Oct 8, 2024

The _corrected.gtf.cds.gff is not generated when running SQANTI3 QC module #201

Closed
