[BUG] Running SQANTI3_QC with chunks and no chunks produces differences in classification.txt #340
Labels
bug
Something isn't working
enhancement
New feature or request
QC
SQ3 Quality Control related issues
Is there an existing issue for this?
Have you loaded the SQANTI3.env conda environment?
Problem description
Reported by @dudududu12138 in #201. Whenever sqanti3_qc is run on the same input files but with parallelization enabled (-n option with n >= 2), the resulting classification.txt differs from the one produced without parallelization. At least, the detected differences are:
When the --chunk option is used, the transcriptome is divided into chunks, each chunk is processed independently, and the results are then concatenated. The transcripts of a single gene can therefore be split between chunks, in which case calculations that depend on the presence of all transcripts of that gene give different results.
When parallelization options are used, computations that rely on having all transcripts of a gene together should be performed after all chunks have finished and been merged.
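A minimal sketch of the ordering problem, using toy data rather than SQANTI3 code. The records, the naive positional split, and the `isoforms_per_gene` metric are all hypothetical stand-ins for any calculation that needs every transcript of a gene:

```python
from collections import Counter

# Hypothetical (transcript_id, gene_id) records; not real SQANTI3 output.
transcripts = [
    ("tx1", "geneA"),
    ("tx2", "geneA"),
    ("tx3", "geneA"),
    ("tx4", "geneB"),
]


def isoforms_per_gene(records):
    """Toy gene-level metric: number of transcripts seen for each gene."""
    return Counter(gene for _, gene in records)


def split_into_chunks(records, n):
    """Naive positional split; it can separate transcripts of one gene."""
    size = -(-len(records) // n)  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


chunks = split_into_chunks(transcripts, 2)

# Problematic order: compute the gene-level metric inside each chunk,
# then concatenate the per-chunk results.
per_chunk = [
    (tx, gene, isoforms_per_gene(chunk)[gene])
    for chunk in chunks
    for tx, gene in chunk
]

# Suggested order: merge the chunks first, then compute the metric once.
merged = [record for chunk in chunks for record in chunk]
counts = isoforms_per_gene(merged)
after_merge = [(tx, gene, counts[gene]) for tx, gene in merged]

# Reference result without any chunking.
reference_counts = isoforms_per_gene(transcripts)
unchunked = [(tx, gene, reference_counts[gene]) for tx, gene in transcripts]
```

Here geneA is split across the two chunks, so `per_chunk` reports a different isoform count for its transcripts than `unchunked`, while `after_merge` matches the unchunked result.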
Code sample
Bug replicated using the example:
python3 sqanti3_qc.py -d test_no_chunk -o test_no_chunk example/UHR_chr22.gtf example/gencode.v38.basic_chr22.gtf example/GRCh38.p13_chr22.fasta
python3 sqanti3_qc.py -d test_chunk -o test_chunk example/UHR_chr22.gtf example/gencode.v38.basic_chr22.gtf example/GRCh38.p13_chr22.fasta -n 2
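One possible way to list which columns differ between the two runs is sketched below. It assumes the classification files are tab-separated with a header row and an `isoform` identifier column; the `differing_columns` helper is hypothetical and not part of SQANTI3.

```python
import csv
from collections import defaultdict


def load_table(path, key_column):
    """Read a tab-separated file into a dict keyed by the identifier column."""
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return {row[key_column]: row for row in reader}


def differing_columns(path_a, path_b, key_column="isoform"):
    """Map each column name to the identifiers whose values differ.

    Only identifiers present in both files are compared.
    """
    table_a = load_table(path_a, key_column)
    table_b = load_table(path_b, key_column)
    diffs = defaultdict(list)
    for key in sorted(table_a.keys() & table_b.keys()):
        for column, value in table_a[key].items():
            if table_b[key].get(column) != value:
                diffs[column].append(key)
    return dict(diffs)
```

Calling `differing_columns` with the paths of the two classification.txt files from the runs above (the exact file names inside test_no_chunk and test_chunk depend on the output prefix) returns a dictionary of the affected columns and isoforms.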
Error
No response
Anything else?
No response