[BUG] Running SQANTI3_QC with chunks and no chunks produces differences in classification.txt #340

Open
2 tasks done
Fabian-RY opened this issue Oct 8, 2024 · 0 comments
Assignees
Labels
bug Something isn't working enhancement New feature or request QC SQ3 Quality Control related issues

Comments

@Fabian-RY
Collaborator

Is there an existing issue for this?

  • I have searched the existing issues

Have you loaded the SQANTI3.env conda environment?

  • I have loaded the SQANTI3.env conda environment

Problem description

Reported by @dudududu12138 in #201. Whenever sqanti3_qc is run on the same input files with parallelization enabled (-n option with n >= 2), the resulting classification.txt differs from the one produced without chunks. The differences detected so far are:

  • For transcripts whose associated gene is a novelGene, novelGenes are numbered according to the chunk and the number of novelGenes in that chunk. A future enhancement could unify the novelGene numbering across chunks.
  • For FSM classes, the same transcript gets different values in the two runs, although a transcript can only be assigned to one class. The classes (A, B and C) are explained in the wiki. The class of a transcript depends on the presence or absence of other transcripts from the same gene.
  • There are disagreements in RTS stage for some transcripts classified as NIC or NNC.
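One way to list these differences is to compare the two classification.txt files by transcript ID. This is a minimal sketch, not part of SQANTI3: it assumes the file is tab-separated with an `isoform` ID column, and the helper names and the column names passed in `columns` (for example `associated_gene`, `FSM_class`, `RTS_stage`) are assumptions to check against the actual header.

```python
import csv


def load_classification(path, key="isoform"):
    """Read a tab-separated classification file into {transcript_id: row}.

    The `isoform` key column is an assumption about the file header.
    """
    with open(path, newline="") as handle:
        return {row[key]: row for row in csv.DictReader(handle, delimiter="\t")}


def diff_classifications(no_chunk, chunk, columns):
    """Return (transcript_id, column, value_no_chunk, value_chunk) per mismatch.

    Only transcripts present in both runs are compared.
    """
    mismatches = []
    for tid in sorted(no_chunk.keys() & chunk.keys()):
        for col in columns:
            a, b = no_chunk[tid].get(col), chunk[tid].get(col)
            if a != b:
                mismatches.append((tid, col, a, b))
    return mismatches
```

Calling `diff_classifications` on the two loaded files with the columns of interest gives one tuple per transcript and column that disagrees between the chunked and non-chunked runs.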

When the --chunk option is used, the transcriptome is divided into chunks, each chunk is processed independently, and the results are then concatenated. Transcripts of the same gene can therefore be split between chunks, so calculations that depend on the presence of all transcripts of the gene give different results.
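The effect can be reproduced with a toy example. The sketch below is not SQANTI3 code: the transcript and gene IDs are made up, and a simple count of transcripts per gene stands in for any value that depends on seeing every transcript of a gene.

```python
from collections import Counter

# Hypothetical toy input: (transcript_id, gene_id) pairs.
transcripts = [("tx1", "geneA"), ("tx2", "geneA"), ("tx3", "geneB")]


def isoforms_per_gene(records):
    """A gene-level value: how many transcripts each gene has in `records`."""
    return Counter(gene for _, gene in records)


def annotate(records):
    """Attach the gene-level count to every transcript in `records`."""
    counts = isoforms_per_gene(records)
    return {tx: counts[gene] for tx, gene in records}


# Whole transcriptome at once: geneA is seen with both of its transcripts.
whole = annotate(transcripts)

# Two chunks that split geneA: each chunk sees only one geneA transcript.
chunks = [transcripts[:1], transcripts[1:]]
chunked = {}
for chunk in chunks:
    chunked.update(annotate(chunk))

print(whole)    # {'tx1': 2, 'tx2': 2, 'tx3': 1}
print(chunked)  # {'tx1': 1, 'tx2': 1, 'tx3': 1}
```

Concatenating the per-chunk results does not recover the whole-transcriptome values for tx1 and tx2, because neither chunk ever saw both geneA transcripts together.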

When parallelization options are used, computations that rely on having all transcripts of a gene together should be performed after the chunks have finished and been merged.
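A rough sketch of that ordering, under stated assumptions: the function names and row fields below are hypothetical, the per-transcript step is a placeholder, and the per-gene count stands in for whichever gene-dependent fields are affected. It is meant to illustrate the two-phase idea, not how sqanti3_qc.py is organized.

```python
from collections import Counter


def per_transcript_step(record):
    """Work that needs only one transcript; safe to run inside a chunk."""
    tx, gene = record
    return {"isoform": tx, "gene": gene}


def per_gene_step(rows):
    """Work that needs every transcript of a gene; run once, after merging."""
    counts = Counter(row["gene"] for row in rows)
    for row in rows:
        row["isoforms_in_gene"] = counts[row["gene"]]
    return rows


def run(records, n_chunks):
    """Split into chunks, process each independently, merge, then do gene-level work."""
    chunks = [records[i::n_chunks] for i in range(n_chunks)]
    merged = [per_transcript_step(r) for chunk in chunks for r in chunk]
    merged.sort(key=lambda row: row["isoform"])  # stable order regardless of chunking
    return per_gene_step(merged)
```

With the gene-level step deferred until after the merge, `run(records, 1)` and `run(records, 2)` return the same rows, which is the property the two example runs in this issue currently lack.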

Code sample

Bug replicated using the example:
python3 sqanti3_qc.py -d test_no_chunk -o test_no_chunk example/UHR_chr22.gtf example/gencode.v38.basic_chr22.gtf example/GRCh38.p13_chr22.fasta
python3 sqanti3_qc.py -d test_chunk -o test_chunk example/UHR_chr22.gtf example/gencode.v38.basic_chr22.gtf example/GRCh38.p13_chr22.fasta -n 2

Error

No response

Anything else?

No response

@Fabian-RY Fabian-RY added bug Something isn't working enhancement New feature or request QC SQ3 Quality Control related issues labels Oct 8, 2024
@Fabian-RY Fabian-RY self-assigned this Oct 8, 2024