
BUG -4294967296 #14

Open
sguizard opened this issue Jun 22, 2022 · 8 comments


sguizard commented Jun 22, 2022

Hi Kristoffer,

During nf-core/isoseq test jobs, I randomly hit a bug:

total_flanks2: 20352
total_flank_size 20395551
total_unique_segment_counter 7303161
total_segments_bad 5517888
bad 89736
total parts size: 7391334
total exons size: 20040918
min_intron: 1
Number of ref seqs in gff: 15570
Number of ref seqs in fasta: 1
6622546.0 Unique kmers in reference part sequences with abundance > 1
AAAAAAAAAAAAAAAAAAAA 538
TTTTTTTTTTTTTTTTTTTT 496
CCTCCCAAAGTGCTGGGATT 385
GCCTCCCAAAGTGCTGGGAT 382
CTCCCAAAGTGCTGGGATTA 376
CCAAAGTGCTGGGATTACAG 368
CCCAAAGTGCTGGGATTACA 367
TCCCAAAGTGCTGGGATTAC 366
TAATCCCAGCACTTTGGGAG 360
AATCCCAGCACTTTGGGAGG 359
CAAAGTGCTGGGATTACAGG 357
ATCCCAGCACTTTGGGAGGC 350
GTAATCCCAGCACTTTGGGA 339
CTGTAATCCCAGCACTTTGG 338
TGTAATCCCAGCACTTTGGG 333
CCTGTAATCCCAGCACTTTG 330
TCTCTACTAAAAATACAAAA 317
TTTTGTATTTTTAGTAGAGA 316
ATTCTCCTGCCTCAGCCTCC 289
AAAGTGCTGGGATTACAGGC 282
TTTTTGTATTTTTAGTAGAG 281
[...]
CCAGGAGGTGGAGGTTGCAG 200
GCCTGACCAACATGGTGAAA 200
11038 11038 out of 20352 sequences has been modified in masking step.
Environment set:
Using 2 cores.
Filtering reads aligned to unindexed regions with minimap2
Done filtering. Reads filtered:21
batch nt: 97383 total_nt: 194765
37
36
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Time for slaMEM to find mems:76.54179167747498 seconds.
Starting aligning reads.
Nr reads: 73 nr batches: 2 [37, 36]
BUG -4294967296

Here is the command used:

uLTRA \
    pipeline \
    --t 2 \
    --prefix alz.chunk3 \
    --isoseq --disable_infer \
    Homo_sapiens.GRCh38.dna.chromosome.19.fasta \
    Homo_sapiens.GRCh38.104.chr.13_18_19.gtf \
    alz.chunk3_tama.fa \
    ./

Re-running is enough to get the job done.
I never noticed this when I ran it locally.
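
For reference, -4294967296 is exactly -(2**32), so the printed value looks like a 32-bit wrap-around or a sentinel "minimum score" rather than a real alignment score. A minimal Python sketch of how such a value could surface (hypothetical; this is not uLTRA's actual code):

# -4294967296 == -(2**32): the value equals the negative of the unsigned
# 32-bit range, which hints at an overflow or a sentinel constant.
print(-4294967296 == -(2 ** 32))   # True

# Hypothetical sentinel: if a "minimum possible score" of -2**32 is assigned
# when one aligner produces no alignment, the score difference collapses to it.
MIN_SCORE = -(2 ** 32)             # assumed sentinel, not confirmed in uLTRA
ultra_score = MIN_SCORE            # e.g. read left unaligned by uLTRA
alt_score = 0                      # alternative aligner's score
print(ultra_score - alt_score)     # -4294967296, matching the value in the log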


ksahlin commented Jul 5, 2022

Would it be possible to share or link to your test data so that I can see if I can reproduce it?


sguizard commented Jul 6, 2022

I recreated the alz.chunk3_tama.fa FASTA file.
You'll find everything here.


ksahlin commented Jul 6, 2022

Thank you! I downloaded the data and ran it with your command on my MacBook. I could not reproduce the bug.

I ran with both the head of master and the head of develop (commits c0ac67c and 8313e9b, respectively).

This is my call:

./uLTRA \
    pipeline \
    --t 2 \
    --prefix alz.chunk3 \
    --isoseq --disable_infer \
    ~/Downloads/ultra_bug/Homo_sapiens.GRCh38.dna.chromosome.19.fasta \
    ~/Downloads/ultra_bug/Homo_sapiens.GRCh38.104.chr.13_18_19.gtf \
    ~/Downloads/ultra_bug/alz.chunk3_tama.fa \
    ~/Downloads/ultra_bug/ultra_out

This is my output log of uLTRA:

creating /Users/kxs624/Downloads/ultra_bug/ultra_out
total_flanks2: 20352
total_flank_size 20395551
total_unique_segment_counter 7303161
total_segments_bad 5517888
bad 89736
total parts size: 7391334
total exons size: 20040918
min_intron: 1
Number of ref seqs in gff: 15570
Number of ref seqs in fasta: 1
6622546.0 Unique kmers in reference part sequences with abundance > 1
AAAAAAAAAAAAAAAAAAAA 538
TTTTTTTTTTTTTTTTTTTT 496
CCTCCCAAAGTGCTGGGATT 385
GCCTCCCAAAGTGCTGGGAT 382
[...]
ACGGTGAAACCCCGTCTCTA 200
GAGTGCAGTGGCGTGATCTC 200
AAAAAATTAGCCGGGCGTGG 200
CCAAAGTGCTAGGATTACAG 200
CACACCCAGCTAATTTTTGT 200
GATTGCGCCACTGCACTCCA 200
CCAGGAGGTGGAGGTTGCAG 200
11038 11038 out of 20352 sequences has been modified in masking step.
<multiprocessing.context.SpawnContext object at 0x7fc20382c820>
Environment set: <multiprocessing.context.SpawnContext object at 0x7fc20382c820>
Using 2 cores.
Filtering reads aligned to unindexed regions with minimap2 
Done filtering. Reads filtered:21
batch nt: 97383 total_nt: 194765
37
36
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Time for slaMEM to find mems:45.91997790336609 seconds.
Starting aligning reads.
Nr reads: 73 nr batches: 2 [37, 36]
READ 37 RECORDS.
READ 37 RECORDS.
Number of instances solved with quadratic collinear chainer solution: 84
Number of instances solved with n*log n collinear chainer solution: 15
READ 36 RECORDS.
READ 36 RECORDS.
Number of instances solved with quadratic collinear chainer solution: 103
Number of instances solved with n*log n collinear chainer solution: 8
Time elapesd multiprocessing: 1.5507452487945557
Time to align reads:1.5510809421539307 seconds.
Time to merge SAM-files:0.00915384292602539 seconds.
21 reads were not attempted to be aligned with ultra, instead alternative aligner was used.
6 reads with primary alignments were replaced with alternative aligner because they were unaligned with uLTRA.
1 primary alignments had best fit with uLTRA.
4 primary alignments had equal fit.
3 primary alignments had best fit with alternative aligner.
Table of score-difference between alignment methods (negative number: alternative aligner better fit, positive number is uLTRA better fit)
Score is calculated as sum(identities) - sum(ins, del, subs)
Format: Score difference: Number of primary alignments 
[-4294967296 - -100): 2
[-100 - -50): 0
[-50 - -20): 1
[-20 - -10): 0
[-10 - -5): 0
[-5 - -4): 0
[-4 - -3): 0
[-3 - -2): 0
[-2 - -1): 0
[-1 - 0): 0
[0 - 1): 4
[1 - 2): 0
[2 - 3): 0
[3 - 4): 0
[4 - 5): 0
[5 - 10): 0
[10 - 20): 0
[20 - 50): 0
[50 - 100): 0
[100 - 4294967296): 1
defaultdict(<class 'int'>, {'unaligned': 65, 'FSM': 4, 'NO_SPLICE': 2, 'Insufficient_junction_coverage_unclassified': 1, 'NIC_novel': 1})
total alignment coverage: 7.350311311286634
Deleting temporary files...
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/seeds_batch_0_rc.txt
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/seeds_batch_1.txt
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/seeds_batch_0.txt
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/seeds_batch_1_rc.txt
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/slamem_stdout_1.1
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/slamem_stderr_0_rc.1
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/slamem_stderr_0.1
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/slamem_stderr_1_rc.1
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/slamem_stdout_0_rc.1
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/slamem_stderr_1.1
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/slamem_stdout_1_rc.1
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/slamem_stdout_0.1
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/reads_batch_0.fa
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/reads_batch_1_rc.fa
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/reads_batch_1.fa
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/reads_batch_0.sam
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/reads_batch_1.sam
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/reads_batch_0_rc.fa
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/minimap2.sam
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/minimap2_errors.1
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/uLTRA_batch_0.stderr
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/uLTRA_batch_1.stderr
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/reads_after_genomic_filtering.fasta
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/indexed.sam
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/unindexed.sam
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/refs_sequences.fa
Done.
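
The score-difference table above uses the score stated in the log, sum(identities) - sum(ins, del, subs), computed for both uLTRA and the alternative aligner. A small illustrative sketch in Python (the counts are made up, not taken from this run) of how one primary alignment is binned:

# Illustrative only: recompute the score described in the log,
#   score = sum(identities) - sum(insertions, deletions, substitutions),
# for two hypothetical alignments of the same read and bin the difference.

def alignment_score(identities, insertions, deletions, substitutions):
    # Score definition as printed in the uLTRA log output.
    return identities - (insertions + deletions + substitutions)

# Hypothetical counts for one read (not from the run above).
ultra = alignment_score(identities=950, insertions=5, deletions=3, substitutions=12)
other = alignment_score(identities=930, insertions=8, deletions=6, substitutions=20)

diff = ultra - other   # positive: uLTRA fits better; negative: alternative fits better
print(diff)            # 34, which would fall in the [20 - 50) bin

Note that the outermost bins, [-4294967296 - -100) and [100 - 4294967296), use +/- 2**32 as their open ends, the same value printed in the BUG message.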


ksahlin commented Jul 6, 2022

My version of Python (the one that executes uLTRA) is 3.8.2.


sguizard commented Jul 6, 2022

I'm not surprised; it seems to be very random :/
Across the 22 TB of Iso-Seq data I analyzed, I noticed some failing uLTRA jobs, but each time they succeeded after resubmission.
My uLTRA installation is packaged with Python 3.9.13.


ksahlin commented Jul 6, 2022

Ouch..

I wonder if uLTRA fails because of slaMEM (a package I have no control over or insight into). If the --use_NAM_seeds option is used, which calls StrobeMap instead, I would be the developer of all the steps in uLTRA, but --use_NAM_seeds is slightly lower in accuracy than the default. I am still debating whether it is worth the trade-off (I would probably need to evaluate on more datasets to make this decision).


sguizard commented Jul 6, 2022

From my point of view, it's not a big deal.
It's a very small number of failing jobs, 0 to 3 out of 300, and as I said, they succeed on the second try.
The only annoying point (in the nf-core/isoseq context) is that the job does not stop on its own and is killed by the walltime.
It won't stop me from using it :)


ksahlin commented Jul 7, 2022

I see :) Let's see if this happens for others, and I'll revisit it then.
