
BUG -4294967296 #14

Open
sguizard opened this issue Jun 22, 2022 · 8 comments


sguizard commented Jun 22, 2022

Hi Kristoffer,

During nf-core/isoseq test jobs, I randomly hit a bug:

total_flanks2: 20352
total_flank_size 20395551
total_unique_segment_counter 7303161
total_segments_bad 5517888
bad 89736
total parts size: 7391334
total exons size: 20040918
min_intron: 1
Number of ref seqs in gff: 15570
Number of ref seqs in fasta: 1
6622546.0 Unique kmers in reference part sequences with abundance > 1
AAAAAAAAAAAAAAAAAAAA 538
TTTTTTTTTTTTTTTTTTTT 496
CCTCCCAAAGTGCTGGGATT 385
GCCTCCCAAAGTGCTGGGAT 382
CTCCCAAAGTGCTGGGATTA 376
CCAAAGTGCTGGGATTACAG 368
CCCAAAGTGCTGGGATTACA 367
TCCCAAAGTGCTGGGATTAC 366
TAATCCCAGCACTTTGGGAG 360
AATCCCAGCACTTTGGGAGG 359
CAAAGTGCTGGGATTACAGG 357
ATCCCAGCACTTTGGGAGGC 350
GTAATCCCAGCACTTTGGGA 339
CTGTAATCCCAGCACTTTGG 338
TGTAATCCCAGCACTTTGGG 333
CCTGTAATCCCAGCACTTTG 330
TCTCTACTAAAAATACAAAA 317
TTTTGTATTTTTAGTAGAGA 316
ATTCTCCTGCCTCAGCCTCC 289
AAAGTGCTGGGATTACAGGC 282
TTTTTGTATTTTTAGTAGAG 281
[...]
CCAGGAGGTGGAGGTTGCAG 200
GCCTGACCAACATGGTGAAA 200
11038 11038 out of 20352 sequences has been modified in masking step.
Environment set:
Using 2 cores.
Filtering reads aligned to unindexed regions with minimap2
Done filtering. Reads filtered:21
batch nt: 97383 total_nt: 194765
37
36
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Time for slaMEM to find mems:76.54179167747498 seconds.
Starting aligning reads.
Nr reads: 73 nr batches: 2 [37, 36]
BUG -4294967296

Here is the command used:

uLTRA \
    pipeline \
    --t 2 \
    --prefix alz.chunk3 \
    --isoseq --disable_infer \
    Homo_sapiens.GRCh38.dna.chromosome.19.fasta \
    Homo_sapiens.GRCh38.104.chr.13_18_19.gtf \
    alz.chunk3_tama.fa \
    ./

Re-running is enough to get the job done.
I never noticed this when I ran it locally.
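
For reference, -4294967296 is exactly -(2**32), so the printed value looks like a 32-bit wrap-around or a sentinel "minimum score" rather than a real alignment score. A minimal Python sketch of how such a value could surface (hypothetical; this is not uLTRA's actual code):

# -4294967296 == -(2**32): the value equals the negative of the unsigned
# 32-bit range, which hints at an overflow or a sentinel constant.
print(-4294967296 == -(2 ** 32))   # True

# Hypothetical sentinel: if a "minimum possible score" of -2**32 is assigned
# when one aligner produces no alignment, the score difference collapses to it.
MIN_SCORE = -(2 ** 32)             # assumed sentinel, not confirmed in uLTRA
ultra_score = MIN_SCORE            # e.g. read left unaligned by uLTRA
alt_score = 0                      # alternative aligner's score
print(ultra_score - alt_score)     # -4294967296, matching the value in the log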


ksahlin commented Jul 5, 2022

Would it be possible to share or link to your test data so that I can see if I can reproduce it?


sguizard commented Jul 6, 2022

I recreated the alz.chunk3_tama.fa FASTA file.
You'll find everything here.


ksahlin commented Jul 6, 2022

Thank you! I downloaded the data and ran it with your command on my MacBook. I could not reproduce the bug.

I ran with both the head of master and the head of develop (commits c0ac67c and 8313e9b, respectively).

This is my call:

./uLTRA \
    pipeline \
    --t 2 \
    --prefix alz.chunk3 \
    --isoseq --disable_infer \
    ~/Downloads/ultra_bug/Homo_sapiens.GRCh38.dna.chromosome.19.fasta \
    ~/Downloads/ultra_bug/Homo_sapiens.GRCh38.104.chr.13_18_19.gtf \
    ~/Downloads/ultra_bug/alz.chunk3_tama.fa \
    ~/Downloads/ultra_bug/ultra_out

This is my output log of uLTRA:

creating /Users/kxs624/Downloads/ultra_bug/ultra_out
total_flanks2: 20352
total_flank_size 20395551
total_unique_segment_counter 7303161
total_segments_bad 5517888
bad 89736
total parts size: 7391334
total exons size: 20040918
min_intron: 1
Number of ref seqs in gff: 15570
Number of ref seqs in fasta: 1
6622546.0 Unique kmers in reference part sequences with abundance > 1
AAAAAAAAAAAAAAAAAAAA 538
TTTTTTTTTTTTTTTTTTTT 496
CCTCCCAAAGTGCTGGGATT 385
GCCTCCCAAAGTGCTGGGAT 382
[...]
ACGGTGAAACCCCGTCTCTA 200
GAGTGCAGTGGCGTGATCTC 200
AAAAAATTAGCCGGGCGTGG 200
CCAAAGTGCTAGGATTACAG 200
CACACCCAGCTAATTTTTGT 200
GATTGCGCCACTGCACTCCA 200
CCAGGAGGTGGAGGTTGCAG 200
11038 11038 out of 20352 sequences has been modified in masking step.
<multiprocessing.context.SpawnContext object at 0x7fc20382c820>
Environment set: <multiprocessing.context.SpawnContext object at 0x7fc20382c820>
Using 2 cores.
Filtering reads aligned to unindexed regions with minimap2 
Done filtering. Reads filtered:21
batch nt: 97383 total_nt: 194765
37
36
Using SLAMEM
Using SLAMEM
Using SLAMEM
Using SLAMEM
Time for slaMEM to find mems:45.91997790336609 seconds.
Starting aligning reads.
Nr reads: 73 nr batches: 2 [37, 36]
READ 37 RECORDS.
READ 37 RECORDS.
Number of instances solved with quadratic collinear chainer solution: 84
Number of instances solved with n*log n collinear chainer solution: 15
READ 36 RECORDS.
READ 36 RECORDS.
Number of instances solved with quadratic collinear chainer solution: 103
Number of instances solved with n*log n collinear chainer solution: 8
Time elapesd multiprocessing: 1.5507452487945557
Time to align reads:1.5510809421539307 seconds.
Time to merge SAM-files:0.00915384292602539 seconds.
21 reads were not attempted to be aligned with ultra, instead alternative aligner was used.
6 reads with primary alignments were replaced with alternative aligner because they were unaligned with uLTRA.
1 primary alignments had best fit with uLTRA.
4 primary alignments had equal fit.
3 primary alignments had best fit with alternative aligner.
Table of score-difference between alignment methods (negative number: alternative aligner better fit, positive number is uLTRA better fit)
Score is calculated as sum(identities) - sum(ins, del, subs)
Format: Score difference: Number of primary alignments 
[-4294967296 - -100): 2
[-100 - -50): 0
[-50 - -20): 1
[-20 - -10): 0
[-10 - -5): 0
[-5 - -4): 0
[-4 - -3): 0
[-3 - -2): 0
[-2 - -1): 0
[-1 - 0): 0
[0 - 1): 4
[1 - 2): 0
[2 - 3): 0
[3 - 4): 0
[4 - 5): 0
[5 - 10): 0
[10 - 20): 0
[20 - 50): 0
[50 - 100): 0
[100 - 4294967296): 1
defaultdict(<class 'int'>, {'unaligned': 65, 'FSM': 4, 'NO_SPLICE': 2, 'Insufficient_junction_coverage_unclassified': 1, 'NIC_novel': 1})
total alignment coverage: 7.350311311286634
Deleting temporary files...
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/seeds_batch_0_rc.txt
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/seeds_batch_1.txt
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/seeds_batch_0.txt
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/seeds_batch_1_rc.txt
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/slamem_stdout_1.1
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/slamem_stderr_0_rc.1
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/slamem_stderr_0.1
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/slamem_stderr_1_rc.1
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/slamem_stdout_0_rc.1
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/slamem_stderr_1.1
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/slamem_stdout_1_rc.1
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/slamem_stdout_0.1
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/reads_batch_0.fa
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/reads_batch_1_rc.fa
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/reads_batch_1.fa
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/reads_batch_0.sam
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/reads_batch_1.sam
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/reads_batch_0_rc.fa
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/minimap2.sam
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/minimap2_errors.1
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/uLTRA_batch_0.stderr
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/uLTRA_batch_1.stderr
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/reads_after_genomic_filtering.fasta
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/indexed.sam
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/unindexed.sam
removed: /Users/kxs624/Downloads/ultra_bug/ultra_out/refs_sequences.fa
Done.
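
The score-difference table above uses the score stated in the log, sum(identities) - sum(ins, del, subs), computed for both uLTRA and the alternative aligner. A small illustrative sketch in Python (the counts are made up, not taken from this run) of how one primary alignment is binned:

# Illustrative only: recompute the score described in the log,
#   score = sum(identities) - sum(insertions, deletions, substitutions),
# for two hypothetical alignments of the same read and bin the difference.

def alignment_score(identities, insertions, deletions, substitutions):
    # Score definition as printed in the uLTRA log output.
    return identities - (insertions + deletions + substitutions)

# Hypothetical counts for one read (not from the run above).
ultra = alignment_score(identities=950, insertions=5, deletions=3, substitutions=12)
other = alignment_score(identities=930, insertions=8, deletions=6, substitutions=20)

diff = ultra - other   # positive: uLTRA fits better; negative: alternative fits better
print(diff)            # 34, which would fall in the [20 - 50) bin

Note that the outermost bins, [-4294967296 - -100) and [100 - 4294967296), use +/- 2**32 as their open ends, the same value printed in the BUG message.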


ksahlin commented Jul 6, 2022

My version of Python (the one that executes uLTRA) is 3.8.2.


sguizard commented Jul 6, 2022

I'm not surprised; it seems to be very random :/
Across the 22 TB of Iso-Seq data I analyzed, I noticed some failing uLTRA jobs, but each time they succeeded after resubmission.
My uLTRA installation is packaged with Python 3.9.13.


ksahlin commented Jul 6, 2022

Ouch..

I wonder if uLTRA fails because of slaMEM (a package I have no control over or insight into). If the --use_NAM_seeds option is used, which calls StrobeMap instead, I would be the developer of all the steps in uLTRA, but --use_NAM_seeds is slightly lower in accuracy than the default. I am still debating whether it is worth the trade-off (I would probably need to evaluate on more datasets to make this decision).


sguizard commented Jul 6, 2022

From my point of view, it's not a big deal.
It's a very small number of failing jobs, 0 to 3 out of 300, and as I said, they succeed on the second try.
The only annoying point (in the nf-core/isoseq context) is that the job does not stop on its own and is killed by the walltime.
It won't stop me from using it :)


ksahlin commented Jul 7, 2022

I see :) Let's see if this happens for others, and I'll revisit it then.
