Parallel processors mode is not working #22

jordana-olive · 2022-11-23T15:46:11Z

Hi guys, I'm trying to run with parallel processors, but I realized that it is not working. It is running with only one processor, which is why it is taking forever. What should I do? Do I need to set something up during the installation?

My command line (on a server with 40 processors and 500 GB RAM, with the conda environment activated):
run_isoncorrect --t 20 --fastq_folder 01-isonclustering/02-clustered-fastq/ --outfolder 03-ONT-fastq-corrected

jordana-olive · 2022-11-23T15:57:43Z

I just saw the specifications again. I'll try to run from .sh script.

ksahlin · 2022-11-23T17:00:54Z

Ok, great. It should work with multiple cores using the run_isoncorrect --t 20 command. Let me know how it goes.

jordana-olive · 2022-11-28T14:23:47Z

Hi Sahlin,
I realized that the last clusters are running with fewer processors than the first ones. Now it is taking forever (~8 hours per cluster).
Are these clusters the longest ones?

ksahlin · 2022-11-28T14:32:41Z

If you have a few very large clusters, you can(/should) use --split_wrt_batches.

According to the documentation, this option

--split_wrt_batches   Process reads per batch (of max_seqs sequences) 
                       instead of per cluster. Significantly decrease runtime when few 
                       very large clusters are less than the number of cores used.

Here max_seqs is typically 1000 or 2000. This speeds things up a lot when a few very large clusters are present. We used this mode for the SIRV dataset (in the paper), where one of the clusters contained half of the reads.
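
To illustrate why batching helps: a single huge cluster is one unit of work and keeps one core busy, whereas the same reads cut into batches of max_seqs can be spread over several cores. The sketch below is only an illustration of that idea, not isONcorrect's actual implementation; the function name split_into_batches and the toy cluster are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Read = Tuple[str, str, str]  # (accession, sequence, quality)


def split_into_batches(reads: Iterable[Read], max_seqs: int) -> Iterator[List[Read]]:
    """Yield consecutive lists holding at most max_seqs reads each."""
    if max_seqs < 1:
        raise ValueError("max_seqs must be positive")
    it = iter(reads)
    while True:
        batch = list(islice(it, max_seqs))
        if not batch:
            return
        yield batch


# Hypothetical cluster of 4500 reads: one work unit becomes three.
cluster = [(f"read_{i}", "ACGT", "IIII") for i in range(4500)]
batches = list(split_into_batches(cluster, max_seqs=2000))
batch_sizes = [len(b) for b in batches]
print(batch_sizes)  # [2000, 2000, 500]
```

With 20 cores and one cluster holding half of the reads, per-cluster scheduling leaves most cores idle while that cluster runs; per-batch scheduling gives every core something to do.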

jordana-olive · 2022-11-30T17:45:51Z

Hi Sahlin,
I canceled my last script (the one without --split_wrt_batches), but now I have another issue. It seems to stop at the last cluster, and the program cannot finish properly. I ran the test data (100 reads) and it worked well. I don't know what is going on:

My script:
#!/bin/bash

# Pipeline to get high-quality full-length reads from ONT cDNA sequencing

# Set path to output and number of cores

root_out="03-correction"
cores=20
mkdir -p $root_out
run_isoncorrect --t $cores --fastq_folder 01-isonclustering/02-clustered-fastq/ --outfolder $root_out --split_wrt_batches

The error:

Running isoncorrect batch_id:100000_0...
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/home/eniac/miniconda3/envs/isoncorrect/lib/python3.11/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/home/eniac/miniconda3/envs/isoncorrect/bin/run_isoncorrect", line 94, in isoncorrect
    subprocess.check_call([ "/usr/bin/time", isoncorrect_exec, "--fastq", read_fastq_file, "--outfolder", outfolder,
  File "/home/eniac/miniconda3/envs/isoncorrect/lib/python3.11/subprocess.py", line 413, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/bin/time', '/home/eniac/miniconda3/envs/isoncorrect/bin/isONcorrect', '--fastq', '/tmp/tmpl_shyp5i/split_in_batches/100000_0.fastq', '--outfolder', '03-correction/100000_0', '--exact_instance_limit', '50', '--max_seqs', '2000', '--k', '9', '--w', '20', '--xmin', '18', '--xmax', '80', '--T', '0.1']' returned non-zero exit status 1.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/eniac/miniconda3/envs/isoncorrect/bin/run_isoncorrect", line 365, in <module>
    main(args)
  File "/home/eniac/miniconda3/envs/isoncorrect/bin/run_isoncorrect", line 281, in main
    for x in pool.imap_unordered(isoncorrect, instances):
  File "/home/eniac/miniconda3/envs/isoncorrect/lib/python3.11/multiprocessing/pool.py", line 873, in next
    raise value
subprocess.CalledProcessError: Command '['/usr/bin/time', '/home/eniac/miniconda3/envs/isoncorrect/bin/isONcorrect', '--fastq', '/tmp/tmpl_shyp5i/split_in_batches/100000_0.fastq', '--outfolder', '03-correction/100000_0', '--exact_instance_limit', '50', '--max_seqs', '2000', '--k', '9', '--w', '20', '--xmin', '18', '--xmax', '80', '--T', '0.1']' returned non-zero exit status 1.

jordana-olive · 2022-11-30T18:49:29Z

Update: with the test data (100 reads), it did not work either. But since it is a small dataset, the program was still able to generate the final fastq for each cluster.

ksahlin · 2022-11-30T19:14:04Z

If the file /tmp/tmpl_shyp5i/split_in_batches/100000_0.fastq is still there, could you try running:

/usr/bin/time /home/eniac/miniconda3/envs/isoncorrect/bin/isONcorrect --fastq \
    /tmp/tmpl_shyp5i/split_in_batches/100000_0.fastq --outfolder 03-correction/100000_0 \
    --exact_instance_limit 50 --max_seqs 2000 --k 9 --w 20 --xmin 18 --xmax 80 --T 0.1

This is the instance that generates an error.

ksahlin · 2022-11-30T19:16:50Z

Perhaps isONcorrect also logs the error for this in a .stderr file somewhere in the output folder, in 03-correction/100000_0. I forget whether I implemented that. If so, you could check the error in that file.

jordana-olive · 2022-11-30T22:51:22Z

Thank you for replying.
I realized that the error changes a little depending on the Python version, but it is still not working.
I tried installing via GitHub (Python 3.8); via conda it is 3.11 automatically. Then I reinstalled via conda, forcing python=3.10. It seems to run more clusters, but I still have the same issue as above.

I also realized that only a few clusters were done, so it is not viable to run cluster by cluster in the tmp folder (and since I had this issue, I'm not sure all my clusters were there).

It seems that some clusters ran to completion, others stopped in the middle of the process, and others just crashed. If you have any idea what is going on, I would really appreciate it. Thanks :)

Also, the stderr file in the failed clusters says:

Traceback (most recent call last):
  File "/home/eniac/miniconda3/envs/isoncorrect/bin/isONcorrect", line 1551, in <module>
    main(args)
  File "/home/eniac/miniconda3/envs/isoncorrect/bin/isONcorrect", line 1213, in main
    all_reads = { i + 1 : (acc, seq, qual) for i, (acc, (seq, qual)) in enumerate(help_functions.readfq(open(args.fastq, 'r')))}
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpiq04o1ed/split_in_batches/100000_0.fastq'
Command exited with non-zero status 1
0.62user 1.60system 0:00.27elapsed 805%CPU (0avgtext+0avgdata 36808maxresident)k
0inputs+0outputs (0major+5432minor)pagefaults 0swaps

ksahlin · 2022-12-01T07:54:55Z

The error you reported is just because the file is not there anymore (these files get flushed from the tmp folder regularly by the system). It is not the actual error you encounter when running.

Another guess: remove the output folder 03-correction, perhaps some old files there interfere with your output from other attempts.

Otherwise, perhaps you could copy the offending file from the temp folder when it is present and run the command on that file as

/usr/bin/time /home/eniac/miniconda3/envs/isoncorrect/bin/isONcorrect --fastq \
    THE_FAILING_TMP_FILE.fastq --outfolder 03-correction/100000_0 \
    --exact_instance_limit 50 --max_seqs 2000 --k 9 --w 20 --xmin 18 --xmax 80 --T 0.1

isONcorrect will let you know at start of the run which tmp folder it is working in by writing Temporary workdirectory: [HERE IS THE PATH]

jordana-olive · 2022-12-01T16:22:12Z

Hi, Sahlin.
Yes, I checked the previous issues; my error is similar to others involving the tmp folder.
(I am deleting the previous output folder before running.) I checked the run_isoncorrect script and found the "/usr/bin/time" lines, which I have now replaced with "time", but I see that the symbolic link in the tmp folder can't be opened (my version is 0.0.8).

Since I noticed that several clusters have the same problem, I'm not sure it is viable to copy and run all of them with isONcorrect.

When I check the failed cluster, I get this:
ls -lah /tmp/tmpzgp8a2tb/split_in_batches/100000_0.fastq
lrwxrwxrwx 1 eniac eniac 49 Dec 1 11:04 /tmp/tmpzgp8a2tb/split_in_batches/100000_0.fastq -> 01-isonclustering/02-clustered-fastq/100000.fastq

Nonetheless, I can't open the file.
head /tmp/tmpzgp8a2tb/split_in_batches/100000_0.fastq
head: cannot open '/tmp/tmpzgp8a2tb/split_in_batches/100000_0.fastq' for reading: No such file or directory

I'll keep trying to solve this. If you have some tip, please, I appreciate that.
Thanks
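
One reading of the listing above, which the thread itself does not confirm: the link's target is the relative path 01-isonclustering/02-clustered-fastq/100000.fastq, and a relative symlink target is resolved against the directory that contains the link, not against the directory the command was started from. A link placed in /tmp with that target would therefore dangle even while the link itself is present. The sketch below reproduces that situation with throwaway directories; all paths in it are stand-ins, and it assumes a POSIX system.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as project, tempfile.TemporaryDirectory() as tmp:
    # Stand-in for 01-isonclustering/02-clustered-fastq/100000.fastq
    rel_target = os.path.join("clusters", "100000.fastq")
    os.makedirs(os.path.join(project, "clusters"))
    with open(os.path.join(project, rel_target), "w") as fh:
        fh.write("@read_1\nACGT\n+\nIIII\n")

    # Link whose target is a relative path, as in the ls output.
    rel_link = os.path.join(tmp, "relative.fastq")
    os.symlink(rel_target, rel_link)

    # Same link, but pointing at the absolute path of the file.
    abs_link = os.path.join(tmp, "absolute.fastq")
    os.symlink(os.path.join(project, rel_target), abs_link)

    link_is_present = os.path.islink(rel_link)  # the link itself exists
    relative_ok = os.path.exists(rel_link)      # follows the link: dangling
    absolute_ok = os.path.exists(abs_link)      # follows the link: resolves

print(link_is_present, relative_ok, absolute_ok)  # True False True
```

If this is what is happening, passing an absolute path to --fastq_folder may be worth trying, since the link targets would then no longer depend on where the link is stored.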

ksahlin · 2022-12-01T16:47:56Z

Not sure why you have a symbolic link to the file?

You need to copy it completely: cp /tmp/tmpzgp8a2tb/split_in_batches/100000_0.fastq THE_FAILING_TMP_FILE.fastq

At the moment you only seem to have a symbolic link, which means that if the tmp file disappears you no longer have access to it.

jordana-olive · 2022-12-01T17:20:20Z

I think we are not on the same page...
The program itself creates the symlinks to the files when running with --split_wrt_batches...

ksahlin · 2022-12-02T08:28:49Z

Okay, how about this.

On line 218 of run_isoncorrect, please change the line to tmp_work_dir = "XX-correction/" (or whatever path you want on your system). This way all the files will be kept in XX-correction/ and you will still have them should anything break along the way.

Then we can locate the file(s) where it is going wrong and run them individually to get the error message.
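
Running the failing batches one at a time can be scripted. The sketch below is a generic pattern, not part of isONcorrect: the helper run_and_collect_errors is hypothetical, and the two demo commands are stand-ins for the per-batch isONcorrect command lines shown in the tracebacks above. Unlike subprocess.check_call, it does not stop at the first failure and keeps each command's stderr.

```python
import subprocess
import sys
from typing import Dict, List


def run_and_collect_errors(commands: Dict[str, List[str]]) -> Dict[str, str]:
    """Run each command in turn; return stderr for those that exit non-zero."""
    failures = {}
    for batch_id, cmd in commands.items():
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode != 0:
            failures[batch_id] = proc.stderr.strip()
    return failures


# Stand-in commands: one succeeds, one exits with status 1 and a message.
demo = {
    "ok_batch": [sys.executable, "-c", "print('done')"],
    "bad_batch": [sys.executable, "-c", "import sys; sys.exit('missing input file')"],
}
failures = run_and_collect_errors(demo)
for batch_id, message in failures.items():
    print(batch_id, "->", message)
```

In practice the dictionary would map each batch id to the full command for that batch, with --fastq pointing at the file kept in the working directory.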
