Parallel processors mode is not working #22

Open
jordana-olive opened this issue Nov 23, 2022 · 14 comments

@jordana-olive

Hi guys, I'm trying to run with parallel processors, but I realized that it is not working. It runs on only one processor, which is why it is taking forever. What should I do? Do I need to set something up during installation?

My command line (the server has 40 processors and 500 GB of RAM; the conda environment is activated):
run_isoncorrect --t 20 --fastq_folder 01-isonclustering/02-clustered-fastq/ --outfolder 03-ONT-fastq-corrected

@jordana-olive
Author

I just read the specifications again. I'll try running it from a .sh script.

@ksahlin
Owner

ksahlin commented Nov 23, 2022

Ok, great. It should work with multiple cores using the run_isoncorrect --t 20 command. Let me know how it goes.

@jordana-olive
Author

Hi Sahlin,
I realized that the last clusters are running with fewer processors than the first ones. Now it is taking forever (~8 hours per cluster).
Are these clusters the longest ones?

Screenshot from 2022-11-28 09-22-51

@ksahlin
Owner

ksahlin commented Nov 28, 2022

If you have a few very large clusters, you can (and should) use --split_wrt_batches.

According to the documentation, this option does the following:

--split_wrt_batches   Process reads per batch (of max_seqs sequences) 
                       instead of per cluster. Significantly decrease runtime when few 
                       very large clusters are less than the number of cores used.

Here max_seqs is typically 1000 or 2000; this speeds things up a lot when a few very large clusters are present. We used this mode for the SIRV dataset (in the paper), where one cluster contained half of the reads.
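To illustrate why batching helps, here is a minimal sketch with made-up cluster sizes. It is not the isONcorrect implementation; `split_into_batches` and `cluster_sizes` are hypothetical names used only for this example. Per-cluster mode gives one job per cluster, so one huge cluster keeps a single core busy while the others idle. Per-batch mode cuts every cluster into chunks of at most max_seqs reads, which gives the pool many similarly sized jobs.

```python
from itertools import islice


def split_into_batches(reads, max_seqs=1000):
    """Yield consecutive lists of at most max_seqs reads."""
    it = iter(reads)
    while True:
        batch = list(islice(it, max_seqs))
        if not batch:
            return
        yield batch


# Hypothetical cluster sizes: one cluster holds most of the reads.
cluster_sizes = {"0": 50_000, "1": 1_200, "2": 300}

# Per-cluster mode: one job per cluster, so at most 3 cores are ever busy.
jobs_per_cluster = len(cluster_sizes)

# Per-batch mode: each cluster is cut into chunks of at most 1000 reads.
jobs_per_batch = sum(
    len(list(split_into_batches(range(n), max_seqs=1000)))
    for n in cluster_sizes.values()
)

print(jobs_per_cluster, jobs_per_batch)  # prints: 3 53
```

With 20 cores, 3 jobs cannot keep the machine busy, while 53 jobs of roughly equal size can.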

@jordana-olive
Author

Hi Sahlin,
I canceled my last script (the one without --split_wrt_batches), but now I have another issue. It seems to stop at the last cluster, and the program cannot finish properly. I ran the test data (100 reads) and it worked well. I don't know what is going on:

My script:
#!/bin/bash

# Pipeline to get high-quality full-length reads from ONT cDNA sequencing

# Set path to output and number of cores

root_out="03-correction"
cores=20
mkdir -p "$root_out"
run_isoncorrect --t "$cores" --fastq_folder 01-isonclustering/02-clustered-fastq/ --outfolder "$root_out" --split_wrt_batches

The error:

Running isoncorrect batch_id:100000_0... multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
File "/home/eniac/miniconda3/envs/isoncorrect/lib/python3.11/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
^^^^^^^^^^^^^^^^^^^
File "/home/eniac/miniconda3/envs/isoncorrect/bin/run_isoncorrect", line 94, in isoncorrect
subprocess.check_call([ "/usr/bin/time", isoncorrect_exec, "--fastq", read_fastq_file, "--outfolder", outfolder,
File "/home/eniac/miniconda3/envs/isoncorrect/lib/python3.11/subprocess.py", line 413, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/bin/time', '/home/eniac/miniconda3/envs/isoncorrect/bin/isONcorrect', '--fastq', '/tmp/tmpl_shyp5i/split_in_batches/100000_0.fastq', '--outfolder', '03-correction/100000_0', '--exact_instance_limit', '50', '--max_seqs', '2000', '--k', '9', '--w', '20', '--xmin', '18', '--xmax', '80', '--T', '0.1']' returned non-zero exit status 1.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/home/eniac/miniconda3/envs/isoncorrect/bin/run_isoncorrect", line 365, in <module>
main(args)
File "/home/eniac/miniconda3/envs/isoncorrect/bin/run_isoncorrect", line 281, in main
for x in pool.imap_unordered(isoncorrect, instances):
File "/home/eniac/miniconda3/envs/isoncorrect/lib/python3.11/multiprocessing/pool.py", line 873, in next
raise value
subprocess.CalledProcessError: Command '['/usr/bin/time', '/home/eniac/miniconda3/envs/isoncorrect/bin/isONcorrect', '--fastq', '/tmp/tmpl_shyp5i/split_in_batches/100000_0.fastq', '--outfolder', '03-correction/100000_0', '--exact_instance_limit', '50', '--max_seqs', '2000', '--k', '9', '--w', '20', '--xmin', '18', '--xmax', '80', '--T', '0.1']' returned non-zero exit status 1.

Update: with the test data (100 reads), it did not work either. But since it is a small dataset, the program was able to generate the final fastq for each cluster.

@ksahlin
Owner

ksahlin commented Nov 30, 2022

If the file /tmp/tmpl_shyp5i/split_in_batches/100000_0.fastq is still there, could you try running:

/usr/bin/time /home/eniac/miniconda3/envs/isoncorrect/bin/isONcorrect --fastq \
    /tmp/tmpl_shyp5i/split_in_batches/100000_0.fastq --outfolder 03-correction/100000_0 \
    --exact_instance_limit 50 --max_seqs 2000 --k 9 --w 20 --xmin 18 --xmax 80 --T 0.1

This is the instance that generates an error.

@ksahlin
Owner

ksahlin commented Nov 30, 2022

Perhaps isONcorrect also logs the error for this in a .stderr file somewhere in the output folder 03-correction/100000_0. I forget whether I implemented that. If so, you could check the error in that file.
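If such log files exist, a small helper can collect the exception lines from all of them at once. This is only a sketch: the file name pattern `*stderr*` is an assumption (the exact log file name is not confirmed here), and `exception_lines` and `scan_logs` are hypothetical helper names.

```python
import re
from pathlib import Path

# Matches lines such as "FileNotFoundError: ..." or "subprocess.CalledProcessError: ...".
EXC_LINE = re.compile(r"^[\w.]+(?:Error|Exception)\b.*")


def exception_lines(text):
    """Return the lines of a log that look like a Python exception message."""
    return [line for line in text.splitlines() if EXC_LINE.match(line)]


def scan_logs(out_root, pattern="*stderr*"):
    """Map each matching log file under out_root to its exception lines.

    The pattern is an assumption; adjust it to the log file names you find.
    """
    found = {}
    for log in sorted(Path(out_root).rglob(pattern)):
        if log.is_file():
            lines = exception_lines(log.read_text(errors="replace"))
            if lines:
                found[str(log)] = lines
    return found
```

For example, `scan_logs("03-correction")` would return a dictionary with one entry per log file that contains an exception.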

@jordana-olive
Author

Thank you for replying.
I noticed that the error changes a little depending on the Python version, but it is still not working.
I tried installing via GitHub (Python 3.8); via conda it is 3.11 automatically. Then I reinstalled via conda, forcing python=3.10. It seems to run more clusters, but I still have the same issue as above.

I also noticed that only a few clusters finished, so it is not viable to run cluster by cluster from the tmp folder (and given this issue, I'm not sure all my clusters were there).

It seems that some clusters ran to completion, others stopped in the middle of the process, and others just crashed. If you have any idea what is going on, I would really appreciate it. Thanks :)

Also, the stderr file in the failed clusters says:

Traceback (most recent call last):
File "/home/eniac/miniconda3/envs/isoncorrect/bin/isONcorrect", line 1551, in <module>
main(args)
File "/home/eniac/miniconda3/envs/isoncorrect/bin/isONcorrect", line 1213, in main
all_reads = { i + 1 : (acc, seq, qual) for i, (acc, (seq, qual)) in enumerate(help_functions.readfq(open(args.fastq, 'r')))}
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpiq04o1ed/split_in_batches/100000_0.fastq'
Command exited with non-zero status 1
0.62user 1.60system 0:00.27elapsed 805%CPU (0avgtext+0avgdata 36808maxresident)k
0inputs+0outputs (0major+5432minor)pagefaults 0swaps

@ksahlin
Owner

ksahlin commented Dec 1, 2022

The error you reported is just because the file is not there anymore (these files get flushed from the tmp folder regularly by the system). It is not the actual error you encounter when running.

Another guess: remove the output folder 03-correction, perhaps some old files there interfere with your output from other attempts.

Otherwise, perhaps you could copy the offending file from the temp folder while it is present and run the command on that file as follows:

/usr/bin/time /home/eniac/miniconda3/envs/isoncorrect/bin/isONcorrect --fastq \
    THE_FAILING_TMP_FILE.fastq --outfolder 03-correction/100000_0 \
    --exact_instance_limit 50 --max_seqs 2000 --k 9 --w 20 --xmin 18 --xmax 80 --T 0.1

isONcorrect will let you know at the start of the run which tmp folder it is working in by writing Temporary workdirectory: [HERE IS THE PATH]

@jordana-olive
Author

Hi, Sahlin.
Yes, I checked the previous issues; my error is similar to others involving the tmp folder.
(I'm deleting the previous output folder before running.) I checked the run_isoncorrect script and found the "/usr/bin/time" lines, now replaced with "time", but I see that the symbolic link can't be opened in the tmp folder (my version is 0.0.8).

Since several clusters have the same problem, I'm not sure it is viable to copy and run them all with isONcorrect.

When I check the failed cluster, I get this:
ls -lah /tmp/tmpzgp8a2tb/split_in_batches/100000_0.fastq
lrwxrwxrwx 1 eniac eniac 49 Dec 1 11:04 /tmp/tmpzgp8a2tb/split_in_batches/100000_0.fastq -> 01-isonclustering/02-clustered-fastq/100000.fastq

Nonetheless, I can't open the file.
head /tmp/tmpzgp8a2tb/split_in_batches/100000_0.fastq
head: cannot open '/tmp/tmpzgp8a2tb/split_in_batches/100000_0.fastq' for reading: No such file or directory

I'll keep trying to solve this. If you have any tips, I would appreciate them.
Thanks
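A note on the `ls` output above: the link target, 01-isonclustering/02-clustered-fastq/100000.fastq, is a relative path. A relative symlink target is resolved against the directory that contains the link (here a folder under /tmp), not against the directory the job was started from. That would explain why `head` reports "No such file or directory" even though the link itself exists. The sketch below reproduces this behaviour with made-up directories and file names; it is not taken from run_isoncorrect.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as work, tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a clustered FASTQ file in the working directory.
    os.makedirs(os.path.join(work, "clusters"))
    real = os.path.join(work, "clusters", "100000.fastq")
    with open(real, "w") as fh:
        fh.write("@read1\nACGT\n+\nIIII\n")

    # Relative target: resolved against the link's own directory (tmp), so it dangles.
    rel_link = os.path.join(tmp, "rel.fastq")
    os.symlink(os.path.join("clusters", "100000.fastq"), rel_link)

    # Absolute target: resolves no matter where the link lives.
    abs_link = os.path.join(tmp, "abs.fastq")
    os.symlink(os.path.abspath(real), abs_link)

    rel_is_link = os.path.islink(rel_link)  # the link itself exists
    rel_ok = os.path.exists(rel_link)       # exists() follows the link
    abs_ok = os.path.exists(abs_link)

print(rel_is_link, rel_ok, abs_ok)  # prints: True False True
```

Whether passing an absolute path to --fastq_folder avoids the problem depends on how the script builds the link, which is not confirmed in this thread, so treat that as something to try rather than a known fix.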

@ksahlin
Owner

ksahlin commented Dec 1, 2022

Not sure why you have a symbolic link to the file?

You need to copy it completely: cp /tmp/tmpzgp8a2tb/split_in_batches/100000_0.fastq THE_FAILING_TMP_FILE.fastq

At the moment you only seem to have a symbolic link, which means that if the tmp file disappears you no longer have access to it.

@jordana-olive
Author

I think we are not on the same page...
The program itself creates the symlinks to the files when running with --split_wrt_batches...

@ksahlin
Owner

ksahlin commented Dec 2, 2022

Okay, how about this.

On line 218 in run_isoncorrect, please change the line to tmp_work_dir = "XX-correction/" (or whatever path you want on your system). This way all the files will be present in XX-correction/, and you will still have them there should anything break along the way.

Then we can locate the file(s) where it is going wrong and run them individually to get the error message.
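Running the batch files one by one can be scripted. The sketch below is one possible way to do it, not part of isONcorrect: `build_cmd` and `rerun_batches` are hypothetical helpers, the parameter values are copied from the failing command in the traceback earlier in this thread, and the folder names in the usage note are assumptions.

```python
import subprocess
from pathlib import Path

# Parameter values copied from the failing command reported in this thread.
PARAMS = [
    "--exact_instance_limit", "50", "--max_seqs", "2000",
    "--k", "9", "--w", "20", "--xmin", "18", "--xmax", "80", "--T", "0.1",
]


def build_cmd(fastq, outfolder, exe="isONcorrect"):
    """Build the isONcorrect command line for a single batch file."""
    return [exe, "--fastq", str(fastq), "--outfolder", str(outfolder), *PARAMS]


def rerun_batches(batch_dir, out_root):
    """Run every *.fastq in batch_dir on its own.

    Returns {file name: captured stderr} for the runs that exited non-zero.
    """
    failures = {}
    for fastq in sorted(Path(batch_dir).glob("*.fastq")):
        cmd = build_cmd(fastq, Path(out_root) / fastq.stem)
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode != 0:
            failures[fastq.name] = proc.stderr
    return failures
```

A call such as `rerun_batches("XX-correction/split_in_batches", "03-correction-debug")` would then return the stderr of each failing batch; both paths are placeholders to adapt to the actual layout.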
