RepEnrich2 gets stuck without error message #24
Comments
Hi, thanks for your interest in our project. While RepEnrich2 can sometimes take a long time on larger files, I've never observed it completely hanging before. Is there a simple way to reproduce this? For example, does it always hang on the exact same data files if you attempt a second run, or is it random to some degree? How large are the datasets, if you need 130 CPUs and 1 TB of RAM?

The only causes I can think of off the top of my head (at least without any error messages) are either some update or change to one of the other software dependencies resulting in unintended behavior, or, if you are running on unusually large datasets, some other kind of limitation or edge case. I would first look at the versions of the other dependencies (bowtie, bedtools, samtools, biopython) and check whether any of them have been updated since you started having the issue.
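For reference, a minimal sketch for collecting those dependency versions in one place; it assumes the command-line tools are on your PATH and that Biopython is installed, and nothing in it is RepEnrich2-specific:

```python
import subprocess
import Bio  # Biopython

# Installed Biopython version.
print("biopython:", Bio.__version__)

# Ask each command-line dependency for its version banner.
# Assumes bowtie2, bedtools and samtools are available on PATH.
for cmd in (["bowtie2", "--version"],
            ["bedtools", "--version"],
            ["samtools", "--version"]):
    result = subprocess.run(cmd, capture_output=True, text=True)
    lines = (result.stdout or result.stderr).strip().splitlines()
    print(f"{cmd[0]}: {lines[0] if lines else 'no output'}")
```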
Thanks for your reply. I'm not sure whether it's always stuck at the same file or not. I'll rerun my most recent run and note down where it gets stuck; then I'll be able to say whether it hangs there again or not (it'll probably take 2 days or so to get there). My .fastq files are around 18-22 GB and I'm using a whole, unfiltered mm10 RepeatMasker index. Even for my successful run in the past, it did take about 2-3 days per sample; I thought this was normal and probably caused by the many repeats to test(?). I'm using the following tool versions: Which versions do you use/usually recommend?
Update: Upon running it again on the same data with the same settings, it hangs again at exactly the same repeat. So do you think the tool versions I mentioned at the end of my last post are fine? Which versions do you recommend?
The only version I can see that might have some kind of issue is samtools (I believe we originally tested with v1.3.1). The .fastq files do seem fairly large, so long runtimes like you described are pretty typical. If file size were the cause, though, I wouldn't expect other samples to run without issue. Do you remember which repeat it hangs on? Without being able to reproduce the error on my end it is difficult to diagnose.
Sorry for the late reply; I got back to it just now.
The last repeats that were written to pair1_ are _CACGGT_n.txt and _CACAG_n.txt, but these files look fine. Is there any way to find out which repeat would have been next? (A rough way to check is sketched below.)
If we can't solve it like this, and in case you would like to have a look, I'd ask my supervisor whether I can share one of the files that hang for me with you. For now, I will try to re-run everything with samtools 1.3.1 and see if that makes a difference.
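A minimal sketch of such a check, assuming the repeat names can be exported to a plain-text list (one name per line) in the same sanitized form RepEnrich2 uses for its per-repeat file names; the file and folder names below are placeholders, and the assumption that repeats are processed in list order is just that, an assumption:

```python
from pathlib import Path

# Placeholder inputs: a plain-text list of repeat names (one per line),
# written in the same sanitized form used for the pair1_ file names,
# and the pair1_ output folder of the stuck sample.
repeat_list = Path("repeat_names.txt")   # placeholder file name
pair1_dir = Path("pair1_sample")         # placeholder folder name

expected = [line.strip() for line in repeat_list.read_text().splitlines() if line.strip()]
written = {p.stem for p in pair1_dir.glob("*.txt")}

# Repeats with no per-repeat .txt file yet; if repeats are processed in
# list order, the first entry here is a candidate for where the run hangs.
missing = [name for name in expected if name not in written]
print(f"{len(written)} repeats written, {len(missing)} still missing")
print("First few missing repeats:", missing[:10])
```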
Update: If it didn't work for other samples either, I'd almost suspect that this is some weird restriction of our HPC/filesystem. But since other samples work, it's probably really RepEnrich2 getting stuck at the same repeat, while still being able to write this number of files before hanging(?). If you'd like to have a look, I can ask my supervisor whether I can share one of these files with you.
Sure, I'd be interested to take a look if you get permission to share one of the files!
Again, sorry for the late reply; I am currently doing this more on the side. I uploaded my data to our university's file-sharing service: https://seafile.rlp.net/d/8c325fc2eefd495faa71/ Please let me know as soon as you have copied the data, so that I know when I can delete it again.
All finished copying; I'll try and run it when I have some time.
Thanks a lot, I deleted the upload again. I hope all files are fine, since there were some issues while uploading. In case they're not, just let me know.
Hello,
I hope this tool is still actively maintained.
I was running RepEnrich2 successfully on our HPC with an old dataset in the past, using the whole, unfiltered RepeatMasker repeats. However, recently, with other datasets, running the RepEnrich2/RepEnrich2.py command leads to the program getting stuck without any error message. Strangely, this only happens for some samples from a sequencing run; others do finish. Upon checking the pair1_ folder for affected samples, I always see that after .txt files have been created for some thousands of repeats, at some point no new .txt files are created any more for several days (even though the run is not finished yet), until the job times out.
I'm running it on 130 CPUs with 1TB of RAM.
I checked stdout and stderr, but both are empty. Do you have any suggestions?
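For anyone hitting the same symptom, a minimal sketch of how one might check whether a running sample has stalled; it assumes the per-repeat .txt files are written directly into the pair1_ folder, and the folder name below is a placeholder:

```python
import time
from pathlib import Path

# Placeholder path to the pair1_ output folder of a possibly stuck sample.
pair1_dir = Path("pair1_sample")

# Sort the per-repeat .txt files by modification time and show the newest ones;
# if the newest file is days old while the job is still running, the run has stalled.
txt_files = sorted(pair1_dir.glob("*.txt"), key=lambda p: p.stat().st_mtime)
now = time.time()
for path in txt_files[-5:]:
    age_hours = (now - path.stat().st_mtime) / 3600
    print(f"{path.name}\tlast written {age_hours:.1f} h ago")
print(f"{len(txt_files)} repeat files written so far")
```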