I'm using GetOrganelle to extract mitochondrial genomes from insects. I first do an assembly with SPAdes (because we use that output for other downstream tasks anyway), and then I run get_organelle_from_assembly.py with `-F animal_mt` on the assembly_graph.fastg output from SPAdes.
I'm running this on a lot of genomes, and on maybe 10% of them GetOrganelle just gets stuck on the `INFO: Slimming assembly graph ...` step (while it's running slim_graph.py), pegging one CPU core at 100% for hours and hours until I kill it. There is no other log output, even with --verbose.
When it works, it usually finishes in 5-20 minutes for assemblies of similar genomes. But sometimes it takes hours, and sometimes it never converges.
My genome assemblies from SPAdes are far from good: the contigs.fasta output is a boatload of small contigs of 5-10 kb. But most of the time GetOrganelle still finds a circular genome.
Apart from setting a timer and simply killing GetOrganelle if it hasn't found anything within, say, an hour... is there anything to check? Could this be a bug, or simply an artefact of some kind of complicated/buggy .fastg file from SPAdes? Are there internal cutoffs/limits to set in GetOrganelle to avoid it getting stuck like this?
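For the stopgap timer approach, a minimal watchdog sketch in Python — the commented-out GetOrganelle invocation is illustrative only (the flags are the documented ones, but the input/output paths are placeholders for your own setup):

```python
import subprocess

def run_with_timeout(cmd, seconds):
    """Run a command, killing it if it exceeds `seconds`.

    Returns the CompletedProcess on normal exit, or None on timeout
    (subprocess.run kills the child before raising TimeoutExpired).
    """
    try:
        return subprocess.run(cmd, timeout=seconds)
    except subprocess.TimeoutExpired:
        return None

# Illustrative invocation (placeholder paths):
# result = run_with_timeout(
#     ["get_organelle_from_assembly.py",
#      "-F", "animal_mt",
#      "-g", "assembly_graph.fastg",
#      "-o", "mito_out"],
#     seconds=3600)  # give up after one hour
# if result is None:
#     print("GetOrganelle timed out, moving on to next genome")
```

This keeps a batch run moving past the stuck 10% without babysitting each job.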
I can share an assembly_graph.fastg file somewhere if someone wants to test it.
BTW, I switched from running get_organelle_from_reads to running on the SPAdes graph output because I found the latter was much better on average at finding circular mitogenomes.
A little update: the same genome, processed through GetOrganelle's from_reads mode, works fine and finds a circular genome after just 15 minutes or so. I guess the graph output from SPAdes in this case is simply too massive. I had a look in Bandage and it's around a million nodes, most of them small unconnected islands, and I guess the two modes handle this complexity differently. In from_reads mode, all those bad short reads presumably never enter the pipeline in the first place.
As I haven't found any sane options in SPAdes to limit this in the graph output, it might be a good idea for GetOrganelle to be able to limit/threshold the graph input. Maybe there is an option I'm just not aware of. Something similar to how the from_reads mode "tastes" the input reads?
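As a workaround outside GetOrganelle, one could pre-filter the FASTG before handing it over. A rough sketch, assuming SPAdes-style headers of the form `>EDGE_<id>_length_<n>_cov_<x>...;` (simplified: adjacency lists in kept headers may still name dropped edges, which a production version would also prune):

```python
import re

# SPAdes FASTG headers encode length and coverage in the edge name.
COV_RE = re.compile(r"length_(\d+)_cov_([\d.]+)")

def filter_fastg(lines, min_cov=2.0, min_len=200):
    """Yield FASTG lines, keeping only records whose own edge meets
    the depth and length cutoffs.

    Caveat (sketch only): links to dropped edges are not pruned from
    the adjacency part of surviving headers.
    """
    keep = False
    for line in lines:
        if line.startswith(">"):
            # The part before the first ":" describes this record's edge;
            # anything after it lists linked edges.
            m = COV_RE.search(line.split(":", 1)[0])
            keep = (m is not None
                    and int(m.group(1)) >= min_len
                    and float(m.group(2)) >= min_cov)
        if keep:
            yield line
```

On a graph dominated by tiny unconnected islands, dropping short low-coverage edges up front should shrink the input to slim_graph.py dramatically.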
Further update: I discovered that --min-depth has an effect in this mode (from_assembly). It seems to do some kind of filtering before slimming, which reduces the load significantly when the graph contains a lot of unassembled reads, as in my case. Setting min-depth to anything over 1.0 made it converge, though it took 12 hours of processing; setting min-depth to 10 let it find a solution in 1 hour.
Unfortunately it's not possible to use one value consistently: other genome assembly graphs I have require a min-depth of 1.2 to converge to a circular genome (anything above that and you get a bunch of separate scaffolds instead).
So I guess what could be done here is some kind of analysis at the start of a from_assembly run that tries to figure out an adaptive min-depth threshold of some sort?
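One possible heuristic for such an adaptive threshold (purely a sketch, not anything GetOrganelle does internally): bin the per-edge coverages on a log scale and put the cutoff at the first valley after the low-coverage noise peak, falling back to a conservative default when the distribution has no clear valley:

```python
from collections import Counter
import math

def suggest_min_depth(covs, fallback=1.0):
    """Heuristic sketch: histogram edge coverages in log2 bins and
    return the coverage at the first valley after the noise peak.
    Returns `fallback` if no clear valley exists."""
    bins = Counter(int(math.log2(c)) if c >= 1 else -1 for c in covs)
    lo, hi = min(bins), max(bins)
    counts = [bins.get(b, 0) for b in range(lo, hi + 1)]
    i = 0
    # Climb the low-coverage noise peak...
    while i + 1 < len(counts) and counts[i + 1] >= counts[i]:
        i += 1
    # ...then descend until the histogram rises again (the valley).
    while i + 1 < len(counts) and counts[i + 1] <= counts[i]:
        i += 1
    if i + 1 >= len(counts):
        return fallback  # monotone distribution, no valley to cut at
    return float(2 ** (lo + i))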