fastmultigather x rocksdb doesn't actually use multiple cores #503
Comments
for posterity, here is the Snakefile I am using:
Adding: GitHub is stripping the interactive parts of the flamegraph SVG, but posting it below. Generated by running
Some initial comments:
looks like this is all happening in The
that is needed because The second metagenome clone is done on the modified query, each iteration, when
and this is used in We can always make gather faster, I guess 😆 |
After sourmash-bio/sourmash#3385: down to 22m (was 48m before).
Tried using This took 25m vs 22m with |
…shBTree (#3385) While debugging sourmash-bio/sourmash_plugin_branchwater#503 the flamegraph showed ~26% of the time was spent on calculating MD5. WHY???? Turns out cloning and converting `KmerMinHash` to `KmerMinHashBTree` triggered recalculation of the MD5 sum, even if it was already present (or... not needed). Oops!
From a ~two day run of fastmultigather against rocksdb, with an early version of #504 and NOT using the performance fix above, we get good multicore usage on fastmultigather using snakemake/process level parallelism. Only 6 GB of RAM used, too!
one of the upshots from the benchmarks in #479 is that fmg x rocksdb doesn't use multiple cores and is also I/O bound. This is presumably because it is a serial algorithm that blocks on RocksDB queries!
I was able to get decent parallelism by using snakemake with multiple processes (see benchmark).
This is not inherently a problem, but I think we should at the least mention it in the docs.
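For readers without snakemake, the process-level parallelism described above can be sketched in plain Python. This is only an illustration, not the Snakefile from this thread: `chunk`, `run_batches`, and `make_command` are hypothetical names, and the command below is a do-nothing stand-in for whatever single-process `fastmultigather` invocation you would actually run on each batch.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def chunk(items, n_chunks):
    """Split items into at most n_chunks round-robin batches, dropping empty ones."""
    batches = [items[i::n_chunks] for i in range(n_chunks)]
    return [b for b in batches if b]


def run_batches(batches, make_command, max_workers):
    """Run one external process per batch, at most max_workers at a time.

    The threads only wait on their child process; the actual work happens
    in the separate processes, which is where the parallelism comes from.
    """
    def run_one(batch):
        return subprocess.run(make_command(batch), check=False).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, batches))


def placeholder_command(batch):
    # ASSUMPTION: stand-in command that does nothing. Replace it with the
    # real per-batch search command; no real CLI flags are shown here.
    return [sys.executable, "-c", "pass"]


if __name__ == "__main__":
    queries = [f"query_{i}.sig" for i in range(8)]  # hypothetical file names
    return_codes = run_batches(chunk(queries, 4), placeholder_command, max_workers=4)
    print(return_codes)
```

Each batch would need its own output location so the processes do not overwrite one another.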
A few thoughts:
- `-o` behavior in the `fastmultigather` command - it is an enduring source of confusion for me, at least. (ref: Can't get fastmultigather to write to a specific output file #299)
- `fastgather` against rocksdb? It doesn't inherently make sense to have fmg support multiple threads that it's not going to use ;).