-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MRG: integrate sketching utils from directsketch #516
Conversation
status:
Apparently the zip that is installed in my environment is v2.2, which has the bug mentioned here, and not v2.0, which is in Cargo.toml. ...why/how? Update: I'm not sure why/how that was happening, but I did some cleaning + updating, and now the right versions seem to be installing. Hopefully this will stay fixed. |
The only remaining failing tests are the ones with "incomplete" param strings -- i.e. user doesn't provide a moltype or doesn't provide a ksize. Temporarily commented out. The current parameter strategy = DNA default (k=31,scaled=1000,noabund), then modify any parameters that are provided. Previously, we required a What do we want to require? I think maybe the right solution may be to require thoughts @ctb? Update: I've changed the strategy to require moltype. I think this matches our |
* buildutils for singlesketch * consolidate sig building * consolidate sig writing * gz + zip writing! * check gz is actually gz * black formatting
benchmarks:
I think simple parallelization per #526 isn't going to work :) :( |
ok @ctb - ready for review |
@mr-eyes Here is where I modify |
Hi Tessa! This is cool! I will see while writing the code, but I think it's gonna work smoothly. Thank you! |
@ctb any chance you could run the benchmarks again? I modified |
on it! |
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice cleanup!
Are you OK with this code eventually be moved over to sourmash, maybe? And if so are you ok with the license changing to BSD? (Just want to get it on record ;)) If so, please also create a sourmash issue suggesting that we consolidate some of this over there!
Yep, that was the intention. Will make a sourmash issue for it! |
In
directsketch
, I tried to come up with a set of sketching utils that would make reading parameters + building sketches easier and more standardized. I introduced a series of buildutils --BuildRecord
,BuildManifest
,BuildCollection
, etc that are similar to their sourmash counterparts, but enable building and updating.Here, I copy the utils over from
directsketch
and replace the two async functions with sync counterparts that enable writing the sigs and manifest from the buildutils. Then, I useBuildCollection
inmanysketch
andsinglesketch
to manage parameter string handling, sig template generation, and sig writing.Main differences vs prior strategy:
build_siginfo
withBuildCollection
sig templates, and read the param string in viaBuildCollection
. There are now safety checks in place: we only allow a single moltype, abund value, scaled value, etc. We do allow multiple kmer sizes.BuildRecord
, we don't need to iterate through all the sigs and the end and build newRecord
s for them prior to writing a Manifest. However, we do need to make sure theBuildRecord
s are updated appropriately after adding data + before writing the file. This is handled within theBuildCollection
methods.To do:
Future:
InnerStorage
for thread safety/parallelization?