Creating new HSDS job store, no jobs added #3970
Comments
Thanks for working on this! @DailyDreaming has a PR #3569 that is trying to rip out SimpleDB altogether, which might be useful to look at, but it hasn't been touched in a while. I noticed you did create If all you want to do is not use SimpleDB, and MinIO's S3 implementation is strongly consistent, you can drop all the table stuff that the S3 job store uses and be more like the Google job store, and just store stuff actually in the bucket.
We added SimpleDB over S3 when Amazon S3 wasn't strongly consistent, to enforce strong consistency and to speed things up a bit, but now Amazon S3 is strongly consistent, so we don't really need it anymore. If you're implementing something new it might be better to not use a database layer over the bucket layer at all.
As for the logs you sent, it looks like no jobs are ever being read out. (This makes sense, because the leader has a cache and only reads back jobs when it thinks they've been touched elsewhere.) They are logged as being appended by the batch save code:
But jobs never seem to be read or loaded after that. There is a message that it loaded jobs to iterate over and got If you look at the leader messages, the last thing it says is:
This is being swamped by constant debug-level logging of every HTTP request made for the stats-and-logging monitor, which spits out job logs as soon as they are saved. You might want to look into quieting that down somehow; we have machinery in Toil already to set higher log levels on internal Boto loggers, and maybe something like that is needed for the HSDS library logger.
But I think the real reason that there's no apparent forward progress is that the job went to the batch system, started running, and never stopped. What was happening to the Kubernetes Job object
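To illustrate the logging suggestion above, here is a minimal sketch using only Python's standard `logging` module. The logger names (`h5pyd`, `urllib3`, `requests`) are assumptions about which libraries emit the per-request debug lines, and `quiet_loggers` is a hypothetical helper, not part of Toil; check the logger names that actually appear in your log output before relying on it.

```python
import logging

# Assumption: these names are guesses at the loggers producing the HTTP
# request noise. Replace them with the names seen in your own logs.
NOISY_LOGGERS = ("h5pyd", "urllib3", "requests")


def quiet_loggers(names=NOISY_LOGGERS, level=logging.WARNING):
    """Raise the threshold on the named loggers (hypothetical helper).

    Child loggers such as "urllib3.connectionpool" inherit the level from
    their parent unless they set their own, so they are quieted as well.
    """
    for name in names:
        logging.getLogger(name).setLevel(level)


# Call once, early in the job store module or the workflow script.
quiet_loggers()
```

Toil's own loggers are not touched by this, so the leader's debug messages should still come through.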
Thanks for the suggestions! I finished implementing the Using just MinIO would probably be the way to go, but I'll work on this for now. The other Kubernetes Job and its associated pod are now giving an error:
Full worker logs here: https://gist.github.com/edraizen/84e45cc72fd004cb1b12f56dad9e0bf4
Would this have something to do with my Kubernetes setup? Here is my deployment file in case that helps: https://gist.github.com/edraizen/72b01873a41b497729c1b9814296590e
EDIT: It works correctly using the local batch system, so it is an issue with Kubernetes.
The workaround here would be Is
It sounds like the HSDS job store works now, so I'm going to close this. @edraizen I'm not sure we could take this into upstream Toil if you PR'd it, because we don't have the relevant setup and that would make it hard to put under CI. But Toil does have some plugin-registering support for batch systems that we could probably also make work for job stores, so if you want this working with upstream Toil we could build that out and help you make it a plugin.
Hello,
I am moving away from AWS in favor of a local MinIO instance on our university cluster. However, the AWS S3 job store requires SDB, which I don't believe is available in MinIO. I have been using HSDS in my project, so I thought it would be interesting to try to make a job store using it, where MinIO handles the overlarge file storage. I think I might be missing something, because no jobs get added and the stats and logging messages are constantly being written to the output. The S3/MinIO part works with some modification, and the HSDS part is the one with errors.
My new branch with the HSDSS3JobStore: https://github.com/edraizen/toil/tree/hsds.
If anyone has any suggestions on how to get this working or other alternatives to SDB, I would really appreciate it!
Thanks,
Eli
Full logs: https://gist.github.com/edraizen/18dc8d00441a2b2324cdeaa98f33b9bf
┆Issue is synchronized with this Jira Task
┆friendlyId: TOIL-1112