Replace SDB with S3 in AWS job store #964

hannes-ucsc · 2016-06-10T17:56:57Z

Quoting an AWS support professional in case 1767267511:

I would recommend seeing if you would consider DynamoDB to replace your SimpleDB solution. DynamoDB is essentially the successor of SimpleDB, which is slowly being pulled out from active development. In fact, we're no longer offering that to new customers at this point.

If Toil should run on newly opened AWS accounts, we need to phase out SimpleDB.

I propose that we create a new, second implementation of the AWS job store that uses DynamoDB. The new implementation should be accessible under the aws job store locator, while the old one becomes aws_old.

The reason I didn't use DynamoDB in the first place was the payment model, which is a flat rate based on a configurable ("provisioned" in Amazon lingo) request volume. Toil would have to set that request volume to a user-specified value (with a sensible default) before a workflow starts, and make sure to set it back to the lowest possible value on exit.
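A minimal sketch of the "raise on start, lower on exit" idea, assuming `client` behaves like a boto3 DynamoDB client (whose `update_table` accepts `TableName` and `ProvisionedThroughput`). The function name, its parameters, and the idle value of 1 unit are illustrative assumptions, not anything Toil implements:

```python
from contextlib import contextmanager


@contextmanager
def provisioned_throughput(client, table_name, read_units, write_units,
                           idle_read_units=1, idle_write_units=1):
    """Raise a table's provisioned capacity for a workflow, then lower it again.

    `client` is assumed to behave like a boto3 DynamoDB client. All names
    here are hypothetical and only illustrate the proposal.
    """
    def set_capacity(read, write):
        client.update_table(
            TableName=table_name,
            ProvisionedThroughput={
                "ReadCapacityUnits": read,
                "WriteCapacityUnits": write,
            },
        )

    # Before the workflow starts: switch to the user-specified volume.
    set_capacity(read_units, write_units)
    try:
        yield
    finally:
        # Runs on normal exit and on exceptions, so the table does not keep
        # billing at the workflow's rate after the workflow has ended.
        set_capacity(idle_read_units, idle_write_units)
```

Usage would look like `with provisioned_throughput(client, "toil-jobs", 50, 25): run_workflow()`, where the table name and `run_workflow` are placeholders. A `finally` block cannot run if the process is killed outright, so a real implementation would probably also need a cleanup path outside the workflow process.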

┆Issue is synchronized with this Jira Story
┆friendlyId: TOIL-350

cket · 2016-12-01T00:20:41Z

We should keep an eye on this. If they start deprecating SDB, we need to start on a Dynamo job store replacement.

abatilo · 2021-01-31T01:06:57Z

Any chance that this can be revisited?

DailyDreaming · 2021-01-31T02:13:29Z

@abatilo S3 is now strongly consistent, so this issue is now about replacing SDB with S3. This will probably be worked on relatively soon (sometime in the next few months).

abatilo · 2021-01-31T02:15:04Z

Would it be a big lift? I would be curious to know if I could help.

DailyDreaming · 2021-01-31T04:21:51Z

@abatilo Medium sized, I would guess? It still needs to be explored.

Most of the work would involve removing the current SDB functionality, identifying everything it's shuttling back and forth (primarily items with job attributes, representing jobs to be processed), and then remapping those operations to fetch/put files in S3. Jobs would map to job files in S3: the presence of a file signifies a job yet to be run, and a job that has finished should no longer have a file. Most of the changes will be in the https://github.com/DataBiosphere/toil/blob/master/src/toil/jobStores/aws/jobStore.py file.

Some examples:

Loading a job currently uses a jobstore id to key the attributes for a job out of sdb: https://github.com/DataBiosphere/toil/blob/master/src/toil/jobStores/aws/jobStore.py

This would need to be changed to using the jobstore id to fetch a bucket file by bucket name (aws jobstore name) and key.

Same with deleting a job: https://github.com/DataBiosphere/toil/blob/master/src/toil/jobStores/aws/jobStore.py or listing jobs: https://github.com/DataBiosphere/toil/blob/master/src/toil/jobStores/aws/jobStore.py#L322

There are also some odd spots where not finding a job needs to be handled in an SDB-specific way, for example:

toil/src/toil/leader.py

Line 973 in 98dbf33

self.processRemovedJob(issuedJob, resultStatus)

If you want to tackle this, or a portion of it, we'd be happy to have the help and I'd be glad to review code progress on this as well.
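The mapping described above (one S3 object per job, keyed by jobstore id, with presence meaning "not yet finished") could be sketched roughly as follows. This is not Toil's code: the class name, the `jobs/` key prefix, and the JSON encoding are assumptions for illustration, and `client` is assumed to behave like a boto3 S3 client.

```python
import json

JOB_PREFIX = "jobs/"  # hypothetical key layout, not Toil's actual one


class S3JobIndex:
    """One object per job; an object's presence means the job is not finished."""

    def __init__(self, client, bucket):
        # `client` is assumed to behave like a boto3 S3 client.
        self.client = client
        self.bucket = bucket

    def _key(self, job_id):
        return JOB_PREFIX + job_id

    def save(self, job_id, attributes):
        # Replaces writing an item with job attributes to SDB.
        self.client.put_object(Bucket=self.bucket, Key=self._key(job_id),
                               Body=json.dumps(attributes).encode("utf-8"))

    def load(self, job_id):
        # Fetch by bucket name and key instead of an SDB attribute lookup.
        response = self.client.get_object(Bucket=self.bucket,
                                          Key=self._key(job_id))
        return json.loads(response["Body"].read())

    def delete(self, job_id):
        # A finished job should no longer have a file.
        self.client.delete_object(Bucket=self.bucket, Key=self._key(job_id))

    def list_ids(self):
        # list_objects_v2 returns results in pages, so follow the
        # continuation token until the listing is no longer truncated.
        kwargs = {"Bucket": self.bucket, "Prefix": JOB_PREFIX}
        while True:
            page = self.client.list_objects_v2(**kwargs)
            for entry in page.get("Contents", []):
                yield entry["Key"][len(JOB_PREFIX):]
            if not page.get("IsTruncated"):
                return
            kwargs["ContinuationToken"] = page["NextContinuationToken"]
```

Taking the client as a parameter keeps the sketch testable without network access and would, in principle, let it point at any service that exposes the same calls.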

DailyDreaming · 2021-02-10T06:22:23Z

@abatilo We have sprint planning tomorrow and I'm going to propose putting this into the upcoming sprint.

abatilo · 2021-02-10T14:30:48Z

That's awesome. Thank you

abatilo · 2021-02-17T18:55:22Z

@DailyDreaming Could we still consider DynamoDB? S3 has throughput limits which might become problematic.

DailyDreaming · 2021-02-25T18:52:20Z

@abatilo Yes, that's certainly still a possibility. What kind of limits concern you? First hit searching indicates 3500 requests/second to PUT data, and 5500 requests per second to GET data on s3. I'm not sure we're going to be hitting those limits, though it does look like dynamodb has higher limits.

abatilo · 2021-02-27T15:01:53Z

Members of my informatics team have expressed to me that with the current usage of S3, we've had pipelines fail due to hitting S3 limits. I haven't had time to dig in yet but that's why I wanted to bring it up here.

DailyDreaming · 2021-03-03T17:44:28Z

I see. The database is more to enforce strong consistency, so I'd have to investigate how much the rate will increase (which I suspect would mostly be from heading a file to check for existence, rather than checking the db).
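For reference, "heading a file to check for existence" might look like the sketch below. It assumes `client` behaves like a boto3 S3 client, where `head_object` on a missing key raises an error carrying a `response["Error"]["Code"]` of `"404"`; the function name is hypothetical, and the broad `except Exception` is used only to keep the sketch free of third-party imports.

```python
def job_file_exists(client, bucket, key):
    """Return True if the object exists, False if S3 reports it missing.

    Each call costs one HEAD request, which is the extra request volume
    that would replace a database lookup.
    """
    try:
        client.head_object(Bucket=bucket, Key=key)
        return True
    except Exception as exc:
        # Assumption: the error exposes a boto3-style `response` dict.
        response = getattr(exc, "response", None) or {}
        code = response.get("Error", {}).get("Code")
        if code in ("404", "NoSuchKey", "NotFound"):
            return False
        # Anything else (permissions, throttling, network) is not
        # evidence that the job is gone, so let it propagate.
        raise
```

Treating only "not found" as absence matters here: if a throttling error were read as "job missing", hitting a rate limit could make live jobs look finished.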

unito-bot · 2022-01-13T17:08:35Z

➤ Adam Novak commented:

Since S3 is strongly consistent now, we’re planning to just use that and not DynamoDB.

stain · 2022-01-19T10:26:23Z

Will it be possible to use other S3 backends than AWS?

Guigzai · 2024-02-01T18:39:41Z

Hello,

It would be interesting to get rid of the Amazon dependency so that Toil can run on on-premises Kubernetes platforms, and therefore to replace SDB with something other than another Amazon solution such as DynamoDB.

Would it be possible to consider solutions like Redis, etc.?

Regards

unito-bot · 2024-02-13T18:10:45Z

➤ Adam Novak commented:

Lon is making a cool control flow diagram for this.

davidjsherman · 2024-02-15T08:21:12Z

We've been following this issue for a long time, hoping that using a strongly consistent S3 backend as mentioned by @unito-bot would be adopted.

Specifically, we'd like to use Ceph's S3-compatible object storage, which guarantees strong consistency. Ceph is a common cluster storage solution for on-premises Kubernetes, since the Rook operator does the heavy lifting of deploying it.

adamnovak · 2024-04-30T17:20:23Z

We have Ceph now at UCSC, and using Ceph directly (instead of through the shared filesystem) might be interesting.

davidjsherman · 2024-05-01T14:58:42Z

What could we (at Inria) do to contribute?

stxue1 · 2024-06-19T02:00:32Z

Lon will probably be the one who would work on this, though it will be a while before this is added to the sprint. We don't have many internal people using the AWS implementation so we haven't had much spare development time for this.

We have a vague idea of implementing jobstore plugins similar to batchsystem plugins, so any ideas or recommendations there would be helpful.

Community contributions are of course always welcome. Unfortunately I'm unsure where those contributions could go, as this is Lon's task and I'm unsure of its current progress. If you want, you could ping him and ask where contributions for this could go.

hannes-ucsc self-assigned this Jun 10, 2016

hannes-ucsc added enhancement aws epic labels Jun 10, 2016

hannes-ucsc added this to the Sprint 04 (3.3.0) milestone Jun 21, 2016

hannes-ucsc added the ready label Jun 28, 2016

hannes-ucsc modified the milestones: Sprint 05 (3.4.0), Sprint 04 (3.3.0) Jun 28, 2016

hannes-ucsc removed the ready label Jun 28, 2016

hannes-ucsc modified the milestones: Sprint 05 (skipped), Sprint 06 (3.5.0) Jul 5, 2016

hannes-ucsc mentioned this issue Jul 29, 2016

Remove workaround for ghost jobs caused by stale reads from SDB #1091

Open

hannes-ucsc removed this from the Sprint 06 (3.5.0) milestone Jul 29, 2016

cket added the discuss label Dec 1, 2016

ejacox added planned and removed discuss labels Aug 16, 2017

cricketsloan added the epic label Jun 29, 2018

cricketsloan unassigned hannes-ucsc Nov 27, 2018

This was referenced Apr 15, 2020

Replace simpleDB (dynamoDB?). #2632

Closed

AWSJobStores are unrestartable after ~2M files (~10GB metadata) #1809

Closed

DailyDreaming added the roadmap label Apr 15, 2020

unito-bot assigned w-gao Oct 6, 2020

unito-bot assigned DailyDreaming and unassigned w-gao Mar 8, 2021

unito-bot unassigned DailyDreaming Oct 19, 2021

adamnovak mentioned this issue Nov 17, 2021

toil clean should be able to destroy non-resumable AWS job stores #3924

Closed

unito-bot changed the title ~~Replace SDB with DynamoDB in AWS job store~~ Replace SDB with S3 in AWS job store Jan 13, 2022

unito-bot assigned DailyDreaming Feb 7, 2022

adamnovak linked a pull request Mar 30, 2022 that will close this issue

Replace simpledb. #3569

Draft

19 tasks
