Problem: it is not possible to submit batches of transfers #137

Closed
sevein opened this issue Jun 10, 2020 · 0 comments

sevein commented Jun 10, 2020

Users want to submit batches of transfers that are physically present in a local filesystem or a network share under a known path.


The solution we're considering is a new Batch API that allows users to submit a batch with a single path parameter indicating its physical location. The API starts a new batch workflow which, using an activity, scans the given location and starts a new processing workflow for each entry found.

  • POST /batch
    Submits a new batch.
    Takes the path of the batch location.

  • GET /batch
    Returns the status of the batch workflow.

(we do something very similar in /collection/bulk)

These are some considerations and/or compromises for this first iteration:

  • The underlying batch workflow will use a predefined identifier so that only one batch can run at a time; we want to avoid concurrent batches while we explore our solution and its implications.
  • The batch workflow does not wait for the processing workflows to complete. We are purposely avoiding parent-child workflow relationships because the cardinality is unbounded; instead, we plan to write an activity that scans the location and starts the processing workflows using the Cadence client (we don't think this is a problem as long as we reuse the same client during the life-cycle of the activity).
  • We need to process directories found under the given path, but this is going to require some refactoring of the processing workflow and its input parameters.
  • We do not expect the batch to change once it is submitted by the user, i.e. we don't need to track changes during its processing.
  • One compromise is not to worry about data locality for now: the batch must be locally available wherever Enduro is running, including its activity worker (which is going to be the case because of #37, "Problem: workers can't be deployed separately"). This will obviously become a blocker when we want to run Enduro at scale, but that is not currently a priority for our customers.