
How to ensure ordering of jobs in case of delayed retries. #68

Open
ksyd9821 opened this issue Mar 4, 2024 · 12 comments

Comments

@ksyd9821

ksyd9821 commented Mar 4, 2024

Hello,

For our current use case, we are adding jobs to a queue and would like them to be processed successfully in the same order they were added (FIFO).
So when a job fails, the desired outcome is that the job is retried automatically after some delay before moving on to the next job in the queue.

For example, if we have these 3 jobs in our queue [1, 2, 3] with job 1 being the first added job, here is what a possible execution would look like:

  • process job 1
  • job 1 fails
  • X amount of delay
  • process job 1 again
  • job 1 completed
  • process job 2
  • process job 3

How can we achieve this with bullmq?

Thank you in advance for the support!

@manast
Contributor

manast commented Mar 4, 2024

Thank you for your question. I am afraid that currently, when a job fails, the queue is not halted, so the other jobs waiting to be processed will be picked up as soon as a worker is free.
How critical is this case for you? Can you elaborate a bit more on the whole scenario where this functionality is needed?
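
For illustration, a minimal sketch of what delayed retries look like today, assuming a hypothetical 'webhooks' queue and delivery function: the failed job is retried after the backoff delay, but the jobs behind it are not held back in the meantime.

```ts
import { Queue, Worker } from 'bullmq';

const connection = { host: 'localhost', port: 6379 };
const queue = new Queue('webhooks', { connection });

// Delayed retries per job: up to 3 attempts, 30 seconds apart.
await queue.add(
  'event',
  { type: 'system-offline' },
  { attempts: 3, backoff: { type: 'fixed', delay: 30_000 } },
);

// Hypothetical delivery function; throwing here triggers a retry of this job,
// but does not stop later jobs from being picked up in the meantime.
async function deliverWebhook(data: unknown): Promise<void> {
  /* ... */
}

new Worker('webhooks', async (job) => deliverWebhook(job.data), {
  connection,
  concurrency: 1,
});
```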

@hardcodet

hardcodet commented Mar 4, 2024

@manast

How critical is this case for you? Can you elaborate a bit more on the whole scenario where this functionality is needed?

It is critical, unfortunately. Our use case is a number of event queues for webhooks (each queue representing a customer's subscription), where we would like to submit events in the proper order. We see that in practice, webhooks sometimes fail (e.g. the customer's endpoint is temporarily unavailable) and need to be retried, but we can't have those events move to the back of the queue, because order matters.

As a dummy example: imagine two events occurring in this order:

  1. The system is offline
  2. The system is online

If we sent these events in inverted order, the outcome on the customer's end would be completely wrong, since they would assume the system is offline and might cease communication with it.

@manast
Contributor

manast commented Mar 5, 2024

Ok, so this feature would be specific to groups: a group would not continue processing new jobs until the previous one has either completed or failed. Furthermore, this feature would only make sense with a concurrency of 1.
We need to study how feasible this feature is in the current design.

@hardcodet

You're right. We're already using a concurrency of 1 extensively to enforce sequential processing, because there are a lot of cases for us that warrant it. Preserving order on retries is just one more flavor.

If it's a bad fit for BullMQ, I guess we could work around the issue with the following strategy (sketched below):

  • handle the error ourselves:
    • pause the queue
    • mark the failed job as completed
    • create a new job with the same payload and enqueue it LIFO
  • re-enable the queue after the retry delay

This is absolutely feasible for us. We just figured that ordered processing (including retries with backoff delays) would be a common scenario, so we wanted to discuss this with you first 👍
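
A rough sketch of that workaround, assuming a 'webhooks' queue, a fixed retry delay, and a hypothetical deliverWebhook function (the failed job ends up as completed simply by not re-throwing):

```ts
import { Queue, Worker } from 'bullmq';

const connection = { host: 'localhost', port: 6379 };
const RETRY_DELAY_MS = 30_000;

const queue = new Queue('webhooks', { connection });

// Hypothetical delivery function for illustration only.
async function deliverWebhook(data: unknown): Promise<void> {
  /* ... */
}

new Worker(
  'webhooks',
  async (job) => {
    try {
      await deliverWebhook(job.data);
    } catch (err) {
      console.warn('delivery failed, scheduling ordered retry', err);

      // Pause the queue so no later job overtakes the failed one.
      await queue.pause();

      // Re-enqueue the same payload at the front of the queue (LIFO);
      // by not re-throwing, the current job is marked as completed.
      await queue.add(job.name, job.data, { lifo: true });

      // Re-enable the queue after the retry delay.
      setTimeout(() => queue.resume().catch(console.error), RETRY_DELAY_MS);
    }
  },
  { connection, concurrency: 1 },
);
```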

@hardcodet

It wouldn't be specific to groups, though: we thought about creating a queue per customer (rather than groups keyed by a customer ID), which would reduce the complexity of the retries remarkably compared to queues that would still have to process events for other groups.

@manast
Contributor

manast commented Mar 11, 2024

We are working on a solution for this in BullMQ, and then we will extend it to groups as well. This is the PR: taskforcesh/bullmq#2465

@hardcodet

You guys rock! Looking forward to the implementation :)

@Adam-Burke

Just wondering how this is progressing. I'm processing ordered sports facts from third parties and being able to block at the group level would be fantastic.

@manast
Contributor

manast commented Jun 24, 2024

@Adam-Burke yes, we have this PR almost ready. The biggest issue I see is that, despite everything, order cannot be guaranteed as long as you have more than one worker: even though the workers will pick up the jobs in order, due to network latencies and such, it is possible that one worker will start processing a job before another one that is running on a different machine or in a different process.

@Adam-Burke

Could there be a way to ensure that jobs from the same group are always processed by the same worker (assuming it's still running)? That way you could still scale out workers but have group-based, at-least-once, ordered processing.

Either way, I think it is still quite useful for our purposes.

@manast
Contributor

manast commented Jun 26, 2024

@Adam-Burke Let's see. If you use groups with a max concurrency of 1, then it is guaranteed that only one job will be processed per group at a time, so order is guaranteed within a group, except for the failure case with retries. So if we supported this case (keeping order within a group for retries), would that solve your use case?

UPDATE: sorry for the confusion, now I see that this issue is exactly about this... so yes, basically we will support this case soon.
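
For clarity, a minimal sketch of groups with a per-group concurrency of 1, based on the BullMQ Pro group API (the queue name, group id, and payloads here are illustrative only):

```ts
import { QueuePro, WorkerPro } from '@taskforcesh/bullmq-pro';

const connection = { host: 'localhost', port: 6379 };
const queue = new QueuePro('webhooks', { connection });

// One group per customer; jobs within a group are picked up in insertion order.
await queue.add('event', { type: 'system-offline' }, { group: { id: 'customer-42' } });
await queue.add('event', { type: 'system-online' }, { group: { id: 'customer-42' } });

new WorkerPro(
  'webhooks',
  async (job) => {
    // Process the event; with group concurrency 1, at most one job per group
    // is active at a time, so order holds within the group (except, today,
    // for the retry case discussed in this issue).
    console.log('processing', job.data);
  },
  {
    connection,
    group: { concurrency: 1 }, // max concurrency of 1 per group
  },
);
```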

@rnevet-reply

Hi,
We are also facing this issue. I also see that the PR is in progress. Can someone estimate how much work/time is left on it? Can I be optimistic that this new feature will be available soon? A rough idea of when would be super helpful.
Thanks, and sorry for nagging!
