Pipeline runs as background jobs #154

dinmukhamedm · 2024-11-05T02:50:08Z

Currently, the network configuration in our managed versions cuts TCP connections (TLS, to be more specific) after 350 seconds. For most of our APIs this is more than enough, but pipeline runs sometimes take longer, especially with the rise of larger and slower models such as o1.

We need to add the ability to run pipelines as background jobs.

For now, this is a discussion rather than a call for PRs. I see two possible ways forward, but, as always, we are open to more suggestions.

1. Polling job. The client submits a job, gets a run_id, and polls on it.
   - Pros:
     - No need to worry about the network cutting anything short, since every response is very quick.
   - Cons:
     - If polling is the user's responsibility, the overall UX gets much worse. If polling is hidden in our SDK, we need to choose the intervals carefully so that we don't cause too much load.
     - We'll need some infrastructure (a separate DB table?) to keep the status of running jobs.
2. WebSocket. The client opens a WebSocket connection, and it is the server's responsibility to ping the connection periodically to keep it alive.
   - Pros:
     - Job state is kept in memory, similar to now, so not much additional infra is needed.
   - Cons:
     - We need to design an extensible messaging protocol with reliable ping/pong requests to make sure the connection does not close.
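To make the polling option more concrete, here is a minimal sketch of what SDK-side polling with exponential backoff and jitter could look like, so that many clients don't hit the status endpoint at the same fixed interval. Everything here is an assumption for illustration: `Job`, `InMemoryJobStore`, and `poll_until_done` are hypothetical names, the in-memory store stands in for whatever the server would actually use, and the interval values are placeholders rather than recommendations.

```python
import random
import time
import uuid
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class Job:
    # Hypothetical status values: "pending", "done", "failed"
    status: str = "pending"
    result: Optional[Any] = None


class InMemoryJobStore:
    """Stand-in for the server side; a real service would persist this."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Job] = {}

    def submit(self) -> str:
        # Register a new job and hand the caller a run_id to poll on.
        run_id = str(uuid.uuid4())
        self._jobs[run_id] = Job()
        return run_id

    def get(self, run_id: str) -> Job:
        return self._jobs[run_id]

    def complete(self, run_id: str, result: Any) -> None:
        job = self._jobs[run_id]
        job.status = "done"
        job.result = result


def poll_until_done(
    fetch_status: Callable[[str], Job],
    run_id: str,
    *,
    initial_interval: float = 1.0,
    max_interval: float = 30.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Job:
    """Poll fetch_status(run_id) until the job reaches a terminal state."""
    interval = initial_interval
    deadline = clock() + timeout
    while True:
        job = fetch_status(run_id)
        if job.status in ("done", "failed"):
            return job
        if clock() >= deadline:
            raise TimeoutError(f"run {run_id} did not finish within {timeout}s")
        # Jitter spreads out clients that started polling at the same moment.
        sleep(interval * random.uniform(0.5, 1.0))
        # Double the wait each round, capped so long runs are still noticed.
        interval = min(interval * 2, max_interval)
```

Each individual request stays short, so the 350-second connection limit never comes into play; the trade-off is that completion is noticed up to `max_interval` seconds late.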

We are open to discussion on the best way forward and to any other suggestions.


nagxsan · 2024-12-10T15:51:39Z

Can we integrate approach 1 with some sort of notification functionality?

Instead of having the user poll continuously on the run_id, why not keep it completely asynchronous? Once the run has finished executing, the server sends a notification object to the front-end indicating whether the run succeeded or failed.
We may need to maintain a separate table that holds the run_id and its status, which the server updates when the run completes.
The front-end would read this status from the database and show it to the user. Until the notification is received, we can show the status as Pending.
Also, if needed, the run can have a scheduled timeout (for example, 5 minutes); if that time passes with no response, we report it with a failure status?
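A rough sketch of the status table and timeout idea, using SQLite only so the example is self-contained. The table name `pipeline_runs`, its columns, the status strings, and all function names are hypothetical and not taken from the existing codebase; the 300-second value mirrors the 5-minute example above and would presumably need to be configurable.

```python
import sqlite3
import time
import uuid
from typing import Optional

TIMEOUT_SECONDS = 300.0  # the 5-minute example; an assumption, not a recommendation


def init_db(conn: sqlite3.Connection) -> None:
    # One row per run: its id, current status, and creation time.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS pipeline_runs (
               run_id TEXT PRIMARY KEY,
               status TEXT NOT NULL,
               created_at REAL NOT NULL
           )"""
    )


def create_run(conn: sqlite3.Connection, now: Optional[float] = None) -> str:
    # Insert a new run in the pending state and return its id.
    run_id = str(uuid.uuid4())
    created_at = time.time() if now is None else now
    conn.execute(
        "INSERT INTO pipeline_runs (run_id, status, created_at) VALUES (?, 'pending', ?)",
        (run_id, created_at),
    )
    return run_id


def finish_run(conn: sqlite3.Connection, run_id: str, success: bool) -> None:
    # Only pending runs can change state, so a timed-out run stays failed.
    conn.execute(
        "UPDATE pipeline_runs SET status = ? WHERE run_id = ? AND status = 'pending'",
        ("success" if success else "failure", run_id),
    )


def get_status(conn: sqlite3.Connection, run_id: str, now: Optional[float] = None) -> str:
    # Return the stored status, marking the run as failed if it has timed out.
    current = time.time() if now is None else now
    row = conn.execute(
        "SELECT status, created_at FROM pipeline_runs WHERE run_id = ?", (run_id,)
    ).fetchone()
    if row is None:
        raise KeyError(run_id)
    status, created_at = row
    if status == "pending" and current - created_at > TIMEOUT_SECONDS:
        finish_run(conn, run_id, success=False)
        return "failure"
    return status
```

The timeout is checked lazily when the status is read, which avoids a separate scheduler; a notification to the front-end could be sent from wherever `finish_run` is called.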

Please let me know if I am missing something crucial or going wrong somewhere.

dinmukhamedm added the enhancement (New feature or request) and help wanted (Extra attention is needed) labels · Nov 5, 2024
