worker+runner: Dynamically start live inference runners #275
Conversation
This is required to run gen_openapi.py. If the idea is not to install it at runtime, it should at least be documented in the README.
The live container will stop automatically and we should auto-remove it to free up space.
🚢 🚢 🚢
runner/app/pipelines/base.py (outdated diff)

```diff
@@ -5,8 +5,9 @@
 class Pipeline(ABC):
     @abstractmethod
     def __init__(self, model_id: str, model_dir: str):
+        self.model_id: str  # type hint so we can use the field in routes
```
What does this comment mean?

`# type hint so we can use the field in routes`
This avoids a typing error in all the route implementations when they use the model_id field on the abstract Pipeline class (there are currently tons of typing errors in this project; I'll try to clear them up as I make changes).
I'm changing this comment to the following, which might be clearer:

`# declare the field here so the type hint is available when using this abstract class`
LGTM
worker/docker.go (outdated diff)

```diff
@@ -275,6 +285,8 @@ func (m *DockerManager) createContainer(ctx context.Context, pipeline string, mo
 	m.containers[containerName] = rc
 	m.gpuContainers[gpu] = containerName

+	go m.watchContainer(ctx, rc)
```
@victorges, @leszko I like your changes! However, I think there's an advantage to keeping containers warm for a certain amount of time, so batch jobs benefit when requests keep coming in. This could help improve response times and resource efficiency. I haven't dived too deep into the code yet, but my group can add this as a parameter in a subsequent pull request.
I see! So we should not remove the container when the context is cancelled, right? I can change the logic to just return the container in that case instead, WDYT? With that I could even remove the "return container" function that is currently called explicitly, and instead let callers pass a ctx that is cancelled when the request is done, so the container is auto-returned.
Implemented in 4d5adbe!
I also noticed a potential race in the Stop function, with concurrent calls updating the state.
This implements support in the worker to dynamically start the runner containers
and manage their lifecycle.
This is slightly different from how the existing APIs work, since the runner container
runs long past the original request that started it. The solution was:
I haven't been able to test this yet because my dev machine exploded. I'll create
a new one to try this out.