worker+runner: Dynamically start live inference runners #275
Conversation
This is required to run gen_openapi.py. If the idea is not to install it at runtime, it should at least be documented in the README.
The live container will stop automatically and we should auto-remove it to free up space.
🚢 🚢 🚢
runner/app/pipelines/base.py (outdated diff)

```diff
@@ -5,8 +5,9 @@
 class Pipeline(ABC):
     @abstractmethod
     def __init__(self, model_id: str, model_dir: str):
+        self.model_id: str  # type hint so we can use the field in routes
```
What does this comment mean?

`# type hint so we can use the field in routes`
This avoids a typing error in all the route implementations when they use the model_id field on the abstract Pipeline class (there are currently tons of typing errors in this project; I'll try to clear them up as I make changes).
I'm changing this comment to the following, which might be clearer:

`# declare the field here so the type hint is available when using this abstract class`
LGTM
worker/docker.go (outdated diff)

```diff
@@ -275,6 +285,8 @@ func (m *DockerManager) createContainer(ctx context.Context, pipeline string, mo
 	m.containers[containerName] = rc
 	m.gpuContainers[gpu] = containerName

+	go m.watchContainer(ctx, rc)
```
@victorges, @leszko I like your changes! However, I think there's an advantage to keeping containers warm for a certain amount of time, so batch jobs benefit when requests keep coming in. This could help improve response times and resource efficiency. I haven't dived too deep into the code yet, but my group can add this as a parameter in a subsequent pull request.
I see! So we should not remove the container when the context is cancelled, right? I can change the logic to just return the container in that case instead, WDYT? With that I could even remove the "return container" function that is currently called explicitly, and instead let callers pass a ctx that is cancelled when the request is done, so the container is auto-returned.
Implemented in 4d5adbe!
I also noticed a potential race in the Stop function, with concurrent calls updating the state.
This implements support in the worker to dynamically start the runner containers
and manage their lifecycle.
This is slightly different from how the existing APIs work, since the runner container
runs long past the original request that started it. The solution was:
I haven't been able to test this yet because my dev machine exploded. I'll create
a new one to try this out.