feat: initial watchdog implementation #1341
Signed-off-by: Ettore Di Giacinto <[email protected]>
Can you give a quick example of how this is supposed to interact with multiple backends? Currently, it looks like the timeout setting isn't backend specific, which is probably fine for now - but since I'd like to leverage this to improve the monitoring endpoints, I want to make sure I understand when the timer is being reset and what that interval is measuring. Thanks!
It currently monitors all active connections. Connections are recorded by the gRPC client: when a backend becomes busy (starts processing a request), the current time is recorded in timetable for that backend. If the backend remains busy for longer than timeout, an action (such as logging a warning or shutting down the backend) can be triggered; right now it stops the backend directly. Similarly, when a backend becomes idle (finishes processing a request), the current time is recorded in idleTime for that backend. If the backend remains idle for longer than idletimeout, it gets killed. This was asked for in #1202, and I took the occasion to implement it here since most of the logic applies to it as well. At the moment it is possible to define the timeout durations and to enable or disable each check (both default to disabled), keeping it very simple as a starting point.
One enhancement for later: if a backend is stale, the current implementation cuts the request. It should be possible instead to keep the request alive and retry it after the backend has been shut down.
Thanks for confirming that Mudler! That's pretty close to what I thought but it's good to check. The one feature request I have (even if it's not in the very first PR) is to expose that timetable from Watchdog, so that the monitoring endpoints can dig up data like when a backend was last used. Thanks!!
Signed-off-by: Ettore Di Giacinto <[email protected]>
Makes total sense. I'm not exposing it now, as it would not be used in the code and would be confusing, but it should be easy to iterate on.
I'm not super satisfied, but it's OK for a first stab at it. It works locally and it's disabled by default, so it should be good to go.
Description
This PR fixes #1339 and fixes #1202. It should also alleviate issues like #1017 and ggerganov/llama.cpp#3969 once and for all.
The WatchDog implementation (disabled by default) is designed to monitor and manage multiple backends. It keeps track of the last active times and idle times of each backend, and can stop them if a backend has been busy or idle for too long.
Key components of the WatchDog struct include:
- `timetable`: A map that stores the last active time of each backend.
- `idleTime`: A map that stores the last idle time of each backend.
- `timeout` and `idletimeout`: Duration values that represent the maximum allowed busy and idle times for a backend, respectively.

To turn on the watchdog, configure the following environment variables:
With the CLI: `--enable-watchdog-idle`, `--enable-watchdog-busy`, `--watchdog-busy-timeout`, `--watchdog-idle-timeout`.

Notes for Reviewers
Signed commits