Overview
LeapfrogAI should implement a model directory to store and serve models. This would significantly reduce the size of LeapfrogAI Zarf packages, provide users with more model options, and enable the backends to dynamically swap out models for different tasks.
Background
Currently, model parameters are directly “baked in” to backend Zarf packages. This makes LeapfrogAI packages extremely large, often exceeding 8GB, which causes automated deployment issues that require manual intervention to work around. Additionally, each backend, such as vllm, is packaged with only a single model. If a user wants access to multiple LLMs for different tasks (e.g., one model for chat and another for coding assistance), they would need two vllm packages, one for each model. The current approach makes it cumbersome and impractical for users to switch between models.
Externally, we are hearing demand from organizations like Platform One for users to be able to select from a collection of models. This is a common and expected feature in most modern AI chat interfaces. Internally, the LeapfrogAI team needs a way to quickly and efficiently swap out models in order to run evaluations across a variety of them. Incorporating a model directory into LeapfrogAI could solve these challenges: decoupling model parameters from Zarf packages and enabling users to dynamically select which models to use from those available in the model directory.
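To make the proposal concrete, here is a minimal sketch of how a backend might resolve models from a shared, volume-mounted model directory instead of from weights baked into its own package. The mount point, layout, and model names are assumptions for illustration only, not a settled design:

```python
from pathlib import Path

# Hypothetical mount point for the shared model directory volume.
MODEL_DIR = Path("/models")

def list_models() -> list[str]:
    """Return the names of all models present in the shared directory."""
    return sorted(p.name for p in MODEL_DIR.iterdir() if p.is_dir())

def resolve_model(name: str) -> Path:
    """Resolve a model name to its on-disk weights, failing loudly if absent."""
    path = MODEL_DIR / name
    if not path.is_dir():
        raise FileNotFoundError(f"model '{name}' is not present in {MODEL_DIR}")
    return path

# A backend such as vllm could then load whichever model a request asks for,
# rather than the single model baked into its package, e.g.:
#   llm = LLM(model=str(resolve_model("llama-3-8b-instruct")))
```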
Goals
Models are no longer directly “baked in” to the backend Zarf packages
Multiple models can be included in the UDS bundle for initial air-gapped deployment
The model directory can store and serve one or more models for each of the following types:
Text-to-text (LLM)
Text-to-vector (embeddings)
Speech-to-text (Whisper)
System administrators can add new models to the model directory of an existing LeapfrogAI deployment in an air-gapped manner (no Internet connection required)
Users can select which of the available LLMs an Assistant uses, via both the GUI and the API (see the sketch after this list)
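Assuming the API keeps its OpenAI-compatible surface, the last goal could look like this from a client’s point of view. The base URL, API key, and model names below are placeholders, not settled interface decisions:

```python
from openai import OpenAI

# Placeholder endpoint and key; LeapfrogAI exposes an OpenAI-compatible API.
client = OpenAI(
    base_url="https://leapfrogai.example.mil/openai/v1",
    api_key="my-api-key",
)

# Discover which models the directory currently serves...
available = [m.id for m in client.models.list()]
print(available)  # e.g. ["llama-3-8b-instruct", "codellama-7b", ...]

# ...and pick the right one per task.
response = client.chat.completions.create(
    model="codellama-7b",  # hypothetical model name from the directory
    messages=[{"role": "user", "content": "Write a binary search in Python."}],
)
print(response.choices[0].message.content)
```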
User Stories
As a delivery engineer deploying LeapfrogAI
I want the models to be separate from the backend Zarf packages
So that I can easily choose which models to deploy without having to rebuild packages, and
So that the packages are smaller and therefore do not require manual steps to push them (one possible packaging sketch follows)
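One possible shape for this decoupling, sketched under the assumption that the model directory is backed by a long-running pod with a mounted volume: each model ships as its own small Zarf package and uses Zarf’s data injections to copy weights into that volume at deploy time. All names, selectors, and paths here are illustrative, not an agreed design:

```yaml
kind: ZarfPackageConfig
metadata:
  name: model-llama-3-8b-instruct # hypothetical per-model package
  version: 0.1.0
components:
  - name: model-weights
    required: true
    dataInjections:
      # Copy the weights into the pod backing the model directory volume.
      - source: models/llama-3-8b-instruct
        target:
          namespace: leapfrogai          # illustrative namespace
          selector: app=model-directory  # illustrative label selector
          container: data-loader
          path: /models/llama-3-8b-instruct
```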
As a LeapfrogAI end user
I want to be able to select from multiple LLMs
So that I can choose the best model to use for a specific task
Acceptance Criteria - TODO
Given [a state]
When [an action is taken]
Then [something happens]
Additional context
In-work technical design doc in LeapfrogAI Coda: https://coda.io/d/_dGmk3eNjmm8/Model-Directory_suuoJYJF
Tasks