
EPIC: Implement Model Directory #623

Open · 4 tasks · barronstone opened this issue Jun 13, 2024 · 0 comments

Labels: enhancement (New feature or request), EPIC ⚔️ (EPIC issue to consolidate several sub-issues)

Overview

LeapfrogAI should implement a model directory to store and serve models. This would significantly reduce the size of LeapfrogAI Zarf packages, provide users with more model options, and enable the backends to dynamically swap out models for different tasks.

Background

Currently, model parameters are directly “baked in” to backend Zarf packages. This makes LeapfrogAI packages very large, often exceeding 8GB, which causes automated deployment issues that require manual intervention to work around. Additionally, each backend, such as vllm, is packaged with only a single model. If a user wants access to multiple LLMs for different tasks (e.g., one model for chat and another for coding assistance), they need two vllm packages, one for each model. This approach makes switching between models cumbersome and impractical.

Externally, we are hearing demand from organizations like Platform One for users to be able to select from a collection of models. This is a common and expected feature in most modern AI chat interfaces. Internally, the LeapfrogAI team needs a way to quickly and efficiently swap out models in order to run evaluations across a variety of them. Incorporating a model directory into LeapfrogAI could solve both challenges by decoupling model parameters from Zarf packages and enabling users to dynamically select which models to use from those available in the directory.
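
As a rough illustration of the decoupling, a backend could discover models from a shared directory at runtime instead of shipping weights inside its Zarf package. This is only a sketch under assumed conventions: the `/models` mount point, the `manifest.json` layout, and the `ModelEntry` type are hypothetical placeholders, not an existing LeapfrogAI API.

```python
# Hypothetical sketch: backends discover models from a shared directory
# instead of shipping weights inside their Zarf packages.
import json
from dataclasses import dataclass
from pathlib import Path

MODEL_DIR = Path("/models")  # assumption: a volume mounted into each backend pod

@dataclass
class ModelEntry:
    name: str
    type: str   # e.g. "llm", "embeddings", "whisper"
    path: Path

def list_models() -> list[ModelEntry]:
    """Scan the directory for model manifests (assumed layout)."""
    entries = []
    for manifest in MODEL_DIR.glob("*/manifest.json"):
        meta = json.loads(manifest.read_text())
        entries.append(ModelEntry(meta["name"], meta["type"], manifest.parent))
    return entries

def resolve(name: str) -> ModelEntry:
    """Look up a model by name so a backend can load it on demand."""
    for entry in list_models():
        if entry.name == name:
            return entry
    raise KeyError(f"model {name!r} not found in directory")
```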

Goals

  • Models are no longer directly “baked in” to the backend Zarf packages
  • Multiple models can be included in the UDS bundle for initial air-gapped deployment
  • The model directory can store and serve one or more models for each of the following types:
    • Text-to-text (LLM)
    • Text-to-vector (Embeddings)
    • Speech-to-text (Whisper)
  • System administrators can add new models to the model directory of an existing LeapfrogAI deployment in an air-gapped manner, with no Internet connection required (a sketch of this flow follows the list)
  • Users can select from the available LLMs for an Assistant to use via the GUI and the API
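
For the air-gapped goal above, the administrator flow could be as simple as unpacking an archive carried across the air gap and registering a manifest. Again, a hypothetical sketch using the same assumed directory layout as the earlier example:

```python
# Hypothetical sketch: register a model carried across the air gap as a
# local archive, with no Internet access required. Paths and layout are assumed.
import json
import tarfile
from pathlib import Path

MODEL_DIR = Path("/models")  # same assumed mount as the earlier sketch

def add_model(archive: Path, name: str, model_type: str) -> Path:
    """Unpack a model archive into the directory and write its manifest."""
    dest = MODEL_DIR / name
    dest.mkdir(parents=True, exist_ok=False)
    with tarfile.open(archive) as tar:
        tar.extractall(dest)  # weights land next to the manifest
    (dest / "manifest.json").write_text(
        json.dumps({"name": name, "type": model_type}, indent=2)
    )
    return dest

# e.g., from removable media:
# add_model(Path("/mnt/usb/coder-7b.tar.gz"), "coder-7b", "llm")
```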

User Stories

As a delivery engineer deploying LeapfrogAI
I want the models to be separate from the backend Zarf packages
So that I can easily choose which models to deploy without having to rebuild packages, and
So that the packages are smaller and therefore do not require manual steps to push them

As a LeapfrogAI end user
I want to be able to select from multiple LLMs
So that I can choose the best model to use for a specific task
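
Assuming the API keeps an OpenAI-compatible surface, the second story could reduce to changing the `model` parameter per request. The endpoint URL, API key, and model names below are illustrative placeholders only:

```python
# Illustrative only: choosing between two directory-served LLMs through an
# OpenAI-compatible API. URL, key, and model names are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://leapfrogai.example.com/openai/v1",
    api_key="my-api-key",
)

# Chat task: pick a general-purpose model from the directory.
chat = client.chat.completions.create(
    model="chat-7b",
    messages=[{"role": "user", "content": "Summarize this meeting..."}],
)

# Coding task: swap in a code-tuned model without redeploying anything.
code = client.chat.completions.create(
    model="coder-7b",
    messages=[{"role": "user", "content": "Write a unit test for..."}],
)
```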

Acceptance Criteria - TODO

Given [a state]
When [an action is taken]
Then [something happens]

Additional context

In-work technical design doc in LeapfrogAI Coda: https://coda.io/d/_dGmk3eNjmm8/Model-Directory_suuoJYJF

Tasks

  1. ADR 🧐 (assignee: YrrepNoj)
  2. python, tech-debt, wontfix (assignee: jamestexas)
  3. blocked 🛑, enhancement, ui (assignee: andrewrisse)