feat(api, backend): load and expose backend model info at runtime #890
base: main
Conversation
✅ Deploy Preview for leapfrogai-docs ready!
Force-pushed from 42717b4 to e3f755a
Looks great so far! I'll do a deeper dive once it's fully ready. Just some comments to propagate downwards and discuss. Thanks!
src/leapfrogai_api/backend/types.py
Outdated
```
    default=None,
    description="Embedding dimensions (for embeddings models)",
)
precision: str | None = Field(
```
We might want to constrain this further to only accept: float16, float32, bfloat16, int8, int4.
👍 Do you have a preference on using enums or anything? I am totally happy to make a new model referencing that (and refactor accordingly).
Also brings up a missing field: `format`. This includes None, GPTQ, GGUF, SqueezeLLM, and AWQ, depending on whether the model was quantized and whether it is compatible with vLLM and/or llama.cpp.
Enums work for me! I didn't want to specify the implementation, just the content.
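For concreteness, a minimal sketch of what the constrained enums discussed here might look like, matching the `Precision` and `Format` type names that appear in the implementation sketch later in the thread; the exact members are inferred from this discussion, not lifted from the diff:

```python
from enum import Enum


class Precision(str, Enum):
    """Assumed enum constraining precision to the values proposed above."""

    FLOAT16 = "float16"
    FLOAT32 = "float32"
    BFLOAT16 = "bfloat16"
    INT8 = "int8"
    INT4 = "int4"


class Format(str, Enum):
    """Assumed enum for quantization format; NONE covers unquantized models."""

    NONE = "None"
    GPTQ = "GPTQ"
    GGUF = "GGUF"
    SQUEEZELLM = "SqueezeLLM"
    AWQ = "AWQ"
```

Subclassing `str` keeps the values JSON-serializable and plays nicely with Pydantic field validation.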
src/leapfrogai_api/backend/types.py
Outdated
```
@@ -54,6 +53,25 @@ class Usage(BaseModel):
##########


class ModelMetadataResponse(BaseModel):
    type: Literal["embeddings", "llm"] | None = Field(
```
Type and capabilities have crossover based on the descriptions/examples here. We should instead be identifying modalities (image, text, speech) and capabilities (embeddings, chat, TTS, STT).
Hmm, should we maybe differentiate on this too? Would it make sense to alias `type` to `modalities` (to encapsulate what you said) and then add an enum for `capabilities`? I don't know how `type` is used outside of this context and want to make sure it conforms to the rest of the repo.
I guess I was thinking there are vision embeddings-only models versus vision comprehension-only (chat) models, for example. So the combination of modality and capability works; however, if you're saying `type` is basically modality, then I am okay with that. Modality is more of an industry-standard term (e.g., how HF organizes models using semantic search: https://huggingface.co/models).
@justinthelaw I think I am a bit unsure, to be honest! Referencing this config map, I "lifted" `type` out of what was already defined in the config, based off of this convo. Tentatively looking at an implementation like this, though (and will proceed with this for now unless you have any opinions otherwise). I am keeping defaults to None to ensure backward compatibility with existing implementations.
```python
class ModelMetadataResponse(BaseModel):
    """Metadata for the model, including type, dimensions (for embeddings), and precision."""

    model_config = ConfigDict(use_enum_values=True)

    capabilities: list[Capability] | None = Field(
        default=None,
        description="Model capabilities (e.g., 'embeddings', 'chat', 'tts', 'stt')",
    )
    dimensions: int | None = Field(
        default=None,
        description="Embedding dimensions (for embeddings models)",
    )
    format: Format | None = Field(
        default=None,
        description="Model format (e.g., None, 'GPTQ', 'GGUF', 'SqueezeLLM', 'AWQ')",
    )
    modalities: list[Modality] | None = Field(
        default=None,
        description="The modalities of the model (e.g., 'image', 'text', 'speech')",
    )
    precision: Precision | None = Field(
        default=None,
        description="Model precision (e.g., 'float16', 'float32', 'bfloat16', 'int8', 'int4')",
    )
    type: Literal["embeddings", "llm"] | None = Field(
        default=None,
        description="The type of the model, e.g. 'embeddings' or 'llm'",
    )
```
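The snippet above references `Capability` and `Modality` enums that aren't shown; a minimal sketch of what they might contain, with members inferred from the field descriptions (an assumption, not code from the PR):

```python
from enum import Enum


class Capability(str, Enum):
    """Assumed members, mirroring the capabilities named in the field description."""

    EMBEDDINGS = "embeddings"
    CHAT = "chat"
    TTS = "tts"
    STT = "stt"


class Modality(str, Enum):
    """Assumed members, mirroring the modalities named in the field description."""

    IMAGE = "image"
    TEXT = "text"
    SPEECH = "speech"
```

With `use_enum_values=True` in the `model_config`, Pydantic stores and serializes these as their plain string values in API responses.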
I do think it might be a good idea to update the existing configmaps in the repo to include these fields. I'm happy to make another issue for that if it's useful; just let me know.
Outside of the PR feedback, is there anything else missing? This worked when interacting with the API, but I wasn't sure if I needed to update anything else to make it "complete". Let me know!
This PR is perfectly constrained to the issue's main concerns, as long as the unit tests are there/updated properly for the new config and fields. Areas like the UI consuming config and backends producing config are meant for future tickets.
Force-pushed from a7c48c3 to af85681
… singleton, stop event stuff
Force-pushed from cce96e0 to f276aba
```python
# shared config object from the app
model_config: "Config" = session.app.state.config
for model in model_config.models:
    m = ModelResponseModel(id=model)
```
Seems like there might be an issue here: the `supabase_session` object doesn't have a property named `app`, so it's failing for me when I hit this endpoint: https://leapfrogai-api.uds.dev/openai/v1/models
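Not from the PR itself, but as a sketch of one possible fix: FastAPI route handlers can take the standard `Request` object, which always exposes `.app`, so the shared config can be read off the application state instead of off the Supabase session (the handler shape below is a hypothetical rework):

```python
from fastapi import Request


async def list_models(request: Request):
    # The Request object carries the FastAPI app, so the shared Config set
    # during startup is reachable here without going through the session.
    model_config: "Config" = request.app.state.config
    return [ModelResponseModel(id=model) for model in model_config.models]
```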
As we icebox the repo, we are leaving this draft PR as a good place to pick up if we resume work. The goal behind the work was to expose model information such as type (embeddings, chat, etc.) so that the API can infer the model backend without requiring additional state.
Summary
Fixes: #590
Fixes: #952
This PR introduces the ability to load and expose model metadata from configuration files at runtime within the API. This required somewhat significant refactoring of the `Config` class to allow easier testing. I took care to retain function signatures with defaults where needed, and do not believe this contains any breaking changes.

- Added a `ModelMetadataResponse` class in `types.py` to encapsulate model metadata such as type, dimensions, precision, and capabilities.
- Added a `ModelMetadata` dataclass to represent metadata.
- Updated the `Config` class to include methods for loading and processing configuration files, including validation of metadata fields.
- Updated the `models` endpoint in `models.py` to include metadata in the response.
- Tested `Config` functionalities and validated correct handling of configuration files.
- Refactored the `Config` class to improve testability while retaining the original behavior.
- The `lifespan` method in `main.py` was causing a race condition in the API tests; I altered this slightly to explicitly clean up on shutdown (a sketch of that shape follows below).
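A minimal sketch of the shape that explicit-cleanup `lifespan` fix might take; the `Config.create()` and `Config.cleanup()` helpers here are assumptions for illustration, not the PR's actual API:

```python
from contextlib import asynccontextmanager

from fastapi import FastAPI


@asynccontextmanager
async def lifespan(app: FastAPI):
    # Load the shared config once at startup and stash it on app state.
    config = await Config.create()  # hypothetical constructor
    app.state.config = config
    yield
    # Tear down explicitly on shutdown (e.g., stop the config watcher task)
    # rather than relying on garbage collection, which raced in the tests.
    await config.cleanup()  # hypothetical cleanup hook


app = FastAPI(lifespan=lifespan)
```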
Additional context from internal Slack conversation: covered `capabilities` within metadata and testing the `Config` class, leveraging `pytest` for unit testing.

Testing coverage from this PR:
TODO: the new tests rely on `pytest`/`pytest-asyncio`. Is it appropriate to add dev dependencies into `/src/leapfrogai_api/pyproject.toml`?
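If dev dependencies do belong there, a minimal sketch using standard PEP 621 optional dependency groups (the group name `dev` is an assumption):

```toml
# Hypothetical addition to /src/leapfrogai_api/pyproject.toml;
# installable with `pip install -e ".[dev]"`.
[project.optional-dependencies]
dev = [
    "pytest",
    "pytest-asyncio",
]
```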