
feat(api, backend): load and expose backend model info at runtime #890

Draft: jamestexas wants to merge 8 commits into main from feat/api-backend-load-model-info-590

Conversation


@jamestexas jamestexas commented Aug 7, 2024

Summary

Fixes: #590
Fixes: #952

This PR introduces the ability to load and expose model metadata from configuration files at runtime within the API. This required somewhat significant refactoring of the Config class to allow easier testing.

  • The key changes before that refactoring can be found in this commit: ad3ddea
  • Refactoring and testing changes can be found here: 9727d5b

I took care to retain function signatures (with defaults where needed), so I do not believe this PR contains any breaking changes.

  1. Model Metadata Response: Added a new ModelMetadataResponse class in types.py to encapsulate model metadata such as type, dimensions, precision, and capabilities.
  2. Config Enhancements:
    • Introduced a ModelMetadata dataclass to represent metadata.
    • Updated Config class to include methods for loading and processing configuration files, including validation of metadata fields.
    • Ensured mutable default arguments were replaced.
  3. API Route Update: Modified the models endpoint in models.py to include metadata in the response.
  4. Testing Improvements: Added tests for the new Config functionalities and validated correct handling of configuration files.
  5. Significant Refactoring for Testability: Refactored the Config class to improve testability while retaining the original behavior.
  6. Tweak to main.py: the lifespan method was causing a race condition in the API tests. I altered it slightly to explicitly clean up on shutdown.

Additional Context from internal Slack conversation:

  • Confirmed metadata should be an object and discussed potential placement of capabilities within metadata.
  • Highlighted the need for and added tests for the Config class, leveraging pytest for unit testing.
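
A pytest-style unit test for the config loading might look like this sketch. The Config stub is inlined here to keep the example self-contained; the real tests target the class in utils/config.py:

```python
# Hypothetical pytest unit tests; the Config class below is a minimal
# stand-in for the real one in utils/config.py.
import pytest


class Config:
    def __init__(self) -> None:
        self.models: dict[str, dict] = {}

    def parse_models(self, raw: dict) -> None:
        for entry in raw.get("models", []):
            meta = entry.get("metadata") or {}
            dims = meta.get("dimensions")
            if dims is not None and dims <= 0:
                raise ValueError(f"invalid dimensions: {dims}")
            self.models[entry["name"]] = meta


def test_parse_models_loads_metadata():
    cfg = Config()
    cfg.parse_models({"models": [{"name": "embed", "metadata": {"dimensions": 384}}]})
    assert cfg.models["embed"]["dimensions"] == 384


def test_parse_models_rejects_bad_dimensions():
    cfg = Config()
    with pytest.raises(ValueError):
        cfg.parse_models({"models": [{"name": "embed", "metadata": {"dimensions": -1}}]})
```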

Testing coverage from this PR:

---------- coverage: platform darwin, python 3.12.4-final-0 ----------
Name                                 Stmts   Miss Branch BrPart  Cover   Missing
--------------------------------------------------------------------------------
src/leapfrogai_api/utils/config.py     130     13     35      0    90%   113-120, 140-141, 160-161, 228-229
--------------------------------------------------------------------------------
TOTAL                                  130     13     35      0    90%

TODO:

  • Update test dependencies to include pytest / pytest-asyncio. Is it appropriate to add dev dependencies into /src/leapfrogai_api/pyproject.toml?
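
One common pattern, if dev dependencies do go into that file, is an optional-dependencies extra. This is a hypothetical snippet, not the project's actual layout:

```toml
# Hypothetical addition to src/leapfrogai_api/pyproject.toml: keep test-only
# dependencies out of the runtime install via an optional "dev" extra.
[project.optional-dependencies]
dev = [
    "pytest",
    "pytest-asyncio",
]
```

These would then be installed with `pip install -e ".[dev]"`, leaving the default install untouched.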

@jamestexas jamestexas requested a review from a team as a code owner August 7, 2024 22:42

netlify bot commented Aug 7, 2024

Deploy Preview for leapfrogai-docs ready!

Name Link
🔨 Latest commit 101b897
🔍 Latest deploy log https://app.netlify.com/sites/leapfrogai-docs/deploys/66db754fca2baf0008610c6f
😎 Deploy Preview https://deploy-preview-890--leapfrogai-docs.netlify.app
Lighthouse (1 path audited):
Performance: 42 (🟢 up 19 from production)
Accessibility: 98 (no change from production)
Best Practices: 100 (no change from production)
SEO: 92 (no change from production)
PWA: -

@jamestexas jamestexas force-pushed the feat/api-backend-load-model-info-590 branch 5 times, most recently from 42717b4 to e3f755a Compare August 9, 2024 03:43
@jamestexas jamestexas added tech-debt Not a feature, but still necessary python Pull requests that update Python code labels Aug 9, 2024
@justinthelaw (Contributor) left a comment:

Looks great so far! I'll do a deeper dive once it's fully ready. Just some comments to propagate downwards and discuss. Thanks!

default=None,
description="Embedding dimensions (for embeddings models)",
)
precision: str | None = Field(
@justinthelaw (Contributor):

We might want to constrain this further to only accept: float16, float32, bfloat16, int8, int4.

@jamestexas (Author):

👍 do you have a preference on using enums or anything? I am totally happy to make a new model referencing that (and refactoring accordingly)

@justinthelaw (Contributor):

This also brings up a missing field: format. This includes None, GPTQ, GGUF, SqueezeLLM, and AWQ. It depends on whether the model was quantized, and whether it is compatible with vLLM and/or llama.cpp.

@justinthelaw (Contributor):

Enums work for me! I didn't want to specify the implementation, just the content.
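
A minimal sketch of those enums, with value sets taken from the comments above. How None is modeled for format is an open choice; here it is a string member:

```python
# Sketch of the Precision and Format enums discussed above; value sets come
# from the review comments. Subclassing str keeps values JSON-serializable
# and plays well with Pydantic's use_enum_values.
from enum import Enum


class Precision(str, Enum):
    FLOAT16 = "float16"
    FLOAT32 = "float32"
    BFLOAT16 = "bfloat16"
    INT8 = "int8"
    INT4 = "int4"


class Format(str, Enum):
    # "None" as a member is one option; the field could instead stay Optional
    # and omit this member entirely.
    NONE = "None"
    GPTQ = "GPTQ"
    GGUF = "GGUF"
    SQUEEZELLM = "SqueezeLLM"
    AWQ = "AWQ"
```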

@@ -54,6 +53,25 @@ class Usage(BaseModel):
##########


class ModelMetadataResponse(BaseModel):
type: Literal["embeddings", "llm"] | None = Field(
@justinthelaw (Contributor):

Type and capabilities have cross-over based on the descriptions/examples here. We should be instead identifying modalities (image, text, speech) and capabilities (embeddings, chat, TTS, STT).

@jamestexas (Author):

Hmm, should we maybe differentiate on this too? Would it make sense to alias type to modalities (to encapsulate what you said) and then add an enum for capabilities?
I don't know how type is used outside this context and want to make sure it conforms to the rest of the repo.

@justinthelaw (Contributor):

I guess I was thinking there are vision embeddings-only models versus vision comprehension-only (chat), for example. So the combination of modality and capability works; however, if you're saying type is basically modality, then I am okay with that. Modality is more of an industry-standard term (e.g., how HF organizes models using semantic search: https://huggingface.co/models)

@jamestexas (Author), Aug 12, 2024:

@justinthelaw I think I am a bit unsure, to be honest! Referencing this config map, I "lifted" type out of what was already defined in the config, based on this conversation. I am tentatively looking at an implementation like the one below, and will proceed with it for now unless you have opinions otherwise. I am keeping defaults as None to ensure backward compatibility with existing implementations.

class ModelMetadataResponse(BaseModel):
    """Metadata for the model, including type, dimensions (for embeddings), and precision."""

    model_config = ConfigDict(use_enum_values=True)

    capabilities: list[Capability] | None = Field(
        default=None,
        description="Model capabilities (e.g., 'embeddings', 'chat', 'tts', 'stt')",
    )
    dimensions: int | None = Field(
        default=None,
        description="Embedding dimensions (for embeddings models)",
    )
    format: Format | None = Field(
        default=None,
        description="Model format (e.g., None, 'GPTQ', 'GGUF', 'SqueezeLLM', 'AWQ')",
    )
    modalities: list[Modality] | None = Field(
        default=None,
        description="The modalities of the model (e.g., 'image', 'text', 'speech')",
    )
    precision: Precision | None = Field(
        default=None,
        description="Model precision (e.g., 'float16', 'float32', 'bfloat16', 'int8', 'int4')",
    )
    type: Literal["embeddings", "llm"] | None = Field(
        default=None,
        description="The type of the model (e.g., 'embeddings' or 'llm')",
    )
I do think it might be a good idea to update the existing configmaps in the repo to include these fields. I'm happy to make another issue for that if it's useful; just let me know.
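
As a rough illustration of how the models endpoint could fold this metadata into its response, here is a sketch with plain dicts standing in for the Pydantic response models; the function and field names are invented, not the repository's actual route code:

```python
# Plain-dict sketch of attaching config metadata to the /models response.
# Dicts stand in for the Pydantic models; names here are hypothetical.
def build_models_response(config_models: dict[str, dict]) -> dict:
    data = []
    for name, metadata in config_models.items():
        entry = {"id": name, "object": "model"}
        if metadata:  # only include metadata when the config provides it
            entry["metadata"] = metadata
        data.append(entry)
    return {"object": "list", "data": data}
```

Omitting the metadata key when the config has none preserves the existing response shape for old configs.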

src/leapfrogai_api/backend/types.py
@jamestexas (Author):

> Looks great so far! I'll do a deeper dive once it's fully ready. Just some comments to propagate downwards and discuss. Thanks!

Outside of the PR feedback is there anything else missing? This worked in terms of interacting with the API and wasn't sure if I needed to update things anywhere else to make it "complete". Let me know!


justinthelaw commented Aug 12, 2024

> Looks great so far! I'll do a deeper dive once it's fully ready. Just some comments to propagate downwards and discuss. Thanks!
>
> Outside of the PR feedback is there anything else missing? This worked in terms of interacting with the API and wasn't sure if I needed to update things anywhere else to make it "complete". Let me know!

This PR is perfectly constrained to the issue's main concerns, as long as the unit tests are there/updated properly for the new config and fields. Areas like UI consuming config and Backends producing config are meant for future tickets.

@jamestexas jamestexas force-pushed the feat/api-backend-load-model-info-590 branch from a7c48c3 to af85681 Compare August 20, 2024 19:42
@jamestexas jamestexas changed the title feat(api, backend): Load and Expose Backend Model Info at Runtime feat(api, backend): load and expose backend model Info at runtime Aug 20, 2024
@jamestexas jamestexas force-pushed the feat/api-backend-load-model-info-590 branch from cce96e0 to f276aba Compare August 26, 2024 18:39
# shared config object from the app
model_config: "Config" = session.app.state.config
for model in model_config.models:
    m = ModelResponseModel(id=model)
A contributor commented:

Seems like there might be an issue here: the supabase_session object doesn't have a property named app, so it's failing for me when I hit this endpoint: https://leapfrogai-api.uds.dev/openai/v1/models

[Screenshot attached: 2024-08-28 at 3:20 PM]

@justinthelaw justinthelaw changed the title feat(api, backend): load and expose backend model Info at runtime feat(api, backend): load and expose backend model info at runtime Sep 18, 2024
@justinthelaw justinthelaw marked this pull request as draft September 25, 2024 02:27
@gphorvath
Copy link

As we icebox the repo, we are leaving this draft PR as a good place to pick up if we resume work. The goal behind the work was to expose model information, such as type (embeddings, chat, etc.), so that the API can infer the model backend without requiring additional state.
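
That inference step could be as simple as a lookup keyed on the declared type. A hedged sketch, with invented backend names:

```python
# Hypothetical sketch of inferring the backend from declared model type;
# backend names and the routing table are invented for illustration.
def resolve_backend(model_metadata: dict) -> str:
    routes = {"embeddings": "embeddings-backend", "llm": "chat-backend"}
    model_type = model_metadata.get("type")
    if model_type not in routes:
        raise ValueError(f"unknown or missing model type: {model_type!r}")
    return routes[model_type]
```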

@nywilken nywilken added the wontfix This will not be worked on label Oct 8, 2024