Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Noeloc/ch01 - KServe documentation #27

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

noelo
Copy link

@noelo noelo commented Apr 8, 2024

please review if you have time

@rsriniva rsriniva linked an issue Apr 8, 2024 that may be closed by this pull request
@rsriniva
Copy link
Contributor

rsriniva commented Apr 8, 2024

@noelo - is this PR addressing #18 that you had raised after EMEA workshop?

@noelo
Copy link
Author

noelo commented Apr 8, 2024

yep...it's part of it....I'm now working on TGIS and hopefully custom runtimes as I need it for next week

@noelo noelo marked this pull request as draft April 10, 2024 11:12
@noelo noelo marked this pull request as ready for review April 10, 2024 15:56
@rsriniva
Copy link
Contributor

@noelo - is this ready for review and merge, or are you pushing more commits? Also, what version of RHOAI are you testing this on?

@noelo
Copy link
Author

noelo commented Apr 11, 2024 via email

@rsriniva
Copy link
Contributor

I am not qualified to review this. @jramcast @diego-torres @erwangranger or @mamurak ?

Copy link
Collaborator

@jramcast jramcast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My 2 cents: I left some comments mainly focused on the sections structure and how to find a clearer flow between the old and the newly added sections.

The new content is fantastic and sheds light on concepts that were very fuzzy to me, so I am approving upfront. Feel free to defer the structure improvements to future PRs. Good job @noelo !

As requests for a particular model are received the ModelMesh routing layer assigns the model to an existing serving pod.
In the serving pod a modelmesh-runtime-adapter retrieves the specified model and hands it over to the model adapter to serve.
Once the model is available the modelmesh-runtime-adapter instructs the routing later to forward any requests related to the model to the serving endpoint.
The modelmesh routing layer handles the dynamic nature of inferencing requests in that it can spin up additional model instances when load increase as well
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The modelmesh routing layer handles the dynamic nature of inferencing requests in that it can spin up additional model instances when load increase as well
The modelmesh routing layer handles the dynamic nature of inferencing requests in that it can spin up additional model instances when load increases as well

RHOAI has two main components for serving

* KServe Serving for single-model serving
* KServe ModelMesh for multi-model serving
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the introduction of single and multi-model serving options, we might need to reorganize sections a bit for clarity. The first 3 sections were written when only ModelMesh was an option.

Section 1 focuses on very high-level concepts and at the bottom introduces CRs for only Model Mesh. We might want to expand that part to explain that there are two flavors of model serving (single and multi).

Section 2 is ok as is but we might want to rename it to something like `Multi-model serving with OpenVINO.

Section 3 is about custom model servers. I would move that section to be the last one.

Comment on lines +40 to +41
* Caikit TGIS
* TGIS Standalone
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the first time we mention Caikit and TGIS if I am not wrong. It would be useful to add a one or two-sentence definition for each of them. Just a brief intro. You can tell students that they will find more details in section 5.

We're only going to focus on the TGIS-KServe usecase here.

== TGIS Overview
TGIS is a model serving runtime written in Rust which serves _PyTorch_ models via a _gRPC_ interface. It only supports the _SafeTensors_ model format.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For more info about safe tensors, we might want to add a reference to the huggingface docs https://huggingface.co/docs/safetensors/index

== TGIS Overview
TGIS is a model serving runtime written in Rust which serves _PyTorch_ models via a _gRPC_ interface. It only supports the _SafeTensors_ model format.
It supports batching of requests as well as streaming responses for individual requests.
The gRPC interface definitions are available https://github.com/opendatahub-io/text-generation-inference/tree/main/proto[here]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The gRPC interface definitions are available https://github.com/opendatahub-io/text-generation-inference/tree/main/proto[here]
The gRPC interface definitions are available https://github.com/opendatahub-io/text-generation-inference/tree/main/proto[here].


[NOTE]
****
TGIS currently doesn't have an embeddings API so embeddings have to be generated externally.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many students won't know what embeddings are. You might want to briefly mention what they are in the context of NLP.


=== Using the model

The model is served using the gRPC protocol and to test we need to fullfill a number of prerequisites
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a future improvement, we could provide students with a workbench image that includes gRPCurl and all the other prerequisites. This would enable them to make grpcurl requests directly from their workbench.

```

[NOTE]
For an python based example look https://github.com/cfchase/basic-tgis[here]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
For an python based example look https://github.com/cfchase/basic-tgis[here]
For a Python based example look https://github.com/cfchase/basic-tgis[here].

@@ -0,0 +1,157 @@
= KServe Custom Serving Example
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could merge this section with section 3 to have a single custom serving section.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Need to cover models types and serving engines
3 participants