Noeloc/ch01 - KServe documentation #27
base: main
yep...it's part of it....I'm now working on TGIS and hopefully custom runtimes as I need it for next week |
@noelo - is this ready for review and merge, or are you pushing more commits? Also, what version of RHOAI are you testing this on? |
2.8. it's completed and ready for review.
I am not qualified to review this. @jramcast @diego-torres @erwangranger or @mamurak ? |
My 2 cents: I left some comments mainly focused on the sections structure and how to find a clearer flow between the old and the newly added sections.
The new content is fantastic and sheds light on concepts that were very fuzzy to me, so I am approving upfront. Feel free to defer the structure improvements to future PRs. Good job @noelo !
As requests for a particular model are received the ModelMesh routing layer assigns the model to an existing serving pod. | ||
In the serving pod a modelmesh-runtime-adapter retrieves the specified model and hands it over to the model adapter to serve. | ||
Once the model is available the modelmesh-runtime-adapter instructs the routing layer to forward any requests related to the model to the serving endpoint. | ||
The modelmesh routing layer handles the dynamic nature of inferencing requests in that it can spin up additional model instances when load increase as well |
The modelmesh routing layer handles the dynamic nature of inferencing requests in that it can spin up additional model instances when load increase as well | |
The modelmesh routing layer handles the dynamic nature of inferencing requests in that it can spin up additional model instances when load increases as well |
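To make the routing behaviour described in these lines easier to picture, here is a toy sketch in Python. It is not ModelMesh's implementation or API: `ToyRouter`, `route`, `done`, and `scale_threshold` are hypothetical names invented for illustration, and the scale-up rule (add a copy when in-flight requests exceed a per-copy threshold) is an assumed simplification.

```python
from collections import defaultdict


class ToyRouter:
    """Toy stand-in for a model routing layer (hypothetical, not the ModelMesh API)."""

    def __init__(self, pods, scale_threshold=3):
        if not pods:
            raise ValueError("at least one serving pod is required")
        self.pods = list(pods)                # names of the existing serving pods
        self.placements = defaultdict(list)   # model name -> pods holding a loaded copy
        self.in_flight = defaultdict(int)     # model name -> requests currently in flight
        self.scale_threshold = scale_threshold  # assumed per-copy load limit

    def _least_loaded_pod(self, exclude=()):
        """Pick the pod holding the fewest models, skipping pods in `exclude`."""
        candidates = [p for p in self.pods if p not in exclude]
        if not candidates:
            return None
        return min(
            candidates,
            key=lambda p: sum(p in held for held in self.placements.values()),
        )

    def route(self, model):
        """Return the pod that serves this request, loading the model if needed."""
        self.in_flight[model] += 1
        copies = self.placements[model]
        overloaded = self.in_flight[model] > self.scale_threshold * len(copies)
        if not copies or overloaded:
            pod = self._least_loaded_pod(exclude=copies)
            if pod is not None:
                # In the real system a runtime adapter would fetch and load the model here.
                copies.append(pod)
        # Spread requests across the loaded copies.
        return copies[self.in_flight[model] % len(copies)]

    def done(self, model):
        """Mark one request for `model` as finished."""
        self.in_flight[model] -= 1


router = ToyRouter(["pod-a", "pod-b"], scale_threshold=2)
first_pod = router.route("model-1")      # first request triggers a load on one pod
router.route("model-1")
router.route("model-1")                  # load now exceeds the threshold: second copy
copies_after_burst = len(router.placements["model-1"])
```

The sketch only captures the two ideas in the text: a model is placed on an existing pod when it is first requested, and additional copies appear as load increases.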
RHOAI has two main components for serving | ||
|
||
* KServe Serving for single-model serving | ||
* KServe ModelMesh for multi-model serving |
With the introduction of single and multi-model serving options, we might need to reorganize sections a bit for clarity. The first 3 sections were written when only ModelMesh was an option.
Section 1 focuses on very high-level concepts and at the bottom introduces CRs for only Model Mesh. We might want to expand that part to explain that there are two flavors of model serving (single and multi).
Section 2 is ok as is but we might want to rename it to something like `Multi-model serving with OpenVINO`.
Section 3 is about custom model servers. I would move that section to be the last one.
* Caikit TGIS | ||
* TGIS Standalone |
This is the first time we mention Caikit and TGIS if I am not wrong. It would be useful to add a one or two-sentence definition for each of them. Just a brief intro. You can tell students that they will find more details in section 5.
We're only going to focus on the TGIS-KServe use case here. | ||
|
||
== TGIS Overview | ||
TGIS is a model serving runtime written in Rust which serves _PyTorch_ models via a _gRPC_ interface. It only supports the _SafeTensors_ model format. |
For more info about safe tensors, we might want to add a reference to the huggingface docs https://huggingface.co/docs/safetensors/index
== TGIS Overview | ||
TGIS is a model serving runtime written in Rust which serves _PyTorch_ models via a _gRPC_ interface. It only supports the _SafeTensors_ model format. | ||
It supports batching of requests as well as streaming responses for individual requests. | ||
The gRPC interface definitions are available https://github.com/opendatahub-io/text-generation-inference/tree/main/proto[here] |
The gRPC interface definitions are available https://github.com/opendatahub-io/text-generation-inference/tree/main/proto[here] | |
The gRPC interface definitions are available https://github.com/opendatahub-io/text-generation-inference/tree/main/proto[here]. |
|
||
[NOTE] | ||
**** | ||
TGIS currently doesn't have an embeddings API so embeddings have to be generated externally. |
Many students won't know what embeddings are. You might want to briefly mention what they are in the context of NLP.
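As one possible way to introduce the idea: an embedding is a list of numbers that represents a piece of text, arranged so that texts with similar meaning end up with similar vectors. A minimal Python sketch, assuming hand-made three-dimensional vectors (real embeddings come from a model and have hundreds of dimensions; the values and the `toy_embeddings` name below are made up for illustration):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical, hand-written vectors; a real system would get these from an embedding model.
toy_embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.75, 0.2],
    "invoice": [0.1, 0.2, 0.95],
}

related = cosine_similarity(toy_embeddings["cat"], toy_embeddings["kitten"])
unrelated = cosine_similarity(toy_embeddings["cat"], toy_embeddings["invoice"])
print(f"cat vs kitten:  {related:.3f}")
print(f"cat vs invoice: {unrelated:.3f}")
```

With these made-up values, "cat" and "kitten" score higher than "cat" and "invoice", which is the property that search and retrieval use cases rely on.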
|
||
=== Using the model | ||
|
||
The model is served using the gRPC protocol and to test we need to fulfill a number of prerequisites |
As a future improvement, we could provide students with a workbench image that includes gRPCurl and all the other prerequisites. This would enable them to make grpcurl requests directly from their workbench.
``` | ||
|
||
[NOTE] | ||
For an python based example look https://github.com/cfchase/basic-tgis[here] |
For an python based example look https://github.com/cfchase/basic-tgis[here] | |
For a Python based example look https://github.com/cfchase/basic-tgis[here]. |
@@ -0,0 +1,157 @@ | |||
= KServe Custom Serving Example |
I think we could merge this section with section 3 to have a single custom serving section.
please review if you have time