Provide copyable text for exercises + minor style fixes #13

Merged
merged 2 commits into from
Dec 12, 2023
56 changes: 31 additions & 25 deletions modules/chapter1/pages/section1.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -47,23 +47,23 @@ A model server runtime is the execution environment or platform where a trained

The model server runtime can be a part of a larger deployment framework or service that includes features such as scalability, versioning, monitoring, and security. Examples of model server runtimes include TensorFlow Serving, TorchServe, and ONNX Runtime. These runtimes support the deployment of models trained using popular machine learning frameworks and provide a standardized way to serve predictions over APIs (Application Programming Interfaces).

=== Inference Engine:
=== Inference Engine
An inference engine is a component responsible for executing the forward pass of a machine learning model to generate predictions based on input data. It is a crucial part of the model server runtime and is specifically designed for performing inference tasks efficiently. The inference engine takes care of optimizations, such as hardware acceleration and parallelization, to ensure that predictions are made quickly and with minimal resource utilization.

The inference engine may be integrated into the model server runtime or work alongside it, depending on the specific architecture. For example, TensorFlow Serving incorporates TensorFlow's inference engine, and ONNX Runtime serves as both a runtime and an inference engine for models in the Open Neural Network Exchange (ONNX) format.

**Relationship**:
In summary, the model server runtime provides the overall environment for hosting and managing machine learning models in production, while the inference engine is responsible for the actual computation of predictions during inference. The two work together to deliver a scalable, efficient, and reliable solution for serving machine learning models in real-world applications. The choice of model server runtime and inference engine depends on factors such as the machine learning framework used, deployment requirements, and the specific optimizations needed for the target hardware.
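The division of labor described above can be sketched in a few lines. This is a toy illustration only: the class names `InferenceEngine` and `ModelServerRuntime`, and the "doubler" model, are hypothetical and do not correspond to the API of TensorFlow Serving, TorchServe, or ONNX Runtime.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class InferenceEngine:
    """Hypothetical engine: executes the forward pass of one loaded model."""
    forward: Callable[[List[float]], List[float]]

    def infer(self, inputs: List[float]) -> List[float]:
        return self.forward(inputs)


@dataclass
class ModelServerRuntime:
    """Hypothetical runtime: hosts versioned models and routes requests to engines."""
    engines: Dict[Tuple[str, int], InferenceEngine] = field(default_factory=dict)

    def load(self, name: str, version: int, engine: InferenceEngine) -> None:
        # Register an engine under a (model name, version) key.
        self.engines[(name, version)] = engine

    def predict(self, name: str, version: int, inputs: List[float]) -> List[float]:
        key = (name, version)
        if key not in self.engines:
            raise KeyError(f"model {name!r} version {version} is not loaded")
        # The runtime only manages and routes; the engine does the computation.
        return self.engines[key].infer(inputs)


runtime = ModelServerRuntime()
# A toy "model" that doubles every input value.
runtime.load("doubler", 1, InferenceEngine(forward=lambda xs: [2 * x for x in xs]))
print(runtime.predict("doubler", 1, [1.0, 2.5]))  # [2.0, 5.0]
```

A real runtime adds the API layer, monitoring, and scaling around this core, while a real engine adds hardware-specific optimizations inside `infer`.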

=== Unravel The Runtime
== Unravel The Runtime

When deploying machine learning models, we need to deploy a container that serves a **Runtime** and uses a **Model** to perform predictions. Consider the following example:

==== Train a model
=== Train a Model

Using a RHOAI instance, let us train and deploy an example.

. In a data science project, create a `Standard Data Science`workbench.
. In a data science project, create a `Standard Data Science` workbench.
Then, open the workbench to go to the JupyterLab interface.
+
image::workbench_options.png[Workbench Options]
Expand Down Expand Up @@ -102,10 +102,10 @@ There are different formats and libraries to export the model, in this case we a

* Torch

The use of either of those formats depend on the target server runtime, some of them are proven to be more efficient than others for certain type of training algorithms and model sizes.
The use of either of those formats depends on the target server runtime. Some of them are proven to be more efficient than others for certain types of training algorithms and model sizes.
====

===== Use the model in another notebook
=== Use the Model in Another Notebook

The model can be deserialized in another notebook, and used to generate a prediction:
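As a rough, self-contained sketch of that round trip (not the exercise's notebook code), the example below serializes and restores a stand-in model with `pickle`. The parameter dictionary, the `predict` helper, and the `model.pkl` file name are assumptions made for illustration.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for a trained model: parameters of y = slope * x + intercept.
model = {"slope": 2.0, "intercept": 1.0}


def predict(params, x):
    """Apply the stand-in linear model to a single value."""
    return params["slope"] * x + params["intercept"]


with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.pkl"  # assumed file name

    # "Training" notebook: serialize the model to disk.
    with model_path.open("wb") as f:
        pickle.dump(model, f)

    # Other notebook: deserialize the model from the same file.
    with model_path.open("rb") as f:
        restored = pickle.load(f)

print(predict(restored, 4))  # 9.0
```

Only unpickle files from sources you trust, because loading a pickle can execute arbitrary code. A model object from a library also needs that library installed in the notebook that loads it.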

Expand All @@ -127,7 +127,7 @@ At this moment the model can be exported and imported in other projects for its

For this section, you need Podman to create an image, and a registry to upload the resulting image.

=== web application that uses the model
=== Web application that uses the model

The pickle model that we previously exported can be used in a Flask application. In this section we present an example Flask application that uses the model.

Expand Down Expand Up @@ -251,33 +251,39 @@ CMD ["app:app"]

. Build and push the image to an image registry
+
```shell
podman login quay.io
podman build -t purchase-predictor:1.0 .
podman tag purchase-predictor:1.0 quay.io/user_name/purchase-predictor:1.0
podman push quay.io/user_name/purchase-predictor:1.0
```

[source,console]
----
$ podman login quay.io
$ podman build -t purchase-predictor:1.0 .
$ podman tag purchase-predictor:1.0 quay.io/user_name/purchase-predictor:1.0
Comment on lines +258 to +259
Collaborator

Not a blocking comment, but why create two tags for the same container image?
Why not use the FQN tag in the build step?

Collaborator Author

Maybe for clarity? We would have to ask the original writer of the exercise.

$ podman push quay.io/user_name/purchase-predictor:1.0
----
+
After you push the image, open quay.io in your browser and make the image public.

. Deploy the model image to **OpenShift**
+
```shell
oc login api.cluster.example.com:6443
oc new-project model-deploy
oc new-app --name purchase-predictor quay.io/user_name/purchase-predictor:1.0
oc expose service purchase-predictor
```
[source,console]
----
$ oc login api.cluster.example.com:6443
$ oc new-project model-deploy
$ oc new-app --name purchase-predictor quay.io/user_name/purchase-predictor:1.0
$ oc expose service purchase-predictor
----

Now we can use the Flask application with some commands such as:
```shell
curl http://purchase-predictor-model-deploy.apps.cluster.example.com/health
ok%
curl http://purchase-predictor-model-deploy.apps.cluster.example.com/info
[source,console]
----
$ curl http://purchase-predictor-model-deploy.apps.cluster.example.com/health
ok
$ curl http://purchase-predictor-model-deploy.apps.cluster.example.com/info
{"name":"Time to purchase amount predictor","version":"v1.0.0"}
curl -d '{"time":4}' -H "Content-Type: application/json" -X POST http://purchase-predictor-model-deploy.apps.cluster.example.com/predict
$ curl -d '{"time":4}' -H "Content-Type: application/json" \
> -X POST \
> http://purchase-predictor-model-deploy.apps.cluster.example.com/predict
{"prediction":34,"status":200}
```
----
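The same `/predict` call can also be prepared from Python. The following is a minimal sketch using only the standard library; the helper name `build_predict_request` is hypothetical, and the host is the example route from the commands above, which you would replace with your own. The request is built but not sent, so the snippet runs without a live deployment.

```python
import json
import urllib.request

# Example route from the commands above; replace with your own route.
BASE_URL = "http://purchase-predictor-model-deploy.apps.cluster.example.com"


def build_predict_request(time_value, base_url=BASE_URL):
    """Build (but do not send) the POST request for the /predict endpoint."""
    body = json.dumps({"time": time_value}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_predict_request(4)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To send it against a live deployment:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(json.load(response))
```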

[IMPORTANT]
====
Expand Down
108 changes: 67 additions & 41 deletions modules/chapter1/pages/section2.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -8,40 +8,44 @@ https://min.io[MinIO] is a high-performance, S3 compatible object store. It is b

We need an S3 solution to share the model between the training and deployment stages. In this exercise, we prepare MinIO to be that S3 solution.

. In OpenShift, create a new namespace with the name **object-datastore**
. In OpenShift, create a new namespace with the name **object-datastore**.
+
```shell
oc new-project object-datastore
```
[source,console]
----
$ oc new-project object-datastore
----

. Run the following yaml to install MinIO
. Apply the following YAML manifest to install MinIO:
+
```shell
curl https://raw.githubusercontent.com/RedHatQuickCourses/rhods-qc-apps/main/4.rhods-deploy/chapter2/minio.yml
oc apply -f ./minio.yml -n object-datastore
```
[source,console]
----
$ curl -O https://raw.githubusercontent.com/RedHatQuickCourses/rhods-qc-apps/main/4.rhods-deploy/chapter2/minio.yml
$ oc apply -f ./minio.yml -n object-datastore
----

. Get the route to the MinIO dashboard
. Get the route to the MinIO dashboard.
+
```shell
oc get routes -n object-datastore | grep minio-ui | awk '{print $2}'
```
[source,console]
----
$ oc get routes -n object-datastore | grep minio-ui | awk '{print $2}'
----
+
[NOTE]
====
Use this route to navigate to the S3 dashboard using a browser. With the browser, you will be able to create buckets, upload files, and navigate the S3 contents.
====

. Get the route to the MinIO API
. Get the route to the MinIO API.
+
```shell
oc get routes -n object-datastore | grep minio-api | awk '{print $2}'
```
[source,console]
----
$ oc get routes -n object-datastore | grep minio-api | awk '{print $2}'
----
+
[NOTE]
====
Use this route as the S3 API endpoint. Basically, this is the URL that we will use when creating a data connection to the S3 in RHOAI.
====
====

== Training The Model
We will use the iris dataset model for this exercise.
Expand All @@ -55,9 +59,10 @@ It is recommended to use a workbench that was created with the **Standard Data S

. Make sure that the workbench environment provides the Python packages that the notebook requires. Open a terminal and run the following command, which installs any packages that are missing:
+
```shell
pip install -r /opt/app-root/src/rhods-qc-apps/4.rhods-deploy/chapter2/requirements.txt
```
[source,console]
----
$ pip install -r /opt/app-root/src/rhods-qc-apps/4.rhods-deploy/chapter2/requirements.txt
----

[TIP]
====
Expand Down Expand Up @@ -95,7 +100,7 @@ Make sure to create a new path in your bucket, and upload to such path, not to r

. In the RHOAI dashboard, create a project named **iris-project**.

. In the **Data Connections** section, create a Data Connection to your S3
. In the **Data Connections** section, create a Data Connection to your S3.
+
image::add-minio-iris-data-connection.png[Add iris data connection from minio]
+
Expand All @@ -104,6 +109,7 @@ image::add-minio-iris-data-connection.png[Add iris data connection from minio]
- The credentials (Access Key/Secret Key) are `minio`/`minio123`.
- Make sure to use the API route, not the UI route (`oc get routes -n object-datastore | grep minio-api | awk '{print $2}'`).
- The region is not important when using MinIO; this property has an effect when using AWS S3.
However, you must enter a non-empty value to prevent problems with model serving.
- Mind typos for the bucket name.
- You don't have to select a workbench to attach this data connection to.
====
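The notes above can be turned into a quick sanity check before you submit the form. This is a hedged sketch: the function name `check_data_connection`, the dictionary keys, and the bucket name are hypothetical, and the check assumes that the route host name begins with the route name (`minio-ui` or `minio-api`), which may not hold for customized routes.

```python
from urllib.parse import urlparse


def check_data_connection(settings):
    """Return a list of problems found in S3 data connection settings."""
    problems = []
    # Every field, including the region, must be non-empty.
    for key in ("access_key", "secret_key", "endpoint", "region", "bucket"):
        if not str(settings.get(key, "")).strip():
            problems.append(f"{key} must not be empty")

    # Accept endpoints with or without a scheme.
    endpoint = str(settings.get("endpoint", "")).strip()
    url = endpoint if "://" in endpoint else f"https://{endpoint}"
    host = urlparse(url).hostname or ""
    # Assumption: the host name starts with the route name.
    if host.startswith("minio-ui"):
        problems.append("endpoint points to the UI route; use the API route")
    return problems


settings = {
    "access_key": "minio",
    "secret_key": "minio123",
    "endpoint": "https://minio-ui-object-datastore.apps.cluster.example.com",  # hypothetical host
    "region": "",  # empty on purpose to trigger the check
    "bucket": "my-bucket",  # hypothetical bucket name
}
for problem in check_data_connection(settings):
    print(problem)
```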
Expand Down Expand Up @@ -160,7 +166,14 @@ s3.download_file(bucket_name, s3_data_path, "my/local/path/dataset.csv")
+
image::add-server-button.png[add server]

. Fill the form with the example values:
. Fill the form with the following values:
+
--
* Server name: `iris-model-server`.
* Serving runtime: `OpenVINO Model Server`.
* Select the checkboxes to expose the models through an external route, and to enable token authentication.
Enter `iris-serviceaccount` as the service account name.
--
+
image::add-server-form-example.png[Add Server Form]
+
Expand Down Expand Up @@ -198,7 +211,14 @@ image::model-server-with-token.png[Model Server with token]
+
image::deploy-model-button.png[Deploy Model button]

. Fill the **Deploy Model** from as in the example:
. Fill the **Deploy Model** form.
+
--
* Model name: `iris-model`
* Model framework: `onnx - 1`
* Model location data connection: `iris-data-connection`
* Model location path: `iris`
--
+
image::deploy-model-form.png[Deploy Model form]

Expand All @@ -208,11 +228,12 @@ image::deploy-model-success.png[Deploy model success]

. Observe and monitor the assets created in your OpenShift **iris-project** namespace.
+
```shell
oc get routes -n iris-project
oc get secrets -n iris-project | grep iris-model
oc get events -n iris-project
```
[source,console]
----
$ oc get routes -n iris-project
$ oc get secrets -n iris-project | grep iris-model
$ oc get events -n iris-project
----
+
image::iris-project-events.png[Iris project events]
+
Expand All @@ -225,23 +246,28 @@ Deploying a **Model Server** triggers a **ReplicaSet** with **ModelMesh**, which

Now that the model is ready to use, we can make an inference using the REST API.

. Assign the route to an environment variable in your local machine, so that we can use it in our curl commands
. Assign the route to an environment variable in your local machine, so that we can use it in our curl commands.
+
```shell
export IRIS_ROUTE=https://$(oc get routes -n iris-project | grep iris-model | awk '{print $2}')
```
[source,console]
----
$ export IRIS_ROUTE=https://$(oc get routes -n iris-project | grep iris-model | awk '{print $2}')
----

. Assign an authentication token to an environment variable in your local machine
. Assign an authentication token to an environment variable in your local machine.
+
```shell
export TOKEN=$(oc whoami -t)
```
[source,console]
----
$ export TOKEN=$(oc whoami -t)
----

. Request an inference with the REST API
. Request an inference with the REST API.
+
```shell
curl -H "Authorization: Bearer $TOKEN" $IRIS_ROUTE/v2/models/iris-model/infer -X POST --data '{"inputs" : [{"name" : "X","shape" : [ 1, 4 ],"datatype" : "FP32","data" : [ 3, 4, 3, 2 ]}],"outputs" : [{"name" : "output0"}]}'
```
[source,console]
----
$ curl -H "Authorization: Bearer $TOKEN" $IRIS_ROUTE/v2/models/iris-model/infer \
-X POST \
--data '{"inputs" : [{"name" : "X","shape" : [ 1, 4 ],"datatype" : "FP32","data" : [ 3, 4, 3, 2 ]}],"outputs" : [{"name" : "output0"}]}'
----
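The JSON body in the command above follows a fixed pattern: a tensor name, a shape, a datatype, and the values flattened in row-major order. As a small sketch of how such a body could be assembled and checked in Python, assuming a hypothetical helper named `build_infer_payload`:

```python
import json
from math import prod


def build_infer_payload(rows, input_name="X", datatype="FP32", output_name=None):
    """Build an inference request body from a list of equal-length rows."""
    if not rows or any(len(row) != len(rows[0]) for row in rows):
        raise ValueError("rows must be non-empty and all the same length")
    shape = [len(rows), len(rows[0])]
    # Flatten the rows in row-major order to match the declared shape.
    data = [value for row in rows for value in row]
    if prod(shape) != len(data):
        raise ValueError("shape does not match the number of values")
    payload = {
        "inputs": [
            {"name": input_name, "shape": shape, "datatype": datatype, "data": data}
        ]
    }
    if output_name is not None:
        payload["outputs"] = [{"name": output_name}]
    return payload


# Same values as the curl example: one sample with four features.
payload = build_infer_payload([[3, 4, 3, 2]], output_name="output0")
print(json.dumps(payload))
```

Building the body this way keeps the `shape` field consistent with the number of values in `data`, which is easy to get wrong when editing the JSON by hand.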

The result of using the inference service looks like the following output:
```json
Expand Down
48 changes: 34 additions & 14 deletions modules/chapter1/pages/section3.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -137,7 +137,15 @@ image::runtimes-list.png[Runtimes List]
+
image::add-custom-model-server.png[Add server]

. Fill up the form as in the following example, notice how **Triton runtime 23.05** is one of the available options for the **Serving runtime** dropdown.
. Create the model server with the following values:
+
--
* Server name: `iris-custom-server`.
* Serving runtime: `Triton runtime 23.05`.
This is the newly added runtime.
* Activate the external route and the authentication.
Use `custom-server-sa` as the service account name.
--
+
image::custom-model-server-form.png[Add model server form]

Expand All @@ -151,7 +159,14 @@ image::custom-runtime.png[Iris custom server]
+
image::iris-custom-deploy-model.png[Deploy Model]

. Fill up the **Deploy Model** form as in the following example:
. Fill in the **Deploy Model** form:
+
--
* Model name: `iris-custom-model`
* Model framework: `onnx - 1`
* Model location data connection: `iris-data-connection`
* Model location path: `iris`
--
+
image::iris-custom-deploy-model-form.png[Deploy model form]
+
Expand All @@ -169,23 +184,28 @@ image::triton-server-running.png[Triton server running]

Now that the model is ready to use, we can make an inference using the REST API.

. Assign the route to an environment variable in your local machine, so that we can use it in our curl commands
. Assign the route to an environment variable in your local machine, so that we can use it in our curl commands.
+
```shell
export IRIS_ROUTE=https://$(oc get routes -n iris-project | grep iris-custom-model | awk '{print $2}')
```
[source,console]
----
$ export IRIS_ROUTE=https://$(oc get routes -n iris-project | grep iris-custom-model | awk '{print $2}')
----

. Assign an authentication token to an environment variable in your local machine
. Assign an authentication token to an environment variable in your local machine.
+
```shell
export TOKEN=$(oc whoami -t)
```
[source,console]
----
$ export TOKEN=$(oc whoami -t)
----

. Request an inference with the REST API
. Request an inference with the REST API.
+
```shell
curl -H "Authorization: Bearer $TOKEN" $IRIS_ROUTE/v2/models/iris-custom-model/infer -X POST --data '{"inputs" : [{"name" : "X","shape" : [ 1, 4 ],"datatype" : "FP32","data" : [ 3, 4, 3, 2 ]}]}'
```
[source,console]
----
$ curl -H "Authorization: Bearer $TOKEN" \
$IRIS_ROUTE/v2/models/iris-custom-model/infer -X POST \
--data '{"inputs" : [{"name" : "X","shape" : [ 1, 4 ],"datatype" : "FP32","data" : [ 3, 4, 3, 2 ]}]}'
----
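For callers that prefer Python over curl, the authenticated request can be prepared with the standard library. This is a sketch under assumptions: the helper name `build_infer_request` is hypothetical, the route and token shown are placeholders, and the explicit `Content-Type` header is an addition that the curl command does not send. When `route` or `token` is omitted, the helper reads the `IRIS_ROUTE` and `TOKEN` environment variables set in the previous steps. The request is built but not sent.

```python
import json
import os
import urllib.request


def build_infer_request(model_name, payload, route=None, token=None):
    """Build (but do not send) an authenticated POST to the model's infer endpoint."""
    # Fall back to the environment variables exported in the previous steps.
    route = route or os.environ["IRIS_ROUTE"]
    token = token or os.environ["TOKEN"]
    return urllib.request.Request(
        url=f"{route.rstrip('/')}/v2/models/{model_name}/infer",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",  # assumption: not sent by the curl example
        },
        method="POST",
    )


payload = {
    "inputs": [
        {"name": "X", "shape": [1, 4], "datatype": "FP32", "data": [3, 4, 3, 2]}
    ]
}
request = build_infer_request(
    "iris-custom-model",
    payload,
    route="https://iris-custom-model.example.com",  # placeholder route
    token="example-token",  # placeholder token
)
print(request.get_method(), request.full_url)

# To send it against a live model server:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(json.load(response))
```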

. The result received from the inference service looks like the following:
+
Expand Down