From abdc5bd075aca27458c87209711ed25abe3b925f Mon Sep 17 00:00:00 2001 From: Sivanantham <90966311+sivanantha321@users.noreply.github.com> Date: Sat, 7 Oct 2023 21:57:02 +0530 Subject: [PATCH] Add open inference protocol support docs and runtime priority docs (#273) * Update docs for open inference protocol support Signed-off-by: Sivanantham Chinnaiyan * Add local model testing doc Signed-off-by: Sivanantham Chinnaiyan * Add serving runtime priority field docs Signed-off-by: Sivanantham Chinnaiyan * Update lightgbm docs Signed-off-by: Dan Sun * Update README.md Signed-off-by: Dan Sun * Update sklearn deployment doc Signed-off-by: Dan Sun * Update XGBoost doc Signed-off-by: Dan Sun --------- Signed-off-by: Sivanantham Chinnaiyan Signed-off-by: Dan Sun Co-authored-by: Dan Sun --- docs/modelserving/servingruntimes.md | 114 +++++++++++++---- docs/modelserving/v1beta1/lightgbm/README.md | 115 +++++++---------- .../modelserving/v1beta1/sklearn/v2/README.md | 105 +++++----------- docs/modelserving/v1beta1/xgboost/README.md | 119 ++++++------------ 4 files changed, 207 insertions(+), 246 deletions(-) diff --git a/docs/modelserving/servingruntimes.md b/docs/modelserving/servingruntimes.md index 5f7101ddd..5bdd19989 100644 --- a/docs/modelserving/servingruntimes.md +++ b/docs/modelserving/servingruntimes.md @@ -54,27 +54,29 @@ This is demonstrated in the example for the [AMD Inference Server](./v1beta1/amd Available attributes in the `ServingRuntime` spec: -| Attribute | Description | -| ---------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -| `multiModel` | Whether this ServingRuntime is ModelMesh-compatible and intended for multi-model usage (as opposed to KServe single-model serving). Defaults to false | -| `disabled` | Disables this runtime | -| `containers` | List of containers associated with the runtime | -| `containers[ ].image` | The container image for the current container | -| `containers[ ].command` | Executable command found in the provided image | -| `containers[ ].args` | List of command line arguments as strings | -| `containers[ ].resources` | Kubernetes [limits or requests](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#requests-and-limits) | -| `containers[ ].env ` | List of environment variables to pass to the container | -| `containers[ ].imagePullPolicy` | The container image pull policy | -| `containers[ ].workingDir` | The working directory for current container | -| `containers[ ].livenessProbe` | Probe for checking container liveness | -| `containers[ ].readinessProbe` | Probe for checking container readiness | -| `supportedModelFormats` | List of model types supported by the current runtime | -| `supportedModelFormats[ ].name` | Name of the model format | -| `supportedModelFormats[ ].version` | Version of the model format. Used in validating that a predictor is supported by a runtime. 
It is recommended to include only the major version here, for example "1" rather than "1.15.4" | -| `storageHelper.disabled` | Disables the storage helper | -| `nodeSelector` | Influence Kubernetes scheduling to [assign pods to nodes](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/) | -| `affinity` | Influence Kubernetes scheduling to [assign pods to nodes](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity) | -| `tolerations` | Allow pods to be scheduled onto nodes [with matching taints](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration) | +| Attribute | Description | +|---------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `multiModel` | Whether this ServingRuntime is ModelMesh-compatible and intended for multi-model usage (as opposed to KServe single-model serving). Defaults to false | +| `disabled` | Disables this runtime | +| `containers` | List of containers associated with the runtime | +| `containers[ ].image` | The container image for the current container | +| `containers[ ].command` | Executable command found in the provided image | +| `containers[ ].args` | List of command line arguments as strings | +| `containers[ ].resources` | Kubernetes [limits or requests](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#requests-and-limits) | +| `containers[ ].env ` | List of environment variables to pass to the container | +| `containers[ ].imagePullPolicy` | The container image pull policy | +| `containers[ ].workingDir` | The working directory for current container | +| `containers[ ].livenessProbe` | Probe for checking container liveness | +| `containers[ ].readinessProbe` | Probe for checking container readiness | +| `supportedModelFormats` | List of model types supported by the current runtime | +| `supportedModelFormats[ ].name` | Name of the model format | +| `supportedModelFormats[ ].version` | Version of the model format. Used in validating that a predictor is supported by a runtime. It is recommended to include only the major version here, for example "1" rather than "1.15.4" | +| `supportedModelFormats[ ].autoselect` | Set to true to allow the ServingRuntime to be used for automatic model placement if this model format is specified with no explicit runtime. The default value is false. | +| `supportedModelFormats[ ].priority` | Priority of this serving runtime for auto selection. This is used to select the serving runtime if more than one serving runtime supports the same model format.
The value should be greater than zero. The higher the value, the higher the priority. Priority is not considered if `autoSelect` is either false or not specified. Priority can be overridden by explicitly specifying the runtime in the InferenceService. |
+| `storageHelper.disabled` | Disables the storage helper |
+| `nodeSelector` | Influence Kubernetes scheduling to [assign pods to nodes](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/) |
+| `affinity` | Influence Kubernetes scheduling to [assign pods to nodes](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity) |
+| `tolerations` | Allow pods to be scheduled onto nodes [with matching taints](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration) |
 
 ModelMesh leverages additional fields not listed here. More information [here](https://github.com/kserve/modelmesh-serving/blob/main/docs/runtimes/custom_runtimes.md#spec-attributes).
 
@@ -112,7 +114,7 @@ by the runtime will be used for model deployment.
 
 ### Implicit: Automatic selection
 
-In each entry of the `supportedModelFormats` list, `autoSelect: true` can optionally be specified to indicate that that the given `ServingRuntime` can be
+In each entry of the `supportedModelFormats` list, `autoSelect: true` can optionally be specified to indicate that the given `ServingRuntime` can be
 considered for automatic selection for predictors with the corresponding model format if no runtime is explicitly specified.
 For example, the `kserve-sklearnserver` ClusterServingRuntime supports SKLearn version 1 and has `autoSelect` enabled:
 
@@ -162,9 +164,75 @@ spec:
 Then, then the version of the `supportedModelFormat` must also match. In this example, `kserve-sklearnserver` would not be eligible for selection since it only lists support for `sklearn` version `1`.
 
+#### Priority
+
+If more than one serving runtime supports the same model format with the same `version` and the same `protocolVersion`, you can optionally specify a `priority` for the serving runtime.
+When no runtime is explicitly specified, the runtime with the highest `priority` is selected automatically. Note that `priority` is honored only when `autoSelect` is `true`, and a higher value means a higher priority.
+
+For example, consider the serving runtimes `mlserver` and `kserve-sklearnserver`. Both runtimes support the `sklearn` model format with version `1` and
+the `protocolVersion` v2, and both have `autoSelect` enabled.
+
+```yaml
+apiVersion: serving.kserve.io/v1alpha1
+kind: ClusterServingRuntime
+metadata:
+  name: kserve-sklearnserver
+spec:
+  protocolVersions:
+    - v1
+    - v2
+  supportedModelFormats:
+    - name: sklearn
+      version: "1"
+      autoSelect: true
+      priority: 1
+...
+```
+
+```yaml
+apiVersion: serving.kserve.io/v1alpha1
+kind: ClusterServingRuntime
+metadata:
+  name: mlserver
+spec:
+  protocolVersions:
+    - v2
+  supportedModelFormats:
+    - name: sklearn
+      version: "1"
+      autoSelect: true
+      priority: 2
+...
+```
+When the following InferenceService is deployed with no runtime specified, the controller will look for a runtime that supports `sklearn`:
+
+```yaml
+apiVersion: serving.kserve.io/v1beta1
+kind: InferenceService
+metadata:
+  name: example-sklearn-isvc
+spec:
+  predictor:
+    model:
+      protocolVersion: v2
+      modelFormat:
+        name: sklearn
+      storageUri: s3://bucket/sklearn/mnist.joblib
+```
+The controller will find two matching runtimes, `kserve-sklearnserver` and `mlserver`, as both have an entry in their `supportedModelFormats` list with `sklearn` and `autoSelect: true`.
+Because more than one supported runtime is available, the controller sorts them by priority. Since `mlserver` has the higher `priority` value, that ClusterServingRuntime
+will be used for model deployment.
+
+**Constraints of priority**
+
+- A higher priority value means higher precedence. The value must be greater than 0.
+- Priority is considered only if `autoSelect` is enabled; otherwise it is ignored.
+- A serving runtime that specifies a priority takes precedence over one that does not.
+- Two model formats with the same name and the same version cannot have the same priority.
+- If more than one serving runtime supports the model format and none of them specifies a priority, there is no guarantee _which_ runtime will be selected.
 
 !!! warning
-    If multiple runtimes list the same format and/or version as auto-selectable, then there is no guarantee _which_ runtime will be selected.
+    If multiple runtimes list the same format and/or version as auto-selectable and no priority is specified, the runtime is selected based on the `creationTimestamp`, i.e. the most recently created runtime is selected, so there is no guarantee _which_ runtime will end up being used. Users and cluster administrators should therefore enable `autoSelect` with care.
 
 ### Previous schema
 
diff --git a/docs/modelserving/v1beta1/lightgbm/README.md b/docs/modelserving/v1beta1/lightgbm/README.md
index 9c0eea23f..5cd5c16da 100644
--- a/docs/modelserving/v1beta1/lightgbm/README.md
+++ b/docs/modelserving/v1beta1/lightgbm/README.md
@@ -30,7 +30,7 @@ lgb_model.save_model(model_file)
 ## Deploy LightGBM model with V1 protocol
 
 ### Test the model locally
-Install and run the [LightGBM Server](https://github.com/kserve/kserve/python/lgbserver) using the trained model locally and test the prediction.
+Install and run the [LightGBM Server](https://github.com/kserve/kserve/tree/master/python/lgbserver) using the trained model locally and test the prediction.
 
 ```shell
 python -m lgbserver --model_dir /path/to/model_dir --model_name lgb
 ```
 
 ```python
 request = {
   'sepal_width_(cm)': {0: 3.5},
   'petal_length_(cm)': {0: 1.4},
   'petal_width_(cm)': {0: 0.2},
   'sepal_length_(cm)': {0: 5.1}
 }
 res = requests.post('http://127.0.0.1:8080/v1/models/lgb:predict', json=request)
 print(res.text)
 ```
 
@@ -54,7 +54,7 @@ print(res.text)
 
 To deploy the model on Kubernetes you can create the InferenceService by specifying the `modelFormat` with `lightgbm` and `storageUri`.
-=== "Old Schema" +=== "New Schema" ```yaml apiVersion: "serving.kserve.io/v1beta1" @@ -63,10 +63,12 @@ To deploy the model on Kubernetes you can create the InferenceService by specify name: "lightgbm-iris" spec: predictor: - lightgbm: + model: + modelFormat: + name: lightgbm storageUri: "gs://kfserving-examples/models/lightgbm/iris" ``` -=== "New Schema" +=== "Old Schema" ```yaml apiVersion: "serving.kserve.io/v1beta1" @@ -75,9 +77,7 @@ To deploy the model on Kubernetes you can create the InferenceService by specify name: "lightgbm-iris" spec: predictor: - model: - modelFormat: - name: lightgbm + lightgbm: storageUri: "gs://kfserving-examples/models/lightgbm/iris" ``` @@ -129,62 +129,46 @@ curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1 {"predictions": [[0.9, 0.05, 0.05]]} ``` -## Deploy the model with [V2 protocol](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2) +## Deploy the model with [Open Inference Protocol](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2) ### Test the model locally -Once you've got your model serialized `model.bst`, we can then use [MLServer](https://github.com/SeldonIO/MLServer) which implements the KServe V2 inference protocol to spin up a local server. For more details on MLServer, please check the [LightGBM example doc](https://github.com/SeldonIO/MLServer/blob/master/docs/examples/lightgbm/README.md). +Once you've got your model serialized `model.bst`, we can then use [KServe LightGBM Server](https://github.com/kserve/kserve/tree/master/python/lgbserver) to create a local model server. -To run MLServer locally, you first install the `mlserver` package in your local environment, as well as the LightGBM runtime. +!!! Note + This step is optional and just meant for testing, feel free to jump straight to [deploying with InferenceService](#deploy-with-inferenceservice). -```bash -pip install mlserver mlserver-lightgbm -``` +#### Pre-requisites -The next step is to provide the model settings so that MLServer knows: +Firstly, to use kserve lightgbm server locally, you will first need to install the `lgbserver` +runtime package in your local environment. -- The inference runtime to serve your model (i.e. `mlserver_lightgbm.LightGBMModel`) -- The model's name and version +1. Clone the KServe repository and navigate into the directory. + ```bash + git clone https://github.com/kserve/kserve + ``` +2. Install `lgbserver` runtime. KServe uses [Poetry](https://python-poetry.org/) as the dependency management tool. Make sure you have already [installed poetry](https://python-poetry.org/docs/#installation). + ```bash + cd python/lgbserver + poetry install + ``` +#### Serving model locally -These can be specified through environment variables or by creating a local -`model-settings.json` file: +The `lgbserver` package takes three arguments. -```json -{ - "name": "lightgbm-iris", - "version": "v1.0.0", - "implementation": "mlserver_lightgbm.LightGBMModel" -} -``` +- `--model_dir`: The model directory path where the model is stored. +- `--model_name`: The name of the model deployed in the model server, the default value is `model`. This is optional. +- `--nthread`: Number of threads to use by LightGBM. This is optional and the default value is 1. -With the `mlserver` package installed locally and a local `model-settings.json` -file, you should now be ready to start our server as: +With the `lgbserver` runtime package installed locally, you should now be ready to start our server as: ```bash -mlserver start . 
+python3 lgbserver --model_dir /path/to/model_dir --model_name lightgbm-iris ``` ### Deploy InferenceService with REST endpoint +To deploy the LightGBM model with Open Inference Protocol, you need to set the **`protocolVersion` field to `v2`**. -When you deploy your model with `InferenceService` KServe injects sensible defaults so that it runs out-of-the-box without any -further configuration. However, you can still override these defaults by providing a `model-settings.json` file similar to your local one. -You can even provide a [set of `model-settings.json` files to load multiple models](https://github.com/SeldonIO/MLServer/tree/master/docs/examples/mms). - -To deploy the LightGBM model with V2 inference protocol, you need to set the **`protocolVersion` field to `v2`**. - -=== "Old Schema" - - ```yaml - apiVersion: "serving.kserve.io/v1beta1" - kind: "InferenceService" - metadata: - name: "lightgbm-v2-iris" - spec: - predictor: - lightgbm: - protocolVersion: v2 - storageUri: "gs://kfserving-examples/models/lightgbm/v2/iris" - ``` -=== "New Schema" +=== "Schema" ```yaml apiVersion: "serving.kserve.io/v1beta1" @@ -196,9 +180,12 @@ To deploy the LightGBM model with V2 inference protocol, you need to set the **` model: modelFormat: name: lightgbm + runtime: kserve-lgbserver protocolVersion: v2 storageUri: "gs://kfserving-examples/models/lightgbm/v2/iris" ``` +!!! Note + For `V2 protocol (open inference protocol)` if `runtime` field is not provided then, by default `mlserver` runtime is used. Apply the InferenceService yaml to get the REST endpoint === "kubectl" @@ -213,7 +200,7 @@ kubectl apply -f lightgbm-v2.yaml $ inferenceservice.serving.kserve.io/lightgbm-v2-iris created ``` -### Test the deployed model with curl +#### Test the deployed model with curl You can now test your deployed model by sending a sample request. @@ -276,24 +263,7 @@ curl -v \ ### Create the InferenceService with gRPC endpoint Create the inference service yaml and expose the gRPC port, currently only one port is allowed to expose either HTTP or gRPC port and by default HTTP port is exposed. -=== "Old Schema" - - ```yaml - apiVersion: "serving.kserve.io/v1beta1" - kind: "InferenceService" - metadata: - name: "lightgbm-v2-iris" - spec: - predictor: - lightgbm: - protocolVersion: v2 - storageUri: "gs://kfserving-examples/models/lightgbm/v2/iris" - ports: - - name: h2c - protocol: TCP - containerPort: 9000 - ``` -=== "New Schema" +=== "Yaml" ```yaml apiVersion: "serving.kserve.io/v1beta1" @@ -306,12 +276,15 @@ Create the inference service yaml and expose the gRPC port, currently only one p modelFormat: name: lightgbm protocolVersion: v2 + runtime: kserve-lgbserver storageUri: "gs://kfserving-examples/models/lightgbm/v2/iris" ports: - - name: h2c - protocol: TCP - containerPort: 9000 + - name: h2c + protocol: TCP + containerPort: 8081 ``` +!!! Note + For `V2 protocol (open inference protocol)` if `runtime` field is not provided then, by default `mlserver` runtime is used. Apply the InferenceService yaml to get the gRPC endpoint === "kubectl" @@ -320,7 +293,7 @@ Apply the InferenceService yaml to get the gRPC endpoint kubectl apply -f lightgbm-v2-grpc.yaml ``` -### Test the deployed model with grpcurl +#### Test the deployed model with grpcurl After the gRPC `InferenceService` becomes ready, [grpcurl](https://github.com/fullstorydev/grpcurl), can be used to send gRPC requests to the `InferenceService`. 
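+
+For instance, a readiness check is a quick way to confirm that the gRPC endpoint is reachable before sending full inference requests. The sketch below is only illustrative: it assumes the Open Inference Protocol proto definition has been saved locally as `open_inference_grpc.proto` (the file name is an assumption) and that `SERVICE_HOSTNAME`, `INGRESS_HOST` and `INGRESS_PORT` are set the same way as in the REST examples above.
+
+```bash
+# Illustrative sketch: call ModelReady on the Open Inference Protocol gRPC service.
+# open_inference_grpc.proto is assumed to be a local copy of the protocol definition.
+grpcurl -vv -plaintext \
+  -proto open_inference_grpc.proto \
+  -authority "${SERVICE_HOSTNAME}" \
+  -d '{"name": "lightgbm-v2-iris-grpc"}' \
+  "${INGRESS_HOST}:${INGRESS_PORT}" \
+  inference.GRPCInferenceService.ModelReady
+```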
diff --git a/docs/modelserving/v1beta1/sklearn/v2/README.md b/docs/modelserving/v1beta1/sklearn/v2/README.md index 1a5e547d7..b42e54253 100644 --- a/docs/modelserving/v1beta1/sklearn/v2/README.md +++ b/docs/modelserving/v1beta1/sklearn/v2/README.md @@ -4,10 +4,9 @@ This example walks you through how to deploy a `scikit-learn` model leveraging the `v1beta1` version of the `InferenceService` CRD. Note that, by default the `v1beta1` version will expose your model through an API compatible with the existing V1 Dataplane. -However, this example will show you how to serve a model through an API -compatible with the new [V2 Dataplane](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2). +This example will show you how to serve a model through [Open Inference Protocol](https://github.com/kserve/open-inference-protocol). -## Training +## Train the Model The first step will be to train a sample `scikit-learn` model. Note that this model will be then saved as `model.joblib`. @@ -26,79 +25,49 @@ clf.fit(X, y) dump(clf, 'model.joblib') ``` -## Testing locally +## Test the Model locally -Once you've got your model serialised `model.joblib`, we can then use -[MLServer](https://github.com/SeldonIO/MLServer) to spin up a local server. -For more details on MLServer, feel free to check the [SKLearn example doc](https://github.com/SeldonIO/MLServer/blob/master/docs/examples/sklearn/README.md). +Once you've got your model serialised `model.joblib`, we can then use [KServe Sklearn Server](https://github.com/kserve/kserve/tree/master/python/sklearnserver) to spin up a local server. !!! Note - this step is optional and just meant for testing, feel free to jump straight to [deploying with InferenceService](#deploy-with-inferenceservice). + This step is optional and just meant for testing, feel free to jump straight to [deploying with InferenceService](#deploy-with-inferenceservice). -### Pre-requisites +### Using KServe SklearnServer -Firstly, to use MLServer locally, you will first need to install the `mlserver` -package in your local environment, as well as the SKLearn runtime. +#### Pre-requisites -```bash -pip install mlserver mlserver-sklearn -``` - -### Model settings - -The next step will be providing some model settings so that -MLServer knows: +Firstly, to use KServe sklearn server locally, you will first need to install the `sklearnserver` +runtime package in your local environment. -- The inference runtime to serve your model (i.e. `mlserver_sklearn.SKLearnModel`) -- The model's name and version - -These can be specified through environment variables or by creating a local -`model-settings.json` file: - -```json -{ - "name": "sklearn-iris", - "version": "v1.0.0", - "implementation": "mlserver_sklearn.SKLearnModel" -} -``` +1. Clone the KServe repository and navigate into the directory. + ```bash + git clone https://github.com/kserve/kserve + ``` +2. Install `sklearnserver` runtime. Kserve uses [Poetry](https://python-poetry.org/) as the dependency management tool. Make sure you have already [installed poetry](https://python-poetry.org/docs/#installation). + ```bash + cd python/sklearnserver + poetry install + ``` +#### Serving model locally -Note that, when you [deploy your model](#deployment), **KServe will already -inject some sensible defaults** so that it runs out-of-the-box without any -further configuration. -However, you can still override these defaults by providing a -`model-settings.json` file similar to your local one. 
-You can even provide a [set of `model-settings.json` files to load multiple -models](https://github.com/SeldonIO/MLServer/tree/master/docs/examples/mms). +The `sklearnserver` package takes two arguments. -### Serving model locally +- `--model_dir`: The model directory path where the model is stored. +- `--model_name`: The name of the model deployed in the model server, the default value is `model`. This is optional. -With the `mlserver` package installed locally and a local `model-settings.json` -file, you should now be ready to start our server as: +With the `sklearnserver` runtime package installed locally, you should now be ready to start our server as: ```bash -mlserver start . +python3 sklearnserver --model_dir /path/to/model_dir --model_name sklearn-irisv2 ``` -## Deploy with InferenceService +## Deploy the Model with InferenceService -Lastly, you will use KServe to deploy the trained model. +Lastly, you will use KServe to deploy the trained model onto Kubernetes. For this, you will just need to use **version `v1beta1`** of the `InferenceService` CRD and set the **`protocolVersion` field to `v2`**. -=== "Old Schema" - ```yaml - apiVersion: "serving.kserve.io/v1beta1" - kind: "InferenceService" - metadata: - name: "sklearn-irisv2" - spec: - predictor: - sklearn: - protocolVersion: "v2" - storageUri: "gs://seldon-models/sklearn/mms/lr_model" - ``` -=== "New Schema" +=== "Yaml" ```yaml apiVersion: "serving.kserve.io/v1beta1" kind: "InferenceService" @@ -109,30 +78,24 @@ For this, you will just need to use **version `v1beta1`** of the model: modelFormat: name: sklearn - runtime: kserve-mlserver - storageUri: "gs://seldon-models/sklearn/mms/lr_model" + protocolVersion: v2 + runtime: kserve-sklearnserver + storageUri: "gs://kfserving-examples/models/sklearn/1.0/model" ``` -Note that this makes the following assumptions: - -- Your model weights (i.e. your `model.joblib` file) have already been uploaded - to a "model repository" (GCS in this example) and can be accessed as - `gs://seldon-models/sklearn/mms/lr_model`. -- There is a K8s cluster available, accessible through `kubectl`. -- KServe has already been [installed in your cluster](../../../../get_started/README.md). - +!!! Note + For `V2 protocol (open inference protocol)` if `runtime` field is not provided then, by default `mlserver` runtime is used. === "kubectl" ```bash kubectl apply -f ./sklearn.yaml ``` -## Testing deployed model +## Test the Deployed Model You can now test your deployed model by sending a sample request. -Note that this request **needs to follow the [V2 Dataplane -protocol](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2)**. +Note that this request **needs to follow the [Open Inference Protocol](https://github.com/kserve/open-inference-protocol)**. You can see an example payload below: ```json diff --git a/docs/modelserving/v1beta1/xgboost/README.md b/docs/modelserving/v1beta1/xgboost/README.md index 864441652..8fc913b78 100644 --- a/docs/modelserving/v1beta1/xgboost/README.md +++ b/docs/modelserving/v1beta1/xgboost/README.md @@ -1,13 +1,10 @@ # Deploying XGBoost models with InferenceService -This example walks you through how to deploy a `xgboost` model leveraging the -`v1beta1` version of the `InferenceService` CRD. -Note that, by default the `v1beta1` version will expose your model through an -API compatible with the existing V1 Dataplane. 
-However, this example will show you how to serve a model through an API
-compatible with the new [V2 Dataplane](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2).
+This example walks you through how to deploy an `xgboost` model using KServe's `InferenceService` CRD.
+Note that, by default, it exposes your model through an API compatible with the existing V1 Dataplane. This example will show you how to serve a model through an API
+compatible with the [Open Inference Protocol](https://github.com/kserve/open-inference-protocol).
 
-## Training
+## Train the Model
 
 The first step will be to train a sample `xgboost` model.
 We will save this model as `model.bst`.
 
@@ -36,80 +33,47 @@ model_file = os.path.join((model_dir), BST_FILE)
 xgb_model.save_model(model_file)
 ```
 
-## Testing locally
+### Test the model locally
+Once you've got your model serialized `model.bst`, we can then use [KServe XGBoost Server](https://github.com/kserve/kserve/tree/master/python/xgbserver) to spin up a local server.
 
-Once we've got our `model.bst` model serialised, we can then use
-[MLServer](https://github.com/SeldonIO/MLServer) to spin up a local server.
-For more details on MLServer, feel free to check the [XGBoost example in their
-docs](https://github.com/SeldonIO/MLServer/tree/master/docs/examples/xgboost).
+!!! Note
+    This step is optional and just meant for testing, feel free to jump straight to [deploying with InferenceService](#deploy-with-inferenceservice).
 
-> Note that this step is optional and just meant for testing.
-> Feel free to jump straight to [deploying your trained model](#deployment).
 
-### Pre-requisites
+#### Pre-requisites
+To use the KServe XGBoost server locally, you will first need to install the `xgbserver` runtime package in your local environment.
 
-Firstly, to use MLServer locally, you will first need to install the `mlserver`
-package in your local environment as well as the XGBoost runtime.
-
-```bash
-pip install mlserver mlserver-xgboost
-```
-
-### Model settings
+1. Clone the KServe repository and navigate into the directory.
+    ```bash
+    git clone https://github.com/kserve/kserve
+    ```
+2. Install the `xgbserver` runtime. KServe uses [Poetry](https://python-poetry.org/) as the dependency management tool. Make sure you have already [installed poetry](https://python-poetry.org/docs/#installation).
+    ```bash
+    cd python/xgbserver
+    poetry install
+    ```
+
+#### Serving model locally
 
-The next step will be providing some model settings so that
-MLServer knows:
+The `xgbserver` package takes three arguments.
 
-- The inference runtime that we want our model to use (i.e.
-  `mlserver_xgboost.XGBoostModel`)
-- Our model's name and version
+- `--model_dir`: The model directory path where the model is stored.
+- `--model_name`: The name of the model deployed in the model server; the default value is `model`. This is optional.
+- `--nthread`: Number of threads to be used by XGBoost. This is optional and the default value is 1.
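+
+Once the server is running (the start command is shown in the next step), you can send a quick smoke test against it. The snippet below is only a sketch: it assumes the server listens on the default HTTP port 8080, serves the standard KServe V1 REST prediction endpoint, and that the model was registered as `xgboost-iris`.
+
+```bash
+# Illustrative local smoke test against the V1 prediction endpoint (default port 8080).
+curl -H "Content-Type: application/json" \
+  -d '{"instances": [[6.8, 2.8, 4.8, 1.4]]}' \
+  http://localhost:8080/v1/models/xgboost-iris:predict
+```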
-These can be specified through environment variables or by creating a local -`model-settings.json` file: +With the `xgbserver` runtime package installed locally, you should now be ready to start our server as: -```json -{ - "name": "xgboost-iris", - "version": "v1.0.0", - "implementation": "mlserver_xgboost.XGBoostModel" -} +```bash +python3 xgbserver --model_dir /path/to/model_dir --model_name xgboost-iris ``` -Note that, when we [deploy our model](#deployment), **KServe will already -inject some sensible defaults** so that it runs out-of-the-box without any -further configuration. -However, you can still override these defaults by providing a -`model-settings.json` file similar to your local one. -You can even provide a [set of `model-settings.json` files to load multiple -models](https://github.com/SeldonIO/MLServer/tree/master/docs/examples/mms). -### Serving our model locally +## Deploy the Model with InferenceService -With the `mlserver` package installed locally and a local `model-settings.json` -file, we should now be ready to start our server as: +Lastly, we use KServe to deploy our trained model on Kubernetes. +For this, we use the `InferenceService` CRD and set the **`protocolVersion` field to `v2`**. -```bash -mlserver start . -``` +=== "Yaml" -## Deploy with InferenceService - -Lastly, we will use KServe to deploy our trained model. -For this, we will just need to use **version `v1beta1`** of the -`InferenceService` CRD and set the the **`protocolVersion` field to `v2`**. -=== "Old Schema" - ```yaml - apiVersion: "serving.kserve.io/v1beta1" - kind: "InferenceService" - metadata: - name: "xgboost-iris" - spec: - predictor: - xgboost: - protocolVersion: "v2" - storageUri: "gs://kfserving-examples/models/xgboost/iris" - ``` -=== "New Schema" ```yaml apiVersion: "serving.kserve.io/v1beta1" kind: "InferenceService" @@ -120,31 +84,24 @@ For this, we will just need to use **version `v1beta1`** of the model: modelFormat: name: xgboost - runtime: kserve-mlserver + protocolVersion: v2 + runtime: kserve-xgbserver storageUri: "gs://kfserving-examples/models/xgboost/iris" ``` -Note that this makes the following assumptions: - -- Your model weights (i.e. your `model.bst` file) have already been uploaded - to a "model repository" (GCS in this example) and can be accessed as - `gs://kfserving-examples/models/xgboost/iris`. -- There is a K8s cluster available, accessible through `kubectl`. -- KServe has already been [installed in your - cluster](../../../get_started/README.md#4-Install-kserve). +!!! Note + For `V2 protocol (open inference protocol)` if `runtime` field is not provided then, by default `mlserver` runtime is used. -Assuming that we've got a cluster accessible through `kubectl` with KServe -already installed, we can deploy our model as: +Assuming that we've got a cluster accessible through `kubectl` with KServe already installed, we can deploy our model as: ```bash kubectl apply -f xgboost.yaml ``` -## Testing deployed model +## Test the Deployed Model We can now test our deployed model by sending a sample request. -Note that this request **needs to follow the [V2 Dataplane -protocol](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2)**. +Note that this request **needs to follow the [Open Inference Protocol](https://github.com/kserve/open-inference-protocol)**. You can see an example payload below: ```json