From abdc5bd075aca27458c87209711ed25abe3b925f Mon Sep 17 00:00:00 2001 From: Sivanantham <90966311+sivanantha321@users.noreply.github.com> Date: Sat, 7 Oct 2023 21:57:02 +0530 Subject: [PATCH] Add open inference protocol support docs and runtime priority docs (#273) * Update docs for open inference protocol support Signed-off-by: Sivanantham Chinnaiyan * Add local model testing doc Signed-off-by: Sivanantham Chinnaiyan * Add serving runtime priority field docs Signed-off-by: Sivanantham Chinnaiyan * Update lightgbm docs Signed-off-by: Dan Sun * Update README.md Signed-off-by: Dan Sun * Update sklearn deployment doc Signed-off-by: Dan Sun * Update XGBoost doc Signed-off-by: Dan Sun --------- Signed-off-by: Sivanantham Chinnaiyan Signed-off-by: Dan Sun Co-authored-by: Dan Sun --- docs/modelserving/servingruntimes.md | 114 +++++++++++++---- docs/modelserving/v1beta1/lightgbm/README.md | 115 +++++++---------- .../modelserving/v1beta1/sklearn/v2/README.md | 105 +++++----------- docs/modelserving/v1beta1/xgboost/README.md | 119 ++++++------------ 4 files changed, 207 insertions(+), 246 deletions(-) diff --git a/docs/modelserving/servingruntimes.md b/docs/modelserving/servingruntimes.md index 5f7101ddd..5bdd19989 100644 --- a/docs/modelserving/servingruntimes.md +++ b/docs/modelserving/servingruntimes.md @@ -54,27 +54,29 @@ This is demonstrated in the example for the [AMD Inference Server](./v1beta1/amd Available attributes in the `ServingRuntime` spec: -| Attribute | Description | -| ---------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -| `multiModel` | Whether this ServingRuntime is ModelMesh-compatible and intended for multi-model usage (as opposed to KServe single-model serving). Defaults to false | -| `disabled` | Disables this runtime | -| `containers` | List of containers associated with the runtime | -| `containers[ ].image` | The container image for the current container | -| `containers[ ].command` | Executable command found in the provided image | -| `containers[ ].args` | List of command line arguments as strings | -| `containers[ ].resources` | Kubernetes [limits or requests](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#requests-and-limits) | -| `containers[ ].env ` | List of environment variables to pass to the container | -| `containers[ ].imagePullPolicy` | The container image pull policy | -| `containers[ ].workingDir` | The working directory for current container | -| `containers[ ].livenessProbe` | Probe for checking container liveness | -| `containers[ ].readinessProbe` | Probe for checking container readiness | -| `supportedModelFormats` | List of model types supported by the current runtime | -| `supportedModelFormats[ ].name` | Name of the model format | -| `supportedModelFormats[ ].version` | Version of the model format. Used in validating that a predictor is supported by a runtime. 
It is recommended to include only the major version here, for example "1" rather than "1.15.4" | -| `storageHelper.disabled` | Disables the storage helper | -| `nodeSelector` | Influence Kubernetes scheduling to [assign pods to nodes](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/) | -| `affinity` | Influence Kubernetes scheduling to [assign pods to nodes](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity) | -| `tolerations` | Allow pods to be scheduled onto nodes [with matching taints](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration) | +| Attribute | Description | +|---------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `multiModel` | Whether this ServingRuntime is ModelMesh-compatible and intended for multi-model usage (as opposed to KServe single-model serving). Defaults to false | +| `disabled` | Disables this runtime | +| `containers` | List of containers associated with the runtime | +| `containers[ ].image` | The container image for the current container | +| `containers[ ].command` | Executable command found in the provided image | +| `containers[ ].args` | List of command line arguments as strings | +| `containers[ ].resources` | Kubernetes [limits or requests](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#requests-and-limits) | +| `containers[ ].env ` | List of environment variables to pass to the container | +| `containers[ ].imagePullPolicy` | The container image pull policy | +| `containers[ ].workingDir` | The working directory for current container | +| `containers[ ].livenessProbe` | Probe for checking container liveness | +| `containers[ ].readinessProbe` | Probe for checking container readiness | +| `supportedModelFormats` | List of model types supported by the current runtime | +| `supportedModelFormats[ ].name` | Name of the model format | +| `supportedModelFormats[ ].version` | Version of the model format. Used in validating that a predictor is supported by a runtime. It is recommended to include only the major version here, for example "1" rather than "1.15.4" | +| `supportedModelFormats[ ].autoselect` | Set to true to allow the ServingRuntime to be used for automatic model placement if this model format is specified with no explicit runtime. The default value is false. | +| `supportedModelFormats[ ].priority` | Priority of this serving runtime for auto selection. This is used to select the serving runtime if more than one serving runtime supports the same model format.
The value should be greater than zero. The higher the value, the higher the priority. Priority is not considered if `autoSelect` is either false or not specified. Priority can be overridden by explicitly specifying the runtime in the InferenceService. |
+| `storageHelper.disabled` | Disables the storage helper |
+| `nodeSelector` | Influence Kubernetes scheduling to [assign pods to nodes](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/) |
+| `affinity` | Influence Kubernetes scheduling to [assign pods to nodes](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity) |
+| `tolerations` | Allow pods to be scheduled onto nodes [with matching taints](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration) |
 
 ModelMesh leverages additional fields not listed here. More information [here](https://github.com/kserve/modelmesh-serving/blob/main/docs/runtimes/custom_runtimes.md#spec-attributes).
 
@@ -112,7 +114,7 @@ by the runtime will be used for model deployment.
 
 ### Implicit: Automatic selection
 
-In each entry of the `supportedModelFormats` list, `autoSelect: true` can optionally be specified to indicate that that the given `ServingRuntime` can be
+In each entry of the `supportedModelFormats` list, `autoSelect: true` can optionally be specified to indicate that the given `ServingRuntime` can be
 considered for automatic selection for predictors with the corresponding model format if no runtime is explicitly specified.
 For example, the `kserve-sklearnserver` ClusterServingRuntime supports SKLearn version 1 and has `autoSelect` enabled:
 
@@ -162,9 +164,75 @@ spec:
 Then, then the version of the `supportedModelFormat` must also match. In this example, `kserve-sklearnserver` would not be eligible for selection since it only lists support for `sklearn` version `1`.
 
+#### Priority
+
+If more than one serving runtime supports the same model format with the same `version` and the same `protocolVersion`, you can optionally specify a `priority` for the serving runtime.
+When no runtime is explicitly specified, the runtime with the highest `priority` is selected automatically. Note that `priority` is honored only when `autoSelect` is `true`, and a higher value means a higher priority.
+
+For example, consider the serving runtimes `mlserver` and `kserve-sklearnserver`. Both runtimes support the `sklearn` model format with version `1` and
+the `protocolVersion` v2, and both have `autoSelect` enabled.
+
+```yaml
+apiVersion: serving.kserve.io/v1alpha1
+kind: ClusterServingRuntime
+metadata:
+  name: kserve-sklearnserver
+spec:
+  protocolVersions:
+    - v1
+    - v2
+  supportedModelFormats:
+    - name: sklearn
+      version: "1"
+      autoSelect: true
+      priority: 1
+...
+```
+
+```yaml
+apiVersion: serving.kserve.io/v1alpha1
+kind: ClusterServingRuntime
+metadata:
+  name: mlserver
+spec:
+  protocolVersions:
+    - v2
+  supportedModelFormats:
+    - name: sklearn
+      version: "1"
+      autoSelect: true
+      priority: 2
+...
+```
+When the following InferenceService is deployed with no runtime specified, the controller will look for a runtime that supports `sklearn`:
+
+```yaml
+apiVersion: serving.kserve.io/v1beta1
+kind: InferenceService
+metadata:
+  name: example-sklearn-isvc
+spec:
+  predictor:
+    model:
+      protocolVersion: v2
+      modelFormat:
+        name: sklearn
+      storageUri: s3://bucket/sklearn/mnist.joblib
+```
+The controller will find two matching runtimes, `kserve-sklearnserver` and `mlserver`, as both have an entry in their `supportedModelFormats` list with `sklearn` and `autoSelect: true`.
+Because more than one supported runtime is available, the controller sorts them by priority. Since `mlserver` has the higher `priority` value, that ClusterServingRuntime
+will be used for model deployment.
+
+**Constraints of priority**
+
+- A higher priority value means higher precedence. The value must be greater than 0.
+- Priority is considered only if `autoSelect` is enabled; otherwise it is ignored.
+- A serving runtime that specifies a priority takes precedence over one that does not.
+- Two model formats with the same name and the same version cannot have the same priority.
+- If more than one serving runtime supports the model format and none of them specifies a priority, there is no guarantee _which_ runtime will be selected.
 
 !!! warning
-    If multiple runtimes list the same format and/or version as auto-selectable, then there is no guarantee _which_ runtime will be selected.
+    If multiple runtimes list the same format and/or version as auto-selectable and no priority is specified, the runtime is selected based on the `creationTimestamp`, i.e. the most recently created runtime is selected, so there is no guarantee _which_ runtime will end up being used. Users and cluster administrators should therefore enable `autoSelect` with care.
 
 ### Previous schema
 
diff --git a/docs/modelserving/v1beta1/lightgbm/README.md b/docs/modelserving/v1beta1/lightgbm/README.md
index 9c0eea23f..5cd5c16da 100644
--- a/docs/modelserving/v1beta1/lightgbm/README.md
+++ b/docs/modelserving/v1beta1/lightgbm/README.md
@@ -30,7 +30,7 @@ lgb_model.save_model(model_file)
 ## Deploy LightGBM model with V1 protocol
 
 ### Test the model locally
-Install and run the [LightGBM Server](https://github.com/kserve/kserve/python/lgbserver) using the trained model locally and test the prediction.
+Install and run the [LightGBM Server](https://github.com/kserve/kserve/tree/master/python/lgbserver) using the trained model locally and test the prediction.
 
 ```shell
 python -m lgbserver --model_dir /path/to/model_dir --model_name lgb
 ```
 
 ```python
 request = {
   'sepal_width_(cm)': {0: 3.5},
   'petal_length_(cm)': {0: 1.4},
   'petal_width_(cm)': {0: 0.2},
   'sepal_length_(cm)': {0: 5.1}
 }
 res = requests.post('http://127.0.0.1:8080/v1/models/lgb:predict', json=request)
 print(res.text)
 ```
 
@@ -54,7 +54,7 @@ print(res.text)
 
 To deploy the model on Kubernetes you can create the InferenceService by specifying the `modelFormat` with `lightgbm` and `storageUri`.
-=== "Old Schema" +=== "New Schema" ```yaml apiVersion: "serving.kserve.io/v1beta1" @@ -63,10 +63,12 @@ To deploy the model on Kubernetes you can create the InferenceService by specify name: "lightgbm-iris" spec: predictor: - lightgbm: + model: + modelFormat: + name: lightgbm storageUri: "gs://kfserving-examples/models/lightgbm/iris" ``` -=== "New Schema" +=== "Old Schema" ```yaml apiVersion: "serving.kserve.io/v1beta1" @@ -75,9 +77,7 @@ To deploy the model on Kubernetes you can create the InferenceService by specify name: "lightgbm-iris" spec: predictor: - model: - modelFormat: - name: lightgbm + lightgbm: storageUri: "gs://kfserving-examples/models/lightgbm/iris" ``` @@ -129,62 +129,46 @@ curl -v -H "Host: ${SERVICE_HOSTNAME}" http://${INGRESS_HOST}:${INGRESS_PORT}/v1 {"predictions": [[0.9, 0.05, 0.05]]} ``` -## Deploy the model with [V2 protocol](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2) +## Deploy the model with [Open Inference Protocol](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2) ### Test the model locally -Once you've got your model serialized `model.bst`, we can then use [MLServer](https://github.com/SeldonIO/MLServer) which implements the KServe V2 inference protocol to spin up a local server. For more details on MLServer, please check the [LightGBM example doc](https://github.com/SeldonIO/MLServer/blob/master/docs/examples/lightgbm/README.md). +Once you've got your model serialized `model.bst`, we can then use [KServe LightGBM Server](https://github.com/kserve/kserve/tree/master/python/lgbserver) to create a local model server. -To run MLServer locally, you first install the `mlserver` package in your local environment, as well as the LightGBM runtime. +!!! Note + This step is optional and just meant for testing, feel free to jump straight to [deploying with InferenceService](#deploy-with-inferenceservice). -```bash -pip install mlserver mlserver-lightgbm -``` +#### Pre-requisites -The next step is to provide the model settings so that MLServer knows: +Firstly, to use kserve lightgbm server locally, you will first need to install the `lgbserver` +runtime package in your local environment. -- The inference runtime to serve your model (i.e. `mlserver_lightgbm.LightGBMModel`) -- The model's name and version +1. Clone the KServe repository and navigate into the directory. + ```bash + git clone https://github.com/kserve/kserve + ``` +2. Install `lgbserver` runtime. KServe uses [Poetry](https://python-poetry.org/) as the dependency management tool. Make sure you have already [installed poetry](https://python-poetry.org/docs/#installation). + ```bash + cd python/lgbserver + poetry install + ``` +#### Serving model locally -These can be specified through environment variables or by creating a local -`model-settings.json` file: +The `lgbserver` package takes three arguments. -```json -{ - "name": "lightgbm-iris", - "version": "v1.0.0", - "implementation": "mlserver_lightgbm.LightGBMModel" -} -``` +- `--model_dir`: The model directory path where the model is stored. +- `--model_name`: The name of the model deployed in the model server, the default value is `model`. This is optional. +- `--nthread`: Number of threads to use by LightGBM. This is optional and the default value is 1. -With the `mlserver` package installed locally and a local `model-settings.json` -file, you should now be ready to start our server as: +With the `lgbserver` runtime package installed locally, you should now be ready to start our server as: ```bash -mlserver start . 
+python3 lgbserver --model_dir /path/to/model_dir --model_name lightgbm-iris ``` ### Deploy InferenceService with REST endpoint +To deploy the LightGBM model with Open Inference Protocol, you need to set the **`protocolVersion` field to `v2`**. -When you deploy your model with `InferenceService` KServe injects sensible defaults so that it runs out-of-the-box without any -further configuration. However, you can still override these defaults by providing a `model-settings.json` file similar to your local one. -You can even provide a [set of `model-settings.json` files to load multiple models](https://github.com/SeldonIO/MLServer/tree/master/docs/examples/mms). - -To deploy the LightGBM model with V2 inference protocol, you need to set the **`protocolVersion` field to `v2`**. - -=== "Old Schema" - - ```yaml - apiVersion: "serving.kserve.io/v1beta1" - kind: "InferenceService" - metadata: - name: "lightgbm-v2-iris" - spec: - predictor: - lightgbm: - protocolVersion: v2 - storageUri: "gs://kfserving-examples/models/lightgbm/v2/iris" - ``` -=== "New Schema" +=== "Schema" ```yaml apiVersion: "serving.kserve.io/v1beta1" @@ -196,9 +180,12 @@ To deploy the LightGBM model with V2 inference protocol, you need to set the **` model: modelFormat: name: lightgbm + runtime: kserve-lgbserver protocolVersion: v2 storageUri: "gs://kfserving-examples/models/lightgbm/v2/iris" ``` +!!! Note + For `V2 protocol (open inference protocol)` if `runtime` field is not provided then, by default `mlserver` runtime is used. Apply the InferenceService yaml to get the REST endpoint === "kubectl" @@ -213,7 +200,7 @@ kubectl apply -f lightgbm-v2.yaml $ inferenceservice.serving.kserve.io/lightgbm-v2-iris created ``` -### Test the deployed model with curl +#### Test the deployed model with curl You can now test your deployed model by sending a sample request. @@ -276,24 +263,7 @@ curl -v \ ### Create the InferenceService with gRPC endpoint Create the inference service yaml and expose the gRPC port, currently only one port is allowed to expose either HTTP or gRPC port and by default HTTP port is exposed. -=== "Old Schema" - - ```yaml - apiVersion: "serving.kserve.io/v1beta1" - kind: "InferenceService" - metadata: - name: "lightgbm-v2-iris" - spec: - predictor: - lightgbm: - protocolVersion: v2 - storageUri: "gs://kfserving-examples/models/lightgbm/v2/iris" - ports: - - name: h2c - protocol: TCP - containerPort: 9000 - ``` -=== "New Schema" +=== "Yaml" ```yaml apiVersion: "serving.kserve.io/v1beta1" @@ -306,12 +276,15 @@ Create the inference service yaml and expose the gRPC port, currently only one p modelFormat: name: lightgbm protocolVersion: v2 + runtime: kserve-lgbserver storageUri: "gs://kfserving-examples/models/lightgbm/v2/iris" ports: - - name: h2c - protocol: TCP - containerPort: 9000 + - name: h2c + protocol: TCP + containerPort: 8081 ``` +!!! Note + For `V2 protocol (open inference protocol)` if `runtime` field is not provided then, by default `mlserver` runtime is used. Apply the InferenceService yaml to get the gRPC endpoint === "kubectl" @@ -320,7 +293,7 @@ Apply the InferenceService yaml to get the gRPC endpoint kubectl apply -f lightgbm-v2-grpc.yaml ``` -### Test the deployed model with grpcurl +#### Test the deployed model with grpcurl After the gRPC `InferenceService` becomes ready, [grpcurl](https://github.com/fullstorydev/grpcurl), can be used to send gRPC requests to the `InferenceService`. 
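+
+For instance, a readiness check is a quick way to confirm that the gRPC endpoint is reachable before sending full inference requests. The sketch below is only illustrative: it assumes the Open Inference Protocol proto definition has been saved locally as `open_inference_grpc.proto` (the file name is an assumption) and that `SERVICE_HOSTNAME`, `INGRESS_HOST` and `INGRESS_PORT` are set the same way as in the REST examples above.
+
+```bash
+# Illustrative sketch: call ModelReady on the Open Inference Protocol gRPC service.
+# open_inference_grpc.proto is assumed to be a local copy of the protocol definition.
+grpcurl -vv -plaintext \
+  -proto open_inference_grpc.proto \
+  -authority "${SERVICE_HOSTNAME}" \
+  -d '{"name": "lightgbm-v2-iris-grpc"}' \
+  "${INGRESS_HOST}:${INGRESS_PORT}" \
+  inference.GRPCInferenceService.ModelReady
+```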
diff --git a/docs/modelserving/v1beta1/sklearn/v2/README.md b/docs/modelserving/v1beta1/sklearn/v2/README.md index 1a5e547d7..b42e54253 100644 --- a/docs/modelserving/v1beta1/sklearn/v2/README.md +++ b/docs/modelserving/v1beta1/sklearn/v2/README.md @@ -4,10 +4,9 @@ This example walks you through how to deploy a `scikit-learn` model leveraging the `v1beta1` version of the `InferenceService` CRD. Note that, by default the `v1beta1` version will expose your model through an API compatible with the existing V1 Dataplane. -However, this example will show you how to serve a model through an API -compatible with the new [V2 Dataplane](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2). +This example will show you how to serve a model through [Open Inference Protocol](https://github.com/kserve/open-inference-protocol). -## Training +## Train the Model The first step will be to train a sample `scikit-learn` model. Note that this model will be then saved as `model.joblib`. @@ -26,79 +25,49 @@ clf.fit(X, y) dump(clf, 'model.joblib') ``` -## Testing locally +## Test the Model locally -Once you've got your model serialised `model.joblib`, we can then use -[MLServer](https://github.com/SeldonIO/MLServer) to spin up a local server. -For more details on MLServer, feel free to check the [SKLearn example doc](https://github.com/SeldonIO/MLServer/blob/master/docs/examples/sklearn/README.md). +Once you've got your model serialised `model.joblib`, we can then use [KServe Sklearn Server](https://github.com/kserve/kserve/tree/master/python/sklearnserver) to spin up a local server. !!! Note - this step is optional and just meant for testing, feel free to jump straight to [deploying with InferenceService](#deploy-with-inferenceservice). + This step is optional and just meant for testing, feel free to jump straight to [deploying with InferenceService](#deploy-with-inferenceservice). -### Pre-requisites +### Using KServe SklearnServer -Firstly, to use MLServer locally, you will first need to install the `mlserver` -package in your local environment, as well as the SKLearn runtime. +#### Pre-requisites -```bash -pip install mlserver mlserver-sklearn -``` - -### Model settings - -The next step will be providing some model settings so that -MLServer knows: +Firstly, to use KServe sklearn server locally, you will first need to install the `sklearnserver` +runtime package in your local environment. -- The inference runtime to serve your model (i.e. `mlserver_sklearn.SKLearnModel`) -- The model's name and version - -These can be specified through environment variables or by creating a local -`model-settings.json` file: - -```json -{ - "name": "sklearn-iris", - "version": "v1.0.0", - "implementation": "mlserver_sklearn.SKLearnModel" -} -``` +1. Clone the KServe repository and navigate into the directory. + ```bash + git clone https://github.com/kserve/kserve + ``` +2. Install `sklearnserver` runtime. Kserve uses [Poetry](https://python-poetry.org/) as the dependency management tool. Make sure you have already [installed poetry](https://python-poetry.org/docs/#installation). + ```bash + cd python/sklearnserver + poetry install + ``` +#### Serving model locally -Note that, when you [deploy your model](#deployment), **KServe will already -inject some sensible defaults** so that it runs out-of-the-box without any -further configuration. -However, you can still override these defaults by providing a -`model-settings.json` file similar to your local one. 
-You can even provide a [set of `model-settings.json` files to load multiple -models](https://github.com/SeldonIO/MLServer/tree/master/docs/examples/mms). +The `sklearnserver` package takes two arguments. -### Serving model locally +- `--model_dir`: The model directory path where the model is stored. +- `--model_name`: The name of the model deployed in the model server, the default value is `model`. This is optional. -With the `mlserver` package installed locally and a local `model-settings.json` -file, you should now be ready to start our server as: +With the `sklearnserver` runtime package installed locally, you should now be ready to start our server as: ```bash -mlserver start . +python3 sklearnserver --model_dir /path/to/model_dir --model_name sklearn-irisv2 ``` -## Deploy with InferenceService +## Deploy the Model with InferenceService -Lastly, you will use KServe to deploy the trained model. +Lastly, you will use KServe to deploy the trained model onto Kubernetes. For this, you will just need to use **version `v1beta1`** of the `InferenceService` CRD and set the **`protocolVersion` field to `v2`**. -=== "Old Schema" - ```yaml - apiVersion: "serving.kserve.io/v1beta1" - kind: "InferenceService" - metadata: - name: "sklearn-irisv2" - spec: - predictor: - sklearn: - protocolVersion: "v2" - storageUri: "gs://seldon-models/sklearn/mms/lr_model" - ``` -=== "New Schema" +=== "Yaml" ```yaml apiVersion: "serving.kserve.io/v1beta1" kind: "InferenceService" @@ -109,30 +78,24 @@ For this, you will just need to use **version `v1beta1`** of the model: modelFormat: name: sklearn - runtime: kserve-mlserver - storageUri: "gs://seldon-models/sklearn/mms/lr_model" + protocolVersion: v2 + runtime: kserve-sklearnserver + storageUri: "gs://kfserving-examples/models/sklearn/1.0/model" ``` -Note that this makes the following assumptions: - -- Your model weights (i.e. your `model.joblib` file) have already been uploaded - to a "model repository" (GCS in this example) and can be accessed as - `gs://seldon-models/sklearn/mms/lr_model`. -- There is a K8s cluster available, accessible through `kubectl`. -- KServe has already been [installed in your cluster](../../../../get_started/README.md). - +!!! Note + For `V2 protocol (open inference protocol)` if `runtime` field is not provided then, by default `mlserver` runtime is used. === "kubectl" ```bash kubectl apply -f ./sklearn.yaml ``` -## Testing deployed model +## Test the Deployed Model You can now test your deployed model by sending a sample request. -Note that this request **needs to follow the [V2 Dataplane -protocol](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2)**. +Note that this request **needs to follow the [Open Inference Protocol](https://github.com/kserve/open-inference-protocol)**. You can see an example payload below: ```json diff --git a/docs/modelserving/v1beta1/xgboost/README.md b/docs/modelserving/v1beta1/xgboost/README.md index 864441652..8fc913b78 100644 --- a/docs/modelserving/v1beta1/xgboost/README.md +++ b/docs/modelserving/v1beta1/xgboost/README.md @@ -1,13 +1,10 @@ # Deploying XGBoost models with InferenceService -This example walks you through how to deploy a `xgboost` model leveraging the -`v1beta1` version of the `InferenceService` CRD. -Note that, by default the `v1beta1` version will expose your model through an -API compatible with the existing V1 Dataplane. 
-However, this example will show you how to serve a model through an API
-compatible with the new [V2 Dataplane](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2).
+This example walks you through how to deploy an `xgboost` model using KServe's `InferenceService` CRD.
+Note that, by default, it exposes your model through an API compatible with the existing V1 Dataplane. This example will show you how to serve a model through an API
+compatible with the [Open Inference Protocol](https://github.com/kserve/open-inference-protocol).
 
-## Training
+## Train the Model
 
 The first step will be to train a sample `xgboost` model.
 We will save this model as `model.bst`.
 
@@ -36,80 +33,47 @@ model_file = os.path.join((model_dir), BST_FILE)
 xgb_model.save_model(model_file)
 ```
 
-## Testing locally
+### Test the model locally
+Once you've got your model serialized `model.bst`, we can then use [KServe XGBoost Server](https://github.com/kserve/kserve/tree/master/python/xgbserver) to spin up a local server.
 
-Once we've got our `model.bst` model serialised, we can then use
-[MLServer](https://github.com/SeldonIO/MLServer) to spin up a local server.
-For more details on MLServer, feel free to check the [XGBoost example in their
-docs](https://github.com/SeldonIO/MLServer/tree/master/docs/examples/xgboost).
+!!! Note
+    This step is optional and just meant for testing, feel free to jump straight to [deploying with InferenceService](#deploy-with-inferenceservice).
 
-> Note that this step is optional and just meant for testing.
-> Feel free to jump straight to [deploying your trained model](#deployment).
 
-### Pre-requisites
+#### Pre-requisites
+To use the KServe XGBoost server locally, you will first need to install the `xgbserver` runtime package in your local environment.
 
-Firstly, to use MLServer locally, you will first need to install the `mlserver`
-package in your local environment as well as the XGBoost runtime.
-
-```bash
-pip install mlserver mlserver-xgboost
-```
-
-### Model settings
+1. Clone the KServe repository and navigate into the directory.
+    ```bash
+    git clone https://github.com/kserve/kserve
+    ```
+2. Install the `xgbserver` runtime. KServe uses [Poetry](https://python-poetry.org/) as the dependency management tool. Make sure you have already [installed poetry](https://python-poetry.org/docs/#installation).
+    ```bash
+    cd python/xgbserver
+    poetry install
+    ```
+
+#### Serving model locally
 
-The next step will be providing some model settings so that
-MLServer knows:
+The `xgbserver` package takes three arguments.
 
-- The inference runtime that we want our model to use (i.e.
-  `mlserver_xgboost.XGBoostModel`)
-- Our model's name and version
+- `--model_dir`: The model directory path where the model is stored.
+- `--model_name`: The name of the model deployed in the model server; the default value is `model`. This is optional.
+- `--nthread`: Number of threads to be used by XGBoost. This is optional and the default value is 1.
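+
+Once the server is running (the start command is shown in the next step), you can send a quick smoke test against it. The snippet below is only a sketch: it assumes the server listens on the default HTTP port 8080, serves the standard KServe V1 REST prediction endpoint, and that the model was registered as `xgboost-iris`.
+
+```bash
+# Illustrative local smoke test against the V1 prediction endpoint (default port 8080).
+curl -H "Content-Type: application/json" \
+  -d '{"instances": [[6.8, 2.8, 4.8, 1.4]]}' \
+  http://localhost:8080/v1/models/xgboost-iris:predict
+```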
-These can be specified through environment variables or by creating a local -`model-settings.json` file: +With the `xgbserver` runtime package installed locally, you should now be ready to start our server as: -```json -{ - "name": "xgboost-iris", - "version": "v1.0.0", - "implementation": "mlserver_xgboost.XGBoostModel" -} +```bash +python3 xgbserver --model_dir /path/to/model_dir --model_name xgboost-iris ``` -Note that, when we [deploy our model](#deployment), **KServe will already -inject some sensible defaults** so that it runs out-of-the-box without any -further configuration. -However, you can still override these defaults by providing a -`model-settings.json` file similar to your local one. -You can even provide a [set of `model-settings.json` files to load multiple -models](https://github.com/SeldonIO/MLServer/tree/master/docs/examples/mms). -### Serving our model locally +## Deploy the Model with InferenceService -With the `mlserver` package installed locally and a local `model-settings.json` -file, we should now be ready to start our server as: +Lastly, we use KServe to deploy our trained model on Kubernetes. +For this, we use the `InferenceService` CRD and set the **`protocolVersion` field to `v2`**. -```bash -mlserver start . -``` +=== "Yaml" -## Deploy with InferenceService - -Lastly, we will use KServe to deploy our trained model. -For this, we will just need to use **version `v1beta1`** of the -`InferenceService` CRD and set the the **`protocolVersion` field to `v2`**. -=== "Old Schema" - ```yaml - apiVersion: "serving.kserve.io/v1beta1" - kind: "InferenceService" - metadata: - name: "xgboost-iris" - spec: - predictor: - xgboost: - protocolVersion: "v2" - storageUri: "gs://kfserving-examples/models/xgboost/iris" - ``` -=== "New Schema" ```yaml apiVersion: "serving.kserve.io/v1beta1" kind: "InferenceService" @@ -120,31 +84,24 @@ For this, we will just need to use **version `v1beta1`** of the model: modelFormat: name: xgboost - runtime: kserve-mlserver + protocolVersion: v2 + runtime: kserve-xgbserver storageUri: "gs://kfserving-examples/models/xgboost/iris" ``` -Note that this makes the following assumptions: - -- Your model weights (i.e. your `model.bst` file) have already been uploaded - to a "model repository" (GCS in this example) and can be accessed as - `gs://kfserving-examples/models/xgboost/iris`. -- There is a K8s cluster available, accessible through `kubectl`. -- KServe has already been [installed in your - cluster](../../../get_started/README.md#4-Install-kserve). +!!! Note + For `V2 protocol (open inference protocol)` if `runtime` field is not provided then, by default `mlserver` runtime is used. -Assuming that we've got a cluster accessible through `kubectl` with KServe -already installed, we can deploy our model as: +Assuming that we've got a cluster accessible through `kubectl` with KServe already installed, we can deploy our model as: ```bash kubectl apply -f xgboost.yaml ``` -## Testing deployed model +## Test the Deployed Model We can now test our deployed model by sending a sample request. -Note that this request **needs to follow the [V2 Dataplane -protocol](https://github.com/kserve/kserve/tree/master/docs/predict-api/v2)**. +Note that this request **needs to follow the [Open Inference Protocol](https://github.com/kserve/open-inference-protocol)**. You can see an example payload below: ```json