Standardized schema order (kserve#318)
* Standardized schema order.

Signed-off-by: Andrews Arokiam <[email protected]>

* Fix v2 spec for TorchServe

---------

Signed-off-by: Andrews Arokiam <[email protected]>
Signed-off-by: Dan Sun <[email protected]>
Co-authored-by: Dan Sun <[email protected]>
andyi2it and yuzisun authored Jan 27, 2024
1 parent 03c546c commit fc1ba3e
Showing 11 changed files with 156 additions and 153 deletions.
16 changes: 8 additions & 8 deletions docs/admin/serverless/kourier_networking/README.md
@@ -63,7 +63,7 @@ Please refer to the [Serverless Installation Guide](../serverless.md) and change

### Create the InferenceService

=== "Old Schema"
=== "New Schema"

    ```yaml
    apiVersion: "serving.kserve.io/v1beta1"
@@ -72,10 +72,12 @@ Please refer to the [Serverless Installation Guide](../serverless.md) and change
      name: "pmml-demo"
    spec:
      predictor:
        pmml:
          storageUri: gs://kfserving-examples/models/pmml
        model:
          modelFormat:
            name: pmml
          storageUri: "gs://kfserving-examples/models/pmml"
    ```
=== "New Schema"
=== "Old Schema"

    ```yaml
    apiVersion: "serving.kserve.io/v1beta1"
@@ -84,10 +86,8 @@ Please refer to the [Serverless Installation Guide](../serverless.md) and change
      name: "pmml-demo"
    spec:
      predictor:
        model:
          modelFormat:
            name: pmml
          storageUri: "gs://kfserving-examples/models/pmml"
        pmml:
          storageUri: gs://kfserving-examples/models/pmml
    ```

```bash
113 changes: 57 additions & 56 deletions docs/modelserving/autoscaling/autoscaling.md
@@ -9,35 +9,35 @@ Apply the tensorflow example CR with scaling target set to 1. Annotation `autosc
The `scaleTarget` and `scaleMetric` fields were introduced in KServe version 0.9 and are available in both the new and the old schema.
This is the preferred way of defining autoscaling options.

=== "Old Schema"
=== "New Schema"

    ```yaml
    apiVersion: "serving.kserve.io/v1beta1"
    kind: "InferenceService"
    metadata:
      name: "flowers-sample"
      annotations:
        autoscaling.knative.dev/target: "1"
    spec:
      predictor:
        tensorflow:
        scaleTarget: 1
        scaleMetric: concurrency
        model:
          modelFormat:
            name: tensorflow
          storageUri: "gs://kfserving-examples/models/tensorflow/flowers"
    ```

=== "New Schema"
=== "Old Schema"

    ```yaml
    apiVersion: "serving.kserve.io/v1beta1"
    kind: "InferenceService"
    metadata:
      name: "flowers-sample"
      annotations:
        autoscaling.knative.dev/target: "1"
    spec:
      predictor:
        scaleTarget: 1
        scaleMetric: concurrency
        model:
          modelFormat:
            name: tensorflow
        tensorflow:
          storageUri: "gs://kfserving-examples/models/tensorflow/flowers"
    ```
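
The guide applies this manifest and drives load with `hey`; a minimal end-to-end sketch of that flow is shown below. The `autoscale.yaml` and `input.json` file names and the pod label selector are assumptions for illustration, and `INGRESS_HOST`, `INGRESS_PORT` and `SERVICE_HOSTNAME` are expected to be set as elsewhere in this guide.

```bash
# Create the InferenceService and wait for it to become ready
kubectl apply -f autoscale.yaml
kubectl wait --for=condition=Ready inferenceservice/flowers-sample --timeout=300s

# Drive concurrent traffic so the Knative autoscaler scales toward the target of 1
hey -z 30s -c 5 -m POST -host ${SERVICE_HOSTNAME} -D ./input.json \
  http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/flowers-sample:predict

# In a second terminal, watch the predictor pods scale out while the load runs
kubectl get pods -l serving.kserve.io/inferenceservice=flowers-sample -w
```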

@@ -238,7 +238,8 @@ Autoscaling on GPU is hard with GPU metrics, however thanks to Knative's concurr
### Create the InferenceService with GPU resource

Apply the tensorflow gpu example CR
=== "Old Schema"

=== "New Schema"

    ```yaml
    apiVersion: "serving.kserve.io/v1beta1"
@@ -247,15 +248,17 @@ Apply the tensorflow gpu example CR
      name: "flowers-sample-gpu"
    spec:
      predictor:
        tensorflow:
        model:
          modelFormat:
            name: tensorflow
          storageUri: "gs://kfserving-examples/models/tensorflow/flowers"
          runtimeVersion: "2.6.2-gpu"
          resources:
            limits:
              nvidia.com/gpu: 1
    ```

=== "New Schema"
=== "Old Schema"

    ```yaml
    apiVersion: "serving.kserve.io/v1beta1"
@@ -264,9 +267,7 @@ Apply the tensorflow gpu example CR
      name: "flowers-sample-gpu"
    spec:
      predictor:
        model:
          modelFormat:
            name: tensorflow
        tensorflow:
          storageUri: "gs://kfserving-examples/models/tensorflow/flowers"
          runtimeVersion: "2.6.2-gpu"
          resources:
@@ -350,7 +351,7 @@ hey -z 30s -c 5 -m POST -host ${SERVICE_HOSTNAME} -D $INPUT_PATH http://${INGRES
at any given time. It is a hard limit: if concurrency reaches it, surplus requests are buffered and must wait until
enough capacity is free to execute them.

=== "Old Schema"
=== "New Schema"

    ```yaml
    apiVersion: "serving.kserve.io/v1beta1"
@@ -360,11 +361,13 @@ enough capacity is free to execute the requests.
    spec:
      predictor:
        containerConcurrency: 10
        tensorflow:
        model:
          modelFormat:
            name: tensorflow
          storageUri: "gs://kfserving-examples/models/tensorflow/flowers"
    ```

=== "New Schema"
=== "Old Schema"

    ```yaml
    apiVersion: "serving.kserve.io/v1beta1"
@@ -374,9 +377,7 @@ enough capacity is free to execute the requests.
    spec:
      predictor:
        containerConcurrency: 10
        model:
          modelFormat:
            name: tensorflow
        tensorflow:
          storageUri: "gs://kfserving-examples/models/tensorflow/flowers"
    ```
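
One rough way to see the hard limit in action is to send more concurrent requests than `containerConcurrency` allows and watch the surplus get queued; a sketch follows, assuming the service is still named `flowers-sample` and that an `input.json` payload exists.

```bash
kubectl apply -f autoscale-custom.yaml

# 30 concurrent workers against a hard limit of 10: surplus requests are buffered
# until capacity frees up, so pods scale out and tail latencies in the hey report grow
hey -z 30s -c 30 -m POST -host ${SERVICE_HOSTNAME} -D ./input.json \
  http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/flowers-sample:predict
```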

@@ -392,7 +393,7 @@ kubectl apply -f autoscale-custom.yaml
KServe sets `minReplicas` to 1 by default. If you want to enable scaling down to zero, especially for use cases like serving on GPUs, you can
set `minReplicas` to 0 so that the pods automatically scale down to zero when no traffic is received.

=== "Old Schema"
=== "New Schema"

    ```yaml
    apiVersion: "serving.kserve.io/v1beta1"
@@ -402,11 +403,13 @@ set `minReplicas` to 0 so that the pods automatically scale down to zero when no
    spec:
      predictor:
        minReplicas: 0
        tensorflow:
        model:
          modelFormat:
            name: tensorflow
          storageUri: "gs://kfserving-examples/models/tensorflow/flowers"
    ```

=== "New Schema"
=== "Old Schema"

    ```yaml
    apiVersion: "serving.kserve.io/v1beta1"
@@ -416,9 +419,7 @@ set `minReplicas` to 0 so that the pods automatically scale down to zero when no
    spec:
      predictor:
        minReplicas: 0
        model:
          modelFormat:
            name: tensorflow
        tensorflow:
          storageUri: "gs://kfserving-examples/models/tensorflow/flowers"
    ```
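
A quick way to verify scale-to-zero is to apply the manifest, leave the service idle, and watch the predictor pods drain; the label selector and service name below are assumptions.

```bash
kubectl apply -f scale-down-to-zero.yaml

# With no incoming traffic the predictor scales to zero after the Knative
# scale-to-zero window elapses (roughly a minute with default settings)
kubectl get pods -l serving.kserve.io/inferenceservice=flowers-sample -w
```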

@@ -434,34 +435,6 @@ kubectl apply -f scale-down-to-zero.yaml
Autoscaling options can also be configured at the component level, which allows more flexibility in the autoscaling configuration.
In a typical deployment, a transformer may require a different autoscaling configuration than a predictor, and this feature allows the user to scale each component independently.

=== "Old Schema"

    ```yaml
    apiVersion: serving.kserve.io/v1beta1
    kind: InferenceService
    metadata:
      name: torch-transformer
    spec:
      predictor:
        scaleTarget: 2
        scaleMetric: concurrency
        pytorch:
          storageUri: gs://kfserving-examples/models/torchserve/image_classifier
      transformer:
        scaleTarget: 8
        scaleMetric: rps
        containers:
          - image: kserve/image-transformer:latest
            name: kserve-container
            command:
              - "python"
              - "-m"
              - "model"
            args:
              - --model_name
              - mnist
    ```

=== "New Schema"

    ```yaml
@@ -491,5 +464,33 @@ This allows more flexibility in terms of the autoscaling configuration. In a typ
              - --model_name
              - mnist
    ```

=== "Old Schema"

    ```yaml
    apiVersion: serving.kserve.io/v1beta1
    kind: InferenceService
    metadata:
      name: torch-transformer
    spec:
      predictor:
        scaleTarget: 2
        scaleMetric: concurrency
        pytorch:
          storageUri: gs://kfserving-examples/models/torchserve/image_classifier
      transformer:
        scaleTarget: 8
        scaleMetric: rps
        containers:
          - image: kserve/image-transformer:latest
            name: kserve-container
            command:
              - "python"
              - "-m"
              - "model"
            args:
              - --model_name
              - mnist
    ```
Apply `autoscale-adv.yaml` to create the autoscaled InferenceService.
The default `scaleMetric` is `concurrency`, and the possible values are `concurrency`, `rps`, `cpu` and `memory`.
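
A sketch of applying the manifest and confirming that the predictor and transformer scale independently is shown below; the `component` label used to distinguish the pods is an assumption based on common KServe labels.

```bash
kubectl apply -f autoscale-adv.yaml

# Predictor and transformer are backed by separate deployments, so their
# replica counts can diverge under load
kubectl get pods -l serving.kserve.io/inferenceservice=torch-transformer -L component -w
```
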
12 changes: 6 additions & 6 deletions docs/modelserving/batcher/batcher.md
@@ -19,7 +19,7 @@ This batcher is implemented in the KServe model agent sidecar, so the requests f

We first create a pytorch predictor with a batcher. The `maxLatency` is set to a large value (500 milliseconds) so that we can observe the batching process.

=== "Old Schema"
=== "New Schema"

    ```yaml
    apiVersion: serving.kserve.io/v1beta1
@@ -33,11 +33,13 @@ We first create a pytorch predictor with a batcher. The `maxLatency` is set to a
        batcher:
          maxBatchSize: 32
          maxLatency: 500
        pytorch:
        model:
          modelFormat:
            name: pytorch
          storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v1
    ```

=== "New Schema"
=== "Old Schema"

    ```yaml
    apiVersion: serving.kserve.io/v1beta1
@@ -51,9 +53,7 @@ We first create a pytorch predictor with a batcher. The `maxLatency` is set to a
        batcher:
          maxBatchSize: 32
          maxLatency: 500
        model:
          modelFormat:
            name: pytorch
        pytorch:
          storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v1
    ```
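
To observe batching, one rough sketch is to fire several concurrent requests inside the 500 ms `maxLatency` window so the model agent can group them into a batch; the `pytorch-batcher.yaml` and `mnist.json` file names and the `mnist` model name are assumptions for illustration.

```bash
kubectl apply -f pytorch-batcher.yaml

# Concurrent requests arriving within maxLatency are grouped by the agent sidecar
# into a single batch of up to maxBatchSize before reaching the predictor
hey -z 10s -c 8 -m POST -host ${SERVICE_HOSTNAME} -D ./mnist.json \
  http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/mnist:predict
```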

24 changes: 12 additions & 12 deletions docs/modelserving/logger/logger.md
@@ -28,7 +28,7 @@ kubectl create -f message-dumper.yaml

Create a sklearn predictor with a logger that points at the message dumper URL.

=== "Old Schema"
=== "New Schema"

    ```yaml
    apiVersion: serving.kserve.io/v1beta1
@@ -40,11 +40,13 @@ Create a sklearn predictor with the logger which points at the message dumper ur
        logger:
          mode: all
          url: http://message-dumper.default/
        sklearn:
        model:
          modelFormat:
            name: sklearn
          storageUri: gs://kfserving-examples/models/sklearn/1.0/model
    ```

=== "New Schema"
=== "Old Schema"

    ```yaml
    apiVersion: serving.kserve.io/v1beta1
@@ -56,9 +58,7 @@ Create a sklearn predictor with the logger which points at the message dumper ur
        logger:
          mode: all
          url: http://message-dumper.default/
        model:
          modelFormat:
            name: sklearn
        sklearn:
          storageUri: gs://kfserving-examples/models/sklearn/1.0/model
    ```
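
Once requests have been sent, the logged CloudEvents can be inspected on the message dumper; a minimal sketch using the standard Knative service label is shown below (treat the exact selector as an assumption).

```bash
# The message dumper prints every CloudEvent it receives: one for each
# inference request and one for each response
kubectl logs -l serving.knative.dev/service=message-dumper -c user-container
```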

@@ -233,7 +233,7 @@ kubectl create -f trigger.yaml

Create a sklearn predictor with the logger `url` pointing to the Knative Eventing multi-tenant broker in the `knative-eventing` namespace.

=== "Old Schema"
=== "New Schema"

    ```yaml
    apiVersion: serving.kserve.io/v1beta1
@@ -246,11 +246,13 @@ Create a sklearn predictor with the `logger url` pointing to the Knative eventin
        logger:
          mode: all
          url: http://broker-ingress.knative-eventing.svc.cluster.local/default/default
        sklearn:
        model:
          modelFormat:
            name: sklearn
          storageUri: gs://kfserving-examples/models/sklearn/1.0/model
    ```

=== "New Schema"
=== "Old Schema"

    ```yaml
    apiVersion: serving.kserve.io/v1beta1
@@ -263,9 +265,7 @@ Create a sklearn predictor with the `logger url` pointing to the Knative eventin
        logger:
          mode: all
          url: http://broker-ingress.knative-eventing.svc.cluster.local/default/default
        model:
          modelFormat:
            name: sklearn
        sklearn:
          storageUri: gs://kfserving-examples/models/sklearn/1.0/model
    ```
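
Before sending traffic it is worth confirming that the broker and trigger are ready, otherwise the logged events never reach the subscriber; a minimal check is shown below, assuming both were created in the `default` namespace as in the manifests referenced above.

```bash
# Both resources should report READY=True before inference requests are logged
kubectl get broker,trigger -n default
```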
