Standardized schema order (kserve#318)
* Standardized schema order.

Signed-off-by: Andrews Arokiam <[email protected]>

* Fix v2 spec for TorchServe

---------

Signed-off-by: Andrews Arokiam <[email protected]>
Signed-off-by: Dan Sun <[email protected]>
Co-authored-by: Dan Sun <[email protected]>
andyi2it and yuzisun authored Jan 27, 2024
1 parent 03c546c commit fc1ba3e
Showing 11 changed files with 156 additions and 153 deletions.
16 changes: 8 additions & 8 deletions docs/admin/serverless/kourier_networking/README.md
@@ -63,7 +63,7 @@ Please refer to the [Serverless Installation Guide](../serverless.md) and change

### Create the InferenceService

=== "Old Schema"
=== "New Schema"

    ```yaml
    apiVersion: "serving.kserve.io/v1beta1"
@@ -72,10 +72,12 @@ Please refer to the [Serverless Installation Guide](../serverless.md) and change
      name: "pmml-demo"
    spec:
      predictor:
        pmml:
          storageUri: gs://kfserving-examples/models/pmml
        model:
          modelFormat:
            name: pmml
          storageUri: "gs://kfserving-examples/models/pmml"
    ```
=== "New Schema"
=== "Old Schema"

    ```yaml
    apiVersion: "serving.kserve.io/v1beta1"
@@ -84,10 +86,8 @@ Please refer to the [Serverless Installation Guide](../serverless.md) and change
      name: "pmml-demo"
    spec:
      predictor:
        model:
          modelFormat:
            name: pmml
          storageUri: "gs://kfserving-examples/models/pmml"
        pmml:
          storageUri: gs://kfserving-examples/models/pmml
    ```

```bash
113 changes: 57 additions & 56 deletions docs/modelserving/autoscaling/autoscaling.md
@@ -9,35 +9,35 @@ Apply the tensorflow example CR with scaling target set to 1. Annotation `autosc
The `scaleTarget` and `scaleMetric` fields were introduced in KServe version 0.9 and are available in both the new and the old schema.
This is the preferred way of defining autoscaling options.

=== "Old Schema"
=== "New Schema"

    ```yaml
    apiVersion: "serving.kserve.io/v1beta1"
    kind: "InferenceService"
    metadata:
      name: "flowers-sample"
      annotations:
        autoscaling.knative.dev/target: "1"
    spec:
      predictor:
        tensorflow:
        scaleTarget: 1
        scaleMetric: concurrency
        model:
          modelFormat:
            name: tensorflow
          storageUri: "gs://kfserving-examples/models/tensorflow/flowers"
    ```

=== "New Schema"
=== "Old Schema"

    ```yaml
    apiVersion: "serving.kserve.io/v1beta1"
    kind: "InferenceService"
    metadata:
      name: "flowers-sample"
      annotations:
        autoscaling.knative.dev/target: "1"
    spec:
      predictor:
        scaleTarget: 1
        scaleMetric: concurrency
        model:
          modelFormat:
            name: tensorflow
        tensorflow:
          storageUri: "gs://kfserving-examples/models/tensorflow/flowers"
    ```
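
The guide applies this manifest and drives load with `hey`; a minimal end-to-end sketch of that flow is shown below. The `autoscale.yaml` and `input.json` file names and the pod label selector are assumptions for illustration, and `INGRESS_HOST`, `INGRESS_PORT` and `SERVICE_HOSTNAME` are expected to be set as elsewhere in this guide.

```bash
# Create the InferenceService and wait for it to become ready
kubectl apply -f autoscale.yaml
kubectl wait --for=condition=Ready inferenceservice/flowers-sample --timeout=300s

# Drive concurrent traffic so the Knative autoscaler scales toward the target of 1
hey -z 30s -c 5 -m POST -host ${SERVICE_HOSTNAME} -D ./input.json \
  http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/flowers-sample:predict

# In a second terminal, watch the predictor pods scale out while the load runs
kubectl get pods -l serving.kserve.io/inferenceservice=flowers-sample -w
```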

@@ -238,7 +238,8 @@ Autoscaling on GPU is hard with GPU metrics, however thanks to Knative's concurr
### Create the InferenceService with GPU resource

Apply the tensorflow gpu example CR
=== "Old Schema"

=== "New Schema"

    ```yaml
    apiVersion: "serving.kserve.io/v1beta1"
@@ -247,15 +248,17 @@ Apply the tensorflow gpu example CR
      name: "flowers-sample-gpu"
    spec:
      predictor:
        tensorflow:
        model:
          modelFormat:
            name: tensorflow
          storageUri: "gs://kfserving-examples/models/tensorflow/flowers"
          runtimeVersion: "2.6.2-gpu"
          resources:
            limits:
              nvidia.com/gpu: 1
    ```

=== "New Schema"
=== "Old Schema"

    ```yaml
    apiVersion: "serving.kserve.io/v1beta1"
@@ -264,9 +267,7 @@ Apply the tensorflow gpu example CR
      name: "flowers-sample-gpu"
    spec:
      predictor:
        model:
          modelFormat:
            name: tensorflow
        tensorflow:
          storageUri: "gs://kfserving-examples/models/tensorflow/flowers"
          runtimeVersion: "2.6.2-gpu"
          resources:
@@ -350,7 +351,7 @@ hey -z 30s -c 5 -m POST -host ${SERVICE_HOSTNAME} -D $INPUT_PATH http://${INGRES
at any given time. It is a hard limit: if concurrency reaches it, surplus requests are buffered and must wait until
enough capacity is free to execute them.

=== "Old Schema"
=== "New Schema"

    ```yaml
    apiVersion: "serving.kserve.io/v1beta1"
@@ -360,11 +361,13 @@ enough capacity is free to execute the requests.
    spec:
      predictor:
        containerConcurrency: 10
        tensorflow:
        model:
          modelFormat:
            name: tensorflow
          storageUri: "gs://kfserving-examples/models/tensorflow/flowers"
    ```

=== "New Schema"
=== "Old Schema"

    ```yaml
    apiVersion: "serving.kserve.io/v1beta1"
@@ -374,9 +377,7 @@ enough capacity is free to execute the requests.
    spec:
      predictor:
        containerConcurrency: 10
        model:
          modelFormat:
            name: tensorflow
        tensorflow:
          storageUri: "gs://kfserving-examples/models/tensorflow/flowers"
    ```
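
One rough way to see the hard limit in action is to send more concurrent requests than `containerConcurrency` allows and watch the surplus get queued; a sketch follows, assuming the service is still named `flowers-sample` and that an `input.json` payload exists.

```bash
kubectl apply -f autoscale-custom.yaml

# 30 concurrent workers against a hard limit of 10: surplus requests are buffered
# until capacity frees up, so pods scale out and tail latencies in the hey report grow
hey -z 30s -c 30 -m POST -host ${SERVICE_HOSTNAME} -D ./input.json \
  http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/flowers-sample:predict
```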

@@ -392,7 +393,7 @@ kubectl apply -f autoscale-custom.yaml
KServe sets `minReplicas` to 1 by default. If you want to enable scaling down to zero, especially for use cases like serving on GPUs, you can
set `minReplicas` to 0 so that the pods automatically scale down to zero when no traffic is received.

=== "Old Schema"
=== "New Schema"

    ```yaml
    apiVersion: "serving.kserve.io/v1beta1"
@@ -402,11 +403,13 @@ set `minReplicas` to 0 so that the pods automatically scale down to zero when no
    spec:
      predictor:
        minReplicas: 0
        tensorflow:
        model:
          modelFormat:
            name: tensorflow
          storageUri: "gs://kfserving-examples/models/tensorflow/flowers"
    ```

=== "New Schema"
=== "Old Schema"

    ```yaml
    apiVersion: "serving.kserve.io/v1beta1"
@@ -416,9 +419,7 @@ set `minReplicas` to 0 so that the pods automatically scale down to zero when no
    spec:
      predictor:
        minReplicas: 0
        model:
          modelFormat:
            name: tensorflow
        tensorflow:
          storageUri: "gs://kfserving-examples/models/tensorflow/flowers"
    ```
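
A quick way to verify scale-to-zero is to apply the manifest, leave the service idle, and watch the predictor pods drain; the label selector and service name below are assumptions.

```bash
kubectl apply -f scale-down-to-zero.yaml

# With no incoming traffic the predictor scales to zero after the Knative
# scale-to-zero window elapses (roughly a minute with default settings)
kubectl get pods -l serving.kserve.io/inferenceservice=flowers-sample -w
```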

@@ -434,34 +435,6 @@ kubectl apply -f scale-down-to-zero.yaml
Autoscaling options can also be configured at the component level, which allows more flexibility in the autoscaling configuration.
In a typical deployment, a transformer may require a different autoscaling configuration than a predictor, and this feature allows the user to scale each component independently.

=== "Old Schema"

    ```yaml
    apiVersion: serving.kserve.io/v1beta1
    kind: InferenceService
    metadata:
      name: torch-transformer
    spec:
      predictor:
        scaleTarget: 2
        scaleMetric: concurrency
        pytorch:
          storageUri: gs://kfserving-examples/models/torchserve/image_classifier
      transformer:
        scaleTarget: 8
        scaleMetric: rps
        containers:
          - image: kserve/image-transformer:latest
            name: kserve-container
            command:
              - "python"
              - "-m"
              - "model"
            args:
              - --model_name
              - mnist
    ```

=== "New Schema"

    ```yaml
@@ -491,5 +464,33 @@ This allows more flexibility in terms of the autoscaling configuration. In a typ
              - --model_name
              - mnist
    ```

=== "Old Schema"

    ```yaml
    apiVersion: serving.kserve.io/v1beta1
    kind: InferenceService
    metadata:
      name: torch-transformer
    spec:
      predictor:
        scaleTarget: 2
        scaleMetric: concurrency
        pytorch:
          storageUri: gs://kfserving-examples/models/torchserve/image_classifier
      transformer:
        scaleTarget: 8
        scaleMetric: rps
        containers:
          - image: kserve/image-transformer:latest
            name: kserve-container
            command:
              - "python"
              - "-m"
              - "model"
            args:
              - --model_name
              - mnist
    ```
Apply `autoscale-adv.yaml` to create the autoscaled InferenceService.
The default `scaleMetric` is `concurrency`, and the possible values are `concurrency`, `rps`, `cpu` and `memory`.
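
A sketch of applying the manifest and confirming that the predictor and transformer scale independently is shown below; the `component` label used to distinguish the pods is an assumption based on common KServe labels.

```bash
kubectl apply -f autoscale-adv.yaml

# Predictor and transformer are backed by separate deployments, so their
# replica counts can diverge under load
kubectl get pods -l serving.kserve.io/inferenceservice=torch-transformer -L component -w
```
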
12 changes: 6 additions & 6 deletions docs/modelserving/batcher/batcher.md
@@ -19,7 +19,7 @@ This batcher is implemented in the KServe model agent sidecar, so the requests f

We first create a pytorch predictor with a batcher. The `maxLatency` is set to a large value (500 milliseconds) so that we can observe the batching process.

=== "Old Schema"
=== "New Schema"

    ```yaml
    apiVersion: serving.kserve.io/v1beta1
@@ -33,11 +33,13 @@ We first create a pytorch predictor with a batcher. The `maxLatency` is set to a
        batcher:
          maxBatchSize: 32
          maxLatency: 500
        pytorch:
        model:
          modelFormat:
            name: pytorch
          storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v1
    ```

=== "New Schema"
=== "Old Schema"

    ```yaml
    apiVersion: serving.kserve.io/v1beta1
@@ -51,9 +53,7 @@ We first create a pytorch predictor with a batcher. The `maxLatency` is set to a
        batcher:
          maxBatchSize: 32
          maxLatency: 500
        model:
          modelFormat:
            name: pytorch
        pytorch:
          storageUri: gs://kfserving-examples/models/torchserve/image_classifier/v1
    ```
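
To observe batching, one rough sketch is to fire several concurrent requests inside the 500 ms `maxLatency` window so the model agent can group them into a batch; the `pytorch-batcher.yaml` and `mnist.json` file names and the `mnist` model name are assumptions for illustration.

```bash
kubectl apply -f pytorch-batcher.yaml

# Concurrent requests arriving within maxLatency are grouped by the agent sidecar
# into a single batch of up to maxBatchSize before reaching the predictor
hey -z 10s -c 8 -m POST -host ${SERVICE_HOSTNAME} -D ./mnist.json \
  http://${INGRESS_HOST}:${INGRESS_PORT}/v1/models/mnist:predict
```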

24 changes: 12 additions & 12 deletions docs/modelserving/logger/logger.md
@@ -28,7 +28,7 @@ kubectl create -f message-dumper.yaml

Create a sklearn predictor with a logger that points at the message dumper URL.

=== "Old Schema"
=== "New Schema"

    ```yaml
    apiVersion: serving.kserve.io/v1beta1
@@ -40,11 +40,13 @@ Create a sklearn predictor with the logger which points at the message dumper ur
        logger:
          mode: all
          url: http://message-dumper.default/
        sklearn:
        model:
          modelFormat:
            name: sklearn
          storageUri: gs://kfserving-examples/models/sklearn/1.0/model
    ```

=== "New Schema"
=== "Old Schema"

    ```yaml
    apiVersion: serving.kserve.io/v1beta1
@@ -56,9 +58,7 @@ Create a sklearn predictor with the logger which points at the message dumper ur
        logger:
          mode: all
          url: http://message-dumper.default/
        model:
          modelFormat:
            name: sklearn
        sklearn:
          storageUri: gs://kfserving-examples/models/sklearn/1.0/model
    ```
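
Once requests have been sent, the logged CloudEvents can be inspected on the message dumper; a minimal sketch using the standard Knative service label is shown below (treat the exact selector as an assumption).

```bash
# The message dumper prints every CloudEvent it receives: one for each
# inference request and one for each response
kubectl logs -l serving.knative.dev/service=message-dumper -c user-container
```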

@@ -233,7 +233,7 @@ kubectl create -f trigger.yaml

Create a sklearn predictor with the logger `url` pointing to the Knative Eventing multi-tenant broker in the `knative-eventing` namespace.

=== "Old Schema"
=== "New Schema"

    ```yaml
    apiVersion: serving.kserve.io/v1beta1
@@ -246,11 +246,13 @@ Create a sklearn predictor with the `logger url` pointing to the Knative eventin
        logger:
          mode: all
          url: http://broker-ingress.knative-eventing.svc.cluster.local/default/default
        sklearn:
        model:
          modelFormat:
            name: sklearn
          storageUri: gs://kfserving-examples/models/sklearn/1.0/model
    ```

=== "New Schema"
=== "Old Schema"

    ```yaml
    apiVersion: serving.kserve.io/v1beta1
@@ -263,9 +265,7 @@ Create a sklearn predictor with the `logger url` pointing to the Knative eventin
        logger:
          mode: all
          url: http://broker-ingress.knative-eventing.svc.cluster.local/default/default
        model:
          modelFormat:
            name: sklearn
        sklearn:
          storageUri: gs://kfserving-examples/models/sklearn/1.0/model
    ```
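
Before sending traffic it is worth confirming that the broker and trigger are ready, otherwise the logged events never reach the subscriber; a minimal check is shown below, assuming both were created in the `default` namespace as in the manifests referenced above.

```bash
# Both resources should report READY=True before inference requests are logged
kubectl get broker,trigger -n default
```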
