Fix torchserve llm example link
Signed-off-by: Dan Sun <[email protected]>
yuzisun authored Nov 18, 2023
1 parent c8f6a1e commit daaa70a
Showing 1 changed file with 3 additions and 3 deletions.
docs/blog/articles/2023-10-08-KServe-0.11-release.md (6 changes: 3 additions & 3 deletions)
@@ -84,11 +84,11 @@ While `pip install` still works, we highly recommend using poetry to ensure pre

### LLM Runtimes

-### TorchServe LLM Runtime
+#### TorchServe LLM Runtime
KServe now integrates with TorchServe 0.8, offering support for [LLM models](https://pytorch.org/serve/large_model_inference.html) that may not fit onto a single GPU.
-Huggingface Accelerate and Deepspeed are available options to split the model into multiple partitions over multiple GPUs. You can see the [detailed example](../../modelserving/v1beta1/llm/) for how to serve the LLM on KServe with the TorchServe runtime.
+Huggingface Accelerate and Deepspeed are available options to split the model into multiple partitions over multiple GPUs. You can see the [detailed example](../../modelserving/v1beta1/llm/torchserve/accelerate/README.md) for how to serve the LLM on KServe with the TorchServe runtime.

-### vLLM Runtime
+#### vLLM Runtime
Serving LLM models can be surprisingly slow even on high-end GPUs. [vLLM](https://github.com/vllm-project/vllm) is a fast and easy-to-use LLM inference engine that can achieve 10x-20x higher throughput than Huggingface transformers.
It supports [continuous batching](https://www.anyscale.com/blog/continuous-batching-llm-inference) for increased throughput and GPU utilization, and
[paged attention](https://vllm.ai) to address the memory bottleneck of autoregressive decoding, where all the attention key-value tensors (the KV cache) are kept in GPU memory to generate the next tokens.
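
For readers following the TorchServe LLM example linked above, the client side comes down to posting a prediction request to the InferenceService. Below is a minimal sketch using KServe's v1 prediction protocol; the service name `llm`, the hostname, the ingress address, and the payload shape are illustrative assumptions, and the linked README documents the exact request format for the Accelerate example.

```python
# Minimal sketch: call an LLM InferenceService over KServe's v1 prediction
# protocol. Service name, hostname, and payload are illustrative assumptions;
# see the linked TorchServe example for the exact schema.
import requests

INGRESS_URL = "http://localhost:8080"         # assumed: port-forwarded ingress gateway
SERVICE_HOSTNAME = "llm.default.example.com"  # assumed: external host of the InferenceService

payload = {"instances": [{"data": "What is model serving?"}]}

response = requests.post(
    f"{INGRESS_URL}/v1/models/llm:predict",   # v1 protocol: POST /v1/models/<name>:predict
    json=payload,
    headers={"Host": SERVICE_HOSTNAME},       # Host header routes the request to the service
    timeout=120,
)
response.raise_for_status()
print(response.json())
```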

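The vLLM paragraph above describes the standalone engine as much as the KServe runtime, and its core Python API is small. Here is a minimal sketch of offline generation with the vLLM package; the model name and sampling settings are arbitrary choices for illustration, not part of the KServe example.

```python
# Minimal sketch of vLLM's offline generation API. The model and sampling
# parameters are arbitrary illustrative choices; continuous batching and
# paged attention are handled internally by the engine.
from vllm import LLM, SamplingParams

prompts = [
    "What is KServe?",
    "Explain continuous batching in one sentence.",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

llm = LLM(model="facebook/opt-125m")  # small model so the sketch fits on a single GPU
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```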