Commit 82873ad: Correct small typos in vllm_mixtral.py

pbadeer authored Jan 25, 2024
1 parent b4b2ad4 · commit 82873ad
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion 06_gpu_and_ml/vllm_mixtral.py
@@ -7,7 +7,7 @@
 # walks through setting up an environment that works with `vLLM ` for basic inference.
 #
 # We are running the [Mixtral 8x7B Instruct](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1) model here, which is a mixture-of-experts model finetuned for conversation.
-# You can expect 3 minute second cold starts
+# You can expect 3 minute cold starts.
 # For a single request, the throughput is about 11 tokens/second, but there are upcoming `vLLM` optimizations to improve this.
 # The larger the batch of prompts, the higher the throughput (up to about 300 tokens/second).
 # For example, with the 60 prompts below, we can produce 30k tokens in 100 seconds.
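The context lines above describe vLLM's batched offline-inference pattern: passing many prompts to a single generate call lets the engine schedule them together, which is what lifts aggregate throughput from roughly 11 tokens/second for one request toward the ~300 tokens/second cited for a batch. A minimal sketch of that pattern follows; it is not part of the file in this commit. The model name matches the tutorial, but the tensor-parallel size, sampling parameters, and prompt set are illustrative assumptions.

# Minimal sketch of batched offline inference with vLLM (not from this
# commit's file); sampling values and prompts are illustrative assumptions.
from vllm import LLM, SamplingParams

# Mixtral 8x7B Instruct is a large mixture-of-experts model and typically
# needs multiple GPUs; tensor_parallel_size here is an assumption about
# the available hardware.
llm = LLM(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",
    tensor_parallel_size=2,
)

sampling_params = SamplingParams(temperature=0.7, max_tokens=256)

# One generate() call over a batch of prompts: vLLM schedules the whole
# batch together, which raises aggregate tokens/second well above the
# single-request rate.
prompts = [f"[INST] Tell me about topic #{i}. [/INST]" for i in range(60)]
outputs = llm.generate(prompts, sampling_params)

for out in outputs:
    print(out.outputs[0].text)

Batching helps because vLLM's continuous batching keeps the GPU saturated across many in-flight requests instead of serving them one at a time.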
