updates link to point at latest deployment (#707)
charlesfrye authored Apr 18, 2024
1 parent 8841ec8 commit 13e1844
Showing 1 changed file with 2 additions and 2 deletions.
06_gpu_and_ml/llm-serving/text_generation_inference.py (2 additions, 2 deletions)
@@ -5,7 +5,7 @@
 # - continuous batching, so multiple generations can take place at the same time on a single container
 # - PagedAttention, which applies memory paging to the attention mechanism's key-value cache, increasing throughput
 #
-# This example deployment, [accessible here](https://modal-labs--llama3.modal.run), can serve LLaMA 3 70B with
+# This example deployment, [accessible here](https://modal.chat), can serve LLaMA 3 70B with
 # 70 second cold starts, up to 200 tokens/s of throughput, and a per-token latency of 55ms.

 # ## Setup
@@ -205,7 +205,7 @@ def main(prompt: str = None):
 # behind an ASGI app front-end. The front-end code (a single file of Alpine.js) is available
 # [here](https://github.com/modal-labs/modal-examples/blob/main/06_gpu_and_ml/llm-frontend/index.html).
 #
-# You can try our deployment [here](https://modal-labs--llama3.modal.run).
+# You can try our deployment [here](https://modal.chat).

 frontend_path = Path(__file__).parent.parent / "llm-frontend"
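The context around the second change notes that the model is served behind an ASGI app front-end built from a single Alpine.js file, mounted from the llm-frontend directory. As a rough illustration only (this is not the example's actual code, and the app and function names below are made up), a static front-end like that could be served from a Modal ASGI app along these lines:

# Hypothetical sketch: serving a static front-end directory from a Modal ASGI app.
# Names ("llm-frontend-sketch", web) are illustrative, not taken from the example.
from pathlib import Path

import modal

frontend_path = Path(__file__).parent.parent / "llm-frontend"

app = modal.App("llm-frontend-sketch")
image = modal.Image.debian_slim().pip_install("fastapi")


@app.function(
    image=image,
    # Make the local front-end files available inside the container.
    mounts=[modal.Mount.from_local_dir(frontend_path, remote_path="/assets")],
)
@modal.asgi_app()
def web():
    from fastapi import FastAPI
    from fastapi.staticfiles import StaticFiles

    web_app = FastAPI()
    # Serve the single-file Alpine.js front-end (index.html) at the root path.
    web_app.mount("/", StaticFiles(directory="/assets", html=True))
    return web_app

Deploying a file like this with the modal deploy command would expose the front-end at a Modal-generated URL.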

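The first hunk's context describes the Text Generation Inference features the example relies on, continuous batching and PagedAttention, which are what make the quoted throughput figures possible. For orientation, a plain TGI server exposes a /generate REST endpoint; below is a minimal sketch of calling one with the requests library. The URL is a placeholder, not the modal.chat deployment this commit points to, which serves a chat UI rather than a raw TGI API.

# Hypothetical sketch: querying a Text Generation Inference server's REST API.
import requests

TGI_URL = "https://your-tgi-server.example.com"  # placeholder endpoint

response = requests.post(
    f"{TGI_URL}/generate",
    json={
        "inputs": "Explain continuous batching in one sentence.",
        "parameters": {"max_new_tokens": 64},
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["generated_text"])

TGI also exposes a /generate_stream endpoint for token-by-token streaming, which is where a per-token latency figure like the one quoted in the comment becomes visible to clients.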