Commit
Add `examples/cloud-run` on preview (#82)
* Update `examples/README.md` and add `examples/cloud-run/README.md`
* Add `examples/cloud-run/tgi-deployment`
* Update `examples/cloud-run/README.md`
* Update `README.md`
* Update `README.md`

  Co-authored-by: Philipp Schmid <[email protected]>
* Apply suggestions from code review
  - Increase max instances from 5 to 7 (including that it's subject to change)
  - Explain the default auth for Cloud Run, and mention that only developer use-cases are covered within this example
  - Add note with alternatives for auth handling on the services towards exposing those
  - Add references used at the end

  Co-authored-by: Frank He <[email protected]>
* Update `examples/cloud-run/tgi-deployment/README.md`
* Set `max-instances=3` to prevent downtime during infra migrations

  Co-authored-by: Steren <[email protected]>
* Set `--concurrency` and `--max-concurrent-requests` to 64

  Value determined after running `text-generation-benchmark` with different batch sizes with the default settings, on the same instance/node on Google Kubernetes Engine (GKE), as it allows SSH tunneling as `text-generation-benchmark` needs to run within the host instance

---------

Co-authored-by: Philipp Schmid <[email protected]>
Co-authored-by: Frank He <[email protected]>
Co-authored-by: Steren <[email protected]>
1 parent e6b57bf, commit 6e8682c
Showing 6 changed files with 382 additions and 2 deletions.
@@ -0,0 +1,12 @@
# (Preview) Cloud Run Examples

This directory contains usage examples of the Hugging Face Deep Learning Containers (DLCs) on Cloud Run. At the moment only inference is covered, with a focus on Large Language Models (LLMs).

> [!WARNING]
> Cloud Run now offers on-demand access to NVIDIA L4 GPUs for running AI inference workloads, but this feature is still in preview. The Cloud Run examples in this repository are therefore intended solely for testing and experimentation; please avoid using them for production workloads. We are actively working towards general availability and appreciate your understanding.

## Inference Examples

| Example                            | Description                                                              |
| ---------------------------------- | ------------------------------------------------------------------------ |
| [tgi-deployment](./tgi-deployment) | Deploying Meta Llama 3.1 8B with Text Generation Inference on Cloud Run. |
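
The commit message sets Cloud Run's `--concurrency` and Text Generation Inference's `--max-concurrent-requests` to the same value (64) and caps the service at `max-instances=3`. The sketch below is an illustration only, not the command used in `tgi-deployment`. It shows one way these settings might be kept in sync when assembling a `gcloud run deploy` invocation. The helper `build_deploy_command` and the `IMAGE_URI`, `REGION`, and service name values are hypothetical placeholders. GPU-related flags, the model ID, and any preview-specific command group are deliberately omitted.

```python
import shlex


def build_deploy_command(service, image, region, concurrency=64, max_instances=3):
    """Assemble (but do not run) a `gcloud run deploy` argument list.

    Hypothetical helper: the same `concurrency` value is passed both to
    Cloud Run (--concurrency) and to the container (--max-concurrent-requests
    via --args), so the two limits cannot drift apart.
    """
    return [
        "gcloud", "run", "deploy", service,
        f"--image={image}",
        f"--region={region}",
        # Maximum number of requests Cloud Run sends to one instance at a time.
        f"--concurrency={concurrency}",
        # Upper bound on the number of instances the service may scale to.
        f"--max-instances={max_instances}",
        # Argument forwarded to the container entrypoint (the TGI launcher).
        f"--args=--max-concurrent-requests={concurrency}",
    ]


# Placeholder values; replace with a real container image URI and region.
command = build_deploy_command(
    service="tgi-deployment",
    image="IMAGE_URI",
    region="REGION",
)
print(shlex.join(command))
```

Deriving both flags from a single parameter reflects the intent stated in the commit message: the platform should not route more simultaneous requests to an instance than the server inside it is configured to accept.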