Add examples/cloud-run on preview (#82)
* Update `examples/README.md` and add `examples/cloud-run/README.md`

* Add `examples/cloud-run/tgi-deployment`

* Update `examples/cloud-run/README.md`

* Update `README.md`

* Update `README.md`

Co-authored-by: Philipp Schmid <[email protected]>

* Apply suggestions from code review

- Increase max instances from 5 to 7 (noting that this value is subject to
change)
- Explain the default auth for Cloud Run, and mention that only
developer use-cases are covered within this example
- Add a note with alternatives for handling auth when exposing the
services
- Add the references used at the end

Co-authored-by: Frank He <[email protected]>

* Update `examples/cloud-run/tgi-deployment/README.md`

* Set `max-instances=3` to prevent downtime during infra migrations

Co-authored-by: Steren <[email protected]>

* Set `--concurrency` and `--max-concurrent-requests` to 64

Value determined after running `text-generation-benchmark` with
different batch sizes under the default settings, on the same
instance/node type on Google Kubernetes Engine (GKE), since GKE allows
the SSH tunneling required to run `text-generation-benchmark` within
the host instance
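Put together, the values settled on in this commit could translate into a deployment command along the following lines. This is only a sketch: the service name, image URI, model ID, region, and resource sizes are illustrative assumptions, not values taken from this commit.

```shell
# Hypothetical sketch of deploying the TGI DLC on Cloud Run with the values
# from this commit (--max-instances=3, --concurrency=64, and TGI's own
# --max-concurrent-requests=64). Image URI, model ID, region, and resource
# sizes below are illustrative assumptions.
gcloud beta run deploy tgi-deployment \
    --image=us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu121 \
    --args="--model-id=meta-llama/Meta-Llama-3.1-8B-Instruct,--max-concurrent-requests=64" \
    --port=8080 \
    --cpu=8 \
    --memory=32Gi \
    --gpu=1 \
    --gpu-type=nvidia-l4 \
    --region=us-central1 \
    --no-allow-unauthenticated \
    --max-instances=3 \
    --concurrency=64
```

Note that `--concurrency` caps how many requests Cloud Run routes to a single instance, while TGI's `--max-concurrent-requests` caps what the server itself accepts; keeping the two aligned avoids requests queueing at one layer while the other sits idle.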

---------

Co-authored-by: Philipp Schmid <[email protected]>
Co-authored-by: Frank He <[email protected]>
Co-authored-by: Steren <[email protected]>
4 people authored Sep 16, 2024
1 parent e6b57bf commit 6e8682c
Showing 6 changed files with 382 additions and 2 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -2,7 +2,7 @@

<img alt="Hugging Face x Google Cloud" src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/Google-Cloud-Containers/thumbnail.png" />

[Hugging Face Deep Learning Containers for Google Cloud](https://cloud.google.com/deep-learning-containers/docs/choosing-container#hugging-face) are a set of Docker images for training and deploying Transformers, Sentence Transformers, and Diffusers models on Google Cloud Vertex AI and Google Kubernetes Engine (GKE).
[Hugging Face Deep Learning Containers for Google Cloud](https://cloud.google.com/deep-learning-containers/docs/choosing-container#hugging-face) are a set of Docker images for training and deploying Transformers, Sentence Transformers, and Diffusers models on Google Cloud Vertex AI, Google Kubernetes Engine (GKE), and Google Cloud Run.

The [Google-Cloud-Containers](https://github.com/huggingface/Google-Cloud-Containers/tree/main) repository contains the container files for building Hugging Face-specific Deep Learning Containers (DLCs), as well as examples on how to train and deploy models on Google Cloud. The containers are publicly maintained, updated, and released periodically by Hugging Face and the Google Cloud team, and are available to all Google Cloud customers within [Google Cloud's Artifact Registry](https://cloud.google.com/deep-learning-containers/docs/choosing-container#hugging-face). Containers are created for each supported combination of use-case (training, inference), accelerator type (CPU, GPU, TPU), and framework (PyTorch, TGI, TEI). Those include:

@@ -63,3 +63,4 @@ The [`examples`](./examples) directory contains examples for using the container
| Vertex AI | [deploy-gemma-from-gcs-on-vertex-ai](./examples/vertex-ai/notebooks/deploy-gemma-from-gcs-on-vertex-ai) | Deploying Gemma 7B Instruct with Text Generation Inference (TGI) from a GCS Bucket on Vertex AI. |
| Vertex AI | [deploy-flux-on-vertex-ai](./examples/vertex-ai/notebooks/deploy-flux-on-vertex-ai) | Deploying FLUX with Hugging Face PyTorch DLCs for Inference on Vertex AI. |
| Vertex AI | [deploy-llama-3-1-405b-on-vertex-ai](./examples/vertex-ai/notebooks/deploy-llama-405b-on-vertex-ai/vertex-notebook.ipynb) | Deploying Meta Llama 3.1 405B in FP8 with Hugging Face DLC for TGI on Vertex AI. |
| Cloud Run | [tgi-deployment](./examples/cloud-run/tgi-deployment/README.md) | Deploying Meta Llama 3.1 8B with Text Generation Inference on Cloud Run. |
9 changes: 8 additions & 1 deletion examples/README.md
@@ -2,4 +2,11 @@

This directory contains some usage examples for the Hugging Face Deep Learning Containers (DLCs) available in Google Cloud, as published from the [containers directory](../containers).

The examples' structure is organized based on the Google Cloud service we can use to deploy the containers, being Google Kubernetes Engine (GKE) on one end, and Vertex AI on the other.
The examples are organized by the Google Cloud service used to deploy the containers:

- [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine)
- [Vertex AI](https://cloud.google.com/vertex-ai)
- (Preview) [Cloud Run](https://cloud.google.com/run)

> [!WARNING]
> Cloud Run now offers on-demand access to NVIDIA L4 GPUs for running AI inference workloads, but it is still in preview, so the Cloud Run examples in this repository should be used solely for testing and experimentation; please avoid using them for production workloads. We are actively working towards general availability and appreciate your understanding.
12 changes: 12 additions & 0 deletions examples/cloud-run/README.md
@@ -0,0 +1,12 @@
# (Preview) Cloud Run Examples

This directory contains usage examples of the Hugging Face Deep Learning Containers (DLCs) on Cloud Run, currently covering inference only, with a focus on Large Language Models (LLMs).

> [!WARNING]
> Cloud Run now offers on-demand access to NVIDIA L4 GPUs for running AI inference workloads, but it is still in preview, so the Cloud Run examples in this repository should be used solely for testing and experimentation; please avoid using them for production workloads. We are actively working towards general availability and appreciate your understanding.

## Inference Examples

| Example | Description |
| ---------------------------------- | ------------------------------------------------------------------------ |
| [tgi-deployment](./tgi-deployment) | Deploying Meta Llama 3.1 8B with Text Generation Inference on Cloud Run. |
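Since the example above is deployed without public access (Cloud Run's default auth requires an authenticated caller), one way to try the service as a developer is through the Cloud Run proxy. The sketch below assumes a service named `tgi-deployment` in `us-central1`; both names are illustrative assumptions.

```shell
# Hypothetical sketch: reach a private TGI service on Cloud Run through the
# Cloud Run proxy, which forwards localhost traffic with your own credentials.
# Service name and region are illustrative assumptions.
gcloud run services proxy tgi-deployment --region=us-central1 --port=8080 &

# TGI exposes an OpenAI-compatible Messages API at /v1/chat/completions.
curl http://localhost:8080/v1/chat/completions \
    -X POST \
    -H "Content-Type: application/json" \
    -d '{
        "model": "tgi",
        "messages": [{"role": "user", "content": "What is Cloud Run?"}],
        "max_tokens": 128
    }'
```

The proxy approach keeps the service private while still allowing local testing; for service-to-service calls, an ID token in the `Authorization` header is the usual alternative.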
