Add examples/gke/tgi-multi-lora-deployment #102

Merged · 20 commits · Oct 10, 2024
5 changes: 4 additions & 1 deletion .github/workflows/doc-build.yml
@@ -6,7 +6,10 @@ on:
- main
- doc-builder*
paths:
- docs/source/**
- docs/**
- examples/**/*.md
- examples/**/*.ipynb
- Makefile
- .github/workflows/doc-build.yml

jobs:
7 changes: 6 additions & 1 deletion .github/workflows/doc-pr-build.yml
@@ -3,7 +3,10 @@ name: Build PR Documentation
on:
pull_request:
paths:
- docs/source/**
- docs/**
- examples/**/*.md
- examples/**/*.ipynb
- Makefile
- .github/workflows/doc-pr-build.yml

concurrency:
@@ -20,3 +23,5 @@ jobs:
package_name: google-cloud
additional_args: --not_python_module
pre_command: cd Google-Cloud-Containers && make docs
env:
GITHUB_BRANCH: ${{ github.head_ref || github.ref_name }}
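The `GITHUB_BRANCH` expression above resolves to the PR head ref when available and falls back to the push ref name otherwise. A minimal sketch of that resolution logic (the branch names used here are hypothetical):

```python
# Sketch of the workflow expression `${{ github.head_ref || github.ref_name }}`.
# `head_ref` is only populated on pull_request events; on push events it is
# empty, so the expression falls through to `ref_name`.
def resolve_branch(head_ref: str, ref_name: str) -> str:
    return head_ref or ref_name


print(resolve_branch("add-multi-lora-example", "refs/pull/102/merge"))  # PR build
print(resolve_branch("", "main"))  # push build
```

This is what lets PR documentation previews link back to files on the PR branch instead of `main`.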
35 changes: 17 additions & 18 deletions README.md
@@ -51,24 +51,23 @@ The [`examples`](./examples) directory contains examples for using the container

### Inference Examples


| Service | Example | Description |
| --------- | ------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------- |
| GKE | [tgi-deployment](./examples/gke/tgi-deployment) | Deploying Llama3 8B with Text Generation Inference (TGI) on GKE. |
| GKE | [tgi-from-gcs-deployment](./examples/gke/tgi-from-gcs-deployment) | Deploying Qwen2 7B Instruct with Text Generation Inference (TGI) from a GCS Bucket on GKE. |
| GKE | [tei-deployment](./examples/gke/tei-deployment) | Deploying Snowflake's Arctic Embed (M) with Text Embeddings Inference (TEI) on GKE. |
| GKE | [tei-from-gcs-deployment](./examples/gke/tei-from-gcs-deployment) | Deploying BGE Base v1.5 (English) with Text Embeddings Inference (TEI) from a GCS Bucket on GKE. |
| Vertex AI | [deploy-bert-on-vertex-ai](./examples/vertex-ai/notebooks/deploy-bert-on-vertex-ai) | Deploying a BERT model for a text classification task using `huggingface-inference-toolkit` for a Custom Prediction Routine (CPR) on Vertex AI. |
| Vertex AI | [deploy-embedding-on-vertex-ai](./examples/vertex-ai/notebooks/deploy-embedding-on-vertex-ai) | Deploying an embedding model with Text Embeddings Inference (TEI) on Vertex AI. |
| Vertex AI | [deploy-gemma-on-vertex-ai](./examples/vertex-ai/notebooks/deploy-gemma-on-vertex-ai) | Deploying Gemma 7B Instruct with Text Generation Inference (TGI) on Vertex AI. |
| Vertex AI | [deploy-gemma-from-gcs-on-vertex-ai](./examples/vertex-ai/notebooks/deploy-gemma-from-gcs-on-vertex-ai) | Deploying Gemma 7B Instruct with Text Generation Inference (TGI) from a GCS Bucket on Vertex AI. |
| Vertex AI | [deploy-flux-on-vertex-ai](./examples/vertex-ai/notebooks/deploy-flux-on-vertex-ai) | Deploying FLUX with Hugging Face PyTorch DLCs for Inference on Vertex AI. |
| Vertex AI | [deploy-llama-3-1-405b-on-vertex-ai](./examples/vertex-ai/notebooks/deploy-llama-405b-on-vertex-ai/vertex-notebook.ipynb) | Deploying Meta Llama 3.1 405B in FP8 with Hugging Face DLC for TGI on Vertex AI. |
| Cloud Run | [tgi-deployment](./examples/cloud-run/tgi-deployment/README.md) | Deploying Meta Llama 3.1 8B with Text Generation Inference on Cloud Run. |

| Service | Example | Title |
| --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------- |
| GKE | [examples/gke/tgi-deployment](./examples/gke/tgi-deployment) | Deploy Meta Llama 3 8B with TGI DLC on GKE |
| GKE | [examples/gke/tgi-from-gcs-deployment](./examples/gke/tgi-from-gcs-deployment) | Deploy Qwen2 7B with TGI DLC from GCS on GKE |
| GKE | [examples/gke/tgi-multi-lora-deployment](./examples/gke/tgi-multi-lora-deployment) | Deploy Gemma2 with multiple LoRA adapters with TGI DLC on GKE |
| GKE | [examples/gke/tei-deployment](./examples/gke/tei-deployment) | Deploy Snowflake's Arctic Embed with TEI DLC on GKE |
| GKE | [examples/gke/tei-from-gcs-deployment](./examples/gke/tei-from-gcs-deployment) | Deploy BGE Base v1.5 with TEI DLC from GCS on GKE |
| Vertex AI | [examples/vertex-ai/notebooks/deploy-bert-on-vertex-ai](./examples/vertex-ai/notebooks/deploy-bert-on-vertex-ai) | Deploy BERT Models with PyTorch Inference DLC on Vertex AI |
| Vertex AI | [examples/vertex-ai/notebooks/deploy-embedding-on-vertex-ai](./examples/vertex-ai/notebooks/deploy-embedding-on-vertex-ai) | Deploy Embedding Models with TEI DLC on Vertex AI |
| Vertex AI | [examples/vertex-ai/notebooks/deploy-gemma-on-vertex-ai](./examples/vertex-ai/notebooks/deploy-gemma-on-vertex-ai) | Deploy Gemma 7B with TGI DLC on Vertex AI |
| Vertex AI | [examples/vertex-ai/notebooks/deploy-gemma-from-gcs-on-vertex-ai](./examples/vertex-ai/notebooks/deploy-gemma-from-gcs-on-vertex-ai) | Deploy Gemma 7B with TGI DLC from GCS on Vertex AI |
| Vertex AI | [examples/vertex-ai/notebooks/deploy-flux-on-vertex-ai](./examples/vertex-ai/notebooks/deploy-flux-on-vertex-ai) | Deploy FLUX with PyTorch Inference DLC on Vertex AI |
| Vertex AI | [examples/vertex-ai/notebooks/deploy-llama-3-1-405b-on-vertex-ai](./examples/vertex-ai/notebooks/deploy-llama-405b-on-vertex-ai/vertex-notebook.ipynb) | Deploy Meta Llama 3.1 405B with TGI DLC on Vertex AI |
| Cloud Run | [examples/cloud-run/tgi-deployment](./examples/cloud-run/tgi-deployment/README.md) | Deploy Meta Llama 3.1 with TGI DLC on Cloud Run |

### Evaluation

| Service | Example | Description |
| --------- | ------------------------------------------------------------------------------------------- | ----------------------------------------------- |
| Vertex AI | [evaluate-llms-with-vertex-ai](./examples/vertex-ai/notebooks/evaluate-llms-with-vertex-ai) | Evaluating open LLMs with Vertex AI and Gemini. |
| Service | Example | Title |
| --------- | ------------------------------------------------------------------------------------------------------------------------ | -------------------------------------------- |
| Vertex AI | [examples/vertex-ai/notebooks/evaluate-llms-with-vertex-ai](./examples/vertex-ai/notebooks/evaluate-llms-with-vertex-ai) | Evaluate open LLMs with Vertex AI and Gemini |
24 changes: 12 additions & 12 deletions docs/scripts/auto-generate-examples.py
@@ -1,6 +1,8 @@
import os
import re

GITHUB_BRANCH = os.getenv("GITHUB_BRANCH", "main")


def process_readme_files():
print("Processing README.md files from examples/gke and examples/cloud-run...")
@@ -35,37 +37,32 @@ def process_file(root, file, dir):
# Replace image and link paths
content = re.sub(
r"\(\./(imgs|assets)/([^)]*\.png)\)",
r"(https://raw.githubusercontent.com/huggingface/Google-Cloud-Containers/main/"
rf"(https://raw.githubusercontent.com/huggingface/Google-Cloud-Containers/{GITHUB_BRANCH}/"
+ root
+ r"/\1/\2)",
content,
)
content = re.sub(
r"\(\.\./([^)]+)\)",
r"(https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/"
rf"(https://github.com/huggingface/Google-Cloud-Containers/tree/{GITHUB_BRANCH}/examples/"
+ dir
+ r"/\1)",
content,
)
content = re.sub(
r"\(\.\/([^)]+)\)",
r"(https://github.com/huggingface/Google-Cloud-Containers/tree/main/"
rf"(https://github.com/huggingface/Google-Cloud-Containers/tree/{GITHUB_BRANCH}/"
+ root
+ r"/\1)",
content,
)
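Taken together, the three substitutions above can be sketched as a standalone helper using the same regexes; `GITHUB_BRANCH` defaults to `main` when the workflow does not export it, and the sample README string below is hypothetical:

```python
import os
import re

# Falls back to "main" outside CI, matching the script's default.
GITHUB_BRANCH = os.getenv("GITHUB_BRANCH", "main")


def rewrite_links(content: str, root: str, dir: str) -> str:
    """Rewrite relative image and file links to branch-aware GitHub URLs."""
    # Images under ./imgs or ./assets -> raw.githubusercontent.com URLs
    content = re.sub(
        r"\(\./(imgs|assets)/([^)]*\.png)\)",
        rf"(https://raw.githubusercontent.com/huggingface/Google-Cloud-Containers/{GITHUB_BRANCH}/"
        + root
        + r"/\1/\2)",
        content,
    )
    # ../ links -> tree URLs rooted at examples/<dir>
    content = re.sub(
        r"\(\.\./([^)]+)\)",
        rf"(https://github.com/huggingface/Google-Cloud-Containers/tree/{GITHUB_BRANCH}/examples/"
        + dir
        + r"/\1)",
        content,
    )
    # Remaining ./ links -> tree URLs rooted at the README's own directory
    content = re.sub(
        r"\(\.\/([^)]+)\)",
        rf"(https://github.com/huggingface/Google-Cloud-Containers/tree/{GITHUB_BRANCH}/"
        + root
        + r"/\1)",
        content,
    )
    return content


sample = "![d](./imgs/diagram.png) and [config](./config/deployment.yaml)"
print(rewrite_links(sample, "examples/gke/tgi-deployment", "gke"))
```

Ordering matters: the image pattern must run before the generic `./` pattern, or image links would be rewritten to `tree/` URLs instead of `raw.githubusercontent.com` ones.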

# Regular expression to match the specified blocks
pattern = r"> \[!(NOTE|WARNING)\]\n((?:> .*\n)+)"

def replacement(match):
block_type = match.group(1)
content = match.group(2)

# Remove '> ' from the beginning of each line and strip whitespace
lines = [
line.lstrip("> ").strip() for line in content.split("\n") if line.strip()
]
# Remove '> ' from the beginning of each line
lines = [line[2:] for line in content.split("\n") if line.strip()]

# Determine the Tip type
tip_type = " warning" if block_type == "WARNING" else ""
@@ -77,11 +74,14 @@ def replacement(match):

return new_block

# Regular expression to match the specified blocks
pattern = r"> \[!(NOTE|WARNING)\]\n((?:>.*(?:\n|$))+)"

# Perform the transformation
content = re.sub(pattern, replacement, content, flags=re.MULTILINE)

# Remove blockquotes
content = re.sub(r"^(>[ ]*)+", "", content, flags=re.MULTILINE)
# Remove any remaining '>' or '> ' at the beginning of lines
content = re.sub(r"^>[ ]?", "", content, flags=re.MULTILINE)

# Check for remaining relative paths
if re.search(r"\(\.\./|\(\./", content):
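The admonition handling after this change — the pattern now defined after `replacement`, quote markers stripped with `line[2:]`, and a final pass for stray `>` prefixes — can be sketched end to end. The exact `<Tip>` markup is built between the visible hunks, so the block construction below is an assumption:

```python
import re


def convert_admonitions(content: str) -> str:
    """Convert GitHub-flavored `> [!NOTE]` / `> [!WARNING]` alerts into
    doc-builder Tip blocks. The Tip markup emitted here is assumed; the
    PR's actual format string is outside this hunk."""

    def replacement(match):
        block_type = match.group(1)
        body = match.group(2)
        # Remove '> ' from the beginning of each non-empty line
        lines = [line[2:] for line in body.split("\n") if line.strip()]
        tip_type = " warning" if block_type == "WARNING" else ""
        return f"<Tip{tip_type}>\n\n" + "\n".join(lines) + "\n\n</Tip>\n"

    # Matches the alert marker plus every following blockquoted line,
    # including ones written as bare '>' without a trailing space
    pattern = r"> \[!(NOTE|WARNING)\]\n((?:>.*(?:\n|$))+)"
    content = re.sub(pattern, replacement, content, flags=re.MULTILINE)

    # Remove any remaining '>' or '> ' at the beginning of lines
    return re.sub(r"^>[ ]?", "", content, flags=re.MULTILINE)


print(convert_admonitions("> [!WARNING]\n> Be careful with GPU quotas.\n"))
```

Note the behavioral fix this hunk carries: the old pattern `(?:> .*\n)+` required a space after `>`, so alerts containing bare `>` continuation lines were only partially matched; the new `(?:>.*(?:\n|$))+` captures them too.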
8 changes: 4 additions & 4 deletions docs/source/resources.mdx
@@ -30,7 +30,8 @@ Learn how to use Hugging Face in Google Cloud by reading our blog posts, Google
- Inference

- [Deploy Meta Llama 3 8B with TGI DLC on GKE](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/gke/tgi-deployment)
- [Deploying Llama3 8B with Text Generation Inference (TGI) on GKE](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/gke/tgi-from-gcs-deployment)
- [Deploy Llama3 8B with TGI DLC on GKE](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/gke/tgi-from-gcs-deployment)
- [Deploy Gemma2 with multiple LoRA adapters with TGI DLC on GKE](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/gke/tgi-multi-lora-deployment)
- [Deploy Snowflake's Arctic Embed with TEI DLC on GKE](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/gke/tei-deployment)
- [Deploy BGE Base v1.5 with TEI DLC from GCS on GKE](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/gke/tei-from-gcs-deployment)

@@ -50,14 +51,13 @@ Learn how to use Hugging Face in Google Cloud by reading our blog posts, Google
- [Deploy FLUX with PyTorch Inference DLC on Vertex AI](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/vertex-ai/notebooks/deploy-flux-on-vertex-ai/vertex-notebook.ipynb)
- [Deploy Meta Llama 3.1 405B with TGI DLC on Vertex AI](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/vertex-ai/notebooks/deploy-llama-3-1-405b-on-vertex-ai/vertex-notebook.ipynb)


- Evaluation

- [Evaluating open LLMs with Vertex AI and Gemini](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/vertex-ai/notebooks/evaluate-llms-with-vertex-ai)
- [Evaluate open LLMs with Vertex AI and Gemini](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/vertex-ai/notebooks/evaluate-llms-with-vertex-ai)


### (Preview) Cloud Run

- Inference

- [Deploy Meta Llama 3.1 with TGI DLC on Cloud Run](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/cloud-run/tgi-deployment)
- [Deploy Meta Llama 3.1 with TGI DLC on Cloud Run](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/cloud-run/tgi-deployment)
13 changes: 7 additions & 6 deletions examples/gke/README.md
@@ -11,9 +11,10 @@ This directory contains usage examples of the Hugging Face Deep Learning Contain

## Inference Examples

| Example | Title |
| ---------------------------------------------------- | --------------------------------------------------- |
| [tgi-deployment](./tgi-deployment) | Deploy Meta Llama 3 8B with TGI DLC on GKE |
| [tgi-from-gcs-deployment](./tgi-from-gcs-deployment) | Deploy Qwen2 7B with TGI DLC from GCS on GKE |
| [tei-deployment](./tei-deployment) | Deploy Snowflake's Arctic Embed with TEI DLC on GKE |
| [tei-from-gcs-deployment](./tei-from-gcs-deployment) | Deploy BGE Base v1.5 with TEI DLC from GCS on GKE |
| Example | Title |
| -------------------------------------------------------- | ------------------------------------------------------------- |
| [tgi-deployment](./tgi-deployment) | Deploy Meta Llama 3 8B with TGI DLC on GKE |
| [tgi-from-gcs-deployment](./tgi-from-gcs-deployment) | Deploy Qwen2 7B with TGI DLC from GCS on GKE |
| [tgi-multi-lora-deployment](./tgi-multi-lora-deployment) | Deploy Gemma2 with multiple LoRA adapters with TGI DLC on GKE |
| [tei-deployment](./tei-deployment) | Deploy Snowflake's Arctic Embed with TEI DLC on GKE |
| [tei-from-gcs-deployment](./tei-from-gcs-deployment) | Deploy BGE Base v1.5 with TEI DLC from GCS on GKE |
2 changes: 1 addition & 1 deletion examples/gke/tgi-deployment/README.md
@@ -154,7 +154,7 @@ kubectl apply -f config/
> Alternatively, you can just wait for the deployment to be ready with the following command:
>
> ```bash
> kubectl wait --for=condition=Available --timeout=700s deployment/tei-deployment
> kubectl wait --for=condition=Available --timeout=700s deployment/tgi-deployment
> ```

## Inference with TGI