From 241791cb742f1e6ac2d7d3ceae06962d5f20fdd5 Mon Sep 17 00:00:00 2001 From: Alvaro Bartolome <36760800+alvarobartt@users.noreply.github.com> Date: Sat, 21 Sep 2024 11:32:56 +0200 Subject: [PATCH] Add `examples` within the `docs/` (#88) * Add `examples` as embeded notebooks in the `docs/` * Remove `notebooks_folder` and `convert_notebooks` args * Update `docs/source/_toctree.yml` to include examples (WIP) * Add sample paths to examples (WIP) * Update `thumbnail.png` https://huggingface.co/datasets/huggingface/documentation-images/discussions/365 * Split partially `index.mdx` in `features.mdx` and `resources.mdx` * Add `docs/source/containers/*.mdx` (WIP) * Update `docs/source/containers/available.mdx` * Fix path to `scripts/upload_model_to_gcs.sh` * Clean `docs/source` * Add `Makefile` to auto-generate docs from examples * Add `pre_command: make docs` * (debug) Add `ls -la` before `make docs` * Fix `pre_command` to `cd` into `Google-Cloud-Containers` first * Include `examples/cloud-run` directory in `make docs` * Remove extra empty `>` lines and add `make serve` * Update `Makefile` and add `docs/sed/huggingface-tip.sed` * Add `docs/scripts/auto-generate-examples.py` * Update `Makefile` and `docs/scripts/auto-generate-examples.py` * Update "Examples" section ordering Co-authored-by: Jeff Boudier * Remove emojis within `docs/source/_toctree.yml` Co-authored-by: Philipp Schmid * Add `metadata` to every example under `examples` * Update `docs/scripts/auto-generate-examples.py` Remove Jupyter Markdown comment to hide the metadata after its converted from `.ipynb` to `.mdx` * Add `docs/scripts/auto-update-toctree.py` * Add `docs/source/examples` to `.gitignore` As those are automatically generated and not intended to be pushed * Update comment parsing for Jupyter Notebooks * Clean metadata from `.mdx` files (and remove if none) * Set `isExpanded: true` for top level examples * Update `docs/source/containers/available.mdx` * Fix typo in `youself`->`yourself` * Split example introduction from TL;DR * Apply suggestions from code review Co-authored-by: pagezyhf <165770107+pagezyhf@users.noreply.github.com> * Update `containers/tgi/README.md` - Add missing `--shm-size 1g` - Fixed some wording / typos * Update and align example titles * Fix `title` for `/resources` --------- Co-authored-by: Jeff Boudier Co-authored-by: Philipp Schmid Co-authored-by: pagezyhf <165770107+pagezyhf@users.noreply.github.com> --- .github/workflows/doc-build.yml | 3 +- .github/workflows/doc-pr-build.yml | 1 + .gitignore | 3 + Makefile | 33 ++++++ README.md | 38 +++---- containers/tgi/README.md | 8 +- docs/scripts/auto-generate-examples.py | 103 ++++++++++++++++++ docs/scripts/auto-update-toctree.py | 97 +++++++++++++++++ docs/source/_toctree.yml | 14 ++- docs/source/containers/available.mdx | 35 ++++++ docs/source/containers/introduction.mdx | 5 + docs/source/features.mdx | 35 ++++++ docs/source/index.mdx | 87 --------------- docs/source/resources.mdx | 57 ++++++++++ examples/cloud-run/README.md | 6 +- examples/cloud-run/tgi-deployment/README.md | 11 +- examples/gke/README.md | 20 ++-- examples/gke/tei-deployment/README.md | 7 +- .../gke/tei-from-gcs-deployment/README.md | 9 +- examples/gke/tgi-deployment/README.md | 11 +- .../gke/tgi-from-gcs-deployment/README.md | 11 +- examples/gke/trl-full-fine-tuning/README.md | 11 +- examples/gke/trl-lora-fine-tuning/README.md | 11 +- examples/vertex-ai/README.md | 26 ++--- .../vertex-notebook.ipynb | 16 ++- .../vertex-notebook.ipynb | 16 ++- .../vertex-notebook.ipynb | 16 ++- 
.../vertex-notebook.ipynb | 16 ++- .../vertex-notebook.ipynb | 16 ++- .../vertex-notebook.ipynb | 15 ++- .../vertex-notebook.ipynb | 16 ++- .../vertex-notebook.ipynb | 16 ++- 32 files changed, 601 insertions(+), 168 deletions(-) create mode 100644 Makefile create mode 100644 docs/scripts/auto-generate-examples.py create mode 100644 docs/scripts/auto-update-toctree.py create mode 100644 docs/source/containers/available.mdx create mode 100644 docs/source/containers/introduction.mdx create mode 100644 docs/source/features.mdx create mode 100644 docs/source/resources.mdx diff --git a/.github/workflows/doc-build.yml b/.github/workflows/doc-build.yml index 29c40724..f864e5c5 100644 --- a/.github/workflows/doc-build.yml +++ b/.github/workflows/doc-build.yml @@ -10,13 +10,14 @@ on: - .github/workflows/doc-build.yml jobs: - build: + build: uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main with: commit_sha: ${{ github.sha }} package: Google-Cloud-Containers package_name: google-cloud additional_args: --not_python_module + pre_command: cd Google-Cloud-Containers && make docs secrets: token: ${{ secrets.HUGGINGFACE_PUSH }} hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }} diff --git a/.github/workflows/doc-pr-build.yml b/.github/workflows/doc-pr-build.yml index 22034215..98500342 100644 --- a/.github/workflows/doc-pr-build.yml +++ b/.github/workflows/doc-pr-build.yml @@ -19,3 +19,4 @@ jobs: package: Google-Cloud-Containers package_name: google-cloud additional_args: --not_python_module + pre_command: cd Google-Cloud-Containers && make docs diff --git a/.gitignore b/.gitignore index 81326505..65736d5b 100644 --- a/.gitignore +++ b/.gitignore @@ -161,3 +161,6 @@ cython_debug/ # .DS_Store files .DS_Store + +# Auto-generated docs +docs/source/examples/ diff --git a/Makefile b/Makefile new file mode 100644 index 00000000..acef43a2 --- /dev/null +++ b/Makefile @@ -0,0 +1,33 @@ +.PHONY: docs clean help + +docs: clean + @echo "Processing README.md files from examples/gke, examples/cloud-run, and examples/vertex-ai..." + @mkdir -p docs/source/examples + @echo "Converting Jupyter Notebooks to MDX..." + @doc-builder notebook-to-mdx examples/vertex-ai/notebooks/ + @echo "Auto-generating example files for documentation..." + @python docs/scripts/auto-generate-examples.py + @echo "Cleaning up generated Markdown Notebook files..." + @find examples/vertex-ai/notebooks -name "vertex-notebook.md" -type f -delete + @echo "Generating YAML tree structure and appending to _toctree.yml..." + @python docs/scripts/auto-update-toctree.py + @echo "YAML tree structure appended to docs/source/_toctree.yml" + @echo "Documentation setup complete." + +clean: + @echo "Cleaning up generated documentation..." + @rm -rf docs/source/examples + @awk '/^# GENERATED CONTENT DO NOT EDIT!/,/^# END GENERATED CONTENT/{next} {print}' docs/source/_toctree.yml > docs/source/_toctree.yml.tmp && mv docs/source/_toctree.yml.tmp docs/source/_toctree.yml + @echo "Cleaning up generated Markdown Notebook files (if any)..." + @find examples/vertex-ai/notebooks -name "vertex-notebook.md" -type f -delete + @echo "Cleanup complete." 
+ +serve: + @echo "Serving documentation via doc-builder" + doc-builder preview gcloud docs/source --not_python_module + +help: + @echo "Usage:" + @echo " make docs - Auto-generate the examples for the docs" + @echo " make clean - Remove the auto-generated docs" + @echo " make help - Display this help message" diff --git a/README.md b/README.md index 08079d8d..6f7ed135 100644 --- a/README.md +++ b/README.md @@ -42,25 +42,25 @@ The [`examples`](./examples) directory contains examples for using the container ### Training Examples -| Service | Example | Description | -| --------- | ------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------- | -| GKE | [trl-full-fine-tuning](./examples/gke/trl-full-fine-tuning) | Full SFT fine-tuning of Gemma 2B in a multi-GPU instance with TRL on GKE. | -| GKE | [trl-lora-fine-tuning](./examples/gke/trl-lora-fine-tuning) | LoRA SFT fine-tuning of Mistral 7B v0.3 in a single GPU instance with TRL on GKE. | -| Vertex AI | [trl-full-sft-fine-tuning-on-vertex-ai](./examples/vertex-ai/notebooks/trl-full-sft-fine-tuning-on-vertex-ai) | Full SFT fine-tuning of Mistral 7B v0.3 in a multi-GPU instance with TRL on Vertex AI. | -| Vertex AI | [trl-lora-sft-fine-tuning-on-vertex-ai](./examples/vertex-ai/notebooks/trl-lora-sft-fine-tuning-on-vertex-ai) | LoRA SFT fine-tuning of Mistral 7B v0.3 in a single GPU instance with TRL on Vertex AI. | +| Service | Example | Title | +| --------- | ------------------------------------------------------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------- | +| GKE | [examples/gke/trl-full-fine-tuning](./examples/gke/trl-full-fine-tuning) | Fine-tune Gemma 2B with PyTorch Training DLC using SFT on GKE | +| GKE | [examples/gke/trl-lora-fine-tuning](./examples/gke/trl-lora-fine-tuning) | Fine-tune Mistral 7B v0.3 with PyTorch Training DLC using SFT + LoRA on GKE | +| Vertex AI | [examples/vertex-ai/notebooks/trl-full-sft-fine-tuning-on-vertex-ai](./examples/vertex-ai/notebooks/trl-full-sft-fine-tuning-on-vertex-ai) | Fine-tune Mistral 7B v0.3 with PyTorch Training DLC using SFT on Vertex AI | +| Vertex AI | [examples/vertex-ai/notebooks/trl-lora-sft-fine-tuning-on-vertex-ai](./examples/vertex-ai/notebooks/trl-lora-sft-fine-tuning-on-vertex-ai) | Fine-tune Gemma 2B with PyTorch Training DLC using SFT + LoRA on Vertex AI | ### Inference Examples -| Service | Example | Description | -| --------- | ------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------- | -| GKE | [tgi-deployment](./examples/gke/tgi-deployment) | Deploying Llama3 8B with Text Generation Inference (TGI) on GKE. | -| GKE | [tgi-from-gcs-deployment](./examples/gke/tgi-from-gcs-deployment) | Deploying Qwen2 7B Instruct with Text Generation Inference (TGI) from a GCS Bucket on GKE. | -| GKE | [tei-deployment](./examples/gke/tei-deployment) | Deploying Snowflake's Arctic Embed (M) with Text Embeddings Inference (TEI) on GKE. | -| GKE | [tei-from-gcs-deployment](./examples/gke/tei-from-gcs-deployment) | Deploying BGE Base v1.5 (English) with Text Embeddings Inference (TEI) from a GCS Bucket on GKE. 
| -| Vertex AI | [deploy-bert-on-vertex-ai](./examples/vertex-ai/notebooks/deploy-bert-on-vertex-ai) | Deploying a BERT model for a text classification task using `huggingface-inference-toolkit` for a Custom Prediction Routine (CPR) on Vertex AI. | -| Vertex AI | [deploy-embedding-on-vertex-ai](./examples/vertex-ai/notebooks/deploy-embedding-on-vertex-ai) | Deploying an embedding model with Text Embeddings Inference (TEI) on Vertex AI. | -| Vertex AI | [deploy-gemma-on-vertex-ai](./examples/vertex-ai/notebooks/deploy-gemma-on-vertex-ai) | Deploying Gemma 7B Instruct with Text Generation Inference (TGI) on Vertex AI. | -| Vertex AI | [deploy-gemma-from-gcs-on-vertex-ai](./examples/vertex-ai/notebooks/deploy-gemma-from-gcs-on-vertex-ai) | Deploying Gemma 7B Instruct with Text Generation Inference (TGI) from a GCS Bucket on Vertex AI. | -| Vertex AI | [deploy-flux-on-vertex-ai](./examples/vertex-ai/notebooks/deploy-flux-on-vertex-ai) | Deploying FLUX with Hugging Face PyTorch DLCs for Inference on Vertex AI. | -| Vertex AI | [deploy-llama-3-1-405b-on-vertex-ai](./examples/vertex-ai/notebooks/deploy-llama-405b-on-vertex-ai/vertex-notebook.ipynb) | Deploying Meta Llama 3.1 405B in FP8 with Hugging Face DLC for TGI on Vertex AI. | -| Cloud Run | [tgi-deployment](./examples/cloud-run/tgi-deployment/README.md) | Deploying Meta Llama 3.1 8B with Text Generation Inference on Cloud Run. | +| Service | Example | Title | +| --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------- | +| GKE | [examples/gke/tgi-deployment](./examples/gke/tgi-deployment) | Deploy Meta Llama 3 8B with TGI DLC on GKE | +| GKE | [examples/gke/tgi-from-gcs-deployment](./examples/gke/tgi-from-gcs-deployment) | Deploy Qwen2 7B with TGI DLC from GCS on GKE | +| GKE | [examples/gke/tei-deployment](./examples/gke/tei-deployment) | Deploy Snowflake's Arctic Embed with TEI DLC on GKE | +| GKE | [examples/gke/tei-from-gcs-deployment](./examples/gke/tei-from-gcs-deployment) | Deploy BGE Base v1.5 with TEI DLC from GCS on GKE | +| Vertex AI | [examples/vertex-ai/notebooks/deploy-bert-on-vertex-ai](./examples/vertex-ai/notebooks/deploy-bert-on-vertex-ai) | Deploy BERT Models with PyTorch Inference DLC on Vertex AI | +| Vertex AI | [examples/vertex-ai/notebooks/deploy-embedding-on-vertex-ai](./examples/vertex-ai/notebooks/deploy-embedding-on-vertex-ai) | Deploy Embedding Models with TEI DLC on Vertex AI | +| Vertex AI | [examples/vertex-ai/notebooks/deploy-gemma-on-vertex-ai](./examples/vertex-ai/notebooks/deploy-gemma-on-vertex-ai) | Deploy Gemma 7B with TGI DLC on Vertex AI | +| Vertex AI | [examples/vertex-ai/notebooks/deploy-gemma-from-gcs-on-vertex-ai](./examples/vertex-ai/notebooks/deploy-gemma-from-gcs-on-vertex-ai) | Deploy Gemma 7B with TGI DLC from GCS on Vertex AI | +| Vertex AI | [examples/vertex-ai/notebooks/deploy-flux-on-vertex-ai](./examples/vertex-ai/notebooks/deploy-flux-on-vertex-ai) | Deploy FLUX with PyTorch Inference DLC on Vertex AI | +| Vertex AI | [examples/vertex-ai/notebooks/deploy-llama-3-1-405b-on-vertex-ai](./examples/vertex-ai/notebooks/deploy-llama-405b-on-vertex-ai/vertex-notebook.ipynb) | Deploy Meta Llama 3.1 405B with TGI DLC on Vertex AI | +| Cloud Run | [examples/cloud-run/tgi-deployment](./examples/cloud-run/tgi-deployment/README.md) | Deploy Meta Llama 3.1 with TGI DLC on Cloud Run | diff --git a/containers/tgi/README.md 
b/containers/tgi/README.md index b23da17c..fc69b42e 100644 --- a/containers/tgi/README.md +++ b/containers/tgi/README.md @@ -16,7 +16,7 @@ Below you will find the instructions on how to run and test the TGI containers a To run the Docker container in GPUs you need to ensure that your hardware is supported (NVIDIA drivers on your device need to be compatible with CUDA version 12.2 or higher) and also install the NVIDIA Container Toolkit. -To find the supported models and hardware before running the TGI DLC, feel free to check [TGI's documentation](https://huggingface.co/docs/text-generation-inference/supported_models). +To find the supported models and hardware before running the TGI DLC, feel free to check [TGI Documentation](https://huggingface.co/docs/text-generation-inference/supported_models). ### Run @@ -51,7 +51,7 @@ Which returns the following output containing the optimal configuration for depl Then you are ready to run the container as follows: ```bash -docker run --gpus all -ti -p 8080:8080 \ +docker run --gpus all -ti --shm-size 1g -p 8080:8080 \ -e MODEL_ID=google/gemma-7b-it \ -e NUM_SHARD=4 \ -e HF_TOKEN=$(cat ~/.cache/huggingface/token) \ @@ -85,7 +85,7 @@ curl 0.0.0.0:8080/v1/chat/completions \ Which will start streaming the completion tokens for the given messages until the stop sequences are generated. -Alternatively, you can also use the `/generate` endpoint instead, which already expects the inputs to be formatted according to the tokenizer's requirements, which is more convenient when working with base models without a pre-defined chat template or whenever you want to use a custom chat template instead, and can be used as follows: +Alternatively, you can also use the `/generate` endpoint instead, which already expects the inputs to be formatted according to the tokenizer requirements, which is more convenient when working with base models without a pre-defined chat template or whenever you want to use a custom chat template instead, and can be used as follows: ```bash curl 0.0.0.0:8080/generate \ @@ -108,7 +108,7 @@ curl 0.0.0.0:8080/generate \ > [!WARNING] > Building the containers is not recommended since those are already built by Hugging Face and Google Cloud teams and provided openly, so the recommended approach is to use the pre-built containers available in [Google Cloud's Artifact Registry](https://console.cloud.google.com/artifacts/docker/deeplearning-platform-release/us/gcr.io) instead. -In order to build TGI's Docker container, you will need an instance with at least 4 NVIDIA GPUs available with at least 24 GiB of VRAM each, since TGI needs to build and compile the kernels required for the optimized inference. Also note that the build process may take ~30 minutes to complete, depending on the instance's specifications. +In order to build TGI Docker container, you will need an instance with at least 4 NVIDIA GPUs available with at least 24 GiB of VRAM each, since TGI needs to build and compile the kernels required for the optimized inference. Also note that the build process may take ~30 minutes to complete, depending on the instance's specifications. ```bash docker build -t us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu121.2-2.ubuntu2204.py310 -f containers/tgi/gpu/2.2.0/Dockerfile . 
diff --git a/docs/scripts/auto-generate-examples.py b/docs/scripts/auto-generate-examples.py
new file mode 100644
index 00000000..8490b294
--- /dev/null
+++ b/docs/scripts/auto-generate-examples.py
@@ -0,0 +1,103 @@
+import os
+import re
+
+
+def process_readme_files():
+    print("Processing README.md files from examples/gke and examples/cloud-run...")
+    os.makedirs("docs/source/examples", exist_ok=True)
+
+    for dir in ["gke", "cloud-run", "vertex-ai/notebooks"]:
+        for root, _, files in os.walk(f"examples/{dir}"):
+            for file in files:
+                if file == "README.md" or file == "vertex-notebook.md":
+                    process_file(root, file, dir)
+
+
+def process_file(root, file, dir):
+    dir_name = dir if not dir.__contains__("/") else dir.replace("/", "-")
+
+    file_path = os.path.join(root, file)
+    subdir = root.replace(f"examples/{dir}/", "")
+    base = os.path.basename(subdir)
+
+    if file_path == f"examples/{dir}/README.md":
+        target = f"docs/source/examples/{dir_name}-index.mdx"
+    else:
+        target = f"docs/source/examples/{dir_name}-{base}.mdx"
+
+    print(f"Processing {file_path} to {target}")
+    with open(file_path, "r") as f:
+        content = f.read()
+
+    # For Jupyter Notebooks, remove the comment i.e. `<!-- ... -->` but keep the metadata
+    content = re.sub(r"<!--\s*(.*?)\s*-->", r"\1", content, flags=re.DOTALL)
+
+    # Replace image and link paths
+    content = re.sub(
+        r"\(\./(imgs|assets)/([^)]*\.png)\)",
+        r"(https://raw.githubusercontent.com/huggingface/Google-Cloud-Containers/main/"
+        + root
+        + r"/\1/\2)",
+        content,
+    )
+    content = re.sub(
+        r"\(\.\./([^)]+)\)",
+        r"(https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/"
+        + dir
+        + r"/\1)",
+        content,
+    )
+    content = re.sub(
+        r"\(\.\/([^)]+)\)",
+        r"(https://github.com/huggingface/Google-Cloud-Containers/tree/main/"
+        + root
+        + r"/\1)",
+        content,
+    )
+
+    # Regular expression to match the specified blocks
+    pattern = r"> \[!(NOTE|WARNING)\]\n((?:> .*\n)+)"
+
+    def replacement(match):
+        block_type = match.group(1)
+        content = match.group(2)
+
+        # Remove '> ' from the beginning of each line and strip whitespace
+        lines = [
+            line.lstrip("> ").strip() for line in content.split("\n") if line.strip()
+        ]
+
+        # Determine the Tip type
+        tip_type = " warning" if block_type == "WARNING" else ""
+
+        # Construct the new block
+        new_block = f"<Tip{tip_type}>\n\n"
+        new_block += "\n".join(lines)
+        new_block += "\n\n</Tip>\n"
+
+        return new_block
+
+    # Perform the transformation
+    content = re.sub(pattern, replacement, content, flags=re.MULTILINE)
+
+    # Remove blockquotes
+    content = re.sub(r"^(>[ ]*)+", "", content, flags=re.MULTILINE)
+
+    # Check for remaining relative paths
+    if re.search(r"\(\.\./|\(\./", content):
+        print("WARNING: Relative paths still exist in the processed file.")
+        print(
+            "The following lines contain relative paths, consider replacing those with GitHub URLs instead:"
+        )
+        for i, line in enumerate(content.split("\n"), 1):
+            if re.search(r"\(\.\./|\(\./", line):
+                print(f"{i}: {line}")
+    else:
+        print("No relative paths found in the processed file.")
+
+    with open(target, "w") as f:
+        f.write(content)
+
+
+if __name__ == "__main__":
+    process_readme_files()
diff --git a/docs/scripts/auto-update-toctree.py b/docs/scripts/auto-update-toctree.py
new file mode 100644
index 00000000..71d50bb0
--- /dev/null
+++ b/docs/scripts/auto-update-toctree.py
@@ -0,0 +1,97 @@
+import glob
+import os
+import re
+
+
+from pathlib import Path
+
+
+def update_toctree_yaml():
+    output_file = "docs/source/_toctree.yml"
+    dirs = ["vertex-ai", "gke", "cloud-run"]
+
+    with open(output_file, "a") as f:
+
f.write("# GENERATED CONTENT DO NOT EDIT!\n") + f.write("- sections:\n") + + for dir in dirs: + f.write(" - sections:\n") + + # Find and sort files + files = sorted(glob.glob(f"docs/source/examples/{dir}-*.mdx")) + files = [file for file in files if not file.endswith(f"{dir}-index.mdx")] + + # Dictionary to store files by type + files_by_type = {} + + for file in files: + with open(file, "r+") as mdx_file: + content = mdx_file.read() + metadata_match = re.search(r"---(.*?)---", content, re.DOTALL) + + metadata = {} + if metadata_match: + metadata_str = metadata_match.group(1) + metadata = dict(re.findall(r"(\w+):\s*(.+)", metadata_str)) + + # Remove metadata from content assuming it's the block on top + # surrounded by `---` including those too + content = re.sub( + r"^---\s*\n.*?\n---\s*\n", + "", + content, + flags=re.DOTALL | re.MULTILINE, + ) + content = content.strip() + + mdx_file.seek(0) + mdx_file.write(content) + mdx_file.truncate() + + if not all(key in metadata for key in ["title", "type"]): + print(f"WARNING: Metadata missing in {file}") + print("Ensure that the file contains the following metadata:") + print("title: ") + print("type: <type>") + + # Remove the file from `docs/source/examples` if doesn't contain metadata + print( + "Removing the file as it won't be included in the _toctree.yml" + ) + os.remove(file) + + continue + + file_type = metadata["type"] + if file_type not in files_by_type: + files_by_type[file_type] = [] + files_by_type[file_type].append((file, metadata)) + + for file_type, file_list in files_by_type.items(): + f.write(" - sections:\n") + for file, metadata in file_list: + base = Path(file).stem + title = metadata["title"] + f.write(f" - local: examples/{base}\n") + f.write(f' title: "{title}"\n') + f.write(" isExpanded: false\n") + f.write(f" title: {file_type.capitalize()}\n") + + f.write(" isExpanded: true\n") + + if dir == "cloud-run": + f.write(f" local: examples/{dir}-index\n") + f.write(" title: Cloud Run\n") + elif dir == "vertex-ai": + f.write(" title: Vertex AI\n") + else: + f.write(f" local: examples/{dir}-index\n") + f.write(f" title: {dir.upper()}\n") + + f.write(" # local: examples/index\n") + f.write(" title: Examples\n") + f.write("# END GENERATED CONTENT\n") + + +if __name__ == "__main__": + update_toctree_yaml() diff --git a/docs/source/_toctree.yml b/docs/source/_toctree.yml index 64f4ee9b..c5371cfd 100644 --- a/docs/source/_toctree.yml +++ b/docs/source/_toctree.yml @@ -1,4 +1,14 @@ - sections: - - local: index - title: Hugging Face on Google Cloud + - local: index + title: Hugging Face on Google Cloud + - local: features + title: Features & benefits + - local: resources + title: Other Resources title: Getting Started +- sections: + - local: containers/introduction + title: Introduction + - local: containers/available + title: Available DLCs on Google Cloud + title: Deep Learning Containers (DLCs) diff --git a/docs/source/containers/available.mdx b/docs/source/containers/available.mdx new file mode 100644 index 00000000..5a79f5a9 --- /dev/null +++ b/docs/source/containers/available.mdx @@ -0,0 +1,35 @@ +# DLCs on Google Cloud + +Below you can find a listing of all the Deep Learning Containers (DLCs) available on Google Cloud. 
+ +<Tip> + +The listing below only contains the latest version of each one of the Hugging Face DLCs, the full listing of the available published containers in Google Cloud can be found either in the [Google Cloud Deep Learning Containers Documentation](https://cloud.google.com/deep-learning-containers/docs/choosing-container#hugging-face), in the [Google Cloud Artifact Registry](https://console.cloud.google.com/artifacts/docker/deeplearning-platform-release/us/gcr.io) or via the `gcloud container images list --repository="us-docker.pkg.dev/deeplearning-platform-release/gcr.io" | grep "huggingface-"` command. + +</Tip> + +## Text Generation Inference (TGI) + +| Container URI | Path | Accelerator | +| --------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- | ----------- | +| us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu121.2-2.ubuntu2204.py310 | [text-generation-inference-gpu.2.2.0](./containers/tgi/gpu/2.2.0/Dockerfile) | GPU | + +## Text Embeddings Inference (TEI) + +| Container URI | Path | Accelerator | +| --------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- | ----------- | +| us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-embeddings-inference-cu122.1-4.ubuntu2204 | [text-embeddings-inference-gpu.1.4.0](./containers/tei/gpu/1.4.0/Dockerfile) | GPU | +| us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-embeddings-inference-cpu.1-4 | [text-embeddings-inference-cpu.1.4.0](./containers/tei/cpu/1.4.0/Dockerfile) | CPU | + +## PyTorch Inference + +| Container URI | Path | Accelerator | +| --------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- | ----------- | +| us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-pytorch-inference-cu121.2-2.transformers.4-44.ubuntu2204.py311 | [huggingface-pytorch-inference-gpu.2.2.2.transformers.4.44.0.py311](./containers/pytorch/inference/gpu/2.2.2/transformers/4.44.0/py311/Dockerfile) | GPU | +| us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-pytorch-inference-cpu.2-2.transformers.4-44.ubuntu2204.py311 | [huggingface-pytorch-inference-cpu.2.2.2.transformers.4.44.0.py311](./containers/pytorch/inference/cpu/2.2.2/transformers/4.44.0/py311/Dockerfile) | CPU | + +## PyTorch Training + +| Container URI | Path | Accelerator | +| --------------------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- | ----------- | +| us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-pytorch-training-cu121.2-3.transformers.4-42.ubuntu2204.py310 | 
[huggingface-pytorch-training-gpu.2.3.0.transformers.4.42.3.py310](./containers/pytorch/training/gpu/2.3.0/transformers/4.42.3/py310/Dockerfile) | GPU | diff --git a/docs/source/containers/introduction.mdx b/docs/source/containers/introduction.mdx new file mode 100644 index 00000000..538de157 --- /dev/null +++ b/docs/source/containers/introduction.mdx @@ -0,0 +1,5 @@ +# Introduction + +[Hugging Face Deep Learning Containers for Google Cloud](https://cloud.google.com/deep-learning-containers/docs/choosing-container#hugging-face) are a set of Docker images for training and deploying Transformers, Sentence Transformers, and Diffusers models on Google Cloud Vertex AI and Google Kubernetes Engine (GKE). + +The [Google-Cloud-Containers](https://github.com/huggingface/Google-Cloud-Containers) repository contains the container files for building Hugging Face-specific Deep Learning Containers (DLCs), examples on how to train and deploy models on Google Cloud. The containers are publicly maintained, updated and released periodically by Hugging Face and the Google Cloud Team and available for all Google Cloud Customers within the [Google Cloud's Artifact Registry](https://cloud.google.com/deep-learning-containers/docs/choosing-container#hugging-face). For each supported combination of use-case (training, inference), accelerator type (CPU, GPU, TPU), and framework (PyTorch, TGI, TEI) containers are created. diff --git a/docs/source/features.mdx b/docs/source/features.mdx new file mode 100644 index 00000000..95eab807 --- /dev/null +++ b/docs/source/features.mdx @@ -0,0 +1,35 @@ +# 🔥 Features & benefits + +The Hugging Face DLCs provide ready-to-use, tested environments to train and deploy Hugging Face models. They can be used in combination with Google Cloud offerings including Google Kubernetes Engine (GKE) and Vertex AI. GKE is a fully-managed Kubernetes service in Google Cloud that can be used to deploy and operate containerized applications at scale using Google Cloud's infrastructure. Vertex AI is a Machine Learning (ML) platform that lets you train and deploy ML models and AI applications, and customize Large Language Models (LLMs). + +## One command is all you need + +With the new Hugging Face DLCs, train cutting-edge Transformers-based NLP models in a single line of code. The Hugging Face PyTorch DLCs for training come with all the libraries installed to run a single command e.g. via TRL CLI to fine-tune LLMs on any setting, either single-GPU, single-node multi-GPU, and more. + +## Accelerate machine learning from science to production + +In addition to Hugging Face DLCs, we created a first-class Hugging Face library for inference, [`huggingface-inference-toolkit`](https://github.com/huggingface/huggingface-inference-toolkit), that comes with the Hugging Face PyTorch DLCs for inference, with full support on serving any PyTorch model on Google Cloud. + +Deploy your trained models for inference with just one more line of code or select [any of the 170,000+ publicly available models from the model Hub](https://huggingface.co/models?library=pytorch,transformers&sort=trending) and deploy them on either Vertex AI or GKE. 
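For illustration, a minimal sketch of that flow with the Vertex AI Python SDK could look like the following; the project, region, model, endpoint name, and machine configuration are assumptions made for this example, while the container URI is the PyTorch Inference DLC listed in the available DLCs table:

```python
# Minimal sketch: deploy a Hub model on Vertex AI with the Hugging Face PyTorch Inference DLC.
# The project, region, model, and machine settings below are illustrative placeholders.
from google.cloud import aiplatform

aiplatform.init(project="my-gcp-project", location="us-central1")

# Register the model, pointing Vertex AI at the Hugging Face PyTorch Inference DLC
model = aiplatform.Model.upload(
    display_name="distilbert-sentiment",
    serving_container_image_uri=(
        "us-docker.pkg.dev/deeplearning-platform-release/gcr.io/"
        "huggingface-pytorch-inference-cu121.2-2.transformers.4-44.ubuntu2204.py311"
    ),
    serving_container_environment_variables={
        "HF_MODEL_ID": "distilbert/distilbert-base-uncased-finetuned-sst-2-english",
        "HF_TASK": "text-classification",
    },
)

# Deploy the registered model to a GPU-backed endpoint
endpoint = model.deploy(
    machine_type="g2-standard-4",
    accelerator_type="NVIDIA_L4",
    accelerator_count=1,
)
print(f"Deployed to endpoint: {endpoint.resource_name}")
```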
+ +## High-performance text generation and embedding + +Besides the PyTorch-oriented DLCs, Hugging Face also provides high-performance inference for both text generation and embedding models via the Hugging Face DLCs for both [Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference) and [Text Embeddings Inference (TEI)](https://github.com/huggingface/text-embeddings-inference), respectively. + +The Hugging Face DLC for TGI enables you to deploy [any of the +140,000 text generation inference supported models from the Hugging Face Hub](https://huggingface.co/models?other=text-generation-inference&sort=trending), or any custom model as long as [its architecture is supported within TGI](https://huggingface.co/docs/text-generation-inference/supported_models). + +The Hugging Face DLC for TEI enables you to deploy [any of the +10,000 embedding, re-ranking or sequence classification supported models from the Hugging Face Hub](https://huggingface.co/models?other=text-embeddings-inference&sort=trending), or any custom model as long as [its architecture is supported within TEI](https://huggingface.co/docs/text-embeddings-inference/en/supported_models). + +Additionally, these DLCs come with full support for Google Cloud meaning that deploying models from Google Cloud Storage (GCS) is also straight forward and requires no configuration. + +## Built-in performance + +Hugging Face DLCs feature built-in performance optimizations for PyTorch to train models faster. The DLCs also give you the flexibility to choose a training infrastructure that best aligns with the price/performance ratio for your workload. + +The Hugging Face Training DLCs are fully integrated with Google Cloud, enabling the use of [the latest generation of instances available on Google Cloud Compute Engine](https://cloud.google.com/products/compute?hl=en). + +Hugging Face Inference DLCs provide you with production-ready endpoints that scale quickly with your Google Cloud environment, built-in monitoring, and a ton of enterprise features. + +--- + +Read more about both Vertex AI in [their official documentation](https://cloud.google.com/vertex-ai/docs) and GKE in [their official documentation](https://cloud.google.com/kubernetes-engine/docs). diff --git a/docs/source/index.mdx b/docs/source/index.mdx index 88bc3388..4e9ca402 100644 --- a/docs/source/index.mdx +++ b/docs/source/index.mdx @@ -24,90 +24,3 @@ You have two options to take advantage of these DLCs as a Google Cloud customer: 1. To [get started](https://huggingface.co/blog/google-cloud-model-garden), you can use our no-code integrations within Vertex AI or GKE. 2. For more advanced scenarios, you can pull the containers from the Google Cloud Artifact Registry directly in your environment. [Here](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples) is a list of notebooks examples. - -## Features & benefits 🔥 - -The Hugging Face DLCs provide ready-to-use, tested environments to train and deploy Hugging Face models. They can be used in combination with Google Cloud offerings including Google Kubernetes Engine (GKE) and Vertex AI. GKE is a fully-managed Kubernetes service in Google Cloud that can be used to deploy and operate containerized applications at scale using Google Cloud's infrastructure. Vertex AI is a Machine Learning (ML) platform that lets you train and deploy ML models and AI applications, and customize Large Language Models (LLMs). 
- -### One command is all you need - -With the new Hugging Face DLCs, train cutting-edge Transformers-based NLP models in a single line of code. The Hugging Face PyTorch DLCs for training come with all the libraries installed to run a single command e.g. via TRL CLI to fine-tune LLMs on any setting, either single-GPU, single-node multi-GPU, and more. - -### Accelerate machine learning from science to production - -In addition to Hugging Face DLCs, we created a first-class Hugging Face library for inference, [`huggingface-inference-toolkit`](https://github.com/huggingface/huggingface-inference-toolkit), that comes with the Hugging Face PyTorch DLCs for inference, with full support on serving any PyTorch model on Google Cloud. - -Deploy your trained models for inference with just one more line of code or select [any of the 170,000+ publicly available models from the model Hub](https://huggingface.co/models?library=pytorch,transformers&sort=trending) and deploy them on either Vertex AI or GKE. - -### High-performance text generation and embedding - -Besides the PyTorch-oriented DLCs, Hugging Face also provides high-performance inference for both text generation and embedding models via the Hugging Face DLCs for both [Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference) and [Text Embeddings Inference (TEI)](https://github.com/huggingface/text-embeddings-inference), respectively. - -The Hugging Face DLC for TGI enables you to deploy [any of the +140,000 text generation inference supported models from the Hugging Face Hub](https://huggingface.co/models?other=text-generation-inference&sort=trending), or any custom model as long as [its architecture is supported within TGI](https://huggingface.co/docs/text-generation-inference/supported_models). - -The Hugging Face DLC for TEI enables you to deploy [any of the +10,000 embedding, re-ranking or sequence classification supported models from the Hugging Face Hub](https://huggingface.co/models?other=text-embeddings-inference&sort=trending), or any custom model as long as [its architecture is supported within TEI](https://huggingface.co/docs/text-embeddings-inference/en/supported_models). - -Additionally, these DLCs come with full support for Google Cloud meaning that deploying models from Google Cloud Storage (GCS) is also straight forward and requires no configuration. - -### Built-in performance - -Hugging Face DLCs feature built-in performance optimizations for PyTorch to train models faster. The DLCs also give you the flexibility to choose a training infrastructure that best aligns with the price/performance ratio for your workload. - -The Hugging Face Training DLCs are fully integrated with Google Cloud, enabling the use of [the latest generation of instances available on Google Cloud Compute Engine](https://cloud.google.com/products/compute?hl=en). - -Hugging Face Inference DLCs provide you with production-ready endpoints that scale quickly with your Google Cloud environment, built-in monitoring, and a ton of enterprise features. - ---- - -Read more about both Vertex AI in [their official documentation](https://cloud.google.com/vertex-ai/docs) and GKE in [their official documentation](https://cloud.google.com/kubernetes-engine/docs). - -## Resources, Documentation & Examples 📄 - -Learn how to use Hugging Face in Google Cloud by reading our blog posts, documentation and examples below. 
- -### Blog posts - -- [Hugging Face and Google partner for open AI collaboration](https://huggingface.co/blog/gcp-partnership) -- [Google Cloud TPUs made available to Hugging Face users](https://huggingface.co/blog/tpu-inference-endpoints-spaces) -- [Making thousands of open LLMs bloom in the Vertex AI Model Garden](https://huggingface.co/blog/google-cloud-model-garden) -- [Deploy Meta Llama 3.1 405B on Google Cloud Vertex AI](https://huggingface.co/blog/llama31-on-vertex-ai) - -### Documentation - -- [Google Cloud Hugging Face Deep Learning Containers](https://cloud.google.com/deep-learning-containers/docs/choosing-container#hugging-face) -- [Google Cloud public Artifact Registry for DLCs](https://console.cloud.google.com/artifacts/docker/deeplearning-platform-release/us/gcr.io) -- [Serve Gemma open models using GPUs on GKE with Hugging Face TGI](https://cloud.google.com/kubernetes-engine/docs/tutorials/serve-gemma-gpu-tgi) -- [Generative AI on Vertex - Use Hugging Face text generation models](https://cloud.google.com/vertex-ai/generative-ai/docs/open-models/use-hugging-face-models) - -### Examples - -- [All examples](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples) - -#### GKE - -- Training - - - [Full SFT fine-tuning of Gemma 2B in a multi-GPU instance with TRL on GKE](https://github.com/huggingface/Google-Cloud-Containers/blob/main/examples/gke/trl-full-fine-tuning) - - [LoRA SFT fine-tuning of Mistral 7B v0.3 in a single GPU instance with TRL on GKE](https://github.com/huggingface/Google-Cloud-Containers/blob/main/examples/gke/trl-lora-fine-tuning) - -- Inference - - - [Deploying Llama3 8B with Text Generation Inference (TGI) on GKE](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/gke/tgi-deployment) - - [Deploying Qwen2 7B Instruct with Text Generation Inference (TGI) from a GCS Bucket on GKE](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/gke/tgi-from-gcs-deployment) - - [Deploying Snowflake's Arctic Embed (M) with Text Embeddings Inference (TEI) on GKE](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/gke/tei-deployment) - - [Deploying BGE Base v1.5 (English) with Text Embeddings Inference (TEI) from a GCS Bucket on GKE](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/gke/tei-from-gcs-deployment) - -#### Vertex AI - -- Training - - - [Full SFT fine-tuning of Mistral 7B v0.3 in a multi-GPU instance with TRL on Vertex AI](https://github.com/huggingface/Google-Cloud-Containers/blob/main/examples/vertex-ai/notebooks/trl-full-sft-fine-tuning-on-vertex-ai) - - [LoRA SFT fine-tuning of Mistral 7B v0.3 in a single GPU instance with TRL on Vertex AI](https://github.com/huggingface/Google-Cloud-Containers/blob/main/examples/vertex-ai/notebooks/trl-lora-sft-fine-tuning-on-vertex-ai) - -- Inference - - - [Deploying a BERT model for a text classification task using huggingface-inference-toolkit for a Custom Prediction Routine (CPR) on Vertex AI](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/vertex-ai/notebooks/deploy-bert-on-vertex-ai) - - [Deploying an embedding model with Text Embeddings Inference (TEI) on Vertex AI](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/vertex-ai/notebooks/deploy-embedding-on-vertex-ai) - - [Deploying Gemma 7B Instruct with Text Generation Inference (TGI) on Vertex 
AI](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/vertex-ai/notebooks/deploy-gemma-on-vertex-ai)
-  - [Deploying Gemma 7B Instruct with Text Generation Inference (TGI) from a GCS Bucket on Vertex AI](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/vertex-ai/notebooks/deploy-gemma-from-gcs-on-vertex-ai)
-  - [Deploying FLUX with Hugging Face PyTorch DLCs for Inference on Vertex AI](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/vertex-ai/notebooks/deploy-flux-on-vertex-ai)
diff --git a/docs/source/resources.mdx b/docs/source/resources.mdx
new file mode 100644
index 00000000..7ff83001
--- /dev/null
+++ b/docs/source/resources.mdx
@@ -0,0 +1,57 @@
+# 📄 Other Resources
+
+Learn how to use Hugging Face in Google Cloud by reading our blog posts, Google documentation and examples below.
+
+## Blog posts
+
+- [Hugging Face and Google partner for open AI collaboration](https://huggingface.co/blog/gcp-partnership)
+- [Google Cloud TPUs made available to Hugging Face users](https://huggingface.co/blog/tpu-inference-endpoints-spaces)
+- [Making thousands of open LLMs bloom in the Vertex AI Model Garden](https://huggingface.co/blog/google-cloud-model-garden)
+- [Deploy Meta Llama 3.1 405B on Google Cloud Vertex AI](https://huggingface.co/blog/llama31-on-vertex-ai)
+
+## Google Documentation
+
+- [Google Cloud Hugging Face Deep Learning Containers](https://cloud.google.com/deep-learning-containers/docs/choosing-container#hugging-face)
+- [Google Cloud public Artifact Registry for DLCs](https://console.cloud.google.com/artifacts/docker/deeplearning-platform-release/us/gcr.io)
+- [Serve Gemma open models using GPUs on GKE with Hugging Face TGI](https://cloud.google.com/kubernetes-engine/docs/tutorials/serve-gemma-gpu-tgi)
+- [Generative AI on Vertex - Use Hugging Face text generation models](https://cloud.google.com/vertex-ai/generative-ai/docs/open-models/use-hugging-face-models)
+
+## Examples
+
+- [All examples](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples)
+
+### GKE
+
+- Training
+
+  - [Fine-tune Gemma 2B with PyTorch Training DLC using SFT on GKE](https://github.com/huggingface/Google-Cloud-Containers/blob/main/examples/gke/trl-full-fine-tuning)
+  - [Fine-tune Mistral 7B v0.3 with PyTorch Training DLC using SFT + LoRA on GKE](https://github.com/huggingface/Google-Cloud-Containers/blob/main/examples/gke/trl-lora-fine-tuning)
+
+- Inference
+
+  - [Deploy Meta Llama 3 8B with TGI DLC on GKE](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/gke/tgi-deployment)
+  - [Deploy Qwen2 7B with TGI DLC from GCS on GKE](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/gke/tgi-from-gcs-deployment)
+  - [Deploy Snowflake's Arctic Embed with TEI DLC on GKE](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/gke/tei-deployment)
+  - [Deploy BGE Base v1.5 with TEI DLC from GCS on GKE](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/gke/tei-from-gcs-deployment)
+
+### Vertex AI
+
+- Training
+
+  - [Fine-tune Mistral 7B v0.3 with PyTorch Training DLC using SFT on Vertex AI](https://github.com/huggingface/Google-Cloud-Containers/blob/main/examples/vertex-ai/notebooks/trl-full-sft-fine-tuning-on-vertex-ai/vertex-notebook.ipynb)
+  - [Fine-tune Gemma 2B with PyTorch Training DLC using SFT + LoRA on Vertex
AI](https://github.com/huggingface/Google-Cloud-Containers/blob/main/examples/vertex-ai/notebooks/trl-lora-sft-fine-tuning-on-vertex-ai/vertex-notebook.ipynb) + +- Inference + + - [Deploy BERT Models with PyTorch Inference DLC on Vertex AI](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/vertex-ai/notebooks/deploy-bert-on-vertex-ai/vertex-notebook.ipynb) + - [Deploy Embedding Models with TEI DLC on Vertex AI](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/vertex-ai/notebooks/deploy-embedding-on-vertex-ai/vertex-notebook.ipynb) + - [Deploy Gemma 7B with TGI DLC on Vertex AI](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/vertex-ai/notebooks/deploy-gemma-on-vertex-ai/vertex-notebook.ipynb) + - [Deploy Gemma 7B with TGI DLC from GCS on Vertex AI](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/vertex-ai/notebooks/deploy-gemma-from-gcs-on-vertex-ai/vertex-notebook.ipynb) + - [Deploy FLUX with PyTorch Inference DLC on Vertex AI](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/vertex-ai/notebooks/deploy-flux-on-vertex-ai/vertex-notebook.ipynb) + - [Deploy Meta Llama 3.1 405B with TGI DLC on Vertex AI](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/vertex-ai/notebooks/deploy-llama-3-1-405b-on-vertex-ai/vertex-notebook.ipynb) + +### (Preview) Cloud Run + +- Inference + + - [Deploy Meta Llama 3.1 with TGI DLC on Cloud Run](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/cloud-run/tgi-deployment) diff --git a/examples/cloud-run/README.md b/examples/cloud-run/README.md index 17902fe9..19978b52 100644 --- a/examples/cloud-run/README.md +++ b/examples/cloud-run/README.md @@ -7,6 +7,6 @@ This directory contains usage examples of the Hugging Face Deep Learning Contain ## Inference Examples -| Example | Description | -| ---------------------------------- | ------------------------------------------------------------------------ | -| [tgi-deployment](./tgi-deployment) | Deploying Meta Llama 3.1 8B with Text Generation Inference on Cloud Run. | +| Example | Title | +| ---------------------------------- | ----------------------------------------------- | +| [tgi-deployment](./tgi-deployment) | Deploy Meta Llama 3.1 with TGI DLC on Cloud Run | diff --git a/examples/cloud-run/tgi-deployment/README.md b/examples/cloud-run/tgi-deployment/README.md index d9423770..b167db18 100644 --- a/examples/cloud-run/tgi-deployment/README.md +++ b/examples/cloud-run/tgi-deployment/README.md @@ -1,6 +1,13 @@ -# Deploy Meta Llama 3.1 8B with Text Generation Inference on Cloud Run +--- +title: Deploy Meta Llama 3.1 8B with TGI DLC on Cloud Run +type: inference +--- -Meta Llama 3.1 is the latest open LLM from Meta, released in July 2024. Meta Llama 3.1 comes in three sizes: 8B for efficient deployment and development on consumer-size GPU, 70B for large-scale AI native applications, and 405B for synthetic data, LLM as a Judge or distillation; among other use cases. Text Generation Inference (TGI) is a toolkit developed by Hugging Face for deploying and serving LLMs, with high performance text generation. Google Cloud Run is a serverless container platform that allows developers to deploy and manage containerized applications without managing infrastructure, enabling automatic scaling and billing only for usage. 
This example showcases how to deploy an LLM from the Hugging Face Hub, in this case Meta Llama 3.1 8B Instruct model quantized to INT4 using AWQ, with the Hugging Face DLC for TGI on Google Cloud Run with GPU support (on preview). +# Deploy Meta Llama 3.1 8B with TGI DLC on Cloud Run + +Meta Llama 3.1 is the latest open LLM from Meta, released in July 2024. Meta Llama 3.1 comes in three sizes: 8B for efficient deployment and development on consumer-size GPU, 70B for large-scale AI native applications, and 405B for synthetic data, LLM as a Judge or distillation; among other use cases. Text Generation Inference (TGI) is a toolkit developed by Hugging Face for deploying and serving LLMs, with high performance text generation. Google Cloud Run is a serverless container platform that allows developers to deploy and manage containerized applications without managing infrastructure, enabling automatic scaling and billing only for usage. + +This example showcases how to deploy an LLM from the Hugging Face Hub, in this case Meta Llama 3.1 8B Instruct model quantized to INT4 using AWQ, with the Hugging Face DLC for TGI on Google Cloud Run with GPU support (on preview). > [!NOTE] > GPU support on Cloud Run is only available as a waitlisted public preview. If you're interested in trying out the feature, [request a quota increase](https://cloud.google.com/run/quotas#increase) for `Total Nvidia L4 GPU allocation, per project per region`. At the time of writing this example, NVIDIA L4 GPUs (24GiB VRAM) are the only available GPUs on Cloud Run; enabling automatic scaling up to 7 instances by default (more available via quota), as well as scaling down to zero instances when there are no requests. diff --git a/examples/gke/README.md b/examples/gke/README.md index cb19954a..c7d0f014 100644 --- a/examples/gke/README.md +++ b/examples/gke/README.md @@ -4,16 +4,16 @@ This directory contains usage examples of the Hugging Face Deep Learning Contain ## Training Examples -| Example | Description | -| ---------------------------------------------- | --------------------------------------------------------------------------------- | -| [trl-full-fine-tuning](./trl-full-fine-tuning) | Full SFT fine-tuning of Gemma 2B in a multi-GPU instance with TRL on GKE. | -| [trl-lora-fine-tuning](./trl-lora-fine-tuning) | LoRA SFT fine-tuning of Mistral 7B v0.3 in a single GPU instance with TRL on GKE. | +| Example | Title | +| ---------------------------------------------- | --------------------------------------------------------------------------- | +| [trl-full-fine-tuning](./trl-full-fine-tuning) | Fine-tune Gemma 2B with PyTorch Training DLC using SFT on GKE | +| [trl-lora-fine-tuning](./trl-lora-fine-tuning) | Fine-tune Mistral 7B v0.3 with PyTorch Training DLC using SFT + LoRA on GKE | ## Inference Examples -| Example | Description | -| ---------------------------------------------------- | ------------------------------------------------------------------------------------------------ | -| [tgi-deployment](./tgi-deployment) | Deploying Llama3 8B with Text Generation Inference (TGI) on GKE. | -| [tgi-from-gcs-deployment](./tgi-from-gcs-deployment) | Deploying Qwen2 7B Instruct with Text Generation Inference (TGI) from a GCS Bucket on GKE. | -| [tei-deployment](./tei-deployment) | Deploying Snowflake's Arctic Embed (M) with Text Embeddings Inference (TEI) on GKE. | -| [tei-from-gcs-deployment](./tei-from-gcs-deployment) | Deploying BGE Base v1.5 (English) with Text Embeddings Inference (TEI) from a GCS Bucket on GKE. 
| +| Example | Title | +| ---------------------------------------------------- | --------------------------------------------------- | +| [tgi-deployment](./tgi-deployment) | Deploy Meta Llama 3 8B with TGI DLC on GKE | +| [tgi-from-gcs-deployment](./tgi-from-gcs-deployment) | Deploy Qwen2 7B with TGI DLC from GCS on GKE | +| [tei-deployment](./tei-deployment) | Deploy Snowflake's Arctic Embed with TEI DLC on GKE | +| [tei-from-gcs-deployment](./tei-from-gcs-deployment) | Deploy BGE Base v1.5 with TEI DLC from GCS on GKE | diff --git a/examples/gke/tei-deployment/README.md b/examples/gke/tei-deployment/README.md index f53f0d2a..fae8653a 100644 --- a/examples/gke/tei-deployment/README.md +++ b/examples/gke/tei-deployment/README.md @@ -1,4 +1,9 @@ -# Deploy Snowflake's Arctic Embed (M) with Text Embeddings Inference (TEI) on GKE +--- +title: Deploy Snowflake's Arctic Embed with TEI DLC on GKE +type: inference +--- + +# Deploy Snowflake's Arctic Embed with TEI DLC on GKE Snowflake's Arctic Embed is a suite of text embedding models that focuses on creating high-quality retrieval models optimized for performance, achieving state-of-the-art (SOTA) performance on the MTEB/BEIR leaderboard for each of their size variants. Text Embeddings Inference (TEI) is a toolkit developed by Hugging Face for deploying and serving open source text embeddings and sequence classification models; enabling high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5. And, Google Kubernetes Engine (GKE) is a fully-managed Kubernetes service in Google Cloud that can be used to deploy and operate containerized applications at scale using GCP's infrastructure. diff --git a/examples/gke/tei-from-gcs-deployment/README.md b/examples/gke/tei-from-gcs-deployment/README.md index e20bdf0e..e18fdbd3 100644 --- a/examples/gke/tei-from-gcs-deployment/README.md +++ b/examples/gke/tei-from-gcs-deployment/README.md @@ -1,4 +1,9 @@ -# Deploy BGE Base v1.5 (English) with Text Embeddings Inference (TEI) from a GCS Bucket on GKE +--- +title: Deploy BGE Base v1.5 with TEI DLC from GCS on GKE +type: inference +--- + +# Deploy BGE Base v1.5 with TEI DLC from GCS on GKE BGE, standing for BAAI General Embedding, is a collection of embedding models released by BAAI, which is an English base model for general embedding tasks ranked in the MTEB Leaderboard. Text Embeddings Inference (TEI) is a toolkit developed by Hugging Face for deploying and serving open source text embeddings and sequence classification models; enabling high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5. And, Google Kubernetes Engine (GKE) is a fully-managed Kubernetes service in Google Cloud that can be used to deploy and operate containerized applications at scale using GCP's infrastructure. @@ -91,7 +96,7 @@ gcloud container clusters get-credentials $CLUSTER_NAME --location=$LOCATION This is an optional step in the tutorial, since you may want to reuse an existing model on a GCS Bucket, if that is the case, then feel free to jump to the next step of the tutorial on how to configure the IAM for GCS so that you can access the bucket from a pod in the GKE Cluster. -Otherwise, to upload a model from the Hugging Face Hub to a GCS Bucket, you can use the script [./scripts/upload_model_to_gcs.sh](./scripts/upload_model_to_gcs.sh), which will download the model from the Hugging Face Hub and upload it to the GCS Bucket (and create the bucket if not created already). 
+Otherwise, to upload a model from the Hugging Face Hub to a GCS Bucket, you can use the script [scripts/upload_model_to_gcs.sh](../../../scripts/upload_model_to_gcs.sh), which will download the model from the Hugging Face Hub and upload it to the GCS Bucket (and create the bucket if not created already). The `gsutil` component should be installed via `gcloud`, and the Python packages `huggingface_hub` with the extra `hf_transfer`, and the package `crcmod` should also be installed. diff --git a/examples/gke/tgi-deployment/README.md b/examples/gke/tgi-deployment/README.md index d91720f3..17549fc9 100644 --- a/examples/gke/tgi-deployment/README.md +++ b/examples/gke/tgi-deployment/README.md @@ -1,6 +1,13 @@ -# Deploy Meta Llama 3 8B with Text Generation Inference (TGI) on GKE +--- +title: Deploy Meta Llama 3 8B with TGI DLC on GKE +type: inference +--- -Meta Llama 3 is the latest LLM from the Llama family, released by Meta; coming in two sizes 8B and 70B, including both the base model and the instruction-tuned model. Text Generation Inference (TGI) is a toolkit developed by Hugging Face for deploying and serving LLMs, with high performance text generation. And, Google Kubernetes Engine (GKE) is a fully-managed Kubernetes service in Google Cloud that can be used to deploy and operate containerized applications at scale using GCP's infrastructure. This post explains how to deploy an LLM from the Hugging Face Hub, as Llama3 8B Instruct, on a GKE Cluster running a purpose-built container to deploy LLMs in a secure and managed environment with the Hugging Face DLC for TGI. +# Deploy Meta Llama 3 8B with TGI DLC on GKE + +Meta Llama 3 is the latest LLM from the Llama family, released by Meta; coming in two sizes 8B and 70B, including both the base model and the instruction-tuned model. Text Generation Inference (TGI) is a toolkit developed by Hugging Face for deploying and serving LLMs, with high performance text generation. And, Google Kubernetes Engine (GKE) is a fully-managed Kubernetes service in Google Cloud that can be used to deploy and operate containerized applications at scale using GCP's infrastructure. + +This example showcases how to deploy an LLM from the Hugging Face Hub, as Meta Llama 3 8B Instruct, on a GKE Cluster running a purpose-built container to deploy LLMs in a secure and managed environment with the Hugging Face DLC for TGI. ## Setup / Configuration diff --git a/examples/gke/tgi-from-gcs-deployment/README.md b/examples/gke/tgi-from-gcs-deployment/README.md index e0bd00f6..11d222a9 100644 --- a/examples/gke/tgi-from-gcs-deployment/README.md +++ b/examples/gke/tgi-from-gcs-deployment/README.md @@ -1,6 +1,13 @@ -# Deploy Qwen2 7B Instruct with Text Generation Inference (TGI) from a GCS Bucket on GKE +--- +title: Deploy Qwen2 7B with TGI DLC from GCS on GKE +type: inference +--- -Qwen2 is the new series of Qwen Large Language Models (LLMs) built by Alibaba Cloud, with both base and instruction-tuned language models ranging from 0.5 to 72 billion parameters, including a Mixture-of-Experts model; the 7B variant sitting in the second place in the 7B size range in the Open LLM Leaderboard by Hugging Face and the 72B one in the first place amongst any size. Text Generation Inference (TGI) is a toolkit developed by Hugging Face for deploying and serving LLMs, with high performance text generation. 
And, Google Kubernetes Engine (GKE) is a fully-managed Kubernetes service in Google Cloud that can be used to deploy and operate containerized applications at scale using GCP's infrastructure. This post explains how to deploy an LLM from a Google Cloud Storage (GCS) Bucket on a GKE Cluster running a purpose-built container to deploy LLMs in a secure and managed environment with the Hugging Face DLC for TGI. +# Deploy Qwen2 7B with TGI DLC from GCS on GKE + +Qwen2 is the new series of Qwen Large Language Models (LLMs) built by Alibaba Cloud, with both base and instruction-tuned language models ranging from 0.5 to 72 billion parameters, including a Mixture-of-Experts model; the 7B variant sitting in the second place in the 7B size range in the Open LLM Leaderboard by Hugging Face and the 72B one in the first place amongst any size. Text Generation Inference (TGI) is a toolkit developed by Hugging Face for deploying and serving LLMs, with high performance text generation. And, Google Kubernetes Engine (GKE) is a fully-managed Kubernetes service in Google Cloud that can be used to deploy and operate containerized applications at scale using GCP's infrastructure. + +This example showcases how to deploy an LLM from a Google Cloud Storage (GCS) Bucket on a GKE Cluster running a purpose-built container to deploy LLMs in a secure and managed environment with the Hugging Face DLC for TGI. ## Setup / Configuration diff --git a/examples/gke/trl-full-fine-tuning/README.md b/examples/gke/trl-full-fine-tuning/README.md index 9db76d7a..73975e8c 100644 --- a/examples/gke/trl-full-fine-tuning/README.md +++ b/examples/gke/trl-full-fine-tuning/README.md @@ -1,6 +1,13 @@ -# Fine-tune Gemma 2B with TRL on GKE +--- +title: Fine-tune Gemma 2B with PyTorch Training DLC using SFT on GKE +type: training +--- -Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models, developed by Google DeepMind and other teams across Google. TRL is a full stack library to fine-tune and align Large Language Models (LLMs) developed by Hugging Face. And, Google Kubernetes Engine (GKE) is a fully-managed Kubernetes service in Google Cloud that can be used to deploy and operate containerized applications at scale using GCP's infrastructure. This post explains how to full fine-tune Gemma 2B with TRL via Supervised Fine-Tuning (SFT) in a multi-GPU setting on a GKE Cluster. +# Fine-tune Gemma 2B with PyTorch Training DLC using SFT on GKE + +Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models, developed by Google DeepMind and other teams across Google. TRL is a full stack library to fine-tune and align Large Language Models (LLMs) developed by Hugging Face. And, Google Kubernetes Engine (GKE) is a fully-managed Kubernetes service in Google Cloud that can be used to deploy and operate containerized applications at scale using GCP's infrastructure. + +This example showcases how to full fine-tune Gemma 2B with TRL via Supervised Fine-Tuning (SFT) in a multi-GPU setting on a GKE Cluster. 
## Setup / Configuration diff --git a/examples/gke/trl-lora-fine-tuning/README.md b/examples/gke/trl-lora-fine-tuning/README.md index 47e96cc6..61c9cf85 100644 --- a/examples/gke/trl-lora-fine-tuning/README.md +++ b/examples/gke/trl-lora-fine-tuning/README.md @@ -1,6 +1,13 @@ -# Fine-tune Mistral 7B v0.3 with TRL on GKE +--- +title: Fine-tune Mistral 7B v0.3 with PyTorch Training DLC using SFT + LoRA on GKE +type: training +--- -Mistral is a family of models with varying sizes, created by the Mistral AI team; the Mistral 7B v0.3 LLM is a Mistral 7B v0.2 with extended vocabulary. TRL is a full stack library to fine-tune and align Large Language Models (LLMs) developed by Hugging Face. And, Google Kubernetes Engine (GKE) is a fully-managed Kubernetes service in Google Cloud that can be used to deploy and operate containerized applications at scale using GCP's infrastructure. This post explains how to fine-tune Mistral 7B v0.3 with TRL via Supervised Fine-Tuning (SFT) and Low-Rank Adaptation (LoRA) in a single GPU on a GKE Cluster. +# Fine-tune Mistral 7B v0.3 with PyTorch Training DLC using SFT + LoRA on GKE + +Mistral is a family of models with varying sizes, created by the Mistral AI team; the Mistral 7B v0.3 LLM is a Mistral 7B v0.2 with extended vocabulary. TRL is a full stack library to fine-tune and align Large Language Models (LLMs) developed by Hugging Face. And, Google Kubernetes Engine (GKE) is a fully-managed Kubernetes service in Google Cloud that can be used to deploy and operate containerized applications at scale using GCP's infrastructure. + +This example showcases how to fine-tune Mistral 7B v0.3 with TRL via Supervised Fine-Tuning (SFT) and Low-Rank Adaptation (LoRA) in a single GPU on a GKE Cluster. ## Setup / Configuration diff --git a/examples/vertex-ai/README.md b/examples/vertex-ai/README.md index 37673bba..45a7dc55 100644 --- a/examples/vertex-ai/README.md +++ b/examples/vertex-ai/README.md @@ -8,22 +8,22 @@ For Google Vertex AI, we differentiate between the executable Jupyter Notebook e ### Training Examples -| Example | Description | -| ------------------------------------------------------------------------------------------ | --------------------------------------------------------------------------------------- | -| [trl-full-sft-fine-tuning-on-vertex-ai](./notebooks/trl-full-sft-fine-tuning-on-vertex-ai) | Full SFT fine-tuning of Mistral 7B v0.3 in a multi-GPU instance with TRL on Vertex AI. | -| [trl-lora-sft-fine-tuning-on-vertex-ai](./notebooks/trl-lora-sft-fine-tuning-on-vertex-ai) | LoRA SFT fine-tuning of Mistral 7B v0.3 in a single GPU instance with TRL on Vertex AI. 
| +| Example | Title | +| ------------------------------------------------------------------------------------------ | -------------------------------------------------------------------------- | +| [trl-full-sft-fine-tuning-on-vertex-ai](./notebooks/trl-full-sft-fine-tuning-on-vertex-ai) | Fine-tune Mistral 7B v0.3 with PyTorch Training DLC using SFT on Vertex AI | +| [trl-lora-sft-fine-tuning-on-vertex-ai](./notebooks/trl-lora-sft-fine-tuning-on-vertex-ai) | Fine-tune Gemma 2B with PyTorch Training DLC using SFT + LoRA on Vertex AI | ### Inference Examples -| Example | Description | -| ------------------------------------------------------------------------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------- | -| [deploy-bert-on-vertex-ai](./notebooks/deploy-bert-on-vertex-ai) | Deploying a BERT model for a text classification task using `huggingface-inference-toolkit` for a Custom Prediction Routine (CPR) on Vertex AI. | -| [deploy-embedding-on-vertex-ai](./notebooks/deploy-embedding-on-vertex-ai) | Deploying an embedding model with Text Embeddings Inference (TEI) on Vertex AI. | -| [deploy-gemma-on-vertex-ai](./notebooks/deploy-gemma-on-vertex-ai) | Deploying Gemma 7B Instruct with Text Generation Inference (TGI) on Vertex AI. | -| [deploy-gemma-from-gcs-on-vertex-ai](./notebooks/deploy-gemma-from-gcs-on-vertex-ai) | Deploying Gemma 7B Instruct with Text Generation Inference (TGI) from a GCS Bucket on Vertex AI. | -| [deploy-flux-on-vertex-ai](./notebooks/deploy-flux-on-vertex-ai) | Deploying FLUX with Hugging Face PyTorch DLCs for Inference on Vertex AI. | -| [deploy-llama-3-1-405b-on-vertex-ai](./notebooks/deploy-llama-405b-on-vertex-ai/vertex-notebook.ipynb) | Deploying Meta Llama 3.1 405B in FP8 with Hugging Face DLC for TGI on Vertex AI. | +| Example | Title | +| ------------------------------------------------------------------------------------------------------ | ---------------------------------------------------------- | +| [deploy-bert-on-vertex-ai](./notebooks/deploy-bert-on-vertex-ai) | Deploy BERT Models with PyTorch Inference DLC on Vertex AI | +| [deploy-embedding-on-vertex-ai](./notebooks/deploy-embedding-on-vertex-ai) | Deploy Embedding Models with TEI DLC on Vertex AI | +| [deploy-gemma-on-vertex-ai](./notebooks/deploy-gemma-on-vertex-ai) | Deploy Gemma 7B with TGI DLC on Vertex AI | +| [deploy-gemma-from-gcs-on-vertex-ai](./notebooks/deploy-gemma-from-gcs-on-vertex-ai) | Deploy Gemma 7B with TGI DLC from GCS on Vertex AI | +| [deploy-flux-on-vertex-ai](./notebooks/deploy-flux-on-vertex-ai) | Deploy FLUX with PyTorch Inference DLC on Vertex AI | +| [deploy-llama-3-1-405b-on-vertex-ai](./notebooks/deploy-llama-405b-on-vertex-ai/vertex-notebook.ipynb) | Deploy Meta Llama 3.1 405B with TGI DLC on Vertex AI | ## Pipelines -More to come soon! +Coming soon! 
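The Vertex AI examples listed above all follow the same basic flow with the `google-cloud-aiplatform` SDK: register the model on Vertex AI, deploy it to an endpoint backed by the chosen machine type, and send online predictions to it. The sketch below illustrates that flow for a TGI-served text-generation model; the project, region, DLC image URI, environment variables, and machine configuration are placeholders rather than values taken from any specific notebook.

```python
# Minimal sketch of the shared upload -> deploy -> predict flow; all identifiers,
# image URIs, and machine settings below are placeholders, not the notebooks' values.
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")

# Register the model on Vertex AI, pointing the serving container at a Hugging Face
# DLC for TGI and telling it which Hub model to serve.
model = aiplatform.Model.upload(
    display_name="gemma-7b-it",
    serving_container_image_uri="<huggingface-tgi-dlc-uri>",  # placeholder DLC URI
    serving_container_environment_variables={
        "MODEL_ID": "google/gemma-7b-it",  # assumed env var name for the Hub model
        "NUM_SHARD": "1",
    },
    serving_container_ports=[8080],
)

# Deploy the registered model to a GPU-backed endpoint.
endpoint = model.deploy(
    machine_type="g2-standard-4",
    accelerator_type="NVIDIA_L4",
    accelerator_count=1,
)

# Send an online prediction request to the deployed endpoint.
response = endpoint.predict(
    instances=[{"inputs": "What is Vertex AI?", "parameters": {"max_new_tokens": 64}}]
)
print(response.predictions)
```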
diff --git a/examples/vertex-ai/notebooks/deploy-bert-on-vertex-ai/vertex-notebook.ipynb b/examples/vertex-ai/notebooks/deploy-bert-on-vertex-ai/vertex-notebook.ipynb index 09a1fd5c..9c4728e8 100644 --- a/examples/vertex-ai/notebooks/deploy-bert-on-vertex-ai/vertex-notebook.ipynb +++ b/examples/vertex-ai/notebooks/deploy-bert-on-vertex-ai/vertex-notebook.ipynb @@ -4,14 +4,26 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Deploy BERT Models with PyTorch Inference on Vertex AI" + "<!-- ---\n", + "title: Deploy BERT Models with PyTorch Inference DLC on Vertex AI\n", + "type: inference\n", + "--- -->" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "DistilBERT is a small, fast, cheap and light Transformer model trained by distilling BERT, which is a bidirectional transformer pretrained using a combination of masked language modeling objective and next sentence prediction on a large corpus. And, Google Vertex AI is a Machine Learning (ML) platform that lets you train and deploy ML models and AI applications, and customize large language models (LLMs) for use in your AI-powered applications. This example showcases how to deploy any supported PyTorch model from the Hugging Face Hub, in this case [`distilbert/distilbert-base-uncased-finetuned-sst-2-english`](https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english), on Vertex AI using the PyTorch Inference DLC available in Google Cloud Platform (GCP) in both CPU and GPU instances." + "# Deploy BERT Models with PyTorch Inference DLC on Vertex AI" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "DistilBERT is a small, fast, cheap and light Transformer model trained by distilling BERT, which is a bidirectional transformer pretrained using a combination of masked language modeling objective and next sentence prediction on a large corpus. And, Google Vertex AI is a Machine Learning (ML) platform that lets you train and deploy ML models and AI applications, and customize large language models (LLMs) for use in your AI-powered applications.\n", + "\n", + "This example showcases how to deploy any supported PyTorch model from the Hugging Face Hub, in this case [`distilbert/distilbert-base-uncased-finetuned-sst-2-english`](https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english), on Vertex AI using the PyTorch Inference DLC available in Google Cloud Platform (GCP) in both CPU and GPU instances." ] }, { diff --git a/examples/vertex-ai/notebooks/deploy-embedding-on-vertex-ai/vertex-notebook.ipynb b/examples/vertex-ai/notebooks/deploy-embedding-on-vertex-ai/vertex-notebook.ipynb index 8c829009..70e765fc 100644 --- a/examples/vertex-ai/notebooks/deploy-embedding-on-vertex-ai/vertex-notebook.ipynb +++ b/examples/vertex-ai/notebooks/deploy-embedding-on-vertex-ai/vertex-notebook.ipynb @@ -4,14 +4,26 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Deploy Embedding Models with TEI on Vertex AI" + "<!-- ---\n", + "title: Deploy Embedding Models with TEI DLC on Vertex AI\n", + "type: inference\n", + "--- -->" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "BGE, standing for BAAI General Embedding, is a collection of embedding models released by BAAI, which is an English base model for general embedding tasks ranked in the MTEB Leaderboard. 
Text Embeddings Inference (TEI) is a toolkit developed by Hugging Face for deploying and serving open source text embeddings and sequence classification models; enabling high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5. And, Google Vertex AI is a Machine Learning (ML) platform that lets you train and deploy ML models and AI applications, and customize large language models (LLMs) for use in your AI-powered applications. This example showcases how to deploy any supported embedding model, in this case [`BAAI/bge-large-en-v1.5`](https://huggingface.co/BAAI/bge-large-en-v1.5), from the Hugging Face Hub on Vertex AI using the TEI DLC available in Google Cloud Platform (GCP) in both CPU and GPU instances." + "# Deploy Embedding Models with TEI DLC on Vertex AI" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "BGE, standing for BAAI General Embedding, is a collection of embedding models released by BAAI, which is an English base model for general embedding tasks ranked in the MTEB Leaderboard. Text Embeddings Inference (TEI) is a toolkit developed by Hugging Face for deploying and serving open source text embeddings and sequence classification models; enabling high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5. And, Google Vertex AI is a Machine Learning (ML) platform that lets you train and deploy ML models and AI applications, and customize large language models (LLMs) for use in your AI-powered applications.\n", + "\n", + "This example showcases how to deploy any supported embedding model, in this case [`BAAI/bge-large-en-v1.5`](https://huggingface.co/BAAI/bge-large-en-v1.5), from the Hugging Face Hub on Vertex AI using the TEI DLC available in Google Cloud Platform (GCP) in both CPU and GPU instances." ] }, { diff --git a/examples/vertex-ai/notebooks/deploy-flux-on-vertex-ai/vertex-notebook.ipynb b/examples/vertex-ai/notebooks/deploy-flux-on-vertex-ai/vertex-notebook.ipynb index 4340b179..eb298bde 100644 --- a/examples/vertex-ai/notebooks/deploy-flux-on-vertex-ai/vertex-notebook.ipynb +++ b/examples/vertex-ai/notebooks/deploy-flux-on-vertex-ai/vertex-notebook.ipynb @@ -4,14 +4,26 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Deploy FLUX with Hugging Face PyTorch DLCs for Inference on Vertex AI" + "<!-- ---\n", + "title: Deploy FLUX with PyTorch Inference DLC on Vertex AI\n", + "type: inference\n", + "--- -->" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "FLUX is an open-weights 12B parameter rectified flow transformer that generates images from text descriptions, pushing the boundaries of text-to-image generation created by Black Forest Labs, with a non-commercial license making it widely accessible for exploration and experimentation. And, Google Cloud Vertex AI is a Machine Learning (ML) platform that lets you train and deploy ML models and AI applications, and customize large language models (LLMs) for use in your AI-powered applications. This example showcases how to deploy any supported [`diffusers`](https://github.com/huggingface/diffusers) text-to-image model from the Hugging Face Hub, in this case [`black-forest-labs/FLUX.1-dev`](https://huggingface.co/black-forest-labs/FLUX.1-dev), on Vertex AI using the Hugging Face PyTorch DLC for Inference available in Google Cloud Platform (GCP) in both CPU and GPU instances." 
+ "# Deploy FLUX with PyTorch Inference DLC on Vertex AI" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "FLUX is an open-weights 12B parameter rectified flow transformer that generates images from text descriptions, pushing the boundaries of text-to-image generation created by Black Forest Labs, with a non-commercial license making it widely accessible for exploration and experimentation. And, Google Cloud Vertex AI is a Machine Learning (ML) platform that lets you train and deploy ML models and AI applications, and customize large language models (LLMs) for use in your AI-powered applications.\n", + "\n", + "This example showcases how to deploy any supported [`diffusers`](https://github.com/huggingface/diffusers) text-to-image model from the Hugging Face Hub, in this case [`black-forest-labs/FLUX.1-dev`](https://huggingface.co/black-forest-labs/FLUX.1-dev), on Vertex AI using the Hugging Face PyTorch DLC for Inference available in Google Cloud Platform (GCP) in both CPU and GPU instances." ] }, { diff --git a/examples/vertex-ai/notebooks/deploy-gemma-from-gcs-on-vertex-ai/vertex-notebook.ipynb b/examples/vertex-ai/notebooks/deploy-gemma-from-gcs-on-vertex-ai/vertex-notebook.ipynb index c2f98ba3..0ccdaf98 100644 --- a/examples/vertex-ai/notebooks/deploy-gemma-from-gcs-on-vertex-ai/vertex-notebook.ipynb +++ b/examples/vertex-ai/notebooks/deploy-gemma-from-gcs-on-vertex-ai/vertex-notebook.ipynb @@ -4,14 +4,26 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Deploy Gemma 7B from GCS with TGI on Vertex AI " + "<!-- ---\n", + "title: Deploy Gemma 7B with TGI DLC from GCS on Vertex AI\n", + "type: inference\n", + "--- -->" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models, developed by Google DeepMind and other teams across Google. Text Generation Inference (TGI) is a toolkit developed by Hugging Face for deploying and serving LLMs, with high performance text generation. And, Google Vertex AI is a Machine Learning (ML) platform that lets you train and deploy ML models and AI applications, and customize large language models (LLMs) for use in your AI-powered applications. This example showcases how to deploy any supported text-generation model, in this case [`google/gemma-7b-it`](https://huggingface.co/google/gemma-7b-it), downloaded from the Hugging Face Hub and uploaded to a Google Cloud Storage (GCS) Bucket, on Vertex AI using the Hugging Face DLC for TGI available in Google Cloud Platform (GCP)." + "# Deploy Gemma 7B with TGI DLC from GCS on Vertex AI" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models, developed by Google DeepMind and other teams across Google. Text Generation Inference (TGI) is a toolkit developed by Hugging Face for deploying and serving LLMs, with high performance text generation. 
And, Google Vertex AI is a Machine Learning (ML) platform that lets you train and deploy ML models and AI applications, and customize large language models (LLMs) for use in your AI-powered applications.\n", + "\n", + "This example showcases how to deploy any supported text-generation model, in this case [`google/gemma-7b-it`](https://huggingface.co/google/gemma-7b-it), downloaded from the Hugging Face Hub and uploaded to a Google Cloud Storage (GCS) Bucket, on Vertex AI using the Hugging Face DLC for TGI available in Google Cloud Platform (GCP)." ] }, { diff --git a/examples/vertex-ai/notebooks/deploy-gemma-on-vertex-ai/vertex-notebook.ipynb b/examples/vertex-ai/notebooks/deploy-gemma-on-vertex-ai/vertex-notebook.ipynb index 184380f3..e496fb62 100644 --- a/examples/vertex-ai/notebooks/deploy-gemma-on-vertex-ai/vertex-notebook.ipynb +++ b/examples/vertex-ai/notebooks/deploy-gemma-on-vertex-ai/vertex-notebook.ipynb @@ -4,14 +4,26 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Deploy Gemma 7B with TGI on Vertex AI" + "<!-- ---\n", + "title: Deploy Gemma 7B with TGI DLC on Vertex AI\n", + "type: inference\n", + "--- -->" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models, developed by Google DeepMind and other teams across Google. Text Generation Inference (TGI) is a toolkit developed by Hugging Face for deploying and serving LLMs, with high performance text generation. And, Google Vertex AI is a Machine Learning (ML) platform that lets you train and deploy ML models and AI applications, and customize large language models (LLMs) for use in your AI-powered applications. This example showcases how to deploy any supported text-generation model, in this case [`google/gemma-7b-it`](https://huggingface.co/google/gemma-7b-it), from the Hugging Face Hub on Vertex AI using the TGI DLC available in Google Cloud Platform (GCP)." + "# Deploy Gemma 7B with TGI DLC on Vertex AI" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models, developed by Google DeepMind and other teams across Google. Text Generation Inference (TGI) is a toolkit developed by Hugging Face for deploying and serving LLMs, with high performance text generation. And, Google Vertex AI is a Machine Learning (ML) platform that lets you train and deploy ML models and AI applications, and customize large language models (LLMs) for use in your AI-powered applications.\n", + "\n", + "This example showcases how to deploy any supported text-generation model, in this case [`google/gemma-7b-it`](https://huggingface.co/google/gemma-7b-it), from the Hugging Face Hub on Vertex AI using the TGI DLC available in Google Cloud Platform (GCP)." 
] }, { diff --git a/examples/vertex-ai/notebooks/deploy-llama-3-1-405b-on-vertex-ai/vertex-notebook.ipynb b/examples/vertex-ai/notebooks/deploy-llama-3-1-405b-on-vertex-ai/vertex-notebook.ipynb index 9c8f267e..a8cd7502 100644 --- a/examples/vertex-ai/notebooks/deploy-llama-3-1-405b-on-vertex-ai/vertex-notebook.ipynb +++ b/examples/vertex-ai/notebooks/deploy-llama-3-1-405b-on-vertex-ai/vertex-notebook.ipynb @@ -1,11 +1,22 @@ { "cells": [ + { + "cell_type": "markdown", + "id": "d7c432cf-dc16-4bd8-89bd-7c1c0eb58d37", + "metadata": {}, + "source": [ + "<!-- ---\n", + "title: Deploy Meta Llama 3.1 405B with TGI DLC on Vertex AI\n", + "type: inference\n", + "--- -->" + ] + }, { "cell_type": "markdown", "id": "e4e7faed-c34a-4f01-84ec-eefbfb65506d", "metadata": {}, "source": [ - "# Deploy Meta Llama 3.1 405B on Vertex AI with 🤗 Hugging Face DLCs" + "# Deploy Meta Llama 3.1 405B with TGI DLC on Vertex AI" ] }, { @@ -499,7 +510,7 @@ "source": [ "If the Vertex AI Endpoint was deployed in a different session and you want to use it but don't have access to the `deployed_model` variable returned by the `aiplatform.Model.deploy` method as in the previous section; you can also run the following snippet to instantiate the deployed `aiplatform.Endpoint` via its resource name as `projects/{PROJECT_ID}/locations/{LOCATION}/endpoints/{ENDPOINT_ID}`.\n", "\n", - "> Note that you will need to either retrieve the resource name i.e. the `projects/{PROJECT_ID}/locations/{LOCATION}/endpoints/{ENDPOINT_ID}` URL youself via the Google Cloud Console, or just replace the `ENDPOINT_ID` below that can either be found via the previously instantiated `endpoint` as `endpoint.id` or via the Google Cloud Console under the Online predictions where the endpoint is listed." + "> Note that you will need to either retrieve the resource name i.e. the `projects/{PROJECT_ID}/locations/{LOCATION}/endpoints/{ENDPOINT_ID}` URL yourself via the Google Cloud Console, or just replace the `ENDPOINT_ID` below that can either be found via the previously instantiated `endpoint` as `endpoint.id` or via the Google Cloud Console under the Online predictions where the endpoint is listed." ] }, { diff --git a/examples/vertex-ai/notebooks/trl-full-sft-fine-tuning-on-vertex-ai/vertex-notebook.ipynb b/examples/vertex-ai/notebooks/trl-full-sft-fine-tuning-on-vertex-ai/vertex-notebook.ipynb index d46087ed..597878f3 100644 --- a/examples/vertex-ai/notebooks/trl-full-sft-fine-tuning-on-vertex-ai/vertex-notebook.ipynb +++ b/examples/vertex-ai/notebooks/trl-full-sft-fine-tuning-on-vertex-ai/vertex-notebook.ipynb @@ -4,14 +4,26 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Fine-tune LLMs using SFT with TRL's CLI on Vertex AI" + "<!-- ---\n", + "title: Fine-tune Mistral 7B v0.3 with PyTorch Training DLC using SFT on Vertex AI\n", + "type: training\n", + "--- -->" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "[Transformer Reinforcement Learning (TRL)](https://github.com/huggingface/trl) is a framework developed by Hugging Face to fine-tune and align both transformer language and diffusion models using methods such as Supervised Fine-Tuning (SFT), Reward Modeling (RM), Proximal Policy Optimization (PPO), Direct Preference Optimization (DPO), and others. On the other hand, Vertex AI is a Machine Learning (ML) platform that lets you train and deploy ML models and AI applications, and customize large language models (LLMs) for use in your AI-powered applications. 
This example showcases how to create a custom training job on Vertex AI running the Hugging Face PyTorch DLC for training, using the TRL CLI to full fine-tune a 7B LLM with SFT in a multi-GPU setting." + "# Fine-tune Mistral 7B v0.3 with PyTorch Training DLC using SFT on Vertex AI" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[Transformer Reinforcement Learning (TRL)](https://github.com/huggingface/trl) is a framework developed by Hugging Face to fine-tune and align both transformer language and diffusion models using methods such as Supervised Fine-Tuning (SFT), Reward Modeling (RM), Proximal Policy Optimization (PPO), Direct Preference Optimization (DPO), and others. On the other hand, Vertex AI is a Machine Learning (ML) platform that lets you train and deploy ML models and AI applications, and customize large language models (LLMs) for use in your AI-powered applications.\n", + "\n", + "This example showcases how to create a custom training job on Vertex AI running the Hugging Face PyTorch DLC for training, using the TRL CLI to full fine-tune a 7B LLM with SFT in a multi-GPU setting." ] }, { diff --git a/examples/vertex-ai/notebooks/trl-lora-sft-fine-tuning-on-vertex-ai/vertex-notebook.ipynb b/examples/vertex-ai/notebooks/trl-lora-sft-fine-tuning-on-vertex-ai/vertex-notebook.ipynb index 4c951131..1288fbfb 100644 --- a/examples/vertex-ai/notebooks/trl-lora-sft-fine-tuning-on-vertex-ai/vertex-notebook.ipynb +++ b/examples/vertex-ai/notebooks/trl-lora-sft-fine-tuning-on-vertex-ai/vertex-notebook.ipynb @@ -4,14 +4,26 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Fine-tune LLMs using SFT + LoRA with TRL's CLI on Vertex AI" + "<!-- ---\n", + "title: Fine-tune Gemma 2B with PyTorch Training DLC using SFT + LoRA on Vertex AI\n", + "type: training\n", + "--- -->" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "[Transformer Reinforcement Learning (TRL)](https://github.com/huggingface/trl) is a framework developed by Hugging Face to fine-tune and align both transformer language and diffusion models using methods such as Supervised Fine-Tuning (SFT), Reward Modeling (RM), Proximal Policy Optimization (PPO), Direct Preference Optimization (DPO), and others. On the other hand, Vertex AI is a Machine Learning (ML) platform that lets you train and deploy ML models and AI applications, and customize large language models (LLMs) for use in your AI-powered applications. This example showcases how to create a custom training job on Vertex AI running the Hugging Face PyTorch DLC for training, using the TRL CLI to fine-tune a 7B LLM with SFT + LoRA in a single GPU." + "# Fine-tune Gemma 2B with PyTorch Training DLC using SFT + LoRA on Vertex AI" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[Transformer Reinforcement Learning (TRL)](https://github.com/huggingface/trl) is a framework developed by Hugging Face to fine-tune and align both transformer language and diffusion models using methods such as Supervised Fine-Tuning (SFT), Reward Modeling (RM), Proximal Policy Optimization (PPO), Direct Preference Optimization (DPO), and others. 
On the other hand, Vertex AI is a Machine Learning (ML) platform that lets you train and deploy ML models and AI applications, and customize large language models (LLMs) for use in your AI-powered applications.\n", + "\n", + "This example showcases how to create a custom training job on Vertex AI running the Hugging Face PyTorch DLC for training, using the TRL CLI to fine-tune a 7B LLM with SFT + LoRA on a single GPU." ] }, {
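Both TRL fine-tuning notebooks ultimately launch a Vertex AI custom training job that runs the Hugging Face PyTorch DLC for training with a TRL CLI command. The sketch below shows what such a launch roughly looks like with the `google-cloud-aiplatform` SDK; the DLC image URI, model, dataset, LoRA hyperparameters, and machine configuration are placeholder assumptions rather than the notebooks' actual values.

```python
# Rough sketch of launching a TRL SFT + LoRA custom training job; every value below
# (image URI, dataset, hyperparameters, machine type) is a placeholder assumption.
from google.cloud import aiplatform

aiplatform.init(
    project="your-project-id",
    location="us-central1",
    staging_bucket="gs://your-staging-bucket",
)

# The container entrypoint runs the TRL CLI inside the Hugging Face PyTorch Training DLC.
job = aiplatform.CustomContainerTrainingJob(
    display_name="trl-lora-sft-fine-tuning",
    container_uri="<huggingface-pytorch-training-dlc-uri>",  # placeholder DLC URI
    command=[
        "sh",
        "-c",
        "trl sft "
        "--model_name_or_path=mistralai/Mistral-7B-v0.3 "
        "--dataset_name=timdettmers/openassistant-guanaco "
        "--use_peft --lora_r=16 --lora_alpha=32 "
        "--output_dir=/gcs/your-bucket/Mistral-7B-v0.3-LoRA-SFT",
    ],
)

# Run the job on a single GPU instance; the Hub token is passed so gated models can be pulled.
job.run(
    replica_count=1,
    machine_type="g2-standard-12",
    accelerator_type="NVIDIA_L4",
    accelerator_count=1,
    environment_variables={"HF_TOKEN": "<your-hf-token>"},
)
```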