Add documentation to be hosted in hf.co/docs #67

Merged · 11 commits · Sep 5, 2024
21 changes: 21 additions & 0 deletions .github/workflows/doc-build.yml
@@ -0,0 +1,21 @@
name: Build Documentation

on:
  push:
    branches:
      - main
      - doc-builder*
    paths:
      - docs/source/**
      - .github/workflows/doc-build.yml

jobs:
  build:
    uses: huggingface/doc-builder/.github/workflows/build_main_documentation.yml@main
    with:
      commit_sha: ${{ github.sha }}
      package: Google-Cloud-Containers
      additional_args: --not_python_module
    secrets:
      token: ${{ secrets.HUGGINGFACE_PUSH }}
      hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
20 changes: 20 additions & 0 deletions .github/workflows/doc-pr-build.yml
@@ -0,0 +1,20 @@
name: Build PR Documentation

on:
  pull_request:
    paths:
      - docs/source/**
      - .github/workflows/doc-pr-build.yml

concurrency:
  group: ${{ github.workflow }}-${{ github.head_ref || github.run_id }}
  cancel-in-progress: true

jobs:
  build:
    uses: huggingface/doc-builder/.github/workflows/build_pr_documentation.yml@main
    with:
      commit_sha: ${{ github.event.pull_request.head.sha }}
      pr_number: ${{ github.event.number }}
      package: Google-Cloud-Containers
      additional_args: --not_python_module
16 changes: 16 additions & 0 deletions .github/workflows/doc-pr-upload.yml
@@ -0,0 +1,16 @@
name: Upload PR Documentation

on:
  workflow_run:
    workflows: ["Build PR Documentation"]
    types:
      - completed

jobs:
  build:
    uses: huggingface/doc-builder/.github/workflows/upload_pr_documentation.yml@main
    with:
      package_name: Google-Cloud-Containers
    secrets:
      hf_token: ${{ secrets.HF_DOC_BUILD_PUSH }}
      comment_bot_token: ${{ secrets.COMMENT_BOT_TOKEN }}
4 changes: 4 additions & 0 deletions docs/source/_toctree.yml
@@ -0,0 +1,4 @@
- sections:
  - local: index
    title: 🤗 DLCs for Google Cloud
  title: Getting Started
79 changes: 79 additions & 0 deletions docs/source/index.mdx
@@ -0,0 +1,79 @@
# Hugging Face on Google Cloud

![Hugging Face x Google Cloud](https://raw.githubusercontent.com/huggingface/blog/main/assets/173_gcp-partnership/thumbnail.jpg)
Hugging Face collaborates with Google across open science, open source, cloud, and hardware to enable companies to build their own AI with the latest open models from Hugging Face and the latest cloud and hardware features from Google Cloud.

Hugging Face enables new experiences for Google Cloud customers. They can easily train and deploy Hugging Face models within Google Kubernetes Engine (GKE) and Vertex AI, on any hardware available in Google Cloud using Hugging Face Deep Learning Containers.

## Train and Deploy Models on Google Cloud with Hugging Face Deep Learning Containers

Hugging Face built Deep Learning Containers (DLCs) for Google Cloud customers to run any of their machine learning workloads in an optimized environment, with no configuration or maintenance on their part. These are Docker images pre-installed with deep learning frameworks and libraries such as 🤗 Transformers, 🤗 Datasets, and 🤗 Tokenizers. The DLCs allow you to serve and train models directly, skipping the complicated process of building and optimizing serving and training environments from scratch.

For training, our DLCs are available for PyTorch via 🤗 Transformers, with support for training on both GPUs and TPUs using libraries such as 🤗 TRL, Sentence Transformers, or 🧨 Diffusers.
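As a concrete sketch of what this looks like (the image tag, machine type, and script arguments below are illustrative assumptions, not the exact published values), a training DLC can be launched on Vertex AI as a custom job via `gcloud ai custom-jobs create --region=us-central1 --display-name=trl-sft --config=job.yaml`, with a `job.yaml` along these lines:

```yaml
# job.yaml: hypothetical Vertex AI CustomJob spec running a TRL SFT fine-tuning
# inside a Hugging Face PyTorch training DLC. Look up the exact image tag in
# the Google Cloud Artifact Registry; the one below is a placeholder.
workerPoolSpecs:
  - machineSpec:
      machineType: g2-standard-12
      acceleratorType: NVIDIA_L4
      acceleratorCount: 1
    replicaCount: 1
    containerSpec:
      imageUri: us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-pytorch-training-cu121  # placeholder tag
      command: ["trl", "sft"]  # TRL's CLI entrypoint
      args:
        - --model_name_or_path=mistralai/Mistral-7B-v0.3
        - --dataset_name=timdettmers/openassistant-guanaco
        - --output_dir=/tmp/output
```

The end-to-end notebooks listed at the bottom of this page show the exact images and arguments for each scenario.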

For inference, we have a general-purpose PyTorch inference DLC for serving models trained with any of the frameworks mentioned above, on both CPU and GPU. There is also the Text Generation Inference (TGI) DLC for high-performance text generation of LLMs on both GPU and TPU, and a Text Embeddings Inference (TEI) DLC for high-performance serving of embedding models on both CPU and GPU.

The DLCs are hosted in [Google Cloud Artifact Registry](https://console.cloud.google.com/artifacts/docker/deeplearning-platform-release/us/gcr.io) and can be used from any Google Cloud service such as Google Kubernetes Engine (GKE), Vertex AI, or Google Compute Engine.
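For example, a minimal Kubernetes manifest along the following lines deploys the TGI DLC on a GPU node pool in GKE; treat the image tag, model, port, and resource requests as illustrative assumptions, and check the Artifact Registry and the linked examples for the exact values:

```yaml
# Hypothetical GKE Deployment serving an LLM with the TGI DLC.
# The image tag and MODEL_ID below are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tgi-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tgi-server
  template:
    metadata:
      labels:
        app: tgi-server
    spec:
      containers:
        - name: tgi
          image: us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu121  # placeholder tag
          env:
            - name: MODEL_ID
              value: google/gemma-7b-it  # any Hub model supported by TGI
          ports:
            - containerPort: 8080  # port the DLC is assumed to listen on
          resources:
            limits:
              nvidia.com/gpu: 1  # requires a GKE node pool with GPUs attached
```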

Hugging Face DLCs are open source and licensed under Apache 2.0 within the [Google-Cloud-Containers](https://github.com/huggingface/Google-Cloud-Containers) repository. For premium support, our [Expert Support Program](https://huggingface.co/support) gives you direct dedicated support from our team.

You have two options to take advantage of these DLCs as a Google Cloud customer:

1. To [get started](https://huggingface.co/blog/google-cloud-model-garden), you can use our no-code integrations within Vertex AI or GKE.
2. For more advanced scenarios, you can pull the containers from the Google Cloud Artifact Registry directly in your environment. [Here](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples) is a list of example notebooks.

## Features & benefits 🔥

The Hugging Face DLCs provide ready-to-use, tested environments to train and deploy Hugging Face models. They can be used in combination with Google Cloud offerings including Google Kubernetes Engine (GKE) and Vertex AI. GKE is a fully managed Kubernetes service in Google Cloud that can be used to deploy and operate containerized applications at scale using Google Cloud's infrastructure. Vertex AI is a Machine Learning (ML) platform that lets you train and deploy ML models and AI applications, and customize Large Language Models (LLMs).

Using Hugging Face DLCs in GKE and Vertex AI offers the following advantages:

- **Integrated Deployment and Management**: Vertex AI and GKE integrate seamlessly for deploying and managing machine learning models, allowing users to deploy models from Vertex AI onto GKE, leveraging Kubernetes' orchestration capabilities while benefiting from Vertex AI's model management features. This integration simplifies the deployment process and enables a unified approach to managing AI workloads on Google Cloud. Training and inference can also happen within Vertex AI or GKE separately: both support model training, natively or via a Kubernetes job respectively, as well as running an endpoint that serves a model from a container.

- **Security and Compliance**: Both Vertex AI and GKE are part of Google Cloud's robust infrastructure, which offers comprehensive security features such as data encryption, identity management, and compliance with industry standards. This ensures that AI models and containerized applications are secure and meet regulatory requirements.

- **Scalability and Flexibility**: The combination of Vertex AI and GKE allows for efficient scaling of AI models. GKE provides the infrastructure to handle large-scale deployments, while Vertex AI simplifies the scaling of machine learning workflows, offering flexible resource allocation and management. This is particularly useful for handling demanding training and inference scenarios.

- **Ease of Use and Automation**: Using Vertex AI with GKE enables automated deployments and management of AI models, reducing the need for manual infrastructure management. This integration allows for quick deployment of models at scale, simplifying the process and reducing operational overhead.

- **Optimized Performance**: Both platforms are optimized for performance, with GKE offering efficient resource management and Vertex AI providing tools for model tuning and monitoring. This ensures that AI applications run smoothly and efficiently, leveraging Google Cloud's infrastructure capabilities. Hugging Face also offers Deep Learning Containers tuned for Cloud TPU to improve cost performance.

For more detailed information, you can refer to the official Google Cloud documentation on Vertex AI and Google Kubernetes Engine.

---

## Resources, Documentation & Examples 📄

Learn how to use Hugging Face in Google Cloud by reading our blog posts, documentation, and examples below.

### Blog posts

- [Hugging Face and Google partner for open AI collaboration](https://huggingface.co/blog/gcp-partnership)
- [Google Cloud TPUs made available to Hugging Face users](https://huggingface.co/blog/tpu-inference-endpoints-spaces)
- [Making thousands of open LLMs bloom in the Vertex AI Model Garden](https://huggingface.co/blog/google-cloud-model-garden)

### Documentation

- [Serve Gemma open models using GPUs on GKE with Hugging Face TGI](https://cloud.google.com/kubernetes-engine/docs/tutorials/serve-gemma-gpu-tgi)
- [Generative AI on Vertex - Use Hugging Face text generation models](https://cloud.google.com/vertex-ai/generative-ai/docs/open-models/use-hugging-face-models)

### Examples

- [All examples](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples)

#### GKE

- [Full SFT fine-tuning of Gemma 2B in a multi-GPU instance with TRL in GKE](https://github.com/huggingface/Google-Cloud-Containers/blob/main/examples/gke/trl-full-fine-tuning)
- [LoRA SFT fine-tuning of Mistral 7B v0.3 in a single GPU instance with TRL in GKE](https://github.com/huggingface/Google-Cloud-Containers/blob/main/examples/gke/trl-lora-fine-tuning)
- [Deploying Llama 3 8B with Text Generation Inference (TGI) in GKE](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/gke/tgi-deployment)
- [Deploying Qwen2 7B Instruct with Text Generation Inference (TGI) from a GCS Bucket in GKE](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/gke/tgi-from-gcs-deployment)
- [Deploying Snowflake's Arctic Embed (M) with Text Embeddings Inference (TEI) in GKE](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/gke/tei-deployment)
- [Deploying BGE Base v1.5 (English) with Text Embeddings Inference (TEI) from a GCS Bucket in GKE](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/gke/tei-from-gcs-deployment)

#### Vertex AI

- [Full SFT fine-tuning of Mistral 7B v0.3 in a multi-GPU instance with TRL on Vertex AI](https://github.com/huggingface/Google-Cloud-Containers/blob/main/examples/vertex-ai/notebooks/trl-full-sft-fine-tuning-on-vertex-ai)
- [LoRA SFT fine-tuning of Mistral 7B v0.3 in a single GPU instance with TRL on Vertex AI](https://github.com/huggingface/Google-Cloud-Containers/blob/main/examples/vertex-ai/notebooks/trl-lora-sft-fine-tuning-on-vertex-ai)
- [Deploying a BERT model for a text classification task using huggingface-inference-toolkit for a Custom Prediction Routine (CPR) on Vertex AI](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/vertex-ai/notebooks/deploy-bert-on-vertex-ai)
- [Deploying an embedding model with Text Embeddings Inference (TEI) on Vertex AI](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/vertex-ai/notebooks/deploy-embedding-on-vertex-ai)
- [Deploying Gemma 7B Instruct with Text Generation Inference (TGI) on Vertex AI](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/vertex-ai/notebooks/deploy-gemma-on-vertex-ai)
- [Deploying Gemma 7B Instruct with Text Generation Inference (TGI) from a GCS Bucket on Vertex AI](https://github.com/huggingface/Google-Cloud-Containers/tree/main/examples/vertex-ai/notebooks/deploy-gemma-from-gcs-on-vertex-ai)