Update examples/gke to include paths to Kubernetes files (#112)
alvarobartt authored Oct 16, 2024
1 parent f042e54 commit 26de5cf
Showing 9 changed files with 35 additions and 43 deletions.
8 changes: 4 additions & 4 deletions examples/gke/tei-deployment/README.md
@@ -98,11 +98,11 @@ Now you can proceed to the Kubernetes deployment of the Hugging Face DLC for TEI
> [!NOTE]
> Recently, the Hugging Face Hub team has included the `text-embeddings-inference` tag in the Hub, so feel free to explore all the embedding models in the Hub that can be served via TEI at <https://huggingface.co/models?other=text-embeddings-inference>.
The Hugging Face DLC for TEI will be deployed via `kubectl`, from the configuration files in either the `cpu-config/` or the `gpu-config/` directories depending on whether you want to use the CPU or GPU accelerators, respectively:
The Hugging Face DLC for TEI will be deployed via `kubectl`, from the configuration files in either the [`cpu-config/`](./cpu-config/) or the [`gpu-config/`](./gpu-config/) directories depending on whether you want to use the CPU or GPU accelerators, respectively:

- `deployment.yaml`: contains the deployment details of the pod including the reference to the Hugging Face DLC for TEI setting the `MODEL_ID` to [`Snowflake/snowflake-arctic-embed-m`](https://huggingface.co/Snowflake/snowflake-arctic-embed-m).
- `service.yaml`: contains the service details of the pod, exposing the port 8080 for the TEI service.
- (optional) `ingress.yaml`: contains the ingress details of the pod, exposing the service to the external world so that it can be accessed via the ingress IP.
- [`deployment.yaml`](./cpu-config/deployment.yaml): contains the deployment details of the pod including the reference to the Hugging Face DLC for TEI setting the `MODEL_ID` to [`Snowflake/snowflake-arctic-embed-m`](https://huggingface.co/Snowflake/snowflake-arctic-embed-m).
- [`service.yaml`](./cpu-config/service.yaml): contains the service details of the pod, exposing the port 8080 for the TEI service.
- (optional) [`ingress.yaml`](./cpu-config/ingress.yaml): contains the ingress details of the pod, exposing the service to the external world so that it can be accessed via the ingress IP.

```bash
kubectl apply -f cpu-config/
```
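Once applied, a quick way to check that the embedding service is up is to port-forward the Service locally and call TEI's `/embed` route. The snippet below is a minimal sketch: the `tei-deployment` and `tei-service` names and the 8080 port are assumptions based on the manifests described above, so adjust them to whatever your `deployment.yaml` and `service.yaml` define.

```bash
# Wait until the deployment reports it is available (deployment name is an assumption)
kubectl wait --for=condition=Available --timeout=600s deployment/tei-deployment

# Forward the Service port to localhost (Service name and port are assumptions)
kubectl port-forward service/tei-service 8080:8080 &

# Send a test request to TEI's /embed endpoint
curl http://localhost:8080/embed \
    -H "Content-Type: application/json" \
    -d '{"inputs": "What is Deep Learning?"}'
```
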
10 changes: 5 additions & 5 deletions examples/gke/tei-from-gcs-deployment/README.md
@@ -150,12 +150,12 @@ Now you can proceed to the Kubernetes deployment of the Hugging Face DLC for TEI
> [!NOTE]
> Recently, the Hugging Face Hub team has included the `text-embeddings-inference` tag in the Hub, so feel free to explore all the embedding models in the Hub that can be served via TEI at <https://huggingface.co/models?other=text-embeddings-inference>.
The Hugging Face DLC for TEI will be deployed via `kubectl`, from the configuration files in either the `cpu-config/` or the `gpu-config/` directories depending on whether you want to use the CPU or GPU accelerators, respectively:
The Hugging Face DLC for TEI will be deployed via `kubectl`, from the configuration files in either the [`cpu-config/`](./cpu-config/) or the [`gpu-config/`](./gpu-config/) directories depending on whether you want to use the CPU or GPU accelerators, respectively:

- `deployment.yaml`: contains the deployment details of the pod including the reference to the Hugging Face DLC for TEI setting the `MODEL_ID` to the model path in the volume mount, in this case `/data/bge-base-en-v1.5`.
- `service.yaml`: contains the service details of the pod, exposing the port 80 for the TEI service.
- `storageclass.yaml`: contains the storage class details of the pod, defining the storage class for the volume mount.
- (optional) `ingress.yaml`: contains the ingress details of the pod, exposing the service to the external world so that it can be accessed via the ingress IP.
- [`deployment.yaml`](./cpu-config/deployment.yaml): contains the deployment details of the pod including the reference to the Hugging Face DLC for TEI setting the `MODEL_ID` to the model path in the volume mount, in this case `/data/bge-base-en-v1.5`.
- [`service.yaml`](./cpu-config/service.yaml): contains the service details of the pod, exposing the port 8080 for the TEI service.
- [`storageclass.yaml`](./cpu-config/storageclass.yaml): contains the storage class details of the pod, defining the storage class for the volume mount.
- (optional) [`ingress.yaml`](./cpu-config/ingress.yaml): contains the ingress details of the pod, exposing the service to the external world so that it can be accessed via the ingress IP.

```bash
kubectl apply -f cpu-config/
```
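Since this example serves the model from a GCS-backed volume, it helps to confirm that the storage objects were created and that the pod mounted the volume before expecting the service to answer. A rough sketch with `kubectl` (the label selector is an assumption; use whatever labels your `deployment.yaml` sets):

```bash
# Confirm the storage class defined in storageclass.yaml exists
kubectl get storageclass

# Check that any PersistentVolumeClaim used by the deployment is Bound
kubectl get pvc

# Inspect pod events and volume mounts (label selector is an assumption)
kubectl describe pod -l app=tei-deployment
```
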
9 changes: 4 additions & 5 deletions examples/gke/tgi-deployment/README.md
@@ -131,12 +131,11 @@ Now you can proceed to the Kubernetes deployment of the Hugging Face DLC for TGI
> [!NOTE]
> To explore all the models that can be served via TGI, you can explore the models tagged with `text-generation-inference` in the Hub at <https://huggingface.co/models?other=text-generation-inference>.
The Hugging Face DLC for TGI will be deployed via `kubectl`, from the configuration files in the `config/` directory:
The Hugging Face DLC for TGI will be deployed via `kubectl`, from the configuration files in the [`config/`](./config/) directory:

- `deployment.yaml`: contains the deployment details of the pod including the reference to the Hugging Face DLC for TGI setting the `MODEL_ID` to [`meta-llama/Meta-Llama-3.1-8B-Instruct`](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct).

- `service.yaml`: contains the service details of the pod, exposing the port 80 for the TGI service.
- (optional) `ingress.yaml`: contains the ingress details of the pod, exposing the service to the external world so that it can be accessed via the ingress IP.
- [`deployment.yaml`](./config/deployment.yaml): contains the deployment details of the pod including the reference to the Hugging Face DLC for TGI setting the `MODEL_ID` to [`meta-llama/Meta-Llama-3.1-8B-Instruct`](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct).
- [`service.yaml`](./config/service.yaml): contains the service details of the pod, exposing the port 8080 for the TGI service.
- (optional) [`ingress.yaml`](./config/ingress.yaml): contains the ingress details of the pod, exposing the service to the external world so that it can be accessed via the ingress IP.

```bash
kubectl apply -f config/
```
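After the resources are applied and the deployment is ready, you can smoke-test the TGI service by port-forwarding it and calling the `/generate` route. This is a sketch only: the `tgi-service` name and the 8080 port are assumptions, so match them to your `service.yaml`.

```bash
# Forward the TGI Service locally (Service name and port are assumptions)
kubectl port-forward service/tgi-service 8080:8080 &

# Send a short generation request to TGI's /generate endpoint
curl http://localhost:8080/generate \
    -H "Content-Type: application/json" \
    -d '{"inputs": "What is Kubernetes?", "parameters": {"max_new_tokens": 128}}'
```
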
10 changes: 5 additions & 5 deletions examples/gke/tgi-from-gcs-deployment/README.md
@@ -147,12 +147,12 @@ Now you can proceed to the Kubernetes deployment of the Hugging Face DLC for TGI
> [!NOTE]
> To explore all the models that can be served via TGI, you can explore the models tagged with `text-generation-inference` in the Hub at <https://huggingface.co/models?other=text-generation-inference>.
The Hugging Face DLC for TGI will be deployed via `kubectl`, from the configuration files in the `config/` directory:
The Hugging Face DLC for TGI will be deployed via `kubectl`, from the configuration files in the [`config/`](./config/) directory:

- `deployment.yaml`: contains the deployment details of the pod including the reference to the Hugging Face DLC for TGI setting the `MODEL_ID` to the model path in the volume mount, in this case `/data/Qwen2-7B-Instruct`.
- `service.yaml`: contains the service details of the pod, exposing the port 80 for the TEI service.
- `storageclass.yaml`: contains the storage class details of the pod, defining the storage class for the volume mount.
- (optional) `ingress.yaml`: contains the ingress details of the pod, exposing the service to the external world so that it can be accessed via the ingress IP.
- [`deployment.yaml`](./config/deployment.yaml): contains the deployment details of the pod including the reference to the Hugging Face DLC for TGI setting the `MODEL_ID` to the model path in the volume mount, in this case `/data/Qwen2-7B-Instruct`.
- [`service.yaml`](./config/service.yaml): contains the service details of the pod, exposing the port 8080 for the TGI service.
- [`storageclass.yaml`](./config/storageclass.yaml): contains the storage class details of the pod, defining the storage class for the volume mount.
- (optional) [`ingress.yaml`](./config/ingress.yaml): contains the ingress details of the pod, exposing the service to the external world so that it can be accessed via the ingress IP.

```bash
kubectl apply -f config/
```
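Because the weights are read from the volume mount instead of being downloaded from the Hub, the startup logs are the quickest way to confirm that TGI found the model under `/data/Qwen2-7B-Instruct`. A minimal sketch (the deployment name is an assumption; use the one set in `deployment.yaml`):

```bash
# Wait for the rollout to complete (deployment name is an assumption)
kubectl rollout status deployment/tgi-deployment

# Stream the container logs and look for the model being loaded from /data
kubectl logs -f deployment/tgi-deployment
```
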
12 changes: 5 additions & 7 deletions examples/gke/tgi-llama-405b-deployment/README.md
@@ -141,13 +141,11 @@ Now you can proceed to the Kubernetes deployment of the Hugging Face DLC for TGI
> [!NOTE]
> To explore all the models that can be served via TGI, you can explore [the models tagged with `text-generation-inference` in the Hub](https://huggingface.co/models?other=text-generation-inference).
The Hugging Face DLC for TGI will be deployed via `kubectl`, from the configuration files in the `config/` directory:
The Hugging Face DLC for TGI will be deployed via `kubectl`, from the configuration files in the [`config/`](./config/) directory:

- `deployment.yaml`: contains the deployment details of the pod including the reference to the Hugging Face DLC for TGI setting the `MODEL_ID` to [`meta-llama/Llama-3.1-405B-Instruct-FP8`](https://hf.co/meta-llama/Llama-3.1-405B-Instruct-FP8).

- `service.yaml`: contains the service details of the pod, exposing the port 8080 for the TGI service.

- (optional) `ingress.yaml`: contains the ingress details of the pod, exposing the service to the external world so that it can be accessed via the ingress IP.
- [`deployment.yaml`](./config/deployment.yaml): contains the deployment details of the pod including the reference to the Hugging Face DLC for TGI setting the `MODEL_ID` to [`meta-llama/Llama-3.1-405B-Instruct-FP8`](https://hf.co/meta-llama/Llama-3.1-405B-Instruct-FP8).
- [`service.yaml`](./config/service.yaml): contains the service details of the pod, exposing the port 8080 for the TGI service.
- (optional) [`ingress.yaml`](./config/ingress.yaml): contains the ingress details of the pod, exposing the service to the external world so that it can be accessed via the ingress IP.

```bash
kubectl apply -f config/
```

@@ -163,7 +161,7 @@ kubectl apply -f config/
> Alternatively, you can just wait for the deployment to be ready with the following command:
>
> ```bash
> kubectl wait --for=condition=Available --timeout=700s deployment/tei-deployment
> kubectl wait --for=condition=Available --timeout=700s deployment/tgi-deployment
> ```
![GKE Deployment in the GCP Console](./imgs/gke-deployment.png)
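Loading a 405B parameter model in FP8 takes a while and needs every requested GPU, so while the deployment comes up it is worth confirming that the pod was scheduled onto a node with the expected accelerators. A rough sketch (replace `<node-name>` with the node reported for the pod):

```bash
# Show which node the TGI pod landed on
kubectl get pods -o wide

# Inspect that node's GPU capacity and current allocations
kubectl describe node <node-name> | grep -A 5 "nvidia.com/gpu"

# Follow the startup logs to watch the model shards being loaded
kubectl logs -f deployment/tgi-deployment
```
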
10 changes: 4 additions & 6 deletions examples/gke/tgi-llama-vision-deployment/README.md
@@ -136,13 +136,11 @@ Now you can proceed to the Kubernetes deployment of the Hugging Face DLC for TGI
> [!NOTE]
> To explore all the models that can be served via TGI, you can explore [the models tagged with `text-generation-inference` in the Hub](https://huggingface.co/models?other=text-generation-inference); specifically, if you are interested in Vision Language Models (VLMs) you can explore [the models tagged with both `text-generation-inference` and `image-text-to-text` in the Hub](https://huggingface.co/models?pipeline_tag=image-text-to-text&other=text-generation-inference&sort=trending).
The Hugging Face DLC for TGI will be deployed via `kubectl`, from the configuration files in the `config/` directory:
The Hugging Face DLC for TGI will be deployed via `kubectl`, from the configuration files in the [`config/`](./config/) directory:

- `deployment.yaml`: contains the deployment details of the pod including the reference to the Hugging Face DLC for TGI setting the `MODEL_ID` to [`meta-llama/Llama-3.2-11B-Vision-Instruct`](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct). As the GKE Cluster was deployed in Autopilot mode, the specified resources i.e. 2 x L4s, will be automatically allocated; but if you used the Standard mode instead, you should make sure that your node pool has those GPUs available.

- `service.yaml`: contains the service details of the pod, exposing the port 8080 for the TGI service.

- (optional) `ingress.yaml`: contains the ingress details of the pod, exposing the service to the external world so that it can be accessed via the ingress IP.
- [`deployment.yaml`](./config/deployment.yaml): contains the deployment details of the pod including the reference to the Hugging Face DLC for TGI setting the `MODEL_ID` to [`meta-llama/Llama-3.2-11B-Vision-Instruct`](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct). As the GKE Cluster was deployed in Autopilot mode, the specified resources i.e. 2 x L4s, will be automatically allocated; but if you used the Standard mode instead, you should make sure that your node pool has those GPUs available.
- [`service.yaml`](./config/service.yaml): contains the service details of the pod, exposing the port 8080 for the TGI service.
- (optional) [`ingress.yaml`](./config/ingress.yaml): contains the ingress details of the pod, exposing the service to the external world so that it can be accessed via the ingress IP.

```bash
kubectl apply -f config/
```
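Once the pod is running, you can sanity-check the vision model end to end. The sketch below assumes the Service is named `tgi-service` on port 8080 and that the TGI version in the DLC exposes the OpenAI-compatible `/v1/chat/completions` route with `image_url` content parts; verify both against your manifests and the TGI documentation for the image you deployed.

```bash
# Forward the Service locally (Service name and port are assumptions)
kubectl port-forward service/tgi-service 8080:8080 &

# Ask the VLM to describe an image; replace the placeholder URL with a publicly
# reachable image (the payload shape assumes an OpenAI-compatible Messages API)
curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "tgi",
          "messages": [
            {
              "role": "user",
              "content": [
                {"type": "image_url", "image_url": {"url": "https://example.com/your-image.png"}},
                {"type": "text", "text": "Describe this image in one sentence."}
              ]
            }
          ],
          "max_tokens": 128
        }'
```
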
11 changes: 4 additions & 7 deletions examples/gke/tgi-multi-lora-deployment/README.md
@@ -130,17 +130,14 @@ Now you can proceed to the Kubernetes deployment of the Hugging Face DLC for TGI
> [!NOTE]
> To explore all the models that can be served via TGI, you can explore [the models tagged with `text-generation-inference` in the Hub](https://huggingface.co/models?other=text-generation-inference).
The Hugging Face DLC for TGI will be deployed via `kubectl`, from the configuration files in the `config/` directory:

- `deployment.yaml`: contains the deployment details of the pod including the reference to the Hugging Face DLC for TGI setting the `MODEL_ID` to [`google/gemma-2-2b-it`](https://huggingface.co/google/gemma-2-2b-it), and the `LORA_ADAPTERS` to `google-cloud-partnership/gemma-2-2b-it-lora-magicoder,google-cloud-partnership/gemma-2-2b-it-lora-sql`, being the following adapters:
The Hugging Face DLC for TGI will be deployed via `kubectl`, from the configuration files in the [`config/`](./config/) directory:

- [`deployment.yaml`](./config/deployment.yaml): contains the deployment details of the pod including the reference to the Hugging Face DLC for TGI setting the `MODEL_ID` to [`google/gemma-2-2b-it`](https://huggingface.co/google/gemma-2-2b-it), and the `LORA_ADAPTERS` to `google-cloud-partnership/gemma-2-2b-it-lora-magicoder,google-cloud-partnership/gemma-2-2b-it-lora-sql`, being the following adapters:
- [`google-cloud-partnership/gemma-2-2b-it-lora-sql`](https://huggingface.co/google-cloud-partnership/gemma-2-2b-it-lora-sql): fine-tuned with [`gretelai/synthetic_text_to_sql`](https://huggingface.co/datasets/gretelai/synthetic_text_to_sql) to generate SQL queries with an explanation, given an SQL context and a prompt / question about it.
- [`google-cloud-partnership/gemma-2-2b-it-lora-magicoder`](https://huggingface.co/google-cloud-partnership/gemma-2-2b-it-lora-magicoder): fine-tuned with [`ise-uiuc/Magicoder-OSS-Instruct-75K`](https://huggingface.co/datasets/ise-uiuc/Magicoder-OSS-Instruct-75K) to generate code in diverse programming languages such as Python, Rust, or C, among many others; based on an input problem.
- [`google-cloud-partnership/gemma-2-2b-it-lora-jap-en`](https://huggingface.co/google-cloud-partnership/gemma-2-2b-it-lora-jap-en): fine-tuned with [`Jofthomas/japanese-english-translation`](https://huggingface.co/datasets/Jofthomas/japanese-english-translation), a synthetically generated dataset of short Japanese sentences translated to English; to translate English to Japanese and the other way around.

- `service.yaml`: contains the service details of the pod, exposing the port 8080 for the TGI service.

- (optional) `ingress.yaml`: contains the ingress details of the pod, exposing the service to the external world so that it can be accessed via the ingress IP.
- [`service.yaml`](./config/service.yaml): contains the service details of the pod, exposing the port 8080 for the TGI service.
- (optional) [`ingress.yaml`](./config/ingress.yaml): contains the ingress details of the pod, exposing the service to the external world so that it can be accessed via the ingress IP.

> [!WARNING]
> Note that the selected LoRA adapters are not intended to be used on production environments, as the fine-tuned adapters have not been tested extensively.
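Once the deployment is reachable (for instance through a `kubectl port-forward` of the Service on port 8080), each LoRA adapter can be selected per request. The sketch below assumes the TGI version in the DLC accepts an `adapter_id` field in the `/generate` parameters, as documented for TGI multi-LoRA serving; check this against the image you deployed.

```bash
# Forward the Service locally (Service name and port are assumptions)
kubectl port-forward service/tgi-service 8080:8080 &

# Route the request through the SQL adapter via the adapter_id parameter
curl http://localhost:8080/generate \
    -H "Content-Type: application/json" \
    -d '{
          "inputs": "Write a SQL query that counts the rows in the users table.",
          "parameters": {
            "adapter_id": "google-cloud-partnership/gemma-2-2b-it-lora-sql",
            "max_new_tokens": 128
          }
        }'
```
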
4 changes: 2 additions & 2 deletions examples/gke/trl-full-fine-tuning/README.md
@@ -163,7 +163,7 @@ Alternatively, if your model is uploaded to the Hugging Face Hub, you can check

## Run Job

Now you can already run the Kubernetes job in the Hugging Face PyTorch DLC for training on the GKE Cluster via `kubectl` from the `job.yaml` configuration file, that contains the job specification for running the command `trl sft` provided by the TRL CLI for the SFT full fine-tuning of [`google/gemma-2b`](https://huggingface.co/google/gemma-2b) in `bfloat16` using [`timdettmers/openassistant-guanaco`](https://huggingface.co/datasets/timdettmers/openassistant-guanaco), which is a subset from [`OpenAssistant/oasst1`](https://huggingface.co/datasets/OpenAssistant/oasst1) with ~10k samples in 4 x A100 40GiB GPUs, storing the generated artifacts into a volume mount under `/data` linked to a GCS Bucket.
Now you can already run the Kubernetes job in the Hugging Face PyTorch DLC for training on the GKE Cluster via `kubectl` from the [`job.yaml`](./job.yaml) configuration file, that contains the job specification for running the command `trl sft` provided by the TRL CLI for the SFT full fine-tuning of [`google/gemma-2b`](https://huggingface.co/google/gemma-2b) in `bfloat16` using [`timdettmers/openassistant-guanaco`](https://huggingface.co/datasets/timdettmers/openassistant-guanaco), which is a subset from [`OpenAssistant/oasst1`](https://huggingface.co/datasets/OpenAssistant/oasst1) with ~10k samples in 4 x A100 40GiB GPUs, storing the generated artifacts into a volume mount under `/data` linked to a GCS Bucket.
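For orientation, the command that `job.yaml` wraps is the `trl sft` CLI. The following is only an illustrative sketch of what such an invocation can look like; the flags and values here are assumptions, and the actual arguments live in [`job.yaml`](./job.yaml), which is the source of truth.

```bash
# Illustrative sketch of a `trl sft` full fine-tuning run; the real flags are defined in job.yaml
trl sft \
    --model_name_or_path google/gemma-2b \
    --dataset_name timdettmers/openassistant-guanaco \
    --torch_dtype bfloat16 \
    --per_device_train_batch_size 2 \
    --num_train_epochs 1 \
    --output_dir /data/gemma-2b-sft
```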

```bash
kubectl apply -f job.yaml
```

@@ -174,7 +174,7 @@ kubectl apply -f job.yaml
![GKE Job Running in the GCP Console](./imgs/gke-job-running.png)

> [!NOTE]
> In this case, since you are running a batch job, it will only use one node as specified within the `job.yaml` file, since you don't need anything else than that. So on, the job will deploy one pod running the `trl sft` command on top of the Hugging Face PyTorch DLC container for training, and also the GCS FUSE container that is mounting the GCS Bucket into the `/data` path so as to store the generated artifacts in GCS. Once the job is completed, it will automatically scale back to 0, meaning that it will not consume resources.
> In this case, since you are running a batch job, it will only use one node as specified within the [`job.yaml`](./job.yaml) file, since you don't need anything else than that. So on, the job will deploy one pod running the `trl sft` command on top of the Hugging Face PyTorch DLC container for training, and also the GCS FUSE container that is mounting the GCS Bucket into the `/data` path so as to store the generated artifacts in GCS. Once the job is completed, it will automatically scale back to 0, meaning that it will not consume resources.
Additionally, you can use `kubectl` to stream the logs of the job as it follows:
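A minimal sketch (the job name below is an assumption; use the `metadata.name` defined in [`job.yaml`](./job.yaml), and add `--container` with the training container's name if the GCS FUSE sidecar runs in the same pod):

```bash
# Stream the logs of the pod created by the job (job name is an assumption)
kubectl logs -f job/trl-full-sft
```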
