Add metadata to every example under examples
alvarobartt committed Sep 18, 2024
1 parent 6fc2c88 commit 0fc35ea
Showing 15 changed files with 113 additions and 0 deletions.
5 changes: 5 additions & 0 deletions examples/cloud-run/tgi-deployment/README.md
@@ -1,3 +1,8 @@
---
title: Deploy Meta Llama 3.1 8B with Text Generation Inference on Cloud Run
type: inference
---

# Deploy Meta Llama 3.1 8B with Text Generation Inference on Cloud Run

Meta Llama 3.1 is the latest open LLM from Meta, released in July 2024. It comes in three sizes: 8B for efficient deployment and development on consumer-grade GPUs, 70B for large-scale AI-native applications, and 405B for synthetic data generation, LLM-as-a-Judge, distillation, and other use cases. Text Generation Inference (TGI) is a toolkit developed by Hugging Face for deploying and serving LLMs with high-performance text generation. Google Cloud Run is a serverless container platform that lets developers deploy and manage containerized applications without managing infrastructure, with automatic scaling and billing only for actual usage. This example showcases how to deploy an LLM from the Hugging Face Hub, in this case the Meta Llama 3.1 8B Instruct model quantized to INT4 using AWQ, with the Hugging Face DLC for TGI on Google Cloud Run with GPU support (in preview).
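The metadata this commit adds is a YAML frontmatter block between `---` delimiters, with `title` and `type` fields as shown in the diff. A minimal, stdlib-only sketch of how such a block could be parsed into a dict (the parsing helper is an illustrative assumption, not tooling from this repository):

```python
# Illustrative sketch: extract the key/value metadata from a README that
# starts with a `---` ... `---` frontmatter block, as added in this commit.
# This helper is an assumption for illustration, not part of the repository.

def parse_frontmatter(text: str) -> dict:
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no frontmatter block at the top of the file
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter ends the block
            return meta
        key, _, value = line.partition(":")  # split on the first colon only
        meta[key.strip()] = value.strip()
    return {}  # opening delimiter without a closing one

readme = """---
title: Deploy Meta Llama 3.1 8B with Text Generation Inference on Cloud Run
type: inference
---

# Deploy Meta Llama 3.1 8B with Text Generation Inference on Cloud Run
"""
print(parse_frontmatter(readme))
```

Splitting on the first colon only matters here, since titles themselves may contain colons.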
5 changes: 5 additions & 0 deletions examples/gke/tei-deployment/README.md
@@ -1,3 +1,8 @@
---
title: Deploy Snowflake's Arctic Embed (M) with Text Embeddings Inference (TEI) on GKE
type: inference
---

# Deploy Snowflake's Arctic Embed (M) with Text Embeddings Inference (TEI) on GKE

Snowflake's Arctic Embed is a suite of text embedding models focused on high-quality retrieval, achieving state-of-the-art (SOTA) performance on the MTEB/BEIR leaderboard for each of its size variants. Text Embeddings Inference (TEI) is a toolkit developed by Hugging Face for deploying and serving open-source text embedding and sequence classification models, enabling high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE, and E5. Google Kubernetes Engine (GKE) is a fully managed Kubernetes service in Google Cloud that can be used to deploy and operate containerized applications at scale on GCP's infrastructure.
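Once a TEI deployment is reachable (for example after port-forwarding the GKE Service to localhost), embeddings are requested over plain HTTP via TEI's `/embed` route. A hedged sketch of the request shape; the URL and port are assumptions, and actually sending the request needs a live deployment:

```python
import json
from urllib import request

# TEI exposes a POST /embed route that takes a JSON body {"inputs": ...}.
# The endpoint below assumes `kubectl port-forward` to localhost:8080 and is
# an assumption, not a value from this repository.
TEI_URL = "http://localhost:8080/embed"

payload = {"inputs": "What is Deep Learning?"}
body = json.dumps(payload).encode("utf-8")
req = request.Request(
    TEI_URL,
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# Sending the request requires a running TEI service, so it is left commented;
# the response body is a JSON list of embedding vectors.
# with request.urlopen(req) as resp:
#     embeddings = json.loads(resp.read())
print(req.get_method(), req.full_url, body.decode())
```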
5 changes: 5 additions & 0 deletions examples/gke/tei-from-gcs-deployment/README.md
@@ -1,3 +1,8 @@
---
title: Deploy BGE Base v1.5 (English) with Text Embeddings Inference (TEI) from a GCS Bucket on GKE
type: inference
---

# Deploy BGE Base v1.5 (English) with Text Embeddings Inference (TEI) from a GCS Bucket on GKE

BGE, standing for BAAI General Embedding, is a collection of embedding models released by BAAI; BGE Base v1.5 (English) is an English base model for general embedding tasks, ranked on the MTEB Leaderboard. Text Embeddings Inference (TEI) is a toolkit developed by Hugging Face for deploying and serving open-source text embedding and sequence classification models, enabling high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE, and E5. Google Kubernetes Engine (GKE) is a fully managed Kubernetes service in Google Cloud that can be used to deploy and operate containerized applications at scale on GCP's infrastructure.
5 changes: 5 additions & 0 deletions examples/gke/tgi-deployment/README.md
@@ -1,3 +1,8 @@
---
title: Deploy Meta Llama 3 8B with Text Generation Inference (TGI) on GKE
type: inference
---

# Deploy Meta Llama 3 8B with Text Generation Inference (TGI) on GKE

Meta Llama 3 is the latest LLM from the Llama family, released by Meta, coming in two sizes, 8B and 70B, each with both a base and an instruction-tuned variant. Text Generation Inference (TGI) is a toolkit developed by Hugging Face for deploying and serving LLMs with high-performance text generation. Google Kubernetes Engine (GKE) is a fully managed Kubernetes service in Google Cloud that can be used to deploy and operate containerized applications at scale on GCP's infrastructure. This post explains how to deploy an LLM from the Hugging Face Hub, such as Meta Llama 3 8B Instruct, on a GKE Cluster running a purpose-built container to serve LLMs in a secure and managed environment with the Hugging Face DLC for TGI.
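After the TGI deployment is exposed (e.g. via port-forwarding the GKE Service), text generation is requested through TGI's `/generate` route. A hedged sketch of the request shape; the URL, port, and parameter values are assumptions, and the request itself needs a live deployment:

```python
import json
from urllib import request

# TGI serves a POST /generate route that takes a JSON body of the form
# {"inputs": ..., "parameters": {...}}. The endpoint below assumes a
# port-forwarded GKE Service on localhost:8080 and is an assumption.
TGI_URL = "http://localhost:8080/generate"

payload = {
    "inputs": "What is Kubernetes?",
    "parameters": {"max_new_tokens": 128, "temperature": 0.7},
}
req = request.Request(
    TGI_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# Actually sending the request needs a live TGI deployment, so it is left
# commented; the response is a JSON object with a "generated_text" field.
# with request.urlopen(req) as resp:
#     print(json.loads(resp.read())["generated_text"])
print(req.get_method(), req.full_url)
```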
5 changes: 5 additions & 0 deletions examples/gke/tgi-from-gcs-deployment/README.md
@@ -1,3 +1,8 @@
---
title: Deploy Qwen2 7B Instruct with Text Generation Inference (TGI) from a GCS Bucket on GKE
type: inference
---

# Deploy Qwen2 7B Instruct with Text Generation Inference (TGI) from a GCS Bucket on GKE

Qwen2 is the new series of Qwen Large Language Models (LLMs) built by Alibaba Cloud, with both base and instruction-tuned models ranging from 0.5 to 72 billion parameters, including a Mixture-of-Experts model; the 7B variant ranks second in its size range on Hugging Face's Open LLM Leaderboard, and the 72B variant ranks first across all sizes. Text Generation Inference (TGI) is a toolkit developed by Hugging Face for deploying and serving LLMs with high-performance text generation. Google Kubernetes Engine (GKE) is a fully managed Kubernetes service in Google Cloud that can be used to deploy and operate containerized applications at scale on GCP's infrastructure. This post explains how to deploy an LLM from a Google Cloud Storage (GCS) Bucket on a GKE Cluster running a purpose-built container to serve LLMs in a secure and managed environment with the Hugging Face DLC for TGI.
5 changes: 5 additions & 0 deletions examples/gke/trl-full-fine-tuning/README.md
@@ -1,3 +1,8 @@
---
title: Fine-tune Gemma 2B with TRL on GKE
type: training
---

# Fine-tune Gemma 2B with TRL on GKE

Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models, developed by Google DeepMind and other teams across Google. TRL is a full-stack library developed by Hugging Face to fine-tune and align Large Language Models (LLMs). Google Kubernetes Engine (GKE) is a fully managed Kubernetes service in Google Cloud that can be used to deploy and operate containerized applications at scale on GCP's infrastructure. This post explains how to fully fine-tune Gemma 2B with TRL via Supervised Fine-Tuning (SFT) in a multi-GPU setting on a GKE Cluster.
5 changes: 5 additions & 0 deletions examples/gke/trl-lora-fine-tuning/README.md
@@ -1,3 +1,8 @@
---
title: Fine-tune Mistral 7B v0.3 with TRL on GKE
type: training
---

# Fine-tune Mistral 7B v0.3 with TRL on GKE

Mistral is a family of models of varying sizes created by the Mistral AI team; Mistral 7B v0.3 is Mistral 7B v0.2 with an extended vocabulary. TRL is a full-stack library developed by Hugging Face to fine-tune and align Large Language Models (LLMs). Google Kubernetes Engine (GKE) is a fully managed Kubernetes service in Google Cloud that can be used to deploy and operate containerized applications at scale on GCP's infrastructure. This post explains how to fine-tune Mistral 7B v0.3 with TRL via Supervised Fine-Tuning (SFT) and Low-Rank Adaptation (LoRA) on a single GPU on a GKE Cluster.
@@ -1,5 +1,12 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!-- ---\ntitle: Deploy BERT Models with PyTorch Inference on Vertex AI\ntype: inference\n--- -->"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -1,5 +1,15 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!-- ---\n",
"title: Deploy Embedding Models with TEI on Vertex AI\n",
"type: inference\n",
"--- -->"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -1,5 +1,15 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!-- ---\n",
"title: Deploy FLUX with Hugging Face PyTorch DLCs for Inference on Vertex AI\n",
"type: inference\n",
"--- -->"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -1,5 +1,15 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!-- ---\n",
"title: Deploy Gemma 7B from GCS with TGI on Vertex AI\n",
"type: inference\n",
"--- -->"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -1,5 +1,15 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!-- ---\n",
"title: Deploy Gemma 7B with TGI on Vertex AI\n",
"type: inference\n",
"--- -->"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -1,5 +1,16 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "d7c432cf-dc16-4bd8-89bd-7c1c0eb58d37",
"metadata": {},
"source": [
"<!-- ---\n",
"title: Deploy Meta Llama 3.1 405B on Vertex AI with Hugging Face DLCs\n",
"type: inference\n",
"--- -->"
]
},
{
"cell_type": "markdown",
"id": "e4e7faed-c34a-4f01-84ec-eefbfb65506d",
@@ -1,5 +1,15 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!-- ---\n",
"title: Fine-tune LLMs using SFT with TRL's CLI on Vertex AI\n",
"type: training\n",
"--- -->"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -1,5 +1,15 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!-- ---\n",
"title: Fine-tune LLMs using SFT + LoRA with TRL's CLI on Vertex AI\n",
"type: training\n",
"--- -->"
]
},
{
"cell_type": "markdown",
"metadata": {},
