Add metadata to every example under examples
alvarobartt committed Sep 18, 2024
1 parent 6fc2c88 commit 0fc35ea
Showing 15 changed files with 113 additions and 0 deletions.
5 changes: 5 additions & 0 deletions examples/cloud-run/tgi-deployment/README.md
@@ -1,3 +1,8 @@
---
title: Deploy Meta Llama 3.1 8B with Text Generation Inference on Cloud Run
type: inference
---

# Deploy Meta Llama 3.1 8B with Text Generation Inference on Cloud Run

Meta Llama 3.1 is the latest open LLM from Meta, released in July 2024. It comes in three sizes: 8B for efficient deployment and development on consumer-grade GPUs, 70B for large-scale AI-native applications, and 405B for synthetic data generation, LLM-as-a-Judge, distillation, and other use cases. Text Generation Inference (TGI) is a toolkit developed by Hugging Face for deploying and serving LLMs with high-performance text generation. Google Cloud Run is a serverless container platform that lets developers deploy and manage containerized applications without managing infrastructure, with automatic scaling and billing only for actual usage. This example showcases how to deploy an LLM from the Hugging Face Hub, in this case the Meta Llama 3.1 8B Instruct model quantized to INT4 using AWQ, with the Hugging Face DLC for TGI on Google Cloud Run with GPU support (in preview).
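The metadata this commit adds is a YAML frontmatter block between `---` delimiters, with `title` and `type` fields as shown in the diff. A minimal, stdlib-only sketch of how such a block could be parsed into a dict (the parsing helper is an illustrative assumption, not tooling from this repository):

```python
# Illustrative sketch: extract the key/value metadata from a README that
# starts with a `---` ... `---` frontmatter block, as added in this commit.
# This helper is an assumption for illustration, not part of the repository.

def parse_frontmatter(text: str) -> dict:
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no frontmatter block at the top of the file
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter ends the block
            return meta
        key, _, value = line.partition(":")  # split on the first colon only
        meta[key.strip()] = value.strip()
    return {}  # opening delimiter without a closing one

readme = """---
title: Deploy Meta Llama 3.1 8B with Text Generation Inference on Cloud Run
type: inference
---

# Deploy Meta Llama 3.1 8B with Text Generation Inference on Cloud Run
"""
print(parse_frontmatter(readme))
```

Splitting on the first colon only matters here, since titles themselves may contain colons.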
5 changes: 5 additions & 0 deletions examples/gke/tei-deployment/README.md
@@ -1,3 +1,8 @@
---
title: Deploy Snowflake's Arctic Embed (M) with Text Embeddings Inference (TEI) on GKE
type: inference
---

# Deploy Snowflake's Arctic Embed (M) with Text Embeddings Inference (TEI) on GKE

Snowflake's Arctic Embed is a suite of text embedding models focused on high-quality retrieval, achieving state-of-the-art (SOTA) performance on the MTEB/BEIR leaderboard for each of its size variants. Text Embeddings Inference (TEI) is a toolkit developed by Hugging Face for deploying and serving open-source text embedding and sequence classification models, enabling high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE, and E5. Google Kubernetes Engine (GKE) is a fully managed Kubernetes service in Google Cloud that can be used to deploy and operate containerized applications at scale on GCP's infrastructure.
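Once a TEI deployment is reachable (for example after port-forwarding the GKE Service to localhost), embeddings are requested over plain HTTP via TEI's `/embed` route. A hedged sketch of the request shape; the URL and port are assumptions, and actually sending the request needs a live deployment:

```python
import json
from urllib import request

# TEI exposes a POST /embed route that takes a JSON body {"inputs": ...}.
# The endpoint below assumes `kubectl port-forward` to localhost:8080 and is
# an assumption, not a value from this repository.
TEI_URL = "http://localhost:8080/embed"

payload = {"inputs": "What is Deep Learning?"}
body = json.dumps(payload).encode("utf-8")
req = request.Request(
    TEI_URL,
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# Sending the request requires a running TEI service, so it is left commented;
# the response body is a JSON list of embedding vectors.
# with request.urlopen(req) as resp:
#     embeddings = json.loads(resp.read())
print(req.get_method(), req.full_url, body.decode())
```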
5 changes: 5 additions & 0 deletions examples/gke/tei-from-gcs-deployment/README.md
@@ -1,3 +1,8 @@
---
title: Deploy BGE Base v1.5 (English) with Text Embeddings Inference (TEI) from a GCS Bucket on GKE
type: inference
---

# Deploy BGE Base v1.5 (English) with Text Embeddings Inference (TEI) from a GCS Bucket on GKE

BGE, standing for BAAI General Embedding, is a collection of embedding models released by BAAI; BGE Base v1.5 (English) is an English base model for general embedding tasks, ranked on the MTEB Leaderboard. Text Embeddings Inference (TEI) is a toolkit developed by Hugging Face for deploying and serving open-source text embedding and sequence classification models, enabling high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE, and E5. Google Kubernetes Engine (GKE) is a fully managed Kubernetes service in Google Cloud that can be used to deploy and operate containerized applications at scale on GCP's infrastructure.
5 changes: 5 additions & 0 deletions examples/gke/tgi-deployment/README.md
@@ -1,3 +1,8 @@
---
title: Deploy Meta Llama 3 8B with Text Generation Inference (TGI) on GKE
type: inference
---

# Deploy Meta Llama 3 8B with Text Generation Inference (TGI) on GKE

Meta Llama 3 is the latest LLM from the Llama family, released by Meta, coming in two sizes, 8B and 70B, each with both a base and an instruction-tuned variant. Text Generation Inference (TGI) is a toolkit developed by Hugging Face for deploying and serving LLMs with high-performance text generation. Google Kubernetes Engine (GKE) is a fully managed Kubernetes service in Google Cloud that can be used to deploy and operate containerized applications at scale on GCP's infrastructure. This post explains how to deploy an LLM from the Hugging Face Hub, such as Meta Llama 3 8B Instruct, on a GKE Cluster running a purpose-built container to serve LLMs in a secure and managed environment with the Hugging Face DLC for TGI.
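After the TGI deployment is exposed (e.g. via port-forwarding the GKE Service), text generation is requested through TGI's `/generate` route. A hedged sketch of the request shape; the URL, port, and parameter values are assumptions, and the request itself needs a live deployment:

```python
import json
from urllib import request

# TGI serves a POST /generate route that takes a JSON body of the form
# {"inputs": ..., "parameters": {...}}. The endpoint below assumes a
# port-forwarded GKE Service on localhost:8080 and is an assumption.
TGI_URL = "http://localhost:8080/generate"

payload = {
    "inputs": "What is Kubernetes?",
    "parameters": {"max_new_tokens": 128, "temperature": 0.7},
}
req = request.Request(
    TGI_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# Actually sending the request needs a live TGI deployment, so it is left
# commented; the response is a JSON object with a "generated_text" field.
# with request.urlopen(req) as resp:
#     print(json.loads(resp.read())["generated_text"])
print(req.get_method(), req.full_url)
```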
5 changes: 5 additions & 0 deletions examples/gke/tgi-from-gcs-deployment/README.md
@@ -1,3 +1,8 @@
---
title: Deploy Qwen2 7B Instruct with Text Generation Inference (TGI) from a GCS Bucket on GKE
type: inference
---

# Deploy Qwen2 7B Instruct with Text Generation Inference (TGI) from a GCS Bucket on GKE

Qwen2 is the new series of Qwen Large Language Models (LLMs) built by Alibaba Cloud, with both base and instruction-tuned models ranging from 0.5 to 72 billion parameters, including a Mixture-of-Experts model; the 7B variant ranks second in its size range on Hugging Face's Open LLM Leaderboard, and the 72B variant ranks first across all sizes. Text Generation Inference (TGI) is a toolkit developed by Hugging Face for deploying and serving LLMs with high-performance text generation. Google Kubernetes Engine (GKE) is a fully managed Kubernetes service in Google Cloud that can be used to deploy and operate containerized applications at scale on GCP's infrastructure. This post explains how to deploy an LLM from a Google Cloud Storage (GCS) Bucket on a GKE Cluster running a purpose-built container to serve LLMs in a secure and managed environment with the Hugging Face DLC for TGI.
5 changes: 5 additions & 0 deletions examples/gke/trl-full-fine-tuning/README.md
@@ -1,3 +1,8 @@
---
title: Fine-tune Gemma 2B with TRL on GKE
type: training
---

# Fine-tune Gemma 2B with TRL on GKE

Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models, developed by Google DeepMind and other teams across Google. TRL is a full-stack library developed by Hugging Face to fine-tune and align Large Language Models (LLMs). Google Kubernetes Engine (GKE) is a fully managed Kubernetes service in Google Cloud that can be used to deploy and operate containerized applications at scale on GCP's infrastructure. This post explains how to fully fine-tune Gemma 2B with TRL via Supervised Fine-Tuning (SFT) in a multi-GPU setting on a GKE Cluster.
5 changes: 5 additions & 0 deletions examples/gke/trl-lora-fine-tuning/README.md
@@ -1,3 +1,8 @@
---
title: Fine-tune Mistral 7B v0.3 with TRL on GKE
type: training
---

# Fine-tune Mistral 7B v0.3 with TRL on GKE

Mistral is a family of models of varying sizes created by the Mistral AI team; Mistral 7B v0.3 is Mistral 7B v0.2 with an extended vocabulary. TRL is a full-stack library developed by Hugging Face to fine-tune and align Large Language Models (LLMs). Google Kubernetes Engine (GKE) is a fully managed Kubernetes service in Google Cloud that can be used to deploy and operate containerized applications at scale on GCP's infrastructure. This post explains how to fine-tune Mistral 7B v0.3 with TRL via Supervised Fine-Tuning (SFT) and Low-Rank Adaptation (LoRA) on a single GPU on a GKE Cluster.
@@ -1,5 +1,12 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!-- ---\ntitle: Deploy BERT Models with PyTorch Inference on Vertex AI\ntype: inference\n--- -->"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -1,5 +1,15 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!-- ---\n",
"title: Deploy Embedding Models with TEI on Vertex AI\n",
"type: inference\n",
"--- -->"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -1,5 +1,15 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!-- ---\n",
"title: Deploy FLUX with Hugging Face PyTorch DLCs for Inference on Vertex AI\n",
"type: inference\n",
"--- -->"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -1,5 +1,15 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!-- ---\n",
"title: Deploy Gemma 7B from GCS with TGI on Vertex AI\n",
"type: inference\n",
"--- -->"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -1,5 +1,15 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!-- ---\n",
"title: Deploy Gemma 7B with TGI on Vertex AI\n",
"type: inference\n",
"--- -->"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -1,5 +1,16 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "d7c432cf-dc16-4bd8-89bd-7c1c0eb58d37",
"metadata": {},
"source": [
"<!-- ---\n",
"title: Deploy Meta Llama 3.1 405B on Vertex AI with Hugging Face DLCs\n",
"type: inference\n",
"--- -->"
]
},
{
"cell_type": "markdown",
"id": "e4e7faed-c34a-4f01-84ec-eefbfb65506d",
@@ -1,5 +1,15 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!-- ---\n",
"title: Fine-tune LLMs using SFT with TRL's CLI on Vertex AI\n",
"type: training\n",
"--- -->"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -1,5 +1,15 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"<!-- ---\n",
"title: Fine-tune LLMs using SFT + LoRA with TRL's CLI on Vertex AI\n",
"type: training\n",
"--- -->"
]
},
{
"cell_type": "markdown",
"metadata": {},
