Welcome to the reproducible benchmark recipes repository for GPUs! This repository contains recipes for reproducing training and serving benchmarks for large machine learning models using GPUs on Google Cloud.
- Identify your requirements: Determine the model, GPU type, workload, framework, and orchestrator you are interested in.
- Select a recipe: Based on your requirements, use the Benchmark support matrix to find a recipe that meets your needs.
- Follow the recipe: Each recipe provides procedures to complete the following tasks:
  - Prepare your environment
  - Run the benchmark
  - Analyze the benchmark results, which include not just the headline metrics but detailed logs for further analysis
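The steps above can be sketched end-to-end as follows. This is a hedged illustration only: the recipe path, release name, cluster, and bucket below are hypothetical placeholders, since each recipe documents its own exact commands and values.

```shell
# Hypothetical walkthrough of a recipe's three tasks; all names are placeholders.

# 1. Prepare your environment: point kubectl at your GKE cluster, e.g.
#    gcloud container clusters get-credentials my-gke-cluster --region us-central1
# 2. Run the benchmark: recipes typically package the workload as a Helm chart.
RECIPE_DIR="training/llama-3-70b"    # hypothetical recipe directory
WORKLOAD_NAME="llama3-70b-bench"     # Helm release name for this run
HELM_CMD="helm install ${WORKLOAD_NAME} ${RECIPE_DIR}"
echo "${HELM_CMD}"
# 3. Analyze the results: logs and metrics typically land in a bucket, e.g.
#    gsutil ls gs://my-results-bucket/llama3-70b-bench/
```

The cluster-facing commands are left as comments because they require a provisioned GKE cluster; each recipe's own README supplies the real paths and Helm values.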
Models | GPU Machine Type | Framework | Workload Type | Orchestrator | Link to the recipe |
---|---|---|---|---|---|
GPT3-175B | A3 Mega (NVIDIA H100) | NeMo | Pre-training | GKE | Link |
Llama-3-70B | A3 Mega (NVIDIA H100) | NeMo | Pre-training | GKE | Link |
Llama-3.1-70B | A3 Mega (NVIDIA H100) | NeMo | Pre-training | GKE | Link |
Llama-3.1-70B | A3 Ultra (NVIDIA H200) | MaxText | Pre-training | GKE | Link |
Mixtral-8x7B | A3 Mega (NVIDIA H100) | NeMo | Pre-training | GKE | Link |
Mixtral-8x7B | A3 Ultra (NVIDIA H200) | NeMo | Pre-training | GKE | Link |
- training/: Contains recipes to reproduce training benchmarks with GPUs.
- src/: Contains shared dependencies required to run benchmarks, such as Docker and Helm charts.
- docs/: Contains supporting documentation for the recipes, such as explanations of benchmark methodologies and configurations.
If you have any questions, or if you find any problems with this repository, please report them through GitHub issues.
This is not an officially supported Google product. The code in this repository is for demonstrative purposes only.