stop slandering H100s #992

Merged 1 commit on Nov 22, 2024
2 changes: 1 addition & 1 deletion 06_gpu_and_ml/llm-serving/trtllm_llama.py
@@ -151,7 +151,7 @@ def download_model():
 # NVIDIA's Ada Lovelace/Hopper chips, like the 4090, L40S, and H100,
 # are capable of native calculations in 8bit floating point numbers, so we choose that as our quantization format (`qformat`).
 # These GPUs are capable of twice as many floating point operations per second in 8bit as in 16bit --
-# about a trillion per second on an H100.
+# about two quadrillion per second on an H100 SXM.

 N_GPUS = 1  # Heads up: this example has not yet been tested with multiple GPUs
 GPU_CONFIG = modal.gpu.H100(count=N_GPUS)
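
For context on the corrected figure: NVIDIA's H100 SXM datasheet lists roughly 989.5 dense FP16 Tensor Core TFLOPS and roughly 1979 dense FP8 TFLOPS, so the 8bit rate is about twice the 16bit rate and lands near two quadrillion operations per second. A quick back-of-the-envelope check of the new comment (the datasheet values below are assumptions pulled from NVIDIA's published specs, not from this diff):

fp16_tflops = 989.5   # dense FP16 Tensor Core throughput, H100 SXM (per datasheet)
fp8_tflops = 1979.0   # dense FP8 Tensor Core throughput, H100 SXM (per datasheet)
assert round(fp8_tflops / fp16_tflops) == 2  # FP8 is ~2x FP16, as the comment says
print(f"{fp8_tflops * 1e12:.1e} FLOP/s")     # ~2.0e+15, i.e. about two quadrillion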
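
For readers unfamiliar with the surrounding script: GPU_CONFIG is what gets attached to the Modal function that serves the model. A minimal sketch of that wiring, assuming Modal's App.function(gpu=...) interface; the app name and function body below are illustrative, not part of this diff:

import modal

app = modal.App("trtllm-llama-sketch")  # hypothetical app name

N_GPUS = 1  # heads up: the example has not yet been tested with multiple GPUs
GPU_CONFIG = modal.gpu.H100(count=N_GPUS)

@app.function(gpu=GPU_CONFIG)
def generate(prompt: str) -> str:
    # placeholder body; the real example builds and queries a TensorRT-LLM engine
    ...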