From 3ece88d2a880d963708784adbe4733c3bd04024c Mon Sep 17 00:00:00 2001
From: golechwierowicz
Date: Mon, 27 Nov 2023 15:45:35 +0000
Subject: [PATCH 1/3] Add benchmark noise reducing info.

Add info about knobs making benchmarks more stable across different runs.
---
 benchmarks/README.md | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/benchmarks/README.md b/benchmarks/README.md
index 51ab754400b..0665ae6ccb3 100644
--- a/benchmarks/README.md
+++ b/benchmarks/README.md
@@ -5,6 +5,27 @@ The two main benchmarking scripts are
 - `result_analyzer.py` to aggregate the benchmark result in CSV form.
 
 
+## Reducing benchmark noise
+
+It is important to keep the benchmark runs safe from external effects
+to reduce noise. Run:
+
+```
+# Sets the CPU statically to the highest tuneable frequency.
+# Prevent energy saving features to kick in.
+sudo cpupower frequency-set --governor performance
+
+# Lock GPU clocks to lower frequency to reduce the chance of extra throttling. Choose
+# FREQ based on your GPU info. For example A100 operates on 765MHz (up to 1410 MHz),
+# with memory operating on 1215MHz. Setting the clock a couple hundrend MHz below
+# will most likely prevent thermal effects.
+FREQ=...
+nvidia-smi --lock-gpu-clocks=$FREQ,$FREQ
+
+# Disable autoboost selecting clock rate based on thermal, and power budget effects.
+CUDA_AUTO_BOOST=0
+```
+
 ## Experiment runner
 
 Run the `experiment_runner.py` from the `pytorch` directory, which should be the

From 83de850efe7680cfb8d48b207f0894101fce4496 Mon Sep 17 00:00:00 2001
From: golechwierowicz
Date: Tue, 28 Nov 2023 08:02:09 +0000
Subject: [PATCH 2/3] Add more general info about setting clock freq.

---
 benchmarks/README.md | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/benchmarks/README.md b/benchmarks/README.md
index 0665ae6ccb3..d616b9ae6d0 100644
--- a/benchmarks/README.md
+++ b/benchmarks/README.md
@@ -15,10 +15,11 @@ to reduce noise. Run:
 # Prevent energy saving features to kick in.
 sudo cpupower frequency-set --governor performance
 
-# Lock GPU clocks to lower frequency to reduce the chance of extra throttling. Choose
-# FREQ based on your GPU info. For example A100 operates on 765MHz (up to 1410 MHz),
-# with memory operating on 1215MHz. Setting the clock a couple hundrend MHz below
-# will most likely prevent thermal effects.
+# Lock GPU clocks to lower frequency to reduce the chance of thermal throttling. Choose
+# FREQ based on your GPU info. To find out clock frequency on your device run:
+# `nvidia-smi -q -d CLOCK`, and look for Graphics/SM in Max Clocks section.
+# Setting the clock a couple hundrend MHz below, or ~80% of max
+# will most likely prevent thermal throttling effects.
 FREQ=...
 nvidia-smi --lock-gpu-clocks=$FREQ,$FREQ
 

From 3b0725e56d2fff46f387ea0bc05ea76cec7ab2c8 Mon Sep 17 00:00:00 2001
From: golechwierowicz
Date: Wed, 29 Nov 2023 13:45:54 +0000
Subject: [PATCH 3/3] Move comments out of the code

---
 benchmarks/README.md | 33 ++++++++++++++++-----------------
 1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/benchmarks/README.md b/benchmarks/README.md
index d616b9ae6d0..8cf556aefa7 100644
--- a/benchmarks/README.md
+++ b/benchmarks/README.md
@@ -8,24 +8,23 @@ The two main benchmarking scripts are
 ## Reducing benchmark noise
 
 It is important to keep the benchmark runs safe from external effects
-to reduce noise. Run:
+to reduce noise. Do the following:
 
-```
-# Sets the CPU statically to the highest tuneable frequency.
-# Prevent energy saving features to kick in.
-sudo cpupower frequency-set --governor performance
-
-# Lock GPU clocks to lower frequency to reduce the chance of thermal throttling. Choose
-# FREQ based on your GPU info. To find out clock frequency on your device run:
-# `nvidia-smi -q -d CLOCK`, and look for Graphics/SM in Max Clocks section.
-# Setting the clock a couple hundrend MHz below, or ~80% of max
-# will most likely prevent thermal throttling effects.
-FREQ=...
-nvidia-smi --lock-gpu-clocks=$FREQ,$FREQ
-
-# Disable autoboost selecting clock rate based on thermal, and power budget effects.
-CUDA_AUTO_BOOST=0
-```
+Sets the CPU statically to the highest tuneable frequency.
+Prevent energy saving features to kick in.
+
+```sudo cpupower frequency-set --governor performance```
+
+Lock GPU clocks to lower frequency to reduce the chance of thermal throttling. Choose
+FREQ based on your GPU info. To find out clock frequency on your device run:
+`nvidia-smi -q -d CLOCK`, and look for Graphics/SM in Max Clocks section.
+Setting the clock a couple hundrend MHz below, or ~80% of max
+will most likely prevent thermal throttling effects.
+
+```FREQ=... nvidia-smi --lock-gpu-clocks=$FREQ,$FREQ```
+
+Disable autoboost selecting clock rate based on thermal, and power budget effects.
+```CUDA_AUTO_BOOST=0```
 
 ## Experiment runner
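
Note on the final README text: the three knobs it describes could be combined roughly as in the sketch below. This is not part of the patch series. The helper names `target_clock` and `reduce_benchmark_noise` are hypothetical, GPU index 0 is an assumption, and the 80% factor is simply the rule of thumb quoted in the README. The sketch also uses `export`, on the assumption that the benchmark process needs to inherit `CUDA_AUTO_BOOST`.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of the benchmark scripts.

# Print a target clock in MHz as an integer percentage of the maximum clock.
# Usage: target_clock MAX_MHZ [PERCENT]   (PERCENT defaults to 80)
target_clock() {
  local max_mhz="$1" percent="${2:-80}"
  echo $(( max_mhz * percent / 100 ))
}

# Apply the three settings. Needs root, cpupower, and an NVIDIA GPU.
reduce_benchmark_noise() {
  # Pin the CPU governor so energy-saving features do not kick in mid-run.
  sudo cpupower frequency-set --governor performance

  # Read the maximum SM clock of GPU 0 (assumed index) and lock to ~80% of it.
  local max_mhz freq
  max_mhz="$(nvidia-smi --id=0 --query-gpu=clocks.max.sm --format=csv,noheader,nounits)"
  freq="$(target_clock "$max_mhz")"
  sudo nvidia-smi --lock-gpu-clocks="$freq,$freq"

  # Export so that child processes (the benchmark itself) see the variable.
  export CUDA_AUTO_BOOST=0
}
```

Sourcing the file only defines the two functions; nothing is changed until `reduce_benchmark_noise` is called. The clock lock can be undone afterwards with `sudo nvidia-smi --reset-gpu-clocks`.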