
Commit

Merge pull request #214 from aws-samples/smph-fix-dcgm-exporter-gpu-util
Bump dcgm exporter version to correctly capture GPU utilization
mhuguesaws authored Mar 18, 2024
2 parents: e40523d + 5a331d9; commit 20bc096
Showing 1 changed file with 3 additions and 3 deletions.
```diff
@@ -4,18 +4,18 @@
 if nvidia-smi; then
 echo "NVIDIA GPU found. Proceeding with script..."
 # Set DCGM Exporter version
-DCGM_EXPORTER_VERSION=2.1.4-2.3.1
+DCGM_EXPORTER_VERSION=3.3.5-3.4.0-ubuntu22.04
 
 # Run the DCGM Exporter Docker container
 sudo docker run -d --rm \
 --gpus all \
 --net host \
 --cap-add SYS_ADMIN \
-nvcr.io/nvidia/k8s/dcgm-exporter:${DCGM_EXPORTER_VERSION}-ubuntu20.04 \
+nvcr.io/nvidia/k8s/dcgm-exporter:${DCGM_EXPORTER_VERSION} \
 -f /etc/dcgm-exporter/dcp-metrics-included.csv || { echo "Failed to run DCGM Exporter Docker container"; exit 1; }
 
 echo "Running DCGM exporter in a Docker container on port 9400..."
 else
 echo "NVIDIA GPU not found. DCGM Exporter was not installed. If this is controller node, you can safelly ignore this warning. Exiting gracefully..."
 exit 0
-fi
+fi
```
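This change folds the base-image suffix into `DCGM_EXPORTER_VERSION` itself, so the variable now holds the full image tag. Both parts of the tag change here: the exporter version goes from `2.1.4-2.3.1` to `3.3.5-3.4.0`, and the base OS goes from `ubuntu20.04` to `ubuntu22.04`. The following is a minimal sketch of an alternative that keeps the two parts as separate, overridable variables. The helper `dcgm_exporter_image` and the variable `DCGM_EXPORTER_OS` are hypothetical names not used in the original script. The sketch also assumes NGC tags keep the `<version>-<os>` shape seen in both the old and new references.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of the original script): build the
# dcgm-exporter image reference from an exporter version and a base-image OS.
dcgm_exporter_image() {
    local version="$1" os="$2"
    printf 'nvcr.io/nvidia/k8s/dcgm-exporter:%s-%s\n' "$version" "$os"
}

# Defaults reproduce the tag introduced by this commit
# (3.3.5-3.4.0-ubuntu22.04); either part can be overridden from the environment.
DCGM_EXPORTER_VERSION="${DCGM_EXPORTER_VERSION:-3.3.5-3.4.0}"
DCGM_EXPORTER_OS="${DCGM_EXPORTER_OS:-ubuntu22.04}"

IMAGE="$(dcgm_exporter_image "$DCGM_EXPORTER_VERSION" "$DCGM_EXPORTER_OS")"
echo "Using image: $IMAGE"
```

With this approach, `docker run` would reference `"$IMAGE"` directly. The same helper also reproduces the pre-change reference from `2.1.4-2.3.1` and `ubuntu20.04`, which makes a rollback a matter of overriding two variables.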

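According to the commit message, the version bump is meant to make GPU utilization report correctly. One way to spot-check this on a GPU node is to scrape the exporter on port 9400, the port the script reports, and look for the `DCGM_FI_DEV_GPU_UTIL` gauge. The sketch below makes several assumptions:

* the exporter serves Prometheus text format at `/metrics`;
* the CSV passed with `-f` includes `DCGM_FI_DEV_GPU_UTIL`;
* samples carry no trailing timestamp.

The helper name `gpu_util_from_metrics` is hypothetical. The embedded sample is hand-written to mimic the format and is not real exporter output.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: read Prometheus text-format metrics on stdin and print
# "gpu=<index> util=<value>" for each DCGM_FI_DEV_GPU_UTIL sample.
# Assumes one sample per line with no trailing timestamp, so the value is $NF.
gpu_util_from_metrics() {
    awk '
        index($0, "DCGM_FI_DEV_GPU_UTIL{") == 1 {
            gpu = "?"
            # Extract the gpu="..." label wherever it appears in the label set.
            if (match($0, /[{,]gpu="[^"]*"/)) {
                gpu = substr($0, RSTART + 6, RLENGTH - 7)
            }
            printf "gpu=%s util=%s\n", gpu, $NF
        }
    '
}

# Hand-written sample in the exporter's text format (not real output),
# so the helper can be tried without a GPU. HELP/TYPE lines are skipped.
sample='# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",UUID="GPU-aaaa",device="nvidia0",modelName="NVIDIA H100 80GB HBM3"} 37
DCGM_FI_DEV_GPU_UTIL{gpu="1",UUID="GPU-bbbb",device="nvidia1",modelName="NVIDIA H100 80GB HBM3"} 0'

printf '%s\n' "$sample" | gpu_util_from_metrics
```

On a live node, the same helper could be fed from the running container with `curl -s http://localhost:9400/metrics | gpu_util_from_metrics`. Either of the following would suggest utilization is still not being captured:

* no output at all;
* values that stay at zero while a job is running.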