A short post discussing the installation of Nvidia Datacenter GPU Manager on Amazon Linux 2023. I recently had to figure this out using sketchy documentation, so I'm hoping this helps some folks out there doing similar head scratching.
NOTE: This blog post does not include all steps to install Nvidia drivers. I assume you already have the driver package(s) required for your application installed.
What is the Datacenter GPU Manager
Nvidia Datacenter GPU Manager is a suite of tools used for managing and monitoring Nvidia GPUs in the data center.
In our environment, Nvidia Datacenter GPU Manager (DCGM) is a prerequisite for dcgm-exporter, the DCGM Prometheus metrics exporter we use to send metrics to Datadog. I won't go into detail on installing dcgm-exporter; the instructions in the GitHub README are fairly straightforward to follow.
Amazon Linux 2 End Of Life
The Amazon Linux 2 (AL2) end-of-life (EOL) date is June 30, 2025. Its replacement is Amazon Linux 2023 (AL2023) [3].
Installing DCGM on AL2023
Installing Nvidia Datacenter GPU Manager on AL2 was documented, but when applying that documentation to AL2023 it wasn't clear what value to use for <distro> [4].
To understand which distribution Amazon Linux 2023 most closely relates to, you have to look in the Amazon Linux 2023 User Guide. It states that AL2023 is "sourced from multiple versions of Fedora...including CentOS 9."
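Before turning to the User Guide, it can help to see how the machine identifies itself. The following is a minimal sketch, not part of the Nvidia or AWS instructions: show_distro is a hypothetical helper name, and it only prints fields from the standard os-release file. On AL2023 the ID field is expected to be amzn, which does not map directly onto any <distro> value in the Nvidia repository path, hence the need to dig further.

```shell
# Print the distribution fields from an os-release file.
# show_distro is a hypothetical helper name used for illustration only.
show_distro() {
    os_release="${1:-/etc/os-release}"
    if [ ! -r "$os_release" ]; then
        echo "cannot read $os_release" >&2
        return 1
    fi
    # Source the file in a subshell so its variables do not leak into the caller.
    (
        . "$os_release"
        echo "ID=${ID:-unknown} VERSION_ID=${VERSION_ID:-unknown} ID_LIKE=${ID_LIKE:-none}"
    )
}

show_distro
```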
The Nvidia installation instructions suggest enabling the repository with this command:
sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/<distro>/x86_64/cuda-<distro>.repo
Because AL2023 is derived from Fedora/CentOS 9, the <distro> parameter is going to be rhel9. So our command to install the repository instead becomes:
sudo dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo
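If you script this step, the substitution can be kept in one place. Here is a small sketch under my own assumptions: cuda_repo_url is a hypothetical function name, and the URL pattern is simply the one from the command above with <distro> and the architecture made into parameters. It only prints the URL; the dnf call is left as a comment because it needs root.

```shell
# Build the Nvidia CUDA repo URL for a given <distro> and architecture.
# cuda_repo_url is a hypothetical helper; the URL pattern is copied from
# the Nvidia command above.
cuda_repo_url() {
    distro="$1"
    arch="${2:-x86_64}"
    echo "https://developer.download.nvidia.com/compute/cuda/repos/${distro}/${arch}/cuda-${distro}.repo"
}

# For AL2023 we pass rhel9 as the <distro> value.
repo_url="$(cuda_repo_url rhel9)"
echo "$repo_url"

# To actually add the repository (requires root):
#   sudo dnf config-manager --add-repo "$repo_url"
```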
Now you can run the install command for the datacenter-gpu-manager package:
sudo dnf install -y datacenter-gpu-manager
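Once the package is installed, a quick sanity check is worthwhile. This is a hedged sketch rather than part of the steps above: verify_dcgm is a hypothetical wrapper name, and it assumes the package provides the nvidia-dcgm systemd service and the dcgmi CLI, as described in Nvidia's DCGM documentation. Adjust the names if your package version differs.

```shell
# Start the DCGM host engine and list the GPUs it can see.
# verify_dcgm is a hypothetical wrapper; it assumes the package ships the
# nvidia-dcgm systemd unit and the dcgmi command.
verify_dcgm() {
    if ! command -v dcgmi >/dev/null 2>&1; then
        echo "dcgmi not found; is datacenter-gpu-manager installed?" >&2
        return 1
    fi
    # Enable the service at boot and start it now (requires root).
    sudo systemctl --now enable nvidia-dcgm || return 1
    # Ask DCGM to list the GPUs it has discovered.
    dcgmi discovery -l
}

verify_dcgm || echo "DCGM verification did not complete"
```

If the discovery command lists your GPUs, DCGM is running and dcgm-exporter should be able to connect to it.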
Conclusion
The documentation was not clear about which distribution to use for AL2023. Replace <distro> with rhel9 in the repo installation command and you can install the datacenter-gpu-manager package.
Cover photo by Christian Wiediger on Unsplash